List of known and new Pfam domains in Plasmodium yoelii


PF00004 - AAA (Pfam link)

Interpro entry IPR003959 : ATPase, AAA-type, core (Interpro link)

Pfam description:
AAA family proteins often perform chaperone-like functions that assist in the assembly, operation, or disassembly of protein complexes.

Interpro description:

AAA ATPases (ATPases Associated with diverse cellular Activities) form a large protein family and play a number of roles in the cell, including cell-cycle regulation, proteolysis and protein disaggregation, organelle biogenesis and intracellular transport. Some of them function as molecular chaperones, subunits of proteolytic complexes or independent proteases (FtsH, Lon). They also act as DNA helicases and transcription factors.

AAA ATPases belong to the AAA+ superfamily of ring-shaped P-loop NTPases, which act via the energy-dependent unfolding of macromolecules. There are six major clades of AAA domains (proteasome subunits; metalloproteases; domains D1 and D2 of ATPases with two AAA domains; the MSP1/katanin/spastin group; and BCS1 and its homologues), as well as a number of deeply branching minor clades.

They form oligomeric assemblies (often hexamers) with a ring-shaped structure and a central pore. These assemblies act as molecular motors that couple ATP binding and hydrolysis to changes in conformational state that act upon a target substrate, either translocating or remodelling it.

They are found in all living organisms and share a highly conserved AAA domain, called the AAA module, which is responsible for ATP binding and hydrolysis. It contains 200-250 residues and includes two classical motifs, Walker A (GX4GKT) and Walker B (HyDE, a run of hydrophobic residues followed by Asp-Glu).
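
The two motifs above can be illustrated with a simple pattern scan. This is a minimal sketch under stated assumptions, not how Pfam assigns domains: Pfam uses profile hidden Markov models (HMMER), and a regular-expression hit is at most a hint. Walker A is taken literally as GX4GKT from the text; Walker B is assumed to be four hydrophobic residues followed by D-E, and the hydrophobic residue set, the function name find_motifs and the toy sequence are all hypothetical choices made for illustration.

```python
import re

# Walker A as written in the text: G, any four residues, then G-K-T.
WALKER_A = re.compile(r"G.{4}GKT")
# Walker B ("HyDE"): assumed here to be four hydrophobic residues then D-E.
# The hydrophobic set [AILMFVW] is an illustrative assumption.
WALKER_B = re.compile(r"[AILMFVW]{4}DE")


def find_motifs(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Return 0-based start positions and matched text for each motif."""
    seq = seq.upper()
    return {
        "walker_a": [(m.start(), m.group()) for m in WALKER_A.finditer(seq)],
        "walker_b": [(m.start(), m.group()) for m in WALKER_B.finditer(seq)],
    }


# Hypothetical toy sequence, not a real Plasmodium yoelii protein.
toy = "MSKGPPGVGKTLLAKAVAILVFDEIDA"
print(find_motifs(toy))
```

Because non-overlapping matches are reported, overlapping motif candidates would need a lookahead pattern; for a domain-level call, an HMM search against the Pfam AAA profile is the appropriate tool.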

The functional variety seen between AAA ATPases is in part due to their extensive number of accessory domains and factors, and to their variable organisation within oligomeric assemblies, in addition to changes in key functional residues within the ATPase domain itself.

More information about these proteins can be found at Protein of the Month: AAA ATPases.

Proteins where this domain is known:
PY00142    PY00544    PY00565    PY00657    PY00672    PY00768    PY00809    PY01493    PY01547    PY02067    PY02232    PY02401    PY02578    PY02590    PY02805    PY02897    PY03371    PY03639    PY03797    PY04255    PY04378    PY04402    PY04458    PY04496    PY05070    PY05153    PY05283    PY05342    PY05364    PY05628    PY05787    PY05838    PY06430   

Proteins where this domain has been detected by our approach:
PY01468    PY01741   


PF00005 - ABC_tran (Pfam link)

Interpro entry IPR003439 : ABC transporter-like (Interpro link)

Pfam description:
ABC transporters form a large family of proteins responsible for the translocation of a variety of compounds across biological membranes. ABC transporters are the largest family of proteins in many completely sequenced bacteria. ABC transporters are composed of two copies of this domain and two copies of a transmembrane domain Pfam:PF00664. These four domains may belong to a single polypeptide, as in Swiss:P13569, or to different polypeptide chains.

Interpro description:

ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs.

ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms, and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain.

The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to ABC transporters; and the Q-motif (between Walker A and the signature), which interacts with the gamma phosphate through a water-mediated bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide-binding site.
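
The motif order described above (Walker A upstream, Q-motif in between, LSGGQ signature downstream) can be checked with a crude scan. This is a hypothetical sketch for illustration only: the function name abc_motif_order and the toy sequences are invented, and the Walker A form G-x4-G-K-[S/T] is an assumption (the text only names the motif). It does not replace an HMM search for the ABC_tran domain.

```python
import re

# Assumed Walker A form for this sketch: G, four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")
# ABC signature motif as given in the text.
SIGNATURE = re.compile(r"LSGGQ")


def abc_motif_order(seq: str):
    """Return (walker_a_start, signature_start) if an LSGGQ signature
    occurs downstream of a Walker A match, otherwise None (hypothetical helper)."""
    seq = seq.upper()
    walker_a = WALKER_A.search(seq)
    if walker_a is None:
        return None
    # Only look for the signature after the end of the Walker A match.
    signature = SIGNATURE.search(seq, walker_a.end())
    if signature is None:
        return None
    return walker_a.start(), signature.start()


# Hypothetical toy sequence with the motifs in the expected order.
print(abc_motif_order("MGPSGSGKSTLLKQEVLSGGQRQRVAIA"))
```

Requiring the signature to lie after Walker A mirrors the arrangement described in the text; a signature that appears only upstream is rejected.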

The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. Arm I (mainly beta-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of arm I. The perpendicular arm II contains mostly the alpha-helical subdomain with the signature motif. It seems to be required only for the structural integrity of the ABC module. Arm II is in direct contact with the TMD. The hinge between arm I and arm II contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer, the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel beta-sheet of arm I and are related by a two-fold axis.

The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families, with a diversity of physiological functions. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette. More than 50 subfamilies have been described based on phylogenetic and functional classification (for further information see http://www.tcdb.org/tcdb/index.php?tc=3.A.1).

On the basis of sequence similarities a family of related ATP-binding proteins has been characterised.

The proteins belonging to this family also contain one or two copies of the 'A' consensus sequence or the 'P-loop'.

Proteins where this domain is known:
PY00207    PY00245    PY01826    PY02551    PY03961    PY04219    PY05035    PY06050    PY06054    PY06462    PY06546    PY06911    PY07089   


PF00006 - ATP-synt_ab (Pfam link)

Interpro entry IPR000194 : ATPase, F1/V1/A1 complex, alpha/beta subunit, nucleotide-binding (Interpro link)

Pfam description:
This family includes the ATP synthase alpha and beta subunits, the ATP synthase associated with flagella and the termination factor Rho.

Interpro description:

ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

This entry represents the alpha and beta subunits found in the F1, V1, and A1 complexes of F-, V- and A-ATPases, respectively (sometimes called the A and B subunits in V- and A-ATPases), as well as flagellar ATPase and the termination factor Rho. The F-ATPases (or F1F0-ATPases), V-ATPases (or V1V0-ATPases) and A-ATPases (or A1A0-ATPases) are composed of two linked complexes: the F1, V1 or A1 complex contains the catalytic core that synthesises/hydrolyses ATP, and the F0, V0 or A0 complex forms the membrane-spanning pore. The F-, V- and A-ATPases all contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis.

In F-ATPases, there are three copies each of the alpha and beta subunits that form the catalytic core of the F1 complex, while the remaining F1 subunits (gamma, delta, epsilon) form part of the stalks. There is a substrate-binding site on each of the alpha and beta subunits; those on the beta subunits are catalytic, while those on the alpha subunits are regulatory. The alpha and beta subunits form a cylinder that is attached to the central stalk. The alpha/beta subunits undergo a sequence of conformational changes leading to the formation of ATP from ADP. These changes are induced by the rotation of the gamma subunit, which is itself driven by the movement of protons through the c subunits of the F0 complex.

In V- and A-ATPases, the alpha/A and beta/B subunits of the V1 or A1 complex are homologous to the alpha and beta subunits in the F1 complex of F-ATPases, except that the alpha subunit is catalytic and the beta subunit is regulatory.

The alpha/A and beta/B subunits can each be divided into three regions, or domains, centred around the ATP-binding pocket, and based on structure and function. The central domain contains the nucleotide-binding residues that make direct contact with the ADP/ATP molecule.

More information about this protein can be found at Protein of the Month: ATP Synthases.

Proteins where this domain is known:
PY00963    PY01556    PY05102    PY05971   


PF00008 - EGF (Pfam link)

Interpro entry IPR006209 : (Interpro link)

Pfam description:
There is no clear separation between noise and signal. Pfam:PF00053 is very similar, but has 8 instead of 6 conserved cysteines. Includes some cytokine receptors. The EGF domain lacks the N-terminal region of the Ca2+-binding EGF domains (this is the main reason for the discrepancy between Swiss-Prot and Pfam domain start/end positions). The family is hard to model due to many similar but different sub-types of EGF domains. Pfam certainly misses a number of EGF domains.

Interpro description:
A sequence about thirty to forty amino acid residues long, found in epidermal growth factor (EGF), has been shown to be present, in a more or less conserved form, in a large number of other, mostly animal proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded beta-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length.

Proteins where this domain has been detected by our approach:
PY05748   


PF00009 - GTP_EFTU (Pfam link)

Interpro entry IPR000795 : Protein synthesis factor, GTP-binding (Interpro link)

Pfam description:
This domain contains a P-loop motif, also found in several other families such as Pfam:PF00071, Pfam:PF00025 and Pfam:PF00063. Elongation factor Tu consists of three structural domains, this plus two C-terminal beta barrel domains.

Interpro description:
Elongation factors belong to a family of proteins that promote the GTP-dependent binding of aminoacyl tRNA to the A site of ribosomes during protein biosynthesis, and catalyse the translocation of the synthesised protein chain from the A to the P site. The proteins are all relatively similar in the vicinity of their C-termini, and are also highly similar to a range of proteins that includes the nodulation Q protein from Rhizobium meliloti (Sinorhizobium meliloti), bacterial tetracycline resistance proteins and the omnipotent suppressor protein 2 from yeast.

In both prokaryotes and eukaryotes, there are three distinct types of elongation factors: EF-1alpha (EF-Tu), which binds GTP and an aminoacyl-tRNA and delivers the latter to the A site of ribosomes; EF-1beta (EF-Ts), which interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a; and EF-2 (EF-G), which binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site. In EF-1alpha, a specific region has been shown to be involved in a conformational change mediated by the hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu and EF-2/EF-G and thus seems typical of GTP-dependent proteins which bind non-initiator tRNAs to the ribosome. The GTP-binding protein synthesis factor family also includes the eukaryotic peptide chain release factor GTP-binding subunits and prokaryotic peptide chain release factor 3 (RF-3); the prokaryotic GTP-binding protein lepA and its homologues in yeast (GUF1) and Caenorhabditis elegans (ZK1236.1); yeast HBS1; rat statin S1; and the prokaryotic selenocysteine-specific elongation factor selB.

Proteins where this domain is known:
PY00361    PY00362    PY00420    PY00960    PY01864    PY02337    PY02338    PY02627    PY02880    PY03311    PY03426    PY04028    PY04385    PY04706    PY05356    PY05361    PY05417    PY05837    PY06134    PY06191    PY07561   


PF00011 - HSP20 (Pfam link)

Interpro entry IPR002068 : (Interpro link)

Interpro description:

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp). Amongst them is a family of proteins with an average molecular weight of 20 kDa, known as the hsp20 proteins. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large hetero-oligomeric aggregates. Structurally, this family is characterised by the presence of a conserved C-terminal domain of about 100 residues.

Proteins where this domain is known:
PY00566   


PF00012 - HSP70 (Pfam link)

Interpro entry IPR013126 : (Interpro link)

Pfam description:
Hsp70 chaperones help to fold many proteins. Hsp70 assisted folding involves repeated cycles of substrate binding and release. Hsp70 activity is ATP dependent. Hsp70 proteins are made up of two regions: the amino terminus is the ATPase domain and the carboxyl terminus is the substrate binding region.

Interpro description:

Hsp70 heat shock proteins are chaperones that help to fold many proteins. Hsp70-assisted folding involves repeated cycles of substrate binding and release. Hsp70 activity is ATP-dependent. Hsp70 proteins are made up of two regions: the amino terminus is the ATPase domain and the carboxyl terminus is the substrate-binding region.

Hsp70 proteins have an average molecular weight of 70 kDa. In most species, there are many proteins that belong to the hsp70 family. Some of these are only expressed under stress conditions (strictly inducible), while some are present in cells under normal growth conditions and are not heat-inducible (constitutive or cognate). Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial and endoplasmic reticulum, for example).

Proteins where this domain is known:
PY05001    PY05063    PY05402    PY06158    PY06981   


PF00013 - KH_1 (Pfam link)

Interpro entry IPR018111 : K Homology, type 1, subgroup (Interpro link)

Pfam description:
KH motifs can bind RNA in vitro. Autoantibodies to Nova, a KH domain protein, cause paraneoplastic opsoclonus ataxia.

Interpro description:

The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. It is a domain of around 70 amino acids that is present in a wide variety of quite diverse nucleic acid-binding proteins. It has been shown to bind RNA. Like many other RNA-binding motifs, KH motifs are found in one or multiple copies (14 copies in chicken vigilin) and, at least for hnRNP K (three copies) and FMR-1 (two copies), each motif is necessary for in vitro RNA binding activity, suggesting that they may function cooperatively or, in the case of single KH motif proteins (for example, Mer1p), independently.

According to structural analysis, the KH domain can be separated into two groups. The first group, type 1, has a beta-alpha-alpha-beta-beta-alpha structure, whereas in type 2 the last two beta-strands are located in the N-terminal part of the domain (alpha-beta-beta-alpha-alpha-beta). Sequence similarity between these two folds is limited to a short region (VIGXXGXXI) in the RNA-binding motif. This motif is located between helices 1 and 2 in type 1 and between helices 2 and 3 in type 2. Proteins known to contain a type-1 KH domain include bacterial polyribonucleotide nucleotidyltransferases; vertebrate fragile X mental retardation protein 1 (FMR1); eukaryotic heterogeneous nuclear ribonucleoprotein K (hnRNP K), one of at least 20 major proteins that are part of hnRNP particles in mammalian cells; mammalian poly(rC)-binding proteins; Artemia salina glycine-rich protein GRP33; yeast PAB1-binding protein 2 (PBP2); vertebrate vigilin; and human high-density lipoprotein binding protein (HDL-binding protein).

More information about these proteins can be found at Protein of the Month: RNA Exosomes.

Proteins where this domain is known:
PY03523    PY03646    PY04200    PY04454    PY04901   

Proteins where this domain has been detected by our approach:
PY05239   


PF00020 - TNFR_c6 (Pfam link)

Interpro entry IPR001368 : TNFR/CD27/30/40/95 cysteine-rich region (Interpro link)

Interpro description:

A number of proteins, some of which are known to be receptors for growth factors, have been found to contain a cysteine-rich domain at the N-terminal region that can be subdivided into four (or in some cases, three) repeats containing six conserved cysteines, all of which are involved in intrachain disulphide bonds.

CD27 (also called S152 or T14) mediates a co-stimulatory signal for T and B cell activation and is involved in murine T cell development. Tyrosine phosphorylation of ZAP-70 following CD27 ligation of T cells has been reported, but not confirmed independently. CD30 was originally identified as Ki-1, an antigen expressed on Reed-Sternberg cells in Hodgkin's lymphomas and other non-Hodgkin's lymphomas, particularly diffuse large-cell lymphoma and immunoblastic lymphoma. CD30 has pleiotropic effects on CD30-positive lymphoma cell lines, ranging from cell proliferation to cell death. It is thought to be involved in negative selection of T-cells in the thymus and is involved in TCR-mediated cell death. CD30 is a member of the TNFR family of molecules and activates NF-kB through interaction with TRAF2 and TRAF5. CD40 (Bp50) plays a central role in the regulation of cell-mediated immunity as well as antibody-mediated immunity. It is central to T-cell-dependent (TD) responses and may influence the survival of B cell lymphomas.

CD95 (also called APO-1, fas antigen, Fas tumour necrosis factor receptor superfamily, member 6, TNFRSF6 or apoptosis antigen 1, APT1) is expressed, typically at high levels, on activated T and B cells. It is involved in the mediation of apoptosis-inducing signals.

Other proteins known to belong to this family are tumour necrosis factor type I and type II receptors (TNFR), rabbit fibroma virus soluble TNF receptor (protein T2), lymphotoxin alpha/beta receptor, low-affinity nerve growth factor receptor (LA-NGFR) (p75), T-cell antigen OX40, Wsl-1 (a receptor for a yet undefined ligand that mediates apoptosis) and Vaccinia virus protein A53 (SalF19R).

CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at Protein Reviews On The Web (http://mpr.nci.nih.gov/prow/).

Proteins where this domain has been detected by our approach:
PY05272   


PF00022 - Actin (Pfam link)

Interpro entry IPR004000 : Actin/actin-like (Interpro link)

Interpro description:

Actin is a ubiquitous protein involved in the formation of filaments that are major components of the cytoskeleton. These filaments interact with myosin to produce a sliding effect, which is the basis of muscular contraction and many aspects of cell motility, including cytokinesis. Each actin protomer binds one molecule of ATP and has one high-affinity site for either calcium or magnesium ions, as well as several low-affinity sites. Actin exists as a monomer at low salt concentrations, but filaments form rapidly as salt concentration rises, with the consequent hydrolysis of ATP. Actin from many sources forms a tight complex with deoxyribonuclease (DNase I), although the significance of this is still unknown. The formation of this complex results in the inhibition of DNase I activity, and actin loses its ability to polymerise. It has been shown that an ATPase domain of actin shares similarity with the ATPase domains of hexokinase and hsp70 proteins.

In vertebrates there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. In plants there are many isoforms, which are probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth, graviperception and cell wall deposition.

Recently, some divergent actin-like proteins have been identified in several species. These proteins include the ARP1 subfamily: centractin (actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii, which seems to be a component of a multi-subunit centrosomal complex involved in microtubule-based vesicle motility; the ARP2 subfamily, which includes chicken ACTL, Saccharomyces cerevisiae ACT2, Drosophila melanogaster 14D and Caenorhabditis elegans actC; the ARP3 subfamily, which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and Schizosaccharomyces pombe act2; and the ARP4 subfamily, which includes yeast ACT3 and Drosophila 13E.

Proteins where this domain is known:
PY00021    PY00853    PY00875    PY00904    PY01545    PY01744    PY02240    PY02457    PY04087    PY04148   


PF00023 - Ank (Pfam link)

Interpro entry IPR002110 : (Interpro link)

Pfam description:
There's no clear separation between noise and signal in the HMM search. Ankyrin repeats generally consist of a beta, alpha, alpha, beta order of secondary structures. The repeats associate to form a higher-order structure.

Interpro description:

The ankyrin repeat is one of the most common protein-protein interaction motifs in nature. Ankyrin repeats are tandemly repeated modules of about 33 amino acids. They occur in a large number of functionally diverse proteins, mainly from eukaryotes. The few known examples from prokaryotes and viruses may be the result of horizontal gene transfer. The repeat has been found in proteins of diverse function, such as transcriptional initiators, cell-cycle regulators, cytoskeletal proteins, ion transporters and signal transducers. The ankyrin fold appears to be defined by its structure rather than its function, since there is no specific sequence or structure which is universally recognised by it.

The conserved fold of the ankyrin repeat unit is known from several crystal and solution structures. Each repeat folds into a helix-loop-helix structure with a beta-hairpin/loop region projecting out from the helices at a 90° angle. The repeats stack together to form an L-shaped structure.

Proteins where this domain is known:
PY00018    PY00520    PY01883    PY01999    PY02709    PY02739    PY03147    PY04078    PY04642    PY04674    PY05663    PY07037   

Proteins where this domain has been detected by our approach:
PY04030    PY05461   


PF00024 - PAN_1 (Pfam link)

Interpro entry IPR003014 : (Interpro link)

Pfam description:
The PAN domain contains a conserved core of three disulphide bridges. In some members of the family there is an additional fourth disulphide bridge that links the N and C termini of the domain. The domain is found in diverse proteins; in some it mediates protein-protein interactions, in others it mediates protein-carbohydrate interactions.

Interpro description:

It has been shown that the N-terminal N domains of members of the plasminogen/hepatocyte growth factor family, the apple domains of the plasma prekallikrein/coagulation factor XI family, and domains of various nematode proteins belong to the same module superfamily, the PAN module. PAN contains a conserved core of three disulphide bridges. In some members of the family there is an additional fourth disulphide bridge that links the N and C termini of the domain. The domain is found in diverse proteins; in some the domain mediates protein-protein interactions, in others it mediates protein-carbohydrate interactions.

Proteins where this domain has been detected by our approach:
PY02498   


PF00025 - Arf (Pfam link)

Interpro entry IPR006689 : ARF/SAR superfamily (Interpro link)

Pfam description:
Pfam combines a number of different Prosite families together.

Interpro description:

The small ADP ribosylation factor (Arf) GTP-binding proteins are major regulators of vesicle biogenesis in intracellular traffic. They are the founding members of a growing family that includes Arl (Arf-like), Arp (Arf-related proteins) and the remotely related Sar (Secretion-associated and Ras-related) proteins. Arf proteins cycle between inactive GDP-bound and active GTP-bound forms that bind selectively to effectors. The classical structural GDP/GTP switch is characterised by conformational changes at the so-called switch 1 and switch 2 regions, which bind tightly to the gamma-phosphate of GTP but poorly or not at all to the GDP nucleotide. Structural studies of Arf1 and Arf6 have revealed that although these proteins feature the switch 1 and 2 conformational changes, they depart from other small GTP-binding proteins in that they use an additional, unique switch to propagate structural information from one side of the protein to the other.

The GDP/GTP structural cycles of human Arf1 and Arf6 feature a unique conformational change that affects the beta2-beta3 strands connecting switch 1 and switch 2 (interswitch) and also the amphipathic helical N-terminus. In GDP-bound Arf1 and Arf6, the interswitch is retracted and forms a pocket to which the N-terminal helix binds, the latter serving as a molecular hasp to maintain the inactive conformation. In the GTP-bound form of these proteins, the interswitch undergoes a two-residue register shift that pulls switch 1 and switch 2 "up", restoring an active conformation that can bind GTP. In this conformation, the interswitch projects out of the protein and extrudes the N-terminal hasp by occluding its binding pocket.

Proteins where this domain is known:
PY00881    PY03241    PY04367    PY04572    PY05471   


PF00026 - Asp (Pfam link)

Interpro entry IPR001461 : Peptidase A1 (Interpro link)

Pfam description:
Aspartyl (acid) proteases include pepsins, cathepsins, and renins. They have a two-domain structure, probably arising from an ancestral duplication. This family includes neither the retroviral nor the retrotransposon proteases (Pfam:PF00077), which are much smaller and appear to be homologous to a single domain of the eukaryotic aspartyl proteases.

Interpro description:

In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised. More recently, aspartic endopeptidases associated with the processing of bacterial type 4 prepilin and archaean preflagellin have been described.

Structurally, aspartic endopeptidases are bilobal enzymes, each lobe contributing a catalytic Asp residue, with an extended active-site cleft localised between the two lobes of the molecule. One lobe has probably evolved from the other through a gene duplication event in the distant past. In modern-day enzymes, although the three-dimensional structures are very similar, the amino acid sequences are more divergent, except for the catalytic site motif, which is very conserved. The presence and position of disulphide bridges are other conserved features of aspartic peptidases. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned to clans (groups of evolutionarily related proteins), and further sub-divided into families, largely on the basis of their tertiary structure.

This group of aspartic peptidases belongs to MEROPS peptidase family A1 (pepsin family, clan AA). The type example is pepsin A from Homo sapiens (Human).

More than 70 aspartic peptidases, all from eukaryotic organisms, have been identified. These include pepsins, cathepsins, and renins. The enzymes are synthesised with signal peptides, and the proenzymes are secreted or passed into the lysosomal/endosomal system, where acidification leads to autocatalytic activation.

Most members of the pepsin family specifically cleave bonds in peptides that are at least six residues in length, with hydrophobic residues in both the P1 and P1' positions. Crystallography has shown the active site to form a groove across the junction of the two lobes, with an extended loop projecting over the cleft to form an 11-residue flap, which encloses substrates and inhibitors within the active site. Specificity is determined by several hydrophobic residues surrounding the catalytic aspartates, and by three residues in the flap. Cysteine residues are well conserved within the pepsin family, pepsin itself containing three disulphide loops. The first loop is found in all but the fungal enzymes, and is usually around five residues in length, but is longer in barrierpepsin and candidapepsin; the second loop is also small and found only in the animal enzymes; and the third loop is the largest, found in all members of the family, except for the cysteine-free polyporopepsin. The loops are spread unequally throughout the two lobes, suggesting that they formed after the initial gene duplication and fusion event.
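
The P1/P1' rule above (peptides of at least six residues, hydrophobic residues on both sides of the scissile bond) can be expressed as a simple filter over peptide bonds. This is a minimal sketch only: it lists bonds that satisfy the stated rule and does not predict actual cleavage, which also depends on the residues around the catalytic aspartates and in the flap. The hydrophobic residue set and the function name candidate_sites are hypothetical choices for illustration.

```python
# Illustrative assumption: residues treated as "hydrophobic" for this sketch.
HYDROPHOBIC = set("AILMFVWY")


def candidate_sites(peptide: str, min_len: int = 6) -> list[int]:
    """Return indices i such that the bond between peptide[i] (P1) and
    peptide[i + 1] (P1') has hydrophobic residues on both sides.

    Peptides shorter than min_len residues are skipped, following the
    six-residue minimum stated in the text."""
    peptide = peptide.upper()
    if len(peptide) < min_len:
        return []
    return [
        i
        for i in range(len(peptide) - 1)
        if peptide[i] in HYDROPHOBIC and peptide[i + 1] in HYDROPHOBIC
    ]


# Hypothetical peptide: bonds A-F (index 2) and F-L (index 3) qualify.
print(candidate_sites("GEAFLKSG"))
```

Indices refer to the P1 residue, so index i marks the bond between positions i and i + 1 (0-based).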

This family includes neither the retroviral nor the retrotransposon aspartic proteases, which are much smaller and appear to be homologous to the single-domain aspartic proteases.

Proteins where this domain is known:
PY00469    PY00470    PY01268    PY01716    PY02004    PY02085    PY03145    PY06692    PY06899   


PF00027 - cNMP_binding (Pfam link)

Interpro entry IPR000595 : (Interpro link)

Interpro description:
Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure. There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential for maintaining the structural integrity of the beta-barrel. cAMP- and cGMP-dependent protein kinases (cAPK and cGPK) contain two tandem copies of the cyclic nucleotide-binding domain. cAPKs are composed of two different subunits, a catalytic chain and a regulatory chain, which contains both copies of the domain. cGPKs are single-chain enzymes that include the two copies of the domain in their N-terminal section. Vertebrate cyclic nucleotide-gated ion channels also contain this domain. Two such cation channels have been fully characterised; one is found in rod cells, where it plays a role in visual signal transduction.

Proteins where this domain is known:
PY02304    PY03451    PY05448   


PF00032 - Cytochrom_B_C (Pfam link)

Interpro entry IPR005798 : Cytochrome b/b6, C-terminal (Interpro link)

Interpro description:

In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is a component of respiratory chain complex III, also known as the bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an analogous protein, cytochrome b6, a component of the plastoquinone-plastocyanin reductase, also known as the b6f complex.

Cytochrome b/b6 is an integral membrane protein of approximately 400 amino acid residues that probably has 8 transmembrane segments. In plants and cyanobacteria, cytochrome b6 consists of two subunits encoded by the petB and petD genes. The sequence of petB is colinear with the N-terminal part of mitochondrial cytochrome b, while petD corresponds to the C-terminal part. Cytochrome b/b6 non-covalently binds two haem groups, known as b562 and b566. Four conserved histidine residues are postulated to be the ligands of the iron atoms of these two haem groups.

Apart from regions around some of the histidine haem ligands, there are a few conserved regions in the sequence of b/b6. The best conserved of these includes an invariant P-E-W triplet, which lies in the loop that separates the fifth and sixth transmembrane segments. It appears to be important for electron transfer at the ubiquinone redox site (called Qz or Qo, where o stands for outside) located on the outer side of the membrane. This entry covers the C-terminal region of these proteins.

Proteins where this domain is known:
PY00148    PY00774    PY00780   


PF00033 - Cytochrom_B_N (Pfam link)

Interpro entry IPR005797 : Cytochrome b/b6, N-terminal (Interpro link)

Interpro description:

In the mitochondrion of eukaryotes and in aerobic prokaryotes, cytochrome b is a component of respiratory chain complex III, also known as the bc1 complex or ubiquinol-cytochrome c reductase. In plant chloroplasts and cyanobacteria, there is an analogous protein, cytochrome b6, a component of the plastoquinone-plastocyanin reductase, also known as the b6f complex.

Cytochrome b/b6 is an integral membrane protein of approximately 400 amino acid residues that probably has 8 transmembrane segments. In plants and cyanobacteria, cytochrome b6 consists of two subunits encoded by the petB and petD genes. The sequence of petB is colinear with the N-terminal part of mitochondrial cytochrome b, while petD corresponds to the C-terminal part. Cytochrome b/b6 non-covalently binds two haem groups, known as b562 and b566. Four conserved histidine residues are postulated to be the ligands of the iron atoms of these two haem groups.

Apart from regions around some of the histidine haem ligands, there are a few conserved regions in the sequence of b/b6. The best conserved of these includes an invariant P-E-W triplet, which lies in the loop that separates the fifth and sixth transmembrane segments. It appears to be important for electron transfer at the ubiquinone redox site (called Qz or Qo, where o stands for outside) located on the outer side of the membrane. This entry covers the N-terminal region of these proteins.

Proteins where this domain is known:
PY00148    PY00774    PY00780   


PF00034 - Cytochrom_C (Pfam link)

Interpro entry IPR003088 : Cytochrome c, class I (Interpro link)

Pfam description:
The Pfam entry does not include all Prosite members. The cytochrome 556 and cytochrome c' families are not included.

Interpro description:

Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes.

Ambler recognised four classes of cytC.

Class I includes the low-spin soluble cytC of mitochondria and bacteria, with the haem-attachment site towards the N-terminus, and the sixth ligand provided by a methionine residue about 40 residues further on towards the C-terminus. On the basis of sequence similarity, class I cytC were further subdivided into five classes, IA to IE. Class IB includes the eukaryotic mitochondrial cytC and prokaryotic 'short' cyt c2 exemplified by Rhodopila globiformis cyt c2; class IA includes 'long' cyt c2, such as Rhodospirillum rubrum cyt c2 and Aquaspirillum itersonii cyt c-550, which have several extra loops by comparison with class IB cytC.

Proteins where this domain is known:
PY02807    PY05430   


PF00035 - dsrm (Pfam link)

Interpro entry IPR001159 : Double-stranded RNA binding (Interpro link)

Pfam description:
Sequences for the seed alignment were gathered by HMM_iterative_training. This is a putative motif shared by proteins that bind to dsRNA. At least some DSRM proteins seem to bind to specific RNA targets. Examples include Staufen, which is involved in the localisation of at least five different mRNAs in the early Drosophila embryo, and the interferon-induced protein kinase in humans, which is part of the cellular response to dsRNA.

Interpro description:
The DsRBD domain is found in a variety of RNA-binding proteins with different structures and a diversity of functions. It occurs in proteins involved in the localisation of at least five different mRNAs in the early Drosophila embryo, and in the interferon-induced protein kinase in humans, which is part of the cellular response to dsRNA.

Proteins where this domain has been detected by our approach:
PY00683   


PF00036 - efhand (Pfam link)

Interpro entry IPR018248 : (Interpro link)

Pfam description:
The EF-hands can be divided into two classes: signaling proteins and buffering/transport proteins. The first group is the largest and includes the best-known members of the family, such as calmodulin, troponin C and S100B. These proteins typically undergo a calcium-dependent conformational change which opens a target binding site. The latter group is represented by calbindin D9k, and its members do not undergo calcium-dependent conformational changes.

Interpro description:
Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand. This type of domain consists of a twelve-residue loop flanked on both sides by a twelve-residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in binding are at positions 1, 3, 5, 7, 9 and 12; these residues are denoted X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca2+ (a bidentate ligand).
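The position-to-label convention for the EF-hand loop can be made concrete with a short sketch. It assumes the loop is given as a 12-character one-letter amino-acid string with 1-based positions; the function names are hypothetical helpers written for illustration, not part of any Pfam or InterPro tool, and the example loop is assumed to be the first EF-hand loop of calmodulin.

```python
# Hypothetical helpers: map the six calcium-coordinating positions of a
# canonical 12-residue EF-hand loop to their conventional labels.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(loop: str) -> dict[str, str]:
    """Return {label: residue} for a 12-residue EF-hand loop (1-based positions)."""
    if len(loop) != 12:
        raise ValueError(f"expected a 12-residue loop, got {len(loop)}")
    return {label: loop[pos - 1] for pos, label in LIGAND_POSITIONS.items()}


def has_acidic_position_12(loop: str) -> bool:
    """Check for the invariant Glu or Asp at position 12 (the bidentate -Z ligand)."""
    return ef_hand_ligands(loop)["-Z"] in ("E", "D")


if __name__ == "__main__":
    # Illustrative loop, assumed here to be calmodulin's first EF-hand loop.
    loop = "DKDGDGTITTKE"
    print(ef_hand_ligands(loop))
    print(has_acidic_position_12(loop))
```

This only checks the positional convention; real EF-hand detection relies on profile or pattern searches over the whole helix-loop-helix unit.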

Proteins where this domain is known:
PY00015    PY00029    PY01623    PY01677    PY02109    PY02398    PY03298    PY03778    PY05050    PY05615    PY05880    PY06527    PY06908    PY07025    PY07071   

Proteins where this domain has been detected by our approach:
PY00524    PY00567    PY02433    PY02573    PY03528    PY04265    PY05191    PY05811    PY05855    PY06394   


PF00037 - Fer4 (Pfam link)

Interpro entry IPR001450 : 4Fe-4S ferredoxin, iron-sulphur binding, subgroup (Interpro link)

Pfam description:
This superfamily includes proteins containing domains which bind to iron-sulfur clusters. Members include bacterial ferredoxins, various dehydrogenases and various reductases. The structure of the domain is an alpha-antiparallel beta sandwich.

Interpro description:

Ferredoxins are iron-sulphur proteins that mediate electron transfer in a range of metabolic reactions; they fall into several subgroups according to the nature of their iron-sulphur cluster(s). One group, originally found in bacteria, has been termed "bacterial-type", in which the active centre is a 4Fe-4S cluster. 4Fe-4S ferredoxins may in turn be subdivided into further groups, based on their sequence properties. Most contain at least one conserved domain, including four Cys residues that bind to a 4Fe-4S centre.

During the evolution of bacterial-type ferredoxins, intrasequence gene duplication, transposition and fusion events occurred, resulting in the appearance of proteins with multiple iron-sulphur centres: e.g. dicluster-type (2[4Fe-4S]) and polyferredoxins, iron-sulphur subunits of bacterial succinate dehydrogenase/fumarate reductase, formate hydrogenlyase and formate dehydrogenase complexes, pyruvate-flavodoxin oxidoreductase, NADH:ubiquinone reductase and others. In some bacterial ferredoxins, one of the duplicated domains has lost one or more of the four conserved Cys residues. These domains have either lost their iron-sulphur binding property or bind a 3Fe-4S centre instead of a 4Fe-4S centre. 3D structures are now known for a number of both monocluster-type and dicluster-type 4Fe-4S ferredoxins.

CAUTION: PRINTS signature in the current entry is known to miss protein matches and should be updated in the near future.

Proteins where this domain is known:
PY04219   

Proteins where this domain has been detected by our approach:
PY04921   


PF00043 - GST_C (Pfam link)

Interpro entry IPR004046 : (Interpro link)

Pfam description:
GST conjugates reduced glutathione to a variety of targets. The family also includes S-crystallin from squid, the eukaryotic elongation factor 1-gamma, the HSP26 family of stress-related proteins and auxin-regulated proteins in plants. Stringent starvation proteins in E. coli are also included in the alignment but are not known to have GST activity. The glutathione molecule binds in a cleft between the N- and C-terminal domains. The catalytically important residues are proposed to reside in the N-terminal domain. In plants, GSTs are encoded by a large gene family (48 GST genes in Arabidopsis) and can be divided into the phi, tau, theta, zeta and lambda classes.

Interpro description:

In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification of reactive electrophilic compounds by catalysing their conjugation to glutathione. The GST domain is also found in S-crystallins from squid, and proteins with no known GST activity, such as eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include auxin-regulated proteins in plants and stringent starvation proteins in Escherichia coli. The major lens polypeptide of cephalopods is also a GST.

Bacterial GSTs of known function often have a specific, growth-supporting role in biodegradative metabolism: epoxide ring opening and tetrachlorohydroquinone reductive dehalogenation are two examples of the reactions they catalyse. Some regulatory proteins, such as the stringent starvation proteins, also belong to the GST family. GSTs appear to be absent from Archaea, in which gamma-glutamylcysteine replaces glutathione as the major thiol.

Glutathione S-transferases form homodimers, but in eukaryotes can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural fold. Each monomer is composed of a distinct N-terminal sub-domain, which adopts the thioredoxin fold, and a C-terminal all-helical sub-domain. This entry represents the C-terminal domain.

Proteins where this domain is known:
PY05088    PY06160    PY07121   


PF00044 - Gp_dh_N (Pfam link)

Interpro entry IPR000173 : Glyceraldehyde 3-phosphate dehydrogenase (Interpro link)

Pfam description:
GAPDH is a tetrameric NAD-binding enzyme involved in glycolysis and gluconeogenesis. The N-terminal domain is a Rossmann NAD(P)-binding fold.

Interpro description:

Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) plays an important role in glycolysis and gluconeogenesis by reversibly catalysing the oxidation and phosphorylation of D-glyceraldehyde-3-phosphate to 1,3-diphospho-glycerate. The enzyme exists as a tetramer of identical subunits, each containing 2 conserved functional domains: an NAD-binding domain, and a highly conserved catalytic domain. The enzyme has been found to bind to actin and tropomyosin, and may thus have a role in cytoskeleton assembly. Alternatively, the cytoskeleton may provide a framework for precise positioning of the glycolytic enzymes, thus permitting efficient passage of metabolites from enzyme to enzyme.

GAPDH displays diverse non-glycolytic functions as well, its role depending upon its subcellular location. For instance, the translocation of GAPDH to the nucleus acts as a signalling mechanism for programmed cell death, or apoptosis. The accumulation of GAPDH within the nucleus is involved in the induction of apoptosis, where GAPDH functions in the activation of transcription. The presence of GAPDH is associated with the synthesis of pro-apoptotic proteins like BAX, c-JUN and GAPDH itself.

GAPDH has been implicated in certain neurological diseases: GAPDH is able to bind to the gene products from neurodegenerative disorders such as Huntington's disease, Alzheimer's disease, Parkinson's disease and Machado-Joseph disease through stretches encoded by their CAG repeats. Abnormal neuronal apoptosis is associated with these diseases. Propargylamines such as deprenyl increase neuronal survival by interfering with apoptosis signalling pathways via their binding to GAPDH, which decreases the synthesis of pro-apoptotic proteins.

Proteins where this domain is known:
PY03280   


PF00051 - Kringle (Pfam link)

Interpro entry IPR000001 : (Interpro link)

Pfam description:
Kringle domains have been found in plasminogen, hepatocyte growth factors, prothrombin, and apolipoprotein A. Structure is disulfide-rich, nearly all-beta.

Interpro description:
Kringles are autonomous structural domains, found throughout the blood clotting and fibrinolytic proteins. Kringle domains are believed to play a role in binding mediators (e.g., membranes, other proteins or phospholipids), and in the regulation of proteolytic activity. Kringle domains are characterised by a triple loop, 3-disulphide bridge structure, whose conformation is defined by a number of hydrogen bonds and small pieces of anti-parallel beta-sheet. They are found in a varying number of copies in some plasma proteins including prothrombin and urokinase-type plasminogen activator, which are serine proteases belonging to MEROPS peptidase family S1A.

Proteins where this domain is known:
PY00337   


PF00053 - Laminin_EGF (Pfam link)

Interpro entry IPR002049 : (Interpro link)

Pfam description:
This family is like Pfam:PF00008 but has 8 conserved cysteines instead of 6.

Interpro description:
Laminins are the major noncollagenous components of basement membranes that mediate cell adhesion, growth, migration and differentiation. They are composed of distinct but related alpha, beta and gamma chains. The three chains form a cross-shaped molecule that consists of a long arm and three short globular arms. The long arm consists of a coiled-coil structure contributed by all three chains and cross-linked by interchain disulphide bonds. Besides different types of globular domains, each subunit contains, in its first half, consecutive repeats of about 60 amino acids that include eight conserved cysteines. The tertiary structure of this domain is remotely similar in its N-terminal part to that of the EGF-like module. It is known as an 'LE' or 'laminin-type EGF-like' domain. The number of copies of the LE domain in the different forms of laminins is highly variable; from 3 up to 22 copies have been found. A schematic representation of the topology of the four disulphide bonds in the LE domain is shown below.
In the mouse laminin gamma-1 chain, the seventh LE domain has been shown to be the only one that binds with high affinity to nidogen. The binding sites are located on the surface within the loops C1-C3 and C5-C6. Long consecutive arrays of LE domains in laminins form rod-like elements of limited flexibility, which determine the spacing in the formation of laminin networks of basement membranes.

Proteins where this domain has been detected by our approach:
PY00150   


PF00056 - Ldh_1_N (Pfam link)

Interpro entry IPR001236 : Lactate/malate dehydrogenase (Interpro link)

Pfam description:
L-lactate dehydrogenases are metabolic enzymes which interconvert L-lactate and pyruvate; the reduction of pyruvate to L-lactate is the last step in anaerobic glycolysis. L-2-hydroxyisocaproate dehydrogenases are also members of the family. Malate dehydrogenases catalyse the interconversion of malate and oxaloacetate and participate in the citric acid cycle. L-lactate dehydrogenase is also found as a lens crystallin in bird and crocodile eyes. The N-terminal domain (this family) is a Rossmann NAD-binding fold; the C-terminal domain is an unusual alpha+beta fold.

Interpro description:

L-lactate dehydrogenases are metabolic enzymes which interconvert L-lactate and pyruvate; the reduction of pyruvate to L-lactate is the last step in anaerobic glycolysis. L-lactate dehydrogenase is also found as a lens crystallin in bird and crocodile eyes. L-2-hydroxyisocaproate dehydrogenases are also members of the family. Malate dehydrogenases catalyse the interconversion of malate and oxaloacetate and participate in the citric acid cycle.

Proteins where this domain is known:
PY03376    PY03885    PY03922   


PF00063 - Myosin_head (Pfam link)

Interpro entry IPR001609 : Myosin head, motor region (Interpro link)

Interpro description:

Muscle contraction is caused by sliding between the thick and thin filaments of the myofibril. Myosin is a major component of thick filaments and exists as a hexamer of 2 heavy chains, 2 alkali light chains, and 2 regulatory light chains. The heavy chain can be subdivided into the N-terminal globular head and the C-terminal coiled-coil rod-like tail, although some forms have a globular region in their C-terminal. There are many cell-specific isoforms of myosin heavy chains, coded for by a multi-gene family. Myosin interacts with actin to convert chemical energy, in the form of ATP, to mechanical energy. The 3-D structure of the head portion of myosin has been determined and a model for actin-myosin complex has been constructed.

The globular head is well conserved, with some highly conserved regions possibly relating to functional and structural domains. The rod-like tail starts with an invariant proline residue and contains many repeats of a 28-residue region, interrupted at 4 regularly spaced points known as skip residues. Although the sequence of the tail is not well conserved, its chemical character is: hydrophobic, charged and skip residues occur in a highly ordered and repeated fashion.

Proteins where this domain is known:
PY00345    PY00529    PY01039    PY01085    PY01232    PY02134    PY04789   


PF00069 - Pkinase (Pfam link)

Interpro entry IPR017442 : Serine/threonine protein kinase-related (Interpro link)

Interpro description:

Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific.

Protein kinase function has been evolutionarily conserved from Escherichia coli to human. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins.

The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases.

Eukaryotic protein kinases are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common with both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. In the N-terminal extremity of the catalytic domain there is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. In the central part of the catalytic domain there is a conserved aspartic acid residue which is important for the catalytic activity of the enzyme. This entry includes protein kinases from eukaryotes and viruses and may include some bacterial hits too.

Proteins where this domain is known:
PY00029    PY00054    PY00095    PY00154    PY00403    PY00717    PY00731    PY00761    PY00762    PY00790    PY01077    PY01719    PY01945    PY02109    PY02176    PY02304    PY02339    PY02340    PY02455    PY02456    PY02490    PY02503    PY02676    PY02791    PY02823    PY02877    PY02926    PY02983    PY03298    PY03317    PY03319    PY03456    PY03956    PY04005    PY04013    PY04026    PY04198    PY04265    PY04620    PY04848    PY04849    PY04971    PY05048    PY05139    PY05235    PY05330    PY05545    PY05614    PY05890    PY05975    PY06011    PY06048    PY06391    PY06394    PY06527    PY06538    PY06553    PY06554    PY06573    PY06724    PY06752    PY06874    PY06885    PY06928    PY06967    PY07390   

Proteins where this domain has been detected by our approach:
PY04471   


PF00070 - Pyr_redox (Pfam link)

Interpro entry IPR001327 : Pyridine nucleotide-disulphide oxidoreductase, NAD-binding region (Interpro link)

Pfam description:
This family includes both class I and class II oxidoreductases and also NADH oxidases and peroxidases. This domain is actually a small NADH binding domain within a larger FAD binding domain.

Interpro description:

This entry represents a small NADH-binding domain within a larger FAD-binding domain (covered by a separate entry). It is found in both class I and class II oxidoreductases.

FAD flavoproteins belonging to the family of pyridine nucleotide-disulphide oxidoreductases (glutathione reductase, trypanothione reductase, lipoamide dehydrogenase, mercuric reductase, thioredoxin reductase, alkyl hydroperoxide reductase) share sequence similarity with a number of other flavoprotein oxidoreductases, in particular with ferredoxin-NAD+ reductases involved in oxidative metabolism of a variety of hydrocarbons (rubredoxin reductase, putidaredoxin reductase, terpredoxin reductase, ferredoxin-NAD+ reductase components of benzene 1,2-dioxygenase, toluene 1,2-dioxygenase, chlorobenzene dioxygenase, biphenyl dioxygenase), NADH oxidase and NADH peroxidase. Comparison of the crystal structures of human glutathione reductase and Escherichia coli thioredoxin reductase reveals different locations of their active sites, suggesting that the enzymes diverged from an ancestral FAD/NAD(P)H reductase and acquired their disulphide reductase activities independently.

Despite functional similarities, oxidoreductases of this family show no sequence similarity with adrenodoxin reductases and flavoprotein pyridine nucleotide cytochrome reductases (FPNCR). Assuming that disulphide reductase activity emerged later, during divergent evolution, the family can be referred to as FAD-dependent pyridine nucleotide reductases, FADPNR.

To date, 3D structures of glutathione reductase, thioredoxin reductase, mercuric reductase, lipoamide dehydrogenase, trypanothione reductase and NADH peroxidase have been solved. The enzymes share similar tertiary structures based on a doubly-wound alpha/beta fold, but the relative orientations of their FAD- and NAD(P)H-binding domains may vary significantly. By contrast with the FPNCR family, the folds of the FAD- and NAD(P)H-binding domains are similar, suggesting that the domains evolved by gene duplication.

Proteins where this domain is known:
PY00573    PY01204    PY01431    PY02397    PY03419    PY03719    PY04793   


PF00071 - Ras (Pfam link)

Interpro entry IPR013753 : (Interpro link)

Pfam description:
Includes the sub-families Ras, Rab, Rac, Ral, Ran, Rap, Ypt1 and more. Shares the P-loop motif with GTP_EFTU, arf and myosin_head (see Pfam:PF00009, Pfam:PF00025 and Pfam:PF00063). Rab GTPases are important regulators of vesicle formation, motility and fusion. They share a fold common to all Ras GTPases: a six-stranded beta-sheet surrounded by five alpha-helices.

Interpro description:

Many members of the Ras superfamily of GTPases have been implicated in the regulation of hematopoietic cells, with roles in growth, survival, differentiation, cytokine production, chemotaxis, vesicle-trafficking, and phagocytosis. The Ras superfamily of proteins now includes over 150 small GTPases (distinguished from the large, heterotrimeric GTPases, the G-proteins). It comprises six subfamilies, the Ras, Rho, Ran, Rab, Arf, and Kir/Rem/Rad subfamilies. They exhibit remarkable overall amino acid identities, especially in the regions interacting with the guanine nucleotide exchange factors that catalyze their activation.

Proteins where this domain is known:
PY00721    PY01029    PY01075    PY01824    PY02796    PY02819    PY02876    PY04253    PY04308    PY04604    PY05254    PY07141   


PF00075 - RnaseH (Pfam link)

Interpro entry IPR002156 : Ribonuclease H (Interpro link)

Pfam description:
RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the retroviral replication cycle and is often found as a domain associated with reverse transcriptases. The structure is a mixed alpha+beta fold with three a/b/a layers.

Interpro description:

The RNase H domain is responsible for hydrolysis of the RNA portion of RNA x DNA hybrids, and this activity requires the presence of divalent cations (Mg2+ or Mn2+) that bind its active site. This domain is part of a large family of homologous RNase H enzymes, of which the RNase HI protein from Escherichia coli is the best characterised. Secondary structure predictions for the enzymes from E. coli, yeast, human liver and diverse retroviruses (such as Rous sarcoma virus and the foamy viruses) supported, in every case, the five beta-strands (1 to 5) and four or five alpha-helices (A, B/C, D, E) identified by crystallography in the RNase H domain of Human immunodeficiency virus 1 (HIV-1) reverse transcriptase and in E. coli RNase H. Reverse transcriptase (RT) is a modular enzyme carrying polymerase and ribonuclease H (RNase H) activities in separable domains. RT converts the single-stranded RNA genome of a retrovirus into a double-stranded DNA copy for integration into the host genome. This process requires ribonuclease H as well as RNA- and DNA-directed DNA polymerase activities.

Retroviral RNase H is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. Bacterial RNase H catalyses the endonucleolytic cleavage of RNA-DNA hybrids to 5'-phosphomonoesters.

The 3D structure of the RNase H domain from diverse bacteria and retroviruses has been solved. All have four beta strands and four to five alpha helices. The E. coli RNase H1 protein binds a single Mg2+ ion cofactor in the active site of the enzyme. The divalent cation is bound by the carboxyl groups of four acidic residues, Asp-10, Glu-48, Asp-70, and Asp-134. The first three acidic residues are highly conserved in all bacterial and retroviral RNase H sequences.

Proteins where this domain is known:
PY07014    PY07288   


PF00076 - RRM_1 (Pfam link)

Interpro entry IPR000504 : RNA recognition motif, RNP-1 (Interpro link)

Pfam description:
The RRM motif is probably diagnostic of an RNA-binding protein. RRMs are found in a variety of RNA-binding proteins, including various hnRNP proteins, proteins implicated in regulation of alternative splicing, and protein components of snRNPs. The motif also appears in a few single-stranded DNA-binding proteins. The RRM structure consists of four strands and two helices arranged in an alpha/beta sandwich, with a third helix present during RNA binding in some cases. The C-terminal beta strand (4th strand) and final helix are hard to align and have been omitted in the SEED alignment. The LA proteins (Swiss:P05455) have an N-terminal RRM, which is included in the seed. There is a second region towards the C terminus that has some features of an RRM but does not appear to have the important structural core of an RRM. The LA proteins (Swiss:P05455) are among the main autoantigens in systemic lupus erythematosus (SLE), an autoimmune disease.

Interpro description:

Many eukaryotic proteins containing one or more copies of a putative RNA-binding domain of about 90 amino acids are known to bind single-stranded RNAs. The largest group of single-stranded RNA-binding proteins is the eukaryotic RNA recognition motif (RRM) family, which contains an eight-amino-acid RNP-1 consensus sequence. RRM proteins have a variety of RNA-binding preferences and functions, and include heterogeneous nuclear ribonucleoproteins (hnRNPs), proteins implicated in regulation of alternative splicing (SR, U2AF, Sxl), protein components of small nuclear ribonucleoproteins (U1 and U2 snRNPs), and proteins that regulate RNA stability and translation (PABP, La, Hu). The RRM in the heterodimeric splicing factor U2 snRNP auxiliary factor (U2AF) appears to have two RRM-like domains with specialised features for protein recognition. The motif also appears in a few single-stranded DNA-binding proteins.

The typical RRM consists of four anti-parallel beta-strands and two alpha-helices arranged in a beta-alpha-beta-beta-alpha-beta fold with side chains that stack with RNA bases. Specificity of RNA binding is determined by multiple contacts with surrounding amino acids. A third helix is present during RNA binding in some cases. The RRM is reviewed in a number of publications.

Proteins where this domain is known:
PY00009    PY00080    PY00123    PY00512    PY00540    PY00605    PY00947    PY00973    PY01048    PY01167    PY01195    PY01202    PY01477    PY01659    PY01748    PY01794    PY01815    PY02124    PY02204    PY02603    PY02680    PY02814    PY02887    PY03158    PY03159    PY03171    PY03224    PY03354    PY03520    PY03562    PY03602    PY03627    PY04111    PY04163    PY04169    PY04347    PY04361    PY04393    PY04528    PY04699    PY04813    PY05280    PY05398    PY05435    PY05533    PY05537    PY05556    PY05612    PY05765    PY05866    PY06963    PY07239    PY07349    PY07541   

Proteins where this domain has been detected by our approach:
PY00795    PY01028    PY01620    PY03466    PY03630    PY04677    PY04822   


PF00077 - RVP (Pfam link)

Interpro entry IPR018061 : (Interpro link)

Pfam description:
Single domain aspartyl proteases from retroviruses, retrotransposons, and badnaviruses (plant dsDNA viruses). These proteases are generally part of a larger polyprotein; usually pol, more rarely gag. Retroviral proteases appear to be homologous to a single domain of the two-domain eukaryotic aspartyl proteases such as pepsins, cathepsins, and renins (Pfam:PF00026).

Interpro description:

In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised. More recently, aspartic endopeptidases associated with the processing of bacterial type 4 prepilin and archaean preflagellin have been described.

Structurally, aspartic endopeptidases are bilobal enzymes, each lobe contributing a catalytic Asp residue, with an extended active site cleft localised between the two lobes of the molecule. One lobe has probably evolved from the other through a gene duplication event in the distant past. In modern-day enzymes, although the three-dimensional structures are very similar, the amino acid sequences are more divergent, except for the catalytic site motif, which is very conserved. The presence and position of disulphide bridges are other conserved features of aspartic peptidases. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned into clans (proteins which are evolutionary related), and further sub-divided into families, largely on the basis of their tertiary structure.

This group of aspartic peptidases belongs to MEROPS peptidase family A2 (retropepsin family, clan AA), subfamily A2A. The family includes the single-domain aspartic proteases from retroviruses, retrotransposons, and badnaviruses (plant dsDNA viruses).

Retroviral aspartyl protease is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins.

Proteins where this domain is known:
PY07841   

Proteins where this domain has been detected by our approach:
PY07288   


PF00078 - RVT_1 (Pfam link)

Interpro entry IPR000477 : RNA-directed DNA polymerase (reverse transcriptase) (Interpro link)

Pfam description:
A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses.

Interpro description:
The use of an RNA template to produce DNA, for integration into the host genome and exploitation of a host cell, is a strategy employed in the replication of retroid elements, such as the retroviruses and bacterial retrons. The enzyme catalysing polymerisation is an RNA-directed DNA polymerase, or reverse transcriptase (RT). Reverse transcriptase occurs in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses.

Retroviral reverse transcriptase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. The discovery of retroelements in prokaryotes raises intriguing questions concerning their roles in bacteria, the origin and evolution of reverse transcriptases, and whether the bacterial reverse transcriptases are older than eukaryotic reverse transcriptases.

Proteins where this domain is known:
PY06363    PY07288    PY07613    PY07669    PY07841   


PF00081 - Sod_Fe_N (Pfam link)

Interpro entry IPR001189 : Manganese and iron superoxide dismutase (Interpro link)

Pfam description:
Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to hydrogen peroxide and molecular oxygen. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one. In humans, there is a cytoplasmic Cu/Zn SOD and a mitochondrial Mn/Fe SOD. The N-terminal domain is a long alpha antiparallel hairpin. A small fragment of YTRE_LEPBI matches well (possibly a sequencing error).

Interpro description:

Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to hydrogen peroxide and molecular oxygen. Their function is to destroy the radicals that are normally produced within cells and are toxic to biological systems. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one. This family includes both single metal-binding SODs and cambialistic SODs, which can bind either Mn or Fe. Fe/MnSODs are ubiquitous enzymes that are responsible for the majority of SOD activity in prokaryotes, fungi, blue-green algae and mitochondria. Fe/MnSODs are found as homodimers or homotetramers.
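The overall dismutation can be written as a balanced equation: of two superoxide anions, one is oxidised to O2 and the other is reduced to H2O2. The two metal-centred half-reactions, in which Mn or Fe alternates between oxidation states, are not shown in this net form.

```latex
\[
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} \longrightarrow \mathrm{H_2O_2} + \mathrm{O_2}
\]
```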

The structure of Fe/MnSODs can be divided into two domains, an alpha N-terminal domain and an alpha/beta C-terminal domain, connected by a loop. The N-terminal domain consists of two helices in an antiparallel hairpin, with a left-handed twist. The C-terminal domain is of the alpha/beta type and consists of a three-stranded antiparallel beta-sheet in the order 213, along with four helices in the arrangement alpha/beta(2)/alpha/beta/alpha(2).

Proteins where this domain is known:
PY05422   


PF00082 - Peptidase_S8 (Pfam link)

Interpro entry IPR000209 : Peptidase S8 and S53, subtilisin, kexin, sedolisin (Interpro link)

Pfam description:
Subtilases are a family of serine proteases. They appear to have independently and convergently evolved an Asp/Ser/His catalytic triad, like that found in the trypsin serine proteases (see Pfam:PF00089). Structure is an alpha/beta fold containing a 7-stranded parallel beta sheet, order 2314567.

Interpro description:

Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.

Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but it is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
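The "linear arrangement" is simply the order in which the three catalytic residues occur along the chain, so sorting them by residue number reproduces the clan signatures. A minimal sketch using the standard textbook numbering (chymotrypsin His57/Asp102/Ser195, subtilisin BPN' Asp32/His64/Ser221); the dictionary and function names are illustrative.

```python
# Catalytic-triad residue numbers (standard numbering) for one member per clan.
TRIADS = {
    "chymotrypsin (clan PA)": {"H": 57, "D": 102, "S": 195},
    "subtilisin BPN' (clan SB)": {"D": 32, "H": 64, "S": 221},
}


def linear_order(triad: dict[str, int]) -> str:
    """Return the triad's one-letter codes in N- to C-terminal order."""
    return "".join(sorted(triad, key=lambda residue: triad[residue]))


for enzyme, triad in TRIADS.items():
    print(f"{enzyme}: {linear_order(triad)}")
# chymotrypsin (clan PA): HDS
# subtilisin BPN' (clan SB): DHS
```

The same helper gives "SDH" for any triad numbered in carboxypeptidase-clan order.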

In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

In many instances, the structural protein fold that characterises the clan or family may have lost its catalytic activity yet retained its function in protein recognition and binding.

This group of serine peptidases belongs to the MEROPS peptidase families S8 (subfamilies S8A (subtilisin) and S8B (kexin)) and S53 (sedolisin), both of which are members of clan SB.

The subtilisin family is the second largest serine protease family characterised to date. Over 200 subtilases are presently known, more than 170 of which have been completely sequenced. The family is widespread, being found in eubacteria, archaebacteria, eukaryotes and viruses. The vast majority of its members are endopeptidases, although there is one exopeptidase, tripeptidyl peptidase. Structures have been determined for several members of the subtilisin family: they exploit the same catalytic triad as the chymotrypsins, although the residues occur in a different order (HDS in chymotrypsin and DHS in subtilisin), but the structures show no other similarity. Some subtilisins are mosaic proteins, while others contain N- and C-terminal extensions that show no sequence similarity to any other known protein. Based on sequence homology, a subdivision into six families has been proposed.

The proprotein-processing endopeptidases kexin, furin and related enzymes form a distinct subfamily known as the kexin subfamily (S8B). These preferentially cleave C-terminally to paired basic amino acids. Members of this subfamily can be identified by subtly different motifs around the active site. Members of the kexin family, along with endopeptidases R, T and K from the fungus Tritirachium and cuticle-degrading peptidase from Metarhizium, require thiol activation. This can be attributed to the presence of Cys-173 near the active histidine. Only one viral member of the subtilisin family is known, a 56-kDa protease from herpes virus 1, which infects the channel catfish.

Sedolisins (serine-carboxyl peptidases) are proteolytic enzymes whose fold resembles that of subtilisin; however, they are considerably larger, with the mature catalytic domains containing approximately 375 amino acids. The defining features of these enzymes are a unique catalytic triad, Ser-Glu-Asp, and the presence of an aspartic acid residue in the oxyanion hole. High-resolution crystal structures have been solved for sedolisin from Pseudomonas sp. 101 and for kumamolisin from a thermophilic bacterium, Bacillus sp. MN-32. Mutations in the gene encoding the human sedolisin, tripeptidyl-peptidase I (CLN2), lead to a fatal neurodegenerative disease, classical late-infantile neuronal ceroid lipofuscinosis.

Proteins where this domain is known:
PY01222    PY04329   


PF00083 - Sugar_tr (Pfam link)

Interpro entry IPR005828 : General substrate transporter (Interpro link)

Interpro description:

Recent genome-sequencing data and a wealth of biochemical and molecular genetic investigations have revealed the occurrence of dozens of families of primary and secondary transporters. Two such families have been found to occur ubiquitously in all major groups of living organisms. These are the ATP-binding cassette (ABC) superfamily and the major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family. While ABC family permeases are in general multicomponent primary active transporters, capable of transporting both small molecules and macromolecules in response to ATP hydrolysis, the MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients. Although well over 100 families of transporters have now been recognized and classified, the ABC superfamily and MFS account for nearly half of the solute transporters encoded within the genomes of microorganisms. They are also prevalent in higher organisms. The importance of these two families of transport systems to living organisms therefore cannot be overestimated.

The MFS was originally believed to function primarily in the uptake of sugars, but subsequent studies revealed that drug efflux systems, transporters of Krebs cycle metabolites, organophosphate:phosphate exchangers, oligosaccharide:H+ symport permeases and bacterial aromatic acid permeases were all members of the MFS. These observations suggested that the MFS is far more widespread in nature and far more diverse in function than had previously been thought. Seventeen subgroups of the MFS have been identified.

Evidence suggests that the MFS permeases arose by a tandem intragenic duplication event in the early prokaryotes. This event generated a 12-transmembrane-spanner (TMS) protein topology from a primordial 6-TMS unit. Surprisingly, all currently recognized MFS permeases retain the two six-TMS units within a single polypeptide chain, although in 3 of the 17 MFS families an additional two TMSs are found. Moreover, the well-conserved MFS-specific motif between TMS2 and TMS3, and the related but less well conserved motif between TMS8 and TMS9, prove to be characteristic of virtually all of the more than 300 MFS proteins identified.

Proteins where this domain is known:
PY00899   


PF00085 - Thioredoxin (Pfam link)

Interpro entry IPR013766 : Thioredoxin domain (Interpro link)

Pfam description:
Thioredoxins are small enzymes that participate in redox reactions, via the reversible oxidation of an active centre disulfide bond. Some members with only the active site are not separated from the noise.

Interpro description:

Thioredoxins are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of two cysteine thiol groups to a disulphide, accompanied by the transfer of two electrons and two protons. The net result is the covalent interconversion of a disulphide and a dithiol. In NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses the reduction of oxidised thioredoxin (Trx) by NADPH using FAD and its own redox-active disulphide; reduced thioredoxin then directly reduces the disulphide in the substrate protein.
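The two steps of the NADPH-dependent pathway, and their sum, can be written out explicitly (Trx = thioredoxin, TR = thioredoxin reductase). The FAD and TR's own redox-active disulphide are internal to the first step and are not shown; the net line confirms that two electrons and two protons pass from NADPH (plus H+) to the substrate disulphide.

```latex
\[
\begin{aligned}
\mathrm{Trx\text{-}S_2} + \mathrm{NADPH} + \mathrm{H^+} &\xrightarrow{\ \text{TR}\ } \mathrm{Trx\text{-}(SH)_2} + \mathrm{NADP^+} \\
\mathrm{Trx\text{-}(SH)_2} + \mathrm{Protein\text{-}S_2} &\longrightarrow \mathrm{Trx\text{-}S_2} + \mathrm{Protein\text{-}(SH)_2} \\
\text{net:}\quad \mathrm{NADPH} + \mathrm{H^+} + \mathrm{Protein\text{-}S_2} &\longrightarrow \mathrm{NADP^+} + \mathrm{Protein\text{-}(SH)_2}
\end{aligned}
\]
```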

Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active disulphide bond is well conserved. All thioredoxins contain a cis-proline located in a loop preceding beta-strand 4, which makes contact with the active site cysteines, and is important for stability and function. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins.
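In classical thioredoxins the redox-active disulphide sits in a Cys-Gly-Pro-Cys (CGPC) sequence, a special case of the C-X-X-C arrangement shared by many thioredoxin-fold proteins. A minimal sketch of a C-X-X-C scan follows; the function name is illustrative, and the Escherichia coli thioredoxin segment (active-site Cys32/Cys35) is only a test input.

```python
import re

# Generic C-X-X-C pattern; the lookahead lets overlapping candidates
# (e.g. C-X-X-C-X-X-C) all be reported rather than just the first.
CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the first Cys, motif) for every C-X-X-C."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(seq)]


# N-terminal segment of E. coli thioredoxin, used only as example input.
ecoli_trx_fragment = "SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEY"

print(find_cxxc(ecoli_trx_fragment))  # [(32, 'CGPC')]
```

A C-X-X-C hit alone does not establish a thioredoxin fold; it only marks where a redox-active disulphide could form.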

A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin, most of which are protein disulphide isomerases (PDI). PDI is a multifunctional endoplasmic reticulum enzyme that catalyses the formation and rearrangement of disulphide bonds during protein folding. All PDIs contain two or three (ERp72) copies of the thioredoxin domain, each of which contributes to disulphide isomerase activity, but which are functionally non-equivalent. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, i.e. independently of its disulphide isomerase activity. Several forms of PDI are currently known.

Bacterial thiol:disulphide interchange proteins, which allow disulphide bond formation in some periplasmic proteins, also contain a thioredoxin domain.

This entry represents the thioredoxin domain.

Proteins where this domain is known:
PY00638    PY03715    PY04185    PY04296    PY06242    PY06980    PY07174   

Proteins where this domain has been detected by our approach:
PY00223    PY05335   


PF00090 - TSP_1 (Pfam link)

Interpro entry IPR000884 : (Interpro link)

Interpro description:

Thrombospondins are multimeric multidomain glycoproteins that function at cell surfaces and in the extracellular matrix milieu. They act as regulators of cell interactions in vertebrates. They are divided into two subfamilies, A and B, according to their overall molecular organisation. The subgroup A proteins TSP-1 and -2 contain an N-terminal domain, a VWFC domain, three TSP1 repeats, three EGF-like domains, TSP3 repeats and a C-terminal domain. They are assembled as trimers. The subgroup B thrombospondins, designated TSP-3, -4, and COMP (cartilage oligomeric matrix protein, also designated TSP-5), are distinct in that they contain unique N-terminal regions, lack the VWFC domain and TSP1 repeats, contain four copies of EGF-like domains, and are assembled as pentamers. EGF-like domains, TSP3 repeats and the C-terminal domain are thus the hallmarks of a thrombospondin.

This repeat was first described in 1986 by Lawler and Hynes. It was found in the thrombospondin protein, where it is repeated three times. A number of proteins involved in the complement pathway (properdin, C6, C7, C8A, C8B, C9), as well as extracellular matrix proteins such as mindin, F-spondin and SCO-spondin, and even the circumsporozoite surface protein 2 and TRAP proteins of Plasmodium, contain one or more instances of this repeat. The repeat has been implicated in cell-cell interaction, inhibition of angiogenesis and apoptosis.

The intron-exon organisation of the properdin gene confirms the hypothesis that the repeat might have evolved by a process involving exon shuffling. A study of properdin structure provides some information about the structure of the thrombospondin type I repeat.

Proteins where this domain is known:
PY03052    PY03168    PY04732    PY04858    PY07092   

Proteins where this domain has been detected by our approach:
PY01499    PY02498   


PF00091 - Tubulin (Pfam link)

Interpro entry IPR003008 : Tubulin/FtsZ, GTPase (Interpro link)

Pfam description:
This family includes the tubulin alpha, beta and gamma chains, as well as the bacterial FtsZ family of proteins. Members of this family are involved in polymer formation. FtsZ is the polymer-forming protein of bacterial cell division. It is part of a ring in the middle of the dividing cell that is required for constriction of cell membrane and cell envelope to yield two daughter cells. FtsZ and tubulin are GTPases. FtsZ can polymerise into tubes, sheets, and rings in vitro and is ubiquitous in eubacteria and archaea. Tubulin is the major component of microtubules.

Interpro description:

This domain is found in all tubulin chains, as well as the bacterial FtsZ family of proteins. These proteins are involved in polymer formation. Tubulin is the major component of microtubules, while FtsZ is the polymer-forming protein of bacterial cell division; it is part of a ring in the middle of the dividing cell that is required for constriction of the cell membrane and cell envelope to yield two daughter cells. FtsZ and tubulin are GTPases, and this entry represents the GTPase domain. FtsZ can polymerise into tubes, sheets, and rings in vitro and is ubiquitous in bacteria and archaea.

Proteins where this domain is known:
PY00808    PY01155    PY01830    PY04063    PY05711    PY05777   


PF00092 - VWA (Pfam link)

Interpro entry IPR002035 : (Interpro link)

Interpro description:
The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the aetiology of bleeding disorders. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation, and signal transduction), involving interaction with a large array of ligands. A number of human diseases arise from mutations in VWA domains.

Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands. Fold recognition algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices. 3D structures have been determined for the I-domains of integrins CD11b (with bound magnesium) and CD11a (with bound manganese). The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for binding protein ligands. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely conserved, but the manner in which the metal ion is coordinated differs slightly.

Proteins where this domain is known:
PY01499    PY03052    PY04858    PY07681   


PF00096 - zf-C2H2 (Pfam link)

Interpro entry IPR007087 : Zinc finger, C2H2-type (Interpro link)

Pfam description:
The C2H2 zinc finger is the classical zinc finger domain. The two conserved cysteines and histidines co-ordinate a zinc ion. The following pattern describes the zinc finger: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C], where X can be any amino acid and numbers indicate the number of residues. The positions marked # are those that are important for the stable fold of the zinc finger. The final position can be either His or Cys. The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix. The amino-terminal part of the helix binds the major groove in DNA-binding zinc fingers. The accepted consensus binding sequence for Sp1 is usually defined by the asymmetric hexanucleotide core GGGCGG, but this sequence does not include, among others, the GAG (=CTC) repeat that constitutes a high-affinity site for Sp1 binding to the wt1 promoter.
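The pattern above maps directly onto a regular expression. This is a minimal sketch, assuming the final His/Cys position is written as [H/C] and treating the # positions as unconstrained (the pattern marks them as structurally important but does not name residues). The helper name and test sequence are hypothetical. Pfam itself detects domains with profile HMMs, so a regex like this is only a rough screen.

```python
import re

# Tokens of the motif pattern: ranged gap X(m-n), fixed gap Xn, single
# wildcard X, structural position '#', alternative set like [H/C], or a residue.
TOKEN = re.compile(r"X\((\d+)-(\d+)\)|X(\d+)|X|#|\[([A-Z/]+)\]|[A-Z]")


def pattern_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated motif pattern into a Python regex string."""
    parts = []
    for m in TOKEN.finditer(pattern):
        token = m.group(0)
        if m.group(1) is not None:          # X(m-n): between m and n residues
            parts.append(f".{{{m.group(1)},{m.group(2)}}}")
        elif m.group(3) is not None:        # Xn: exactly n residues
            parts.append(f".{{{m.group(3)}}}")
        elif m.group(4) is not None:        # [H/C]: either listed residue
            parts.append("[" + m.group(4).replace("/", "") + "]")
        elif token in ("X", "#"):           # any residue; '#' left unconstrained
            parts.append(".")
        else:                               # a conserved residue such as C or H
            parts.append(token)
    return "".join(parts)


C2H2_PATTERN = "#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]"
c2h2 = re.compile(pattern_to_regex(C2H2_PATTERN))

# Hypothetical test sequence carrying one classical finger.
seq = "MSYKCPECGKSFSQSSNLQKHQRTHTG"
for hit in c2h2.finditer(seq):
    print(hit.start() + 1, hit.group())  # 3 YKCPECGKSFSQSSNLQKHQRTH
```

The generated expression is `..C.{1,5}C.{3}..{5}..{2}H.{3,6}[HC]`; constraining the # positions to hydrophobic residues would be a natural refinement.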

Interpro description:

C2H2-type (classical) zinc fingers (Znf) were the first class to be characterised. They contain a short beta hairpin and an alpha helix (beta/beta/alpha structure), where a single zinc atom is held in place by Cys(2)His(2) (C2H2) residues in a tetrahedral array. C2H2 Znfs can be divided into three groups based on the number and pattern of fingers: triple-C2H2 (binds single ligand), multiple-adjacent-C2H2 (binds multiple ligands), and separated paired-C2H2. C2H2 Znfs are the most common DNA-binding motifs found in eukaryotic transcription factors, and have also been identified in prokaryotes. Transcription factors usually contain several Znfs (each with a conserved beta/beta/alpha structure) capable of making multiple contacts along the DNA, where the C2H2 Znf motifs recognise DNA sequences by binding to the major groove of DNA via a short alpha-helix in the Znf, each Znf spanning 3-4 bases of the DNA. C2H2 Znfs can also bind to RNA and protein targets.

Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

(Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

This entry represents the classical C2H2 type zinc finger domain.

More information about these proteins can be found at Protein of the Month: Zinc Fingers.

Proteins where this domain is known:
PY01286    PY03615    PY05522   

Proteins where this domain has been detected by our approach:
PY00909    PY01986    PY02536    PY03396    PY05576    PY05938   


PF00097 - zf-C3HC4 (Pfam link)

Pfam description:
The C3HC4 type zinc-finger (RING finger) is a cysteine-rich domain of 40 to 60 residues that coordinates two zinc ions, and has the consensus sequence: C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-C-X2-C-X(4-48)-C-X2-C where X is any amino acid. Many proteins containing a RING finger play a key role in the ubiquitination pathway.
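The name C3HC4 can be read straight off the consensus: the invariant positions are three Cys, one His, then four Cys. The sketch below (helper name hypothetical) extracts those residues and adds up the gap ranges to give the shortest and longest stretch the consensus allows. That range (30 to 107 residues) is only what the gaps permit; it is wider than the 40 to 60 residues typical of real RING domains.

```python
import re

RING_CONSENSUS = "C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-C-X2-C-X(4-48)-C-X2-C"

# Ranged gap X(m-n), fixed gap Xn (a bare X counts as one), or a conserved residue.
TOKEN = re.compile(r"X\((\d+)-(\d+)\)|X(\d*)|([A-Z])")


def summarise_consensus(consensus: str) -> tuple[str, int, int]:
    """Return (conserved residues in order, shortest span, longest span)."""
    ligands: list[str] = []
    low = high = 0
    for m in TOKEN.finditer(consensus):
        if m.group(1) is not None:          # variable gap X(m-n)
            low += int(m.group(1))
            high += int(m.group(2))
        elif m.group(4) is not None:        # conserved Cys or His
            ligands.append(m.group(4))
            low += 1
            high += 1
        else:                               # fixed gap Xn, or a single X
            n = int(m.group(3) or 1)
            low += n
            high += n
    return "".join(ligands), low, high


print(summarise_consensus(RING_CONSENSUS))  # ('CCCHCCCC', 30, 107)
```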

Proteins where this domain is known:
PY00003    PY00025    PY00197    PY00652    PY00969    PY01408    PY01709    PY01765    PY01907    PY02287    PY02581    PY02949    PY02950    PY03501    PY03640    PY04641    PY05143    PY05938    PY06314    PY06576    PY06662    PY06836    PY06983   

Proteins where this domain has been detected by our approach:
PY00030    PY00512    PY00648    PY00764    PY01730    PY02739    PY03290    PY03832   


PF00098 - zf-CCHC (Pfam link)

Interpro entry IPR001878 : Zinc finger, CCHC-type (Interpro link)

Pfam description:
The zinc knuckle is a zinc-binding motif composed of the following: C-X2-C-X4-H-X4-C, where X can be any amino acid. The motifs are mostly from retroviral gag proteins (nucleocapsid). The prototype structure is from HIV. The family also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1. The structure is an 18-residue zinc finger.

Interpro description:

Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

(Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

This entry represents the CysCysHisCys (CCHC) type zinc finger domains, which have the sequence:

C-X2-C-X4-H-X4-C

where X can be any amino acid, and the number indicates the number of residues. These 18-residue CCHC zinc finger domains are mainly found in the nucleocapsid protein of retroviruses, where they are required for viral genome packaging and for the early stages of infection. They are also found in eukaryotic proteins involved in RNA binding or single-stranded DNA binding.

More information about these proteins can be found at Protein of the Month: Zinc Fingers.

Proteins where this domain has been detected by our approach:
PY01284    PY03466    PY04200   


PF00106 - adh_short (Pfam link)

Interpro entry IPR002198 : Short-chain dehydrogenase/reductase SDR (Interpro link)

Pfam description:
This family contains a wide variety of dehydrogenases.

Interpro description:
The short-chain dehydrogenases/reductases family (SDR) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type', or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme-binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains.

Proteins where this domain is known:
PY02416    PY05567   


PF00107 - ADH_zinc_N (Pfam link)

Interpro entry IPR013149 : Alcohol dehydrogenase, zinc-binding (Interpro link)

Interpro description:
Alcohol dehydrogenase (ADH) catalyzes the reversible oxidation of alcohols to their corresponding aldehyde or ketone with the concomitant reduction of NAD+:
 alcohol + NAD+ = aldehyde or ketone + NADH + H+
Currently three structurally and catalytically different types of alcohol dehydrogenases are known:
  1. Zinc-containing 'long-chain' alcohol dehydrogenases.
  2. Insect-type, or 'short-chain' alcohol dehydrogenases.
  3. Iron-containing alcohol dehydrogenases.
Zinc-containing ADHs are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADHs are found in bacteria, mammals, plants and fungi. In many species there is more than one isozyme (for example, humans have at least six isozymes, yeast have three, etc.). A number of other zinc-dependent dehydrogenases are closely related to zinc ADH and are included in this family.

In addition, this family includes NADP-dependent quinone oxidoreductase, an enzyme found in bacteria (gene qor), in yeast and in mammals, where in some species, such as rodents, it has been recruited as an eye lens protein known as zeta-crystallin. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases, and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related to qor.

This entry represents the cofactor-binding domain of these enzymes, which is normally found towards the C-terminus. Structural studies indicate that it forms a classical Rossmann fold that reversibly binds NAD(H).

Proteins where this domain is known:
PY04242   


PF00108 - Thiolase_N (Pfam link)

Interpro entry IPR002155 : (Interpro link)

Pfam description:
Thiolase is reported to be structurally related to beta-ketoacyl synthase (Pfam:PF00109), and also chalcone synthase.

Interpro description:

Two different types of thiolase are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase and 3-ketoacyl-CoA thiolase. 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis.
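Because 3-ketoacyl-CoA thiolase (thiolase I) performs the last step of each beta-oxidation round, the carbon bookkeeping of a full degradation can be traced in a few lines. A minimal sketch, assuming a saturated, even-numbered acyl-CoA (odd chains finish as propionyl-CoA and are not modelled); the function name is illustrative. The final cleavage acts on acetoacetyl-CoA, the substrate for which thiolase II is specific.

```python
def thiolase_cycles(chain_length: int) -> list[tuple[int, int]]:
    """List (3-ketoacyl-CoA carbons in, acyl-CoA carbons out) per thiolase I step.

    Each beta-oxidation round ends with thiolase I cleaving a 3-ketoacyl-CoA
    with CoA, releasing acetyl-CoA (C2) and an acyl-CoA two carbons shorter.
    Assumption: only saturated, even-numbered chains are handled.
    """
    if chain_length < 4 or chain_length % 2:
        raise ValueError("expects an even-numbered chain of at least 4 carbons")
    steps = []
    carbons = chain_length
    while carbons > 2:
        steps.append((carbons, carbons - 2))
        carbons -= 2
    return steps


steps = thiolase_cycles(16)      # palmitoyl-CoA (C16)
acetyl_coa = len(steps) + 1      # the last cleavage (C4 -> C2) yields two acetyl-CoA
print(len(steps), acetyl_coa)    # 7 8
```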

In eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes.

There are two conserved cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active-site base involved in deprotonation in the condensation reaction.

Mammalian nonspecific lipid-transfer protein (nsL-TP), also known as sterol carrier protein 2, seems to exist in two different forms: a 14 kDa protein (SCP-2) and a larger 58 kDa protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport; the latter is found in peroxisomes. The C-terminal part of SCP-x is identical to SCP-2, while the N-terminal portion is evolutionarily related to thiolases.

Proteins where this domain is known:
PY01991   


PF00109 - ketoacyl-synt (Pfam link)

Interpro entry IPR014030 : (Interpro link)

Pfam description:
The structure of beta-ketoacyl synthase is similar to that of the thiolase family (Pfam:PF00108) and also to chalcone synthase. The active site of beta-ketoacyl synthase is located between the N- and C-terminal domains. The N-terminal domain contains most of the structures involved in dimer formation and also the active-site cysteine.

Interpro description:

Beta-ketoacyl-ACP synthase (KAS) is the enzyme that catalyzes the condensation of malonyl-ACP with the growing fatty acid chain. It is found as a component of a number of enzymatic systems, including fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH; the multi-functional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum, which is involved in the biosynthesis of a polyketide antibiotic; polyketide antibiotic synthase enzyme systems; Emericella nidulans multifunctional protein Wa, which is involved in the biosynthesis of conidial green pigment; Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl chain; and yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: first, the acyl component of an activated acyl primer is transferred to a cysteine residue of the enzyme, and it is then condensed with an activated malonyl donor with the concomitant release of carbon dioxide.
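The carbon bookkeeping of repeated condensations can be illustrated with a short sketch. It assumes an acetyl (C2) primer and a malonyl (C3) donor at each round, one carbon of which leaves as carbon dioxide; the function name `elongate` is hypothetical, and the reduction steps that follow each condensation in fatty acid synthesis are not modelled.

```python
def elongate(primer_carbons: int, malonyl_units: int) -> tuple[int, int]:
    """Return (chain length in carbons, CO2 molecules released).

    Hypothetical helper: each round condenses the acyl chain with a
    3-carbon malonyl donor, and decarboxylation releases one carbon
    as CO2, so the chain grows by two carbons per round.
    """
    chain = primer_carbons
    co2_released = 0
    for _ in range(malonyl_units):
        chain += 3          # condensation with the malonyl donor
        chain -= 1          # concomitant release of CO2
        co2_released += 1
    return chain, co2_released


# An acetyl primer extended by seven malonyl units gives a C16 chain.
print(elongate(2, 7))  # (16, 7)
```

The net gain of two carbons per round is why even-numbered chains such as C16 arise from an acetyl primer.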

This entry represents the N-terminal domain of beta-ketoacyl-ACP synthases.

Proteins where this domain is known:
PY04452   


PF00111 - Fer2 (Pfam link)

Interpro entry IPR001041 : Ferredoxin (Interpro link)

Pfam description:
Several members of the Prosite family are not included since they only contain the active site.

Interpro description:

The ferredoxin protein family are electron carrier proteins with an iron-sulphur cofactor that act in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulphur cluster(s) and according to sequence similarities.

This entry represents members of the 2Fe-2S ferredoxin family that have a general core structure consisting of beta(2)-alpha-beta(2); these include putidaredoxin, terpredoxin and adrenodoxin. They are proteins of around one hundred amino acids with four conserved cysteine residues to which the 2Fe-2S cluster is ligated. This conserved region is also found as a domain in various metabolic enzymes and in multidomain proteins, such as aldehyde oxidoreductase (N-terminal), xanthine oxidase (N-terminal), phthalate dioxygenase reductase (C-terminal), succinate dehydrogenase iron-sulphur protein (N-terminal), and methane monooxygenase reductase (N-terminal).

Proteins where this domain is known:
PY03801    PY04921    PY07468   


PF00112 - Peptidase_C1 (Pfam link)

Interpro entry IPR000668 : Peptidase C1A, papain C-terminal (Interpro link)

Interpro description:

In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related) and further subdivided into families on the basis of the architecture of their catalytic dyad or triad.

This group of proteins belongs to the peptidase family C1, sub-family C1A (papain family, clan CA). It includes proteins classed as non-peptidase homologs. These have either been shown experimentally to lack peptidase activity or lack one or more of the active site residues.

The papain family has a wide variety of activities, including broad-range (papain) and narrow-range endo-peptidases, aminopeptidases, dipeptidyl peptidases and enzymes with both exo- and endo-peptidase activity. Members of the papain family are widespread, found in baculovirus, eubacteria, yeast, and practically all protozoa, plants and mammals. The proteins are typically lysosomal or secreted, and proteolytic cleavage of the propeptide is required for enzyme activation, although bleomycin hydrolase is cytosolic in fungi and mammals. Papain-like cysteine proteinases are essentially synthesised as inactive proenzymes (zymogens) with N-terminal propeptide regions. The activation process of these enzymes includes the removal of propeptide regions. The propeptide regions serve a variety of functions in vivo and in vitro. The pro-region is required for the proper folding of the newly synthesised enzyme, the inactivation of the peptidase domain and stabilisation of the enzyme against denaturing at neutral to alkaline pH conditions. Amino acid residues within the pro-region mediate their membrane association, and play a role in the transport of the proenzyme to lysosomes. Among the most notable features of propeptides is their ability to inhibit the activity of their cognate enzymes and that certain propeptides exhibit high selectivity for inhibition of the peptidases from which they originate.

The catalytic residues of papain are Cys-25 and His-159, other important residues being Gln-19, which helps form the 'oxyanion hole', and Asn-175, which orientates the imidazole ring of His-159.
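Because non-peptidase homologs are recognised partly by missing active-site residues, a homolog can be screened against the papain residues listed above. This sketch assumes the homolog has already been aligned to papain and its residues mapped to papain numbering; the `residues_at` mapping and both function names are hypothetical. A flag here only suggests lost activity; it does not demonstrate it experimentally.

```python
# Key residues of papain as listed above (papain numbering -> residue).
PAPAIN_KEY_RESIDUES = {19: "Q", 25: "C", 159: "H", 175: "N"}
CATALYTIC_POSITIONS = (25, 159)  # Cys-25 and His-159


def missing_key_residues(residues_at: dict[int, str]) -> list[str]:
    """List papain key residues not conserved in an aligned homolog.

    residues_at (hypothetical input) maps papain-numbered positions to
    the residue found at that position in the homolog.
    """
    return [
        f"{expected}{pos}"
        for pos, expected in sorted(PAPAIN_KEY_RESIDUES.items())
        if residues_at.get(pos) != expected
    ]


def maybe_non_peptidase(residues_at: dict[int, str]) -> bool:
    """True if either catalytic residue (Cys-25 or His-159) is absent."""
    return any(
        residues_at.get(pos) != PAPAIN_KEY_RESIDUES[pos]
        for pos in CATALYTIC_POSITIONS
    )


# Toy homolog with serine in place of the catalytic cysteine.
homolog = {19: "Q", 25: "S", 159: "H", 175: "N"}
print(missing_key_residues(homolog), maybe_non_peptidase(homolog))
```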

Proteins where this domain is known:
PY00109    PY00291    PY00292    PY00293    PY00783    PY01068    PY01568    PY02062    PY02063    PY02150    PY05365   


PF00113 - Enolase_C (Pfam link)

Interpro entry IPR000941 : Enolase (Interpro link)

Interpro description:

Enolase (2-phospho-D-glycerate hydro-lyase) is an essential glycolytic enzyme that catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate. In vertebrates, there are 3 different, tissue-specific isoenzymes, designated alpha, beta and gamma. Alpha is present in most tissues, beta is localised in muscle tissue, and gamma is found only in nervous tissue. The functional enzyme exists as a dimer of any 2 isoforms. In immature organs and in adult liver, it is usually an alpha homodimer; in adult skeletal muscle, a beta homodimer; and in adult neurons, a gamma homodimer. In developing muscle, it is usually an alpha/beta heterodimer, and in the developing nervous system, an alpha/gamma heterodimer. The tissue-specific forms display minor kinetic differences. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown to be evolutionarily related to enolase.

Neuron-specific enolase is released in a variety of neurological diseases, such as multiple sclerosis, and after seizures or acute stroke. Several types of tumour cell have also been found to be positive for neuron-specific enolase. Beta-enolase deficiency is associated with glycogenosis type XIII.

Proteins where this domain is known:
PY06644   


PF00115 - COX1 (Pfam link)

Interpro entry IPR000883 : Cytochrome c oxidase, subunit I (Interpro link)

Interpro description:
Cytochrome c oxidase is a key enzyme in aerobic metabolism. Proton pumping haem-copper oxidases represent the terminal, energy-transfer enzymes of respiratory chains in prokaryotes and eukaryotes. The CuB-haem a3 (or haem o) binuclear centre, associated with the largest subunit I of cytochrome c and ubiquinol oxidases, is directly involved in the coupling between dioxygen reduction and proton pumping. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes).

The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals), of which only the catalytic subunit (equivalent to mammalian subunit I (CO I)) is found in all haem-copper respiratory oxidases. The presence of a bimetallic centre (formed by a high-spin haem and copper B) as well as a low-spin haem, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members. In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in haem and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions.

It has been shown that eubacterial quinol oxidase was derived from cytochrome c oxidase in Gram-positive bacteria and that archaebacterial quinol oxidase has an independent origin. A considerable amount of evidence suggests that proteobacteria (Purple bacteria) acquired quinol oxidase through a lateral gene transfer from Gram-positive bacteria.

Nitric oxide reductase (NOR) exists in denitrifying species of archaea and eubacteria and is a heterodimer of cytochromes b and c. Phenazine methosulphate can act as an acceptor. The Prosite signature in this entry recognises the haem-copper site of the nitric oxide reductases.

Proteins where this domain is known:
PY00149    PY00775   


PF00116 - COX2 (Pfam link)

Interpro entry IPR002429 : Cytochrome c oxidase subunit II C-terminal (Interpro link)

Interpro description:

Cytochrome c oxidase is an oligomeric enzymatic complex which is a component of the respiratory chain and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane. The number of polypeptides in the complex ranges from 3-4 (prokaryotes) up to 13 (mammals).

Subunit 2 (CO II) transfers the electrons from cytochrome c to the catalytic subunit 1. It contains two adjacent transmembrane regions in its N-terminus, and the major part of the protein is exposed to the periplasmic space (prokaryotes) or to the mitochondrial intermembrane space (eukaryotes). CO II provides the substrate-binding site and contains a copper centre called Cu(A), probably the primary acceptor in cytochrome c oxidase. An exception is the corresponding subunit of the cbb3-type oxidase, which lacks the copper A redox centre. Several bacterial CO II subunits have a C-terminal extension that contains a covalently bound haem c.

Proteins where this domain is known:
PY01261   


PF00117 - GATase (Pfam link)

Interpro entry IPR000991 : Glutamine amidotransferase class-I, C-terminal (Interpro link)

Interpro description:

Glutamine amidotransferase (GATase) activity involves the removal of the ammonia group from a glutamine molecule and its subsequent transfer to a specific substrate, thus creating a new carbon-nitrogen group on the substrate. This activity is found in a range of biosynthetic enzymes, including glutamine amidotransferase, anthranilate synthase component II, p-aminobenzoate synthase and glutamine-dependent carbamoyl-phosphate synthase (CPSase). GATase domains can occur either as single polypeptides, as in glutamine amidotransferases, or as domains in a much larger multifunctional synthase protein, such as CPSase. On the basis of sequence similarities, two classes of GATase domains have been identified: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains are defined by a conserved catalytic triad consisting of cysteine, histidine and glutamate. Class-I GATase domains have been found in the following enzymes: the second component of anthranilate synthase and 4-amino-4-deoxychorismate (ADC) synthase; CTP synthase; GMP synthase; glutamine-dependent carbamoyl-phosphate synthase; phosphoribosylformylglycinamidine synthase II; and the histidine amidotransferase hisH.

These signatures also detect peptidases belonging to MEROPS peptidase family C26 (gamma-glutamyl hydrolase), and non-peptidase homologs belonging to family C56 (PfpI endopeptidase) both of which are members of clan PC(C). Other members of family C56 are found in

Proteins where this domain is known:
PY00624    PY03479    PY04548    PY04781   


PF00118 - Cpn60_TCP1 (Pfam link)

Interpro entry IPR002423 : Chaperonin Cpn60/TCP-1 (Interpro link)

Pfam description:
This family includes members from the HSP60 chaperone family and the TCP-1 (T-complex protein) family.

Interpro description:

Partially folded polypeptide chains, either newly made by ribosomes or emerging from mature proteins unfolded by stress, run the risk of aggregating with one another to the detriment of the organism. Folding of newly synthesised polypeptides in the crowded cellular environment requires the assistance of molecular chaperone proteins, such as the large bacterial chaperonins GroEL and GroES.

GroEL and GroES prevent aggregation by encapsulating individual chains within the so-called 'Anfinsen cage' provided by the GroEL-GroES complex, where they can fold in isolation from one another. GroEL consists of two heptameric rings of identical ATPase subunits stacked back to back, containing a cage in each ring. Each subunit consists of three domains. The equatorial domain contains the nucleotide binding site and is connected by a flexible intermediate domain with the apical domain. The latter presents several hydrophobic amino-acid side chains at the top of the ring, orientated towards the cavity of the cage. These side chains are involved in binding either a partially folded polypeptide chain or a single molecule of GroES.

The assembly of proteins has been thought to be the sole result of properties inherent in the primary sequence of polypeptides themselves. In some cases, however, structural information from other protein molecules is required for correct folding and subsequent assembly into oligomers. These 'helper' molecules are referred to as molecular chaperones, a subfamily of which are the chaperonins, which include 10 kDa and 60 kDa proteins. These are found in abundance in prokaryotes, chloroplasts and mitochondria. They are required for normal cell growth (as demonstrated by the fact that no temperature sensitive mutants for the chaperonin genes can be found in the temperature range 20 to 43 degrees centigrade), and are stress-induced, acting to stabilise or protect disassembled polypeptides under heat-shock conditions.

The 10 kDa chaperonin (cpn10, or groES in bacteria) exists as a ring-shaped oligomer of 6 to 8 identical subunits, whereas the 60 kDa chaperonin (cpn60, or groEL in bacteria) forms a structure comprising 2 stacked rings, each ring containing 7 identical subunits. These ring structures assemble by self-stimulation in the presence of Mg2+-ATP. The cpn10 and cpn60 oligomers also require Mg2+-ATP in order to interact to form a functional complex, although the mechanism of this interaction is as yet unknown. This chaperonin complex is essential for the correct folding and assembly of polypeptides into oligomeric structures, of which the chaperonins themselves are not a part. The binding of cpn10 to cpn60 inhibits the weak ATPase activity of cpn60.

The 60 kDa form of chaperonin is the immunodominant antigen of patients with Legionnaires' disease, and is thought to play a role in the protection of the Legionella bacteria from oxygen radicals within macrophages. This hypothesis is based on the finding that the cpn60 gene is upregulated in response to hydrogen peroxide, a source of oxygen radicals. Cpn60 has also been found to display strong antigenicity in many bacterial species, and has the potential for inducing immune protection against unrelated bacterial infections. The RuBisCO subunit binding protein (which has been implicated in the assembly of RuBisCO) and cpn60 have been found to be evolutionary homologues, the RuBisCO subunit binding protein having the C-terminal Gly-Gly-Met repeat found in all bacterial cpn60 sequences. Although the precise function of this repeat is unknown, it is thought to be important as it is also found in 70 kDa heat-shock proteins. The crystal structure of Escherichia coli GroEL has been resolved to 2.8 Å. The TCP-1 family of proteins act as molecular chaperones for tubulin, actin and probably some other proteins. They are weakly, but significantly, related to the cpn60/groEL chaperonin family.

Proteins where this domain is known:
PY00598    PY01933    PY02258    PY02937    PY03629    PY03932    PY04614    PY04757    PY04792    PY06837    PY07275    PY07337   


PF00120 - Gln-synt_C (Pfam link)

Interpro entry IPR008146 : Glutamine synthetase, catalytic region (Interpro link)

Interpro description:

Glutamine synthetase (GS) plays an essential role in the metabolism of nitrogen by catalyzing the condensation of glutamate and ammonia to form glutamine.

There seem to be three different classes of GS:

While the three classes of GSs are clearly structurally related, their sequence similarities are not so extensive.

Proteins where this domain is known:
PY04688   


PF00121 - TIM (Pfam link)

Interpro entry IPR000652 : Triosephosphate isomerase (Interpro link)

Interpro description:

Triosephosphate isomerase (TIM) is the glycolytic enzyme that catalyses the reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic pathways and is essential for efficient energy production. It is a dimer of identical subunits, each of which is made up of about 250 amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism. The sequence around the active site residue is perfectly conserved in all known TIMs. Deficiencies in TIM are associated with haemolytic anaemia coupled with a progressive, severe neurological disorder.

Proteins where this domain is known:
PY00756    PY04306   


PF00122 - E1-E2_ATPase (Pfam link)

Interpro entry IPR008250 : ATPase, P-type, ATPase-associated region (Interpro link)

Interpro description:

ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

P-ATPases (sometimes known as E1-E2 ATPases) are found in bacteria and in a number of eukaryotic plasma membranes and organelles. P-ATPases function to transport a variety of different compounds, including ions and phospholipids, across a membrane using ATP hydrolysis for energy. There are many different classes of P-ATPases, each of which transports a specific type of ion: H+, Na+, K+, Mg2+, Ca2+, Ag+ and Ag2+, Zn2+, Co2+, Pb2+, Ni2+, Cd2+, Cu+ and Cu2+. P-ATPases can be composed of one or two polypeptides, and can usually assume two main conformations called E1 and E2.

This entry represents an ATPase-associated region found in P-type ATPases. P-type (or E1-E2-type) ATPases, which form an aspartyl phosphate intermediate in the course of ATP hydrolysis, can be divided into 4 major groups: (1) Ca2+-transporting ATPases; (2) Na+/K+- and gastric H+/K+-transporting ATPases; (3) plasma membrane H+-transporting ATPases (proton pumps) of plants, fungi and lower eukaryotes; and (4) all bacterial P-type ATPases, except the Mg2+-ATPase of Salmonella typhimurium, which is more similar to the eukaryotic sequences. However, the variety of sequence analysis methods in use has produced differing classifications.

More information about this protein can be found at Protein of the Month: ATP Synthases.

Proteins where this domain is known:
PY00066    PY01447    PY01853    PY03970    PY04047    PY04459    PY05776    PY06372    PY06483   

Proteins where this domain has been detected by our approach:
PY00968   


PF00125 - Histone (Pfam link)

Interpro entry IPR007125 : Histone core (Interpro link)

Interpro description:

The core histones together with some other DNA binding proteins appear to form a superfamily defined by a common fold and distant sequence similarities. Some proteins contain local homology domains related to the histone fold.

Proteins where this domain is known:
PY00436    PY00496    PY00826    PY01762    PY02616    PY05073    PY05076   


PF00130 - C1_1 (Pfam link)

Interpro entry IPR002219 : Protein kinase C, phorbol ester/diacylglycerol binding (Interpro link)

Pfam description:
This domain is also known as the Protein kinase C conserved region 1 (C1) domain.

Interpro description:

Diacylglycerol (DAG) is an important second messenger. Phorbol esters (PE) are analogues of DAG and potent tumour promoters that cause a variety of physiological changes when administered to both cells and tissues. DAG activates a family of serine/threonine protein kinases, collectively known as protein kinase C (PKC). Phorbol esters can directly stimulate PKC. The N-terminal region of PKC, known as C1, has been shown to bind PE and DAG in a phospholipid and zinc-dependent fashion. The C1 region contains one or two copies (depending on the isozyme of PKC) of a cysteine-rich domain, which is about 50 amino-acid residues long, and which is essential for DAG/PE-binding. The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are probably the six cysteines and two histidines that are conserved in this domain.

Proteins where this domain has been detected by our approach:
PY00684   


PF00132 - Hexapep (Pfam link)

Interpro entry IPR001451 : Bacterial transferase hexapeptide repeat (Interpro link)

Interpro description:

A variety of bacterial transferases contain a repeat structure composed of tandem repeats of a [LIV]-G-X(4) hexapeptide, which, in the tertiary structure of LpxA (UDP N-acetylglucosamine acyltransferase), has been shown to form a left-handed parallel beta helix. A number of different transferase protein families contain this repeat, such as galactoside acetyltransferase-like proteins, the gamma-class of carbonic anhydrases, and tetrahydrodipicolinate-N-succinyltransferases (DapD), the latter containing an extra N-terminal 3-helical domain.
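The [LIV]-G-X(4) pattern translates directly into a regular expression. The sketch below uses Python's standard `re` module to find the longest run of back-to-back hexapeptides in a protein sequence; the function name and the toy sequence are hypothetical, and because real repeats are often degenerate, a strict pattern like this is likely to underestimate repeat number.

```python
import re

# Prosite-style pattern [LIV]-G-X(4) written as a Python regular expression.
HEXAPEPTIDE = re.compile(r"[LIV]G.{4}")


def longest_tandem_run(seq: str) -> tuple[int, int]:
    """Return (0-based start, repeat count) of the longest run of
    consecutive [LIV]-G-X(4) hexapeptides; (-1, 0) if none is found."""
    best_start, best_count = -1, 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Keep stepping six residues while each window matches exactly.
        while HEXAPEPTIDE.fullmatch(seq, pos, pos + 6):
            count += 1
            pos += 6
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Toy sequence: three tandem hexapeptides (IGAKSA, LGNNTV, VGEAKR).
print(longest_tandem_run("MKIGAKSALGNNTVVGEAKRAA"))  # (2, 3)
```

Checking windows in strict six-residue steps, rather than collecting all matches, is what enforces the "tandem" requirement.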

Proteins where this domain is known:
PY02019    PY02047    PY05717    PY05992   


PF00133 - tRNA-synt_1 (Pfam link)

Interpro entry IPR002300 : Aminoacyl-tRNA synthetase, class Ia (Interpro link)

Pfam description:
Other tRNA synthetase sub-families are too dissimilar to be included.

Interpro description:

The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class-II synthetases.
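The class assignments above can be captured as a simple lookup keyed by one-letter amino acid codes. This is a minimal sketch that follows the text's assignment and its 2'/3'-hydroxyl preference; the names `CLASS_I`, `CLASS_II`, `ATTACHMENT_SITE` and `synthetase_class` are hypothetical, and organism-specific exceptions are not covered.

```python
# Class assignments as listed in the description (one-letter codes).
CLASS_I = frozenset("RCEQILMYWV")   # Arg Cys Glu Gln Ile Leu Met Tyr Trp Val
CLASS_II = frozenset("ANDGHKFPST")  # Ala Asn Asp Gly His Lys Phe Pro Ser Thr

# Ribose hydroxyl that typically receives the aminoacyl group, per the text.
ATTACHMENT_SITE = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for a one-letter amino acid code."""
    aa = amino_acid.upper()
    if aa in CLASS_I:
        return "I"
    if aa in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid code: {amino_acid!r}")


# Tryptophan, aspartate and lysine synthetases.
for aa in "WDK":
    cls = synthetase_class(aa)
    print(aa, cls, ATTACHMENT_SITE[cls])
```

The two sets are disjoint and together cover the 20 standard amino acids, matching the "20 aminoacyl-tRNA synthetases" split described above.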

Proteins where this domain is known:
PY00514    PY01849    PY05778    PY07371   

Proteins where this domain has been detected by our approach:
PY02181   


PF00134 - Cyclin_N (Pfam link)

Interpro entry IPR006671 : (Interpro link)

Pfam description:
Cyclins regulate cyclin dependent kinases (CDKs). Swiss:P22674 is a Uracil-DNA glycosylase that is related to other cyclins. Cyclins contain two domains of similar all-alpha fold, of which this family corresponds with the N-terminal domain.

Interpro description:

Cyclins are eukaryotic proteins that play an active role in controlling nuclear cell division cycles, and regulate cyclin dependent kinases (CDKs). Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two main groups of cyclins, G1/S cyclins, which are essential for the control of the cell cycle at the G1/S (start) transition, and G2/M cyclins, which are essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase). In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

Cyclin homologues have been found in various viruses, including Saimiriine herpesvirus 2 (Herpesvirus saimiri) and Human herpesvirus 8 (HHV-8) (Kaposi's sarcoma-associated herpesvirus). These viral homologues differ from their cellular counterparts in that the viral proteins have gained new functions and eliminated others to harness the cell and benefit the virus.

Cyclins contain two domains of similar all-alpha fold, of which this entry is associated with the N-terminal domain.

Proteins where this domain is known:
PY00225    PY05616   


PF00136 - DNA_pol_B (Pfam link)

Interpro entry IPR006134 : DNA-directed DNA polymerase, family B, conserved region (Interpro link)

Pfam description:
This region of DNA polymerase B appears to consist of more than one structural domain, possibly including elongation, DNA-binding and dNTP binding activities.

Interpro description:

DNA is the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the life cycle of a cell. This function is performed by DNA-directed DNA polymerases, which add nucleotides, supplied as deoxynucleoside triphosphates (dNTPs), to the 3'-end of the growing DNA chain, using a complementary DNA chain as a template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used for the de novo synthesis of a DNA chain. Even though there are 2 different methods of priming, these are mediated by 2 very similar polymerase classes, A and B, with similar methods of chain elongation. A number of DNA polymerases have been grouped under the designation of DNA polymerase family B. Six regions of similarity (numbered I to VI) are found in all or a subset of the B family polymerases. The most conserved region (I) includes a conserved tetrapeptide with two aspartate residues. Its function is not yet known; however, it has been suggested that it may be involved in binding a magnesium ion. All sequences in the B family contain a characteristic DTDS motif and possess many functional domains, including a 5'-3' elongation domain, a 3'-5' exonuclease domain, a DNA-binding domain, and binding domains for both dNTPs and pyrophosphate.
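The direction of synthesis can be illustrated with a toy model in which a polymerase extends an annealed primer one nucleotide at a time at its 3'-end, copying a template strand. Both sequences are written 5' to 3'; the function names and the requirement that the primer pair with the template's 3' terminus are illustrative assumptions, and proofreading, processivity and mismatches are ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complementary strand of seq, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template: str, primer: str) -> str:
    """Extend primer (5'->3') along template (5'->3') to full length.

    Hypothetical helper: the primer must anneal to the template's 3' end.
    """
    full_copy = reverse_complement(template)
    if not full_copy.startswith(primer):
        raise ValueError("primer does not anneal to the template's 3' end")
    product = primer
    # Each step appends one nucleotide to the 3'-end of the growing strand.
    while len(product) < len(full_copy):
        product += full_copy[len(product)]
    return product


print(extend_primer("ATGCGTAC", "GTA"))  # GTACGCAT
```

Because new nucleotides are only ever appended at the 3'-end, the product grows 5' to 3' while the template is read 3' to 5'.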

Proteins where this domain is known:
PY00203    PY05353    PY06115   


PF00137 - ATP-synt_C (Pfam link)

Interpro entry IPR002379 : ATPase, F0/V0 complex, subunit C (Interpro link)

Interpro description:

ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

The F-ATPases (or F1F0-ATPases) and V-ATPases (or V1V0-ATPases) are each composed of two linked complexes: the F1 or V1 complex contains the catalytic core that synthesizes/hydrolyses ATP, and the F0 or V0 complex forms the membrane-spanning pore. The F- and V-ATPases all contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis.

This entry represents subunit C (also called subunit 9, or proteolipid in F-ATPases, or the 16 kDa proteolipid in V-ATPases) found in the F0 or V0 complex of F- and V-ATPases, respectively. In F-ATPases, ten C subunits form an oligomeric ring that makes up the F0 rotor. The flux of protons through the ATPase channel drives the rotation of the C subunit ring, which in turn is coupled to the rotation of the F1 complex gamma subunit rotor due to the permanent binding between the gamma and epsilon subunits of F1 and the C subunit ring of F0. The sequential protonation and deprotonation of Asp61 of subunit C is coupled to the stepwise movement of the rotor.
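The number of C subunits in the ring sets how many protons must cross the membrane per ATP made. A minimal sketch, assuming one proton per C subunit per full rotation (one Asp61 per subunit) and three ATP per rotation (one at each of the three catalytic sites of F1, which the text above does not state); the function name is hypothetical.

```python
from fractions import Fraction


def protons_per_atp(c_subunits: int, catalytic_sites: int = 3) -> Fraction:
    """Protons translocated per ATP synthesised, as an exact ratio.

    Assumes each C subunit carries one proton per full rotor turn and
    each catalytic site makes one ATP per turn.
    """
    return Fraction(c_subunits, catalytic_sites)


ratio = protons_per_atp(10)  # ten C subunits, as described above
print(ratio)  # 10/3
```

Using `Fraction` keeps the non-integer ratio exact, which makes clear that a ten-subunit ring does not give a whole number of protons per ATP.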

In V-ATPases, there are three proteolipid subunits (c, c' and c'') that form part of the proton-conducting pore, each containing a buried glutamic acid residue that is essential for proton transport, and together they form a hexameric ring spanning the membrane.

More information about this protein can be found at Protein of the Month: ATP Synthases.

Proteins where this domain is known:
PY03813    PY05899    PY06066   


PF00145 - DNA_methylase (Pfam link)

Interpro entry IPR001525 : C-5 cytosine-specific DNA methylase (Interpro link)

Interpro description:
C-5 cytosine-specific DNA methylases (C5 Mtase) are enzymes that specifically methylate the C-5 carbon of cytosines in DNA to produce C5-methylcytosine. In mammalian cells, cytosine-specific methyltransferases methylate certain CpG sequences, which are believed to modulate gene expression and cell differentiation. In bacteria, these enzymes are a component of restriction-modification systems and serve as valuable tools for the manipulation of DNA. The structure of HhaI methyltransferase (M.HhaI) has been resolved to 2.5 Å: the molecule folds into 2 domains, a larger catalytic domain containing catalytic and cofactor binding sites, and a smaller DNA recognition domain.
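Candidate CpG sites, the dinucleotides whose cytosine a C5 methylase may target in mammalian cells, can be listed with a simple scan. This sketch only enumerates CpG positions on one strand; which of them a given enzyme actually methylates is not modelled, and the function name is hypothetical. Because CpG is self-complementary, the opposite strand carries a CpG at the same site.

```python
def cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


print(cpg_sites("ACGTTCGACG"))  # [1, 5, 8]
```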

Proteins where this domain is known:
PY02095   


PF00149 - Metallophos (Pfam link)

Interpro entry IPR004843 : Metallophosphoesterase (Interpro link)

Pfam description:
This family includes a diverse range of phosphoesterases, including protein phosphoserine phosphatases, nucleotidases, sphingomyelin phosphodiesterases and 2'-3' cAMP phosphodiesterases as well as nucleases such as bacterial SbcD Swiss:P13457 or yeast MRE11 Swiss:P32829. The most conserved regions in this superfamily centre around the metal chelating residues.

Interpro description:

Protein phosphorylation plays a central role in the regulation of cell functions, causing the activation or inhibition of many enzymes involved in various biochemical pathways. Kinases and phosphatases are the enzymes responsible for this, and may themselves be subject to control through the action of hormones and growth factors. Serine/threonine (S/T) phosphatases catalyse the dephosphorylation of phosphoserine and phosphothreonine residues. In mammalian tissues, four different types of protein phosphatase (PP) have been identified, known as PP1, PP2A, PP2B and PP2C. Except for PP2C, these enzymes are evolutionarily related. The catalytic regions of the proteins are well conserved and have a slow mutation rate, suggesting that major changes in these regions are highly detrimental.

The metallo-phosphoesterase motif is found in a large number of proteins involved in phosphorylation. These include serine/threonine phosphatases, DNA polymerase, exonucleases, and other phosphatases.

Proteins where this domain is known:
PY00016    PY00448    PY00915    PY00926    PY02284    PY02315    PY02573    PY03645    PY04404    PY04559    PY04697    PY06177    PY06431    PY06605    PY07156   


PF00152 - tRNA-synt_2 (Pfam link)

Interpro entry IPR004364 : Aminoacyl-tRNA synthetase, class II (D, K and N) (Interpro link)

Interpro description:

The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II synthetases.

This entry includes the asparagine, aspartic acid and lysine tRNA synthetases.
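The class I/class II split listed above amounts to a lookup table, which can be sketched as follows. The set contents are copied from the description. The function names are hypothetical. The 2'/3'-hydroxyl mapping reflects only the general preference stated in the description, and known exceptions are not modelled.

```python
# Class assignments copied from the InterPro description (canonical cases only).
CLASS_I = frozenset({
    "arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
    "leucine", "methionine", "tyrosine", "tryptophan", "valine",
})
CLASS_II = frozenset({
    "alanine", "asparagine", "aspartic acid", "glycine", "histidine",
    "lysine", "phenylalanine", "proline", "serine", "threonine",
})


def synthetase_class(amino_acid: str) -> str:
    """Return 'I' or 'II' for the synthetase that charges the given amino acid."""
    name = amino_acid.strip().lower()
    if name in CLASS_I:
        return "I"
    if name in CLASS_II:
        return "II"
    raise KeyError(f"no class assignment listed for {amino_acid!r}")


def preferred_attachment(amino_acid: str) -> str:
    """Ribose hydroxyl on the tRNA's 3'-terminal adenosine preferred by that class (general rule only)."""
    return "2'-OH" if synthetase_class(amino_acid) == "I" else "3'-OH"


if __name__ == "__main__":
    # PF00152 covers the aspartate, lysine and asparagine synthetases.
    for aa in ("aspartic acid", "lysine", "asparagine"):
        print(aa, synthetase_class(aa), preferred_attachment(aa))
```

All three synthetases covered by this entry (D, K and N) map to class II.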

Proteins where this domain is known:
PY00115    PY01511    PY01996    PY02504    PY03253    PY05639    PY05658   


PF00153 - Mito_carr (Pfam link)

Interpro entry IPR018108 : (Interpro link)

Interpro description:

A variety of substrate carrier proteins that are involved in energy transfer are found in the inner mitochondrial membrane or integral to the membrane of other eukaryotic organelles such as the peroxisome. Such proteins include: ADP/ATP carrier protein (ADP/ATP translocase); 2-oxoglutarate/malate carrier protein; phosphate carrier protein; tricarboxylate transport protein (or citrate transport protein); Graves disease carrier protein; yeast mitochondrial proteins MRS3 and MRS4; yeast mitochondrial FAD carrier protein; and many others. Structurally, these proteins can consist of up to three tandem repeats of a domain of approximately 100 residues, each domain containing two transmembrane regions.

Proteins where this domain is known:
PY00463    PY01172    PY01494    PY02084    PY03882    PY04289    PY04545    PY04714    PY06734    PY07502    PY07641   


PF00155 - Aminotran_1_2 (Pfam link)

Interpro entry IPR004839 : Aminotransferase, class I and II (Interpro link)

Interpro description:
Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped into class I and class II. This entry includes proteins from both subfamilies.

Proteins where this domain is known:
PY00385    PY00897    PY05459   


PF00156 - Pribosyltran (Pfam link)

Interpro entry IPR000836 : Phosphoribosyltransferase (Interpro link)

Pfam description:
This family includes a range of diverse phosphoribosyl transferase enzymes. This family includes: Adenine phosphoribosyltransferase EC:2.4.2.7, Swiss:P07672. Hypoxanthine-guanine-xanthine phosphoribosyltransferase Swiss:P51900. Hypoxanthine phosphoribosyltransferase EC:2.4.2.8 Swiss:P36766. Ribose-phosphate pyrophosphokinase i EC:2.7.6.1 Swiss:P09329. Amidophosphoribosyltransferase EC:2.4.2.14 Swiss:P00496. Orotate phosphoribosyltransferase EC:2.4.2.10 Swiss:P11172. Uracil phosphoribosyltransferase EC:2.4.2.9 Swiss:P25532. Xanthine-guanine phosphoribosyltransferase EC:2.4.2.22 Swiss:P00501.

Interpro description:

The name PRT comes from phosphoribosyltransferase (PRTase) enzymes, which carry out phosphoribosyl transfer reactions using 5-phosphoribosyl-alpha-1-pyrophosphate (PRPP), an activated form of ribose-5-phosphate. Members of the phosphoribosyltransferase (PRT) family are catalytic and regulatory proteins involved in nucleotide synthesis and salvage. They include a range of diverse phosphoribosyltransferase enzymes: adenine phosphoribosyltransferase; hypoxanthine-guanine-xanthine phosphoribosyltransferase; hypoxanthine phosphoribosyltransferase; ribose-phosphate pyrophosphokinase; amidophosphoribosyltransferase; orotate phosphoribosyltransferase; uracil phosphoribosyltransferase; and xanthine-guanine phosphoribosyltransferase.
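For the nucleobase PRTases named above, the shared reaction transfers the 5-phosphoribosyl group from PRPP to a base and releases pyrophosphate. The scheme below gives the general form plus three examples. It does not describe every family member: ribose-phosphate pyrophosphokinase and amidophosphoribosyltransferase, for instance, catalyse different overall reactions.

```latex
\begin{align*}
\text{base} + \text{PRPP} &\rightleftharpoons \text{nucleotide (base-ribose 5'-phosphate)} + \mathrm{PP_i} \\
\text{adenine} + \text{PRPP} &\rightleftharpoons \text{AMP} + \mathrm{PP_i} \\
\text{hypoxanthine} + \text{PRPP} &\rightleftharpoons \text{IMP} + \mathrm{PP_i} \\
\text{orotate} + \text{PRPP} &\rightleftharpoons \text{OMP} + \mathrm{PP_i}
\end{align*}
```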

Not all PRT proteins are enzymes. For example, in some bacteria PRT proteins regulate the expression of purine and pyrimidine synthetic genes.

Members of PRT are defined by the protein fold and by a short 13-residue sequence motif. The motif consists of four hydrophobic amino acids, two acidic amino acids and seven amino acids of variable character, usually including glycine and threonine. The motif was predicted to be a PRPP-binding site before structural information was available. Apart from this motif, different PRT proteins have a low level of sequence identity, less than 15%. The PRT sequence motif is only found in PRTases from the nucleotide synthesis and salvage pathways. Other PRTases, from the tryptophan, histidine and nicotinamide synthetic and salvage pathways, lack the PRT sequence motif and appear to be unrelated to each other and to the PRT family.

Proteins where this domain is known:
PY00920    PY01467    PY03478   


PF00158 - Sigma54_activat (Pfam link)

Interpro entry IPR002078 : RNA polymerase sigma factor 54, interaction (Interpro link)

Interpro description:
Some bacterial regulatory proteins activate the expression of genes from promoters recognised by core RNA polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved in the ATP-dependent interaction with sigma-54. About half of the proteins in which this domain is found (algB, dctD, flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems and possess, in their N-terminal section, a domain that can be phosphorylated by a sensor-kinase protein. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity. This may be required to promote a conformational change necessary for the interaction. The domain contains an atypical ATP-binding motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the domain.
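The "motif A (P-loop)" mentioned above is usually written as a Walker A consensus. As a rough illustration, the sketch below scans a protein sequence with the PROSITE-style pattern [AG]-x(4)-G-K-[ST]. The sigma-54 interaction domain carries an atypical motif A, so a canonical pattern like this one may well miss it. Treat matches and non-matches alike as hints only. The function name is hypothetical and the sequence in the usage example is made up.

```python
import re

# Canonical P-loop / Walker A consensus, PROSITE-style: [AG]-x(4)-G-K-[ST].
# The zero-width lookahead lets overlapping matches be reported.
WALKER_A = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each canonical Walker A match."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_walker_a("MAGPPGSGKTTL"))  # [(2, 'GPPGSGKT')]
```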

Proteins where this domain has been detected by our approach:
PY05628   


PF00160 - Pro_isomerase (Pfam link)

Interpro entry IPR002130 : Peptidyl-prolyl cis-trans isomerase, cyclophilin-type (Interpro link)

Pfam description:
The peptidyl-prolyl cis-trans isomerases, also known as cyclophilins, share this domain of about 109 amino acids. Cyclophilins have been found in all organisms studied so far and catalyse peptidyl-prolyl isomerisation during which the peptide bond preceding proline (the peptidyl-prolyl bond) is stabilised in the cis conformation. Mammalian cyclophilin A (CypA) is a major cellular target for the immunosuppressive drug cyclosporin A (CsA). Other roles for cyclophilins may include chaperone and cell signalling function.

Interpro description:

Cyclophilin is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A (CSA), but is also found in other organisms. It exhibits a peptidyl-prolyl cis-trans isomerase activity (PPIase or rotamase). PPIase is an enzyme that accelerates protein folding by catalysing the cis-trans isomerisation of proline imidic peptide bonds in oligopeptides. It is probable that CSA mediates some of its effects by forming a tight complex with cyclophilin that inhibits the phosphatase activity of calcineurin. Cyclophilin A is a cytosolic and highly abundant protein. The protein belongs to a family of isozymes, including cyclophilins B and C, and natural killer cell cyclophilin-related protein. Major isoforms have been found throughout the cell, including the ER, and some are even secreted. The sequences of the different forms of cyclophilin-type PPIases are well conserved.

Note: FKBPs, a family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all related to that of cyclophilin.

Proteins where this domain is known:
PY00017    PY00382    PY00693    PY00781    PY01524    PY01525    PY02154    PY02749    PY03668    PY03899    PY05631   


    PF00162 - PGK (Pfam link)

    Interpro entry IPR001576 : Phosphoglycerate kinase (Interpro link)

    Interpro description:

    Phosphoglycerate kinase (PGK) is an enzyme that catalyses the reversible transfer of a phosphoryl group from 1,3-diphosphoglycerate to ADP, forming 3-phosphoglycerate and ATP. In the second step of the second (payoff) phase of glycolysis, 1,3-diphosphoglycerate is converted to 3-phosphoglycerate, forming one molecule of ATP; in the reverse direction, ATP is consumed and one molecule of ADP is formed. This reaction is essential in most cells for the generation of ATP in aerobes, for fermentation in anaerobes and for carbon fixation in plants.
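Written as a reaction, the glycolytic direction described above is shown below. 1,3-diphosphoglycerate is also called 1,3-bisphosphoglycerate, and the nucleotides bind as their Mg2+ complexes. Read from right to left, the same equation is the ATP-consuming direction used, for example, in carbon fixation.

```latex
\text{1,3-diphosphoglycerate} + \mathrm{MgADP} \rightleftharpoons \text{3-phosphoglycerate} + \mathrm{MgATP}
```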

    PGK is found in all living organisms and its sequence has been highly conserved throughout evolution. The enzyme exists as a monomer containing two nearly equal-sized domains that correspond to the N- and C-termini of the protein (the last 15 C-terminal residues loop back into the N-terminal domain). 3-phosphoglycerate (3-PG) binds to the N-terminal domain, while the nucleotide substrates, MgATP or MgADP, bind to the C-terminal domain of the enzyme. This extended two-domain structure is associated with large-scale 'hinge-bending' conformational changes, similar to those found in hexokinase. At the core of each domain is a six-stranded parallel beta-sheet surrounded by alpha helices; the strand order is 342156 in domain 1 and 321456 in domain 2. Analysis of the reversible unfolding of yeast phosphoglycerate kinase leads to the conclusion that the two lobes are capable of folding independently, consistent with the presence of intermediates on the folding pathway with a single domain folded.

    Phosphoglycerate kinase (PGK) deficiency is associated with haemolytic anaemia and mental disorders in man.

    This entry represents the full PGK enzyme.

    Proteins where this domain is known:
    PY04547   


    PF00163 - Ribosomal_S4 (Pfam link)

    Interpro entry IPR001912 : Ribosomal protein S4 (Interpro link)

    Pfam description:
    This family includes small ribosomal subunit S4 from prokaryotes and S9 from eukaryotes. This domain is predicted to bind to ribosomal RNA. This domain is composed of four helices in the known structure. However, the domain is discontinuous in sequence and the alignment for this family contains only the first three helices.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
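The elongation cycle described above can be modelled, very loosely, as a small state machine over the A and P sites. This is a toy sketch only. The class and method names are hypothetical. Decoding, GTP hydrolysis and the details of exit-site release are not modelled, and "Met" simply labels the tRNA already sitting in the P site at the start.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TRNA:
    amino_acid: str   # label of the amino acid this tRNA was charged with
    chain: List[str]  # residues currently attached; empty once deacylated


@dataclass
class Ribosome:
    p_site: TRNA
    a_site: Optional[TRNA] = None
    released: List[TRNA] = field(default_factory=list)  # deacylated tRNAs that left

    def deliver(self, amino_acid: str) -> None:
        # Aminoacyl-tRNA (brought in with EF-Tu and GTP) binds the empty A site.
        if self.a_site is not None:
            raise RuntimeError("A site is occupied")
        self.a_site = TRNA(amino_acid, [amino_acid])

    def transfer_peptide(self) -> None:
        # The chain on the P-site tRNA is moved onto the A-site aminoacyl-tRNA,
        # which is thereby extended by one residue.
        if self.a_site is None:
            raise RuntimeError("A site is empty")
        self.a_site.chain = self.p_site.chain + self.a_site.chain
        self.p_site.chain = []  # the P-site tRNA is now deacylated

    def translocate(self) -> None:
        # With EF-G and GTP, the new peptidyl-tRNA moves to the P site and the
        # deacylated tRNA is released from the ribosome.
        if self.a_site is None:
            raise RuntimeError("nothing to translocate")
        self.released.append(self.p_site)
        self.p_site, self.a_site = self.a_site, None

    def elongate(self, amino_acid: str) -> None:
        self.deliver(amino_acid)
        self.transfer_peptide()
        self.translocate()


if __name__ == "__main__":
    ribosome = Ribosome(p_site=TRNA("Met", ["Met"]))
    for aa in ["Ala", "Gly", "Ser"]:
        ribosome.elongate(aa)
    print(ribosome.p_site.chain)   # ['Met', 'Ala', 'Gly', 'Ser']
    print(len(ribosome.released))  # 3
```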

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies. S4 is a protein of 171 to 205 amino-acid residues (except for NAM9, which is much larger). The crystal structure of a bacterial S4 protein revealed a two-domain molecule. The first domain is composed of four helices in the known structure. The second domain is inserted in the middle of the first one and displays some structural homology with the ETS DNA-binding domain. This family includes small ribosomal subunit S4 from prokaryotes and S9 from animals.

    Proteins where this domain is known:
    PY02191    PY04143   


    PF00164 - Ribosomal_S12 (Pfam link)

    Interpro entry IPR006032 : Ribosomal protein S12/S23 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to a family of ribosomal proteins which are grouped on the basis of sequence similarities. This protein is known typically as S12 in bacteria, S23 in eukaryotes and as either S12 or S23 in the Archaea.

    Bacterial S12 molecules contain a conserved aspartic acid residue which undergoes a novel post-translational modification, beta-methylthiolation, to form the corresponding 3-methylthioaspartic acid.

    Proteins where this domain is known:
    PY05324    PY05754   


    PF00166 - Cpn10 (Pfam link)

    Interpro entry IPR001476 : Chaperonin Cpn10 (Interpro link)

    Pfam description:
    This family contains GroES and Gp31-like chaperonins. Gp31 is a functional co-chaperonin that is required for the folding and assembly of Gp23, a major capsid protein, during phage morphogenesis.

    Interpro description:

    The chaperonins are 'helper' molecules required for correct folding and subsequent assembly of some proteins. These are required for normal cell growth, and are stress-induced, acting to stabilise or protect disassembled polypeptides under heat-shock conditions. Type I chaperonins present in eubacteria, mitochondria and chloroplasts require the concerted action of two proteins, chaperonin 60 (cpn60) and chaperonin 10 (cpn10).

    The 10 kDa chaperonin (cpn10, or groES in bacteria) exists as a ring-shaped oligomer of six to eight identical subunits, while the 60 kDa chaperonin (cpn60, or groEL in bacteria) forms a structure comprising two stacked rings, each ring containing seven identical subunits. These ring structures assemble by self-stimulation in the presence of Mg2+-ATP. The central cavity of the cylindrical cpn60 tetradecamer provides an isolated environment for protein folding, whilst cpn10 binds to cpn60 and synchronises the release of the folded protein in an Mg2+-ATP-dependent manner. The binding of cpn10 to cpn60 inhibits the weak ATPase activity of cpn60.

    Escherichia coli GroES has also been shown to bind ATP cooperatively, and with an affinity comparable to that of GroEL. Each GroEL subunit contains three structurally distinct domains: an apical, an intermediate and an equatorial domain. The apical domain contains the binding sites for both GroES and the unfolded protein substrate. The equatorial domain contains the ATP-binding site and most of the oligomeric contacts. The intermediate domain links the apical and equatorial domains and transfers allosteric information between them. The GroEL oligomer is a tetradecamer, cylindrically shaped, that is organised in two heptameric rings stacked back to back. Each GroEL ring contains a central cavity, known as the 'Anfinsen cage', that provides an isolated environment for protein folding. The identical 10 kDa subunits of GroES form a dome-like heptameric oligomer in solution. ATP binding to GroES may be important in charging the seven subunits of the interacting GroEL ring with ATP, to facilitate cooperative ATP binding and hydrolysis for substrate protein release.

    Proteins where this domain is known:
    PY01850    PY02750   


    PF00168 - C2 (Pfam link)

    Interpro entry IPR000008 : (Interpro link)

    Interpro description:
    The C2 domain is a Ca2+-dependent membrane-targeting module found in many cellular proteins involved in signal transduction or membrane trafficking. C2 domains are unique among membrane-targeting domains in that they show a wide range of lipid selectivity for the major components of cell membranes, including phosphatidylserine and phosphatidylcholine. This C2 domain is about 116 amino-acid residues long and, in protein kinase C, is located between the two copies of the C1 domain (which bind phorbol esters and diacylglycerol) and the protein kinase catalytic domain. Regions with significant homology to the C2 domain have been found in many proteins. The C2 domain is thought to be involved in calcium-dependent phospholipid binding and in membrane-targeting processes such as subcellular localisation.

    The 3D structure of the C2 domain of synaptotagmin has been reported: the domain forms an eight-stranded beta sandwich constructed around a conserved four-stranded motif, designated a C2 key. Calcium binds in a cup-shaped depression formed by the N- and C-terminal loops of the C2-key motif. Structural analyses of several C2 domains have shown them to consist of similar tertiary structures in which three Ca2+-binding loops are located at the end of an eight-stranded antiparallel beta sandwich.

    Proteins where this domain is known:
    PY01213    PY03705    PY04695    PY05745   


    PF00169 - PH (Pfam link)

    Interpro entry IPR001849 : (Interpro link)

    Pfam description:
    PH stands for pleckstrin homology.

    Interpro description:

    The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins involved in intracellular signalling or as constituents of the cytoskeleton.

    The function of this domain is not clear; several putative functions have been suggested:

  • binding to the beta/gamma subunit of heterotrimeric G proteins,
  • binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
  • binding to phosphorylated Ser/Thr residues,
  • attachment to membranes by an unknown mechanism.

    It is possible that different PH domains have totally different ligand requirements.

    The 3D structure of several PH domains has been determined. All known cases have a common structure consisting of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant residues within the PH domain.

    Proteins reported to contain one or more PH domains belong to the following families:

    Proteins where this domain is known:
    PY00029    PY00102    PY00609    PY01124    PY02203    PY04188   

    Proteins where this domain has been detected by our approach:
    PY00790   


    PF00173 - Cyt-b5 (Pfam link)

    Interpro entry IPR001199 : Cytochrome b5 (Interpro link)

    Pfam description:
    This family includes heme binding domains from a diverse range of proteins. This family also includes proteins that bind to steroids. The family includes progesterone receptors such as Swiss:O00264. Many members of this subfamily are membrane anchored by an N-terminal transmembrane alpha helix. This family also includes a domain in some chitin synthases. There is no known ligand for this domain in the chitin synthases.

    Interpro description:
    Cytochromes b5 are ubiquitous electron transport proteins found in animals, plants and yeasts. The microsomal and mitochondrial variants are membrane-bound, while those from erythrocytes and other animal tissues are water-soluble.

    The 3D structure of bovine cyt b5 is known, the fold belonging to the alpha+beta class, with 5 strands and 5 short helices forming a framework for supporting a central haem group. The cytochrome b5 domain is similar to that of a number of oxidoreductases, such as plant and fungal nitrate reductases, sulphite oxidase, yeast flavocytochrome b2 (L-lactate dehydrogenase) and plant cyt b5/acyl lipid desaturase fusion protein.

    Proteins where this domain is known:
    PY04472    PY04794    PY07000   


    PF00175 - NAD_binding_1 (Pfam link)

    Interpro entry IPR001433 : Oxidoreductase FAD/NAD(P)-binding (Interpro link)

    Pfam description:
    Xanthine dehydrogenases, which also bind FAD/NAD, have essentially no similarity to this domain.

    Interpro description:

    Bacterial ferredoxin-NADP+ reductase may be bound to the thylakoid membrane or anchored to the thylakoid-bound phycobilisomes. Chloroplast ferredoxin-NADP+ reductase may play a key role in regulating the relative amounts of cyclic and non-cyclic electron flow to meet the demands of the plant for ATP and reducing power. It is involved in the final step in the linear photosynthetic electron transport chain and has also been implicated in cyclic electron flow around photosystem I where its role would be to return electrons from ferredoxin to the cytochrome B-F complex.

    This domain is present in a variety of proteins that include bacterial flavohemoprotein, mammalian NADH-cytochrome b5 reductase, eukaryotic NADPH-cytochrome P450 reductase, nitrate reductase from plants, nitric-oxide synthase, bacterial vanillate demethylase and others.

    Proteins where this domain is known:
    PY03083    PY05179    PY06273   


    PF00176 - SNF2_N (Pfam link)

    Interpro entry IPR000330 : SNF2-related (Interpro link)

    Pfam description:
    This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).

    Interpro description:

    This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1). SNF2 functions as the ATPase component of the SNF2/SWI multisubunit complex, which utilises energy derived from ATP hydrolysis to disrupt histone-DNA interactions, resulting in the increased accessibility of DNA to transcription factors.

    Proteins that contain this domain appear to be distantly related to the DEAX-box helicases; however, no helicase activity has ever been demonstrated for these proteins.

    Proteins where this domain is known:
    PY00648    PY00810    PY01120    PY01180    PY01231    PY02297    PY02376    PY02949    PY03840    PY05642    PY05882   


    PF00177 - Ribosomal_S7 (Pfam link)

    Interpro entry IPR000235 : Ribosomal protein S7 (Interpro link)

    Pfam description:
    This family contains ribosomal protein S7 from prokaryotes and S5 from eukaryotes.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind directly to part of the 3' end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which have been grouped on the basis of sequence similarities. The structure of S7 is known.

    Proteins where this domain is known:
    PY06068   


    PF00179 - UQ_con (Pfam link)

    Interpro entry IPR000608 : Ubiquitin-conjugating enzyme, E2 (Interpro link)

    Pfam description:
    Proteins destined for proteasome-mediated degradation may be ubiquitinated. Ubiquitination follows conjugation of ubiquitin to a conserved cysteine residue of UBC homologues. TSG101 is one of several UBC homologues that lacks this active site cysteine.

    Interpro description:

    The post-translational attachment of ubiquitin to proteins (ubiquitinylation) alters the function, location or trafficking of a protein, or targets it to the 26S proteasome for degradation. Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3), which work sequentially in a cascade. The E1 enzyme mediates an ATP-dependent transfer of a thioester-linked ubiquitin molecule to a cysteine residue on the E2 enzyme. The E2 enzyme then transfers the ubiquitin moiety either directly to a substrate or to an E3 ligase, which can also ubiquitinylate a substrate.
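The three-enzyme cascade described above can be summarised as a series of thioester transfers. The scheme below is simplified. Here "~" marks a thioester bond to an active-site cysteine, Ub is ubiquitin, and the acceptor on the substrate is typically a lysine side-chain amine. Only the E3-assisted route in which the E2 transfers ubiquitin directly to the substrate is shown; the route through an E3 thioester intermediate is omitted.

```latex
\begin{align*}
\mathrm{E1{-}SH} + \mathrm{ATP} + \mathrm{Ub} &\longrightarrow \mathrm{E1{-}S{\sim}Ub} + \mathrm{AMP} + \mathrm{PP_i} \\
\mathrm{E1{-}S{\sim}Ub} + \mathrm{E2{-}SH} &\longrightarrow \mathrm{E1{-}SH} + \mathrm{E2{-}S{\sim}Ub} \\
\mathrm{E2{-}S{\sim}Ub} + \text{substrate-NH}_2 &\xrightarrow{\ \mathrm{E3}\ } \mathrm{E2{-}SH} + \text{substrate-NH-Ub}
\end{align*}
```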

    There are several different E2 enzymes (over 30 in humans), which are broadly grouped into four classes, all of which have a core catalytic domain (containing the active site cysteine), and some of which have short N- and C-terminal amino acid extensions: class I enzymes consist of just the catalytic core domain (UBC), class II possess a UBC and a C-terminal extension, class III possess a UBC and an N-terminal extension, and class IV possess a UBC and both N- and C-terminal extensions. These extensions appear to be important for some subfamily function, including E2 localisation and protein-protein interactions. In addition, there are proteins with an E2-like fold that are devoid of catalytic activity, but which appear to assist in poly-ubiquitin chain formation.
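The four-class scheme above depends only on whether the UBC core is flanked by N- and/or C-terminal extensions, so it can be expressed as a small decision function. In this sketch the function names are hypothetical. The 20-residue minimum used to call a flanking segment an "extension" is an arbitrary illustrative choice and not part of the classification. Domain coordinates are assumed to be 1-based and inclusive, as many domain-annotation tools report them.

```python
def e2_class(has_n_extension: bool, has_c_extension: bool) -> str:
    """Class I-IV from the presence of extensions flanking the UBC core."""
    classes = {
        (False, False): "I",   # catalytic core only
        (False, True): "II",   # core + C-terminal extension
        (True, False): "III",  # core + N-terminal extension
        (True, True): "IV",    # core + both extensions
    }
    return classes[(has_n_extension, has_c_extension)]


def e2_class_from_hit(protein_length: int, ubc_start: int, ubc_end: int,
                      min_extension: int = 20) -> str:
    """Classify from UBC domain boundaries (1-based, inclusive).

    min_extension is a hypothetical cut-off: flanking segments shorter than
    this are treated as "no extension".
    """
    if not 1 <= ubc_start <= ubc_end <= protein_length:
        raise ValueError("domain boundaries must lie within the protein")
    n_len = ubc_start - 1              # residues before the UBC core
    c_len = protein_length - ubc_end   # residues after the UBC core
    return e2_class(n_len >= min_extension, c_len >= min_extension)


if __name__ == "__main__":
    print(e2_class_from_hit(150, 1, 150))   # I
    print(e2_class_from_hit(250, 51, 200))  # IV
```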

    Proteins where this domain is known:
    PY00468    PY00590    PY00971    PY01211    PY01609    PY01792    PY03025    PY03157    PY03603    PY03604    PY03737    PY03738    PY05644    PY05886   


    PF00180 - Iso_dh (Pfam link)

    Interpro entry IPR001804 : Isocitrate/isopropylmalate dehydrogenase (Interpro link)

    Interpro description:

    Isocitrate dehydrogenase (IDH) is an important enzyme of carbohydrate metabolism which catalyses the oxidative decarboxylation of isocitrate into alpha-ketoglutarate. IDH is dependent on either NAD+ or NADP+. In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In Escherichia coli the activity of an NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine residue; the phosphorylated form of IDH is completely inactivated.

    3-isopropylmalate dehydrogenase (IMDH) catalyses the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase catalyses the oxidation of tartrate to oxaloglycolate.

    These enzymes are evolutionarily related. The best conserved region of these enzymes is a glycine-rich stretch of residues located in the C-terminal section.

    Proteins where this domain is known:
    PY00592    PY02505   


    PF00181 - Ribosomal_L2 (Pfam link)

    Interpro entry IPR002171 : Ribosomal protein L2 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins grouped on the basis of sequence similarity.

    Proteins where this domain is known:
    PY04343    PY04762    PY05952   


    PF00183 - HSP90 (Pfam link)

    Interpro entry IPR001404 : Heat shock protein Hsp90 (Interpro link)

    Interpro description:

    Prokaryotes and eukaryotes respond to heat shock and other forms of environmental stress by inducing synthesis of heat-shock proteins (hsp). The 90 kDa heat shock protein, Hsp90, is one of the most abundant proteins in eukaryotic cells, comprising 1-2% of cellular proteins under non-stress conditions. Its contribution to various cellular processes including signal transduction, protein folding, protein degradation and morphological evolution has been extensively studied. Hsp90 gains its full functional activity in concert with co-chaperones; together they play an important role in the folding of newly synthesised proteins and in the stabilisation and refolding of denatured proteins after stress. Apart from its co-chaperones, Hsp90 binds to an array of client proteins, where the co-chaperone requirement varies and depends on the actual client.

    The sequences of Hsp90s show a distinctive domain structure, with a highly conserved N-terminal domain separated from a conserved, acidic C-terminal domain by a highly acidic, flexible linker region.

    Proteins where this domain is known:
    PY00130    PY00131    PY00582    PY00974    PY01906    PY05217   


    PF00185 - OTCace (Pfam link)

    Interpro entry IPR006131 : Aspartate/ornithine carbamoyltransferase, Asp/Orn-binding region (Interpro link)

    Interpro description:

    This family contains two related enzymes:

    1. Aspartate carbamoyltransferase (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides. In prokaryotes ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes it is a domain in a multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals) that also catalyzes other steps of pyrimidine biosynthesis.
    2. Ornithine carbamoyltransferase (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate to citrulline. In mammals this enzyme participates in the urea cycle and is located in the mitochondrial matrix. In prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it is also involved in the degradation of arginine (the arginine deaminase pathway).

    It has been shown that these two enzymes are evolutionarily related. The predicted secondary structures of both enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues which have been shown, by crystallographic studies, to be implicated in binding the phosphoryl group of carbamoyl phosphate. The carboxyl-terminal, aspartate/ornithine-binding domain is connected to the amino-terminal domain by two alpha-helices, which form a hinge between the domains.

    Proteins where this domain is known:
    PY06210   


    PF00186 - DHFR_1 (Pfam link)

    Interpro entry IPR001796 : Dihydrofolate reductase region (Interpro link)

    Interpro description:

    Dihydrofolate reductase (DHFR) catalyses the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate, an essential step in the de novo synthesis of glycine and of purines and deoxythymidine phosphate (the precursors of DNA synthesis), and important also in the conversion of deoxyuridine monophosphate to deoxythymidine monophosphate. DHFR is found ubiquitously in prokaryotes and eukaryotes and in all dividing cells, where it maintains levels of fully reduced folate coenzymes; however, the catabolic steps are still not well understood.

    Bacterial species possess distinct DHFR enzymes (based on their pattern of binding diaminoheterocyclic molecules), but mammalian DHFRs are highly similar. The active site is situated in the N-terminal half of the sequence, which includes a conserved Pro-Trp dipeptide; the tryptophan has been shown to be involved in substrate binding. Its central role in DNA precursor synthesis, coupled with its inhibition by antagonists such as trimethoprim and methotrexate, which are used as antibacterial or anticancer agents, has made DHFR a chemotherapeutic target. However, resistance has developed against some drugs as a result of changes in DHFR itself.

    Proteins where this domain is known:
    PY04370   


    PF00198 - 2-oxoacid_dh (Pfam link)

    Interpro entry IPR001078 : 2-oxoacid dehydrogenase acyltransferase, catalytic domain (Interpro link)

    Pfam description:
    These proteins contain one to three copies of a lipoyl binding domain followed by the catalytic domain.

    Interpro description:
    This domain is found in the lipoamide acyltransferase component of the branched-chain alpha-keto acid dehydrogenase complex, which catalyses the overall conversion of alpha-keto acids to acyl-CoA and carbon dioxide. The complex contains multiple copies of three enzymatic components: branched-chain alpha-keto acid decarboxylase (E1), lipoamide acyltransferase (E2) and lipoamide dehydrogenase (E3). The domain is also found in the dihydrolipoamide succinyltransferase component of the 2-oxoglutarate dehydrogenase complex. These proteins contain one to three copies of a lipoyl binding domain followed by the catalytic domain.

    Proteins where this domain is known:
    PY00503    PY03521    PY04573   


    PF00202 - Aminotran_3 (Pfam link)

    Interpro entry IPR005814 : Aminotransferase class-III (Interpro link)

    Interpro description:

    Aminotransferases share certain mechanistic features with other pyridoxal phosphate-dependent enzymes, such as the covalent binding of the pyridoxal phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped into subfamilies. One of these, called class-III, includes:

  • Acetylornithine aminotransferase, which catalyzes the transfer of an amino group from acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid.
  • Ornithine aminotransferase, which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate, yielding glutamic-5-semi-aldehyde and glutamic acid.
  • Omega-amino acid--pyruvate aminotransferase, which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate.
  • 4-aminobutyrate aminotransferase (GABA transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic acid.
  • DAPA aminotransferase, a bacterial enzyme (bioA), which catalyzes an intermediate step in the biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid to form 7,8-diaminopelargonic acid.
  • 2,2-dialkylglycine decarboxylase, a Burkholderia cepacia (Pseudomonas cepacia) enzyme (dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide.
  • Glutamate-1-semialdehyde aminotransferase (GSA).
  • Bacillus subtilis aminotransferases yhxA and yodT.
  • Haemophilus influenzae aminotransferase HI0949.
  • Caenorhabditis elegans aminotransferase T01B11.2.

    Proteins where this domain is known:
    PY00104   


    PF00203 - Ribosomal_S19 (Pfam link)

    Interpro entry IPR002222 : Ribosomal protein S19/S15 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The small subunit ribosomal proteins can be categorised as: primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins. The small ribosomal subunit protein S19 contains 88-144 amino acid residues. In Escherichia coli, S19 is known to form a complex with S13 that binds strongly to 16S ribosomal RNA. Experimental evidence has revealed that S19 is moderately exposed on the ribosomal surface, and it is designated a secondary rRNA binding protein. S19 belongs to a family of ribosomal proteins that includes: eubacterial S19; algal and plant chloroplast S19; cyanelle S19; archaebacterial S19; plant mitochondrial S19; and eukaryotic S15 ('rig' protein).

    Proteins where this domain is known:
    PY00188   


    PF00204 - DNA_gyraseB (Pfam link)

    Interpro entry IPR013506 : DNA topoisomerase, type IIA, subunit B, region 2 (Interpro link)

    Pfam description:
    This family represents the second domain of DNA gyrase B which has a ribosomal S5 domain 2-like fold. This family is structurally related to PF01119.

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type II topoisomerases are ATP-dependent enzymes, and can be subdivided according to their structure and reaction mechanisms: type IIA (topoisomerase II or gyrase, and topoisomerase IV) and type IIB (topoisomerase VI). These enzymes are responsible for relaxing supercoiled DNA as well as for introducing both negative and positive supercoils.

    Type IIA topoisomerases together manage chromosome integrity and topology in cells. Topoisomerase II (called gyrase in bacteria) primarily introduces negative supercoils into DNA. In bacteria, topoisomerase II consists of two polypeptide subunits, gyrA and gyrB, which form a heterotetramer: (BA)2. In most eukaryotes, topoisomerase II consists of a single polypeptide, where the N- and C-terminal regions correspond to gyrB and gyrA, respectively; this topoisomerase II forms a homodimer that is equivalent to the bacterial heterotetramer. There are four functional domains in topoisomerase II: domain 1 (N-terminal of gyrB) is an ATPase, domain 2 (C-terminal of gyrB) is responsible for subunit interactions, domain 3 (N-terminal of gyrA) is responsible for the breaking-rejoining function through its capacity to form protein-DNA bridges, and domain 4 (C-terminal of gyrA) is able to non-specifically bind DNA.

    Topoisomerase IV primarily decatenates DNA and relaxes positive supercoils, which is important in bacteria, where the circular chromosome becomes catenated, or linked, during replication. Topoisomerase IV consists of two polypeptide subunits, parE and parC, where parC is homologous to gyrA and parE is homologous to gyrB.

    This entry represents the second domain found in subunit B (gyrB and parE) of bacterial gyrase and topoisomerase IV, and the equivalent N-terminal region in eukaryotic topoisomerase II composed of a single polypeptide.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY03394    PY04024   


    PF00205 - TPP_enzyme_M (Pfam link)

    Interpro entry IPR012000 : Thiamine pyrophosphate enzyme, central region (Interpro link)

    Pfam description:
    The central domain of TPP enzymes contains a 2-fold Rossmann fold.

    Interpro description:

    A number of enzymes require thiamine pyrophosphate (TPP), the active form of vitamin B1, as a cofactor. It has been shown that some of these enzymes are structurally related. The central domain of TPP enzymes contains a 2-fold Rossmann fold.

    Proteins where this domain is known:
    PY05696   


    PF00206 - Lyase_1 (Pfam link)

    Interpro entry IPR000362 : Fumarate lyase (Interpro link)

    Interpro description:

    A number of enzymes belonging to the lyase class, for which fumarate is a substrate, have been shown to share a short conserved sequence around a methionine which is probably involved in their catalytic activity. The following are examples of members of this family:

  • P32427 (PCAB_PSEPU): 3-carboxymuconate lactonizing enzyme (3-carboxy-cis,cis-muconate cycloisomerase), an enzyme involved in aromatic acid catabolism.
  • P24057 (CRD1_ANAPL): Delta-crystallin shares around 90% sequence identity with argininosuccinate lyase, showing that it is an example of a 'hijacked' enzyme; accumulated mutations have, however, rendered the protein enzymatically inactive.
  • P05042 (FUMC_ECOLI): Class II fumarase (fumarate hydratase), which catalyzes the reversible hydration of fumarate to L-malate. Class II fumarases are thermostable and tetrameric and are found in prokaryotes (for example Escherichia coli fumC) as well as in eukaryotes.
  • P04424 (ARLY_HUMAN): Argininosuccinase (argininosuccinate lyase), which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of arginine.
  • P04422 (ASPA_ECOLI): Aspartate ammonia-lyase (aspartase), which catalyzes the reversible conversion of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia rather than water is involved in the trans-elimination reaction.
  • P00923 (FUMA_ECOLI): Class I fumarase; class I fumarases are thermolabile and dimeric (for example E. coli fumA and fumB). The sequences of the two classes of fumarases are not closely related.
  • P25739 (PUR8_ECOLI): Adenylosuccinase (adenylosuccinate lyase), which catalyzes the eighth step in the de novo biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from 1-(5-phosphoribosyl)-4-(N-succinocarboxamide)-5-aminoimidazole. The enzyme can also catalyze the formation of fumarate and AMP from adenylosuccinate.

    Proteins where this domain is known:
    PY05969    PY07545   


    PF00208 - ELFV_dehydrog (Pfam link)

    Interpro entry IPR006096 : Glutamate/phenylalanine/leucine/valine dehydrogenase, C-terminal (Interpro link)

    Interpro description:

    Glutamate, leucine, phenylalanine and valine dehydrogenases are structurally and functionally related. They contain a Gly-rich region containing a conserved Lys residue, which has been implicated in the catalytic activity, in each case a reversible oxidative deamination reaction.

    Glutamate dehydrogenases (GluDH) are enzymes that catalyse the NAD- and/or NADP-dependent reversible deamination of L-glutamate into alpha-ketoglutarate. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism. Two separate enzymes are present in yeasts: the NADP-dependent enzyme, which catalyses the amination of alpha-ketoglutarate to L-glutamate; and the NAD-dependent enzyme, which catalyses the reverse reaction. The latter form links the L-amino acids with the Krebs cycle, which provides a major pathway for metabolic interconversion of alpha-amino acids and alpha-keto acids.

    Leucine dehydrogenase (LeuDH) is a NAD-dependent enzyme that catalyses the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues. Each subunit of this octameric enzyme from Bacillus sphaericus contains 364 amino acids and folds into two domains, separated by a deep cleft. The nicotinamide ring of the NAD+ cofactor binds deep in this cleft, which is thought to close during the hydride transfer step of the catalytic cycle.

    Phenylalanine dehydrogenase (PheDH) is an NAD-dependent enzyme that catalyses the reversible deamination of L-phenylalanine into phenylpyruvate.

    Valine dehydrogenase (ValDH) is an NADP-dependent enzyme that catalyses the reversible deamination of L-valine into 3-methyl-2-oxobutanoate.

    This entry represents the C-terminal domain of these proteins.

    Proteins where this domain is known:
    PY01264    PY03701    PY04261   


    PF00211 - Guanylate_cyc (Pfam link)

    Interpro entry IPR001054 : Adenylyl cyclase class-3/4/guanylyl cyclase (Interpro link)

    Interpro description:

    Guanylate cyclases catalyse the formation of cyclic GMP (cGMP) from GTP. cGMP acts as an intracellular messenger, activating cGMP-dependent kinases and regulating cGMP-sensitive ion channels. The role of cGMP as a second messenger in vascular smooth muscle relaxation and retinal photo-transduction is well established. Guanylate cyclase is found both in the soluble and particulate fractions of eukaryotic cells. The soluble and plasma membrane-bound forms differ in structure, regulation and other properties. Most currently known plasma membrane-bound forms are receptors for small polypeptides. The soluble forms of guanylate cyclase are cytoplasmic heterodimers having alpha and beta subunits.

    In all characterised eukaryotic guanylyl and adenylyl cyclases, cyclic nucleotide synthesis is carried out by the conserved class III cyclase domain.

    Proteins where this domain is known:
    PY00952    PY02999    PY04450    PY04451    PY07180   


    PF00213 - OSCP (Pfam link)

    Interpro entry IPR000711 : ATPase, F1 complex, OSCP/delta subunit (Interpro link)

    Pfam description:
    The ATP D subunit from E. coli is equivalent to the OSCP subunit, which constitutes this family. The ATP D subunits from metazoa are found in family Pfam:PF00401.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    F-ATPases (also known as F1F0-ATPase, or H(+)-transporting two-sector ATPase) are composed of two linked complexes: the F1 ATPase complex is the catalytic core and is composed of 5 subunits (alpha, beta, gamma, delta, epsilon), while the F0 ATPase complex is the membrane-embedded proton channel that is composed of at least 3 subunits (A-C), nine in mitochondria (A-G, F6, F8). Both the F1 and F0 complexes are rotary motors that are coupled back-to-back. In the F1 complex, the central gamma subunit forms the rotor inside the cylinder made of the alpha(3)beta(3) subunits, while in the F0 complex, the ring-shaped C subunits form the rotor. The two rotors rotate in opposite directions, but the F0 rotor is usually stronger, using the force from the proton gradient to push the F1 rotor in reverse in order to drive ATP synthesis. These ATPases can also work in reverse to hydrolyse ATP to create a proton gradient.

    This family represents subunits called delta in bacterial and chloroplast ATPase, or OSCP (oligomycin sensitivity conferral protein) in mitochondrial ATPase (note that in mitochondria there is a different delta subunit). The OSCP/delta subunit appears to be part of the peripheral stalk that holds the F1 complex alpha3beta3 catalytic core stationary against the torque of the rotating central stalk, and links subunit A of the F0 complex with the F1 complex. In mitochondria, the peripheral stalk consists of OSCP, as well as F0 components F6, B and D. In bacteria and chloroplasts the peripheral stalks have different subunit compositions: delta and two copies of F0 component B (bacteria), or delta and F0 components B and B' (chloroplasts).

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY03110   


    PF00215 - OMPdecase (Pfam link)

    Interpro entry IPR001754 : Orotidine 5'-phosphate decarboxylase, core (Interpro link)

    Pfam description:
    This family includes orotidine 5'-phosphate decarboxylase enzymes (EC:4.1.1.23) that are involved in the final step of pyrimidine biosynthesis. The family also includes enzymes such as hexulose-6-phosphate synthase. This family appears to be distantly related to Pfam:PF00834.

    Interpro description:

    Orotidine 5'-phosphate decarboxylase (OMPdecase) catalyses the last step in the de novo biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes, OMPdecase forms part of a bifunctional enzyme together with orotate phosphoribosyltransferase, while the prokaryotic and fungal OMPdecases are monofunctional proteins.

    Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is located in the N-terminal half of OMPdecases and is centred around a lysine residue which is essential for the catalytic function of the enzyme.

    Proteins where this domain is known:
    PY01515   


    PF00224 - PK (Pfam link)

    Interpro entry IPR015793 : Pyruvate kinase, barrel (Interpro link)

    Pfam description:
    This domain is actually a small beta-barrel domain nested within a larger TIM barrel. The active site is found in a cleft between the two domains.

    Interpro description:

    Pyruvate kinase (PK) catalyses the final step in glycolysis, the conversion of phosphoenolpyruvate to pyruvate with concomitant phosphorylation of ADP to ATP:

     ADP + phosphoenolpyruvate = ATP + pyruvate 

    The enzyme, which is found in all living organisms, requires both magnesium and potassium ions for its activity. In vertebrates, there are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle, heart and brain), and M2 (early foetal tissue). In plants, PK exists as cytoplasmic and plastid isozymes, while most bacteria and lower eukaryotes have one form, except in certain bacteria, such as Escherichia coli, that have two isozymes. All isozymes appear to be tetramers of identical subunits of ~500 residues.

    PK helps control the rate of glycolysis, along with phosphofructokinase and hexokinase. PK possesses allosteric sites for numerous effectors, yet the isozymes respond differently, in keeping with their different tissue distributions. The activity of L-type (liver) PK is increased by fructose-1,6-bisphosphate (F1,6BP) and lowered by ATP and alanine (a gluconeogenic precursor); therefore, when glucose levels are high, glycolysis is promoted, and when they are low, gluconeogenesis is promoted. L-type PK is also hormonally regulated, being activated by insulin and inhibited by glucagon, which triggers covalent modification (phosphorylation) of the enzyme. M1-type (muscle, brain) PK is inhibited by ATP, but F1,6BP and alanine have no effect, which correlates with the function of muscle and brain, as opposed to the liver.

    The structures of several pyruvate kinases from various organisms have been determined. The protein comprises three or four domains: a small N-terminal helical domain (absent in bacterial PK), a beta/alpha-barrel domain, a beta-barrel domain (inserted within the beta/alpha-barrel domain), and a 3-layer alpha/beta/alpha sandwich domain.

    This entry represents the two barrel domains, the beta/alpha-barrel, and the beta-barrel inserted within it.

    Proteins where this domain is known:
    PY03879    PY04645   


    PF00225 - Kinesin (Pfam link)

    Interpro entry IPR001752 : Kinesin, motor region (Interpro link)

    Interpro description:

    Kinesin is a microtubule-associated force-producing protein that may play a role in organelle transport. The kinesin motor activity is directed toward the microtubule's plus end. Kinesin is an oligomeric complex composed of two heavy chains and two light chains. The maintenance of the quaternary structure does not require interchain disulphide bonds.

    The heavy chain is composed of three structural domains: a large globular N-terminal domain which is responsible for the motor activity of kinesin (it is known to hydrolyse ATP, to bind and move on microtubules), a central alpha-helical coiled coil domain that mediates the heavy chain dimerisation; and a small globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles.

    A number of proteins have been found that contain a domain similar to the kinesin 'motor' domain.

    The kinesin motor domain is located in the N-terminal part of most of these proteins, with the exception of KAR3, klpA and ncd, where it is located in the C-terminal section.

    The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near positions 80 to 90, and the C-terminal half of the domain is involved in microtubule binding.

    Proteins where this domain is known:
    PY00879    PY00972    PY02372    PY02427    PY02733    PY02867    PY03054    PY03174    PY05256    PY05257    PY06543    PY06701    PY07317   


    PF00226 - DnaJ (Pfam link)

    Interpro entry IPR001623 : Heat shock protein DnaJ, N-terminal (Interpro link)

    Pfam description:
    DnaJ domains (J-domains) are associated with the hsp70 heat-shock system, and it is thought that this domain mediates the interaction. The DnaJ domain is therefore part of a chaperone (protein folding) system. The T-antigens, although not in PROSITE, are confirmed from the literature as DnaJ domain-containing proteins.

    Interpro description:

    The prokaryotic heat shock protein DnaJ interacts with the chaperone hsp70-like DnaK protein. Structurally, the DnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR' domain) and a C-terminal region of 120 to 170 residues.

    Schematically: N-terminal J domain (~70 residues) - G domain (~30 residues) - CRR domain (four CXXCXGXG repeats) - C-terminal region (120-170 residues).

    It is thought that the 'J' domain of DnaJ mediates the interaction with DnaK; it consists of four helices, the second of which has a charged surface that includes at least one pair of basic residues that are essential for interaction with the ATPase domain of Hsp70. The J- and CRR-domains are found in many prokaryotic and eukaryotic proteins, either together or separately. In yeast, J-domains have been classified into 3 groups; the class III proteins are functionally distinct and do not appear to act as molecular chaperones.
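    Because the CRR domain is defined by repeats of a short sequence pattern, candidate repeats can be located with a regular expression. The Python sketch below writes CXXCXGXG with `.` for each X (any residue) and uses a zero-width lookahead so that overlapping hits are also reported. The function name `find_crr_motifs` and the toy sequence are hypothetical, and a pattern match is only a hint, not a substitute for a proper domain search with the Pfam/InterPro models listed here.

```python
import re

# C-X-X-C-X-G-X-G, with each X (any residue) written as '.'.
# The zero-width lookahead lets overlapping occurrences be reported.
CRR_MOTIF = re.compile(r"(?=(C..C.G.G))")


def find_crr_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for each CXXCXGXG occurrence."""
    return [(m.start(), m.group(1)) for m in CRR_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "MKCAACAGAGLLCWWCTGKGEE"   # made-up sequence with two motif copies
    print(find_crr_motifs(toy))      # [(2, 'CAACAGAG'), (12, 'CWWCTGKG')]
```

    A DnaJ-like CRR region, as described above, would be expected to yield four such hits in close succession.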

    Proteins where this domain is known:
    PY00027    PY00038    PY00633    PY01224    PY01286    PY01558    PY01612    PY02476    PY02866    PY02986    PY03272    PY03538    PY03544    PY03688    PY03711    PY04093    PY04182    PY04223    PY04382    PY04500    PY04661    PY05339    PY05607    PY05609    PY05753    PY07104    PY07174   


    PF00227 - Proteasome (Pfam link)

    Interpro entry IPR001353 : 20S proteasome, A and B subunits (Interpro link)

    Interpro description:

    This group contains threonine peptidases and non-peptidase homologues belonging to MEROPS peptidase family T1 (proteasome family, clan PB(T)). The family consists of the protease components of the archaeal and bacterial proteasomes and the alpha and beta subunits of the eukaryotic proteasome.

    ATP-dependent protease complexes are present in all three kingdoms of life, where they rid the cell of misfolded or damaged proteins and control the level of certain regulatory proteins. They include the proteasome in Eukaryotes, Archaea, and Actinomycetales and the HslVU (ClpQY, ClpXP) complex in other eubacteria. Genes homologous to eubacterial HslV (ClpQ) and HslU (ClpY, ClpX) have also been shown to be present in the genome of trypanosomatid protozoa.

    The proteasome (or macropain) is a multicatalytic proteinase complex involved in an ATP/ubiquitin-dependent non-lysosomal proteolytic pathway. In eukaryotes the proteasome core is composed of 28 subunits, which form a highly ordered ring-shaped structure (20S ring) of about 700 kDa. Most proteasome subunits can be classified into two groups, A and B, on the basis of sequence similarities. In eukaryotic organisms there are up to seven different types of beta subunits, three of which may carry the N-terminal threonine residues that act as the nucleophiles in catalysis and show different specificities. The molecule is barrel-shaped, and the active sites are on the inner surfaces. Terminal apertures restrict access of substrates to the active sites.

    In prokaryotes, the ATP-dependent proteasome is encoded by the heat-shock locus VU (HslVU). It consists of HslV, the protease (MEROPS peptidase subfamily T1B), and HslU, the ATPase and chaperone belonging to the AAA/Clp/Hsp100 family. The crystal structure of Thermotoga maritima HslV has been determined to 2.1 Å resolution. The structure of the dodecameric enzyme is well conserved compared with those from Escherichia coli and Haemophilus influenzae.

    Proteins where this domain is known:
    PY00152    PY00267    PY00806    PY02094    PY02351    PY02352    PY02685    PY03034    PY03212    PY03772    PY04190    PY04957    PY06176    PY06665    PY06767   


    PF00230 - MIP (Pfam link)

    Interpro entry IPR000425 : Major intrinsic protein (Interpro link)

    Pfam description:
    MIP (Major Intrinsic Protein) family proteins exhibit essentially two distinct types of channel properties: (1) specific water transport by the aquaporins, and (2) transport of small neutral solutes, such as glycerol, by the glycerol facilitators.

    Interpro description:

    A number of transmembrane (TM) channel proteins can be grouped together on the basis of sequence similarities.

    These include:

    MIP family proteins are thought to contain 6 TM domains. Sequence analysis suggests that the proteins may have arisen through tandem, intragenic duplication from an ancestral protein that contained 3 TM domains.

    Some of the proteins in this group form the molecular basis of blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins. Aquaporin-CHIP (Aquaporin 1) belongs to the Colton blood group system and is associated with the Co(a/b) antigen.

    Proteins where this domain is known:
    PY05950   


    PF00231 - ATP-synt (Pfam link)

    Interpro entry IPR000131 : ATPase, F1 complex, gamma subunit (Interpro link)

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    F-ATPases (also known as F1F0-ATPase, or H(+)-transporting two-sector ATPase) are composed of two linked complexes. The F1 ATPase complex is the catalytic core and is composed of 5 subunits (alpha, beta, gamma, delta, epsilon). The F0 ATPase complex is the membrane-embedded proton channel and is composed of at least 3 subunits (A-C), or nine in mitochondria (A-G, F6, F8). Both the F1 and F0 complexes are rotary motors that are coupled back-to-back. In the F1 complex, the central gamma subunit forms the rotor inside the cylinder made of the alpha(3)beta(3) subunits, while in the F0 complex, the ring-shaped C subunits form the rotor. The two rotors rotate in opposite directions, but the F0 rotor is usually stronger, using the force from the proton gradient to push the F1 rotor in reverse in order to drive ATP synthesis. These ATPases can also work in reverse to hydrolyse ATP to create a proton gradient.

    The ATPase F1 complex gamma subunit forms the central shaft that connects the F0 rotary motor to the F1 catalytic core. The gamma subunit functions as a rotary motor inside the cylinder formed by the alpha(3)beta(3) subunits in the F1 complex. The best-conserved region of the gamma subunit is its C-terminus, which seems to be essential for assembly and catalysis.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY02803   


    PF00233 - PDEase_I (Pfam link)

    Interpro entry IPR002073 : 3'5'-cyclic nucleotide phosphodiesterase (Interpro link)

    Interpro description:

    The cyclic nucleotide phosphodiesterases (PDE) comprise a group of enzymes that degrade the phosphodiester bond in the second messenger molecules cAMP and cGMP. They are divided into 11 families. They regulate the localisation, duration and amplitude of cyclic nucleotide signalling within subcellular domains. PDEs are therefore important for signal transduction.

    PDE enzymes are often targets for pharmacological inhibition due to their unique tissue distribution, structural properties, and functional properties. Inhibitors include: Roflumilast for chronic obstructive pulmonary disease and asthma, Sildenafil for erectile dysfunction and Cilostazol for peripheral arterial occlusive disease, amongst others.

    Retinal 3',5'-cGMP phosphodiesterase is located in photoreceptor outer segments. It is light-activated and plays a pivotal role in signal transduction. In rod cells, PDE is oligomeric, comprising an alpha-, a beta- and two gamma-subunits, while in cones, PDE is a homodimer of alpha chains associated with several smaller subunits. Both rod and cone PDEs catalyse the hydrolysis of cAMP or cGMP to the corresponding nucleoside 5'-monophosphates, and both enzymes also bind cGMP with high affinity. The cGMP-binding sites are located in the N-terminal half of the protein sequence, while the catalytic core resides in the C-terminal portion.

    Proteins where this domain is known:
    PY01829    PY01857    PY03619    PY03988    PY07430   


    PF00237 - Ribosomal_L22 (Pfam link)

    Interpro entry IPR001063 : Ribosomal protein L22/L17 (Interpro link)

    Pfam description:
    This family includes L22 from prokaryotes and chloroplasts and L17 from eukaryotes.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind 23S rRNA. It belongs to a family of ribosomal proteins which includes: bacterial L22; algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast); cyanelle L22; archaebacterial L22; mammalian L17; plant L17 and yeast YL17.

    Proteins where this domain is known:
    PY00934    PY05577   


    PF00238 - Ribosomal_L14 (Pfam link)

    Interpro entry IPR000218 : Ribosomal protein L14b/L23e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins, which have been grouped on the basis of sequence similarities. Based on amino-acid sequence homology, it is predicted that ribosomal protein L14 is a member of a recently identified family of structurally related RNA-binding proteins. L14 is a protein of 119 to 137 amino-acid residues.

    Proteins where this domain is known:
    PY06065    PY07409   


    PF00240 - ubiquitin (Pfam link)

    Interpro entry IPR000626 : Ubiquitin (Interpro link)

    Pfam description:
    This family contains a number of ubiquitin-like proteins: SUMO (smt3 homologue) (see Swiss:Q02724), Nedd8 (see Swiss:P29595), Elongin B (see Swiss:Q15370), Rub1 (see Swiss:Q9SHE7), and Parkin (see Swiss:O60260). A number of them are thought to carry a distinctive five-residue motif termed the proteasome-interacting motif (PIM), which may have a biologically significant role in protein delivery to proteasomes and recruitment of proteasomes to transcription sites.

    Interpro description:

    Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3), which work sequentially in a cascade. There are many different E3 ligases, which are responsible for the type of ubiquitin chain formed, the specificity of the target protein, and the regulation of the ubiquitinylation process. Ubiquitinylation is an important regulatory tool that controls the concentration of key signalling proteins, such as those involved in cell cycle control, and removes misfolded, damaged or mutant proteins that could be harmful to the cell. Several ubiquitin-like molecules have been discovered, such as Ufm1, SUMO1, NEDD8, Rad23, Elongin B and Parkin, the latter being involved in Parkinson's disease.

    Ubiquitin is a protein of 76 amino acid residues, found in all eukaryotic cells, whose sequence is extremely well conserved from protozoa to vertebrates. Ubiquitin acts through its post-translational attachment (ubiquitinylation) to other proteins, where these modifications alter the function, location or trafficking of the protein, or target it for destruction by the 26S proteasome. The terminal glycine in the C-terminal 4-residue tail of ubiquitin can form an isopeptide bond with a lysine residue in the target protein, or with a lysine in another ubiquitin molecule to form a ubiquitin chain that attaches itself to a target protein. Ubiquitin has seven lysine residues, any one of which can be used to link ubiquitin molecules together, resulting in different structures that alter the target protein in different ways. It appears that Lys(11)-, Lys(29)- and Lys(48)-linked poly-ubiquitin chains target the protein to the proteasome for degradation, while mono-ubiquitinylation and Lys(6)- or Lys(63)-linked poly-ubiquitin chains signal reversible modifications in protein activity, location or trafficking. For example, Lys(63)-linked poly-ubiquitinylation is known to be involved in DNA damage tolerance, inflammatory response, protein trafficking and signal transduction through kinase activation. In addition, the length of the ubiquitin chain alters the fate of the target protein. Regulatory proteins such as transcription factors and histones are frequent targets of ubiquitinylation.

    Proteins where this domain is known:
    PY00122    PY00183    PY00473    PY01513    PY03074    PY03337    PY03971    PY04045    PY05126   

    Proteins where this domain has been detected by our approach:
    PY03631    PY05144    PY06311    PY06672   


    PF00241 - Cofilin_ADF (Pfam link)

    Interpro entry IPR002108 : Actin-binding, cofilin/tropomyosin type (Interpro link)

    Pfam description:
    Severs actin filaments and binds to actin monomers.

    Interpro description:

    The actin-depolymerising factor homology (ADF-H) domain is an ~150-amino acid motif that is present in three phylogenetically distinct classes of eukaryotic actin-binding proteins:

    Although these proteins are biochemically distinct and play different roles in actin dynamics, they all appear to use the ADF-H domain for their interactions with actin.

    The ADF-H domain consists of a six-stranded mixed beta-sheet in which the four central strands (beta2-beta5) are anti-parallel and the two edge strands (beta1 and beta6) run parallel with the neighbouring strands. The sheet is surrounded by two alpha-helices on each side.

    Proteins where this domain is known:
    PY01091    PY04700   


    PF00244 - 14-3-3 (Pfam link)

    Interpro entry IPR000308 : 14-3-3 protein (Interpro link)

    Interpro description:

    The 14-3-3 proteins are a large family of acidic proteins of approximately 30 kDa, which exist primarily as homo- and heterodimers within all eukaryotic cells. There is a high degree of sequence identity and conservation between all the 14-3-3 isotypes, particularly in the regions that form the dimer interface or line the central ligand-binding channel of the dimeric molecule. Each 14-3-3 protein sequence can be roughly divided into three sections: a divergent amino terminus, the conserved core region and a divergent carboxyl terminus. The conserved middle core region of the 14-3-3s encodes an amphipathic groove that forms the main functional domain, a cradle for interacting with client proteins. The monomer consists of nine helices organised in an antiparallel manner, forming an L-shaped structure. The interior of the L-structure is composed of four helices: H3 and H5, which contain many charged and polar amino acids, and H7 and H9, which contain hydrophobic amino acids. These four helices form the concave amphipathic groove that interacts with target peptides.

    14-3-3 proteins mainly bind proteins containing phosphothreonine or phosphoserine motifs; however, exceptions to this rule do exist. Extensive investigation of the 14-3-3 binding site of the mammalian serine/threonine kinase Raf-1 has produced a consensus sequence for 14-3-3 binding, RSxpSxP (in the single-letter amino-acid code, where x denotes any amino acid and p indicates that the next residue is phosphorylated). 14-3-3 proteins appear to affect intracellular signalling in one of three ways: by direct regulation of the catalytic activity of the bound protein, by regulating interactions between the bound protein and other molecules in the cell by sequestration or modification, or by controlling the subcellular localisation of the bound ligand. Proteins appear to bind initially to a single dominant site and subsequently to many, much weaker, secondary interaction sites. The 14-3-3 dimer is capable of changing the conformation of its bound ligand whilst itself undergoing minimal structural alteration.
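
    The RSxpSxP consensus can be turned into a simple sequence scan, with one caveat: a plain amino-acid sequence does not record phosphorylation. The Python sketch below therefore only reports candidate sites, namely the serine at the fourth motif position that would have to be phosphorylated. The function name `candidate_1433_sites` is hypothetical. The optional `phospho_positions` argument is an assumed input, for example a set of phosphosites taken from phosphoproteomics data. As noted above, 14-3-3 binding also occurs outside this consensus, so a match is neither necessary nor sufficient for binding.

```python
import re

# Raf-1-derived consensus RSxpSxP: R, S, any residue, phospho-Ser, any residue, P.
# Sequence alone cannot show phosphorylation, so this finds candidate sites only.
MODE_RSXSXP = re.compile(r"(?=RS.S.P)")


def candidate_1433_sites(seq, phospho_positions=None):
    """Return 0-based indices of the serine that must be phosphorylated.

    phospho_positions (hypothetical input): set of 0-based indices known to be
    phosphorylated; if given, only sites whose key serine is in it are kept.
    """
    # The key serine sits 3 residues after the start of the R-S-x-S-x-P window.
    hits = [m.start() + 3 for m in MODE_RSXSXP.finditer(seq.upper())]
    if phospho_positions is not None:
        hits = [i for i in hits if i in phospho_positions]
    return hits


if __name__ == "__main__":
    print(candidate_1433_sites("AARSGSAPKK"))                          # [5]
    print(candidate_1433_sites("AARSGSAPKK", phospho_positions={3}))   # []
```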

    Proteins where this domain is known:
    PY01841    PY05990    PY06707   


    PF00246 - Peptidase_M14 (Pfam link)

    Interpro entry IPR000834 : Peptidase M14, carboxypeptidase A (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' is a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
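
    Both the loose HEXXH motif and the stricter 'abXHEbbHbc' form can be expressed as regular expressions. The residue classes in this Python sketch are assumptions chosen for illustration. 'a' is restricted to V or T, although the text only says "most often". 'b' (uncharged) is approximated as any residue other than D, E, K or R. 'c' (hydrophobic) is approximated as A, V, L, I, M, F, W or C. Proline is excluded from every variable position of the strict pattern. The function names are hypothetical. As described below for family M14, the carboxypeptidase A zinc ligands do not form this motif, so a scan like this would not be expected to flag those proteins.

```python
import re

# Loose zinc-binding motif: H-E-x-x-H (x = any residue), overlapping hits allowed.
HEXXH = re.compile(r"(?=HE..H)")

# Stricter 'abXHEbbHbc' form. Residue classes are ASSUMPTIONS for illustration:
#   a = V or T
#   b = "uncharged": any residue except D, E, K, R (and P, never seen in the site)
#   X = any residue except P
#   c = "hydrophobic", approximated as A, V, L, I, M, F, W, C
A, B, X, C = "[VT]", "[^DEKRP]", "[^P]", "[AVLIMFWC]"
STRICT = re.compile(f"(?={A}{B}{X}HE{B}{B}H{B}{C})")


def hexxh_sites(seq):
    """0-based start of every H-E-x-x-H match."""
    return [m.start() for m in HEXXH.finditer(seq.upper())]


def strict_sites(seq):
    """0-based start of every 10-residue abXHEbbHbc match."""
    return [m.start() for m in STRICT.finditer(seq.upper())]


if __name__ == "__main__":
    print(hexxh_sites("GGVAIHELGHAL"), strict_sites("GGVAIHELGHAL"))  # [5] [2]
    # A proline inside the window passes the loose motif but fails the strict one.
    print(hexxh_sites("GGVAIHEPGHAL"), strict_sites("GGVAIHEPGHAL"))  # [5] []
```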

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of sequences contains a diverse range of gene families, which include metallopeptidases belonging to MEROPS peptidase family M14 (carboxypeptidase A, clan MC), subfamilies M14A and M14B.

    The carboxypeptidase A family can be divided into two subfamilies: carboxypeptidase H (regulatory) and carboxypeptidase A (digestive). Members of the H family have longer C-termini than those of family A, and carboxypeptidase M (a member of the H family) is bound to the membrane by a glycosylphosphatidylinositol anchor, unlike the majority of the M14 family, which are soluble.

    The zinc ligands have been determined as two histidines and a glutamate, and the catalytic residue has been identified as a C-terminal glutamate, but these do not form the characteristic metalloprotease HEXXH motif. Members of the carboxypeptidase A family are synthesised as inactive molecules with propeptides that must be cleaved to activate the enzyme. Structural studies of carboxypeptidases A and B reveal the propeptide to exist as a globular domain, followed by an extended alpha-helix; this shields the catalytic site, without specifically binding to it, while the substrate-binding site is blocked by making specific contacts.

    Other examples of protein families in this entry include:

    Proteins where this domain is known:
    PY03811   


    PF00248 - Aldo_ket_red (Pfam link)

    Interpro entry IPR001395 : Aldo/keto reductase (Interpro link)

    Pfam description:
    This family includes a number of K+ ion channel beta chain regulatory domains - these are reported to have oxidoreductase activity.

    Interpro description:

    The aldo-keto reductase family includes a number of related monomeric NADPH-dependent oxidoreductases, such as aldehyde reductase, aldose reductase, prostaglandin F synthase, xylose reductase, rho crystallin, and many others. All possess a similar structure, with a beta-alpha-beta fold characteristic of nucleotide binding proteins. The fold comprises a parallel beta-8/alpha-8-barrel, which contains a novel NADP-binding motif. The binding site is located in a large, deep, elliptical pocket in the C-terminal end of the beta sheet, the substrate being bound in an extended conformation. The hydrophobic nature of the pocket favours aromatic and apolar substrates over highly polar ones.

    Binding of the NADPH coenzyme causes a massive conformational change, reorienting a loop, effectively locking the coenzyme in place. This binding is more similar to FAD- than to NAD(P)-binding oxidoreductases.

    Some proteins of this entry contain a K+ ion channel beta chain regulatory domain; these are reported to have oxidoreductase activity.

    Proteins where this domain is known:
    PY02780    PY03351   


    PF00249 - Myb_DNA-binding (Pfam link)

    Interpro entry IPR014778 : (Interpro link)

    Pfam description:
    This family contains the DNA binding domains from Myb proteins, as well as the SANT domain family.

    Interpro description:
    The retroviral oncogene v-myb and its cellular counterpart c-myb encode nuclear DNA-binding proteins. These belong to the SANT domain family and specifically recognize the sequence YAAC(G/T)G. In Myb, one of the most conserved regions, consisting of three tandem repeats, has been shown to be involved in DNA binding.
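
    The Myb recognition sequence YAAC(G/T)G uses the IUPAC code Y for a pyrimidine (C or T). The following is a minimal Python sketch of a scan for this 6-mer on both DNA strands. The function names are hypothetical. Reverse-strand hits are reported as the site read on that strand, with the start coordinate mapped back to the forward strand. This is only a pattern match and not a prediction of Myb binding.

```python
import re

# Myb site YAAC(G/T)G with IUPAC Y = C or T; lookahead allows overlapping hits.
MYB_SITE = re.compile(r"(?=([CT]AAC[GT]G))")
SITE_LEN = 6
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(dna):
    """Reverse complement of an (uppercase-able) DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def myb_sites(dna):
    """Return (strand, 0-based forward-strand start, site as read on that strand)."""
    dna = dna.upper()
    n = len(dna)
    hits = [("+", m.start(), m.group(1)) for m in MYB_SITE.finditer(dna)]
    for m in MYB_SITE.finditer(revcomp(dna)):
        # A hit at index i on the reverse strand covers forward
        # positions n - i - SITE_LEN .. n - i - 1.
        hits.append(("-", n - (m.start() + SITE_LEN), m.group(1)))
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    print(myb_sites("GGTAACGGCC"))  # [('+', 2, 'TAACGG')]
    print(myb_sites("AACAGTTGAA"))  # [('-', 2, 'CAACTG')]
```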

    Proteins where this domain is known:
    PY00077    PY01558    PY03148    PY03808    PY05789    PY06951   


    PF00252 - Ribosomal_L16 (Pfam link)

    Interpro entry IPR016180 : Ribosomal protein L10e/L16 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This entry represents a structural domain with an alpha/beta-hammerhead fold, where the beta-hammerhead motif is similar to that in barrel-sandwich hybrids. Domains of this structure can be found in ribosomal proteins L10e and L16.

    Proteins where this domain is known:
    PY00995    PY04761    PY07431   


    PF00253 - Ribosomal_S14 (Pfam link)

    Interpro entry IPR001209 : Ribosomal protein S14 (Interpro link)

    Pfam description:
    This family includes both ribosomal S14 from prokaryotes and S29 from eukaryotes.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins that includes bacterial, algal and plant chloroplast, yeast mitochondrial, cyanelle and archaeal (Methanococcus vannielii) S14, as well as yeast mitochondrial MRP2, yeast YS29A/B and mammalian S29.

    Proteins where this domain is known:
    PY02435   


    PF00254 - FKBP_C (Pfam link)

    Interpro entry IPR001179 : Peptidyl-prolyl cis-trans isomerase, FKBP-type (Interpro link)

    Interpro description:

    Synonym(s): Peptidylprolyl cis-trans isomerase

    FKBP-type peptidylprolyl isomerases in vertebrates are receptors for the two immunosuppressants FK506 and rapamycin. The drugs inhibit T cell proliferation by arresting two distinct cytoplasmic signal transmission pathways. Peptidylprolyl isomerases accelerate protein folding by catalysing the cis-trans isomerisation of proline imidic peptide bonds in oligopeptides. These proteins are found in a variety of organisms.

    Proteins where this domain is known:
    PY02360   


    PF00256 - L15 (Pfam link)

    Interpro entry IPR001196 : Ribosomal protein L15 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind the 23S rRNA. Ribosomal protein L15 from bacteria and from plant chloroplasts (nuclear-encoded) belongs to this family. Vertebrate L27a, Tetrahymena thermophila L29 and fungal L27a (L29, CRP-1, CYH2) are also members of this group.

    Ribosomal protein L18E from a number of archaebacteria shows homology to both eukaryotic L18 and eubacterial ribosomal protein L15, an observation that has been taken to support the view that archaea represent an evolutionary stage between bacteria and eukaryotes.

    Proteins where this domain is known:
    PY03375   

    Proteins where this domain has been detected by our approach:
    PY00688   


    PF00258 - Flavodoxin_1 (Pfam link)

    Interpro entry IPR008254 : Flavodoxin/nitric oxide synthase (Interpro link)

    Interpro description:

    This domain is found in a number of proteins, including flavodoxin and nitric oxide synthase. Flavodoxins are electron-transfer proteins that function in various electron transport systems. They bind one FMN molecule, which serves as a redox-active prosthetic group, and are functionally interchangeable with ferredoxins. They have been isolated from prokaryotes, cyanobacteria and some eukaryotic algae. Nitric oxide synthase produces nitric oxide from L-arginine and NADPH. Nitric oxide acts as a messenger molecule in the body.

    Proteins where this domain is known:
    PY05179    PY05984    PY06344   


    PF00261 - Tropomyosin (Pfam link)

    Interpro entry IPR000533 : (Interpro link)

    Interpro description:
    Tropomyosins are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction. The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a coiled-coil structure of 2 parallel helices containing 2 sets of 7 alternating actin-binding sites. There are multiple cell-specific isoforms, created by differential splicing of the messenger RNA from one gene, and the proportions of the isoforms vary between different cell types. Muscle isoforms of tropomyosin are characterised by having 284 amino acid residues and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous in their N-terminal region.

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs or food) that, in most people, cause no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. Each designation is composed of the first three letters of the genus, a space, the first letter of the species name, a space and an Arabic number. If two species names would have identical designations, they are distinguished from one another by adding one or more letters (as necessary) to each species designation.

    The allergens in this family include allergens with the following designations: Met e 1.
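    The naming rule described above is mechanical enough to sketch in code. The Python below is a minimal, simplified reading of the rule as stated in this entry, not the WHO/IUIS subcommittee's reference procedure; the function names `allergen_designation` and `letters_needed` are hypothetical, and disambiguation is assumed to mean extending only the species part until the two names differ.

```python
# Simplified sketch of the WHO/IUIS allergen naming rule (assumption: only the
# rule as summarised in the text; the official guidelines cover more cases).

def allergen_designation(genus: str, species: str, number: int,
                         species_letters: int = 1) -> str:
    """Return '<Gen> <s> <n>': three genus letters, species letter(s), number."""
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    if species_letters < 1:
        raise ValueError("at least one species letter is required")
    return f"{genus[:3].capitalize()} {species[:species_letters].lower()} {number}"


def letters_needed(a: tuple[str, str], b: tuple[str, str]) -> int:
    """Smallest species-prefix length that keeps two (genus, species) names apart.

    Returns 1 when the default designations already differ; otherwise adds
    letters from the species names until the prefixes no longer match.
    """
    (genus_a, species_a), (genus_b, species_b) = a, b
    if genus_a[:3].lower() != genus_b[:3].lower():
        return 1
    n = 1
    while species_a[:n].lower() == species_b[:n].lower():
        if n >= max(len(species_a), len(species_b)):
            raise ValueError("species names are identical; cannot disambiguate")
        n += 1
    return n


print(allergen_designation("Metapenaeus", "ensis", 1))  # Met e 1
```

    Met e 1 is the tropomyosin allergen of the shrimp Metapenaeus ensis, so the call above reproduces the designation listed in this entry. For an invented pair such as "Alphagenus betula" and "Alphagenus betaria" (hypothetical names), both would default to "Alp b"; `letters_needed` returns 4, giving "Alp betu 1" and "Alp beta 1".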

    Proteins where this domain has been detected by our approach:
    PY00653   


    PF00266 - Aminotran_5 (Pfam link)

    Interpro entry IPR000192 : Aminotransferase, class V/Cysteine desulfurase (Interpro link)

    Pfam description:
    This domain is found in aminotransferases and in other enzymes, including cysteine desulphurase EC:4.4.1.-.

    Interpro description:
    Aminotransferases share certain mechanistic features with other pyridoxal-phosphate-dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped into subfamilies. This entry represents the class V aminotransferases and the related, though functionally distinct, cysteine desulfurases.

    Proteins where this domain is known:
    PY02096    PY06105   


    PF00268 - Ribonuc_red_sm (Pfam link)

    Interpro entry IPR000358 : Ribonucleotide reductase (Interpro link)

    Interpro description:

    Ribonucleotide reductase (RNR) catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides:

     ribonucleoside diphosphate + reduced thioredoxin = 2'-deoxyribonucleoside diphosphate + oxidized thioredoxin + H2O 
    It provides the precursors necessary for DNA synthesis. RNRs are divided into three classes on the basis of their metallocofactor usage. Class I RNRs, found in eukaryotes, bacteria, bacteriophage and viruses, use a diiron-tyrosyl radical; class II RNRs, found in bacteria, bacteriophage, algae and archaea, use coenzyme B12 (adenosylcobalamin, AdoCbl); class III RNRs, found in anaerobic bacteria and bacteriophage, use an FeS cluster and S-adenosylmethionine to generate a glycyl radical. Many organisms have more than one class of RNR present in their genomes.

    Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues); class II RNRs are less complex, using the small molecule B12 in place of the small chain. The small chain binds two iron atoms (three Glu, one Asp and two His are involved in metal binding) and contains an active-site tyrosine radical. The regions of the sequence that contain the metal-binding residues and the active-site tyrosine are conserved in ribonucleotide reductase small chains from prokaryotes, eukaryotes and viruses. One of these regions has been selected as a signature pattern; it contains the active-site residue as well as a glutamate and a histidine involved in the binding of iron.
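    The "signature pattern" wording above appears to be inherited from PROSITE documentation, where such signatures are written in PROSITE pattern syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for terminal anchors). The RNR pattern itself is not reproduced in this text, so the sketch below uses a hypothetical P-loop-like pattern instead. The function `prosite_to_regex` is an illustrative name, and it handles only the subset of the syntax listed in its docstring.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supported subset (assumption: enough for illustration, not the full syntax):
    x (any residue), single residues, [..] (allowed set), {..} (forbidden set),
    repeat counts (n) or (n,m), and the '<' / '>' terminal anchors.
    """
    pattern = pattern.strip().rstrip(".")
    anchor_start = pattern.startswith("<")
    anchor_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")
    element_re = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.split("-"):
        match = element_re.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return ("^" if anchor_start else "") + "".join(parts) + ("$" if anchor_end else "")


# Hypothetical pattern and toy sequence, for illustration only.
regex = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
print(regex)                                    # [AG].{4}GK[ST]
hit = re.search(regex, "MSGAAAAGKSL")
print(hit.start(), hit.group())                 # 2 GAAAAGKS
```

    Converting to a regular expression keeps the search itself trivial; in practice, tools such as PROSITE's own scanners also handle the parts of the syntax this sketch rejects.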

    Proteins where this domain is known:
    PY03671    PY07154   


    PF00270 - DEAD (Pfam link)

    Interpro entry IPR011545 : DNA/RNA helicase, DEAD/DEAH box type, N-terminal (Interpro link)

    Pfam description:
    Members of this family include the DEAD and DEAH box helicases. Helicases are involved in unwinding nucleic acids. The DEAD box helicases are involved in various aspects of RNA metabolism, including nuclear transcription, pre-mRNA splicing, ribosome biogenesis, nucleocytoplasmic transport, translation, RNA decay and organellar gene expression.

    Interpro description:

    Members of this family include the DEAD and DEAH box helicases. Helicases are involved in unwinding nucleic acids. The DEAD box helicases are involved in various aspects of RNA metabolism, including nuclear transcription, pre-mRNA splicing, ribosome biogenesis, nucleocytoplasmic transport, translation, RNA decay and organellar gene expression.

    Proteins where this domain is known:
    PY00037    PY00326    PY00412    PY00830    PY01108    PY01247    PY01271    PY01284    PY01366    PY01492    PY01861    PY01870    PY01902    PY02206    PY02224    PY02253    PY03503    PY03573    PY03587    PY03755    PY03767    PY03841    PY03896    PY04046    PY04051    PY04061    PY04360    PY04724    PY05239    PY05724    PY06447    PY06529    PY06632    PY06824    PY07206    PY07373    PY07601   

    Proteins where this domain has been detected by our approach:
    PY00461    PY00742    PY00835    PY04108    PY04199    PY05116    PY06080    PY07358   


    PF00271 - Helicase_C (Pfam link)

    Interpro entry IPR001650 : DNA/RNA helicase, C-terminal (Interpro link)

    Pfam description:
    The Prosite family is restricted to DEAD/H helicases, whereas this domain family is found in a wide variety of helicases and helicase-related proteins. It may be that this is not an autonomously folding unit, but an integral part of the helicase.

    Interpro description:

    The domain that defines this group of proteins is found in a wide variety of helicases and helicase-related proteins. It may be that this is not an autonomously folding unit, but an integral part of the helicase.

    The eukaryotic translation initiation factor 4A (eIF4A) is a member of the DEA(D/H)-box RNA helicase family. This is a diverse group of proteins that couples an ATPase activity to RNA binding and unwinding. The structure of the carboxyl-terminal domain of eIF4A has been determined to 1.75 Å resolution; it has a parallel alpha-beta topology that superimposes, with minor variations, on the structures and conserved motifs of the equivalent domain in other, distantly related helicases.

    Proteins where this domain is known:
    PY00037    PY00326    PY00412    PY00461    PY00648    PY00742    PY00810    PY00830    PY00835    PY01120    PY01180    PY01247    PY01271    PY01284    PY01366    PY01492    PY01861    PY01870    PY01901    PY02206    PY02224    PY02253    PY02297    PY02376    PY02474    PY02949    PY03503    PY03573    PY03587    PY03686    PY03755    PY03767    PY03782    PY03840    PY03841    PY03896    PY04046    PY04051    PY04061    PY04074    PY04107    PY04197    PY04199    PY04360    PY04724    PY05239    PY05642    PY05724    PY05882    PY06080    PY06148    PY06417    PY06447    PY06529    PY06632    PY06824    PY07206    PY07254    PY07358    PY07373    PY07601   

    Proteins where this domain has been detected by our approach:
    PY01108   


    PF00274 - Glycolytic (Pfam link)

    Interpro entry IPR000741 : Fructose-bisphosphate aldolase, class-I (Interpro link)

    Interpro description:

    Fructose-bisphosphate aldolase is a glycolytic enzyme that catalyses the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms: class I enzymes are found in animals, do not require a metal ion, and are characterised by the formation of a Schiff base intermediate between a highly conserved active site lysine and a substrate carbonyl group, while the class II enzymes are produced in bacteria and fungi, and require an active-site divalent metal ion. This entry represents the class I enzymes.

    In vertebrates, three forms of this enzyme are found: aldolase A is expressed in muscle, aldolase B in liver, kidney, stomach and intestine, and aldolase C in brain, heart and ovary. The different isozymes have different catalytic functions: aldolases A and C are mainly involved in glycolysis, while aldolase B is involved in both glycolysis and gluconeogenesis. Defects in aldolase B result in hereditary fructose intolerance.

    Proteins where this domain is known:
    PY03709   


    PF00276 - Ribosomal_L23 (Pfam link)

    Interpro entry IPR013025 : (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This domain is found in both eukaryotic L25 and prokaryotic and eukaryotic L23 proteins.

    Proteins where this domain is known:
    PY02911    PY04600   


    PF00278 - Orn_DAP_Arg_deC (Pfam link)

    Interpro entry IPR000183 : Orn/DAP/Arg decarboxylase 2 (Interpro link)

    Pfam description:
    These pyridoxal-dependent decarboxylases act on ornithine, lysine, arginine and related substrates.

    Interpro description:
    These enzymes are collectively known as group IV decarboxylases. Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into two different families on the basis of sequence similarities. Members of this family, while most probably evolutionarily related, do not share extensive regions of sequence similarity. The proteins contain a conserved lysine residue which is known, in mouse ODC (ornithine decarboxylase), to be the site of attachment of the pyridoxal-phosphate group. The proteins also contain a stretch of three consecutive glycine residues that has been proposed to be part of a substrate-binding region.
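    The glycine stretch mentioned above can be located with a plain sequence scan. The sketch below is a minimal illustration, assuming one-letter amino-acid codes; the function name `glycine_runs` and the toy sequence are hypothetical. A run of glycines on its own is common and is not diagnostic of this family; domain assignment in Pfam relies on profile HMMs, not on single motifs.

```python
import re


def glycine_runs(sequence: str, min_length: int = 3) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of runs of >= min_length glycines."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    pattern = re.compile("G{%d,}" % min_length)   # maximal runs, e.g. G{3,}
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


# Toy sequence (hypothetical), not a real decarboxylase.
print(glycine_runs("MKGGGAGGGGSK"))  # [(2, 5), (6, 10)]
```

    Reporting maximal runs rather than every overlapping GGG window means a stretch of four glycines is returned once, as a single span.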

    Proteins where this domain is known:
    PY04754   


    PF00281 - Ribosomal_L5 (Pfam link)

    Interpro entry IPR002132 : Ribosomal protein L5 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

    L5 is a protein of about 180 amino-acid residues.

    Proteins where this domain is known:
    PY02461   


    PF00285 - Citrate_synt (Pfam link)

    Interpro entry IPR002020 : Citrate synthase-like (Interpro link)

    Interpro description:

    Citrate synthase is a member of a small family of enzymes that can directly form a carbon-carbon bond without the presence of metal ion cofactors. It catalyses the first reaction in the Krebs' cycle, namely the conversion of oxaloacetate and acetyl-coenzyme A into citrate and coenzyme A. This reaction is important for energy generation and for carbon assimilation. The reaction proceeds via a non-covalently bound citryl-coenzyme A intermediate in a 2-step process (aldol-Claisen condensation followed by the hydrolysis of citryl-CoA).

    Citrate synthase enzymes are found in two distinct structural types: type I enzymes (found in eukaryotes, Gram-positive bacteria and archaea) form homodimers and have shorter sequences than type II enzymes, which are found in Gram-negative bacteria and are hexameric in structure. In both types, the monomer is composed of two domains: a large alpha-helical domain consisting of two structural repeats, where the second repeat is interrupted by a small alpha-helical domain. The cleft between these domains forms the active site, where both citrate and acetyl-coenzyme A bind. The enzyme undergoes a conformational change upon binding of the oxaloacetate ligand, whereby the active site cleft closes over in order to form the acetyl-CoA binding site. The energy required for domain closure comes from the interaction of the enzyme with the substrate. Type II enzymes possess an extra N-terminal beta-sheet domain, and some type II enzymes are allosterically inhibited by NADH.

    This entry represents types I and II citrate synthase enzymes, as well as the related enzymes 2-methylcitrate synthase and ATP citrate synthase. 2-methylcitrate synthase catalyses the conversion of oxaloacetate and propanoyl-CoA into (2R,3S)-2-hydroxybutane-1,2,3-tricarboxylate and coenzyme A. This enzyme is induced during bacterial growth on propionate, while type II hexameric citrate synthase is constitutive. ATP citrate synthase (also known as ATP citrate lyase) catalyses the MgATP-dependent, CoA-dependent cleavage of citrate into oxaloacetate and acetyl-CoA, a key step in the reductive tricarboxylic acid pathway of CO2 assimilation used by a variety of autotrophic bacteria and archaea to fix carbon dioxide. ATP citrate synthase is composed of two distinct subunits. In eukaryotes, ATP citrate synthase is a homotetramer of a single large polypeptide, and is used to produce cytosolic acetyl-CoA from citrate produced in the mitochondria.

    Proteins where this domain is known:
    PY01660   


    PF00288 - GHMP_kinases_N (Pfam link)

    Interpro entry IPR006204 : GHMP kinase (Interpro link)

    Pfam description:
    This family includes homoserine kinases, galactokinases and mevalonate kinases.

    Interpro description:

    The galacto-, homoserine, mevalonate and phosphomevalonate kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved in the binding of ATP. This group of kinases has been called 'GHMP' (from the first letter of their substrates).

    Proteins where this domain is known:
    PY04665   


    PF00289 - CPSase_L_chain (Pfam link)

    Interpro entry IPR005481 : Carbamoyl phosphate synthase, large subunit, N-terminal (Interpro link)

    Pfam description:
    Carbamoyl-phosphate synthase catalyses the ATP-dependent synthesis of carbamyl-phosphate from glutamine or ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or pyrimidines. The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesise carbamoyl phosphate. See Pfam:PF00988. The small chain has a GATase domain in the carboxyl terminus. See Pfam:PF00117.

    Interpro description:

    Carbamoyl phosphate synthase (CPSase) is a heterodimeric enzyme composed of a small and a large subunit (with the exception of CPSase III, see below). CPSase catalyses the synthesis of carbamoyl phosphate from bicarbonate, ATP and glutamine or ammonia, and represents the first committed step in pyrimidine and arginine biosynthesis in prokaryotes and eukaryotes, and in the urea cycle in most terrestrial vertebrates. CPSase has three active sites, one in the small subunit and two in the large subunit. The small subunit contains the glutamine binding site and catalyses the hydrolysis of glutamine to glutamate and ammonia. The large subunit has two homologous carboxy phosphate domains, both of which have ATP-binding sites; however, the N-terminal carboxy phosphate domain catalyses the phosphorylation of bicarbonate, while the C-terminal domain catalyses the phosphorylation of the carbamate intermediate. The carboxy phosphate domain found duplicated in the large subunit of CPSase is also present as a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase (ACC), propionyl-CoA carboxylase (PCCase), pyruvate carboxylase (PC) and urea carboxylase.

    Most prokaryotes carry one form of CPSase that participates in both arginine and pyrimidine biosynthesis; however, certain bacteria can have separate forms. The large subunit in bacterial CPSase has four structural domains: the carboxy phosphate domain 1, the oligomerisation domain, the carbamoyl phosphate domain 2 and the allosteric domain. CPSase heterodimers from Escherichia coli contain two molecular tunnels: an ammonia tunnel and a carbamate tunnel. These inter-domain tunnels connect the three distinct active sites, and function as conduits for the transport of unstable reaction intermediates (ammonia and carbamate) between successive active sites. The catalytic mechanism of CPSase involves the diffusion of carbamate through the interior of the enzyme from the site of synthesis within the N-terminal domain of the large subunit to the site of phosphorylation within the C-terminal domain.

    Eukaryotes have two distinct forms of CPSase: a mitochondrial enzyme (CPSase I) that participates in both arginine biosynthesis and the urea cycle; and a cytosolic enzyme (CPSase II) involved in pyrimidine biosynthesis. CPSase II occurs as part of a multi-enzyme complex along with aspartate transcarbamoylase and dihydroorotase; this complex is referred to as the CAD protein. The hepatic expression of CPSase is transcriptionally regulated by glucocorticoids and/or cAMP. There is a third form of the enzyme, CPSase III, found in fish, which uses glutamine as a nitrogen source instead of ammonia. CPSase III is closely related to CPSase I, and is composed of a single polypeptide that may have arisen from gene fusion of the glutaminase and synthetase domains.

    This entry represents the N-terminal domain of the large subunit of carbamoyl phosphate synthase. This domain can also be found in certain other related proteins.

    Proteins where this domain is known:
    PY01695    PY04781    PY06257   


    PF00293 - NUDIX (Pfam link)

    Interpro entry IPR000086 : NUDIX hydrolase, core (Interpro link)

    Interpro description:
    MutT is a small bacterial protein (~12-15 kDa) involved in the GO system, which is responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite dA and dC residues of template DNA with near equal efficiency, leading to A:T to C:G transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate, with the concomitant release of pyrophosphate. A short conserved N-terminal region of MutT (designated the MutT domain) is also found in a variety of other prokaryotic, viral and eukaryotic proteins.

    The generic name 'NUDIX hydrolases' (NUcleoside DIphosphate linked to some other moiety X) has been coined for this domain family. The family can be divided into a number of subgroups, of which MutT antimutagenic activity represents only one type; most of the rest hydrolyse diverse nucleoside diphosphate derivatives (including ADP-ribose, GDP-mannose, TDP-glucose, NADH, UDP-sugars, dNTP and NTP).

    Proteins where this domain is known:
    PY04487    PY07320   


    PF00294 - PfkB (Pfam link)

    Interpro entry IPR011611 : (Interpro link)

    Pfam description:
    This family includes a variety of carbohydrate and pyrimidine kinases.

    Interpro description:

    This entry includes a variety of carbohydrate and pyrimidine kinases. The family includes phosphomethylpyrimidine kinase. This enzyme is part of the thiamine pyrophosphate (TPP) synthesis pathway; TPP is an essential cofactor for many enzymes.

    Proteins where this domain is known:
    PY01633   


    PF00297 - Ribosomal_L3 (Pfam link)

    Interpro entry IPR000597 : Ribosomal protein L3 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase centre of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, includes bacterial, red algal, cyanelle, mammalian, yeast and Arabidopsis thaliana L3 proteins; archaeal Haloarcula marismortui HmaL3 (HL1); and yeast mitochondrial YmL9.

    Proteins where this domain is known:
    PY04814    PY05881    PY07001   


    PF00298 - Ribosomal_L11 (Pfam link)

    Interpro entry IPR000911 : Ribosomal protein L11 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial, plant chloroplast, red algal chloroplast, cyanelle and archaebacterial L11; and mammalian, plant and yeast L12 (YL15). L11 is a protein of 140 to 165 amino-acid residues. In E. coli, the C-terminal half of L11 has been shown to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.

    Proteins where this domain is known:
    PY01745    PY04344   


    PF00300 - PGAM (Pfam link)

    Interpro entry IPR013078 : (Interpro link)

    Pfam description:
    Y019_MYCTU and YK23_YEAST are not included in the Prosite entry. However these sequences are significantly similar and contain identical active site residues.

    Interpro description:

    Phosphoglycerate mutase (PGAM) and bisphosphoglycerate mutase (BPGM) are structurally related enzymes that catalyse reactions involving the transfer of phospho groups between the three carbon atoms of phosphoglycerate. Both enzymes can catalyse three different reactions with different specificities: the isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate (2,3-DPG) as the primer of the reaction; the synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer; and the degradation of 2,3-DPG to 3-PGA (phosphatase activity).

    In mammals, PGAM is a dimeric protein with two isoforms, the M (muscle) and B (brain) forms. In yeast, PGAM is a tetrameric protein.

    BPGM is a dimeric protein and is found mainly in erythrocytes where it plays a major role in regulating haemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration. The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate.

    A number of other proteins also contain this domain, including the bifunctional enzyme 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase, which catalyses both the synthesis and the degradation of fructose-2,6-bisphosphate, and bacterial alpha-ribazole-5'-phosphate phosphatase, which is involved in cobalamin biosynthesis.

    Proteins where this domain is known:
    PY00543    PY03959    PY07389   


    PF00303 - Thymidylat_synt (Pfam link)

    Interpro entry IPR000398 : Thymidylate synthase, C-terminal (Interpro link)

    Pfam description:
    Swiss:P28176 is not included as a member of this family: although it is annotated as such, it has no significant sequence similarity to other members.

    Interpro description:
    Thymidylate synthase catalyzes the reductive methylation of dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate:
     5,10-methylenetetrahydrofolate + dUMP = dihydrofolate + dTMP 
    This provides the sole de novo pathway for production of dTMP, and thymidylate synthase is the only enzyme in folate metabolism in which 5,10-methylenetetrahydrofolate is oxidised during one-carbon transfer. The enzyme is essential for regulating the balanced supply of the four DNA precursors in normal DNA replication: defects in the enzyme activity affecting the regulation process cause various biological and genetic abnormalities, such as thymineless death. The enzyme is an important target for certain chemotherapeutic drugs. Thymidylate synthase is an enzyme of about 30 to 35 kDa in most species, except in protozoa and plants, where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain. A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP intermediate). The sequence around the active site of this enzyme is conserved from phages to vertebrates.

    Proteins where this domain is known:
    PY04370   


    PF00306 - ATP-synt_ab_C (Pfam link)

    Interpro entry IPR000793 : ATPase, F1/V1/A1 complex, alpha/beta subunit, C-terminal (Interpro link)

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    This entry represents the alpha and beta subunits found in the F1, V1 and A1 complexes of F-, V- and A-ATPases, respectively (sometimes called the A and B subunits in V- and A-ATPases). The F-ATPases (or F1F0-ATPases), V-ATPases (or V1V0-ATPases) and A-ATPases (or A1A0-ATPases) are composed of two linked complexes: the F1, V1 or A1 complex contains the catalytic core that synthesizes/hydrolyses ATP, and the F0, V0 or A0 complex forms the membrane-spanning pore. The F-, V- and A-ATPases all contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis.

    In F-ATPases, there are three copies each of the alpha and beta subunits that form the catalytic core of the F1 complex, while the remaining F1 subunits (gamma, delta, epsilon) form part of the stalks. There is a substrate-binding site on each of the alpha and beta subunits: those on the beta subunits are catalytic, while those on the alpha subunits are regulatory. The alpha and beta subunits form a cylinder that is attached to the central stalk. Rotation of the gamma subunit, which is itself driven by the movement of protons through the c subunits of the F0 complex, induces a sequence of conformational changes in the alpha/beta subunits that leads to the formation of ATP from ADP.
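
    The rotation-driven cycle described above is usually explained by Boyer's binding-change model, which this entry does not name. The sketch below assumes that model in simplified form: each of the three catalytic beta sites is in one of three states (O, open; L, loose; T, tight), and every 120-degree step of gamma advances all three sites by one state, so a site that moves from T to O releases one ATP. The state labels, the transition table and the release rule are assumptions of this sketch, not part of the InterPro description.

```python
from typing import List

# Simplified binding-change cycle (an assumption of this sketch):
# L (loose, holds ADP + Pi) -> T (tight, ATP formed) -> O (open, ATP released) -> L
NEXT_STATE = {"L": "T", "T": "O", "O": "L"}


def rotate_gamma(states: List[str]) -> List[str]:
    """Advance every catalytic beta site by one state per 120-degree turn of gamma."""
    return [NEXT_STATE[s] for s in states]


def atp_released(before: List[str], after: List[str]) -> int:
    """Count sites that went from T (ATP bound tightly) to O (ATP released)."""
    return sum(1 for b, a in zip(before, after) if b == "T" and a == "O")


states = ["O", "L", "T"]  # one site in each state, as the model requires
total = 0
for _ in range(3):  # three 120-degree steps = one full rotation
    new_states = rotate_gamma(states)
    total += atp_released(states, new_states)
    states = new_states

print(states, total)  # ['O', 'L', 'T'] 3
```

    One full turn returns the sites to their starting states and releases three ATP, one per catalytic beta site, which is the stoichiometry the three-site model predicts.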

    In V- and A-ATPases, the alpha/A and beta/B subunits of the V1 or A1 complex are homologous to the alpha and beta subunits in the F1 complex of F-ATPases, except that the alpha subunit is catalytic and the beta subunit is regulatory.

    The alpha/A and beta/B subunits can each be divided into three regions, or domains, centred around the ATP-binding pocket and defined on the basis of structure and function; the central region is the nucleotide-binding domain. This entry represents the C-terminal domain of the alpha/A/beta/B subunits, which forms a left-handed superhelix composed of 4-5 individual helices. The C-terminal domain can vary between the alpha and beta subunits, and between different ATPases.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY00963    PY01556    PY05102    PY05971   


    PF00307 - CH (Pfam link)

    Interpro entry IPR001715 : (Interpro link)

    Pfam description:
    The CH domain is found in both cytoskeletal proteins and signal transduction proteins. The CH domain is involved in actin binding in some members of the family; however, in calponins there is evidence that the CH domain is not involved in the protein's actin-binding activity. Most proteins have two copies of the CH domain; however, some proteins, such as calponin and Swiss:P15498, have only a single copy.

    Interpro description:

    The calponin homology domain (also known as CH-domain) is a superfamily of actin-binding domains found in both cytoskeletal proteins and signal transduction proteins. It comprises the following groups of actin-binding domains:

    A comprehensive review of proteins containing this type of actin-binding domains is given in.

    The CH domain is involved in actin binding in some members of the family; however, in calponins there is evidence that the CH domain is not involved in the protein's actin-binding activity. Most proteins have two copies of the CH domain; however, some proteins, such as calponin and the human vav proto-oncogene, have only a single copy. The structure of a CH domain has recently been solved.

    Proteins where this domain is known:
    PY06631   


    PF00310 - GATase_2 (Pfam link)

    Interpro entry IPR000583 : Glutamine amidotransferase, class-II (Interpro link)

    Interpro description:

    A large group of biosynthetic enzymes can catalyse the removal of the amide nitrogen of glutamine as ammonia and then transfer it to a substrate to form a new carbon-nitrogen bond. This catalytic activity is known as glutamine amidotransferase (GATase). The GATase domain exists either as a separate polypeptide subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities, two classes of GATase domains have been identified: class-I (also known as trpG-type) and class-II (also known as purF-type). Enzymes containing class-II GATase domains include amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase), which catalyses the first step in purine biosynthesis (gene purF in bacteria, ADE4 in yeast); glucosamine--fructose-6-phosphate aminotransferase, which catalyses the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in yeast); and asparagine synthetase (glutamine-hydrolysing), which is responsible for the synthesis of asparagine from aspartate and glutamine. A cysteine is present at the N-terminal extremity of the mature form of all these enzymes.
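
    The two-part activity described above can be written as a pair of half-reactions, using the glucosamine--fructose-6-phosphate aminotransferase reaction named in the entry as the worked example. "Acceptor" is a generic placeholder for the substrate of the synthase domain, and the water in the first step comes from general enzyme chemistry rather than from this entry.

```latex
\begin{align*}
\text{GATase domain:}\quad & \text{Gln} + \mathrm{H_2O} \longrightarrow \text{Glu} + \mathrm{NH_3}\\
\text{synthase domain:}\quad & \mathrm{NH_3} + \text{acceptor} \longrightarrow \text{aminated product}\\[4pt]
\text{example (glmS/GFA1):}\quad & \text{fructose-6-P} + \text{Gln} \longrightarrow \text{glucosamine-6-P} + \text{Glu}
\end{align*}
```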

    This domain is found in a number of cysteine peptidases belonging to MEROPS peptidase family C44 and their non-peptidase homologs.

    Proteins where this domain is known:
    PY00101    PY02906    PY03719   


    PF00311 - PEPcase (Pfam link)

    Interpro entry IPR001449 : Phosphoenolpyruvate carboxylase (Interpro link)

    Interpro description:

    Phosphoenolpyruvate carboxylase (PEPCase), an enzyme found in all multicellular plants, catalyses the formation of oxaloacetate from phosphoenolpyruvate (PEP) and a bicarbonate (hydrogencarbonate) ion. This reaction is harnessed by C4 plants to capture and concentrate carbon dioxide into the photosynthetic bundle sheath cells. It also plays a key role in the nitrogen fixation pathway in legume root nodules: here it functions in concert with glutamine, glutamate and asparagine synthetases and aspartate aminotransferase to synthesise aspartate and asparagine, the major nitrogen transport compounds in various amine-transporting plant species.
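
    The carboxylation described above can be written as a single balanced step. The release of inorganic phosphate (P_i) and the Mg2+ requirement shown over the arrow come from general knowledge of PEPCase rather than from this entry.

```latex
\[
\text{PEP} + \mathrm{HCO_3^-} \xrightarrow{\;\mathrm{Mg^{2+}}\;} \text{oxaloacetate} + \mathrm{P_i}
\]
```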

    PEPCase also plays an anaplerotic role in bacteria and plant cells, supplying oxaloacetate to the TCA cycle, which requires continuous input of C4 molecules to replenish the intermediates removed for amino acid biosynthesis. The C-terminal region of the enzyme contains the active site, which includes a conserved lysine residue involved in substrate binding, and other conserved residues important for the catalytic mechanism.

    Proteins where this domain is known:
    PY00206   


    PF00312 - Ribosomal_S15 (Pfam link)

    Interpro entry IPR000589 : Ribosomal protein S15 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). Usually they decorate the rRNA cores of the subunits.
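
    The elongation cycle described above (delivery to the A site, peptidyl transfer, then translocation with release of the deacylated tRNA) behaves like a small state machine. The toy model below is a sketch under simplifying assumptions: it tracks only the A and P sites, treats the exit site(s) as a counter, ignores GTP hydrolysis and proofreading, and starts from a hypothetical initiator "Met" already in the P site.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ribosome:
    """Toy model of the A and P sites during one round of elongation per residue."""
    p_site: List[str] = field(default_factory=lambda: ["Met"])  # hypothetical initiator peptidyl-tRNA
    a_site: Optional[List[str]] = None  # tRNA in the A site and the residues it carries
    exited: int = 0                     # deacylated tRNAs released through the exit site(s)

    def deliver(self, amino_acid: str) -> None:
        """Aminoacyl-tRNA (as a complex with EF-Tu and GTP) enters the empty A site."""
        if self.a_site is not None:
            raise RuntimeError("A site is occupied")
        self.a_site = [amino_acid]

    def transfer(self) -> None:
        """Peptidyl transfer: the chain on the P-site tRNA joins the A-site aminoacyl-tRNA."""
        if not self.a_site or not self.p_site:
            raise RuntimeError("need peptidyl-tRNA in P and aminoacyl-tRNA in A")
        self.a_site = self.p_site + self.a_site  # new peptidyl-tRNA, one residue longer
        self.p_site = []                          # the P-site tRNA is now deacylated

    def translocate(self) -> None:
        """EF-G/GTP step: the new peptidyl-tRNA moves to P and the deacylated tRNA leaves."""
        if self.a_site is None or self.p_site:
            raise RuntimeError("need peptidyl-tRNA in A and deacylated tRNA in P")
        self.p_site, self.a_site = self.a_site, None
        self.exited += 1


ribo = Ribosome()
for aa in ["Gly", "Ser", "Lys"]:
    ribo.deliver(aa)
    ribo.transfer()
    ribo.translocate()

print(ribo.p_site, ribo.exited)  # ['Met', 'Gly', 'Ser', 'Lys'] 3
```

    Each loop iteration is one elongation cycle: the chain grows by exactly one residue and exactly one deacylated tRNA leaves, and the guards reject steps taken out of order.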

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial and plant chloroplast S15; archaeal Haloarcula marismortui HmaS15 (HS11); yeast mitochondrial S28; and mammalian, yeast, Brugia pahangi and Wuchereria bancrofti S13. S15 is a protein of 80 to 250 amino-acid residues.

    Proteins where this domain is known:
    PY02800    PY03370    PY03929   


    PF00317 - Ribonuc_red_lgN (Pfam link)

    Interpro entry IPR013509 : Ribonucleotide reductase large subunit, N-terminal (Interpro link)

    Interpro description:

    Ribonucleotide reductase (RNR) catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. RNRs divide into three classes on the basis of their metallocofactor usage: class I RNRs, found in eukaryotes, bacteria, bacteriophage and viruses, use a diiron-tyrosyl radical; class II RNRs, found in bacteria, bacteriophage, algae and archaea, use coenzyme B12 (adenosylcobalamin, AdoCbl); and class III RNRs, found in anaerobic bacteria and bacteriophage, use an FeS cluster and S-adenosylmethionine to generate a glycyl radical. Many organisms have more than one class of RNR present in their genomes.

    Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues); class II RNRs are less complex, using the small molecule B12 in place of the small subunit.

    The reduction of ribonucleotides to deoxyribonucleotides involves the transfer of free radicals: the function of each metallocofactor is to generate an active-site thiyl radical. This thiyl radical then initiates the nucleotide reduction process by hydrogen atom abstraction from the ribonucleotide. The radical-based reaction involves five cysteines: two are located on adjacent anti-parallel strands in a new type of ten-stranded alpha/beta-barrel; two others reside at the carboxyl end in a flexible arm; and the fifth, in a loop in the centre of the barrel, is positioned to initiate the radical reaction. There are several regions of sequence similarity in the large chain of prokaryotes, eukaryotes and viruses, spread across three domains: an N-terminal domain common to the mammalian and bacterial enzymes; a C-terminal domain common to the mammalian and viral ribonucleotide reductases; and a central domain common to all three.

    Proteins where this domain is known:
    PY03473   


    PF00318 - Ribosomal_S2 (Pfam link)

    Interpro entry IPR001865 : Ribosomal protein S2 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal S2 proteins have been shown to belong to a family that includes 40S ribosomal subunit 40kDa proteins, putative laminin-binding proteins, the NAB-1 protein and a 29.3kDa protein from Haloarcula marismortui. The laminin-receptor proteins are thus predicted to be the eukaryotic homologue of the eubacterial S2 ribosomal proteins.

    Proteins where this domain is known:
    PY06059   


    PF00326 - Peptidase_S9 (Pfam link)

    Interpro entry IPR001375 : Peptidase S9, prolyl oligopeptidase active site region (Interpro link)

    Interpro description:

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They cover a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families of serine protease (denoted S1 to S66) have been identified, and these are grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans, and these structures indicate that some clans are totally unrelated, suggesting different evolutionary origins for the serine peptidases.

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but it is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
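
    The point that the linear order of the triad "commonly reflects" clan membership can be made concrete: sort the three catalytic residues by sequence position and look the resulting string up in the three orders given above. This is a sketch only; the function names are hypothetical, the residue positions in the example are the classical chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221) numbering, and residue order is a heuristic hint, not a substitute for MEROPS structural classification.

```python
from typing import Dict

# Triad order (N- to C-terminus) -> clan, as stated in the text above.
TRIAD_ORDER_TO_CLAN: Dict[str, str] = {
    "HDS": "PA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase clan)",
}


def triad_order(positions: Dict[str, int]) -> str:
    """Return the one-letter order of His (H), Asp (D) and Ser (S) by sequence position."""
    return "".join(sorted(positions, key=lambda residue: positions[residue]))


def guess_clan(positions: Dict[str, int]) -> str:
    """Map a triad order to a clan; orders not listed in the text are reported as unknown."""
    return TRIAD_ORDER_TO_CLAN.get(triad_order(positions), "unknown order")


print(guess_clan({"H": 57, "D": 102, "S": 195}))  # PA (chymotrypsin clan)
print(guess_clan({"D": 32, "H": 64, "S": 221}))   # SB (subtilisin clan)
```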

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This domain covers the active site serine of the serine peptidases belonging to MEROPS peptidase family S9 (prolyl oligopeptidase family, clan SC). The protein fold of the peptidase domain for members of this family resembles that of serine carboxypeptidase D, the type example of clan SC. Examples of protein families containing this domain are:

    These proteins belong to MEROPS peptidase families S9A, S9B and S9C.

    Proteins where this domain is known:
    PY02677   


    PF00327 - Ribosomal_L30 (Pfam link)

    Interpro entry IPR000517 : Ribosomal protein L30p/L7e, N-terminal (Interpro link)

    Pfam description:
    This family includes prokaryotic L30 and eukaryotic L7.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial and archaeal L30, yeast mitochondrial L33, and Drosophila melanogaster, Dictyostelium discoideum (slime mold), fungal and mammalian L7 ribosomal proteins. Bacterial L30 proteins are small, about 60 residues long; archaeal L30 proteins are about 150 residues long; and eukaryotic L7 proteins are about 250 to 270 residues long.

    This entry represents the core domain of prokaryotic L30 and eukaryotic L7.

    Proteins where this domain is known:
    PY05756   


    PF00328 - Acid_phosphat_A (Pfam link)

    Interpro entry IPR000560 : Histidine acid phosphatase (Interpro link)

    Interpro description:

    Acid phosphatases are a heterogeneous group of proteins that hydrolyse phosphate esters, optimally at low pH. It has been shown that a number of acid phosphatases, from both prokaryotes and eukaryotes, share two regions of sequence similarity, each centred around a conserved histidine residue. These two histidines seem to be involved in the enzymes' catalytic mechanism. The first histidine is located in the N-terminal section and forms a phosphohistidine intermediate, while the second is located in the C-terminal section and possibly acts as a proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and include:

    Proteins where this domain is known:
    PY01267    PY02066   


    PF00330 - Aconitase (Pfam link)

    Interpro entry IPR001030 : Aconitase/3-isopropylmalate dehydratase large subunit, alpha/beta/alpha (Interpro link)

    Interpro description:

    3-isopropylmalate dehydratase (or isopropylmalate isomerase) catalyses the stereo-specific isomerisation of 2-isopropylmalate and 3-isopropylmalate via the formation of 2-isopropylmaleate. This enzyme performs the second step in the biosynthesis of leucine and is present in most prokaryotes and many fungal species. The prokaryotic enzyme is a heterodimer composed of a large (LeuC) and a small (LeuD) subunit, while the fungal form is a monomeric enzyme. Both forms of isopropylmalate isomerase are related and are part of the larger aconitase family. Aconitases are mostly monomeric proteins that share four domains in common and contain a single, labile [4Fe-4S] cluster. Three structural domains (1, 2 and 3) are tightly packed around the iron-sulphur cluster, while a fourth domain (4) forms a deep active-site cleft. The prokaryotic enzyme is encoded by two adjacent genes, leuC and leuD, corresponding to aconitase domains 1-3 and 4, respectively. LeuC does not bind an iron-sulphur cluster. Some prokaryotic isopropylmalate dehydratases are thought to also function as homoaconitase, converting cis-homoaconitate to homoisocitric acid in lysine biosynthesis. Homoaconitase has been identified in higher fungi (mitochondria), in several archaea and in one thermophilic species of bacteria, Thermus thermophilus.

    Aconitase (aconitate hydratase) is an iron-sulphur protein that contains a [4Fe-4S] cluster and catalyses the interconversion of isocitrate and citrate via a cis-aconitate intermediate. Aconitase functions in both the TCA and glyoxylate cycles; however, unlike the majority of iron-sulphur proteins, which function as electron carriers, the [4Fe-4S] cluster of aconitase reacts directly with an enzyme substrate. In eukaryotes there is a cytosolic form (cAcn) and a mitochondrial form (mAcn) of the enzyme. In bacteria there are also two forms, aconitase A (AcnA) and B (AcnB). Several aconitases are known to be multi-functional enzymes with a second non-catalytic, but essential, function that arises when the cellular environment changes, such as when iron levels drop. Eukaryotic cAcn and mAcn, and bacterial AcnA, have the same domain organisation, consisting of three N-terminal alpha/beta/alpha domains and a linker region, followed by a C-terminal 'swivel' domain with a beta/beta/alpha structure (1-2-3-linker-4), although mAcn is smaller than cAcn. However, bacterial AcnB has a different organisation: it contains an N-terminal HEAT-like domain, followed by the 'swivel' domain, then the three alpha/beta/alpha domains (HEAT-4-1-2-3). Below is a description of some of the multi-functional activities associated with different aconitases.
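
    Both aconitase and isopropylmalate isomerase, described above, move a hydroxyl group by a dehydration followed by a rehydration through an unsaturated intermediate. Written out as steps (the explicit water molecules come from general enzyme chemistry rather than from this entry):

```latex
\begin{align*}
\text{aconitase:}\quad & \text{citrate} \rightleftharpoons \textit{cis}\text{-aconitate} + \mathrm{H_2O} \rightleftharpoons \text{isocitrate}\\
\text{isopropylmalate isomerase:}\quad & \text{2-isopropylmalate} \rightleftharpoons \text{2-isopropylmaleate} + \mathrm{H_2O} \rightleftharpoons \text{3-isopropylmalate}
\end{align*}
```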

    This entry represents a region containing three domains, each with a three-layer alpha/beta/alpha topology. This region represents the [4Fe-4S] cluster-binding region found at the N terminus of eukaryotic mAcn, cAcn/IRP1 and IRP2, and bacterial AcnA, but at the C terminus of bacterial AcnB. This domain is also found in the large subunit of isopropylmalate dehydratase (LeuC).

    More information about these proteins can be found at Protein of the Month: Aconitase.

    Proteins where this domain is known:
    PY00319   


    PF00333 - Ribosomal_S5 (Pfam link)

    Interpro entry IPR013810 : Ribosomal protein S5, N-terminal (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S5 is one of the proteins from the small ribosomal subunit, and is a protein of 166 to 254 amino-acid residues. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial, cyanelle, red algal chloroplast, archaeal and fungal mitochondrial S5; mammalian, Caenorhabditis elegans, Drosophila and plant S2; and yeast S4 (SUP44).

    This entry represents the N-terminal domain of ribosomal protein S5, which has an alpha-beta(3)-alpha structure that folds into two layers, alpha/beta.

    Proteins where this domain is known:
    PY06704   


    PF00334 - NDK (Pfam link)

    Interpro entry IPR001564 : Nucleoside diphosphate kinase, core (Interpro link)

    Interpro description:

    Nucleoside diphosphate kinases (NDK) are enzymes required for the synthesis of nucleoside triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization.

    In eukaryotes, there seems to be a small family of NDK isozymes each of which acts in a different subcellular compartment and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and B). By random association (A6, A5B...AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric point.
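
    Random association of two chain types into hexamers, as described above, gives seven possible compositions (A6 through B6). If one further assumes, purely for illustration, that each of the six positions is filled independently with an A chain at probability p_a, the expected share of each isoenzyme follows a binomial distribution. The function and parameter names are hypothetical, and real cells need not supply the two chains in equal amounts.

```python
from fractions import Fraction
from math import comb
from typing import Dict


def isoform_fractions(n_chains: int = 6, p_a: Fraction = Fraction(1, 2)) -> Dict[str, Fraction]:
    """Expected share of each A_k B_(n-k) oligomer under random association.

    Assumes every chain position is filled independently, taking an A chain
    with probability p_a (hypothetical parameter; 1/2 means equal supply).
    """
    p_b = 1 - p_a
    return {
        f"A{k}B{n_chains - k}": comb(n_chains, k) * p_a**k * p_b**(n_chains - k)
        for k in range(n_chains, -1, -1)  # A6B0, A5B1, ..., A0B6
    }


dist = isoform_fractions()
print(len(dist))           # 7
print(dist["A3B3"])        # 5/16
print(sum(dist.values()))  # 1
```

    With equal chain supply, the mixed A3B3 hexamer is the most abundant species (20 of 64 possible arrangements), while the homohexamers A6 and B6 each make up only 1/64.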

    NDKs are proteins of 17 kDa that act via a ping-pong mechanism in which a histidine residue is phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer its phosphate group to any NDP to produce an NTP.
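
    The ping-pong mechanism described above splits into two half-reactions that share a phosphohistidine enzyme intermediate (written E-His~P); adding them gives the net phosphate transfer from ATP to an NDP.

```latex
\begin{align*}
\text{E-His} + \text{ATP} &\rightleftharpoons \text{E-His}{\sim}\mathrm{P} + \text{ADP}\\
\text{E-His}{\sim}\mathrm{P} + \text{NDP} &\rightleftharpoons \text{E-His} + \text{NTP}\\[4pt]
\text{net:}\quad \text{ATP} + \text{NDP} &\xrightarrow{\;\mathrm{Mg^{2+}}\;} \text{ADP} + \text{NTP}
\end{align*}
```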

    NDK isozymes have been sequenced from prokaryotic and eukaryotic sources. It has also been shown that the Drosophila awd (abnormal wing discs) protein is a microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been highly conserved through evolution. A single histidine residue, conserved in all known NDK isozymes, is involved in the catalytic mechanism; the PROSITE signature pattern for NDK contains this residue.

    Proteins where this domain is known:
    PY04911   


    PF00338 - Ribosomal_S10 (Pfam link)

    Interpro entry IPR001848 : Ribosomal protein S10 (Interpro link)

    Pfam description:
    This family includes small ribosomal subunit S10 from prokaryotes and S20 from eukaryotes.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large subunit 23S rRNA, whereas proteins probably have a greater role in eukaryotic ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA. The small subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins.
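    The three assembly classes of small-subunit proteins follow a dependency rule that can be written down directly. The sketch below assumes a hypothetical input that maps each protein to the set of proteins it requires. An empty set means the protein binds 16S rRNA directly. Protein names in the example are placeholders, not real assembly-map data, and the dependency graph is assumed to be acyclic.

```python
from typing import Dict, Set


def classify_binding(deps: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each protein to the primary, secondary or tertiary binding class.

    `deps` maps a protein name to the set of proteins whose prior binding it
    requires; an empty set (or a name absent from `deps`) means the protein
    binds 16S rRNA directly. The graph must be acyclic.
    """
    classes: Dict[str, str] = {}

    def visit(protein: str) -> str:
        if protein in classes:
            return classes[protein]
        required = deps.get(protein, set())
        if not required:
            cls = "primary"  # binds rRNA directly and independently
        elif all(visit(q) == "primary" for q in required):
            cls = "secondary"  # depends only on primary binders
        else:
            cls = "tertiary"  # depends on at least one secondary/tertiary binder
        classes[protein] = cls
        return cls

    for protein in deps:
        visit(protein)
    return classes
```

    For example, with the placeholder map {"pA": set(), "pB": {"pA"}, "pC": {"pB"}}, pA is classed as primary, pB as secondary and pC as tertiary.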

    The small ribosomal subunit protein S10 consists of about 100 amino acid residues. In Escherichia coli, S10 is involved in binding tRNA to the ribosome, and also operates as a transcriptional elongation factor. Experimental evidence has revealed that S10 has virtually no groups exposed on the ribosomal surface, and is one of the "split proteins": a discrete group of proteins that is selectively removed from 30S subunits under low salt conditions and is required for the formation of activated 30S reconstitution intermediate (RI*) particles. S10 belongs to a family of proteins that includes: bacterial S10; algal chloroplast S10; cyanelle S10; archaebacterial S10; Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10; Arabidopsis thaliana mitochondrial S10 (nuclear encoded); vertebrate S20; plant S20; and yeast URP2.

    Proteins where this domain is known:
    PY00943   


    PF00342 - PGI (Pfam link)

    Interpro entry IPR001672 : Phosphoglucose isomerase (PGI) (Interpro link)

    Pfam description:
    Phosphoglucose isomerase catalyses the interconversion of glucose-6-phosphate and fructose-6-phosphate.

    Interpro description:

    Phosphoglucose isomerase (PGI) is a dimeric enzyme that catalyses the reversible isomerization of glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. The multifunctional protein PGI is also known as neuroleukin (a neurotrophic factor that mediates the differentiation of neurons), autocrine motility factor (a tumour-secreted cytokine that regulates cell motility), differentiation and maturation mediator and myofibril-bound serine proteinase inhibitor, and has different roles inside and outside the cell. In the cytoplasm, it catalyses the second step in glycolysis, while outside the cell it serves as a nerve growth factor and cytokine.

    PGI from Bacillus stearothermophilus has an open twisted alpha/beta structural motif consisting of two globular domains and two protruding parts. It has been suggested that the top part of the large domain together with one of the protruding loops might participate in inducing the neurotrophic activity. The structure of rabbit muscle phosphoglucose isomerase complexed with various inhibitors shows that the enzyme is a dimer with two alpha/beta-sandwich domains in each subunit. The location of the bound D-gluconate 6-phosphate inhibitor leads to the identification of residues involved in substrate specificity. In addition, the positions of amino acid residues that are substituted in the genetic disease nonspherocytic hemolytic anemia suggest how these substitutions can result in altered catalysis or protein stability.

    Proteins where this domain is known:
    PY00618   


    PF00344 - SecY (Pfam link)

    Interpro entry IPR002208 : SecY protein (Interpro link)

    Interpro description:

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF). The chaperone protein SecB is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices.

    The eubacterial secY protein interacts with the signal sequences of secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains 10 transmembrane (TM), 6 cytoplasmic and 5 periplasmic regions.

    Cytoplasmic regions 2 and 3, and TM domains 1, 2, 4, 5, 7 and 10 are well conserved: the conserved cytoplasmic regions are believed to interact with cytoplasmic secretion factors, while the TM domains may participate in protein export. Homologs of secY are found in archaebacteria. SecY is also encoded in the chloroplast genome of some algae where it could be involved in a prokaryotic-like protein export system across the two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.

    Proteins where this domain is known:
    PY02510   


    PF00347 - Ribosomal_L6 (Pfam link)

    Interpro entry IPR000702 : Ribosomal protein L6 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    L6 is a protein from the large (50S) subunit. In Escherichia coli, it is located in the aminoacyl-tRNA binding site of the peptidyltransferase centre, and is known to bind directly to 23S rRNA. It belongs to a family of ribosomal proteins, including L6 from bacteria, cyanelles (structures that perform similar functions to chloroplasts, but have structural and biochemical characteristics of Cyanobacteria) and mitochondria; and L9 from mammals, Drosophila, plants and yeast. L6 comprises 2 almost identical folds, suggesting that it was derived by the duplication of an ancient RNA-binding protein gene. Analysis reveals several sites on the protein surface where interactions with other ribosome components may occur, the N-terminus being involved in protein-protein interactions and the C-terminus containing possible RNA-binding sites.

    Proteins where this domain is known:
    PY01587   


    PF00348 - polyprenyl_synt (Pfam link)

    Interpro entry IPR000092 : Polyprenyl synthetase (Interpro link)

    Interpro description:
    A variety of isoprenoid compounds are synthesized by various organisms. For example, in eukaryotes the isoprenoid biosynthetic pathway is responsible for the synthesis of a variety of end products including cholesterol, dolichol and ubiquinone (coenzyme Q). In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones, and sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase enzymes, which catalyse a 1'-4 condensation between 5-carbon isoprene units. It has been shown that all the above enzymes share some regions of sequence similarity. Two of these regions are rich in aspartic-acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates.
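    Each 1'-4 condensation adds one five-carbon isoprene unit to the growing chain, so chain length grows in steps of five carbons. The sketch below assumes a C5 starting unit (dimethylallyl diphosphate). It uses the standard names of the C10 to C20 products for illustration only. The function and constant names are hypothetical.

```python
# Hypothetical helper: carbon count of a polyprenyl diphosphate chain built by
# successive 1'-4 (head-to-tail) condensations of C5 isoprene units.
ISOPRENE_CARBONS = 5

# Standard names of the short-chain products, for illustration only.
PRODUCT_NAMES = {
    10: "geranyl diphosphate (GPP)",
    15: "farnesyl diphosphate (FPP)",
    20: "geranylgeranyl diphosphate (GGPP)",
}


def chain_carbons(condensations: int, primer_carbons: int = ISOPRENE_CARBONS) -> int:
    """Return the chain's carbon count after `condensations` C5 additions."""
    if condensations < 0:
        raise ValueError("condensations must be non-negative")
    return primer_carbons + ISOPRENE_CARBONS * condensations


if __name__ == "__main__":
    for n in range(1, 4):
        carbons = chain_carbons(n)
        print(n, carbons, PRODUCT_NAMES.get(carbons, "longer polyprenyl chain"))
```

    Passing a larger primer_carbons value models chain elongation that starts from an existing allylic product rather than from the C5 unit.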

    Proteins where this domain is known:
    PY00446    PY05919   


    PF00349 - Hexokinase_1 (Pfam link)

    Interpro entry IPR001312 : Hexokinase (Interpro link)

    Pfam description:
    Hexokinase (EC:2.7.1.1) contains two structurally similar domains represented by this family and PFAM:PF03727. Some members of the family have two copies of each of these domains.

    Interpro description:

    Hexokinase is an important enzyme that catalyses the ATP-dependent conversion of aldo- and keto-hexose sugars to hexose-6-phosphate (H6P). The enzyme can catalyse this reaction on glucose, fructose, sorbitol and glucosamine, and as such is the first step in a number of metabolic pathways. The addition of a phosphate group to the sugar acts to trap it in the cell, since the negatively charged phosphate cannot easily traverse the plasma membrane.

    The enzyme is widely distributed in eukaryotes. There are three isozymes of hexokinase in yeast (PI, PII and glucokinase): isozymes PI and PII phosphorylate both aldo- and keto-sugars; glucokinase is specific for aldo-hexoses. All three isozymes contain two domains. Structural studies of yeast hexokinase reveal a well-defined catalytic pocket that binds ATP and hexose, allowing easy transfer of the phosphate from ATP to the sugar. Vertebrates contain four hexokinase isozymes, designated I to IV, where types I to III contain a duplication of the two-domain yeast-type hexokinases. Both the N- and C-terminal halves bind hexose and H6P, though in types I and III only the C-terminal half supports catalysis, while both halves support catalysis in type II. The N-terminal half is the regulatory region. Type IV hexokinase is similar to the yeast enzyme in containing only the two domains, and is sometimes incorrectly referred to as glucokinase.

    The different vertebrate isozymes differ in their catalysis, localisation and regulation, thereby contributing to the different patterns of glucose metabolism in different tissues. Whereas types I to III can phosphorylate a variety of hexose sugars and are inhibited by glucose-6-phosphate (G6P), type IV is specific for glucose and shows no G6P inhibition. The type I enzyme may have a catabolic function, producing H6P for energy production in glycolysis; it is bound to the mitochondrial membrane, which enables the coordination of glycolysis with the TCA cycle. The type II and III enzymes may have anabolic functions, providing H6P for glycogen or lipid synthesis. The type IV enzyme is found in the liver and pancreatic beta-cells, where it is controlled by insulin (activation) and glucagon (inhibition). In pancreatic beta-cells, the type IV enzyme acts as a glucose sensor to modify insulin secretion. Mutations in type IV hexokinase have been associated with diabetes mellitus.

    Proteins where this domain is known:
    PY02030   


    PF00350 - Dynamin_N (Pfam link)

    Interpro entry IPR001401 : Dynamin, GTPase region (Interpro link)

    Interpro description:

    Membrane transport between compartments in eukaryotic cells requires proteins that allow the budding and scission of nascent cargo vesicles from one compartment and their targeting and fusion with another. Dynamins are large GTPases that belong to a protein superfamily that, in eukaryotic cells, includes classical dynamins, dynamin-like proteins, OPA1, Mx proteins, mitofusins and guanylate-binding proteins/atlastins, and are involved in the scission of a wide range of vesicles and organelles. They play a role in many processes including budding of transport vesicles, division of organelles, cytokinesis and pathogen resistance.

    The minimal distinguishing architectural features that are common to all dynamins and are distinct from other GTPases are the structure of the large GTPase domain (about 300 amino acids) and the presence of two additional domains: the middle domain and the GTPase effector domain (GED), which are involved in oligomerization and regulation of the GTPase activity.

    This entry represents the GTPase domain, containing the GTP-binding motifs that are needed for guanine-nucleotide binding and hydrolysis. The conservation of these motifs is absolute except for the final motif in guanylate-binding proteins. The GTPase catalytic activity can be stimulated by oligomerisation of the protein, which is mediated by interactions between the GTPase domain, the middle domain and the GED.

    Proteins where this domain is known:
    PY00714    PY03528    PY04073    PY07647   


    PF00352 - TBP (Pfam link)

    Interpro entry IPR000814 : TATA-box binding (Interpro link)

    Interpro description:

    The TATA-box binding protein (TBP) is required for the initiation of transcription by RNA polymerases I, II and III, from promoters with or without a TATA box. TBP associates with a host of factors, including the general transcription factors TFIIA, -B, -D, -E, and -H, to form huge multi-subunit pre-initiation complexes on the core promoter. Through its association with different transcription factors, TBP can initiate transcription from different RNA polymerases. There are several related TBPs, including TBP-like (TBPL) proteins.

    The C-terminal core of TBP (~180 residues) is highly conserved and contains two 77-amino acid repeats that produce a saddle-shaped structure that straddles the DNA; this region binds to the TATA box and interacts with transcription factors and regulatory proteins. By contrast, the N-terminal region varies in both length and sequence.

    Proteins where this domain is known:
    PY00685    PY00712   


    PF00355 - Rieske (Pfam link)

    Interpro entry IPR005806 : Rieske [2Fe-2S] region (Interpro link)

    Pfam description:
    The Rieske domain has a [2Fe-2S] centre. Two conserved cysteines coordinate one Fe ion, while the other Fe ion is coordinated by two conserved histidines.

    Interpro description:

    Ubiquinol-cytochrome c reductase (bc1 complex or complex III) is an enzyme complex of bacterial and mitochondrial oxidative phosphorylation systems. It catalyses the oxidoreduction of the mobile redox components ubiquinol and cytochrome c, generating an electrochemical potential, which is linked to ATP synthesis. The complex consists of three subunits in most bacteria, and nine in mitochondria: both bacterial and mitochondrial complexes contain cytochrome b and cytochrome c1 subunits, and an iron-sulphur 'Rieske' subunit, which contains a high potential 2Fe-2S cluster. The mitochondrial form also includes six other subunits that do not possess redox centres. Plastoquinone-plastocyanin reductase (b6f complex), present in cyanobacteria and the chloroplasts of plants, catalyses the oxidoreduction of plastoquinol and cytochrome f. This complex, which is functionally similar to ubiquinol-cytochrome c reductase, comprises cytochrome b6, cytochrome f and Rieske subunits.

    The Rieske subunit acts by binding either a ubiquinol or plastoquinol anion, transferring an electron to the 2Fe-2S cluster, then releasing the electron to the cytochrome c or cytochrome f haem iron. The Rieske domain has a [2Fe-2S] centre: two conserved cysteines coordinate one Fe ion, while the other Fe ion is coordinated by two conserved histidines. The 2Fe-2S cluster is bound in the highly conserved C-terminal region of the Rieske subunit.

    Proteins where this domain is known:
    PY01431    PY05634   


    PF00364 - Biotin_lipoyl (Pfam link)

    Interpro entry IPR000089 : Biotin/lipoyl attachment (Interpro link)

    Pfam description:
    This family covers two Prosite entries; the conserved lysine residue binds biotin in one group and lipoic acid in the other. Note that the HMM does not currently recognise the glycine cleavage system H proteins.

    Interpro description:
    The biotin/lipoyl attachment domain has a conserved lysine residue that binds biotin or lipoic acid. Biotin plays a catalytic role in some carboxyl transfer reactions and is covalently attached, via an amide bond, to a lysine residue in enzymes requiring this coenzyme. E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine residue. The lipoic acid cofactor is found in a variety of proteins that include the H-protein of the glycine cleavage system (GCS), mammalian and yeast pyruvate dehydrogenases and fast migrating protein (FMP) (gene acoC) from Ralstonia eutropha (Alcaligenes eutrophus).

    Proteins where this domain is known:
    PY00503    PY01695    PY03521    PY04573   


    PF00365 - PFK (Pfam link)

    Interpro entry IPR000023 : Phosphofructokinase (Interpro link)

    Interpro description:
    The enzyme-catalysed transfer of a phosphoryl group from ATP is an important reaction in a wide variety of biological processes. One enzyme that uses this reaction is phosphofructokinase (PFK), which catalyses the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate, a key regulatory step in the glycolytic pathway. PFK exists as a homotetramer in bacteria and mammals (where each monomer possesses 2 similar domains), and as an octamer in yeast (where there are 4 alpha- (PFK1) and 4 beta-chains (PFK2), the latter, like the mammalian monomers, possessing 2 similar domains).

    PFK is ~300 amino acids in length, and structural studies of the bacterial enzyme have shown it comprises two similar (alpha/beta) lobes: one involved in ATP binding and the other housing both the substrate-binding site and the allosteric site (a regulatory binding site distinct from the active site, but that affects enzyme activity). The identical tetramer subunits adopt 2 different conformations: in a 'closed' state, the bound magnesium ion bridges the phosphoryl groups of the enzyme products (ADP and fructose-1,6-bisphosphate); and in an 'open' state, the magnesium ion binds only the ADP, as the 2 products are now further apart. These conformations are thought to be successive stages of a reaction pathway that requires subunit closure to bring the 2 molecules sufficiently close to react.

    Deficiency in PFK leads to glycogenosis type VII (Tarui's disease), an autosomal recessive disorder characterised by severe nausea, vomiting, muscle cramps and myoglobinuria in response to bursts of intense or vigorous exercise. Sufferers are usually able to lead a reasonably ordinary life by learning to adjust activity levels.

    Proteins where this domain is known:
    PY01321   

    Proteins where this domain has been detected by our approach:
    PY05918   


    PF00366 - Ribosomal_S17 (Pfam link)

    Interpro entry IPR000266 : Ribosomal protein S17 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The ribosomal proteins catalyse ribosome assembly and stabilise the rRNA, tuning the structure of the ribosome for optimal function. Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large subunit 23S rRNA, whereas proteins probably have a greater role in eukaryotic ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA. The small subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins. The small ribosomal subunit protein S17 is known to bind specifically to the 5' end of 16S ribosomal RNA in Escherichia coli (primary rRNA binding protein), and is thought to be involved in the recognition of termination codons. Experimental evidence has revealed that S17 has virtually no groups exposed on the ribosomal surface.

    Proteins where this domain is known:
    PY06696   


    PF00370 - FGGY_N (Pfam link)

    Interpro entry IPR018484 : Carbohydrate kinase, FGGY, N-terminal (Interpro link)

    Pfam description:
    This domain adopts a ribonuclease H-like fold and is structurally related to the C-terminal domain.

    Interpro description:
    Several different types of carbohydrate kinases have been shown to be evolutionarily related. These enzymes include L-fuculokinase (gene fucK), gluconokinase (gene gntK), glycerol kinase (gene glpK), xylulokinase (gene xylB) and L-xylulose kinase (gene lyxK). These enzymes are proteins of 480 to 520 amino acid residues.

    This entry represents the N-terminal domain of these proteins. It adopts a ribonuclease H-like fold and is structurally related to the C-terminal domain.

    Proteins where this domain is known:
    PY00935   


    PF00378 - ECH (Pfam link)

    Interpro entry IPR001753 : Crotonase, core (Interpro link)

    Pfam description:
    This family contains a diverse set of enzymes, including enoyl-CoA hydratase (Swiss:Q13011), naphthoate synthase (Swiss:P27290), carnitine racemase (Swiss:P31551), 3-hydroxybutyryl-CoA dehydratase (Swiss:P52046) and dodecenoyl-CoA delta-isomerase (Swiss:P42126).

    Interpro description:

    The crotonase superfamily comprises mechanistically diverse proteins that share a conserved trimeric quaternary structure (sometimes a hexamer consisting of a dimer of trimers), the core of which consists of 4 turns of a (beta/beta/alpha)n superhelix. Some enzymes in the superfamily have been shown to display dehalogenase, hydratase, and isomerase activities, while others have been implicated in carbon-carbon bond formation and cleavage as well as the hydrolysis of thioesters. Despite this diversity, these enzymes share the need to stabilise an enolate anion intermediate derived from an acyl-CoA substrate. This is accomplished by two structurally conserved peptidic NH groups that provide hydrogen bonds to the carbonyl moieties of the acyl-CoA substrates and form an "oxyanion hole". The CoA thioester derivatives bind in a characteristic hooked shape, and a conserved tunnel binds the pantetheine group of CoA, which links the 3'-phosphate ADP binding site to the site of reaction.

    This entry represents the core domain found in crotonase superfamily members.

    Proteins where this domain is known:
    PY02220    PY06826   


    PF00380 - Ribosomal_S9 (Pfam link)

    Interpro entry IPR000754 : Ribosomal protein S9 (Interpro link)

    Pfam description:
    This family includes small ribosomal subunit S9 from prokaryotes and S16 from eukaryotes.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups together bacterial, algal chloroplast, cyanelle and archaeal S9 proteins, and mammalian, plant and yeast mitochondrial ribosomal S9 proteins.

    Proteins where this domain is known:
    PY01220    PY06464    PY07168   


    PF00382 - TFIIB (Pfam link)

    Interpro entry IPR013150 : Transcription factor TFIIB, cyclin-related (Interpro link)

    Interpro description:

    Cyclins are eukaryotic proteins that play an active role in controlling nuclear cell division cycles, and regulate cyclin dependent kinases (CDKs). Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two main groups of cyclins, G1/S cyclins, which are essential for the control of the cell cycle at the G1/S (start) transition, and G2/M cyclins, which are essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase). In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

    Cyclin homologues have been found in various viruses, including Saimiriine herpesvirus 2 (Herpesvirus saimiri) and Human herpesvirus 8 (HHV-8) (Kaposi's sarcoma-associated herpesvirus). These viral homologues differ from their cellular counterparts in that the viral proteins have gained new functions and eliminated others to harness the cell and benefit the virus.

    In eukaryotes, transcription initiation of all protein-encoding genes involves the polymerase II system. This system is modulated by both general and specific transcription factors. The general factors (which include TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIG and TFIIH) operate through common promoter elements, such as the TATA box. Transcription factor IIB (TFIIB) is of central importance in transcription of class II genes. It associates with TFIID-TFIIA bound to DNA (the DA complex) to form a ternary TFIID-IIA-IIB (DAB) complex, which is recognized by RNA polymerase II. TFIIB comprises ~315-340 residues and contains an imperfect C-terminal repeat of a 75-residue domain that may contribute to the symmetry of the folded protein. The basal archaeal transcription machinery resembles that of the eukaryotic polymerase II system and includes a homologue of TFIIB.

    This entry represents a cyclin-like domain which is found repeated in the C-terminal region of a variety of eukaryotic TFIIBs and their archaeal counterparts. These domains individually form the typical cyclin fold, and in the transcription complex they straddle the C-terminal region of the TATA-binding protein, an interaction essential for the formation of the transcription initiation complex.

    Proteins where this domain is known:
    PY07313   

    Proteins where this domain has been detected by our approach:
    PY06502    PY07322   


    PF00383 - dCMP_cyt_deam_1 (Pfam link)

    Interpro entry IPR002125 : CMP/dCMP deaminase, zinc-binding (Interpro link)

    Interpro description:

    Cytidine deaminase (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia while deoxycytidylate deaminase (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes are known to bind zinc and to require it for their catalytic activity. These two enzymes do not share any sequence similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought to be involved in the binding of the catalytic zinc ion.

    Such a region is also found in other proteins:

    Proteins where this domain is known:
    PY04017    PY04717   


    PF00385 - Chromo (Pfam link)

    Interpro entry IPR000953 : Chromo domain (Interpro link)

    Interpro description:
    The CHROMO (CHRomatin Organization MOdifier) domain is a conserved region of around 60 amino acids, originally identified in Drosophila modifiers of variegation. These are proteins that alter the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed. In one of these proteins, Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain appear to fall into 3 classes. The first class includes proteins having an N-terminal chromo domain followed by a region termed the chromo shadow domain, e.g. Drosophila and human heterochromatin protein Su(var)205 (HP1). The second class includes proteins with a single chromo domain, e.g. Drosophila protein Polycomb (Pc), mammalian modifier 3, human Mi-2 autoantigen and several yeast and Caenorhabditis elegans hypothetical proteins. In the third class, paired tandem chromo domains are found, e.g. in mammalian DNA-binding/helicase proteins CHD-1 to CHD-4 and yeast protein CHD1.
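    The three classes above are defined purely by domain architecture, so they can be written as a rule over an ordered (N- to C-terminal) list of domain labels. The sketch below is a hypothetical helper, not part of Pfam or InterPro; the labels "chromo" and "chromo_shadow" are assumed names, and real annotation would also have to cope with partial or degenerate domains.

```python
from typing import Optional


def chromo_class(domains: list[str]) -> Optional[int]:
    """Classify a protein by its ordered domain labels (hypothetical labels).

    Only "chromo" and "chromo_shadow" matter; other labels are ignored.
    Returns 1, 2 or 3 following the three classes described, or None.
    """
    chromo_positions = [i for i, d in enumerate(domains) if d == "chromo"]
    has_shadow = "chromo_shadow" in domains

    # Class 1: a chromo domain followed (C-terminally) by a chromo shadow domain.
    if len(chromo_positions) == 1 and has_shadow:
        return 1 if chromo_positions[0] < domains.index("chromo_shadow") else None
    # Class 2: a single chromo domain and no shadow domain.
    if len(chromo_positions) == 1:
        return 2
    # Class 3: two chromo domains in tandem (adjacent in the list), no shadow domain.
    if (len(chromo_positions) == 2 and not has_shadow
            and chromo_positions[1] == chromo_positions[0] + 1):
        return 3
    return None


print(chromo_class(["chromo", "chromo_shadow"]))       # → 1 (HP1-like)
print(chromo_class(["chromo"]))                        # → 2 (Polycomb-like)
print(chromo_class(["chromo", "chromo", "helicase"]))  # → 3 (CHD-like)
```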

    Proteins where this domain is known:
    PY02297    PY03628    PY04907   


    PF00387 - PI-PLC-Y (Pfam link)

    Interpro entry IPR001711 : Phospholipase C, phosphatidylinositol-specific, Y domain (Interpro link)

    Pfam description:
    This associates with Pfam:PF00388 to form a single structural unit.

    Interpro description:

    Phosphatidylinositol-specific phospholipase C, a eukaryotic intracellular enzyme, plays an important role in signal transduction processes. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol 4,5-bisphosphate into the second messenger molecules diacylglycerol and inositol 1,4,5-trisphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins.

    In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

    All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as the 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains, and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for the catalytic activity. C-terminal to the Y-box there is a C2 domain, possibly involved in Ca2+-dependent membrane attachment.

    Proteins where this domain is known:
    PY04833   


    PF00388 - PI-PLC-X (Pfam link)

    Interpro entry IPR000909 : Phospholipase C, phosphatidylinositol-specific, X region (Interpro link)

    Pfam description:
    This associates with Pfam:PF00387 to form a single structural unit.

    Interpro description:
    Phosphatidylinositol-specific phospholipase C, a eukaryotic intracellular enzyme, plays an important role in signal transduction processes. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol 4,5-bisphosphate into the second messenger molecules diacylglycerol and inositol 1,4,5-trisphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins. In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC. All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as the 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains, and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for the catalytic activity. Profile analysis has shown that sequences with significant similarity to the X-box domain also occur in prokaryotic and trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their eukaryotic counterparts.

    Proteins where this domain is known:
    PY04833   


    PF00393 - 6PGD (Pfam link)

    Interpro entry IPR006114 : 6-phosphogluconate dehydrogenase, C-terminal (Interpro link)

    Pfam description:
    This family represents the C-terminal all-alpha domain of 6-phosphogluconate dehydrogenase. The domain contains two structural repeats of 5 helices each.

    Interpro description:

    6-Phosphogluconate dehydrogenase (6PGD) is an oxidative decarboxylase that catalyses the NADP+-dependent oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate. This reaction is a component of the hexose monophosphate shunt, also called the pentose phosphate pathway (PPP). Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved. The protein is a homodimer in which the monomers act independently: each contains a large, mainly alpha-helical domain and a smaller beta-alpha-beta domain, containing a mixed parallel and anti-parallel 6-stranded beta sheet. NADP is bound in a cleft in the small domain, with the substrate binding in an adjacent pocket.

    This entry represents the C-terminal all-alpha domain of 6-phosphogluconate dehydrogenase. The domain contains two structural repeats of 5 helices each. The NAD-binding domain is described in a separate entry.

    Proteins where this domain is known:
    PY00858   


    PF00397 - WW (Pfam link)

    Interpro entry IPR001202 : WW/Rsp5/WWP (Interpro link)

    Pfam description:
    The WW domain is a protein module with two highly conserved tryptophans that binds proline-rich peptide motifs in vitro.

    Interpro description:

    Synonym(s): Rsp5 or WWP domain

    The WW domain is a short conserved region in a number of unrelated proteins, which folds as a stable, triple-stranded beta-sheet. This short domain of approximately 40 amino acids may be repeated up to four times in some proteins. The name WW or WWP derives from the presence of two signature tryptophan residues, spaced 20-23 amino acids apart and present in most WW domains known to date, as well as a conserved Pro. The WW domain binds to proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and/or phosphoserine- or phosphothreonine-containing motifs. It is frequently associated with other domains typical for proteins in signal transduction processes.

    A large variety of proteins containing the WW domain are known. These include: dystrophin, a multidomain cytoskeletal protein; utrophin, a dystrophin-like protein of unknown function; vertebrate YAP protein, substrate of an unknown serine kinase; Mus musculus (Mouse) NEDD-4, involved in the embryonic development and differentiation of the central nervous system; Saccharomyces cerevisiae (Baker's yeast) RSP5, similar to NEDD-4 in its molecular organization; Rattus norvegicus (Rat) FE65, a transcription-factor activator expressed preferentially in liver; Nicotiana tabacum (Common tobacco) DB10 protein and others.
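    The proline-rich ligand motif [AP]-P-P-[AP]-Y is written in PROSITE-style notation, which maps directly onto a regular expression. The sketch below scans a protein sequence for candidate sites; the function name and toy sequence are hypothetical, and it ignores the phosphoserine/phosphothreonine ligands and any structural context, so hits are only candidates.

```python
import re

# The PROSITE-style ligand pattern [AP]-P-P-[AP]-Y, rewritten as a regex.
# The zero-width lookahead lets overlapping occurrences be reported.
PPXY_PATTERN = re.compile(r"(?=([AP]PP[AP]Y))")


def find_ppxy_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched peptide) for each [AP]-P-P-[AP]-Y site."""
    return [(m.start(), m.group(1)) for m in PPXY_PATTERN.finditer(sequence.upper())]


print(find_ppxy_sites("GAPPAYKLPPPPY"))  # → [(1, 'APPAY'), (8, 'PPPPY')]
```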

    Proteins where this domain is known:
    PY01815    PY02680    PY02887    PY03171    PY03239    PY04528   

    Proteins where this domain has been detected by our approach:
    PY00121    PY02705   


    PF00398 - RrnaAD (Pfam link)

    Interpro entry IPR001737 : Ribosomal RNA adenine methylase transferase (Interpro link)

    Interpro description:

    This family of proteins includes rRNA adenine dimethylases (e.g. KsgA) and the erythromycin resistance methylases (Erm).

    The bacterial enzyme KsgA catalyses the transfer of a total of four methyl groups from S-adenosyl-L-methionine (S-AdoMet) to two adjacent adenosine bases in 16S rRNA. This enzyme and the resulting modified adenosine bases appear to be conserved in all species of eubacteria, eukaryotes, and archaea, and in eukaryotic organelles. Bacterial resistance to the aminoglycoside antibiotic kasugamycin involves inactivation of KsgA and resulting loss of the dimethylations, with modest consequences for the overall fitness of the organism. In contrast, the yeast ortholog, Dim1, is essential. In Saccharomyces cerevisiae (Baker's yeast), and presumably in other eukaryotes, the enzyme performs a vital role in pre-rRNA processing in addition to its methylating activity. The best conserved region in these enzymes is located in the N-terminal section and corresponds to a region that is probably involved in S-adenosylmethionine (SAM) binding.

    The crystal structure of KsgA from Escherichia coli has been solved to a resolution of 2.1 Å. It bears a strong similarity to the crystal structure of ErmC' from Bacillus stearothermophilus and a lesser similarity to the yeast mitochondrial transcription factor, sc-mtTFB.

    The Erm family of RNA methyltransferases, which methylate a single adenosine base in 23S rRNA, confer resistance to the MLS-B group of antibiotics. Despite their sequence similarity, the two enzyme families have strikingly different levels of regulation that remain to be elucidated. Other orthologs of this family include the yeast and Homo sapiens (Human) mitochondrial transcription factors (MTF1 and h-mtTFB, respectively), which are nuclear encoded. h-mtTFB is able to stimulate transcription in vitro independently of its S-adenosylmethionine binding and rRNA methyltransferase activity.

    Proteins where this domain is known:
    PY06281    PY06971   


    PF00400 - WD40 (Pfam link)

    Interpro entry IPR001680 : (Interpro link)

    Interpro description:

    WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7- or 8-bladed beta-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised beta-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (whose beta subunit is a beta-propeller), the TAFII transcription factor, and E3 ubiquitin ligase. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events.

    Proteins where this domain is known:
    PY00184    PY00209    PY00229    PY00516    PY00572    PY00595    PY00602    PY00623    PY00810    PY01039    PY01043    PY01045    PY01106    PY01132    PY01135    PY01158    PY01337    PY01359    PY01377    PY01574    PY01805    PY01912    PY01946    PY01983    PY02029    PY02111    PY02153    PY02221    PY02268    PY02344    PY02400    PY02499    PY02598    PY02802    PY03149    PY03519    PY03590    PY03714    PY03773    PY03775    PY03863    PY03899    PY03920    PY03930    PY03942    PY04096    PY04225    PY04284    PY04526    PY04897    PY04898    PY05114    PY05579    PY05621    PY05679    PY05801    PY05987    PY06589    PY06717    PY07202    PY07385    PY07496    PY07576   

    Proteins where this domain has been detected by our approach:
    PY01361    PY02072    PY03666    PY03677    PY07045    PY07289   


    PF00406 - ADK (Pfam link)

    Interpro entry IPR000850 : Adenylate kinase (Interpro link)

    Interpro description:
    Adenylate kinases (ADK) are phosphotransferases that catalyse the reversible reaction
     AMP + MgATP = ADP + MgADP 
    an essential reaction for many processes in living cells. Two ADK isozymes have been identified in mammalian cells. These specifically bind AMP and favour binding to ATP over other nucleotide triphosphates (AK1 is cytosolic and AK2 is located in the mitochondria). A third ADK has been identified in bovine heart and human cells; this is a mitochondrial GTP:AMP phosphotransferase, also specific for the phosphorylation of AMP, but it can use only GTP or ITP as a substrate. ADK has also been identified in different bacterial species and in yeast. Two further enzymes are known to be related to the ADK family, i.e. yeast uridine monophosphokinase and slime mold UMP-CMP kinase. Within the ADK family there are several conserved regions, including the ATP-binding domains. One of the most conserved areas includes an Arg residue, whose modification inactivates the enzyme, together with an Asp that resides in the catalytic cleft of the enzyme and participates in a salt bridge.
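    Both the AK1/AK2 reaction and the GTP:AMP phosphotransferase reaction move a single phosphoryl group from a nucleoside triphosphate to AMP. The sketch below models only that bookkeeping on the usual abbreviations (AMP, ADP, ATP and so on); Mg2+, equilibrium position and substrate specificity are ignored, and the function names are hypothetical.

```python
# Number of phosphate groups encoded by the last two letters of the abbreviation.
LEVELS = {"MP": 1, "DP": 2, "TP": 3}
SUFFIX = {count: suffix for suffix, count in LEVELS.items()}


def split_nucleotide(name: str) -> tuple[str, int]:
    """'ATP' -> ('A', 3); assumes the abbreviation ends in MP, DP or TP."""
    return name[:-2], LEVELS[name[-2:]]


def transfer_phosphoryl(donor: str, acceptor: str) -> tuple[str, str]:
    """Move one phosphoryl group from donor to acceptor (Mg2+ omitted)."""
    d_base, d_n = split_nucleotide(donor)
    a_base, a_n = split_nucleotide(acceptor)
    if d_n < 2 or a_n > 2:
        raise ValueError("donor needs >= 2 phosphates, acceptor <= 2")
    # (d_n - 1) + (a_n + 1) == d_n + a_n, so phosphate groups are conserved.
    return d_base + SUFFIX[d_n - 1], a_base + SUFFIX[a_n + 1]


print(transfer_phosphoryl("ATP", "AMP"))  # → ('ADP', 'ADP')  AK1/AK2
print(transfer_phosphoryl("GTP", "AMP"))  # → ('GDP', 'ADP')  GTP:AMP phosphotransferase
print(transfer_phosphoryl("ADP", "ADP"))  # → ('AMP', 'ATP')  reverse direction
```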

    Proteins where this domain is known:
    PY01562    PY02813    PY05592    PY05596   


    PF00408 - PGM_PMM_IV (Pfam link)

    Interpro entry IPR005843 : Alpha-D-phosphohexomutase, C-terminal (Interpro link)

    Interpro description:

    The alpha-D-phosphohexomutase superfamily is composed of four related enzymes, each of which catalyses a phosphoryl transfer on its sugar substrate: phosphoglucomutase (PGM), phosphoglucomutase/phosphomannomutase (PGM/PMM), phosphoglucosamine mutase (PNGM), and phosphoacetylglucosamine mutase (PAGM). PGM converts D-glucose 1-phosphate into D-glucose 6-phosphate, and participates in both the breakdown and synthesis of glucose. PGM/PMM are primarily bacterial enzymes that use either glucose or mannose as substrate, participating in the biosynthesis of a variety of carbohydrates such as lipopolysaccharides and alginate. Both PNGM and PAGM are involved in the biosynthesis of UDP-N-acetylglucosamine.

    Despite differences in substrate specificity, these enzymes share a similar catalytic mechanism, converting 1-phospho-sugars to 6-phospho-sugars via a 1,6-bisphospho-sugar intermediate. The active enzyme is phosphorylated at a conserved serine residue and binds one magnesium ion; residues around the active-site serine are well conserved among family members. The reaction mechanism involves phosphoryl transfer from the phosphoserine to the substrate to create a bisphosphorylated sugar, followed by a phosphoryl transfer from the substrate back to the enzyme.
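    The two-step phosphoryl shuttle can be traced by labelling each phosphoryl group. The toy model below (hypothetical names; it ignores Mg2+, the reorientation of the intermediate in the active site and all kinetics) shows the consequence of the mechanism: the phosphate on the 6-phospho-sugar product comes from the enzyme, and the serine is recharged with the substrate's original C1 phosphate.

```python
from dataclasses import dataclass, field


@dataclass
class PhosphoSugar:
    # Ring carbon number -> label of the phosphoryl group attached there.
    phosphates: dict[int, str] = field(default_factory=dict)


def phosphohexomutase_cycle(enzyme_p: str, substrate: PhosphoSugar):
    """One catalytic cycle: 1-phospho-sugar -> 6-phospho-sugar.

    enzyme_p labels the phosphoryl group on the active-site serine.
    Returns (label now on the serine, bisphospho intermediate, product).
    """
    if 1 not in substrate.phosphates or 6 in substrate.phosphates:
        raise ValueError("expected a 1-phospho-sugar")
    # Step 1: the phosphoserine donates its phosphoryl group to C6 of the substrate.
    intermediate = PhosphoSugar(dict(substrate.phosphates))
    intermediate.phosphates[6] = enzyme_p
    # Step 2: the C1 phosphoryl group is transferred back to the serine.
    product = PhosphoSugar(dict(intermediate.phosphates))
    serine_p = product.phosphates.pop(1)
    return serine_p, intermediate, product


serine_p, intermediate, product = phosphohexomutase_cycle(
    "P_enzyme", PhosphoSugar({1: "P_substrate"}))
print(intermediate.phosphates)  # → {1: 'P_substrate', 6: 'P_enzyme'}
print(product.phosphates)       # → {6: 'P_enzyme'}
print(serine_p)                 # → P_substrate
```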

    The structures of PGM and PGM/PMM have been determined, and were found to be very similar in topology. These enzymes are both composed of four domains and a large central active site cleft, where each domain contains residues essential for catalysis and/or substrate recognition. Domain I contains the catalytic phosphoserine, domain II contains a metal-binding loop to coordinate the magnesium ion, domain III contains the sugar-binding loop that recognises the two different binding orientations of the 1- and 6-phospho-sugars, and domain IV contains a phosphate-binding site required for orienting the incoming phospho-sugar substrate.

    This entry represents the C-terminal domain of alpha-D-phosphohexomutase enzymes.

    Proteins where this domain is known:
    PY02130   


    PF00410 - Ribosomal_S8 (Pfam link)

    Interpro entry IPR000630 : Ribosomal protein S8 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups eubacterial, algal and plant chloroplast, cyanelle, archaebacterial and Marchantia polymorpha mitochondrial S8; mammalian and plant S15A; and yeast S22 (S24) ribosomal proteins.

    Proteins where this domain is known:
    PY04951    PY05271   


    PF00411 - Ribosomal_S11 (Pfam link)

    Interpro entry IPR001971 : Ribosomal protein S11 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S11 plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. On the basis of sequence similarities, S11 belongs to a family of bacterial, archaeal and eukaryotic ribosomal proteins.

    Proteins where this domain is known:
    PY01635    PY01636   


    PF00415 - RCC1 (Pfam link)

    Interpro entry IPR000408 : (Interpro link)

    Interpro description:

    The regulator of chromosome condensation (RCC1) is a eukaryotic protein which binds to chromatin and interacts with Ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting as a guanine-nucleotide dissociation stimulator (GDS). The interaction of RCC1 with Ran probably plays an important role in the regulation of gene expression.

    RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. The repeats make up the major part of the length of the protein. Outside the repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a C-terminal domain of about 130 residues.
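    The domain sizes quoted above are enough to estimate how much of RCC1 the repeats occupy. This is a back-of-the-envelope calculation using only the approximate ranges in the text (the constant and function names are illustrative); actual proteins will deviate from these bounds.

```python
# Approximate ranges quoted in the text (residues).
N_TERMINAL = (40, 50)
REPEAT_LENGTH = (50, 60)
N_REPEATS = 7
DROSOPHILA_C_TERMINAL = 130  # present in the Drosophila protein only


def rcc1_length_range(c_terminal: int = 0) -> tuple[int, int]:
    """Minimum and maximum total length implied by the domain sizes."""
    low = N_TERMINAL[0] + N_REPEATS * REPEAT_LENGTH[0] + c_terminal
    high = N_TERMINAL[1] + N_REPEATS * REPEAT_LENGTH[1] + c_terminal
    return low, high


def repeat_fraction_range() -> tuple[float, float]:
    """Bounds on the fraction of a non-Drosophila RCC1 covered by the repeats."""
    short_repeats = N_REPEATS * REPEAT_LENGTH[0]
    long_repeats = N_REPEATS * REPEAT_LENGTH[1]
    # Smallest with short repeats and a long N-terminus, largest in the opposite case.
    lowest = short_repeats / (short_repeats + N_TERMINAL[1])
    highest = long_repeats / (long_repeats + N_TERMINAL[0])
    return lowest, highest


print(rcc1_length_range())                       # → (390, 470)
print(rcc1_length_range(DROSOPHILA_C_TERMINAL))  # → (520, 600)
low, high = repeat_fraction_range()
print(f"{low:.3f} to {high:.3f}")                # → 0.875 to 0.913
```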

    The RCC1-type of repeat is also found in the X-linked retinitis pigmentosa GTPase regulator. The RCC repeats form a beta-propeller structure.

    Proteins where this domain is known:
    PY01216    PY01893    PY07101   

    Proteins where this domain has been detected by our approach:
    PY01994    PY02933   


    PF00416 - Ribosomal_S13 (Pfam link)

    Interpro entry IPR001892 : Ribosomal protein S13 (Interpro link)

    Pfam description:
    This family includes ribosomal protein S13 from prokaryotes and S18 from eukaryotes.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues. This family of ribosomal proteins is present in prokaryotes and eukaryotes.

    Proteins where this domain is known:
    PY07508   


    PF00425 - Chorismate_bind (Pfam link)

    Interpro entry IPR015890 : (Interpro link)

    Pfam description:
    This family includes the catalytic regions of the chorismate binding enzymes anthranilate synthase, isochorismate synthase, aminodeoxychorismate synthase and para-aminobenzoate synthase.

    Interpro description:
    This entry represents the catalytic regions of the chorismate binding enzymes anthranilate synthase, isochorismate synthase, aminodeoxychorismate synthase and para-aminobenzoate synthase. Anthranilate synthase catalyses the reaction:
     chorismate + L-glutamine = anthranilate + pyruvate + L-glutamate
    The enzyme is a tetramer comprising two component I and two component II subunits. This entry is restricted to component I, which catalyses the formation of anthranilate using ammonia rather than glutamine, while component II provides glutamine amidotransferase activity.
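    The anthranilate synthase reaction can be checked for atom balance with a short script. This is a sketch under stated assumptions: species are written as neutral (fully protonated) molecular formulas, which the text does not give, so charge and proton bookkeeping is ignored; the helper names are hypothetical. The same check is applied to the ammonia-dependent reaction of component I, assumed here to release water (chorismate + NH3 = anthranilate + pyruvate + H2O).

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C10H10O6' (no brackets or charges)."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(digits) if digits else 1
    return counts


def is_balanced(lhs: list[str], rhs: list[str]) -> bool:
    """True if both sides of a reaction contain the same atoms."""
    left = sum((parse_formula(f) for f in lhs), Counter())
    right = sum((parse_formula(f) for f in rhs), Counter())
    return left == right


# Neutral molecular formulas (assumption: protonation states ignored).
CHORISMATE = "C10H10O6"
GLUTAMINE = "C5H10N2O3"
ANTHRANILATE = "C7H7NO2"
PYRUVATE = "C3H4O3"
GLUTAMATE = "C5H9NO4"
AMMONIA = "NH3"
WATER = "H2O"

print(is_balanced([CHORISMATE, GLUTAMINE], [ANTHRANILATE, PYRUVATE, GLUTAMATE]))  # → True
print(is_balanced([CHORISMATE, AMMONIA], [ANTHRANILATE, PYRUVATE, WATER]))        # → True
```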

    Proteins where this domain is known:
    PY04548   


    PF00428 - Ribosomal_60s (Pfam link)

    Interpro entry IPR001813 : Ribosomal protein 60S (Interpro link)

    Pfam description:
    This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The 60S acidic ribosomal protein plays an important role in the elongation step of protein synthesis. This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space; and an Arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.

    The allergens in this family include allergens with the following designations: Alt a 6, Alt a 12, Cla h 3, Cla h 4 and Cla h 12.
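    The naming rule above is mechanical enough to express as a small function. This is a minimal sketch with a hypothetical name; the `species_letters` parameter is a simplified stand-in for the disambiguation rule (adding letters when two species would otherwise collide), which the text does not spell out in detail. Alt and Cla correspond to Alternaria alternata and Cladosporium herbarum.

```python
def allergen_designation(genus: str, species: str, number: int,
                         species_letters: int = 1) -> str:
    """Build a WHO/IUIS-style allergen name such as 'Alt a 6'.

    First three letters of the genus, a space, the first letter(s) of the
    species, a space, and an Arabic number. species_letters > 1 is a
    simplified stand-in for the disambiguation rule.
    """
    if number < 1 or species_letters < 1:
        raise ValueError("number and species_letters must be positive")
    return f"{genus[:3].capitalize()} {species[:species_letters].lower()} {number}"


print(allergen_designation("Alternaria", "alternata", 6))    # → Alt a 6
print(allergen_designation("Cladosporium", "herbarum", 12))  # → Cla h 12
```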

    Proteins where this domain is known:
    PY00659    PY02905    PY06972   


    PF00432 - Prenyltrans (Pfam link)

    Interpro entry IPR001330 : Prenyltransferase/squalene oxidase (Interpro link)

    Interpro description:

    The beta subunit of the farnesyltransferases is responsible for peptide binding. Squalene-hopene cyclase is a bacterial enzyme that catalyzes the cyclization of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. Lanosterol synthase (oxidosqualene-lanosterol cyclase) catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones and vitamin D in vertebrates and of ergosterol in fungi. Cycloartenol synthase (2,3-epoxysqualene-cycloartenol cyclase) is a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to cycloartenol.

    Proteins where this domain is known:
    PY02024    PY02409    PY07078   


    PF00433 - Pkinase_C (Pfam link)

    Interpro entry IPR017892 : Protein kinase, C-terminal (Interpro link)

    Interpro description:

    Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific.

    Protein kinase function has been evolutionarily conserved from Escherichia coli to human. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins.

    The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases.

    This domain is found in a large variety of protein kinases with different functions and dependencies. Protein kinase C, for example, is a calcium-activated, phospholipid-dependent serine/threonine-specific enzyme. It is activated by diacylglycerol and, in turn, phosphorylates a range of cellular proteins. This domain is most often found associated with

    Proteins where this domain is known:
    PY00403   

    Proteins where this domain has been detected by our approach:
    PY05235   


    PF00436 - SSB (Pfam link)

    Interpro entry IPR000424 : Primosome PriB/single-strand DNA-binding (Interpro link)

    Pfam description:
    This family includes single stranded binding proteins and also the primosomal replication protein N (PriB). PriB forms a complex with PriA, PriC and ssDNA.

    Interpro description:
    The Escherichia coli single-strand binding protein (gene ssb), also known as the helix-destabilizing protein, is a protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important role in DNA replication, recombination and repair. Closely related variants of SSB are encoded in the genomes of a variety of large self-transmissible plasmids. SSB has also been characterised in bacteria such as Proteus mirabilis and Serratia marcescens. Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication are structurally and evolutionarily related to prokaryotic SSB.

    Proteins where this domain is known:
    PY05238   


    PF00438 - S-AdoMet_synt_N (Pfam link)

    Interpro entry IPR002133 : S-adenosylmethionine synthetase (Interpro link)

    Pfam description:
    The three domains of S-adenosylmethionine synthetase have the same alpha+beta fold.

    Interpro description:

    S-adenosylmethionine synthetase (MAT) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis.

    In bacteria there is a single isoform of AdoMet synthetase (gene metK); budding yeast (genes SAM1 and SAM2) and mammals each have two, while plants generally have a multigene family.

    The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. The active sites of both the Escherichia coli and rat liver MAT reside between two subunits, with contributions from side chains of residues from both subunits, resulting in a dimer as the minimal catalytic entity. The side chains that contribute to the ligand binding sites are conserved between the two proteins. In the structures of complexes with the E. coli enzyme, the phosphate groups have the same positions in the (PPi plus Pi) complex and the (ADP plus Pi) complex, and are located at the bottom of a deep cavity with the adenosyl group nearer the entrance.

    Proteins where this domain is known:
    PY06246   


    PF00439 - Bromodomain (Pfam link)

    Interpro entry IPR001487 : (Interpro link)

    Pfam description:
    Bromodomains are domains of about 110 amino acids that are found in many chromatin-associated proteins. Bromodomains can interact specifically with acetylated lysine.

    Interpro description:
    Bromodomains are found in a variety of mammalian, invertebrate and yeast DNA-binding proteins. Bromodomains can interact with acetylated lysine. In some proteins, the classical bromodomain has diverged to such an extent that parts of the region are either missing or contain an insertion (e.g., mammalian protein HRX, Caenorhabditis elegans hypothetical protein ZK783.4, yeast protein YTA7). The bromodomain may occur as a single copy, or in duplicate.

    The precise function of the domain is unclear, but it may be involved in protein-protein interactions and may play a role in assembly or activity of multi-component complexes involved in transcriptional activation.

    Proteins where this domain is known:
    PY00423    PY00684    PY02679    PY03146    PY03752    PY04922    PY05553   

    Proteins where this domain has been detected by our approach:
    PY02111    PY02268    PY04642   


    PF00443 - UCH (Pfam link)

    Interpro entry IPR001394 : Peptidase C19, ubiquitin carboxyl-terminal hydrolase 2 (Interpro link)

    Interpro description:

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad.

    This group of cysteine peptidases belong to the MEROPS peptidase family C19 (ubiquitin-specific protease family, clan CA). Families within the CA clan are loosely termed papain-like as protein fold of the peptidase unit resembles that of papain, the type example for clan CA. Predicted active site residues for members of this family and family C1 occur in the same order in the sequence: N/Q, C, H. The type example is human ubiquitin-specific protease 14.

    Ubiquitin is highly conserved and is commonly found conjugated to proteins in eukaryotic cells, where it may act as a marker for rapid degradation or may have a chaperone function in protein assembly. The ubiquitin is released from the bound protein by protease cleavage. A number of deubiquitinating proteases are known: all are activated by thiol compounds and inhibited by thiol-blocking agents and ubiquitin aldehyde, and as such have the properties of cysteine proteases.

    The deubiquitinating proteases can be split into two size ranges (20-30 kDa and 100-200 kDa): this family comprises the 100-200 kDa proteins, which include the Ubp1 ubiquitin peptidase from yeast. Only one conserved cysteine can be identified, along with two conserved histidines. The spacing between the cysteine and the second histidine is thought to be more representative of the cysteine/histidine spacing of a cysteine protease catalytic dyad.

    Proteins where this domain is known:
    PY00546    PY01242    PY01440    PY02443    PY03410    PY03738    PY03802    PY04608    PY05772   


    PF00448 - SRP54 (Pfam link)

    Interpro entry IPR000897 : Signal recognition particle, SRP54 subunit, GTPase (Interpro link)

    Pfam description:
    This family includes relatives of the G-domain of the SRP54 family of proteins.

    Interpro description:

    The signal recognition particle (SRP) is a multimeric protein, which along with its conjugate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes. SRP recognises the signal sequence of the nascent polypeptide on the ribosome, retards its elongation, and docks the SRP-ribosome-polypeptide complex to the RER membrane via the SR receptor. SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300 nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane.

    This entry represents the GTPase domain of the 54 kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 of the signal recognition particle has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M-domain that binds the 7S RNA and also binds the signal sequence. The extreme C-terminal region is glycine-rich, of low complexity and poorly conserved between species. The GTPase domain is evolutionarily related to P-loop NTPase domains found in a variety of other proteins.

    These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homolog of ftsY; and bacterial flagellar biosynthesis protein flhF.

    Proteins where this domain is known:
    PY04912    PY06341   


    PF00453 - Ribosomal_L20 (Pfam link)

    Interpro entry IPR005813 : Ribosomal protein L20 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    L20 is a protein from the large (50S) subunit; in Escherichia coli it is known to bind directly to the 23S rRNA and is required for ribosome assembly, but does not take part in protein synthesis. It belongs to a family of ribosomal proteins that includes L20 from eubacteria, plant and algal chloroplasts, and cyanelles.

    Proteins where this domain is known:
    PY02256   


    PF00454 - PI3_PI4_kinase (Pfam link)

    Interpro entry IPR000403 : Phosphatidylinositol 3- and 4-kinase, catalytic (Interpro link)

    Pfam description:
    Some members of this family probably do not have lipid kinase activity and are protein kinases, e.g. Swiss:P42345.

    Interpro description:

    Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific.

    Protein kinase function has been evolutionarily conserved from Escherichia coli to human. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins.

    The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases.

    Phosphatidylinositol 3-kinase (PI3-kinase) is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The three products of PI3-kinase, PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3), function as second messengers in cell signalling. Phosphatidylinositol 4-kinase (PI4-kinase) acts on phosphatidylinositol (PI) in the first committed step in the production of the second messenger inositol 1,4,5-trisphosphate. This domain is also present in a wide range of protein kinases involved in diverse cellular functions, such as control of cell growth, regulation of cell cycle progression, a DNA damage checkpoint, recombination, and maintenance of telomere length. Despite significant homology to lipid kinases, no lipid kinase activity has been demonstrated for any of the PIK-related kinases.

    The PI3- and PI4-kinases share a well-conserved domain at their C-terminal section; this domain seems to be distantly related to the catalytic domain of protein kinases. The catalytic domain of PI3K has the typical bilobal structure seen in other ATP-dependent kinases, with a small N-terminal lobe and a large C-terminal lobe. The core of this domain is the most conserved region of the PI3Ks. The ATP cofactor binds in the crevice formed by the N- and C-terminal lobes, a loop between two strands provides a hydrophobic pocket for binding of the adenine moiety, and a lysine residue interacts with the alpha-phosphate. In contrast to protein kinases, the PI3K loop that interacts with the phosphates of the ATP, known as the glycine-rich or P-loop, contains no glycine residues. Instead, contact with the ATP -phosphate is maintained through the side chain of a conserved serine residue.

    Proteins where this domain is known:
    PY00334    PY04039    PY06311   


    PF00456 - Transketolase_N (Pfam link)

    Interpro entry IPR005474 : (Interpro link)

    Pfam description:
    This family includes transketolase enzymes EC:2.2.1.1. and also partially matches to 2-oxoisovalerate dehydrogenase beta subunit Swiss:P37941 EC:1.2.4.4. Both these enzymes utilise thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.

    Interpro description:

    Transketolase (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences from a variety of eukaryotic and prokaryotic sources show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Pichia angusta (Hansenula polymorpha), there is a highly related enzyme, dihydroxyacetone synthase (DHAS, also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
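    The carbon bookkeeping behind the transketolase reaction above can be written out as a worked balance. The carbon counts (xylulose and ribose are pentoses, sedoheptulose a heptose, glyceraldehyde a triose) are standard chemistry rather than something stated in this entry: moving the two-carbon ketol unit turns the C5 donor into a C3 product and the C5 acceptor into a C7 product, so carbon is conserved.

```latex
\begin{aligned}
\underbrace{\text{xylulose 5-P}}_{C_5} + \underbrace{\text{ribose 5-P}}_{C_5}
  &\;\rightleftharpoons\; \underbrace{\text{glyceraldehyde 3-P}}_{C_3} + \underbrace{\text{sedoheptulose 7-P}}_{C_7} \\
C_5 - C_2 &= C_3 \quad \text{(ketol donor after losing the two-carbon unit)} \\
C_5 + C_2 &= C_7 \quad \text{(aldose acceptor after gaining it)} \\
5 + 5 &= 3 + 7 = 10
\end{aligned}
```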

    1-deoxyxylulose-5-phosphate synthase (DXP synthase) is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamine pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose 5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamine (vitamin B1) and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK. The N-terminal section contains a histidine residue which appears to function in proton transfer during catalysis. In the central section there are conserved acidic residues that are part of the active cleft and may participate in substrate binding. This family includes transketolase enzymes and also partially matches 2-oxoisovalerate dehydrogenase beta subunit. Both these enzymes use thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.

    Proteins where this domain is known:
    PY03111   


    PF00462 - Glutaredoxin (Pfam link)

    Interpro entry IPR002109 : Glutaredoxin (Interpro link)

    Interpro description:

    Glutaredoxins, also known as thioltransferases (disulphide reductases), are small proteins of approximately one hundred amino acid residues which use glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components make up the glutathione system.

    Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin, which functions in a similar way, glutaredoxin possesses an active centre disulphide bond. It exists in either a reduced or an oxidized form where the two cysteine residues are linked in an intramolecular disulphide bond.

    Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed that Vaccinia virus protein O2L is most probably a glutaredoxin. Bacteriophage T4 thioredoxin also appears to be evolutionarily related; in position 5 of the glutaredoxin signature pattern, T4 thioredoxin has Val instead of Pro.

    This entry represents Glutaredoxin.

    Proteins where this domain is known:
    PY00223    PY03169    PY05059    PY05305    PY07597   


    PF00464 - SHMT (Pfam link)

    Interpro entry IPR001085 : Glycine hydroxymethyltransferase (Interpro link)

    Interpro description:
    Synonym(s): Serine hydroxymethyltransferase, Serine aldolase, Threonine aldolase

    Serine hydroxymethyltransferase (SHMT) is a pyridoxal phosphate (PLP) dependent enzyme and belongs to the aspartate aminotransferase superfamily (fold type I). The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme. The enzyme carries out the interconversion of serine and glycine using PLP as the cofactor: SHMT catalyses the transfer of a hydroxymethyl group from N5,N10-methylenetetrahydrofolate to glycine, resulting in the formation of serine and tetrahydrofolate. Both eukaryotic and prokaryotic SHMT enzymes form tight obligate homodimers, and the mammalian enzyme forms a homotetramer.

    PLP-dependent enzymes were previously classified into alpha, beta and gamma classes, based on the chemical characteristics (carbon atom involved) of the reaction they catalyse. The availability of several structures allowed a comprehensive analysis of the evolutionary classification of PLP-dependent enzymes, and it was found that the functional classification did not always agree with the evolutionary history of these enzymes. Structure and sequence analysis has revealed that the PLP-dependent enzymes can be classified into five major groups of different evolutionary origin: the aspartate aminotransferase superfamily (fold type I), the tryptophan synthase beta superfamily (fold type II), the alanine racemase superfamily (fold type III), the D-amino acid aminotransferase superfamily (fold type IV) and the glycogen phosphorylase family (fold type V).
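    Written as a single balanced equation, the SHMT reaction described above moves one carbon unit between the folate cofactor and the amino acid. The reversible form and the carbon counts below (glycine has two carbons, serine three) are standard biochemistry rather than a quotation from this entry; reading the equation from right to left gives the serine-to-glycine direction.

```latex
\begin{aligned}
\text{glycine} + N^{5},N^{10}\text{-methylene-THF} + \mathrm{H_2O}
  &\;\rightleftharpoons\; \text{L-serine} + \text{THF} \\
\underbrace{2}_{\text{glycine C}} + \underbrace{1}_{\text{one-carbon unit}} &= \underbrace{3}_{\text{serine C}}
\end{aligned}
```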

    In vertebrates, glycine hydroxymethyltransferase exists in a cytoplasmic and a mitochondrial form whereas only one form is found in prokaryotes.

    Proteins where this domain is known:
    PY00669    PY00962   


    PF00466 - Ribosomal_L10 (Pfam link)

    Interpro entry IPR001790 : Ribosomal protein L10 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    On the basis of sequence similarities, prokaryotic and eukaryotic ribosomal proteins of the L10 family can be grouped together.

    Proteins where this domain is known:
    PY00659    PY03082   


    PF00467 - KOW (Pfam link)

    Interpro entry IPR005824 : (Interpro link)

    Pfam description:
    This family has been extended to coincide with ref. The KOW (Kyprides, Ouzounis, Woese) motif is found in a variety of ribosomal proteins and NusG.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The KOW (Kyprides, Ouzounis, Woese) motif is found in a variety of ribosomal proteins and in the bacterial transcription antitermination protein NusG.

    Proteins where this domain is known:
    PY04507    PY06733   

    Proteins where this domain has been detected by our approach:
    PY00983    PY01750    PY03112    PY03779   


    PF00472 - RF-1 (Pfam link)

    Interpro entry IPR000352 : Class I peptide chain release factor (Interpro link)

    Pfam description:
    This domain is found in peptide chain release factors such as RF-1 (Swiss:P07011) and RF-2 (Swiss:P07012), and a number of smaller proteins of unknown function such as Swiss:P40711. This domain contains the peptidyl-tRNA hydrolase activity. The domain contains a highly conserved motif GGQ, where the glutamine is thought to coordinate the water that mediates the hydrolysis.
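    As a rough illustration of what "a highly conserved motif GGQ" means at the sequence level, the sketch below reports every GGQ tripeptide in a protein sequence. The function name `find_ggq` and the toy sequence are hypothetical; a bare tripeptide scan also picks up chance occurrences, so it is no substitute for the Pfam profile-HMM match that actually defines this domain.

```python
import re


def find_ggq(sequence: str) -> list[int]:
    """Return the 1-based start position of every 'GGQ' tripeptide in a protein sequence.

    Hypothetical helper for illustration only: a plain motif scan, not a Pfam HMM search.
    """
    # 'GGQ' cannot overlap with itself, so a simple non-overlapping search finds every copy.
    return [match.start() + 1 for match in re.finditer("GGQ", sequence.upper())]


# Hypothetical toy sequence containing two GGQ copies.
print(find_ggq("MKGGQRAGGQ"))  # [3, 8]
```

    Positions are 1-based to match the usual convention for residue numbering in protein sequences.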

    Interpro description:
    Peptide chain release factors (RFs) are required for the termination of protein biosynthesis. At present two classes of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner: RF-1 (gene prfA) mediates UAA- and UAG-dependent termination, while RF-2 (gene prfB) mediates UAA- and UGA-dependent termination. RF-1 and RF-2 are structurally and evolutionarily related proteins which have been shown to be part of a larger family.
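    The codon specificity of the two prokaryotic class I factors can be captured as a small lookup table. In this sketch the `release_factors_for` name and the table layout are illustrative and not taken from any library; the table encodes only the RF-1 and RF-2 specificities stated above and answers which of them can terminate at a given codon.

```python
# Stop-codon specificities of the prokaryotic class I release factors,
# as stated in the description above (RF-1: UAA/UAG, RF-2: UAA/UGA).
CLASS_I_RF_SPECIFICITY = {
    "RF-1": frozenset({"UAA", "UAG"}),
    "RF-2": frozenset({"UAA", "UGA"}),
}


def release_factors_for(codon: str) -> list[str]:
    """Return the class I RFs (sorted) that can terminate at `codon`."""
    # Normalise case and accept DNA-style input by mapping T to U.
    codon = codon.upper().replace("T", "U")
    return sorted(rf for rf, stops in CLASS_I_RF_SPECIFICITY.items() if codon in stops)


print(release_factors_for("UAA"))  # ['RF-1', 'RF-2']
print(release_factors_for("UGA"))  # ['RF-2']
print(release_factors_for("AUG"))  # []
```

    UAA is the only stop codon read by both factors, which is why it returns two entries; a sense codon such as AUG returns none.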

    Proteins where this domain is known:
    PY03620    PY04145    PY04471   


    PF00476 - DNA_pol_A (Pfam link)

    Interpro entry IPR001098 : DNA-directed DNA polymerase, family A (Interpro link)

    Interpro description:
    Synonym(s): DNA nucleotidyltransferase (DNA-directed)

    DNA-directed DNA polymerases are the key enzymes catalysing the accurate replication of DNA. They require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. A number of polymerases belong to this family.

    Proteins where this domain is known:
    PY00163    PY02328   


    PF00478 - IMPDH (Pfam link)

    Interpro entry IPR001093 : IMP dehydrogenase/GMP reductase (Interpro link)

    Pfam description:
    This family is involved in the biosynthesis of guanosine nucleotides. Members of this family contain a TIM barrel structure. In the inosine monophosphate dehydrogenases, two CBS domains (Pfam:PF00571) are inserted in the TIM barrel. This family is a member of the common phosphate binding site TIM barrel family.

    Interpro description:
    Synonym(s): Inosine-5'-monophosphate dehydrogenase, Inosinic acid dehydrogenase; Synonym(s): Guanosine 5'-monophosphate oxidoreductase

    This entry contains two related enzymes, IMP dehydrogenase and GMP reductase. These enzymes adopt a TIM barrel structure.

    IMP dehydrogenase (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the NAD-dependent oxidation of IMP to XMP:

        Inosine 5'-phosphate + NAD+ + H2O = xanthosine 5'-phosphate + NADH

    IMP dehydrogenase is associated with cell proliferation and is a possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in humans. IMP dehydrogenase nearly always contains a long insertion that has two CBS domains within it.

    GMP reductase catalyzes the irreversible, NADPH-dependent reductive deamination of GMP to IMP:

        NADPH + guanosine 5'-phosphate = NADP+ + inosine 5'-phosphate + NH3

    It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides, and maintains the intracellular balance of A and G nucleotides.

    Proteins where this domain is known:
    PY03601   


    PF00479 - G6PD_N (Pfam link)

    Interpro entry IPR001282 : Glucose-6-phosphate dehydrogenase (Interpro link)

    Interpro description:

    Glucose-6-phosphate dehydrogenase (G6PDH) is a ubiquitous protein, present in bacteria and all eukaryotic cell types. The enzyme catalyses the first step in the pentose phosphate pathway, i.e. the conversion of glucose 6-phosphate to 6-phosphogluconolactone with NADP+ as the electron acceptor, producing NADPH. The ubiquitous expression of the enzyme gives it a major role in the production of NADPH for the many NADPH-mediated reductive processes in all cells. Deficiency of G6PDH is a common genetic abnormality affecting millions of people worldwide. Many sequence variants, most caused by single point mutations, are known, exhibiting a wide variety of phenotypes.

    Proteins where this domain is known:
    PY00793   


    PF00481 - PP2C (Pfam link)

    Interpro entry IPR014045 : (Interpro link)

    Pfam description:
    Protein phosphatase 2C is a Mn++ or Mg++ dependent protein serine/threonine phosphatase.

    Interpro description:

    This domain is found in protein phosphatase 2C, as well as in other proteins, e.g. [pyruvate dehydrogenase (lipoamide)]-phosphatase and adenylate cyclase.

    Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine-specific protein phosphatases. PP2C is a monomeric enzyme of about 42 kDa which shows broad substrate specificity and is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least four PP2C homologs: phosphatase PTC1, which has weak tyrosine phosphatase activity in addition to its activity on serines, phosphatases PTC2 and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi and Paramecium tetraurelia. In A. thaliana, the kinase-associated protein phosphatase (KAPP) is an enzyme that dephosphorylates the Ser/Thr receptor-like kinase RLK5 and contains a C-terminal PP2C domain.

    PP2C does not seem to be evolutionarily related to the main family of serine/threonine phosphatases: PP1, PP2A and PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (PDPC), which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.

    Proteins where this domain is known:
    PY00861    PY01654    PY01995    PY02192    PY02310    PY02322    PY02901    PY06111    PY06845   


    PF00483 - NTP_transferase (Pfam link)

    Interpro entry IPR005835 : Nucleotidyl transferase (Interpro link)

    Pfam description:
    This family includes a wide range of enzymes which transfer nucleotides onto phosphosugars.

    Interpro description:

    Nucleotidyl transferases transfer nucleotides from one compound to another. This domain is found in a number of enzymes that transfer nucleotides onto phosphosugars.

    Proteins where this domain is known:
    PY05717   


    PF00487 - FA_desaturase (Pfam link)

    Interpro entry IPR005804 : Fatty acid desaturase, type 1 (Interpro link)

    Interpro description:

    Fatty acid desaturases are enzymes that catalyse the insertion of a double bond at the delta position of fatty acids.

    There seem to be two distinct families of fatty acid desaturases, which do not appear to be evolutionarily related.

    Family 1 is composed of:

    Family 2 is composed of:

    This entry contains fatty acid desaturases belonging to Family 1.

    Proteins where this domain is known:
    PY04895   


    PF00488 - MutS_V (Pfam link)

    Interpro entry IPR000432 : DNA mismatch repair protein MutS, C-terminal (Interpro link)

    Pfam description:
    This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with Pfam:PF01624, Pfam:PF05188, Pfam:PF05192 and Pfam:PF05190. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch binding protein. The aligned region corresponds to domain V of Thermus aquaticus MutS, which contains a Walker A motif and is structurally similar to the ATPase domain of ABC transporters.

    Interpro description:

    Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.

    MutS is a modular protein with a complex structure, and is composed of:

    Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts.

    This entry represents the C-terminal region found in proteins in the MutS family of DNA mismatch repair proteins. The C-terminal region of MutS comprises the ATPase domain and the HTH (helix-turn-helix) domain, the latter being involved in dimer contacts. Yeast MSH3, bacterial proteins involved in DNA mismatch repair, and the predicted protein product of the mouse Rep-3 gene share extensive sequence similarity. Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch binding protein.

    Proteins where this domain is known:
    PY01096    PY02936    PY07191   


    PF00490 - ALAD (Pfam link)

    Interpro entry IPR001731 : Tetrapyrrole biosynthesis, porphobilinogen synthase (Interpro link)

    Interpro description:

    Tetrapyrroles are large macrocyclic compounds derived from a common biosynthetic pathway. The end-product, uroporphyrinogen III, is used to synthesise a number of important molecules, including vitamin B12, haem, sirohaem, chlorophyll, coenzyme F430 and phytochromobilin.

    The first stage in tetrapyrrole synthesis is the synthesis of 5-aminolaevulinic acid (ALA) via two possible routes: (1) condensation of succinyl-CoA and glycine (C4 pathway) using ALA synthase, or (2) conversion of glutamate (C5 pathway) via three different enzymes: glutamyl-tRNA synthetase to charge a tRNA with glutamate, glutamyl-tRNA reductase to reduce glutamyl-tRNA to glutamate-1-semialdehyde (GSA), and GSA aminotransferase to catalyse a transamination reaction that produces ALA.

    The second stage is to convert ALA to uroporphyrinogen III, the first macrocyclic tetrapyrrolic structure in the pathway. This is achieved by the action of three enzymes in one common pathway: porphobilinogen (PBG) synthase (or ALA dehydratase) to condense two ALA molecules to generate porphobilinogen; hydroxymethylbilane synthase (or PBG deaminase) to polymerise four PBG molecules into preuroporphyrinogen (tetrapyrrole structure); and uroporphyrinogen III synthase to link two pyrrole units together (rings A and D) to yield uroporphyrinogen III.

    Uroporphyrinogen III is the first branch point of the pathway. To synthesise cobalamin (vitamin B12), sirohaem, and coenzyme F430, uroporphyrinogen III needs to be converted into precorrin-2 by the action of uroporphyrinogen III methyltransferase. To synthesise haem and chlorophyll, uroporphyrinogen III needs to be decarboxylated into coproporphyrinogen III by the action of uroporphyrinogen III decarboxylase.

    This entry represents porphobilinogen (PBG) synthase (PBGS, also called 5-aminolaevulinic acid dehydratase or ALAD), which functions during the second stage of tetrapyrrole biosynthesis. This enzyme catalyses a Knorr-type condensation reaction between two molecules of ALA to generate porphobilinogen, the pyrrolic building block used in later steps. The enzyme is made up of eight identical subunits, each adopting a TIM-barrel fold and binding a metal ion that is essential for activity, usually zinc (in yeast, mammals and certain bacteria) or magnesium (in plants and other bacteria). A lysine has been implicated in the catalytic mechanism. PBGS deficiency causes a rare porphyric disorder known as ALAD porphyria, which appears to involve conformational changes in the enzyme.
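    The Knorr-type condensation joins two ALA molecules and releases water. As an illustrative sketch, the Python below checks that 2 ALA --> PBG + 2 H2O is atom-balanced, assuming the standard molecular formulas C5H9NO3 for ALA and C10H14N2O4 for porphobilinogen (neither formula is given in the entry). The helper names are hypothetical, and the parser only handles simple formulas without parentheses or charges.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C5H9NO3'.

    Assumption: no parentheses, hydrates or charges are present.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side: list[tuple[int, str]]) -> Counter:
    """Sum atom counts over (coefficient, formula) pairs on one side of a reaction."""
    total = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total


# 2 ALA --> PBG + 2 H2O (condensation catalysed by PBGS/ALAD)
reactants = [(2, "C5H9NO3")]                # 5-aminolaevulinic acid (assumed formula)
products = [(1, "C10H14N2O4"), (2, "H2O")]  # porphobilinogen (assumed formula) + water

print(side_total(reactants) == side_total(products))  # True
```

    Both sides come to C10H18N2O6, so the two waters account exactly for the atoms lost when the two ALA molecules condense.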

    Proteins where this domain is known:
    PY04302   


    PF00491 - Arginase (Pfam link)

    Interpro entry IPR006035 : Ureohydrolase (Interpro link)

    Interpro description:

    The ureohydrolase superfamily includes arginase, agmatinase, formiminoglutamase and proclavaminate amidinohydrolase. These enzymes share a 3-layer alpha-beta-alpha structure, and play important roles in arginine/agmatine metabolism, the urea cycle, histidine degradation, and other pathways.

    Arginase, which catalyses the conversion of arginine to urea and ornithine, is one of the five urea cycle enzymes that convert ammonia to urea, the principal product of nitrogen excretion. There are several arginase isozymes that differ in catalytic, molecular and immunological properties. Deficiency in the liver isozyme leads to argininemia, which is usually associated with hyperammonemia.

    Agmatinase hydrolyses agmatine to putrescine, the precursor for the biosynthesis of higher polyamines, spermidine and spermine. In addition, agmatine may play an important regulatory role in mammals.

    Formiminoglutamase catalyses the fourth step in histidine degradation, acting to hydrolyse N-formimidoyl-L-glutamate to L-glutamate and formamide.

    Proclavaminate amidinohydrolase is involved in clavulanic acid biosynthesis. Clavulanic acid acts as an inhibitor of a wide range of beta-lactamase enzymes that are used by various microorganisms to resist beta-lactam antibiotics. As a result, clavulanic acid improves the effectiveness of beta-lactam antibiotics.

    Proteins where this domain is known:
    PY03443    PY05019   


    PF00493 - MCM (Pfam link)

    Interpro entry IPR001208 : DNA-dependent ATPase MCM (Interpro link)

    Interpro description:

    MCM proteins are DNA-dependent ATPases required for the initiation of eukaryotic DNA replication. In eukaryotes there is a family of six proteins, MCM2 to MCM7. They were first identified in yeast where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called minichromosome maintenance proteins, MCM proteins.

    This family is also present in the archaea, in 1 to 4 copies. Methanocaldococcus jannaschii (Methanococcus jannaschii) has four members, MJ0363, MJ0961, MJ1489 and MJECL13.

    The "MCM motif" contains Walker-A and Walker-B type nucleotide binding motifs. The diagnostic sequence defining the MCMs is IDEFDKM. Only Mcm2 (aka Cdc19 or Nda1) has been subjected to mutational analysis in this region, and most mutations abolish its activity. The presence of a putative ATP-binding domain implies that these proteins may be involved in an ATP-consuming step in the initiation of DNA replication in eukaryotes.
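    Because the MCM family is defined by the diagnostic sequence IDEFDKM, a candidate protein can be screened for it with a simple pattern search. The Python sketch below is illustrative only. The Walker A regex uses the commonly cited P-loop consensus [AG]-x(4)-G-K-[ST], which is an assumption and not part of this entry, and the toy sequence is hypothetical. Exact string or regex matching is much cruder than the profile HMMs Pfam uses, so a hit here is only a hint.

```python
import re

# Diagnostic MCM sequence quoted in the entry.
MCM_MOTIF = "IDEFDKM"
# Assumed Walker A (P-loop) consensus [AG]-x(4)-G-K-[ST], written as a regex.
WALKER_A = "[AG].{4}GK[ST]"


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    hits = []
    # A zero-width lookahead lets matches overlap; group 1 captures the motif itself.
    for m in re.finditer(f"(?=({pattern}))", sequence.upper()):
        hits.append((m.start() + 1, m.group(1)))
    return hits


# Hypothetical toy sequence, not a real MCM protein.
toy = "MSAGQPGSGKSTLLRRVIDEFDKMSEQ"
print(find_motif(toy, MCM_MOTIF))  # [(18, 'IDEFDKM')]
print(find_motif(toy, WALKER_A))   # [(4, 'GQPGSGKS')]
```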

    The MCM proteins bind together in a large complex. Within this complex, individual subunits associate with different affinities, and there is a tightly associated core of Mcm4 (Cdc21), Mcm6 (Mis5) and Mcm7. This core complex in human MCMs has been associated with helicase activity in vitro, leading to the suggestion that the MCM proteins are the eukaryotic replicative helicase.

    Schizosaccharomyces pombe (Fission yeast) MCMs, like those in metazoans, are found in the nucleus throughout the cell cycle. This is in contrast to the Saccharomyces cerevisiae (Baker's yeast) in which MCM proteins move in and out of the nucleus during each cell cycle. The assembly of the MCM complex in S. pombe is required for MCM localisation, ensuring that only intact MCM complexes remain in the nucleus.

    Proteins where this domain is known:
    PY01644    PY02431    PY02857    PY03236    PY03411    PY03736    PY04668    PY04888   


    PF00498 - FHA (Pfam link)

    Interpro entry IPR000253 : (Interpro link)

    Pfam description:
    The FHA (Forkhead-associated) domain is a phosphopeptide binding motif.

    Interpro description:

    The forkhead-associated (FHA) domain is a phosphopeptide recognition domain found in many regulatory proteins. It displays specificity for phosphothreonine-containing epitopes but will also recognise phosphotyrosine with relatively high affinity. It spans approximately 80-100 amino acid residues folded into an 11-stranded beta sandwich, which sometimes contains small helical insertions between the loops connecting the strands.

    To date, genes encoding FHA-containing proteins have been identified in eubacterial and eukaryotic but not archaeal genomes. The domain is present in a diverse range of proteins, such as kinases, phosphatases, kinesins, transcription factors, RNA-binding proteins and metabolic enzymes which partake in many different cellular processes - DNA repair, signal transduction, vesicular transport and protein degradation are just a few examples.

    Proteins where this domain is known:
    PY00031    PY00240    PY01765    PY03501    PY06836   


    PF00501 - AMP-binding (Pfam link)

    Interpro entry IPR000873 : AMP-dependent synthetase and ligase (Interpro link)

    Interpro description:

    A number of prokaryotic and eukaryotic enzymes, which appear to act via an ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. This region is a Ser/Thr/Gly-rich domain that is further characterised by a conserved Pro-Lys-Gly triplet. The family of enzymes includes luciferase, long-chain fatty acid CoA ligase, acetyl-CoA synthetase and various other closely related synthetases.

    Proteins where this domain is known:
    PY00791    PY01239    PY02707    PY05696    PY05999   


    PF00505 - HMG_box (Pfam link)

    Interpro entry IPR000910 : High mobility group, HMG1/HMG2 (Interpro link)

    Interpro description:

    High mobility group (HMG or HMGB) proteins are a family of relatively low molecular weight non-histone components in chromatin. HMG1 (also called HMG-T in fish) and HMG2 are two highly related proteins that bind single-stranded DNA preferentially and unwind double-stranded DNA. Although they have no sequence specificity, they have a high affinity for bent or distorted DNA, and bend linear DNA. HMG1 and HMG2 contain two DNA-binding HMG-box domains (A and B) that show structural and functional differences, and have a long acidic C-terminal domain rich in aspartic and glutamic acid residues. The acidic tail modulates the affinity of the tandem HMG boxes in HMG1 and 2 for a variety of DNA targets. HMG1 and 2 appear to play important architectural roles in the assembly of nucleoprotein complexes in a variety of biological processes, for example V(D)J recombination, the initiation of transcription, and DNA repair.

    The profile in this entry describing the HMG-domains is much more general than the signature. In addition to the HMG1 and HMG2 proteins, HMG-domains occur in single or multiple copies in the following protein classes; the SOX family of transcription factors; SRY sex determining region Y protein and related proteins; LEF1 lymphoid enhancer binding factor 1; SSRP recombination signal recognition protein; MTF1 mitochondrial transcription factor 1; UBF1/2 nucleolar transcription factors; Abf2 yeast ARS-binding factor; and Saccharomyces cerevisiae transcription factors Ixr1, Rox1, Nhp6a, Nhp6b and Spp41.

    Proteins where this domain is known:
    PY05184    PY06049    PY07077   


    PF00510 - COX3 (Pfam link)

    Interpro entry IPR000298 : Cytochrome c oxidase, subunit III (Interpro link)

    Interpro description:

    Cytochrome c oxidase is the terminal enzyme of the respiratory chain of mitochondria and many aerobic bacteria. It catalyses the transfer of electrons from reduced cytochrome c to molecular oxygen:

     4 cytochrome c(2+) + 4 H+ + O2  -->  4 cytochrome c(3+) + 2 H2O

    This reaction is coupled to the pumping of four additional protons across the mitochondrial or bacterial membrane.
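    As a quick arithmetic check of the net reaction, the Python sketch below confirms that it is charge-balanced and tallies the electrons delivered to O2 and the protons handled per O2 reduced. It treats the c(2+)/c(3+) labels as formal charges on the haem iron, which is a simplification, and all names are illustrative.

```python
# Net reaction: 4 cyt c(2+) + 4 H+ + O2 --> 4 cyt c(3+) + 2 H2O
# Each species maps to (stoichiometric coefficient, charge). Simplifying assumption:
# the cytochrome c "charge" is taken to be the haem iron oxidation state.
reactants = {"cyt c(2+)": (4, 2), "H+": (4, 1), "O2": (1, 0)}
products = {"cyt c(3+)": (4, 3), "H2O": (2, 0)}


def net_charge(side: dict[str, tuple[int, int]]) -> int:
    """Sum coefficient * charge over all species on one side of the reaction."""
    return sum(coeff * charge for coeff, charge in side.values())


# Each cytochrome c goes from Fe(2+) to Fe(3+), donating one electron to O2.
electrons_to_o2 = reactants["cyt c(2+)"][0] * (3 - 2)

# Protons per O2: 4 consumed chemically to form water, plus 4 pumped across the membrane.
substrate_protons = reactants["H+"][0]
pumped_protons = 4
total_protons = substrate_protons + pumped_protons

print(net_charge(reactants), net_charge(products))  # 12 12
print(electrons_to_o2, total_protons)               # 4 8
```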

    Cytochrome c oxidase is an oligomeric enzymatic complex that is located in the mitochondrial inner membrane of eukaryotes and in the plasma membrane of aerobic prokaryotes. The core structure of prokaryotic and eukaryotic cytochrome c oxidase contains three common subunits, I, II and III. In prokaryotes, subunits I and III can be fused and a fourth subunit is sometimes found, whereas in eukaryotes there are a variable number of additional small polypeptide subunits. The functional role of subunit III is not yet understood.

    As bacterial respiratory systems are branched, they have a number of distinct terminal oxidases, rather than the single cytochrome c oxidase present in eukaryotic mitochondrial systems. Although cytochrome o oxidases catalyse the oxidation of quinol (ubiquinol) rather than cytochrome c, they belong to the same haem-copper oxidase superfamily as cytochrome c oxidases. Members of this family share sequence similarities in all three core subunits: subunit I is the most conserved subunit, whereas subunit II is the least conserved.

    Proteins where this domain is known:
    PY00144    PY00770    PY00776    PY07297   


    PF00514 - Arm (Pfam link)

    Interpro entry IPR000225 : (Interpro link)

    Pfam description:
    Approx. 40 amino acid repeat. Tandem repeats form a superhelix of helices that is proposed to mediate the interaction of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats.

    Interpro description:

    The armadillo (Arm) repeat is an approximately 40 amino acid long tandemly repeated sequence motif first identified in the Drosophila melanogaster segment polarity gene armadillo involved in signal transduction through wingless. Animal Arm-repeat proteins function in various processes, including intracellular signalling and cytoskeletal regulation, and include such proteins as beta-catenin, the junctional plaque protein plakoglobin, the adenomatous polyposis coli (APC) tumour suppressor protein, and the nuclear transport factor importin-alpha, amongst others. A subset of these proteins is conserved across eukaryotic kingdoms. In higher plants, some Arm-repeat proteins function in intracellular signalling like their mammalian counterparts, while others have novel functions.

    The 3-dimensional fold of an armadillo repeat is known from the crystal structure of beta-catenin, where the 12 repeats form a superhelix of alpha helices with three helices per unit. The cylindrical structure features a positively charged groove, which presumably interacts with the acidic surfaces of the known interaction partners of beta-catenin.

    Proteins where this domain is known:
    PY01164    PY01795    PY02199   

    Proteins where this domain has been detected by our approach:
    PY01200    PY01759   


    PF00515 - TPR_1 (Pfam link)

    Interpro entry IPR001440 : (Interpro link)

    Interpro description:

    The tetratricopeptide repeat (TPR) is a structural motif present in a wide range of proteins. It mediates protein-protein interactions and the assembly of multiprotein complexes. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in various different organisms, ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding.

    The X-ray structure of a domain containing three TPRs from protein phosphatase 5 revealed that each TPR adopts a helix-turn-helix arrangement, with adjacent TPR motifs packing in a parallel fashion, resulting in a spiral of repeating anti-parallel alpha-helices. The two helices are denoted helix A and helix B. The packing angle between helix A and helix B within a single TPR is ~24 degrees, which generates a right-handed superhelical shape. Helix A interacts with helix B and with helix A' of the next TPR. Two protein surfaces are generated: the inner concave surface is contributed mainly by residues on the A helices, and the other surface presents residues from both helices A and B.

    Proteins where this domain is known:
    PY00129    PY00398    PY00832    PY01200    PY01215    PY01491    PY01575    PY01708    PY02360    PY02799    PY03138    PY04697   

    Proteins where this domain has been detected by our approach:
    PY00139    PY01229    PY01292    PY01727    PY01817    PY01938    PY01944    PY02835    PY03072    PY03919    PY04510   


    PF00521 - DNA_topoisoIV (Pfam link)

    Interpro entry IPR002205 : DNA topoisomerase, type IIA, subunit A or C-terminal (Interpro link)

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type II topoisomerases are ATP-dependent enzymes, and can be subdivided according to their structure and reaction mechanisms: type IIA (topoisomerase II or gyrase, and topoisomerase IV) and type IIB (topoisomerase VI). These enzymes are responsible for relaxing supercoiled DNA as well as for introducing both negative and positive supercoils.

    Type IIA topoisomerases together manage chromosome integrity and topology in cells. Topoisomerase II (called gyrase in bacteria) primarily introduces negative supercoils into DNA. In bacteria, topoisomerase II consists of two polypeptide subunits, gyrA and gyrB, which form a heterotetramer: (BA)2. In most eukaryotes, topoisomerase II consists of a single polypeptide, where the N- and C-terminal regions correspond to gyrB and gyrA, respectively; this topoisomerase II forms a homodimer that is equivalent to the bacterial heterotetramer. There are four functional domains in topoisomerase II: domain 1 (N-terminal of gyrB) is an ATPase, domain 2 (C-terminal of gyrB) is responsible for subunit interactions (differs between eukaryotic and bacterial enzymes), domain 3 (N-terminal of gyrA) is responsible for the breaking-rejoining function through its capacity to form protein-DNA bridges, and domain 4 (C-terminal of gyrA) is able to non-specifically bind DNA.

    Topoisomerase IV primarily decatenates DNA and relaxes positive supercoils, which is important in bacteria, where the circular chromosome becomes catenated, or linked, during replication. Topoisomerase IV consists of two polypeptide subunits, parE and parC, where parC is homologous to gyrA and parE is homologous to gyrB.

    This entry represents subunit A (gyrA and parC) of bacterial gyrase and topoisomerase IV, and the equivalent C-terminal region in eukaryotic topoisomerase II composed of a single polypeptide. This subunit has DNA-binding capacity.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY03394    PY07326   


    PF00530 - SRCR (Pfam link)

    Interpro entry IPR001190 : Speract/scavenger receptor (Interpro link)

    Pfam description:
    These domains are disulphide rich extracellular domains. These domains are found in several extracellular receptors and may be involved in protein-protein interactions.

    Interpro description:

    The egg peptide speract receptor is a transmembrane glycoprotein. Other members of this family include the macrophage scavenger receptor type I (a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during atherogenesis), an enteropeptidase and T-cell surface glycoprotein CD5 (which may act as a receptor in regulating T-cell proliferation).

    Proteins where this domain is known:
    PY01071   


    PF00531 - Death (Pfam link)

    Interpro entry IPR000488 : Death (Interpro link)

    Interpro description:

    The death domain (DD) is a homotypic protein interaction module composed of a bundle of six alpha-helices. DD is related in sequence and structure to the death effector domain (DED) and the caspase recruitment domain (CARD), which work in similar pathways and show similar interaction properties. DDs bind each other, forming oligomers. Mammals have numerous and diverse DD-containing proteins. Within these proteins, DDs can be found in combination with other domains, including: CARDs, DEDs, ankyrin repeats, caspase-like folds, kinase domains, leucine zippers, leucine-rich repeats (LRR), TIR domains, and ZU5 domains.

    Some DD-containing proteins are involved in the regulation of apoptosis and inflammation through their activation of caspases and NF-kappaB, which typically involves interactions with TNF (tumour necrosis factor) cytokine receptors. In humans, eight of the over 30 known TNF receptors contain DD in their cytoplasmic tails; several of these TNF receptors use caspase activation as a signalling mechanism. The DD mediates self-association of these receptors, thus giving the signal to downstream events that lead to apoptosis. Other DD-containing proteins, such as ankyrin, MyD88 and pelle, are probably not directly involved in cell death signalling. DD-containing proteins also have links to innate immunity, communicating with Toll family receptors through bipartite adapter proteins such as MyD88.

    Proteins where this domain has been detected by our approach:
    PY04013   


    PF00533 - BRCT (Pfam link)

    Interpro entry IPR001357 : BRCT (Interpro link)

    Pfam description:
    The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage. It has been suggested that the retinoblastoma protein contains a divergent BRCT domain; this domain has not been included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure, which suggests that pairs of BRCT domains associate as homo- or heterodimers.

    Interpro description:

    The BRCT domain (named after the C-terminal domain of a breast cancer susceptibility protein) is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage, for example in the breast cancer DNA-repair protein BRCA1. The domain is an approximately 100 amino acid tandem repeat, which appears to act as a phosphoprotein-binding domain.

    A chitin biosynthesis protein from yeast also seems to belong to this group.

    Proteins where this domain is known:
    PY04496   

    Proteins where this domain has been detected by our approach:
    PY04698    PY04958    PY07549   


    PF00534 - Glycos_transf_1 (Pfam link)

    Interpro entry IPR001296 : Glycosyl transferase, group 1 (Interpro link)

    Pfam description:
    Mutations in this domain of Swiss:P37287 lead to disease (paroxysmal nocturnal haemoglobinuria). Members of this family transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and lipopolysaccharides. Members of this family transfer UDP-, ADP-, GDP- or CMP-linked sugars. The eukaryotic glycogen synthases may be distant members of this family.

    Interpro description:

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These enzymes catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence based families has been described. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    Proteins containing this domain transfer UDP-, ADP-, GDP- or CMP-linked sugars to a variety of substrates, including glycogen, fructose-6-phosphate and lipopolysaccharides. The bacterial enzymes are involved in various biosynthetic processes that include exopolysaccharide biosynthesis, lipopolysaccharide core biosynthesis and the biosynthesis of the slime polysaccharide colanic acid. Mutations in this domain of the human N-acetylglucosaminyl-phosphatidylinositol biosynthetic protein are the cause of paroxysmal nocturnal hemoglobinuria (PNH), an acquired hemolytic blood disorder characterised by venous thrombosis, erythrocyte hemolysis, infections and defective hematopoiesis.

    Proteins where this domain is known:
    PY03143   


    PF00535 - Glycos_transf_2 (Pfam link)

    Interpro entry IPR001173 : (Interpro link)

    Pfam description:
    Diverse family, transferring sugar from UDP-glucose, UDP-N-acetylgalactosamine, GDP-mannose or CDP-abequose to a range of substrates, including cellulose, dolichol phosphate and teichoic acids.

    Interpro description:

    The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These enzymes catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates and related proteins into distinct sequence based families has been described. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.

    This domain is found in a diverse family of glycosyl transferases that transfer the sugar from UDP-glucose, UDP-N-acetylgalactosamine, GDP-mannose or CDP-abequose to a range of substrates, including cellulose, dolichol phosphate and teichoic acids.

    Proteins where this domain is known:
    PY00944   


    PF00536 - SAM_1 (Pfam link)

    Interpro entry IPR001660 : (Interpro link)

    Pfam description:
    It has been suggested that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation of numerous developmental processes in diverse eukaryotes. The SAM domain can potentially function as a protein interaction module through its ability to homo- and heterooligomerise with other SAM domains.

    Interpro description:

    The sterile alpha motif (SAM) domain is a putative protein interaction module present in a wide variety of proteins involved in many biological processes. The SAM domain, which spans around 70 residues, is found in diverse eukaryotic organisms. SAM domains have been shown to homo- and hetero-oligomerise, forming multiple self-association architectures, and also to bind various non-SAM domain-containing proteins, albeit with low affinity. SAM domains also appear to possess the ability to bind RNA. Smaug, a protein that helps to establish a morphogen gradient in Drosophila embryos by repressing the translation of nanos (nos) mRNA, binds to the 3' untranslated region (UTR) of nos mRNA via two similar hairpin structures. The 3D crystal structure of the Smaug RNA-binding region shows a cluster of positively charged residues on the Smaug-SAM domain, which could be the RNA-binding surface. This electropositive potential is unique among all previously determined SAM-domain structures and is conserved among Smaug-SAM homologues. These results suggest that the SAM domain might have a primary role in RNA binding.

    Structural analyses show that the SAM domain is arranged in a small five-helix bundle with two large interfaces. In the case of the SAM domain of EphB2, each of these interfaces is able to form dimers. The presence of these two distinct intermonomer binding surfaces suggests that SAM could form extended polymeric structures.

    Proteins where this domain has been detected by our approach:
    PY01730    PY01883   


    PF00542 - Ribosomal_L12 (Pfam link)

    Interpro entry IPR013823 : Ribosomal protein L7/L12, C-terminal (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This entry represents the C-terminal domain of the large subunit ribosomal proteins known as the L7/L12 family. L7/L12 is present in each 50S subunit in four copies organised as two dimers. The L8 protein complex, consisting of two dimers of L7/L12 and L10 in Escherichia coli ribosomes, is assembled on the conserved region of 23S rRNA termed the GTPase-associated domain. The L7/L12 dimer probably interacts with EF-Tu. L7 and L12 differ only by a single post-translational modification: acetylation of the N terminus of L7.

    Proteins where this domain is known:
    PY03559   


    PF00549 - Ligase_CoA (Pfam link)

    Interpro entry IPR005811 : ATP-citrate lyase/succinyl-CoA ligase (Interpro link)

    Pfam description:
    This family includes the CoA ligases succinyl-CoA synthetase alpha and beta chains, malate-CoA ligase and ATP-citrate lyase. Some members of the family use ATP; others use GTP.

    Interpro description:

    This entry represents a domain found in both the alpha and beta chains of succinyl-CoA synthase (GDP-forming and ADP-forming). This domain can also be found in ATP citrate synthase and malate-CoA ligase. Some members of the domain use ATP; others use GTP.

    Proteins where this domain is known:
    PY02175    PY05049   

    Proteins where this domain has been detected by our approach:
    PY05175   


    PF00550 - PP-binding (Pfam link)

    Interpro entry IPR006163 : Phosphopantetheine-binding (Interpro link)

    Pfam description:
    A 4'-phosphopantetheine prosthetic group is attached through a serine. This prosthetic group acts as a 'swinging arm' for the attachment of activated fatty acid and amino-acid groups. This domain forms a four-helix bundle. This family includes members not included in Prosite; their inclusion is supported by sequence analysis and functional evidence. The related domain of Swiss:P19828 has the attachment serine replaced by an alanine.

    Interpro description:

    Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid groups.

    The amino-terminal region of the ACP proteins is well defined and consists of four alpha helices arranged in a right-handed bundle held together by interhelical hydrophobic interactions. The Asp-Ser-Leu (DSL) motif is conserved in all of the ACP sequences, and the 4'-PP prosthetic group is covalently linked via a phosphodiester bond to the serine residue. The DSL sequence is present at the amino terminus of helix II, a region of the protein referred to as the recognition helix, which is responsible for the interaction of ACPs with the enzymes of type II fatty acid synthesis.
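    Because the 4'-PP group is attached to the serine of the conserved DSL motif, a naive scan for the DSL tripeptide lists candidate attachment serines. The Python sketch below is only an illustration: the function name and test sequence are hypothetical, and a bare tripeptide match will also hit many proteins that are not ACPs, so it is no substitute for a profile search with the Pfam PP-binding model.

```python
def candidate_pp_serines(seq: str) -> list[int]:
    """Return 1-based positions of the Ser in each Asp-Ser-Leu (DSL) occurrence."""
    seq = seq.upper()
    # i is the 0-based index of the Asp; the Ser is at 0-based i + 1, i.e. 1-based i + 2.
    return [i + 2 for i in range(len(seq) - 2) if seq[i:i + 3] == "DSL"]


if __name__ == "__main__":
    print(candidate_pp_serines("MADSLAGDSLK"))  # hypothetical sequence -> [4, 9]
```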

    Proteins where this domain is known:
    PY04779   


    PF00551 - Formyl_trans_N (Pfam link)

    Interpro entry IPR002376 : Formyl transferase, N-terminal (Interpro link)

    Pfam description:
    This family includes the following members. Glycinamide ribonucleotide transformylase catalyses the third step in de novo purine biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. Formyltetrahydrofolate deformylase produces formate from formyltetrahydrofolate. Methionyl-tRNA formyltransferase transfers a formyl group onto the amino terminus of the acyl moiety of the methionyl aminoacyl-tRNA. Inclusion of the following members is supported by PSI-BLAST: HOXX_BRAJA (P31907), PRTH_PORGI (P46071) and Y09P_MYCTU (Q50721) each contain a related domain of unknown function.

    Interpro description:
    A number of formyl transferases belong to this group. Methionyl-tRNA formyltransferase transfers a formyl group onto the amino terminus of the acyl moiety of the methionyl aminoacyl-tRNA. The formyl group appears to play a dual role in the initiator identity of N-formylmethionyl-tRNA by promoting its recognition by IF2 and by impairing its binding to EF-Tu-GTP. Formyltetrahydrofolate dehydrogenase produces formate from formyltetrahydrofolate. This is the N-terminal domain of these enzymes and is found upstream of the C-terminal domain.

    The trifunctional glycinamide ribonucleotide synthetase-aminoimidazole ribonucleotide synthetase-glycinamide ribonucleotide transformylase catalyses the second, third and fifth steps in de novo purine biosynthesis. Its glycinamide ribonucleotide transformylase domain belongs to this group.

    Proteins where this domain is known:
    PY03768   


    PF00552 - Integrase (Pfam link)

    Interpro entry IPR001037 : Integrase, C-terminal, retroviral (Interpro link)

    Pfam description:
    Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc-binding domain. The central domain is the catalytic domain Pfam:PF00665. This entry represents the carboxyl-terminal domain, which is a non-specific DNA-binding domain.

    Interpro description:

    Integrase comprises three domains capable of folding independently and whose three-dimensional structures are known. However, the manner in which the N-terminal, catalytic core, and C-terminal domains interact in the holoenzyme remains obscure. Numerous studies indicate that the enzyme functions as a multimer, minimally a dimer. The integrase proteins from Human immunodeficiency virus 1 (HIV-1) and Avian sarcoma virus have been studied most carefully with respect to the structural basis of catalysis. Although the active site of avian virus integrase does not undergo significant conformational changes on binding the required metal cofactor, that of HIV-1 does. This active site-mediated conformational change in HIV-1 reorganises the catalytic core and C-terminal domains and appears to promote an interaction that is favourable for catalysis.

    Retroviral integrase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. Retrovirus integrase-related gene sequences are also known to occur in eukaryotes. Bacterial transposases involved in the transposition of insertion sequences also belong to this group.

    HIV-1 integrase catalyses the incorporation of virally derived DNA into the human genome. This unique step in the virus life cycle provides a variety of points for intervention and hence is an attractive target for the development of new therapeutics for the treatment of AIDS. Substrate recognition by the retroviral integrase enzyme is critical for retroviral integration. To catalyze this recombination event, integrase must recognize and act on two types of substrates, viral DNA and host DNA, yet the necessary interactions exhibit markedly different degrees of specificity.

    Proteins where this domain is known:
    PY07014   


    PF00557 - Peptidase_M24 (Pfam link)

    Interpro entry IPR000994 : Peptidase M24, structural domain (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
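    The 'abXHEbbHbc' pattern can be written as a regular expression for scanning sequences. The Python sketch below is illustrative only: the residue sets used for 'uncharged' (b) and 'hydrophobic' (c) are assumptions rather than MEROPS or InterPro definitions, 'a' is simplified to Val/Thr only even though the text says "most often", and the function names and test sequence are hypothetical. Proline is excluded from every variable position, following the observation above.

```python
import re

# Residue classes for the 'abXHEbbHbc' pattern. These sets are illustrative
# assumptions, not official MEROPS or InterPro definitions.
A = "[VT]"              # 'a': "most often" Val or Thr; simplified here to V/T only
B = "[ACFGILMNQSTVWY]"  # 'b': uncharged; assumed to exclude D, E, K, R, H (and P)
C = "[AFILMVW]"         # 'c': hydrophobic; assumed set
X = "[^P]"              # any residue except Pro, which is never found in the site

# Zero-width lookaheads let overlapping motifs be reported too.
STRICT = re.compile(f"(?=({A}{B}{X}HE{B}{B}H{B}{C}))")
LOOSE = re.compile("(?=(HE[^P]{2}H))")


def find_strict_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, 10-residue match) for each strict-pattern hit."""
    return [(m.start(), m.group(1)) for m in STRICT.finditer(seq.upper())]


def find_loose_sites(seq: str) -> list[int]:
    """Return 0-based start positions of plain HEXXH hits (no Pro at X)."""
    return [m.start() for m in LOOSE.finditer(seq.upper())]


if __name__ == "__main__":
    demo = "MKVVAHELTHAVGG"  # hypothetical test sequence
    print(find_strict_sites(demo))  # [(2, 'VVAHELTHAV')]
    print(find_loose_sites(demo))   # [5]
```

    The loose pattern reports every HEXXH; the stricter one trades sensitivity for specificity, which is the point the text makes about the extended definition.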

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This entry contains proteins that belong to MEROPS peptidase family M24 (clan MG), which share a common structural fold, the "pita-bread" fold. The fold contains both alpha helices and an anti-parallel beta sheet within two structurally similar domains that are thought to be derived from an ancient gene duplication. The active site, where conserved, is located between the two domains. The fold is common to methionine aminopeptidase, aminopeptidase P, prolidase, agropine synthase and creatinase. Though many of these peptidases require a divalent cation, creatinase is not a metal-dependent enzyme.

    The entry also contains proteins that have lost catalytic activity, for example Spt16, which is a component of the FACT complex. The crystal structure of the N-terminal domain of Spt16, determined to 2.1 Å, reveals an aminopeptidase P fold whose enzymatic activity has been lost. This fold binds directly to histones H3-H4 through an interaction with their globular core domains, as well as with their N-terminal tails.

    The FACT complex is a stable heterodimer in Saccharomyces cerevisiae (Baker's yeast) comprising Spt16p and Pob3p. The complex plays a role in transcription initiation and promotes binding of TATA-binding protein (TBP) to a TATA box in chromatin; it also facilitates RNA Polymerase II transcription elongation through nucleosomes by destabilizing and then reassembling nucleosome structure.

    Proteins where this domain is known:
    PY00802    PY00855    PY01653    PY02559    PY04617   

    Proteins where this domain has been detected by our approach:
    PY05380   


    PF00560 - LRR_1 (Pfam link)

    Interpro entry IPR001611 : Leucine-rich repeat (Interpro link)

    Pfam description:
    CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated non-globular structures. Leucine Rich Repeats are often flanked by cysteine rich domains.

    Interpro description:

    Glutamate synthase (GltS) is a key enzyme in the early stages of the assimilation of ammonia in bacteria, yeasts, and plants. In bacteria, L-glutamate is involved in osmoregulation, is the precursor for other amino acids, and can be the precursor for haem biosynthesis. In plants, GltS is especially essential in the reassimilation of ammonia released by photorespiration. On the basis of the amino acid sequence and the nature of the electron donor, three different classes of GltS can be defined as follows: 1) ferredoxin-dependent GltS (Fd-GltS), 2) NADPH-dependent GltS (NADPH-GltS), and 3) NADH-dependent GltS (properties of the three classes have been reviewed extensively). The enzyme is a complex iron-sulphur flavoprotein catalysing the reductive transfer of the amido nitrogen from L-glutamine to 2-oxoglutarate to form two molecules of L-glutamate via intramolecular channelling of ammonia from the amidotransferase domain to the FMN-binding domain.

    Reaction of amidotransferase domain:

      L-glutamine + H2O = L-glutamate + NH3

    Reactions of FMN-binding domain:

      2-oxoglutarate + NH3 = 2-iminoglutarate + H2O
      2 e- + FMNox = FMNred
      2-iminoglutarate + FMNred = L-glutamate + FMNox
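    Summing the four partial reactions and cancelling the intermediates (NH3, 2-iminoglutarate, water and the FMN redox pair) recovers the net reaction described above: one L-glutamine plus one 2-oxoglutarate, with two electrons from the donor, gives two L-glutamate. The Python sketch below does that bookkeeping; species are plain string labels chosen for illustration, and protons are ignored, as in the equations above.

```python
from collections import Counter

# Each partial reaction as (left side, right side), mapping species -> count.
# "e-" is one electron; protons are omitted, as in the equations above.
REACTIONS = [
    ({"L-glutamine": 1, "H2O": 1}, {"L-glutamate": 1, "NH3": 1}),
    ({"2-oxoglutarate": 1, "NH3": 1}, {"2-iminoglutarate": 1, "H2O": 1}),
    ({"e-": 2, "FMNox": 1}, {"FMNred": 1}),
    ({"2-iminoglutarate": 1, "FMNred": 1}, {"L-glutamate": 1, "FMNox": 1}),
]


def net_reaction(reactions):
    """Sum the partial reactions and cancel species present on both sides."""
    left, right = Counter(), Counter()
    for lhs, rhs in reactions:
        left.update(lhs)
        right.update(rhs)
    shared = left & right  # intersection keeps the smaller count of each species
    return dict(left - shared), dict(right - shared)


if __name__ == "__main__":
    consumed, produced = net_reaction(REACTIONS)
    print(consumed)  # {'L-glutamine': 1, '2-oxoglutarate': 1, 'e-': 2}
    print(produced)  # {'L-glutamate': 2}
```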

    Proteins where this domain is known:
    PY01144    PY02061    PY02599    PY03257    PY03350    PY04003    PY05962    PY05963    PY05964    PY06687    PY06812    PY06840   

    Proteins where this domain has been detected by our approach:
    PY02453   


    PF00561 - Abhydrolase_1 (Pfam link)

    Interpro entry IPR000073 : (Interpro link)

    Pfam description:
    This catalytic domain is found in a very wide range of enzymes.

    Interpro description:

    The alpha/beta hydrolase fold is common to a number of hydrolytic enzymes of widely differing phylogenetic origin and catalytic function. The core of each enzyme is an alpha/beta-sheet (rather than a barrel), containing 8 strands connected by helices. The enzymes are believed to have diverged from a common ancestor, preserving the arrangement of the catalytic residues. All have a catalytic triad, the elements of which are borne on loops, which are the best conserved structural features of the fold. Esterase (EST) from Pseudomonas putida is a member of the alpha/beta hydrolase fold superfamily of enzymes.

    In most of the family members the beta-strands are parallel, but some have an inversion of the first strands, which gives them an antiparallel orientation. The catalytic triad residues are presented on loops. One of these loops is the nucleophile elbow, which is the most conserved feature of the fold. Some members lack one or all of the catalytic residues; some of these are therefore inactive, while others are involved in surface recognition. The ESTHER database gathers and annotates all the published information related to gene and protein sequences of this superfamily.

    This entry represents fold-1 of alpha/beta hydrolase.

    Proteins where this domain is known:
    PY01235    PY04076    PY04358    PY05418    PY05572    PY06145   


    PF00562 - RNA_pol_Rpb2_6 (Pfam link)

    Interpro entry IPR007120 : DNA-directed RNA polymerase, subunit 2, domain 6 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain represents the hybrid binding domain and the wall domain. The hybrid binding domain binds the nascent RNA strand/template DNA strand in the Pol II transcription elongation complex. This domain contains the important structural motifs, switch 3 and the flap loop, and binds an active site metal ion. This domain is also involved in binding to Rpb1 and Rpb3. Many of the bacterial members contain large insertions within this domain, a region known as dispensable region 2 (DRII).

    Interpro description:

    DNA-directed RNA polymerases (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. In bacteria, the core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along the full length. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel.

    RNA synthesis follows after the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand. The RNA synthesis process continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise.

    Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    RNA polymerases catalyse the DNA dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain represents the hybrid-binding domain and the wall domain. The hybrid-binding domain binds the nascent RNA strand/template DNA strand in the Pol II transcription elongation complex. This domain contains the important structural motifs, switch 3 and the flap loop and binds an active site metal ion. This domain is also involved in binding to Rpb1 and Rpb3. Many of the bacterial members contain large insertions within this domain, which are known as dispensable region 2 (DRII).

    Proteins where this domain is known:
    PY01115    PY01847    PY05002   


    PF00565 - SNase (Pfam link)

    Interpro entry IPR006021 : Staphylococcal nuclease (SNase-like) (Interpro link)

    Pfam description:
    Found in all three domains of cellular life. Four copies are present in the transcriptional coactivator p100; these, however, appear to lack the active-site residues of Staphylococcal nuclease. Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-43) and 110 (Arg-87) are thought to be involved in substrate binding and catalysis.

    Interpro description:

    Staphylococcus aureus nuclease (SNase) homologues, previously thought to be restricted to bacteria and archaea, are also found in eukaryotes. Staphylococcal nuclease has a multidomain organization. The human cellular coactivator p100 contains four repeats, each of which is a SNase homologue. These repeats are unlikely to possess SNase-like activities as each lacks equivalent SNase catalytic residues, yet they may mediate p100's single-stranded DNA-binding function. A variety of proteins, including many that are still uncharacterised, belong to this group.

    Proteins where this domain is known:
    PY01228   


    PF00566 - TBC (Pfam link)

    Interpro entry IPR000195 : RabGAP/TBC (Interpro link)

    Pfam description:
    Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small GTPases.

    Interpro description:
    Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small GTPases.

    Proteins where this domain is known:
    PY00727    PY01151    PY01735    PY01953    PY03684    PY05115    PY06258   


    PF00567 - TUDOR (Pfam link)

    Interpro entry IPR008191 : (Interpro link)

    Pfam description:
    Domain of unknown function present in several RNA-binding proteins. Multiple copies are present in the Drosophila Tudor protein.

    Interpro description:
    There are multiple copies of this domain in the Drosophila melanogaster tudor protein and it has been identified in several RNA-binding proteins. Although the function of this domain is unknown, in Drosophila melanogaster the tudor protein is required during oogenesis for the formation of primordial germ cells and for normal abdominal segmentation.

    Proteins where this domain is known:
    PY01228   


    PF00569 - ZZ (Pfam link)

    Interpro entry IPR000433 : Zinc finger, ZZ-type (Interpro link)

    Pfam description:
    Zinc finger present in dystrophin and CBP/p300. ZZ in dystrophin binds calmodulin. Putative zinc finger; binding not yet shown. Four to six cysteine residues in its sequence are responsible for coordinating zinc ions, which reinforces the structure.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents ZZ-type zinc finger domains, named because of their ability to bind two zinc ions. These domains contain 4-6 Cys residues that participate in zinc binding (plus additional Ser/His residues), including a Cys-X2-Cys motif found in other zinc finger domains. These zinc fingers are thought to be involved in protein-protein interactions. The structure of the ZZ domain shows that it belongs to the family of cross-brace zinc finger motifs that include the PHD, RING, and FYVE domains. ZZ-type zinc finger domains are found in:

    Single copies of the ZZ zinc finger occur in the transcriptional adaptor/coactivator proteins P300, in cAMP response element-binding protein (CREB)-binding protein (CBP) and ADA2. CBP provides several binding sites for transcriptional coactivators. The site on CBP/P300 that interacts with the tumour suppressor protein p53 and the oncoprotein E1A is a Cys-rich region that incorporates two zinc-binding motifs: ZZ-type and TAZ2-type. The ZZ-type zinc finger of CBP contains two twisted anti-parallel beta-sheets and a short alpha-helix, and binds two zinc ions. One zinc ion is coordinated by four cysteine residues via two Cys-X2-Cys motifs, and the second zinc ion via a third Cys-X-Cys motif and a His-X-His motif. The first zinc cluster is strictly conserved, whereas the second zinc cluster displays variability in the position of the two His residues.

    In Arabidopsis thaliana (Mouse-ear cress), the hypersensitive to red and blue 1 (Hrb1) protein, which regulates both red and blue light responses, contains a ZZ-type zinc finger domain.

    ZZ-type zinc finger domains have also been identified in the testis-specific E3 ubiquitin ligase MEX that promotes death receptor-induced apoptosis. MEX has four putative zinc finger domains: one ZZ-type, one SWIM-type and two RING-type. The region containing the ZZ-type and RING-type zinc fingers is required for interaction with UbcH5a and MEX self-association, whereas the SWIM domain was critical for MEX ubiquitination.

    In addition, the Cys-rich domains of dystrophin, utrophin and an 87kDa post-synaptic protein contain a ZZ-type zinc finger with high sequence identity to P300/CBP ZZ-type zinc fingers. In dystrophin and utrophin, the ZZ-type zinc finger lies between a WW domain (flanked by an EF hand) and the C-terminal coiled-coil domain. Dystrophin is thought to act as a link between the actin cytoskeleton and the extracellular matrix, and perturbations of the dystrophin-associated complex, for example, between dystrophin and the transmembrane glycoprotein beta-dystroglycan, may lead to muscular dystrophy. Dystrophin and its autosomal homologue utrophin interact with beta-dystroglycan via their C-terminal regions, which comprise a WW domain, an EF hand domain and a ZZ-type zinc finger domain. The WW domain is the primary site of interaction between dystrophin or utrophin and dystroglycan, while the EF hand and ZZ-type zinc finger domains stabilise and strengthen this interaction.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY03808   

    Proteins where this domain has been detected by our approach:
    PY00030    PY01886    PY05553    PY05680   


    PF00570 - HRDC (Pfam link)

    Interpro entry IPR002121 : Helicase and RNase D C-terminal, HRDC (Interpro link)

    Pfam description:
    The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations in the HRDC domain cause human disease. Notably, the RecQ helicase of Deinococcus radiodurans has three tandem HRDC domains.

    Interpro description:
    The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations in the HRDC domain associated with the human BLM gene result in Bloom Syndrome (BS), an autosomal recessive disorder characterised by proportionate pre- and postnatal growth deficiency; sun-sensitive, telangiectatic, hypo- and hyperpigmented skin; predisposition to malignancy; and chromosomal instability.

    Proteins where this domain has been detected by our approach:
    PY01271    PY03767   


    PF00571 - CBS (Pfam link)

    Interpro entry IPR000644 : (Interpro link)

    Pfam description:
    CBS domains are small intracellular modules that pair together to form a stable globular domain. This family represents a pair of CBS domains, that has been termed a Bateman domain. CBS domains have been shown to bind ligands with an adenosyl group such as AMP, ATP and S-AdoMet. CBS domains are found attached to a wide range of other protein domains suggesting that CBS domains may play a regulatory role making proteins sensitive to adenosyl carrying ligands. The region containing the CBS domains in Cystathionine-beta synthase is involved in regulation by S-AdoMet. CBS domain pairs from AMPK bind AMP or ATP. The CBS domains from IMPDH and the chloride channel CLC2 bind ATP.

    Interpro description:

    CBS (cystathionine-beta-synthase) domains are small intracellular modules, mostly found in two or four copies within a protein, that occur in a variety of proteins in bacteria, archaea, and eukaryotes.

    Tandem pairs of CBS domains can act as binding domains for adenosine derivatives and may regulate the activity of attached enzymatic or other domains. In some cases, CBS domains may act as sensors of cellular energy status by being activated by AMP and inhibited by ATP. In chloride ion channels, the CBS domains have been implicated in intracellular targeting and trafficking, as well as in protein-protein interactions, but results vary with different channels: in the CLC-5 channel, the CBS domain was shown to be required for trafficking, while in the CLC-1 channel, the CBS domain was shown to be critical for channel function, but not necessary for trafficking. Recent experiments revealing that CBS domains can bind adenosine-containing ligands such as ATP, AMP, or S-adenosylmethionine have led to the hypothesis that CBS domains function as sensors of intracellular metabolites.

    Crystallographic studies of CBS domains have shown that pairs of CBS sequences form a globular domain where each CBS unit adopts a beta-alpha-beta-beta-alpha pattern. The crystal structure of the CBS domains of the AMP-activated protein kinase in complexes with AMP and ATP shows that the phosphate groups of AMP/ATP lie in a surface pocket at the interface of two CBS domains, which is lined with basic residues, many of which are associated with disease-causing mutations.

    In humans, mutations in conserved residues within CBS domains cause a variety of human hereditary diseases, including (with the gene mutated in parentheses): homocystinuria (cystathionine beta-synthase); Wolff-Parkinson-White syndrome (gamma 2 subunit of AMP-activated protein kinase); retinitis pigmentosa (IMP dehydrogenase-1); congenital myotonia, idiopathic generalized epilepsy, hypercalciuric nephrolithiasis, and classic Bartter syndrome (CLC chloride channel family members).

    Proteins where this domain is known:
    PY03601   

    Proteins where this domain has been detected by our approach:
    PY03209   


    PF00572 - Ribosomal_L13 (Pfam link)

    Interpro entry IPR005822 : Ribosomal protein L13 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be one of the early assembly proteins of the 50S ribosomal subunit.

    Proteins where this domain is known:
    PY07370   


    PF00573 - Ribosomal_L4 (Pfam link)

    Interpro entry IPR002136 : Ribosomal protein L4/L1e (Interpro link)

    Pfam description:
    This family includes Ribosomal L4/L1 from eukaryotes and archaebacteria and L4 from eubacteria. L4 from yeast has been shown to bind rRNA.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This family includes ribosomal L4/L1 from eukaryotes and plants and L4 from bacteria. L4 from yeast has been shown to bind rRNA. These proteins have 246 (plant) to 427 (human) amino acids.

    Proteins where this domain is known:
    PY04244    PY06225   


    PF00574 - CLP_protease (Pfam link)

    Interpro entry IPR001907 : Peptidase S14, ClpP (Interpro link)

    Pfam description:
    The Clp protease has an active-site catalytic triad. In E. coli Clp protease, Ser-111, His-136 and Asp-185 form the catalytic triad. Swiss:P48254 has lost all of these active-site residues and is therefore inactive. Swiss:P42379 contains two large insertions; Swiss:P42380 contains one large insertion.

    Interpro description:

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans, and they indicate that some clans appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
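    As a small illustration of what "linear arrangement" means here, the sketch below sorts the three catalytic residues by their position in the primary sequence and reads off their one-letter codes. The helper name `triad_order` is hypothetical. The chymotrypsin (His57, Asp102, Ser195) and subtilisin BPN' (Asp32, His64, Ser221) positions are the conventional residue numberings for those enzymes, and the ClpP positions are the E. coli residues given in the Pfam description of this family. Only sequence position is considered; structural geometry is ignored.

```python
from typing import Dict


def triad_order(positions: Dict[str, int]) -> str:
    """Return the triad residues (one-letter codes) sorted by sequence position.

    `positions` maps a one-letter code ('S', 'H', 'D') to that residue's
    position in the primary sequence of the protein in question.
    """
    # Iterating a dict yields its keys; sort them by their stored position.
    return "".join(sorted(positions, key=positions.__getitem__))


# Chymotrypsin numbering (His57, Asp102, Ser195): chymotrypsin clan PA
chymotrypsin = {"S": 195, "H": 57, "D": 102}
# Subtilisin BPN' numbering (Asp32, His64, Ser221): subtilisin clan SB
subtilisin = {"S": 221, "H": 64, "D": 32}
# E. coli ClpP triad from the Pfam description (Ser-111, His-136, Asp-185)
clpp = {"S": 111, "H": 136, "D": 185}

for name, triad in [("chymotrypsin", chymotrypsin),
                    ("subtilisin", subtilisin),
                    ("ClpP", clpp)]:
    print(name, triad_order(triad))  # HDS, DHS, SHD respectively
```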

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of serine peptidases belongs to the MEROPS peptidase family S14 (ClpP endopeptidase family, clan SK). ClpP is an ATP-dependent protease that cleaves a number of proteins, such as casein and albumin. It exists as a heterodimer of ATP-binding regulatory A and catalytic P subunits, both of which are required for effective levels of protease activity in the presence of ATP, although the P subunit alone does possess some catalytic activity. This family of sequences represents the P subunit.

    Proteases highly similar to ClpP have been found to be encoded in the genomes of bacteria, metazoa, some viruses and the chloroplasts of plants. A number of the proteins in this family are classified as non-peptidase homologues, as they have been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    Proteins where this domain is known:
    PY00557    PY06630   


    PF00575 - S1 (Pfam link)

    Interpro entry IPR003029 : S1, RNA binding (Interpro link)

    Pfam description:
    The S1 domain occurs in a wide range of RNA associated proteins. It is structurally similar to cold shock protein which binds nucleic acids. The S1 domain has an OB-fold structure.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The S1 domain was originally identified in ribosomal protein S1 but is found in a large number of RNA-associated proteins. The structure of the S1 RNA-binding domain from the Escherichia coli polynucleotide phosphorylase has been determined using NMR methods and consists of a five-stranded antiparallel beta barrel. Conserved residues on one face of the barrel and adjacent loops form the putative RNA-binding site.

    The structure of the S1 domain is very similar to that of cold shock proteins. This suggests that they may both be derived from an ancient nucleic acid-binding protein.

    More information about these proteins can be found at Protein of the Month: RNA Exosomes.

    Proteins where this domain is known:
    PY00093    PY00746   

    Proteins where this domain has been detected by our approach:
    PY03311    PY04108    PY04437    PY05877   


    PF00578 - AhpC-TSA (Pfam link)

    Interpro entry IPR000866 : Alkyl hydroperoxide reductase/Thiol specific antioxidant/Mal allergen (Interpro link)

    Pfam description:
    This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific antioxidant (TSA).

    Interpro description:

    Peroxiredoxins (Prxs) are a ubiquitous family of antioxidant enzymes that also control cytokine-induced peroxide levels which mediate signal transduction in mammalian cells. Prxs can be regulated by changes to phosphorylation, redox and possibly oligomerisation states. Prxs are divided into three classes: typical 2-Cys Prxs; atypical 2-Cys Prxs; and 1-Cys Prxs. All Prxs share the same basic catalytic mechanism, in which an active-site cysteine (the peroxidatic cysteine) is oxidised to a sulphenic acid by the peroxide substrate. The recycling of the sulphenic acid back to a thiol is what distinguishes the three enzyme classes. Using crystal structures, a detailed catalytic cycle has been derived for typical 2-Cys Prxs, including a model for the redox-regulated oligomeric state proposed to control enzyme activity.

    Alkyl hydroperoxide reductase (AhpC) is responsible for directly reducing organic hydroperoxides in its reduced dithiol form. Thiol specific antioxidant (TSA) is a physiologically important antioxidant which constitutes an enzymatic defence against sulphur-containing radicals. This family contains AhpC and TSA, as well as related proteins.

    Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee, King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.

    The allergens in this family include allergens with the following designations: Asp f 3, Mal f 2 and Mal f 3.
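    The naming rule described above is mechanical enough to sketch in a few lines. The function `allergen_designation` below is a hypothetical illustration, not an official WHO/IUIS tool, and its `extra_letters` parameter is a simplified stand-in for the rule of adding species letters "as necessary" when two designations would otherwise collide. The example inputs reproduce Asp f 3 (Aspergillus fumigatus) and Mal f 2 (Malassezia furfur) from the list above.

```python
def allergen_designation(genus: str, species: str, number: int,
                         extra_letters: int = 0) -> str:
    """Build an allergen name of the form 'Gen s N'.

    genus: full genus name; its first three letters are used.
    species: full species name; its first letter (plus `extra_letters`
             more, a hypothetical disambiguation knob) is used.
    number: the Arabic number assigned to the allergen.
    """
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    genus_part = genus[:3].capitalize()                 # e.g. 'Asp'
    species_part = species[:1 + extra_letters].lower()  # e.g. 'f'
    return f"{genus_part} {species_part} {number}"


print(allergen_designation("Aspergillus", "fumigatus", 3))  # Asp f 3
print(allergen_designation("Malassezia", "furfur", 2))      # Mal f 2
```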

    Proteins where this domain is known:
    PY00414    PY02747    PY03834    PY04285   


    PF00579 - tRNA-synt_1b (Pfam link)

    Interpro entry IPR002305 : Aminoacyl-tRNA synthetase, class Ib (Interpro link)

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class-II synthetases.
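    The class assignments in the paragraph above amount to a lookup table, sketched here in Python. The names `CLASS_I`, `CLASS_II`, `TYPICAL_HYDROXYL` and `synthetase_class` are hypothetical. The table records only the general rule and deliberately ignores known exceptions: lysyl-tRNA synthetase also occurs in a class I form in some archaea and bacteria, and the class II phenylalanyl-tRNA synthetase aminoacylates the 2'-hydroxyl rather than the 3'-hydroxyl.

```python
from enum import Enum


class SynthetaseClass(Enum):
    I = "I"
    II = "II"


# Class membership as listed in the description (three-letter codes).
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# Typical (not universal) ribose hydroxyl that receives the aminoacyl group.
TYPICAL_HYDROXYL = {SynthetaseClass.I: "2'-OH", SynthetaseClass.II: "3'-OH"}


def synthetase_class(amino_acid: str) -> SynthetaseClass:
    """Return the synthetase class for a three-letter amino acid code."""
    if amino_acid in CLASS_I:
        return SynthetaseClass.I
    if amino_acid in CLASS_II:
        return SynthetaseClass.II
    raise KeyError(f"no class assignment listed for {amino_acid!r}")


cls = synthetase_class("Trp")
print(cls.value, TYPICAL_HYDROXYL[cls])  # I 2'-OH
```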

    Proteins where this domain is known:
    PY04194    PY04254    PY06252    PY06291   


    PF00580 - UvrD-helicase (Pfam link)

    Interpro entry IPR000212 : DNA helicase, UvrD/REP type (Interpro link)

    Pfam description:
    The Rep family helicases are composed of four structural domains and function as dimers. REP helicases catalyse ATP-dependent unwinding of double-stranded DNA to single-stranded DNA. Swiss:P23478 and Swiss:P08394 have large insertions near the carboxy terminus relative to other members of the family.

    Interpro description:

    Members of this family are helicases that catalyse ATP-dependent unwinding of double-stranded DNA to single-stranded DNA. The family includes both Rep and UvrD helicases. The Rep family helicases are composed of four structural domains. The Rep proteins function as dimers.

    Proteins where this domain is known:
    PY01998   


    PF00581 - Rhodanese (Pfam link)

    Interpro entry IPR001763 : (Interpro link)

    Pfam description:
    Rhodanese has an internal duplication. This Pfam represents a single copy of this duplicated domain. The domain is found as a single copy in other proteins, including phosphatases and ubiquitin C-terminal hydrolases.

    Interpro description:

    Rhodanese, a sulphurtransferase involved in cyanide detoxification, shares an evolutionary relationship with a large family of proteins.

    Rhodanese has an internal duplication. This domain is found as a single copy in other proteins, including phosphatases and ubiquitin C-terminal hydrolases.

    Proteins where this domain is known:
    PY00561   

    Proteins where this domain has been detected by our approach:
    PY01845   


    PF00583 - Acetyltransf_1 (Pfam link)

    Interpro entry IPR000182 : GCN5-related N-acetyltransferase (Interpro link)

    Pfam description:
    This family contains proteins with N-acetyltransferase functions such as Elp3-related proteins.

    Interpro description:

    Histone acetylation is carried out by a class of enzymes known as histone acetyltransferases (HATs), which catalyze the transfer of an acetyl group from acetyl-CoA to the lysine ε-amino groups on the N-terminal tails of histones. An early indication that HATs were involved in transcription came from the observation that histones tend to be hyperacetylated in actively transcribed regions of chromatin, whereas they are hypoacetylated in transcriptionally silent regions. The histone acetyltransferases are divided into five families: the Gcn5-related acetyltransferases (GNATs); the MYST (for MOZ, Ybf2/Sas3, Sas2 and Tip60)-related HATs; the p300/CBP HATs; the general transcription factor HATs, which include the TFIID subunit TAF250; and the nuclear hormone-related HATs SRC1 and ACTR (SRC3). The GCN5-related N-acetyltransferase superfamily includes such enzymes as the histone acetyltransferases GCN5 and Hat1, the elongator complex subunit Elp3, the mediator-complex subunit Nut1, and Hpa2.

    Many GNATs share several functional domains, including an N-terminal region of variable length, an acetyltransferase domain that encompasses the conserved acetyltransferase sequence motifs, a region that interacts with the coactivator Ada2, and a C-terminal bromodomain that is believed to interact with acetyl-lysine residues. Members of the GNAT family are important for the regulation of cell growth and development. In mice, knockouts of Gcn5L are embryonic lethal. Yeast Gcn5 is needed for normal progression through the G2/M boundary and for mitotic gene expression. The importance of GNATs is probably related to their role in transcription and DNA repair.

    The yeast GCN5 (yGCN5) transcriptional coactivator functions as a histone acetyltransferase (HAT) to promote transcriptional activation. The crystal structure of the yeast histone acetyltransferase Hat1 in complex with acetyl coenzyme A (AcCoA) shows that Hat1 has an elongated, curved structure, and the AcCoA molecule is bound in a cleft on the concave surface of the protein, marking the active site of the enzyme. A channel of variable width and depth that runs across the protein is probably the binding site for the histone substrate. The central protein core associated with AcCoA binding appears to be structurally conserved among a superfamily of N-acetyltransferases, including yeast histone acetyltransferase 1 and Serratia marcescens aminoglycoside 3-N-acetyltransferase.

    Proteins where this domain is known:
    PY00941    PY01428    PY02679    PY04599   

    Proteins where this domain has been detected by our approach:
    PY03999    PY04574   


    PF00584 - SecE (Pfam link)

    Interpro entry IPR001901 : Protein secE/sec61-gamma protein (Interpro link)

    Pfam description:
    SecE is part of the bacterial SecYEG complex, which translocates proteins from the cytoplasm. In eukaryotes, the complex made from Sec61-gamma and Sec61-alpha translocates proteins from the cytoplasm to the ER. Archaea have a similar complex.

    Interpro description:

    Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome.

    The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecCY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF). The chaperone protein SecB is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion. SecE, part of the main SecYEG translocase complex, is ~106 residues in length, and spans the inner membrane of the Gram-negative bacterial envelope. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both proton motive forces and ATP-driven secretion. The latter is mediated by SecA.

    In eukaryotes, the evolutionarily related protein sec61-gamma plays a role in protein translocation through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and sec61-beta. Both secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane region at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of 60 residues that contains two additional transmembrane domains).

    Proteins where this domain is known:
    PY07642   


    PF00586 - AIRS (Pfam link)

    Interpro entry IPR000728 : AIR synthase related protein (Interpro link)

    Pfam description:
    This family includes Hydrogen expression/formation protein HypE Swiss:P24193, AIR synthases Swiss:P08178 EC:6.3.3.1, FGAM synthase Swiss:P35852 EC:6.3.5.3 and selenide, water dikinase Swiss:P16456 EC:2.7.9.3. The N-terminal domain of AIR synthase forms the dimer interface of the protein, and is suggested as a putative ATP binding domain.

    Interpro description:
    This family includes hydrogen expression/formation protein HypE, which may be involved in the maturation of [NiFe] hydrogenase; AIR synthase and FGAM synthase, which are involved in de novo purine biosynthesis; and selenide, water dikinase, an enzyme which synthesizes selenophosphate from selenide and ATP.

    Proteins where this domain is known:
    PY05530   


    PF00587 - tRNA-synt_2b (Pfam link)

    Interpro entry IPR002314 : Aminoacyl-tRNA synthetase, class II (G, H, P and S), conserved region (Interpro link)

    Pfam description:
    Other tRNA synthetase sub-families are too dissimilar to be included. This domain is the core catalytic domain of tRNA synthetases and includes glycyl, histidyl, prolyl, seryl and threonyl tRNA synthetases.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class-II synthetases.

    This domain includes the glycine, histidine, proline, threonine and serine tRNA synthetases.

    Proteins where this domain is known:
    PY00739    PY00927    PY01198    PY02018    PY03295    PY03706    PY06957   


    PF00588 - SpoU_methylase (Pfam link)

    Interpro entry IPR001537 : tRNA/rRNA methyltransferase, SpoU (Interpro link)

    Pfam description:
    This family of proteins probably uses S-AdoMet.

    Interpro description:
    The spoU gene of Escherichia coli codes for a protein that shows strong similarities to previously characterised 2'-O-methyltransferases. The Pet56 protein of Saccharomyces cerevisiae has been shown to be required for ribose methylation at a universally conserved nucleotide in the peptidyl transferase centre of the mitochondrial large ribosomal RNA (21S rRNA). Cells reduced in this activity were deficient in formation of functional large subunits of the mitochondrial ribosome. The Pet56 protein catalyzes the site-specific formation of 2'-O-methylguanosine on in vitro transcripts of both mitochondrial 21S rRNA and E. coli 23S rRNA providing evidence for an essential modified nucleotide in rRNA.

    Proteins where this domain is known:
    PY00525    PY01246    PY02112    PY03647    PY06442   


    PF00589 - Phage_integrase (Pfam link)

    Interpro entry IPR002104 : Integrase, catalytic core, phage (Interpro link)

    Pfam description:
    Members of this family cleave DNA substrates by a series of staggered cuts, during which the protein becomes covalently linked to the DNA through a catalytic tyrosine residue at the carboxy end of the alignment. The catalytic site residues in CRE recombinase (Swiss:P06956) are Arg-173, His-289, Arg-292 and Tyr-324.

    Interpro description:

    Phage integrase proteins cleave DNA substrates by a series of staggered cuts, during which the protein becomes covalently linked to the DNA through a catalytic tyrosine residue at the carboxy end of the alignment.

    The catalytic site residues in CRE recombinase are Arg-173, His-289, Arg-292 and Tyr-324.

    Proteins where this domain is known:
    PY04273   


    PF00590 - TP_methylase (Pfam link)

    Interpro entry IPR000878 : Tetrapyrrole methylase (Interpro link)

    Pfam description:
    This family uses S-AdoMet in the methylation of diverse substrates. This family includes a related group of bacterial proteins of unknown function, including Swiss:P45528. This family also includes the methylase diphthine synthase.

    Interpro description:

    Tetrapyrroles are large macrocyclic compounds derived from a common biosynthetic pathway. The end-product, uroporphyrinogen III, is used to synthesise a number of important molecules, including cobalamin (vitamin B12), haem, sirohaem, chlorophyll, coenzyme F430 and phytochromobilin.

    This entry represents several tetrapyrrole methylases, which consist of two non-similar domains. These enzymes catalyse the methylation of their substrates using S-adenosyl-L-methionine as a methyl source.

    Proteins where this domain is known:
    PY01561   


    PF00609 - DAGK_acc (Pfam link)

    Interpro entry IPR000756 : Diacylglycerol kinase accessory region (Interpro link)

    Pfam description:
    Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed to be an accessory domain: its function is unknown.

    Interpro description:

    Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific.

    Protein kinase function has been evolutionarily conserved from Escherichia coli to human. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins.

    The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatments of a number of diseases.

    Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The DAG kinase domain is assumed to be an accessory domain. Upon cell stimulation, DAG kinase converts DAG into phosphatidate, initiating the resynthesis of phosphatidylinositols and attenuating protein kinase C activity. It catalyses the reaction: ATP + 1,2-diacylglycerol = ADP + 1,2-diacylglycerol 3-phosphate. The enzyme is stimulated by calcium and phosphatidylserine and phosphorylated by protein kinase C. This domain is always associated with

    Proteins where this domain is known:
    PY01867    PY06359   


    PF00611 - FCH (Pfam link)

    Interpro entry IPR001060 : (Interpro link)

    Pfam description:
    Alignment extended from. Highly alpha-helical.

    Interpro description:

    The FCH domain is a short conserved region of around 60 amino acids first described as a region of homology between FER and CIP4 proteins. Many proteins containing an FCH domain are involved in the regulation of cytoskeletal rearrangements, vesicular transport and endocytosis. In the CIP4 protein the FCH domain binds to microtubules. The FCH domain is always found N-terminally and is followed by a coiled-coil region.

    Proteins containing an FCH domain can be divided into three classes:

    1. A subfamily of protein kinases usually associated with an SH2 domain:
      • Fps/fes (Fujinami poultry sarcoma/feline sarcoma) proto-oncogenes. They are non-receptor protein-tyrosine kinases preferentially expressed in myeloid lineage. The viral oncogene has an unregulated kinase activity which abrogates the need for cytokines and influences differentiation of haematopoietic progenitor cells.
      • Fes-related protein (fer). It is a ubiquitously expressed homolog of Fes.
    2. Adaptor proteins usually associated with a C-terminal SH3 domain:
      • Schizosaccharomyces pombe CDC15 protein. It mediates cytoskeletal rearrangements required for cytokinesis. It is essential for viability.
      • CD2 cytoplasmic domain binding protein.
      • Mammalian Cdc42-interacting protein 4 (CIP4). It may act as a link between Cdc42 signaling and regulation of the actin cytoskeleton.
      • Mammalian PACSIN proteins. A family of cytoplasmic phosphoproteins playing a role in vesicle formation and transport.
    3. A subfamily of Rho-GAP proteins:
      • Mammalian RhoGAP4 proteins. They may down-regulate Rho-like GTPases in hematopoietic cells.
      • Yeast hypothetical protein YBR260C.
      • Caenorhabditis elegans hypothetical protein ZK669.1.

    Proteins where this domain is known:
    PY03716   


    PF00612 - IQ (Pfam link)

    Interpro entry IPR000048 : (Interpro link)

    Pfam description:
    Calmodulin-binding motif.

    Interpro description:

    Calmodulin (CaM) is recognized as a major calcium sensor and orchestrator of regulatory events through its interaction with a diverse group of cellular proteins. Three classes of recognition motifs exist for many of the known CaM-binding proteins: the IQ motif as a consensus for Ca2+-independent binding, and two related motifs for Ca2+-dependent binding, termed 1-8-14 and 1-5-10 based on the positions of conserved hydrophobic residues.
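    The 1-5-10 and 1-8-14 names encode the spacing of conserved hydrophobic anchor residues, so a naive scan for that spacing can be sketched as below. The residue set `HYDROPHOBIC` and the function name `find_spacing_matches` are assumptions made for illustration; published motif definitions differ in which residues count as anchors, and a window that satisfies the spacing is only a candidate, not a demonstrated calmodulin-binding site.

```python
from typing import List, Tuple

# Assumed set of bulky hydrophobic anchor residues (a tunable assumption).
HYDROPHOBIC = set("FILVWM")

# Anchor positions, 1-based, exactly as encoded in the motif names.
MOTIFS = {"1-5-10": (1, 5, 10), "1-8-14": (1, 8, 14)}


def find_spacing_matches(seq: str) -> List[Tuple[str, int]]:
    """Return (motif name, 1-based window start) for every window whose
    anchor positions all hold a residue from HYDROPHOBIC."""
    seq = seq.upper()
    hits = []
    for name, anchors in MOTIFS.items():
        span = anchors[-1]  # window length needed for this motif
        for start in range(len(seq) - span + 1):
            if all(seq[start + a - 1] in HYDROPHOBIC for a in anchors):
                hits.append((name, start + 1))
    return hits


print(find_spacing_matches("LAAAFAAAAV"))  # [('1-5-10', 1)]
```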

    The regulatory domain of scallop myosin is a three-chain protein complex that switches on this motor in response to Ca2+ binding. Side-chain interactions link the two light chains in tandem to adjacent segments of the heavy chain bearing the IQ-sequence motif. The Ca2+-binding site is a novel EF-hand motif on the essential light chain and is stabilized by linkages involving the heavy chain and both light chains, accounting for the requirement of all three chains for Ca2+ binding and regulation in the intact myosin molecule.

    Proteins where this domain is known:
    PY01039    PY04789    PY06624   


    PF00613 - PI3Ka (Pfam link)

    Interpro entry IPR001263 : Phosphoinositide 3-kinase accessory region PIK (Interpro link)

    Pfam description:
    The PIK domain is conserved in all PI3- and PI4-kinases. Its role is unclear, but it has been suggested to be involved in substrate presentation.

    Interpro description:

    Phosphatidylinositol 3-kinase (PI3-kinase) is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The role of its accessory domain is unclear; it may be involved in substrate presentation.

    Proteins where this domain is known:
    PY00334    PY04039   


    PF00614 - PLDc (Pfam link)

    Interpro entry IPR001736 : Phospholipase D/Transphosphatidylase (Interpro link)

    Pfam description:
    Phosphatidylcholine-hydrolysing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs). PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types of transport vesicles or may link constitutive vesicular transport to signal transduction pathways. PC-hydrolysing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral proteins. Each of these appears to possess a domain duplication, apparent from the presence of two motifs containing well-conserved histidine, lysine, aspartic acid and/or asparagine residues, which may contribute to the active site. An E. coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs. The profile contained here represents only the putative active site regions, since an accurate multiple alignment of the repeat units has not been achieved.

    Interpro description:

    Phosphatidylcholine-hydrolysing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs). PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types of transport vesicles or may link constitutive vesicular transport to signal transduction pathways. PC-hydrolysing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral proteins. Each of these appears to possess a domain duplication, apparent from the presence of two motifs containing well-conserved histidine, lysine, aspartic acid and/or asparagine residues, which may contribute to the active site. An Escherichia coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs.

    Proteins where this domain is known:
    PY03489    PY06901   


    PF00620 - RhoGAP (Pfam link)

    Interpro entry IPR000198 : RhoGAP (Interpro link)

    Pfam description:
    GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases.

    Interpro description:
    Members of the Rho family of small G proteins transduce signals from plasma-membrane receptors and control cell adhesion, motility and shape through actin cytoskeleton formation. Like all other GTPases, Rho proteins act as molecular switches, with an active GTP-bound form and an inactive GDP-bound form. The active conformation is promoted by guanine-nucleotide exchange factors, and the inactive state by GTPase-activating proteins (GAPs), which stimulate the intrinsic GTPase activity of small G proteins. This entry represents a Rho/Rac/Cdc42-like GAP domain, which is found in a wide variety of large, multi-functional proteins. A number of structures are known for this family. The domain is composed of seven alpha helices. This domain is also known as the breakpoint cluster region-homology (BH) domain.

    Proteins where this domain is known:
    PY01876   


    PF00622 - SPRY (Pfam link)

    Interpro entry IPR003877 : (Interpro link)

    Pfam description:
    The SPRY domain is named after SPla and the RYanodine receptor. It is a domain of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin homologues.

    Interpro description:
    The SPRY domain is of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin. Ca2+-release from the sarcoplasmic or endoplasmic reticulum, the intracellular Ca2+ store, is mediated by the ryanodine receptor (RyR) and/or the inositol trisphosphate receptor (IP3R).

    Proteins where this domain is known:
    PY01366    PY04750    PY05573   


    PF00623 - RNA_pol_Rpb1_2 (Pfam link)

    Interpro entry IPR000722 : RNA polymerase, alpha subunit (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 2, contains the active site. The invariant motif -NADFDGD- binds the active-site magnesium ion.
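    As a rough illustration of how the invariant -NADFDGD- motif can be located in a protein sequence, the Python sketch below reports every exact occurrence using 1-based residue numbering. The function name `find_exact_motif` and the toy sequence are hypothetical, and the input is assumed to be a plain one-letter amino-acid string. Pfam assigns this domain with a profile HMM, so an exact motif hit is only a quick check, not a domain call.

```python
def find_exact_motif(sequence: str, motif: str = "NADFDGD") -> list[int]:
    """Return 1-based start positions of every exact (possibly overlapping) match.

    Hypothetical helper: `sequence` is assumed to be a one-letter
    amino-acid string; case is ignored.
    """
    seq = sequence.upper()
    target = motif.upper()
    hits = []
    start = seq.find(target)
    while start != -1:
        hits.append(start + 1)  # convert 0-based index to residue number
        start = seq.find(target, start + 1)
    return hits


# Toy sequence (not a real Rpb1 fragment): the motif starts at residue 4.
print(find_exact_motif("MKTNADFDGDEMN"))  # [4]
```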

    Interpro description:

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA, using the four ribonucleoside triphosphates as substrates. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Eukaryotic RNA polymerase I is mainly used to transcribe ribosomal RNA units, polymerase II is used for mRNA precursors, and III is used to transcribe 5S and tRNA genes. Each class of RNA polymerase is assembled from nine to fourteen different polypeptides. Members of the family include the largest subunit from eukaryotes; the gamma subunit from Cyanobacteria; the beta' subunit from bacteria; the A' subunit from archaea; and the B'' subunit from chloroplast RNA polymerases.

    Proteins where this domain is known:
    PY01037    PY03187    PY03255    PY04439   


    PF00625 - Guanylate_kin (Pfam link)

    Interpro entry IPR008144 : (Interpro link)

    Interpro description:

    Guanylate kinase (GK) catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential for recycling GMP and indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast) and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown to be structurally similar to protein A57R (or SalG2R) from various strains of Vaccinia virus.

    Proteins containing one or more copies of the DHR domain, an SH3 domain as well as a C-terminal GK-like domain, are collectively termed MAGUKs (membrane-associated guanylate kinase homologs), and include Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1); mammalian tight junction protein Zo-1; a family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits (SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102); vertebrate 55 kDa erythrocyte membrane protein (p55); Caenorhabditis elegans protein lin-2; rat protein CASK; and human proteins DLG2 and DLG3. There is an ATP-binding site (P-loop) in the N-terminal section of GK, which is not conserved in the GK-like domain of the above proteins. However, these proteins retain the residues known, in GK, to be involved in the binding of GMP.

    Proteins where this domain is known:
    PY04174   


    PF00626 - Gelsolin (Pfam link)

    Interpro entry IPR007123 : (Interpro link)

    Interpro description:

    Gelsolin is a cytoplasmic, calcium-regulated, actin-modulating protein that binds to the barbed ends of actin filaments, preventing monomer exchange (end-blocking or capping). It can promote nucleation (the assembly of monomers into filaments), as well as sever existing filaments. In addition, this protein binds with high affinity to fibronectin. Plasma gelsolin and cytoplasmic gelsolin are derived from a single gene by alternative initiation sites and differential splicing.

    Sequence comparisons indicate an evolutionary relationship between gelsolin, villin, fragmin and severin. Six large repeating segments occur in gelsolin and villin, and 3 similar segments in severin and fragmin. While the multiple repeats have yet to be related to any known function of the actin-severing proteins, the superfamily appears to have evolved from an ancestral sequence of 120 to 130 amino acid residues.

    Proteins where this domain is known:
    PY01094    PY02497   

    Proteins where this domain has been detected by our approach:
    PY03895   


    PF00627 - UBA (Pfam link)

    Interpro entry IPR000449 : (Interpro link)

    Pfam description:
    This small domain is composed of three alpha helices. This family includes the previously defined UBA and TS-N domains. The UBA domain (ubiquitin-associated domain) is a novel sequence motif found in several proteins having connections to ubiquitin and the ubiquitination pathway. The structure of the UBA domain consists of a compact three-helix bundle. This domain is found at the N terminus of EF-TS, hence the name TS-N. The structure of EF-TS is known and this domain is implicated in its interaction with EF-TU. The domain has been found in non-EF-TS proteins such as alpha-NAC Swiss:P70670 and MJ0280 Swiss:Q57728.

    Interpro description:

    UBA domains are a commonly occurring sequence motif of approximately 45 amino acid residues that are found in diverse proteins involved in the ubiquitin/proteasome pathway, DNA excision-repair, and cell signalling via protein kinases. The human homologue of yeast Rad23A is one example of a nucleotide excision-repair protein that contains both an internal and a C-terminal UBA domain. The solution structure of human Rad23A UBA(2) showed that the domain forms a compact three-helix bundle. Comparison of the structures of UBA(1) and UBA(2) reveals that both form very similar folds and have a conserved large hydrophobic surface patch which may be a common protein-interacting surface present in diverse UBA domains. Evidence that ubiquitin binds to UBA domains leads to the prediction that the hydrophobic surface patch of UBA domains interacts with the hydrophobic surface on the five-stranded beta-sheet of ubiquitin.

    This domain is similar in sequence to the N-terminal domain of translation elongation factor EF1B (or EF-Ts) from bacteria, mitochondria and chloroplasts.

    More information about EF1B (EF-Ts) proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY00546    PY01513    PY01609    PY03631    PY04045    PY07445   

    Proteins where this domain has been detected by our approach:
    PY01462   


    PF00628 - PHD (Pfam link)

    Interpro entry IPR001965 : Zinc finger, PHD-type (Interpro link)

    Pfam description:
    PHD folds into an interleaved type of Zn-finger chelating 2 Zn ions in a similar manner to that of the RING and FYVE domains. Several PHD fingers have been identified as binding modules of methylated histone H3.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the PHD (plant homeodomain) zinc finger domain, which is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in chromatin-mediated transcriptional regulation. The PHD finger motif is reminiscent of, but distinct from, the C3HC4-type RING finger.

    The function of this domain is not yet known, but by analogy with the LIM domain it could be involved in protein-protein interaction and be important for the assembly or activity of multicomponent complexes involved in transcriptional activation or repression. Alternatively, the interactions could be intra-molecular and be important in maintaining the structural integrity of the protein. Like the RING finger and the LIM domain, the PHD finger is thought to bind two zinc ions.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY00684    PY00764    PY03412    PY05642   

    Proteins where this domain has been detected by our approach:
    PY03491    PY05680    PY06148   


    PF00630 - Filamin (Pfam link)

    Interpro entry IPR017868 : (Interpro link)

    Interpro description:

    The many different actin cross-linking proteins share a common architecture, consisting of a globular actin-binding domain and an extended rod. Whereas their actin-binding domains consist of two calponin homology domains, their rods fall into three families.

    The rod domain of the family including the Dictyostelium discoideum (Slime mould) gelation factor (ABP120) and human filamin (ABP280) is constructed from tandem repeats of a 100-residue motif that is glycine and proline rich. The gelation factor's rod contains 6 copies of the repeat, whereas filamin has a rod constructed from 24 repeats. The resolution of the 3D structure of rod repeats from the gelation factor has shown that they consist of a beta-sandwich, formed by two beta-sheets arranged in an immunoglobulin-like fold. Because conserved residues that form the core of the repeats are preserved in filamin, the repeat structure should be common to the members of the gelation factor/filamin family.

    The head-to-tail homodimerisation is crucial to the function of the ABP120 and ABP280 proteins. This interaction involves a small portion at the distal end of the rod domains. For the gelation factor it has been shown that the carboxy-terminal repeat 6 dimerises through a double edge-to-edge extension of the beta-sheet and that repeat 5 contributes to dimerisation to some extent.

    Proteins where this domain is known:
    PY00788    PY01084   

    Proteins where this domain has been detected by our approach:
    PY00078   


    PF00632 - HECT (Pfam link)

    Interpro entry IPR000569 : HECT (Interpro link)

    Pfam description:
    The name HECT comes from Homologous to the E6-AP Carboxyl Terminus.

    Interpro description:

    The name HECT comes from 'Homologous to the E6-AP Carboxyl Terminus'. Proteins containing this domain at the C-terminus include ubiquitin-protein ligase, which regulates ubiquitination of CDC25. Ubiquitin-protein ligase accepts ubiquitin from an E2 ubiquitin-conjugating enzyme in the form of a thioester, and then directly transfers the ubiquitin to targeted substrates. A cysteine residue is required for ubiquitin-thiolester formation. Human thyroid receptor interacting protein 12, which also contains this domain, is a component of an ATP-dependent multisubunit protein that interacts with the ligand binding domain of the thyroid hormone receptor. It could be an E3 ubiquitin-protein ligase. Human ubiquitin-protein ligase E3A interacts with the E6 protein of the cancer-associated Human papillomavirus type 16 and Human papillomavirus type 18. The E6/E6-AP complex binds to and targets the p53 tumour-suppressor protein for ubiquitin-mediated proteolysis.

    Proteins where this domain is known:
    PY01030    PY02709    PY05241    PY05840   


    PF00633 - HHH (Pfam link)

    Interpro entry IPR000445 : Helix-hairpin-helix motif (Interpro link)

    Pfam description:
    The helix-hairpin-helix DNA-binding motif is found to be duplicated in the central domain of RuvA.

    Interpro description:
    The HhH motif is a domain of around 20 amino acids present in prokaryotic and eukaryotic non-sequence-specific DNA-binding proteins. The HhH motif is similar to, but distinct from, the HtH motif. Both of these motifs have two helices connected by a short turn. In the HtH motif the second helix binds to DNA with the helix in the major groove. This allows contact between specific bases and residues throughout the protein. In the HhH motif the second helix does not protrude from the surface of the protein and therefore cannot lie in the major groove of the DNA. Crystallographic studies suggest that the interaction of the HhH domain with DNA is mediated by amino acids located in the strongly conserved loop (L-P-G-V) and at the N-terminal end of the second helix. This interaction could involve the formation of hydrogen bonds between protein backbone nitrogens and DNA phosphate groups. The structural difference between the HtH and HhH domains is reflected at the functional level: whereas the HtH domain, found primarily in gene regulatory proteins, binds DNA in a sequence-specific manner, the HhH domain is found rather in proteins involved in enzymatic activities and binds DNA with no sequence specificity.

    Proteins where this domain is known:
    PY05593    PY05666    PY05677   

    Proteins where this domain has been detected by our approach:
    PY03786    PY06905    PY07176   


    PF00634 - BRCA2 (Pfam link)

    Interpro entry IPR002093 : (Interpro link)

    Pfam description:
    The alignment covers only the most conserved region of the repeat.

    Interpro description:

    The breast cancer type 2 susceptibility protein has a number of 39-amino-acid repeats that are critical for binding to RAD51 (a key protein in DNA recombinational repair) and resistance to methyl methanesulphonate treatment. BRCA2 is a breast tumour suppressor with a potential function in the cellular response to DNA damage. At the cellular level, expression is regulated in a cell-cycle dependent manner and peak expression of BRCA2 mRNA is found in S phase, suggesting BRCA2 may participate in regulating cell proliferation. There are eight repeats in BRCA2 designated as BRC1 to BRC8. BRC1, BRC2, BRC3, BRC4, BRC7, and BRC8 are highly conserved and bind to Rad51, whereas BRC5 and BRC6 are less well conserved and do not bind to Rad51. It has been suggested that BRCA2 plays a role in positioning Rad51 at the site of DNA repair or in removing Rad51 from DNA once repair has been completed.

    Proteins where this domain is known:
    PY04531    PY04955   


    PF00635 - Motile_Sperm (Pfam link)

    Interpro entry IPR000535 : Major sperm protein (Interpro link)

    Pfam description:
    Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. This family contains many other proteins.

    Interpro description:

    Major sperm proteins (MSP) are central components in molecular interactions underlying sperm motility in Caenorhabditis elegans, whose sperm employ an amoeba-like crawling motion using an MSP-containing lamellipod, rather than the flagellar-based swimming motion associated with other sperm. These proteins oligomerise to form an extensive filament system that extends from sperm villipoda, along the leading edge of the pseudopod. About 30 MSP isoforms may exist in C. elegans.

    MSPs form a fibrous network, whereby MSP dimers form helical subfilaments that coil around one another to produce filaments, which in turn form supercoils to produce bundles. The crystal structure of MSP from C. elegans reveals an immunoglobulin (Ig)-like seven-stranded beta sandwich fold.

    Proteins where this domain is known:
    PY06882   


    PF00637 - Clathrin (Pfam link)

    Interpro entry IPR000547 : Clathrin, heavy chain/VPS, 7-fold repeat (Interpro link)

    Pfam description:
    Each region is about 140 amino acids long. The regions are composed of multiple alpha helical repeats. They occur in the arm region of the Clathrin heavy chain.

    Interpro description:

    Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors.

    Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion. The heavy chains form the legs, their N-terminal beta-propeller regions extending outwards, while their C-terminal alpha-alpha-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the beta-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process. The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins.

    This entry represents the 7-fold alpha-alpha-superhelical ARM-type repeat found at the C terminus of clathrin heavy chains and in VPS (vacuolar protein sorting-associated) proteins. In clathrin heavy chains, the C-terminal 7-fold ARM-type repeats interact to form the central hub of the triskelion. VPS proteins are required for vacuolar assembly and vacuolar trafficking, and contain one clathrin-type repeat.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY01854   


    PF00638 - Ran_BP1 (Pfam link)

    Interpro entry IPR000156 : Ran Binding Protein 1 (Interpro link)

    Interpro description:

    Ran is an evolutionarily conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran Binding Protein 1 (RanBP1) has guanine nucleotide dissociation inhibitor (GDI) activity specific for the GTP form of Ran, and also stimulates Ran GTPase-activating protein (GAP)-mediated GTP hydrolysis by Ran. Through these two activities RanBP1 helps to maintain the RanGTP gradient across the nuclear envelope, either by keeping the gradient high (GDI activity) or by keeping cytoplasmic levels of RanGTP low (as a GAP cofactor).

    All RanBP1 proteins contain a Ran-binding domain of approximately 150 amino acid residues. RanBP1 binds directly to RanGTP with high affinity. There are four sites of contact between Ran and the Ran-binding domain. One of these involves binding of the C-terminal segment of Ran to a groove on the Ran-binding domain that is analogous to the surface used in the EVH1-peptide interaction. Nup358 contains four Ran-binding domains. The structure of the first of these is known.

    Proteins where this domain is known:
    PY07276   


    PF00641 - zf-RanBP (Pfam link)

    Interpro entry IPR001876 : Zinc finger, RanBP2-type (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the zinc finger domain found in RanBP2 proteins. Ran is an evolutionarily conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran binding protein 2 (RanBP2) is a 358-kDa nucleoporin located on the cytoplasmic side of the nuclear pore complex, which plays a role in nuclear protein import. RanBP2 contains multiple zinc fingers which mediate binding to RanGDP.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY02236   


    PF00642 - zf-CCCH (Pfam link)

    Interpro entry IPR000571 : Zinc finger, CCCH-type (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents C-x8-C-x5-C-x3-H (CCCH) type zinc finger (Znf) domains. Proteins containing CCCH Znf domains include Znf proteins from eukaryotes involved in cell cycle or growth phase-related regulation, e.g. human TIS11B (butyrate response factor 1), a probable regulatory protein involved in regulating the response to growth factors, and the mouse TTP growth factor-inducible nuclear protein, which has the same function. The mouse TTP protein is induced by growth factors. Another protein containing this domain is the human splicing factor U2AF 35 kDa subunit, which plays a critical role in both constitutive and enhancer-dependent splicing by mediating essential protein-protein interactions and protein-RNA interactions required for 3' splice site selection. It has been shown that different CCCH-type Znf proteins interact with the 3'-untranslated region of various mRNAs. This type of Znf is very often present in two copies.
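    The C-x8-C-x5-C-x3-H notation describes a spacing pattern: fixed residues separated by runs of arbitrary residues. The Python sketch below, assuming that notation (an upper-case letter is a fixed residue and `xN` means exactly N arbitrary residues), translates such a pattern into a regular expression and scans a sequence with it. The helper names `motif_to_regex` and `find_motif` and the toy sequence are hypothetical. Real CCCH fingers vary somewhat in their spacing and Pfam detects them with a profile HMM, so this only illustrates the consensus, not a replacement for the Pfam model.

```python
import re


def motif_to_regex(pattern: str) -> str:
    """Translate a dash-separated motif such as 'C-x8-C-x5-C-x3-H' into a regex.

    Assumed notation: an upper-case letter (or run of letters) is a fixed
    residue, 'x' is any single residue and 'xN' is exactly N residues.
    """
    parts = []
    for token in pattern.split("-"):
        if token == "x":
            parts.append(".")
        elif token.startswith("x") and token[1:].isdigit():
            parts.append(".{%d}" % int(token[1:]))
        elif token.isalpha() and token.isupper():
            parts.append(re.escape(token))
        else:
            raise ValueError(f"unsupported motif token: {token!r}")
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of non-overlapping matches."""
    regex = re.compile(motif_to_regex(pattern))
    return [(m.start(), m.end()) for m in regex.finditer(sequence.upper())]


# Toy sequence built to contain one CCCH-spaced stretch starting at index 1.
toy = "MC" + "A" * 8 + "C" + "G" * 5 + "C" + "K" * 3 + "H" + "LL"
print(motif_to_regex("C-x8-C-x5-C-x3-H"))   # C.{8}C.{5}C.{3}H
print(find_motif(toy, "C-x8-C-x5-C-x3-H"))  # [(1, 21)]
```

    Because the regex uses fixed-length gaps, a finger with a 7-residue first gap would be missed; a range such as `.{7,8}` would be needed to tolerate that kind of variation.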

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01183    PY01907    PY02869    PY02908    PY03482    PY03562    PY03958    PY04485    PY04677    PY05712    PY06412    PY06948   

    Proteins where this domain has been detected by our approach:
    PY00512    PY01203    PY03224   


    PF00643 - zf-B_box (Pfam link)

    Interpro entry IPR000315 : Zinc finger, B-box (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents B-box-type zinc finger domains, which are around 40 residues in length. B-box zinc fingers can be divided into two groups, where types 1 and 2 B-box domains differ in their consensus sequence and in the spacing of the 7-8 zinc-binding residues. Several proteins contain both types 1 and 2 B-boxes, suggesting some level of cooperativity between these two domains. B-box domains are found in over 1500 proteins from a variety of organisms. They are found in TRIM (tripartite motif) proteins that consist of an N-terminal RING finger (originally called an A-box), followed by 1-2 B-box domains and a coiled-coil domain (also called RBCC for Ring, B-box, Coiled-Coil). TRIM proteins contain a type 2 B-box domain, and may also contain a type 1 B-box. In proteins that do not contain RING or coiled-coil domains, the B-box domain is primarily type 2. Many type 2 B-box proteins are involved in ubiquitinylation. Proteins containing a B-box zinc finger domain include transcription factors, ribonucleoproteins and proto-oncoproteins; for example, MID1, MID2, TRIM9, TNL, TRIM36, TRIM63, TRIFIC, NCL1 and CONSTANS-like proteins.

    The microtubule-associated E3 ligase MID1 contains a type 1 B-box zinc finger domain. MID1 specifically binds Alpha-4, which in turn recruits the catalytic subunit of phosphatase 2A (PP2Ac). This complex is required for targeting of PP2Ac for proteasome-mediated degradation. The MID1 B-box coordinates two zinc ions and adopts a beta/beta/alpha cross-brace structure similar to that of ZZ, PHD, RING and FYVE zinc fingers.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01618    PY04301   

    Proteins where this domain has been detected by our approach:
    PY00030   


    PF00644 - PARP (Pfam link)

    Interpro entry IPR012317 : Poly(ADP-ribose) polymerase, catalytic region (Interpro link)

    Pfam description:
    Poly(ADP-ribose) polymerase catalyses the covalent attachment of ADP-ribose units from NAD+ to itself and to a limited number of other DNA-binding proteins, which decreases their affinity for DNA. Poly(ADP-ribose) polymerase is a regulatory component induced by DNA damage. The carboxyl-terminal region is the most highly conserved region of the protein. Experiments have shown that a carboxyl-terminal 40 kDa fragment is still catalytically active.

    Interpro description:

    Poly(ADP-ribose) polymerases (PARP) are a family of enzymes present in eukaryotes, which catalyse the poly(ADP-ribosyl)ation of a limited number of proteins involved in chromatin architecture, DNA repair or DNA metabolism, including PARP itself. PARP, also known as poly(ADP-ribose) synthetase and poly(ADP-ribose) transferase, transfers the ADP-ribose moiety from its substrate, nicotinamide adenine dinucleotide (NAD), to carboxylate groups of aspartic acid and glutamic acid residues. Whereas some PARPs might function in genome protection, others appear to play different roles in the cell, including telomere replication and cellular transport. PARP-1 is a multifunctional enzyme. The polypeptide has a highly conserved modular organisation consisting of an N-terminal DNA-binding domain, a central regulating segment, and a C-terminal or F region accommodating the catalytic centre. The F region is composed of two parts: a purely alpha-helical N-terminal domain (alpha-hd), and the mixed alpha/beta C-terminal catalytic domain bearing the putative NAD-binding site. Although proteins of the PARP family are related through their PARP catalytic domain, they do not resemble each other outside of that region; rather, they contain unique domains that distinguish them from each other and hint at their discrete functions. Domains with which the PARP catalytic domain is found associated include zinc fingers, SAP, ankyrin, BRCT, Macro, SAM, WWE and UIM domains.

    The alpha-hd domain is about 130 amino acids in length and consists of an up-up-down-up-down-down motif of helices. It is thought to relay the activation signal issued on binding to damaged DNA. The PARP catalytic domain is about 230 residues in length. Its core consists of a five-stranded antiparallel beta-sheet and a four-stranded mixed beta-sheet. The two sheets are consecutive and are connected via a single pair of hydrogen bonds between two strands that run at an angle of 90 degrees. These central beta-sheets are surrounded by five alpha-helices, three 3(10)-helices, and a three- and a two-stranded beta-sheet in a 37-residue excursion between two central beta-strands. The active site, known as the 'PARP signature', is formed by a block of 50 amino acids that is strictly conserved among vertebrates and highly conserved among all species. The 'PARP signature' is characteristic of all PARP protein family members. It consists of a segment of conserved residues arranged as a beta-sheet, an alpha-helix, a 3(10)-helix, a beta-sheet and an alpha-helix.

    Proteins where this domain is known:
    PY01618   


    PF00645 - zf-PARP (Pfam link)

    Interpro entry IPR001510 : Zinc finger, PARP-type (Interpro link)

    Pfam description:
    Poly(ADP-ribose) polymerase is an important regulatory component of the cellular response to DNA damage. The amino-terminal region of Poly(ADP-ribose) polymerase consists of two PARP-type zinc fingers. This region acts as a DNA nick sensor.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents PARP (poly(ADP-ribose) polymerase)-type zinc finger domains.

    NAD(+) ADP-ribosyltransferase is a eukaryotic enzyme that catalyses the covalent attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular processes such as differentiation, proliferation and tumour transformation as well as in the regulation of the molecular events involved in the recovery of the cell from DNA damage. Structurally, NAD(+) ADP-ribosyltransferase consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of PARP-type zinc finger domains which have been shown to bind DNA in a zinc-dependent manner. The PARP-type zinc finger domains seem to bind specifically to single-stranded DNA and to act as a DNA nick sensor. DNA ligase III contains, in its N-terminal section, a single copy of a zinc finger highly similar to those of PARP.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain has been detected by our approach:
    PY01533   


    PF00647 - EF1G (Pfam link)

    Interpro entry IPR001662 : Translation elongation factor EF1B, gamma chain, conserved (Interpro link)

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    Elongation factor EF1B (also known as EF-Ts or EF-1beta/gamma/delta) is a nucleotide exchange factor that is required to regenerate EF1A from its inactive form (EF1A-GDP) to its active form (EF1A-GTP). EF1A is then ready to interact with a new aminoacyl-tRNA to begin the cycle again. EF1B is more complex in eukaryotes than in bacteria, and can consist of three subunits: EF1B-alpha (or EF-1beta), EF1B-gamma (or EF-1gamma) and EF1B-beta (or EF-1delta).

    This entry represents a conserved domain usually found near the C-terminus of EF1B-gamma chains, which are 410-440 residues long. The gamma chain appears to play a role in anchoring the EF1B complex to the beta and delta chains and to other cellular components.

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY07121   


    PF00648 - Peptidase_C2 (Pfam link)

    Interpro entry IPR001300 : Peptidase C2, calpain (Interpro link)

    Interpro description:

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad.

    This group of cysteine peptidases belongs to the MEROPS peptidase family C2 (calpain family, clan CA). A type example is calpain, which is an intracellular protease involved in many important cellular functions that are regulated by calcium. The protein is a complex of 2 polypeptide chains (light and heavy), with three known forms in mammals: a highly calcium-sensitive (i.e., micromolar range) form known as mu-calpain, mu-CANP or calpain I; a form sensitive to calcium in the millimolar range, known as m-calpain, m-CANP or calpain II; and a third form, known as p94, which is found in skeletal muscle only.

    All forms have identical light but different heavy chains. Both mu- and m-calpain are heterodimers containing an identical 28-kDa subunit and an 80-kDa subunit that shares 55-65% sequence homology between the two proteases. The crystallographic structure of m-calpain reveals six "domains" in the 80-kDa subunit:

    1. A 19-amino acid NH2-terminal sequence;
    2. Active site domain IIa;
    3. Active site domain IIb.

      Domain 2 shows low levels of sequence similarity to papain; although the catalytic His has not been located by biochemical means, it is likely that calpain and papain are related.

    4. Domain III;
    5. An 18-amino acid extended sequence linking domain III to domain IV;
    6. Domain IV, which resembles the penta EF-hand family of polypeptides, binds calcium and regulates activity. Ca2+ binding causes a rearrangement of the protein backbone, the net effect of which is that a Trp side chain, which acts as a wedge between catalytic domains IIa and IIb in the apo state, moves away from the active-site cleft, allowing the proper formation of the catalytic triad.

    Calpain-like mRNAs have been identified in other organisms, including bacteria, but the molecules encoded by these mRNAs have not been isolated, so little is known about their properties. How calpain activity is regulated in these organisms is still unclear. In metazoans, the activity of calpain is controlled by a single proteinase inhibitor, calpastatin. The calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa through the use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different places on the calpain molecule; binding to at least two of the sites is Ca2+-dependent. The calpains appear to participate in a variety of cellular processes, including remodelling of cytoskeletal/membrane attachments, different signal transduction pathways and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke and brain trauma.

    Proteins where this domain is known:
    PY00976   


    PF00650 - CRAL_TRIO (Pfam link)

    Interpro entry IPR001251 : (Interpro link)

    Pfam description:
    The original profile has been extended to include the carboxyl domain from the known structure of Sec14. Swiss:P10911 has not been included in the Pfam family because it does not appear to contain a complete structural domain.

    Interpro description:
    This entry defines the C-terminal domain of various retinaldehyde/retinal-binding proteins that may be functional components of the visual cycle. Cellular retinaldehyde-binding protein (CRALBP) carries 11-cis-retinol or 11-cis-retinaldehyde as endogenous ligands and may function as a substrate carrier protein that modulates the interaction of these retinoids with visual cycle enzymes. The multidomain protein Trio binds the LAR transmembrane tyrosine phosphatase, contains a protein kinase domain, and has separate rac-specific and rho-specific guanine nucleotide exchange factor domains. Trio is a multifunctional protein that integrates and amplifies signals involved in coordinating actin remodelling, which is necessary for cell migration and growth.

    Other members of the family include a guanine nucleotide exchange factor that may function as an effector of RAC1, and transfer proteins such as the phosphatidylinositol/phosphatidylcholine transfer protein, which is required for the transport of secretory proteins from the Golgi complex, and the alpha-tocopherol transfer protein, which enhances the transfer of its ligand between separate membranes.

    Proteins where this domain is known:
    PY03550    PY03600   


    PF00651 - BTB (Pfam link)

    Interpro entry IPR013069 : BTB/POZ (Interpro link)

    Pfam description:
    The BTB (for BR-C, ttk and bab) or POZ (for Pox virus and Zinc finger) domain is present near the N-terminus of a fraction of zinc finger (Pfam:PF00096) proteins and in proteins that contain the Pfam:PF01344 motif, such as Kelch and a family of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some instances heteromeric dimerisation. The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule. POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes, including N-CoR and SMRT. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

    Interpro description:

    The BTB (for BR-C, ttk and bab) or POZ (for Pox virus and Zinc finger) domain is present near the N terminus of a fraction of zinc finger proteins and in proteins that contain the Kelch (Pfam:PF01344) motif, such as Kelch and a family of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some instances heteromeric dimerisation. The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule. POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes, including N-CoR and SMRT. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

    Proteins where this domain is known:
    PY01757   

    Proteins where this domain has been detected by our approach:
    PY07030   


    PF00656 - Peptidase_C14 (Pfam link)

    Interpro entry IPR011600 : Peptidase C14, caspase catalytic (Interpro link)

    Interpro description:

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad.

    This group of sequences represents the p20 (20 kDa) and p10 (10 kDa) subunits of caspases, which together form the catalytic domain of the caspase and are derived from the p45 (45 kDa) precursor.

    Caspases (Cysteine-dependent ASPartyl-specific proteASE) are cysteine peptidases that belong to the MEROPS peptidase family C14 (caspase family, clan CD) based on the architecture of their catalytic dyad or triad. Caspases are tightly regulated proteins that require zymogen activation to become active, and once active can be regulated by caspase inhibitors. Activated caspases act as cysteine proteases, using the sulphydryl group of a cysteine side chain for catalysing peptide bond cleavage at aspartyl residues in their substrates. The catalytic cysteine and histidine residues are on the p20 subunit after cleavage of the p45 precursor.

    Caspases are mainly involved in mediating cell death (apoptosis). They have two main roles within the apoptosis cascade: as initiators that trigger the cell death process, and as effectors of the process itself. Caspase-mediated apoptosis follows two main pathways, one extrinsic and the other intrinsic or mitochondrial-mediated. The extrinsic pathway involves the stimulation of various TNF (tumour necrosis factor) cell surface receptors on cells targeted to die by various TNF cytokines that are produced by cells such as cytotoxic T cells. The activated receptor transmits the signal to the cytoplasm by recruiting FADD, which forms a death-inducing signalling complex (DISC) with caspase-8. The subsequent activation of caspase-8 initiates the apoptosis cascade involving caspases 3, 4, 6, 7, 9 and 10. The intrinsic pathway arises from signals that originate within the cell as a consequence of cellular stress or DNA damage. The stimulation or inhibition of different Bcl-2 family proteins results in the leakage of cytochrome c from the mitochondria, and the formation of an apoptosome composed of cytochrome c, Apaf1 and caspase-9. The subsequent activation of caspase-9 initiates the apoptosis cascade involving caspases 3 and 7, among others. At the end of the cascade, caspases act on a variety of signal transduction proteins, cytoskeletal and nuclear proteins, chromatin-modifying proteins, DNA repair proteins and endonucleases that destroy the cell by disintegrating its contents, including its DNA. The different caspases have different domain architectures depending upon where they fit into the apoptosis cascades; however, they all carry the catalytic p10 and p20 subunits.

    Caspases can have roles other than in apoptosis, such as caspase-1 (interleukin-1 beta convertase), which is involved in the inflammatory process. The activation of apoptosis can sometimes lead to caspase-1 activation, providing a link between apoptosis and inflammation, such as during the targeting of infected cells. Caspases may also be involved in cell differentiation.

    Proteins where this domain is known:
    PY00663    PY04718   


    PF00658 - PABP (Pfam link)

    Interpro entry IPR002004 : Polyadenylate-binding protein/Hyperplastic disc protein (Interpro link)

    Pfam description:
    The region featured in this family is found towards the C-terminus of poly(A)-binding proteins (PABPs). These are eukaryotic proteins that, through their binding of the 3' poly(A) tail on mRNA, have very important roles in the pathways of gene expression. They seem to provide a scaffold on which other proteins can bind and mediate processes such as export, translation and turnover of the transcripts. Moreover, they may act as antagonists to the binding of factors that allow mRNA degradation, regulating mRNA longevity. PABPs are also involved in nuclear transport. PABPs interact with poly(A) tails via RNA-recognition motifs (Pfam:PF00076). Note that the PABP C-terminal region is also found in members of the hyperplastic discs protein (HYD) family of ubiquitin ligases that contain HECT domains; these are also included in this family.

    Interpro description:

    The polyadenylate-binding protein (PABP) has a conserved C-terminal domain (PABC), which is also found in the hyperplastic discs protein (HYD) family of ubiquitin ligases that contain HECT domains. PABP recognises the 3' mRNA poly(A) tail and plays an essential role in eukaryotic translation initiation and mRNA stabilisation/degradation. PABC domains of PABP are peptide-binding domains that mediate PABP homo-oligomerisation and protein-protein interactions. In mammals, the PABC domain of PABP functions to recruit several different translation factors to the mRNA poly(A) tail.

    Proteins where this domain is known:
    PY05398   


    PF00664 - ABC_membrane (Pfam link)

    Interpro entry IPR001140 : ABC transporter, transmembrane region (Interpro link)

    Pfam description:
    This family represents a unit of six transmembrane helices. Many members of the ABC transporter family (Pfam:PF00005) have two such regions.

    Interpro description:

    ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP-binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore consist of four domains: two ABC modules and two TMDs.

    ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain.

    The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to ABC transporters; and the Q-motif (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide-binding site.
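    As a rough illustration of how the conserved motifs described above can be located in a sequence, the sketch below scans a protein string with regular expressions. It relies on two simplified patterns. The first is the LSGGQ signature exactly as given in the text. The second is a Walker A consensus commonly written as GxxxxGK(S/T), which is an assumption of this sketch. The function name `find_motifs` and the example sequence are hypothetical. Real domain detection, such as the Pfam assignments listed here, uses profile HMMs rather than exact patterns.

```python
import re

# Simplified consensus patterns (illustrative assumptions, not a validated model).
ABC_SIGNATURE = re.compile(r"LSGGQ")     # ABC signature motif, as quoted in the text
WALKER_A = re.compile(r"G.{4}GK[ST]")    # Walker A / P-loop consensus GxxxxGK(S/T)


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif in a protein sequence.

    finditer reports non-overlapping matches only, scanning left to right.
    """
    seq = seq.upper()  # accept lower-case one-letter codes as well
    return {
        "walker_a": [m.start() for m in WALKER_A.finditer(seq)],
        "abc_signature": [m.start() for m in ABC_SIGNATURE.finditer(seq)],
    }
```

    For example, `find_motifs("MKVGPSGSGKSTLLNAIEQLSGGQRQRVAIA")` (a made-up fragment) reports a Walker A match starting at position 3 and the signature starting at position 19. Because `finditer` returns non-overlapping matches, overlapping occurrences of the same motif would be missed. A lookahead pattern would be needed to catch them.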

    The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly beta-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha-helical subdomain with the signature motif. It only seems to be required for the structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer, the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel beta-sheet of armI and are related by a two-fold axis.

    The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette. More than 50 subfamilies have been described based on a phylogenetic and functional classification; (for further information see http://www.tcdb.org/tcdb/index.php?tc=3.A.1).

    A variety of ATP-binding transport proteins have a six-transmembrane-helix region. They are all integral membrane proteins involved in a variety of transport systems. Members of this family include the cystic fibrosis transmembrane conductance regulator (CFTR), bacterial leukotoxin secretion ATP-binding protein, multidrug resistance proteins, the yeast leptomycin B resistance protein, the mammalian sulphonylurea receptor and antigen peptide transporter 2. Many of these proteins have two such regions.

    Proteins where this domain is known:
    PY00245    PY01826    PY06054    PY06546    PY07088   

    Proteins where this domain has been detected by our approach:
    PY05035   


    PF00665 - rve (Pfam link)

    Interpro entry IPR001584 : Integrase, catalytic core (Interpro link)

    Pfam description:
    Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc-binding domain (Pfam:PF02022). This family represents the central catalytic domain. The carboxyl-terminal domain is a non-specific DNA-binding domain (Pfam:PF00552). The catalytic domain acts as an endonuclease, removing two nucleotides from the 3' ends of the blunt-ended viral DNA made by reverse transcription. This domain also catalyses the DNA strand transfer reaction that joins the 3' ends of the viral DNA to the 5' ends of the integration site.

    Interpro description:

    Integrase comprises three domains capable of folding independently and whose three-dimensional structures are known. However, the manner in which the N-terminal, catalytic, and C-terminal domains interact in the holoenzyme remains obscure. Numerous studies indicate that the enzyme functions as a multimer, minimally a dimer. The integrase proteins from Human immunodeficiency virus 1 (HIV-1) and Avian sarcoma virus (ASV) have been studied most carefully with respect to the structural basis of catalysis. Although the active site of ASV integrase does not undergo significant conformational changes on binding the required metal cofactor, that of HIV-1 does. This active site-mediated conformational change in HIV-1 reorganises the catalytic core and C-terminal domains and appears to promote an interaction that is favourable for catalysis.

    Retroviral integrase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. Retrovirus integrase-related gene sequences are also known to occur in eukaryotes. Bacterial transposases involved in the transposition of insertion sequences also belong to this group.

    HIV integrase catalyses the incorporation of virally derived DNA into the human genome. This unique step in the virus life cycle provides a variety of points for intervention and hence is an attractive target for the development of new therapeutics for the treatment of AIDS. Substrate recognition by the retroviral integrase enzyme is critical for retroviral integration. To catalyse this recombination event, integrase must recognise and act on two types of substrates, viral DNA and host DNA, yet the necessary interactions exhibit markedly different degrees of specificity.

    Proteins where this domain is known:
    PY07014    PY07288   


    PF00667 - FAD_binding_1 (Pfam link)

    Interpro entry IPR003097 : FAD-binding, type 1 (Interpro link)

    Pfam description:
    This domain is found in sulfite reductase, NADPH cytochrome P450 reductase, Nitric oxide synthase and methionine synthase reductase.

    Interpro description:

    This domain is found in sulphite reductase, NADPH cytochrome P450 reductase, nitric oxide synthase and methionine synthase reductase. Flavoprotein pyridine nucleotide cytochrome reductases (FPNCR) catalyse the interchange of reducing equivalents between one-electron carriers and the two-electron-carrying nicotinamide dinucleotides. The enzymes include ferredoxin:NADP+ reductases (FNR), plant and fungal NAD(P)H:nitrate reductases, NADH:cytochrome b5 reductases, NADPH:P450 reductases, NADPH:sulphite reductases, nitric oxide synthases, phthalate dioxygenase reductase, and various other flavoproteins.

    Proteins where this domain is known:
    PY05179   


    PF00670 - AdoHcyase_NAD (Pfam link)

    Interpro entry IPR015878 : (Interpro link)

    Interpro description:

    S-adenosyl-L-homocysteine hydrolase (AdoHcyase) is an enzyme of the activated methyl cycle, responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase is a ubiquitous enzyme that binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein of about 430 to 470 amino acids.

    This entry represents the glycine-rich region in the central part of AdoHcyase, which is thought to be involved in NAD-binding.

    Proteins where this domain is known:
    PY02893   


    PF00673 - Ribosomal_L5_C (Pfam link)

    Interpro entry IPR002132 : Ribosomal protein L5 (Interpro link)

    Pfam description:
    This region is found associated with Pfam:PF00281.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins grouped together on the basis of sequence similarities.

    L5 is a protein of about 180 amino-acid residues.

    Proteins where this domain is known:
    PY02461   


    PF00675 - Peptidase_M16 (Pfam link)

    Interpro entry IPR011765 : Peptidase M16, N-terminal (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
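    The two motif definitions above translate naturally into a pattern scan. The sketch below is a hypothetical Python helper (`motif_starts` is our own name), and its residue classes are assumptions: the entry does not say exactly which residues count as "uncharged" or "hydrophobic". It searches for the loose HEXXH form and the stricter abXHEbbHbc form, excluding proline as the text describes. A hit only flags a candidate metal-binding site; it does not show that the protein is an active metalloprotease.

```python
import re

# Residue classes are illustrative assumptions; the entry does not define them.
UNCHARGED = "ACFGHILMNQSTVWY"  # all residues except D, E, K, R (charged) and P
HYDROPHOBIC = "AFILMVW"
NOT_PRO = "ACDEFGHIKLMNQRSTVWY"  # proline is never found in this site

# Loose form H-E-X-X-H; the lookahead lets overlapping sites all be reported.
LOOSE = re.compile(f"(?=HE[{NOT_PRO}]{{2}}H)")

# Strict form abXHEbbHbc. 'a' is only "most often" V or T, so it is left open here.
STRICT = re.compile(
    f"(?=[{NOT_PRO}][{UNCHARGED}][{NOT_PRO}]HE"
    f"[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def motif_starts(seq: str, pattern: re.Pattern) -> list[int]:
    """Return the 0-based start index of every match of pattern in seq."""
    return [m.start() for m in pattern.finditer(seq.upper())]


print(motif_starts("GGVAAHELTHAVKK", LOOSE))   # [5]
print(motif_starts("GGVAAHELTHAVKK", STRICT))  # [2]
```

    In the made-up sequence used here, the loose pattern reports the position of the H in HEXXH, while the strict pattern reports the start of the ten-residue abXHEbbHbc window, three residues earlier.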

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    The majority of the sequences in this entry are metallopeptidases and non-peptidase homologues belonging to MEROPS peptidase family M16 (clan ME), subfamilies M16A, M16B and M16C.

    These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine. In pitrilysin, it has been shown that this H-x-x-E-H motif is involved in enzymatic activity: the two histidines bind zinc and the glutamate is necessary for catalytic activity. The proteins classified as non-peptidase homologues either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity.
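    The M16 motif differs from HEXXH in residue order: H-x-x-E-H rather than H-E-x-x-H. A minimal sketch, assuming a plain one-letter protein sequence and using a hypothetical function name, that reports the positions of the two zinc-binding histidines and the catalytic glutamate for each match. A missing motif is consistent with, but does not prove, non-peptidase status.

```python
import re

# H-x-x-E-H with x = any residue; lookahead so overlapping matches are kept.
HXXEH = re.compile(r"(?=H[A-Z]{2}EH)")


def hxxeh_sites(seq: str) -> list[tuple[int, int, int]]:
    """For each H-x-x-E-H match return 0-based (first His, Glu, second His)."""
    sites = []
    for m in HXXEH.finditer(seq.upper()):
        i = m.start()
        # The two histidines (i, i + 4) bind zinc; the Glu (i + 3) is catalytic.
        sites.append((i, i + 3, i + 4))
    return sites


print(hxxeh_sites("MKLHYAEHGG"))  # [(3, 6, 7)]
```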

    Proteins where this domain is known:
    PY00244    PY01832    PY04232    PY04302    PY06052    PY07032   


    PF00676 - E1_dh (Pfam link)

    Interpro entry IPR001017 : Dehydrogenase, E1 component (Interpro link)

    Pfam description:
    This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase, 2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.

    Interpro description:
    This entry includes a number of dehydrogenases, all of which use thiamine pyrophosphate as a cofactor and are members of multienzyme complexes. They include pyruvate dehydrogenase, a component of the multienzyme pyruvate dehydrogenase complex; 2-oxoglutarate dehydrogenase, a component of the multienzyme 2-oxoglutarate dehydrogenase complex, which contains multiple copies of three enzymatic components: 2-oxoglutarate dehydrogenase (E1), dihydrolipoamide succinyltransferase (E2) and lipoamide dehydrogenase (E3); and 2-oxoisovalerate dehydrogenase, a component of the multienzyme branched-chain alpha-keto acid dehydrogenase complex.

    Proteins where this domain is known:
    PY00819    PY02421    PY05086   


    PF00679 - EFG_C (Pfam link)

    Interpro entry IPR000640 : Translation elongation factor EFG/EF2, C-terminal (Interpro link)

    Pfam description:
    This domain includes the carboxyl-terminal regions of elongation factor G, elongation factor 2 and some tetracycline resistance proteins, and adopts a ferredoxin-like fold.

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    Elongation factor EF2 (EF-G) is a G-protein. It brings about the translocation of peptidyl-tRNA and mRNA through a ratchet-like mechanism: the binding of GTP-EF2 to the ribosome causes a counter-clockwise rotation in the small ribosomal subunit; the hydrolysis of GTP to GDP by EF2 and the subsequent release of EF2 causes a clockwise rotation of the small subunit back to the starting position. This twisting action destabilises tRNA-ribosome interactions, freeing the tRNA to translocate along the ribosome upon GTP-hydrolysis by EF2. EF2 binding also affects the entry and exit channel openings for the mRNA, widening it when bound to enable the mRNA to translocate along the ribosome.

    This entry represents the C-terminal domain found in EF2 (or EF-G) of both prokaryotes and eukaryotes (the eukaryotic protein is also known as eEF2), as well as in some tetracycline-resistance proteins. This domain adopts a ferredoxin-like fold consisting of an alpha/beta sandwich with antiparallel beta-sheets. It resembles the topology of domain III found in these elongation factors, with which it forms the C-terminal block, but the two domains cannot be superimposed. This domain is often found together with another domain that contains the signatures for the N-terminus of the proteins.

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY00511    PY01864    PY02627    PY02880    PY03426    PY04706    PY05356    PY05417   


    PF00684 - DnaJ_CXXCXGXG (Pfam link)

    Interpro entry IPR001305 : Heat shock protein DnaJ, cysteine-rich region (Interpro link)

    Pfam description:
    The central cysteine-rich (CR) domain of DnaJ proteins contains four repeats of the motif CXXCXGXG, where X is any amino acid. The isolated cysteine-rich domain folds in a zinc-dependent fashion. Each set of two repeats binds one unit of zinc. Although this domain has been implicated in substrate binding, no evidence of specific interaction between the isolated DnaJ cysteine-rich domain and various hydrophobic peptides has been found.
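    The repeat arithmetic in this description (four CXXCXGXG repeats, two repeats per zinc) can be checked with a short scan. This is an illustrative sketch with hypothetical function names: it counts repeats in sequence order and divides by two, and makes no attempt to work out which repeats pair up around each zinc ion.

```python
import re

# C-X-X-C-X-G-X-G, X = any residue.
CR_REPEAT = re.compile(r"(?=C[A-Z]{2}C[A-Z]G[A-Z]G)")


def cr_repeat_starts(seq: str) -> list[int]:
    """0-based start of every CXXCXGXG repeat, in sequence order."""
    return [m.start() for m in CR_REPEAT.finditer(seq.upper())]


def zinc_units(seq: str) -> int:
    """Zinc ions implied by 'each set of two repeats binds one unit of zinc'."""
    return len(cr_repeat_starts(seq)) // 2


demo = "KK".join(["CDKCNGKG"] * 4)  # four made-up repeats with short spacers
print(cr_repeat_starts(demo))  # [0, 10, 20, 30]
print(zinc_units(demo))        # 2
```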

    Interpro description:

    Molecular chaperones are a diverse family of proteins that function to protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis, and ADP release by an N-terminal ATP-hydrolyzing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate binding domain. Dimeric GrpE is the co-chaperone for DnaK, and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. Thus the co-chaperones DnaJ and GrpE are capable of tightly regulating the nucleotide-bound and substrate-bound state of DnaK in ways that are necessary for the normal housekeeping functions and stress-related functions of the DnaK molecular chaperone cycle.

    Besides stimulating the ATPase activity of DnaK through its J-domain, DnaJ also associates with unfolded polypeptide chains and prevents their aggregation. Thus, DnaK and DnaJ may bind to one and the same polypeptide chain to form a ternary complex. The formation of a ternary complex may result in cis-interaction of the J-domain of DnaJ with the ATPase domain of DnaK. An unfolded polypeptide may enter the chaperone cycle by associating first either with ATP-liganded DnaK or with DnaJ. DnaK interacts with both the backbone and side chains of a peptide substrate; it thus shows binding polarity and admits only L-peptide segments. In contrast, DnaJ has been shown to bind both L- and D-peptides and is assumed to interact only with the side chains of the substrate.

    Proteins where this domain is known:
    PY02476    PY04093   

    Proteins where this domain has been detected by our approach:
    PY03544   


    PF00687 - Ribosomal_L1 (Pfam link)

    Interpro entry IPR002143 : Ribosomal protein L1 (Interpro link)

    Pfam description:
    This family includes prokaryotic L1 and eukaryotic L10.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L1 is the largest protein from the large ribosomal subunit. The L1 protein contains two domains: a 2-layer alpha/beta domain and a 3-layer alpha/beta domain that interrupts the first. In Escherichia coli, L1 is known to bind to the 23S rRNA. It belongs to a family of ribosomal proteins grouped together on the basis of sequence similarities.

    Proteins where this domain is known:
    PY03485    PY05328    PY05525    PY05786   


    PF00689 - Cation_ATPase_C (Pfam link)

    Interpro entry IPR006068 : ATPase, P-type cation-transporter, C-terminal (Interpro link)

    Pfam description:
    Members of this family are involved in Na+/K+, H+/K+, Ca++ and Mg++ transport. This family represents 5 transmembrane helices.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    P-ATPases (sometimes known as E1-E2 ATPases) are found in bacteria and in a number of eukaryotic plasma membranes and organelles. P-ATPases function to transport a variety of different compounds, including ions and phospholipids, across a membrane using ATP hydrolysis for energy. There are many different classes of P-ATPases, each of which transports a specific type of ion: H+, Na+, K+, Mg2+, Ca2+, Ag+ and Ag2+, Zn2+, Co2+, Pb2+, Ni2+, Cd2+, Cu+ and Cu2+. P-ATPases can be composed of one or two polypeptides, and can usually assume two main conformations called E1 and E2.

    This entry represents the conserved C-terminal region found in several classes of cation-transporting P-type ATPases, including those that transport H+, Na+, Ca2+, Na+/K+, and H+/K+. In the H+/K+- and Na+/K+-exchange P-ATPases, this domain is found in the catalytic alpha chain.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY03970    PY05776   


    PF00690 - Cation_ATPase_N (Pfam link)

    Interpro entry IPR004014 : ATPase, P-type cation-transporter, N-terminal (Interpro link)

    Pfam description:
    Members of this family are involved in Na+/K+, H+/K+, Ca++ and Mg++ transport.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    P-ATPases (sometimes known as E1-E2 ATPases) are found in bacteria and in a number of eukaryotic plasma membranes and organelles. P-ATPases function to transport a variety of different compounds, including ions and phospholipids, across a membrane using ATP hydrolysis for energy. There are many different classes of P-ATPases, each of which transports a specific type of ion: H+, Na+, K+, Mg2+, Ca2+, Ag+ and Ag2+, Zn2+, Co2+, Pb2+, Ni2+, Cd2+, Cu+ and Cu2+. P-ATPases can be composed of one or two polypeptides, and can usually assume two main conformations called E1 and E2.

    This entry represents the conserved N-terminal region found in several classes of cation-transporting P-type ATPases, including those that transport H+, Na+, Ca2+, Na+/K+, and H+/K+. In the H+/K+- and Na+/K+-exchange P-ATPases, this domain is found in the catalytic alpha chain. In gastric H+/K+-ATPases, this domain undergoes reversible sequential phosphorylation inducing conformational changes that may be important for regulating the function of these ATPases.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY03970    PY05776   

    Proteins where this domain has been detected by our approach:
    PY01447    PY04047   


    PF00692 - dUTPase (Pfam link)

    Interpro entry IPR008180 : DeoxyUTP pyrophosphatase (Interpro link)

    Pfam description:
    dUTPase hydrolyses dUTP to dUMP and pyrophosphate.

    Interpro description:

    Synonym(s): dUTP diphosphatase, Deoxyuridine-triphosphatase

    The essential enzyme dUTP pyrophosphatase is specific for dUTP and is critical for the fidelity of DNA replication and repair. dUTPase hydrolyses dUTP to dUMP and pyrophosphate, simultaneously reducing dUTP levels and providing the dUMP for dTTP biosynthesis. dUTPase decreases the intracellular concentration of dUTP so that uracil cannot be incorporated into DNA.

    The crystal structure of human dUTPase reveals that each subunit of the dUTPase trimer folds into an eight-stranded jelly-roll beta barrel, with the C-terminal beta strands interchanged among the subunits. The structure is similar to that of the Escherichia coli enzyme, despite low sequence homology between the two enzymes.

    Other enzymes that specifically bind uridine, such as deoxycytidine triphosphate deaminase (dCTP deaminase), also belong to this group, suggesting that the signature may recognise a putative uridine-binding motif.

    Some retroviruses encode dUTPases. Retroviral dUTPase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, dUTPase and RNase H.

    Proteins where this domain is known:
    PY05536   


    PF00694 - Aconitase_C (Pfam link)

    Interpro entry IPR000573 : Aconitase A/isopropylmalate dehydratase small subunit, swivel (Interpro link)

    Pfam description:
    Members of this family usually also match to Pfam:PF00330. This domain undergoes conformational change in the enzyme mechanism.

    Interpro description:

    3-isopropylmalate dehydratase (or isopropylmalate isomerase) catalyses the stereo-specific isomerisation of 2-isopropylmalate and 3-isopropylmalate via the formation of 2-isopropylmaleate. This enzyme performs the second step in the biosynthesis of leucine, and is present in most prokaryotes and many fungal species. The prokaryotic enzyme is a heterodimer composed of a large (LeuC) and a small (LeuD) subunit, while the fungal form is a monomeric enzyme. Both forms of isopropylmalate dehydratase are related and are part of the larger aconitase family. Aconitases are mostly monomeric proteins that share four domains in common and contain a single, labile [4Fe-4S] cluster. Three structural domains (1, 2 and 3) are tightly packed around the iron-sulphur cluster, while a fourth domain (4) forms a deep active-site cleft. The prokaryotic enzyme is encoded by two adjacent genes, leuC and leuD, corresponding to aconitase domains 1-3 and 4 respectively. LeuC does not bind an iron-sulphur cluster. It is thought that some prokaryotic isopropylmalate dehydratases can also function as homoaconitase, converting cis-homoaconitate to homoisocitric acid in lysine biosynthesis. Homoaconitase has been identified in higher fungi (mitochondria), several archaea and one thermophilic species of bacteria, Thermus thermophilus.

    Aconitase (aconitate hydratase) is an iron-sulphur protein that contains a [4Fe-4S] cluster and catalyses the interconversion of isocitrate and citrate via a cis-aconitate intermediate. Aconitase functions in both the TCA and glyoxylate cycles; however, unlike the majority of iron-sulphur proteins, which function as electron carriers, the [4Fe-4S] cluster of aconitase reacts directly with an enzyme substrate. In eukaryotes there is a cytosolic form (cAcn) and a mitochondrial form (mAcn) of the enzyme. In bacteria there are also two forms, aconitase A (AcnA) and B (AcnB). Several aconitases are known to be multi-functional enzymes with a second non-catalytic, but essential, function that arises when the cellular environment changes, such as when iron levels drop. Eukaryotic cAcn and mAcn, and bacterial AcnA, have the same domain organisation, consisting of three N-terminal alpha/beta/alpha domains and a linker region, followed by a C-terminal 'swivel' domain with a beta/beta/alpha structure (1-2-3-linker-4), although mAcn is smaller than cAcn. However, bacterial AcnB has a different organisation: it contains an N-terminal HEAT-like domain, followed by the 'swivel' domain, then the three alpha/beta/alpha domains (HEAT-4-1-2-3).

    This entry represents the 'swivel' domain found at the C terminus of eukaryotic mAcn, cAcn/IRP1 and IRP2, and bacterial AcnA. This domain has a three-layer beta/beta/alpha structure and, in cytosolic Acn, is known to rotate between the cAcn and IRP1 forms of the enzyme. This domain is also found in the small subunit of isopropylmalate dehydratase (LeuD).

    More information about these proteins can be found at Protein of the Month: Aconitase.

    Proteins where this domain is known:
    PY00319   


    PF00702 - Hydrolase (Pfam link)

    Interpro entry IPR005834 : Haloacid dehalogenase-like hydrolase (Interpro link)

    Pfam description:
    This family is structurally different from the alpha/beta hydrolase family (Pfam:PF00561). It includes L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains. One is an inserted four-helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain.

    Interpro description:

    This group of hydrolase enzymes is structurally different from the alpha/beta hydrolase family (abhydrolase). This group includes L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure consists of two domains. One is an inserted four helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of HAD1_PSESP. The rest of the fold is composed of the core alpha/beta domain.

    Proteins where this domain is known:
    PY00066    PY01300    PY01447    PY03970    PY04231    PY04459    PY05776   


    PF00704 - Glyco_hydro_18 (Pfam link)

    Interpro entry IPR001223 : Glycoside hydrolase, family 18, catalytic domain (Interpro link)

    Interpro description:

    O-Glycosyl hydrolases are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Because the fold of proteins is better conserved than their sequences, some of the families can be grouped in 'clans'.

    Some members of this family belong to the chitinase class II group which includes chitinase, chitodextrinase and the killer toxin of Kluyveromyces lactis. The chitinases hydrolyse chitin oligosaccharides. The family also includes various glycoproteins from mammals; cartilage glycoprotein and the oviduct-specific glycoproteins are two examples.

    Proteins where this domain is known:
    PY00008    PY07200   


    PF00705 - PCNA_N (Pfam link)

    Interpro entry IPR000730 : Proliferating cell nuclear antigen, PCNA (Interpro link)

    Pfam description:
    N-terminal and C-terminal domains of PCNA are topologically identical. Three PCNA molecules are tightly associated to form a closed ring encircling duplex DNA.

    Interpro description:

    Proliferating cell nuclear antigen (PCNA), or cyclin, is a non-histone acidic nuclear protein that plays a key role in the control of eukaryotic DNA replication. It acts as a co-factor for DNA polymerase delta, which is responsible for leading strand DNA replication. The sequence of PCNA is well conserved between plants and animals, indicating a strong selective pressure for structure conservation, and suggesting that this type of DNA replication mechanism is conserved throughout eukaryotes. In Saccharomyces cerevisiae (baker's yeast), POL30 is associated with polymerase III, the yeast analogue of polymerase delta.

    Homologues of PCNA have also been identified in the archaea (Euryarchaeota and Crenarchaeota) and in Paramecium bursaria Chlorella virus 1 (PBCV-1) and in nuclear polyhedrosis viruses.

    Proteins where this domain is known:
    PY01758    PY06718   


    PF00709 - Adenylsucc_synt (Pfam link)

    Interpro entry IPR001114 : Adenylosuccinate synthetase (Interpro link)

    Interpro description:

    Adenylosuccinate synthetase plays an important role in purine biosynthesis, by catalysing the GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterised from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are present: one involved in purine biosynthesis and the other in the purine nucleotide cycle.

    The crystal structure of adenylosuccinate synthetase from E. coli reveals that the dominant structural element of each monomer of the homodimer is a central beta-sheet of 10 strands. The first nine strands of the sheet are mutually parallel with right-handed crossover connections between the strands. The 10th strand is antiparallel with respect to the first nine strands. In addition, the enzyme has two antiparallel beta-sheets, comprising two and three strands respectively, 11 alpha-helices and two short 3/10-helices. Further, it has been suggested that the similarities in the GTP-binding domains of the synthetase and the p21ras protein are an example of convergent evolution of two distinct families of GTP-binding proteins. Structures of adenylosuccinate synthetase from Triticum aestivum and Arabidopsis thaliana, when compared with the known structures from E. coli, reveal that the overall fold is very similar to that of the E. coli protein.

    Proteins where this domain is known:
    PY03301   


    PF00717 - Peptidase_S24 (Pfam link)

    Interpro entry IPR011056 : (Interpro link)

    Interpro description:

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
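    The linear-order rule for catalytic triads amounts to sorting the three catalytic residues by their position in the sequence. The sketch below is hypothetical (the function names and the lookup table are ours) and covers only the three clans named in the text; it assumes the residue numbers of the catalytic Ser, Asp and His are already known.

```python
from typing import Optional

# Orders quoted in the entry; clans not listed there are deliberately left out.
CLAN_BY_ORDER = {
    "HDS": "PA (chymotrypsin)",
    "DHS": "SB (subtilisin)",
    "SDH": "SC (carboxypeptidase C)",
}


def triad_order(ser: int, asp: int, his: int) -> str:
    """Letters S, D and H sorted by their residue number along the chain."""
    return "".join(res for _, res in sorted([(ser, "S"), (asp, "D"), (his, "H")]))


def guess_clan(ser: int, asp: int, his: int) -> Optional[str]:
    """Clan suggested by the triad order, or None if the order is not listed."""
    return CLAN_BY_ORDER.get(triad_order(ser, asp, his))


print(triad_order(ser=195, asp=102, his=57))  # HDS
print(guess_clan(ser=221, asp=32, his=64))    # SB (subtilisin)
```

    With the conventional chymotrypsin numbering (His57, Asp102, Ser195) the order is HDS, and with subtilisin BPN' numbering (Asp32, His64, Ser221) it is DHS. Order alone is only suggestive; clan assignment rests on structural evidence.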

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This entry represents the C-terminal domain of the Escherichia coli LexA protein and the C-terminal domain of the E. coli signal peptidase (SPase). They share the same structural topology, consisting of a complex fold made of several coiled beta-sheets and containing an SH3-like beta-barrel. This entry is associated with serine peptidases belonging to MEROPS peptidase families S24 (LexA family, clan SF), S26A (signal peptidase I) and S26B (signalase).

    The S26 family includes E. coli signal peptidase, SPase, which is a membrane-bound endopeptidase, with two N-terminal transmembrane segments and a C-terminal catalytic region. SPase functions to release proteins that have been translocated into the inner membrane from the cell interior, by cleaving off their signal peptides.

    The S24 family includes LexA, the phage repressor CI, UmuD, MucA and RulA.

    All of these proteins, with the possible exception of RulA, interact with RecA, which activates self-cleavage, either derepressing transcription in the case of CI and LexA or activating the lesion-bypass polymerase in the case of UmuD and MucA. UmuD'2 is the homodimeric component of the DNA lesion-bypass polymerase pol V and is produced from UmuD by RecA-facilitated self-cleavage, which removes the first 24 N-terminal residues. MucA, like UmuD, is plasmid-encoded; it forms part of a DNA polymerase (pol RI) and is converted into the active lesion-bypass form by a self-cleavage reaction involving RecA.

    This group of proteins also contains proteins not recognised as peptidases as well as those classified as non-peptidase homologues as they either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for catalytic activity.

    Proteins where this domain is known:
    PY00480   


    PF00719 - Pyrophosphatase (Pfam link)

    Interpro entry IPR008162 : Inorganic pyrophosphatase (Interpro link)

    Interpro description:

    Inorganic pyrophosphatase (PPase) is the enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed principally as the product of the many biosynthetic reactions that use ATP. All known PPases require divalent metal cations, with magnesium conferring the highest activity. Among other residues, a lysine has been postulated to be part of, or close to, the active site. PPases have been sequenced from bacteria such as Escherichia coli (homohexamer), Bacillus PS3 (Thermophilic bacterium PS-3) and Thermus thermophilus, from the archaeon Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial isoform of PPase has been characterised which seems to be involved in energy production and whose activity is stimulated by uncouplers of ATP synthesis.
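    As a simplified reaction scheme (protonation states and the exact metal stoichiometry are ignored here), the PPase-catalysed hydrolysis described above is:

```latex
\mathrm{PP_i} + \mathrm{H_2O} \xrightarrow{\ \mathrm{PPase},\ \mathrm{Mg}^{2+}\ } 2\,\mathrm{P_i}
```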

    The sequences of PPases share some regions of similarities, among which is a region that contains three conserved aspartates that are involved in the binding of cations.

    Proteins where this domain is known:
    PY03581   


    PF00730 - HhH-GPD (Pfam link)

    Interpro entry IPR003265 : HhH-GPD domain (Interpro link)

    Pfam description:
    This family contains a diverse range of structurally related DNA repair proteins. The superfamily is called the HhH-GPD family after its hallmark helix-hairpin-helix and Gly/Pro-rich loop followed by a conserved aspartate. It includes endonuclease III (EC:4.2.99.18) and MutY, an A/G-specific adenine glycosylase; both have a C-terminal 4Fe-4S cluster. The family also includes 8-oxoguanine DNA glycosylases such as Swiss:P53397. The methyl-CpG-binding protein MBD4 (Swiss:Q9Z2D7) also contains a related domain that is a thymine DNA glycosylase. The family also includes DNA-3-methyladenine glycosylase II (EC:3.2.2.21) and other members of the AlkA family.

    Interpro description:

    Endonuclease III is a DNA repair enzyme which removes a number of damaged pyrimidines from DNA via its glycosylase activity and also cleaves the phosphodiester backbone at apurinic/apyrimidinic sites via a beta-elimination mechanism. The structurally related DNA glycosylase MutY recognises the 8-oxoguanine-adenine mispair, a mutational intermediate, and excises the adenine. The 3-D structures of Escherichia coli endonuclease III and of the catalytic domain of MutY have been determined. The structures contain two all-alpha domains: a sequence-continuous, six-helix domain (residues 22-132) and a Greek-key, four-helix domain formed by one N-terminal and three C-terminal helices (residues 1-21 and 133-211) together with the [Fe4S4] cluster. The cluster is bound entirely within the C-terminal loop by four cysteine residues with the ligation pattern Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys, which is distinct from all other known Fe4S4 proteins. This structural motif is referred to as an [Fe4S4] cluster loop (FCL). Two DNA-binding motifs have been proposed, one at either end of the interdomain groove: the helix-hairpin-helix (HhH) and FCL motifs. The primary role of the iron-sulphur cluster appears to be positioning conserved basic residues, by forming the loop of the FCL motif, for interaction with the DNA phosphate backbone.
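    The FCL cysteine spacing Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys translates directly into a regular expression. The sketch below is a hypothetical helper (find_fcl_motifs is not from any library) that reports every candidate spacing in a protein sequence; a hit only means the spacing is present, not that a cluster is actually bound. The toy sequence is invented.

```python
import re

# Cys-(Xaa)6-Cys-(Xaa)2-Cys-(Xaa)5-Cys as a regular expression:
# "." stands for any residue (Xaa) and {n} for exactly n residues.
FCL_SPACING = r"C.{6}C.{2}C.{5}C"


def find_fcl_motifs(sequence):
    """Return (1-based start, matched text) for every FCL-like spacing."""
    seq = sequence.upper()
    # Wrapping the pattern in a lookahead also reports overlapping candidates.
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({FCL_SPACING}))", seq)]


# Hypothetical toy sequence built to contain exactly one FCL spacing.
toy = "MKT" + "CAAAAAAC" + "GG" + "CLLLLLC" + "W"
print(find_fcl_motifs(toy))
```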

    The HhH-GPD domain gets its name from its hallmark helix-hairpin-helix and Gly/Pro-rich loop followed by a conserved aspartate. This domain is found in a diverse range of structurally related DNA repair proteins that include endonuclease III and the DNA glycosylase MutY, an A/G-specific adenine glycosylase. Both of these enzymes have a C-terminal iron-sulphur cluster loop (FCL). The methyl-CpG-binding protein MBD4 also contains a related domain that is a thymine DNA glycosylase. The family also includes DNA-3-methyladenine glycosylase II, 8-oxoguanine DNA glycosylases and other members of the AlkA family.

    Proteins where this domain is known:
    PY05666    PY05677    PY07176   


    PF00733 - Asn_synthase (Pfam link)

    Interpro entry IPR001962 : Asparagine synthase (Interpro link)

    Pfam description:
    This family is always found associated with Pfam:PF00310. Members of this family catalyse the conversion of aspartate to asparagine.

    Interpro description:
    This domain is always found associated with Pfam:PF00310. Family members that contain this domain catalyse the conversion of aspartate to asparagine. Asparagine synthetase B catalyses the assembly of asparagine from aspartate, Mg(2+)ATP and glutamine. The three-dimensional architecture of the N-terminal domain of asparagine synthetase B is similar to that observed for glutamine phosphoribosylpyrophosphate amidotransferase, while the molecular motif of the C-terminal domain is reminiscent of that observed for GMP synthetase.

    Proteins where this domain is known:
    PY02906   


    PF00736 - EF1_GNE (Pfam link)

    Interpro entry IPR014038 : Translation elongation factor EF1B, beta and delta chains, guanine nucleotide exchange (Interpro link)

    Pfam description:
    This family is the guanine nucleotide exchange domain of EF-1 beta and EF-1 delta chains.

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    Elongation factor EF1B (also known as EF-Ts or EF-1beta/gamma/delta) is a nucleotide exchange factor that is required to regenerate EF1A from its inactive form (EF1A-GDP) to its active form (EF1A-GTP). EF1A is then ready to interact with a new aminoacyl-tRNA to begin the cycle again. EF1B is more complex in eukaryotes than in bacteria, and can consist of three subunits: EF1B-alpha (or EF-1beta), EF1B-gamma (or EF-1gamma) and EF1B-beta (or EF-1delta).
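    The exchange step that EF1B catalyses can be summarised as the following scheme (a simplification that omits the intermediate EF1A-EF1B complexes):

```latex
\mathrm{EF1A{\cdot}GDP} + \mathrm{GTP} \xrightarrow{\ \mathrm{EF1B}\ } \mathrm{EF1A{\cdot}GTP} + \mathrm{GDP}
```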

    This entry represents the guanine nucleotide exchange domain of the beta (EF-1beta, also known as EF1B-alpha) and delta (EF-1delta, also known as EF1B-beta) chains of EF1B proteins from eukaryotes and archaea. The beta and delta chains have exchange activity, which mainly resides in their homologous guanine nucleotide exchange domains, found in the C-terminal region of the peptides. Their N-terminal regions may be involved in interactions with the gamma chain (EF-1gamma).

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY06160   


    PF00746 - Gram_pos_anchor (Pfam link)

    Interpro entry IPR001899 : Surface protein from Gram-positive cocci, anchor region (Interpro link)

    Interpro description:

    Viruses, parasites and bacteria are covered in protein and sugar molecules that help them gain entry into a host by counteracting the host's defences. One such molecule is the M protein produced by certain streptococcal bacteria. M proteins contain a motif that is now known to be shared by many Gram-positive bacterial surface proteins. From N to C terminus, the motif consists of a conserved hexapeptide, followed by a hydrophobic C-terminal membrane anchor, followed in turn by a cluster of basic residues.

    It has been proposed that this hexapeptide sequence is responsible for a post-translational modification necessary for the proper anchoring of the proteins that bear it to the cell wall.
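    The three-part arrangement (hexapeptide, hydrophobic anchor, basic cluster at the C terminus) can be checked with a single ordered regular expression. This is a minimal heuristic sketch, not the Pfam or InterPro model. The concrete hexapeptide form (LPxTG plus one residue from [STGAVDE]), the hydrophobic residue set, the 12-residue minimum and all spacer lengths are assumptions chosen for illustration, and looks_anchored is a hypothetical helper.

```python
import re

# Ordered pattern for the architecture described above. Every concrete choice
# here is an ASSUMPTION for illustration, not taken from the entry itself:
#   * hexapeptide written as LPxTG plus one residue from [STGAVDE]
#   * anchor = at least 12 residues from a small hydrophobic set
#   * short, arbitrary spacer lengths between the parts
ANCHOR_ARCHITECTURE = re.compile(
    r"LP.TG[STGAVDE]"    # conserved hexapeptide (assumed form)
    r".{0,10}"           # spacer before the membrane anchor
    r"[AILMFVWGC]{12,}"  # hydrophobic membrane anchor
    r".{0,5}"            # spacer before the basic cluster
    r"[KR]{2,}"          # cluster of basic residues
    r".{0,5}$"           # located close to the C terminus
)


def looks_anchored(sequence):
    """Return True if the sequence ends with the assumed motif architecture."""
    return ANCHOR_ARCHITECTURE.search(sequence.upper()) is not None


# Hypothetical toy sequences, not real proteins.
with_anchor = "MAKNQ" + "LPETGA" + "EN" + "AILVLAGLIALF" + "RRKK" + "S"
without_basic_tail = "MAKNQ" + "LPETGA" + "EN" + "AILVLAGLIALF" + "SS"
print(looks_anchored(with_anchor), looks_anchored(without_basic_tail))
```

    Requiring the three parts in a fixed order mirrors the description; a real scan would tune the residue sets and spacers against known anchored proteins.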

    Proteins where this domain has been detected by our approach:
    PY04147   


    PF00749 - tRNA-synt_1c (Pfam link)

    Interpro entry IPR000924 : Glutamyl/glutaminyl-tRNA synthetase, class Ic (Interpro link)

    Pfam description:
    Other tRNA synthetase sub-families are too dissimilar to be included. This family includes only glutamyl and glutaminyl tRNA synthetases. In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II.
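    The class split and the preferred attachment site described above fit naturally into a small lookup table. This sketch simply mirrors the lists in the text and does not model organism-specific exceptions; synthetase_class and the constant names are hypothetical.

```python
# Class membership copied from the lists in the text above (three-letter codes).
CLASS_I = {"Arg", "Cys", "Glu", "Gln", "Ile", "Leu", "Met", "Tyr", "Trp", "Val"}
CLASS_II = {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}

# Preferred attachment site of the aminoacyl group on the tRNA, per the text.
PREFERRED_ATTACHMENT = {"I": "2'-OH", "II": "3'-OH"}


def synthetase_class(amino_acid):
    """Return "I" or "II" for a three-letter amino acid code (case-insensitive)."""
    code = amino_acid.strip().capitalize()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise KeyError(f"{amino_acid!r} is not one of the 20 listed specificities")


for aa in ("glu", "Ser"):
    cls = synthetase_class(aa)
    print(aa, cls, PREFERRED_ATTACHMENT[cls])
```

    Glutamyl-tRNA synthetase, the subject of this entry, comes out as class I with 2'-OH attachment.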

    Glutamyl-tRNA synthetase is a class Ic synthetase and shows several similarities with glutaminyl-tRNA synthetase in structure and catalytic properties. It is an alpha2 dimer. To date one crystal structure of a glutamyl-tRNA synthetase (Thermus thermophilus) has been solved. The molecule has the form of a bent cylinder and consists of four domains. The N-terminal half (domains 1 and 2) contains the Rossmann fold typical of class I synthetases and resembles the corresponding part of Escherichia coli GlnRS, whereas the C-terminal half exhibits a GluRS-specific structure.

    Proteins where this domain is known:
    PY00363    PY02178    PY02891   


    PF00750 - tRNA-synt_1d (Pfam link)

    Interpro entry IPR015945 : Arginyl-tRNA synthetase, class Ic, core (Interpro link)

    Pfam description:
    Other tRNA synthetase sub-families are too dissimilar to be included. This family includes only arginyl tRNA synthetase.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II.

    This entry represents the core region of arginyl-tRNA synthetase. Yeast arginyl-tRNA synthetase has been crystallised in complex with yeast tRNA(Arg), and a preliminary X-ray crystallographic analysis of these complexes is available.

    Proteins where this domain is known:
    PY01800    PY02481    PY02482   


    PF00752 - XPG_N (Pfam link)

    Interpro entry IPR006085 : XPG N-terminal (Interpro link)

    Interpro description:

    Xeroderma pigmentosum (XP) is a human autosomal recessive disease characterised by a high incidence of sunlight-induced skin cancer. Skin cells from people with this condition are hypersensitive to ultraviolet light because of defects in the incision step of DNA excision repair. At least seven genetic complementation groups, XP-A to XP-G, are involved in this pathway. XP-G is one of the rarest and most phenotypically heterogeneous forms of XP, showing anything from slight to extreme dysfunction in DNA excision repair. XP-G can be corrected by a 133 kDa nuclear protein, XPGC. XPGC is an acidic protein that confers normal UV resistance in expressing cells. It is a magnesium-dependent, single-strand DNA endonuclease that makes structure-specific endonucleolytic incisions in a DNA substrate containing a duplex region and single-stranded arms. XPGC cleaves one strand of the duplex at the border with the single-stranded region.

    XPG belongs to a family of proteins that includes RAD2 from Saccharomyces cerevisiae (Baker's yeast) and rad13 from Schizosaccharomyces pombe (Fission yeast), which are single-stranded DNA endonucleases; mouse and human FEN-1, a structure-specific endonuclease; RAD2 from fission yeast and RAD27 from budding yeast; fission yeast exo1, a 5'-3' double-stranded DNA exonuclease that may act in a pathway that corrects mismatched base pairs; yeast DHS1, and yeast DIN7. Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved.
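    The conserved pentapeptide E-A-[DE]-A-[QS] is written in a PROSITE-like notation in which "-" separates positions and a bracket lists the allowed residues at one position. Dropping the dashes gives an equivalent regular expression. The helper below is a hypothetical sketch, and the toy sequence is invented.

```python
import re

# E-A-[DE]-A-[QS] with the position separators removed.
XPG_PENTAPEPTIDE = re.compile(r"EA[DE]A[QS]")


def find_pentapeptide(sequence):
    """Return (1-based start, matched text) for each E-A-[DE]-A-[QS] occurrence."""
    # The residue constraints make overlapping matches impossible,
    # so plain finditer is sufficient here.
    return [(m.start() + 1, m.group())
            for m in XPG_PENTAPEPTIDE.finditer(sequence.upper())]


toy = "MKVEAEAQLL"  # hypothetical toy sequence, not a real protein
print(find_pentapeptide(toy))
```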

    This entry represents the N-terminal region of XPG.

    Proteins where this domain is known:
    PY02238   

    Proteins where this domain has been detected by our approach:
    PY00765   


    PF00753 - Lactamase_B (Pfam link)

    Interpro entry IPR001279 : Beta-lactamase-like (Interpro link)

    Interpro description:
    Apart from the beta-lactamases, a number of other proteins contain this domain. These include thiolesterases of the glyoxalase II family, which catalyse the hydrolysis of S-D-lactoyl-glutathione to form glutathione and D-lactic acid, and a competence protein that is essential for natural transformation in Neisseria gonorrhoeae and could be a transporter involved in DNA uptake. Except for the competence protein, these proteins bind two zinc ions per molecule as cofactors.

    Proteins where this domain is known:
    PY00665    PY00757    PY01776    PY04408    PY05100    PY05503   


    PF00754 - F5_F8_type_C (Pfam link)

    Interpro entry IPR000421 : Coagulation factor 5/8 type, C-terminal (Interpro link)

    Pfam description:
    This domain is also known as the discoidin (DS) domain family.

    Interpro description:
    Blood coagulation factors V and VIII contain a C-terminal, twice-repeated domain of about 150 amino acids, which is called F5/8 type C, FA58C, or C1/C2-like domain. In the Dictyostelium discoideum (slime mould) cell adhesion protein discoidin, a related domain, named discoidin I-like domain, DLD, or DS, has been found which shares a common C-terminal region of about 110 amino acids with the FA58C domain, but whose N-terminal 40 amino acids are much less conserved. Similar domains have been detected in other extracellular and membrane proteins. In coagulation factors V and VIII the repeated domains compose part of a larger functional domain which promotes binding to anionic phospholipids on the surface of platelets and endothelial cells. The C-terminal domain of the second FA58C repeat (C2) of coagulation factor VIII has been shown to be responsible for phosphatidylserine binding and is essential for activity. It forms an amphipathic alpha-helix, which binds to the membrane. In most proteins FA58C contains two conserved cysteines, which link the extremities of the domain by a disulphide bond. A further disulphide bond is located near the C terminus of the second FA58C domain in MFGM.

    Proteins where this domain is known:
    PY01580    PY05090    PY05554   


    PF00762 - Ferrochelatase (Pfam link)

    Interpro entry IPR001015 : Ferrochelatase (Interpro link)

    Interpro description:
    Synonym(s): Protohaem ferro-lyase, Iron chelatase, etc.

    Ferrochelatase catalyses the last step in haem biosynthesis: the insertion of a ferrous ion into protoporphyrin IX to form protohaem. In eukaryotic cells, it binds to the mitochondrial inner membrane with its active site on the matrix side of the membrane.
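    As a reaction scheme, the metal insertion releases the two pyrrole N-H protons of the porphyrin:

```latex
\text{protoporphyrin IX} + \mathrm{Fe}^{2+} \xrightarrow{\ \text{ferrochelatase}\ } \text{protohaem} + 2\,\mathrm{H}^{+}
```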

    The X-ray structures of Bacillus subtilis and human ferrochelatase have been solved. The human enzyme exists as a homodimer. Each subunit contains one [Fe2S2] cluster. The monomer is folded into two similar domains, each with a four-stranded parallel beta-sheet flanked by an alpha-helix in a beta-alpha-beta motif that is reminiscent of the fold found in the periplasmic binding proteins. The topological similarity between the domains suggests that they have arisen from a gene duplication event. However, significant differences exist between the two domains, including an N-terminal section (residues 80-130) that forms part of the active site pocket, and a C-terminal extension (residues 390-423) that is involved in coordination of the [Fe2S2] cluster and in stabilisation of the homodimer. The [Fe2S2] cluster ligands are Cys196, Cys403, Cys406 and Cys411. Co(II)-binding experiments show that His230 and Asp383 are part of the enzyme active site.

    Ferrochelatase seems to have a structurally conserved core region that is common to the enzyme from bacteria, plants and mammals. Porphyrin binds in the identified cleft; this cleft also includes the metal-binding site of the enzyme. It is likely that the structure of the cleft region will have different conformations upon substrate binding and release.

    Proteins where this domain is known:
    PY02777   


    PF00773 - RNB (Pfam link)

    Interpro entry IPR001900 : Ribonuclease II and R (Interpro link)

    Pfam description:
    This domain is the catalytic domain of ribonuclease II.

    Interpro description:

    This group of bacterial and eukaryotic proteins represents both characterised and related sequences of exoribonuclease II (RNase II) and ribonuclease R, a bacterial 3' --> 5' exoribonuclease homologous to RNase II.

    The size of these proteins ranges from 644 residues (rnb) to 1250 (SSD1). Although their sequences are highly divergent, they share a conserved domain in their C-terminal section. It is possible that this domain plays a role in the exonuclease function.

    Proteins where this domain is known:
    PY02885    PY03694    PY04959    PY06658   


    PF00781 - DAGK_cat (Pfam link)

    Interpro entry IPR001206 : Diacylglycerol kinase, catalytic region (Interpro link)

    Pfam description:
    Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic function of this domain is inferred from bacterial homologues. YegS is the Escherichia coli member of this family; its crystal structure shows an active site in the inter-domain cleft, formed by four conserved sequence motifs, and reveals a novel metal-binding site. The residues of this site are conserved across the family.

    Interpro description:

    Diacylglycerol kinase (DGK) phosphorylates diacylglycerol (DAG) to yield phosphatidic acid. This enzyme initiates resynthesis of the phosphoinositides consumed by phospholipase C during cellular signal transduction. Mammals have nine DGK isozymes, encoded by separate genes. In addition to PKC-like zinc fingers and catalytic regions conserved in all DGKs, these isozymes contain a variety of regulatory domains of known and/or predicted functions. The mammalian isozymes are named according to the order of their cDNA cloning and are subdivided into five groups based on their characteristic structural features. Each DGK isozyme is a critical downstream component of a DAG-dependent signalling system.
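    The phosphorylation step can be written as the following scheme, with ATP as the phosphate donor (protonation states omitted):

```latex
\text{diacylglycerol} + \mathrm{ATP} \xrightarrow{\ \mathrm{DGK}\ } \text{phosphatidic acid} + \mathrm{ADP}
```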

    This domain is usually associated with an accessory domain.

    Proteins where this domain is known:
    PY01867    PY06359   


    PF00782 - DSPc (Pfam link)

    Interpro entry IPR000340 : Protein-tyrosine phosphatase, dual specificity (Interpro link)

    Pfam description:
    Ser/Thr and Tyr protein phosphatases. The enzyme's tertiary fold is highly similar to that of tyrosine-specific phosphatases, except for a "recognition" region.

    Interpro description:

    Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPases) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation. The PTP superfamily can be divided into four subfamilies.

    Based on their cellular localisation, PTPases are also classified as receptor-like (transmembrane) or non-receptor (intracellular) enzymes.

    All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel beta-sheet with flanking alpha-helices containing a beta-loop-alpha-loop that encompasses the PTP signature motif. Functional diversity between PTPases is endowed by regulatory domains and subunits.

    This entry represents dual specificity protein-tyrosine phosphatases. Ser/Thr and Tyr dual specificity phosphatases are a group of enzymes with both Ser/Thr- and tyrosine-specific protein phosphatase activity, able to remove serine/threonine- or tyrosine-bound phosphate groups from a wide range of phosphoproteins, including a number of enzymes which have been phosphorylated under the action of a kinase. Dual specificity protein phosphatases (DSPs) regulate mitogenic signal transduction and control the cell cycle. The crystal structure of a human DSP, vaccinia H1-related phosphatase (VHR), has been determined at 2.1 angstrom resolution. A shallow active site pocket in VHR allows for the hydrolysis of phosphorylated serine, threonine or tyrosine protein residues, whereas the deeper active site of protein tyrosine phosphatases (PTPs) restricts substrate specificity to phosphotyrosine only. Positively charged crevices near the active site may explain the enzyme's preference for substrates with two phosphorylated residues. The VHR structure defines a conserved structural scaffold for both DSPs and PTPs. A "recognition region" connecting helix alpha1 to strand beta1 may determine differences in substrate specificity between VHR, the PTPs and other DSPs.

    These proteins may also have inactive phosphatase domains, and depending on the domain composition this loss of catalytic activity has different effects on protein function. Inactive single-domain phosphatases can still specifically bind substrates and protect against dephosphorylation, while the inactive domains of tandem phosphatases can be further subdivided into two classes. Those which bind phosphorylated tyrosine residues may recruit multi-phosphorylated substrates for the adjacent active domains and are more conserved, while the other class has accumulated several variable amino acid substitutions and has completely lost tyrosine-binding capability. The second class shows a release of evolutionary constraint on the sites around the catalytic centre, which emphasises a difference in function from the first group. There is a region of higher conservation common to both classes, suggesting a new regulatory centre.

    Proteins where this domain is known:
    PY00863    PY03455    PY05421    PY05564   


    PF00787 - PX (Pfam link)

    Interpro entry IPR001683 : Phox-like (Interpro link)

    Pfam description:
    PX domains bind to phosphoinositides.

    Interpro description:

    The PX (phox) domain occurs in a variety of eukaryotic proteins and has been implicated in highly diverse functions such as cell signalling, vesicular trafficking, protein sorting and lipid modification. PX domains are important phosphoinositide-binding modules that have varying lipid-binding specificities. The PX domain is approximately 120 residues long and folds into a three-stranded beta-sheet followed by three alpha-helices and a proline-rich region that immediately precedes a membrane-interaction loop spanning approximately eight hydrophobic and polar residues. The PX domain of p47phox binds to the SH3 domain in the same protein. Phosphorylation of p47(phox), a cytoplasmic activator of the microbicidal phagocyte oxidase (phox), elicits interaction of p47(phox) with phosphoinositides. The protein phosphorylation-driven conformational change of p47(phox) enables its PX domain to bind to phosphoinositides, an interaction that plays a crucial role in recruiting p47(phox) from the cytoplasm to membranes and in subsequent activation of the phagocyte oxidase. The lipid-binding activity of this protein is normally suppressed by intramolecular interaction of the PX domain with the C-terminal Src homology 3 (SH3) domain.

    The PX domain is conserved from yeast to human. Although PX domains show relatively little sequence conservation, their structure appears to be highly conserved. Although phosphatidylinositol-3-phosphate (PtdIns(3)P) is the primary target of PX domains, binding to phosphatidic acid, phosphatidylinositol-3,4-bisphosphate (PtdIns(3,4)P2), phosphatidylinositol-3,5-bisphosphate (PtdIns(3,5)P2), phosphatidylinositol-4,5-bisphosphate (PtdIns(4,5)P2) and phosphatidylinositol-3,4,5-trisphosphate (PtdIns(3,4,5)P3) has been reported as well. The PX domain is also a protein-protein interaction domain.

    Proteins where this domain is known:
    PY01946   


    PF00789 - UBX (Pfam link)

    Interpro entry IPR001012 : UBX domain (Interpro link)

    Pfam description:
    This domain is present in ubiquitin-regulatory proteins and is a general Cdc48-interacting module.

    Interpro description:
    The UBX domain is found in ubiquitin-regulatory proteins, which are members of the ubiquitination pathway, as well as in a number of other proteins including FAF-1 (FAS-associated factor 1), the human Rep-8 reproduction protein and several hypothetical proteins from yeast. The function of the UBX domain is not known, although a fragment of avian FAF-1 containing the UBX domain causes apoptosis of transfected cells.

    Proteins where this domain is known:
    PY01891    PY05236   


    PF00792 - PI3K_C2 (Pfam link)

    Interpro entry IPR002420 : Phosphoinositide 3-kinase, C2 (Interpro link)

    Pfam description:
    Phosphoinositide 3-kinase region postulated to contain a C2 domain. Outlier of Pfam:PF00168 family.

    Interpro description:

    Phosphatidylinositol 3-kinase (PI3-kinase) is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The C2 domain, usually N-terminal, interacts mainly with the scaffolding helical domain of the enzyme and makes only minor contacts with the catalytic domain. The domain consists of two four-stranded antiparallel beta-sheets that form a beta-sandwich. The isolated C2 domain binds multilamellar phospholipid vesicles, which suggests that this domain could play a role in membrane association. Membrane attachment by C2 domains is typically mediated by the loops connecting beta-strands, which in other C2 domain-containing proteins are calcium-binding regions.

    Proteins where this domain has been detected by our approach:
    PY00334   


    PF00804 - Syntaxin (Pfam link)

    Interpro entry IPR006011 : Syntaxin, N-terminal (Interpro link)

    Pfam description:
    Syntaxins are the prototype family of SNARE proteins. They usually consist of three main regions: a C-terminal transmembrane region, a central SNARE domain which is characteristic of and conserved in all syntaxins (Pfam:PF05739), and an N-terminal domain that is featured in this entry. This domain varies between syntaxin isoforms; in syntaxin 1A (Swiss:O35526) it is found as three alpha-helices with a left-handed twist. It may fold back on the SNARE domain to allow the molecule to adopt a 'closed' configuration that prevents formation of the core fusion complex; it thus has an auto-inhibitory role. The function of syntaxins is determined by their localisation. They are involved in neuronal exocytosis, ER-Golgi transport and Golgi-endosome transport, for example. They also interact with proteins other than those involved in SNARE complexes. These include vesicle coat proteins, Rab GTPases and tethering factors.

    Interpro description:

    Syntaxins A and B are nervous system-specific proteins implicated in the docking of synaptic vesicles with the presynaptic plasma membrane. Syntaxins are a family of receptors for intracellular transport vesicles. Each target membrane may be identified by a specific member of the syntaxin family. Members of the syntaxin family range in size from 30 to 40 kDa and have a highly hydrophobic C-terminal extremity, which anchors the protein on the cytoplasmic surface of cellular membranes, and a central, well-conserved region which seems to be in a coiled-coil conformation.

    Proteins where this domain is known:
    PY03571   


    PF00806 - PUF (Pfam link)

    Interpro entry IPR001313 : Pumilio RNA-binding region (Interpro link)

    Pfam description:
    Puf repeats (aka PUM-HD, Pumilio homology domain) are necessary and sufficient for sequence-specific RNA binding in fly Pumilio and worm FBF-1 and FBF-2. Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR of target mRNAs (e.g. the nanos response element (NRE) in fly Hunchback mRNA, or the point mutation element (PME) in worm fem-3 mRNA). Other proteins that contain Puf domains are also plausible RNA-binding proteins. Swiss:P47135, for instance, appears to also contain a single RRM domain by HMM analysis. Puf domains usually occur as a tandem repeat of 8 domains. The Pfam model does not necessarily recognise all 8 repeats in all sequences; some sequences appear to have 5 or 6 repeats on initial analysis, but further analysis suggests the presence of additional divergent repeats. Structures of PUF repeat proteins show that they consist of a two-helix structure.

    Interpro description:

    The Drosophila pumilio gene codes for an unusual protein that binds RNA through the Puf domain, which usually occurs as a tandem repeat of eight domains. The FBF-2 protein of Caenorhabditis elegans also has a Puf domain. Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR of target mRNAs. The same type of repetitive domain has been found in a number of other proteins from all eukaryotic kingdoms. The Puf proteins characterised to date have been reported to bind to 3'-untranslated region (UTR) sequences encompassing a so-called UGUR tetranucleotide motif and thereby to repress gene expression by affecting mRNA translation or stability.
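    As an illustration of the UGUR tetranucleotide mentioned above, the following minimal Python sketch lists where the motif occurs in a 3' UTR sequence, reading R as a purine (A or G) in IUPAC notation. The function name and the example sequence are hypothetical. Because the text describes UGUR only as a motif encompassed by the bound sequences, a hit marks at most a candidate site, not a confirmed Puf binding site.

```python
import re

# UGUR: U, G, U, then a purine (R = A or G in IUPAC nucleotide code).
# The lookahead lets overlapping occurrences all be reported.
UGUR = re.compile(r"(?=UGU[AG])")


def find_ugur(utr: str) -> list[int]:
    """Return 0-based start positions of UGUR motifs in a UTR sequence.

    DNA input (T instead of U) is accepted and converted to RNA first.
    """
    rna = utr.upper().replace("T", "U")
    return [m.start() for m in UGUR.finditer(rna)]


# Hypothetical 3' UTR fragment, for illustration only.
print(find_ugur("AAUGUAUUGUGCAUGUAA"))  # [2, 7, 13]
```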

    In Saccharomyces cerevisiae (baker's yeast), five proteins, termed Puf1p to Puf5p, bear six to eight Puf repeats. Puf3p binds nearly exclusively to cytoplasmic mRNAs that encode mitochondrial proteins; Puf1p and Puf2p interact preferentially with mRNAs encoding membrane-associated proteins; Puf4p preferentially binds mRNAs encoding nucleolar ribosomal RNA-processing factors; and Puf5p is associated with mRNAs encoding chromatin modifiers and components of the spindle pole body. This suggests the existence of an extensive network of RNA-protein interactions that coordinate the post-transcriptional fate of large sets of cytotopically and functionally related RNAs through each stage of their lifecycle.

    Proteins where this domain is known:
    PY04072    PY04369   


    PF00808 - CBFD_NFYB_HMF (Pfam link)

    Interpro entry IPR003958 : Transcription factor CBF/NF-Y/archaeal histone (Interpro link)

    Pfam description:
    This family includes archaebacterial histones and histone-like transcription factors from eukaryotes.

    Interpro description:

    The CCAAT-binding factor (CBF) is a mammalian transcription factor that binds to a CCAAT motif in the promoters of a wide variety of genes, including type I collagen and albumin. The factor is a heteromeric complex of A and B subunits, both of which are required for DNA binding. The subunits can interact in the absence of DNA binding, with conserved regions in each being important in mediating this interaction.

    The A subunit can be split into 3 domains on the basis of sequence similarity: a non-conserved N-terminal 'A domain'; a highly conserved central 'B domain' involved in DNA binding; and a C-terminal 'C domain', which contains a number of glutamine and acidic residues involved in protein-protein interactions. The A subunit shows striking similarity to the HAP3 subunit of the yeast CCAAT-binding heterotrimeric transcription factor. The Kluyveromyces lactis HAP3 protein has been predicted to contain a 4-cysteine zinc finger, which is thought to be present in similar HAP3 and CBF subunit A proteins, in which the third cysteine is replaced by a serine. This domain is found in the CCAAT transcription factor and archaeal histones.

    Proteins where this domain is known:
    PY05259    PY06256   


    PF00809 - Pterin_bind (Pfam link)

    Interpro entry IPR000489 : Dihydropteroate synthase, DHPS (Interpro link)

    Pfam description:
    This family includes a variety of pterin-binding enzymes that all adopt a TIM barrel fold. The family includes dihydropteroate synthase EC:2.5.1.15 as well as a group of methyltransferase enzymes including methyltetrahydrofolate:corrinoid iron-sulfur protein methyltransferase (MeTr) Swiss:Q46389, which catalyses a key step in the Wood-Ljungdahl pathway of carbon dioxide fixation. It transfers the N5-methyl group from methyltetrahydrofolate (CH3-H4folate) to a cob(I)amide centre in another protein, the corrinoid iron-sulfur protein. MeTr is a member of a family of proteins that includes methionine synthase and methanogenic enzymes that activate the methyl group of methyltetrahydromethano(or -sarcino)pterin.

    Interpro description:

    All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most microorganisms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells that allows these organisms to use dietary folates. Proteins containing this domain include dihydropteroate synthase as well as a group of methyltransferase enzymes including methyltetrahydrofolate:corrinoid iron-sulphur protein methyltransferase (MeTr), which catalyses a key step in the Wood-Ljungdahl pathway of carbon dioxide fixation.

    Dihydropteroate synthase (DHPS) catalyses the condensation of 6-hydroxymethyl-7,8-dihydropteridine pyrophosphate with para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the three-step pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulphonamides, which are substrate analogues that compete with para-aminobenzoic acid. Bacterial DHPS (gene sul or folP) is a protein of about 275 to 315 amino acid residues that is either chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas).

    Proteins where this domain is known:
    PY02226   


    PF00810 - ER_lumen_recept (Pfam link)

    Interpro entry IPR000133 : ER lumen protein retaining receptor (Interpro link)

    Interpro description:

    Proteins resident in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide, commonly known as Lys-Asp-Glu-Leu (KDEL) in mammals and His-Asp-Glu-Leu (HDEL) in yeast (Saccharomyces cerevisiae) that acts as a signal for their retrieval from subsequent compartments of the secretory pathway. The receptor for this signal is a ~26 kDa Golgi membrane protein, initially identified as the ERD2 gene product in S. cerevisiae. The receptor molecule, known variously as the ER lumen protein retaining receptor or the 'KDEL receptor', is believed to cycle between the cis side of the Golgi apparatus and the ER. It has also been characterised in a number of other species, including plants, Plasmodium, Drosophila and mammals. In mammals, 2 highly related forms of the receptor are known.
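    Because the retrieval signal described above is a fixed C-terminal tetrapeptide, checking for it is a simple suffix match. This hypothetical Python helper recognises only the two tetrapeptides named in the text (KDEL and HDEL) and assumes a one-letter protein sequence, optionally ending in a '*' stop symbol.

```python
from typing import Optional

# C-terminal ER retrieval tetrapeptides named in the text:
# KDEL (mammals) and HDEL (yeast, S. cerevisiae).
RETRIEVAL_SIGNALS = frozenset({"KDEL", "HDEL"})


def er_retrieval_signal(protein: str) -> Optional[str]:
    """Return the C-terminal KDEL/HDEL tetrapeptide if present, else None.

    Hypothetical helper: only exact KDEL or HDEL endings are recognised.
    """
    tail = protein.strip().upper().rstrip("*")[-4:]
    return tail if tail in RETRIEVAL_SIGNALS else None


# Hypothetical sequences, for illustration only.
print(er_retrieval_signal("MKWVTFISLLFLFSSAYSKDEL"))  # KDEL
print(er_retrieval_signal("MSGKAHDEL*"))              # HDEL
print(er_retrieval_signal("MSGKAKDEV"))               # None
```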

    The KDEL receptor is a highly hydrophobic protein of 220 residues; its sequence exhibits 7 hydrophobic regions, all of which have been suggested to traverse the membrane. More recently, however, it has been suggested that only 6 of these regions are transmembrane (TM), resulting in both N- and C-termini on the cytoplasmic side of the membrane.
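    Counts of hydrophobic regions like those above typically come from hydropathy analysis. A crude version can be sketched in Python with the Kyte-Doolittle scale, flagging 19-residue windows whose mean hydropathy exceeds 1.6 (values often quoted for spotting candidate transmembrane segments). This generic technique is swapped in for illustration; it is not the analysis behind the figures quoted above, and, as the 7-versus-6 disagreement shows, hydropathy alone cannot settle how many segments actually cross the membrane.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return merged (start, end) spans, 0-based and end-exclusive, covering
    every window whose mean hydropathy exceeds `threshold`.

    Residues missing from the scale (e.g. X) contribute 0.0.
    """
    seq = seq.upper()
    segments: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlapping or adjacent hydrophobic window: extend the segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Hypothetical sequence: a 25-residue leucine stretch between charged flanks.
print(hydrophobic_segments("D" * 10 + "L" * 25 + "D" * 10))  # [(5, 40)]
```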

    Proteins where this domain is known:
    PY05552    PY06403   


    PF00814 - Peptidase_M22 (Pfam link)

    Interpro entry IPR000905 : Peptidase M22, glycoprotease (Interpro link)

    Pfam description:
    The Peptidase M22 proteins are part of the HSP70-actin superfamily. The region represented here is an insert into the fold and is not found elsewhere in the superfamily (outside the Peptidase M22 family). Included in this family are the Rhizobial NodU proteins and the HypF regulator. This region also contains the histidine dyad believed to coordinate the metal ion and hence provide catalytic activity. Interestingly, the histidines are not well conserved, and there is a lack of experimental evidence to support peptidase activity as a general property of this family. There also appear to be instances of this domain outside of the HSP70-actin superfamily (e.g. Swiss:Q9ZM49).

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
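    The two motif definitions above translate naturally into regular expressions. The Python sketch below scans a protein sequence for the loose HEXXH motif (excluding proline from the XX positions) and for the stricter 'abXHEbbHbc' pattern. The residue classes are assumptions made for illustration: 'a' is limited to valine or threonine (the text says only that these are the most frequent), 'b' is taken as any residue other than Asp, Glu, Lys, Arg, His or Pro, and 'c' as one of Ala, Cys, Phe, Ile, Leu, Met, Val or Trp.

```python
import re

# Residue classes: these exact sets are illustrative assumptions.
UNCHARGED = "ACFGILMNQSTVWY"       # 'b': not D, E, K, R, H; proline excluded
HYDROPHOBIC = "ACFILMVW"           # 'c'
NOT_PRO = "ACDEFGHIKLMNQRSTVWY"    # 'X': any standard residue except proline

# Loose motif: HEXXH with no proline in the two X positions.
LOOSE = re.compile(r"(?=(HE[^P][^P]H))")
# Stringent motif 'abXHEbbHbc', with 'a' restricted to V or T.
STRICT = re.compile(
    rf"(?=([VT][{UNCHARGED}][{NOT_PRO}]HE[{UNCHARGED}]{{2}}"
    rf"H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_zinc_motifs(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif (overlaps allowed)."""
    seq = protein.upper()
    return {
        "HEXXH": [m.start(1) for m in LOOSE.finditer(seq)],
        "abXHEbbHbc": [m.start(1) for m in STRICT.finditer(seq)],
    }


# Hypothetical sequence: one stringent site, plus an HEPLH that is rejected.
print(find_zinc_motifs("MKVAGHEALHALGGHEPLH"))
```

The lookahead groups let overlapping candidate sites be reported separately, which matters when a sequence contains tandem histidines.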

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to MEROPS peptidase family M22 (clan MK); the type example is O-sialoglycoprotein endopeptidase from Pasteurella haemolytica (Mannheimia haemolytica).

    O-Sialoglycoprotein endopeptidase is secreted by the bacterium P. haemolytica and digests only proteins that are heavily sialylated, in particular those with sialylated serine and threonine residues. Substrate proteins include glycophorin A and leukocyte surface antigens CD34, CD43, CD44 and CD45. Removal of sialic acid residues by treatment with neuraminidase completely abolishes susceptibility to O-sialoglycoprotein endopeptidase digestion.

    Sequence similarity searches have revealed other members of the M22 family, from yeast, Mycobacterium, Haemophilus influenzae and the cyanobacterium Synechocystis. The zinc-binding and catalytic residues of this family have not been determined, although the motif HMEGH may be a zinc-binding region.

    Proteins where this domain is known:
    PY00451    PY00526   


    PF00827 - Ribosomal_L15e (Pfam link)

    Interpro entry IPR000439 : Ribosomal protein L15e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

  • Mammalian L15.
  • Insect L15.
  • Plant L15.
  • Yeast YL10 (L13) (Rp15r).
  • Archaebacterial L15e.

    These proteins have about 200 amino acid residues.

    Proteins where this domain is known:
    PY02811   


    PF00828 - Ribosomal_L18e (Pfam link)

    Interpro entry IPR000039 : Ribosomal protein L18e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Members of this family are large subunit ribosomal proteins found in the Eukaryota and Archaea. These proteins have 115 to 187 amino-acid residues.

    Proteins where this domain is known:
    PY01390   


    PF00829 - Ribosomal_L21p (Pfam link)

    Interpro entry IPR001787 : Ribosomal protein L21 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins grouped on the basis of sequence similarities.

    Bacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200 residues.

    Proteins where this domain is known:
    PY04436    PY04493   


    PF00830 - Ribosomal_L28 (Pfam link)

    Interpro entry IPR001383 : Ribosomal protein L28 (Interpro link)

    Pfam description:
    The ribosomal L28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast Swiss:P36525 also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal subunit.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The ribosomal L28 protein family includes proteins from bacteria and chloroplasts. The L24 protein from yeast, found in the large subunit of the mitochondrial ribosome, contains a region similar to the bacterial L28 protein.

    Proteins where this domain is known:
    PY04664   


    PF00831 - Ribosomal_L29 (Pfam link)

    Interpro entry IPR001854 : Ribosomal protein L29 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins of 63 to 138 amino-acid residues grouped on the basis of sequence similarities.

    Proteins where this domain is known:
    PY03511   


    PF00833 - Ribosomal_S17e (Pfam link)

    Interpro entry IPR001210 : Ribosomal protein S17e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped in this family of ribosomal proteins, S17e. They include vertebrate, Drosophila and Neurospora crassa (crp-3) S17 as well as yeast S17a (RP51A) and S17b (RP51B) and archaebacterial S17e.

    Proteins where this domain is known:
    PY04103   


    PF00834 - Ribul_P_3_epim (Pfam link)

    Interpro entry IPR000056 : Ribulose-phosphate 3-epimerase (Interpro link)

    Pfam description:
    This enzyme catalyses the conversion of D-ribulose 5-phosphate into D-xylulose 5-phosphate.

    Interpro description:
    Ribulose-phosphate 3-epimerase (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle. In Ralstonia eutropha (Alcaligenes eutrophus) two copies of the gene coding for PPE are known: one is chromosomally encoded and the other is on a plasmid. PPE has been found in a wide range of bacteria, archaebacteria, fungi and plants. These proteins have from 209 to 241 amino acid residues. The enzyme has a TIM barrel structure.

    Proteins where this domain is known:
    PY01807   


    PF00838 - TCTP (Pfam link)

    Interpro entry IPR001983 : Translationally controlled tumour-associated TCTP (Interpro link)

    Interpro description:

    Mammalian translationally controlled tumour protein (TCTP, or P23) is a protein which has been found to be preferentially synthesised in cells during the early growth phase of some types of tumour, but which is also expressed in normal cells. The physiological function of TCTP is still not known. It was first identified as a histamine-releasing factor, acting in IgE-dependent allergic reactions. In addition, TCTP has been shown to bind to tubulin in the cytoskeleton, has a high affinity for calcium, is the binding target for the antimalarial compound artemisinin, and is induced in vitamin D-dependent apoptosis. TCTP production is thought to be controlled at the translational as well as the transcriptional level.

    TCTP is a hydrophilic protein of 18 to 20 kDa. TCTPs do not share significant sequence similarity with any other class of proteins. The structure of TCTP has been determined and shows significant structural similarity to the human protein Mss4, a guanine nucleotide-free chaperone of Rab proteins. Close homologues have been found in plants, earthworm, Caenorhabditis elegans (F52H2.11), Hydra, Saccharomyces cerevisiae (YKL056c) and Schizosaccharomyces pombe (SpAC1F12.02c).

    Proteins where this domain is known:
    PY04896   


    PF00847 - AP2 (Pfam link)

    Interpro entry IPR001471 : Pathogenesis-related transcriptional factor and ERF, DNA-binding (Interpro link)

    Pfam description:
    This domain of about 60 amino acid residues can bind to DNA and is found in transcription factor proteins.

    Interpro description:

    The pathogenesis-related gene transcriptional activator binds to the GCC-box pathogenesis-related promoter element and activates the plant's defence genes. Ethylene, chemically the simplest plant hormone, participates in a number of stress responses and developmental processes, e.g. fruit ripening, inhibition of stem and root elongation, promotion of seed germination and flowering, senescence of leaves and flowers, and sex determination. DNA sequence elements that confer ethylene responsiveness have been shown to contain two 11-bp GCC boxes, which are necessary and sufficient for transcriptional control by ethylene. Ethylene-responsive element binding proteins (EREBPs) have now been identified in a variety of plants. The proteins share a similar domain of around 59 amino acids, which interacts directly with the GCC box in the ethylene-responsive element (ERE).
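    A GCC box can be located in a promoter by plain string search on both DNA strands. The Python sketch below assumes the commonly cited core sequence AGCCGCC, which the text itself does not give; the function names and the example promoter fragment are hypothetical. Positions are 0-based on the strand supplied, and a hit on the '-' strand is reported at the start of its reverse complement on that strand.

```python
# Commonly cited GCC-box core; an assumption, not taken from the entry above.
GCC_CORE = "AGCCGCC"


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """All (possibly overlapping) 0-based start positions of pattern in text."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def find_gcc_boxes(promoter: str) -> list[tuple[int, str]]:
    """Return sorted (position, strand) pairs for GCC-box cores on either strand."""
    seq = promoter.upper()
    plus = [(i, "+") for i in find_all(seq, GCC_CORE)]
    minus = [(i, "-") for i in find_all(seq, reverse_complement(GCC_CORE))]
    return sorted(plus + minus)


# Hypothetical promoter fragment, for illustration only.
print(find_gcc_boxes("TTAGCCGCCTTGGCGGCTAA"))  # [(2, '+'), (11, '-')]
```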

    Proteins where this domain is known:
    PY00007    PY00247    PY00668    PY00689    PY00769    PY00782    PY01086    PY01234    PY01345    PY01583    PY03117    PY03447    PY03803    PY05930    PY06100    PY06328    PY06790    PY06900    PY06979    PY07039    PY07205    PY07598   


    PF00849 - PseudoU_synth_2 (Pfam link)

    Interpro entry IPR006145 : Pseudouridine synthase (Interpro link)

    Pfam description:
    Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil bases to pseudouridine. This family includes RluD Swiss:P33643, a pseudouridylate synthase that converts specific uracils to pseudouridine in 23S rRNA. RluA from E. coli converts bases in both rRNA and tRNA.

    Interpro description:
    Pseudouridine synthases are responsible for synthesis of pseudouridine from uracil in 23S rRNA. Proteins belonging to the family of pseudouridine synthases have been shown to share regions of similarity. These include Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA), C (gene rluC) and D (gene rluD); yeast DRAP deaminase (gene RIB2); Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein; Bacillus subtilis hypothetical proteins yhcT, yjbO and ylyB; Helicobacter pylori hypothetical proteins HP0347, HP0745 and HP0956; Mycoplasma genitalium hypothetical proteins MG209 and MG370; Synechocystis sp. (strain PCC 6803) hypothetical proteins slr1592 and slr1629; yeast hypothetical proteins YDL036c, YGR169c and SpAC18B11.02c; and Caenorhabditis elegans hypothetical protein K07E8.7. These proteins range from 21 to 50 kDa and contain a number of conserved regions in their central section. This domain includes members of both the Rsu and Rlu families.

    Proteins where this domain is known:
    PY00873    PY02484    PY04000    PY04257    PY04495   


    PF00850 - Hist_deacetyl (Pfam link)

    Interpro entry IPR000286 : (Interpro link)

    Pfam description:
    Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is mediated in part by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related to other proteins.

    Interpro description:
    Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is mediated in part by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases, acetoin utilization proteins and acetylpolyamine amidohydrolases are all members of this ancient protein superfamily.

    Proteins where this domain is known:
    PY03877    PY05395    PY06259    PY07179   


    PF00856 - SET (Pfam link)

    Interpro entry IPR001214 : (Interpro link)

    Pfam description:
    SET domains are protein lysine methyltransferase enzymes. SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains mediate interactions with a family of proteins that display similarity with dual-specificity phosphatases (dsPTPases). A subset of SET domains have been called PR domains. These domains are divergent in sequence from other SET domains, but also appear to mediate protein-protein interaction. The SET domain consists of two regions known as SET-N and SET-C. SET-C forms an unusual and conserved knot-like structure of probable functional importance. In addition to SET-N and SET-C, an insert region (SET-I) and flanking regions of high structural variability form part of the overall structure.

    Interpro description:

    The SET domain generally appears as one part of a larger multidomain protein. Three structures of very different proteins with distinct domain compositions have recently been described: Neurospora crassa DIM-5, a member of the Su(var) family of HKMTs, which methylate histone H3 on lysine 9; human SET7 (also called SET9), which methylates H3 on lysine 4; and garden pea Rubisco LSMT, an enzyme that does not modify histones but instead methylates lysine 14 in the flexible tail of the large subunit of the enzyme Rubisco. The SET domain itself turned out to be an uncommon structure. Although in all three studies electron density maps revealed the location of the AdoMet or AdoHcy cofactor, the SET domain bears no similarity at all to the canonical AdoMet-dependent methyltransferase fold. A tyrosine strictly conserved in the C-terminal motif of the SET domain could be involved in abstracting a proton from the protonated amino group of the substrate lysine, promoting its nucleophilic attack on the sulphonium methyl group of the AdoMet cofactor. In contrast to the classical AdoMet-dependent protein methyltransferases, which tend to bind their polypeptide substrates on top of the cofactor, the Rubisco LSMT structure indicates that AdoMet seems to bind in a separate cleft, suggesting how a polypeptide substrate could be subjected to multiple rounds of methylation without having to be released from the enzyme. SET7/9, by contrast, is able to add only a single methyl group to its substrate. It has been demonstrated that the association of SET domains with myotubularin-related proteins modulates growth control. The SET domain-containing Drosophila melanogaster (Fruit fly) protein enhancer of zeste has a function in segment determination, and its mammalian homologue may be involved in the regulation of gene transcription and chromatin structure.

    Histone lysine methylation is part of the histone code that regulates chromatin function and epigenetic control of gene function. Histone lysine methyltransferases (HMTases) differ both in their substrate specificity for the various acceptor lysines and in their product specificity for the number of methyl groups (one, two or three) they transfer. With just one exception, the HMTases belong to the SET family, which can be classified according to the sequences surrounding the SET domain. Structural studies on human SET7/9, a mono-methylase, have revealed the molecular basis for the enzyme's specificity for its histone target and the roles of the invariant SET-domain residues in determining methylation specificity.

    The pre-SET domain, as found in the SUV39 SET family, contains nine invariant cysteine residues that are grouped into two segments separated by a region of variable length. These nine cysteines coordinate three zinc ions to form a triangular cluster, in which each zinc ion is coordinated by four cysteines in a tetrahedral configuration. The function of this domain is structural, holding together two long segments of random coil.
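    A simple ligand count shows why some cysteines in this zinc cluster must be shared between metals. It assumes, as the description implies, that all nine cysteines take part in zinc binding and that each zinc ion is bound only by cysteine thiolates:

```latex
\underbrace{3}_{\text{Zn ions}} \times \underbrace{4}_{\text{Cys per Zn}} = 12\ \text{Zn-S bonds},
\qquad
12 - \underbrace{9}_{\text{Cys residues}} = 3\ \text{bridging Cys}
```

    Under these assumptions, three of the nine cysteines each bind two zinc ions, linking neighbouring metals in the triangular cluster.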

    The C-terminal region including the post-SET domain is disordered when not interacting with a histone tail and in the absence of zinc. The three conserved cysteines in the post-SET domain form a zinc-binding site when coupled to a fourth conserved cysteine in the knot-like structure close to the SET domain active site. The structured post-SET region brings in the C-terminal residues that participate in S-adenosylmethionine binding and histone tail interactions. The three conserved cysteine residues are essential for HMTase activity, since replacing them with serine abolishes it.

    Proteins where this domain is known:
    PY00637    PY01349    PY02230    PY04885   

    Proteins where this domain has been detected by our approach:
    PY03491   


    PF00857 - Isochorismatase (Pfam link)

    Interpro entry IPR000868 : Isochorismatase hydrolase (Interpro link)

    Pfam description:
    This family are hydrolase enzymes.

    Interpro description:
    This is a family of hydrolase enzymes. Isochorismatase, also known as 2,3-dihydro-2,3-dihydroxybenzoate synthase, catalyses the conversion of isochorismate, in the presence of water, to 2,3-dihydro-2,3-dihydroxybenzoate and pyruvate.

    Proteins where this domain is known:
    PY05269   


    PF00861 - Ribosomal_L18p (Pfam link)

    Interpro entry IPR005484 : Ribosomal protein L18/L5 (Interpro link)

    Pfam description:
    This family includes ribosomal proteins from the large subunit. This family includes L18 from bacteria and L5 from eukaryotes. It has been shown that the amino-terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S rRNA in vitro, suggesting that the entire family has a function in rRNA binding.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit they belong to: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This family includes L18 from bacteria and L5 from eukaryotes. The ribosomal 5S RNA is the only known rRNA species to bind a ribosomal protein before its assembly into the ribosomal subunits. In eukaryotes, the 5S rRNA molecule binds one protein species, a 34-kDa protein which has been implicated in the intracellular transport of 5S rRNA, while in bacteria it binds two or three different protein species.

    Proteins where this domain is known:
    PY01671    PY06459   


    PF00867 - XPG_I (Pfam link)

    Interpro entry IPR006086 : XPG I (Interpro link)

    Interpro description:

    Xeroderma pigmentosum (XP) is a human autosomal recessive disease, characterised by a high incidence of sunlight-induced skin cancer. Skin cells from people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. XP-G is one of the rarest and most phenotypically heterogeneous forms of XP, showing anything from slight to extreme dysfunction in DNA excision repair. XP-G can be corrected by a 133 kDa nuclear protein, XPGC. XPGC is an acidic protein that confers normal UV resistance in expressing cells. It is a magnesium-dependent, single-strand DNA endonuclease that makes structure-specific endonucleolytic incisions in a DNA substrate containing a duplex region and single-stranded arms. XPGC cleaves one strand of the duplex at the border with the single-stranded region.

    XPG belongs to a family of proteins that includes RAD2 from Saccharomyces cerevisiae (Baker's yeast) and rad13 from Schizosaccharomyces pombe (Fission yeast), which are single-stranded DNA endonucleases; mouse and human FEN-1, a structure-specific endonuclease; RAD2 from fission yeast and RAD27 from budding yeast; fission yeast exo1, a 5'-3' double-stranded DNA exonuclease that may act in a pathway that corrects mismatched base pairs; yeast DHS1, and yeast DIN7. Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved.
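    The conserved pentapeptide above is written in PROSITE-style pattern notation, which maps directly onto a regular expression. The sketch below shows that translation and a scan of a sequence for the motif. The helper names (prosite_to_regex, find_motif) and the toy sequence are hypothetical, and the converter covers only the subset of PROSITE syntax used here (single residues, [..] alternatives, {..} exclusions, x wildcards and repeat counts); it is not a complete PROSITE parser.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern (e.g. 'E-A-[DE]-A-[QS]') into a regex.

    Supported elements: single residues, [..] allowed sets, {..} forbidden sets,
    'x' for any residue, and optional repeat counts such as x(2) or x(2,4).
    """
    parts = []
    # A trailing '.' ends a PROSITE pattern; elements are separated by '-'.
    for element in pattern.strip(".").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# Toy sequence, invented for illustration only.
print(prosite_to_regex("E-A-[DE]-A-[QS]"))
print(find_motif("MKLEADAQGT", "E-A-[DE]-A-[QS]"))
```

    A regex hit only shows that the residues match the consensus; whether the acidic residues are catalytic in a given protein still depends on the surrounding alignment to the N- and I-regions.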

    Proteins where this domain is known:
    PY00765    PY01122    PY02238   


    PF00875 - DNA_photolyase (Pfam link)

    Interpro entry IPR006050 : DNA photolyase, N-terminal (Interpro link)

    Pfam description:
    This domain binds a light harvesting cofactor.

    Interpro description:

    DNA photolyases are enzymes that bind to DNA containing pyrimidine dimers: on absorption of visible light, they catalyse dimer splitting into the constituent monomers, a process called photoreactivation. This is a DNA repair mechanism that repairs pyrimidine dimers induced by exposure to ultraviolet light. The precise mechanisms involved in substrate binding, conversion of light energy to the mechanical energy needed to rupture the cyclobutane ring, and subsequent release of the product are uncertain. Analysis of DNA photolyases has revealed the presence of an intrinsic chromophore, all monomers containing a reduced FAD moiety, and, in addition, either a reduced pterin or 8-hydroxy-5-deazaflavin as a second chromophore. Either chromophore may act as the primary photon acceptor, peak absorptions occurring in the blue region of the spectrum and in the UV-B region, at a wavelength around 290 nm.

    This domain binds a light harvesting cofactor.

    Proteins where this domain has been detected by our approach:
    PY02821   


    PF00883 - Peptidase_M17 (Pfam link)

    Interpro entry IPR000819 : Peptidase M17, leucyl aminopeptidase, C-terminal (Interpro link)

    Pfam description:
    The two associated zinc ions and the active site are entirely enclosed within the C-terminal catalytic domain in leucine aminopeptidase.

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
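    Both the plain HEXXH motif and the stricter 'abXHEbbHbc' definition can be expressed as regular expressions. The sketch below is illustrative only: the residue sets used for 'b' (uncharged) and 'c' (hydrophobic) are assumptions chosen for this example rather than definitions taken from MEROPS, 'a' is restricted to V or T even though the text only says it is most often one of these, and proline is excluded from every variable position as described above. The names HEXXH, STRICT and motif_hits are hypothetical.

```python
import re

# Assumed residue classes (illustrative, not an official definition):
UNCHARGED = "ACFGILMNQSTVWY"  # 'b': excludes D, E, K, R, H and proline
HYDROPHOBIC = "AVLIMFWY"      # 'c'
NOT_PRO = "[^P]"              # 'X': proline is never found in the motif

# Core HEXXH motif; the lookahead lets overlapping hits be reported.
HEXXH = re.compile(rf"(?=(HE{NOT_PRO}{NOT_PRO}H))")

# Stricter 'abXHEbbHbc' form, with 'a' limited to V or T for this sketch.
STRICT = re.compile(
    rf"(?=([VT][{UNCHARGED}]{NOT_PRO}HE[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def motif_hits(seq: str, motif: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every, possibly overlapping, hit."""
    return [(m.start() + 1, m.group(1)) for m in motif.finditer(seq.upper())]


# Toy sequence, invented for illustration only.
toy = "MKVAGHELTHAVG"
print(motif_hits(toy, HEXXH))
print(motif_hits(toy, STRICT))
```

    The loose pattern matches far more often than the strict one, which is the point of the stricter definition: HEXXH alone is common, so a hit is only a hint of a zinc-binding site, not evidence of peptidase activity.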

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belong to the MEROPS peptidase family M17 (leucyl aminopeptidase family, clan MF), the type example being leucyl aminopeptidase from Bos taurus (Bovine).

    Aminopeptidases are exopeptidases involved in the processing and regular turnover of intracellular proteins, although their precise role in cellular metabolism is unclear. Leucine aminopeptidases cleave leucine residues from the N-terminus of polypeptide chains, but they also remove other N-terminal amino acids at substantial rates.

    The enzymes exist as homo-hexamers, comprising 2 trimers stacked on top of one another. Each monomer binds 2 zinc ions and folds into 2 alpha/beta-type quasi-spherical globular domains, producing a comma-like shape. The N-terminal 150 residues form a 5-stranded beta-sheet with 4 parallel and 1 anti-parallel strand sandwiched between 4 alpha-helices. An alpha-helix extends into the C-terminal domain, which comprises a central 8-stranded saddle-shaped beta-sheet sandwiched between groups of helices, forming the monomer hydrophobic core. A 3-stranded beta-sheet resides on the surface of the monomer, where it interacts with other members of the hexamer. The 2 zinc ions and the active site are entirely located in the C-terminal catalytic domain.

    Proteins where this domain is known:
    PY01898   


    PF00887 - ACBP (Pfam link)

    Interpro entry IPR000582 : Acyl-CoA-binding protein, ACBP (Interpro link)

    Interpro description:

    Acyl-CoA-binding protein (ACBP) is a small (10 kDa) protein that binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters. ACBP is also known as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts as a neuropeptide to modulate the action of the GABA receptor.

    ACBP is a highly conserved protein of about 90 residues that is found in all four eukaryotic kingdoms, Animalia, Plantae, Fungi and Protista, and in some eubacterial species.

    Although ACBP occurs as a completely independent protein, intact ACB domains have been identified in a number of large, multifunctional proteins in a variety of eukaryotic species. These include large membrane-associated proteins with N-terminal ACB domains, multifunctional enzymes with both ACB and peroxisomal Delta(3),Delta(2)-enoyl-CoA isomerase domains, and proteins with both an ACB domain and ankyrin repeats.

    The ACB domain consists of four alpha-helices arranged in a bowl shape with a highly exposed acyl-CoA-binding site. The ligand is bound through specific interactions with residues on the protein, most notably several conserved positive charges that interact with the phosphate group on the adenosine 3'-phosphate moiety, and the acyl chain is sandwiched between the hydrophobic surfaces of CoA and the protein.

    Other proteins containing an ACB domain include:

    Proteins where this domain is known:
    PY01656   

    Proteins where this domain has been detected by our approach:
    PY04674   


    PF00888 - Cullin (Pfam link)

    Interpro entry IPR001373 : Cullin, N-terminal region (Interpro link)

    Interpro description:

    Cullins are a family of hydrophobic proteins that act as scaffolds for ubiquitin ligases (E3). Cullins are found throughout eukaryotes. Humans express seven cullins (Cul1, 2, 3, 4A, 4B, 5 and 7), each forming part of a multi-subunit ubiquitin complex. Cullin-RING ubiquitin ligases (CRLs), such as Cul1 (SCF), play an essential role in targeting proteins for ubiquitin-mediated destruction; as such, they are diverse in terms of composition and function, regulating many different processes from glucose sensing and DNA replication to limb patterning and circadian rhythms. The catalytic core of CRLs consists of a RING protein and a cullin family member. For Cul1, the C-terminal cullin-homology domain binds the RING protein. The RING protein appears to function as a docking site for ubiquitin-conjugating enzymes (E2s). Other proteins contain a cullin-homology domain, such as the APC2 subunit of the anaphase-promoting complex/cyclosome and the p53 cytoplasmic anchor PARC; both APC2 and PARC have ubiquitin ligase activity. The N-terminal region of cullins is more variable, and is used to interact with specific adaptor proteins.

    This entry represents the N-terminal region of cullin proteins, which consists of several domains, including a cullin repeat domain, a 4-helical bundle domain, an alpha+beta domain and a winged helix-like domain.

    Proteins where this domain is known:
    PY01171    PY02229   


    PF00889 - EF_TS (Pfam link)

    Interpro entry IPR014039 : Translation elongation factor EFTs/EF1B, dimerisation (Interpro link)

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    Elongation factor EF1B (also known as EF-Ts or EF-1beta/gamma/delta) is a nucleotide exchange factor that is required to regenerate EF1A from its inactive form (EF1A-GDP) to its active form (EF1A-GTP). EF1A is then ready to interact with a new aminoacyl-tRNA to begin the cycle again. EF1B is more complex in eukaryotes than in bacteria, and can consist of three subunits: EF1B-alpha (or EF-1beta), EF1B-gamma (or EF-1gamma) and EF1B-beta (or EF-1delta).

    This entry represents the C-terminal dimerisation domain found primarily in EF-Ts (EF1B) proteins from bacteria, mitochondria and chloroplasts.

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY07445   


    PF00890 - FAD_binding_2 (Pfam link)

    Interpro entry IPR003953 : Fumarate reductase/succinate dehydrogenase flavoprotein, N-terminal (Interpro link)

    Pfam description:
    This family includes members that bind FAD. This family includes the flavoprotein subunits from succinate and fumarate dehydrogenase, aspartate oxidase and the alpha subunit of adenylylsulphate reductase.

    Interpro description:

    In bacteria two distinct, membrane-bound enzyme complexes are responsible for the interconversion of fumarate and succinate: fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulphur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome b.

    In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulphur protein.

    The flavoprotein subunit is a protein of about 60 to 70 kDa to which FAD is covalently bound via a histidine residue located in the N-terminal section of the protein. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species.

    This family includes members that bind FAD such as the flavoprotein subunits from succinate and fumarate dehydrogenase, aspartate oxidase and the alpha subunit of adenylylsulphate reductase.

    Proteins where this domain is known:
    PY05468   


    PF00892 - DUF6 (Pfam link)

    Interpro entry IPR000620 : Protein of unknown function DUF6, transmembrane (Interpro link)

    Pfam description:
    This family includes many hypothetical membrane proteins of unknown function. Many of the proteins contain two copies of the aligned region.

    Interpro description:
    This domain is found in proteins including the Erwinia chrysanthemi PecM protein, which is involved in pectinase, cellulase and blue pigment regulation; and the Salmonella typhimurium PagO protein, the function of which is unknown. Many members of this family have no known function and are predicted to be integral membrane proteins and many of the proteins contain two copies of the domain.

    Proteins where this domain is known:
    PY00389   

    Proteins where this domain has been detected by our approach:
    PY01812   


    PF00899 - ThiF (Pfam link)

    Interpro entry IPR000594 : UBA/THIF-type NAD/FAD binding fold (Interpro link)

    Pfam description:
    This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial ThiF/MoeB/HesA family.

    Interpro description:
    Ubiquitin-activating enzyme (E1 enzyme) activates ubiquitin by first adenylating its C-terminal glycine residue with ATP and thereafter linking this residue to the side chain of a cysteine residue in E1, yielding a ubiquitin-E1 thiolester and free AMP. Later the ubiquitin moiety is transferred to a cysteine residue on one of the many forms of ubiquitin-conjugating enzymes (E2).

    The family of ubiquitin-activating enzymes shares in its catalytic domain significant similarity with a large family of NAD/FAD-binding proteins. This domain is based on the common NAD/FAD-binding fold and is found in members of several families, including UBA ubiquitin activating enzymes; the hesA/moeB/thiF family; NADH peroxidases; the LDH family; sarcosine oxidase; phytoene dehydrogenases; alanine dehydrogenases; hydroxyacyl-CoA dehydrogenases and many other NAD/FAD dependent dehydrogenases and oxidases.

    Proteins where this domain is known:
    PY01851    PY01879    PY02846    PY05539    PY06413    PY06467   


    PF00900 - Ribosomal_S4e (Pfam link)

    Interpro entry IPR013845 : (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit they belong to: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes yeast S7 (YS6); archaeal S4e; and mammalian and plant cytoplasmic S4. Two highly similar isoforms of mammalian S4 exist, one coded by a gene on chromosome Y, and the other on chromosome X. These proteins have 233 to 264 amino acids.

    This entry represents the central region of these proteins.

    Proteins where this domain is known:
    PY03779   


    PF00903 - Glyoxalase (Pfam link)

    Interpro entry IPR004360 : (Interpro link)

    Interpro description:
    Glyoxalase I (lactoylglutathione lyase) catalyzes the first step of the glyoxalase pathway. S-lactoylglutathione is then converted by glyoxalase II to lactic acid. Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. This domain is found in other related proteins, including the bleomycin resistance protein and dioxygenases such as 4-hydroxyphenylpyruvate dioxygenase.

    Proteins where this domain is known:
    PY00733    PY04823   


    PF00916 - Sulfate_transp (Pfam link)

    Interpro entry IPR011547 : Sulphate transporter (Interpro link)

    Pfam description:
    Mutations in Swiss:P50443 lead to several human diseases.

    Interpro description:

    A number of proteins involved in the transport of sulphate across a membrane, as well as some as yet uncharacterised proteins, have been shown to be evolutionarily related. These proteins are:

    These proteins are highly hydrophobic and seem to contain about 12 transmembrane domains.

    Proteins where this domain is known:
    PY07224   


    PF00917 - MATH (Pfam link)

    Interpro entry IPR002083 : (Interpro link)

    Pfam description:
    This motif has been called the Meprin And TRAF-Homology (MATH) domain. This domain is hugely expanded in the nematode C. elegans.

    Interpro description:

    Although apparently functionally unrelated, intracellular TRAFs and extracellular meprins share a conserved region of about 180 residues, the meprin and TRAF homology (MATH) domain. Meprins are mammalian tissue-specific metalloendopeptidases of the astacin family implicated in developmental, normal and pathological processes by hydrolysing a variety of proteins. Various growth factors, cytokines and extracellular matrix proteins are substrates for meprins. They are composed of five structural domains: an N-terminal endopeptidase domain, a MAM domain, a MATH domain, an EGF-like domain and a C-terminal transmembrane region. Meprin A and B form membrane-bound homotetramers, whereas homo-oligomers of meprin A are secreted. A proteolytic site adjacent to the MATH domain, present only in meprin A, allows the release of the protein from the membrane.

    TRAF proteins were first isolated by their ability to interact with TNF receptors. They promote cell survival by the activation of downstream protein kinases and, finally, transcription factors of the NF-kB and AP-1 family. The TRAF proteins are composed of three structural domains: a RING finger in the N-terminal part of the protein, one to seven TRAF zinc fingers in the middle, and the MATH domain in the C-terminal part. The MATH domain is necessary and sufficient for self-association and receptor interaction. From the structural analysis, two consensus sequences recognised by the TRAF domain have been defined: a major one, [PSAT]x[QE]E, and a minor one, PxQxxD.
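    The two TRAF-binding consensus sequences quoted above can be searched for directly with regular expressions, where each 'x' (any residue) becomes '.'. The sketch below labels each hit as major or minor. The dictionary TRAF_MOTIFS, the function scan_traf_motifs and the example peptide are hypothetical and only illustrate the pattern matching; a short consensus like [PSAT]x[QE]E will also match many sequences that are not real TRAF ligands.

```python
import re

# The two consensus motifs from the text; lookaheads allow overlapping hits.
TRAF_MOTIFS = {
    "major": re.compile(r"(?=([PSAT].[QE]E))"),  # [PSAT]x[QE]E
    "minor": re.compile(r"(?=(P.Q..D))"),        # PxQxxD
}


def scan_traf_motifs(peptide: str) -> list[tuple[str, int, str]]:
    """List (motif class, 1-based start, residues) for every consensus hit, sorted by position."""
    hits = []
    for name, motif in TRAF_MOTIFS.items():
        for m in motif.finditer(peptide.upper()):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Invented peptide containing one copy of each motif.
print(scan_traf_motifs("GSKPIQEEPVQAAD"))
```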

    The structure of the TRAF2 protein reveals a trimeric self-association of the MATH domain. The domain forms a new, eight-stranded antiparallel beta sandwich structure. A coiled-coil region adjacent to the MATH domain is also important for the trimerisation. The oligomerisation is essential for establishing appropriate connections to form signalling complexes with TNF receptor-1. The ligand binding surface of TRAF proteins is located in beta-strands 6 and 7.

    Proteins where this domain is known:
    PY02170   


    PF00919 - UPF0004 (Pfam link)

    Interpro entry IPR013848 : (Interpro link)

    Pfam description:
    This family is the N-terminal half of the Prosite family. The C-terminal half has been shown to be related to MiaB proteins. This domain is nearly always found in conjunction with Pfam:PF04055 and Pfam:PF01938, although its function is uncertain.

    Interpro description:

    This entry represents an N-terminal domain found in a family of proteins defined by sequence similarity. Most of these proteins are not yet characterised, but those that are include

    It is almost always found in conjunction with a radical SAM domain and a TRAM domain.

    Proteins where this domain is known:
    PY01291   


    PF00924 - MS_channel (Pfam link)

    Interpro entry IPR006685 : Mechanosensitive ion channel MscS (Interpro link)

    Pfam description:
    Two members of this protein family, Swiss:Q57634 and Swiss:Q58543 of M. jannaschii, have been functionally characterised. Both proteins form mechanosensitive (MS) ion channels upon reconstitution into liposomes and functional examination by the patch-clamp technique. The other members of this family are therefore also likely to be MS channel proteins.

    Interpro description:

    Mechanosensitive (MS) channels provide protection against hypo-osmotic shock, responding both to stretching of the cell membrane and to membrane depolarisation. They are present in the membranes of organisms from the three domains of life: bacteria, archaea and eukarya. There are two families of MS channels: large-conductance MS channels (MscL) and small-conductance MS channels (MscS or YGGB). The pressure threshold for MscS opening is 50% that of MscL. The MscS family is much larger and more variable in size and sequence than the MscL family. Much of the diversity in MscS proteins occurs in the size of the transmembrane regions, which range from three to eleven transmembrane helices, although the three C-terminal helices are conserved. This family contains sequences from the MscS family of proteins.

    MscS folds as a homo-heptamer with a cylindrical shape, and can be divided into transmembrane and extramembrane regions: an N-terminal periplasmic region, a transmembrane region, and a C-terminal cytoplasmic region (middle and C-terminal domains). The transmembrane region forms a channel through the membrane that opens into a chamber enclosed by the extramembrane portion, the latter connecting to the cytoplasm through distinct portals.

    Proteins where this domain is known:
    PY05855   


    PF00928 - Adap_comp_sub (Pfam link)

    Interpro entry IPR008968 : Clathrin adaptor, mu subunit, C-terminal (Interpro link)

    Pfam description:
    This family also contains members which are coatomer subunits.

    Interpro description:

    Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (which acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors.

    AP (adaptor protein) complexes are found in coated vesicles and clathrin-coated pits. AP complexes connect cargo proteins and lipids to clathrin at vesicle budding sites, as well as binding accessory proteins that regulate coat assembly and disassembly (such as AP180, epsins and auxilin). Mammals have several distinct AP complexes. AP1 is responsible for the transport of lysosomal hydrolases between the TGN and endosomes. AP2 associates with the plasma membrane and is responsible for endocytosis. AP3 is responsible for protein trafficking to lysosomes and other related organelles. AP4 is less well characterised. AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). For example, in AP1 these subunits are gamma-1-adaptin, beta-1-adaptin, mu-1 and sigma-1, while in AP2 they are alpha-adaptin, beta-2-adaptin, mu-2 and sigma-2. Each subunit has a specific function. Adaptins recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal ear (appendage) domains. Mu recognises tyrosine-based sorting signals within the cytoplasmic domains of transmembrane cargo proteins. One function of clathrin and AP2 complex-mediated endocytosis is to regulate the number of GABA(A) receptors available at the cell surface.

    This entry represents the C-terminal domain of the mu subunit from various clathrin adaptors (AP1, AP2 and AP3). The C-terminal domain has an immunoglobulin-like beta-sandwich fold consisting of 9 strands in 2 sheets with a Greek key topology, similar to that found in cytochrome f and certain transcription factors. The mu subunit regulates the coupling of clathrin lattices with particular membrane proteins by self-phosphorylation via a mechanism that is still unclear. The mu subunit possesses a highly conserved N-terminal domain of around 230 amino acids, which may be the region of interaction with other AP proteins; a linker region of between 10 and 42 amino acids; and a less well-conserved C-terminal domain of around 190 amino acids, which may be the site of specific interaction with the protein being transported in the vesicle.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY02583    PY02804    PY05839    PY06523   

    Proteins where this domain has been detected by our approach:
    PY00471   


    PF00929 - Exonuc_X-T (Pfam link)

    Interpro entry IPR013520 : (Interpro link)

    Pfam description:
    This family includes a variety of exonuclease proteins, such as ribonuclease T and the epsilon subunit of DNA polymerase III.

    Interpro description:
    This entry includes a variety of exonuclease proteins, such as ribonuclease T and the epsilon subunit of DNA polymerase III. Ribonuclease T is responsible for the end-turnover of tRNA, and removes the terminal AMP residue from uncharged tRNA. DNA polymerase III is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria, and also exhibits 3' to 5' exonuclease activity.

    Proteins where this domain is known:
    PY04969   


    PF00935 - Ribosomal_L44 (Pfam link)

    Interpro entry IPR000552 : Ribosomal protein L44e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of mammalian, Trypanosoma brucei, Caenorhabditis elegans and fungal L44, and Haloarcula marismortui LA.

    Proteins where this domain is known:
    PY03170   


    PF00940 - RNA_pol (Pfam link)

    Interpro entry IPR002092 : DNA-directed RNA polymerase, bacteriophage type (Interpro link)

    Pfam description:
    This is a family of single chain RNA polymerases.

    Interpro description:

    DNA-directed RNA polymerases (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. The bacterial core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along its full length. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel.

    RNA synthesis begins after RNA polymerase attaches to a specific site, the promoter, on the template DNA strand, and continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise: RNA polymerase I synthesises the large ribosomal RNA precursor, RNA polymerase II synthesises mRNA precursors, and RNA polymerase III synthesises tRNAs, 5S rRNA and other small RNAs.

    Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    This is a family of single-chain polymerases that are evolutionarily related to one another and to the T3/T7 bacteriophage polymerases.

    Proteins where this domain is known:
    PY06559   


    PF00953 - Glycos_transf_4 (Pfam link)

    Interpro entry IPR018481 : (Interpro link)

    Interpro description:

    This entry represents a conserved region found in a family of UDP-GlcNAc/MurNAc:polyisoprenol-P GlcNAc/MurNAc-1-P transferases. Members of the family include eukaryotic N-acetylglucosamine-1-phosphate transferases, which catalyse the conversion of UDP-N-acetyl-D-glucosamine and dolichyl phosphate to UMP and N-acetyl-D-glucosaminyl-diphosphodolichol in the glycosylation pathway; and bacterial phospho-N-acetylmuramoyl-pentapeptide-transferases, which catalyse the first step of the lipid cycle reactions in the biosynthesis of cell wall peptidoglycan.

    Proteins where this domain is known:
    PY00432   


    PF00956 - NAP (Pfam link)

    Interpro entry IPR002164 : Nucleosome assembly protein (NAP) (Interpro link)

    Pfam description:
    NAP proteins are involved in moving histones into the nucleus, nucleosome assembly and chromatin fluidity. They affect the transcription of many genes.

    Interpro description:

    It is thought that NAPs act as histone chaperones, shuttling both core and linker histones from their site of synthesis in the cytoplasm to the nucleus. The proteins may be involved in regulating gene expression and therefore cellular differentiation.

    The centrosomal protein c-Nap1, also known as Cep250, has been implicated in the cell-cycle-regulated cohesion of microtubule-organizing centres. This 281 kDa protein consists mainly of domains predicted to form coiled coil structures. The C-terminal region defines a novel histone-binding domain that is responsible for targeting CNAP1, and possibly condensin, to mitotic chromosomes. During interphase, C-Nap1 localizes to the proximal ends of both parental centrioles, but it dissociates from these structures at the onset of mitosis. Re-association with centrioles then occurs in late telophase or at the very beginning of G1 phase, when daughter cells are still connected by post-mitotic bridges. Electron microscopic studies performed on isolated centrosomes suggest that a proteinaceous linker connects parental centrioles and C-Nap1 may be part of a linker structure that assures the cohesion of duplicated centrosomes during interphase, but that is dismantled upon centrosome separation at the onset of mitosis.

    Proteins where this domain is known:
    PY01605    PY05915   


    PF00957 - Synaptobrevin (Pfam link)

    Interpro entry IPR001388 : Synaptobrevin (Interpro link)

    Interpro description:

    Synaptobrevin is an intrinsic membrane protein of small synaptic vesicles, specialised secretory organelles of neurons that actively accumulate neurotransmitters and participate in their calcium-dependent release by exocytosis. Vesicle function is mediated by proteins in their membranes, although the precise nature of the underlying protein-protein interactions is still uncertain. Synaptobrevin may play a role in the molecular events underlying neurotransmitter release and vesicle recycling and may be involved in the regulation of membrane flow in the nerve terminal, a process mediated by interaction with low molecular weight GTP-binding proteins. Synaptic vesicle-associated membrane proteins (VAMPs) from Torpedo californica (Pacific electric ray) and SNC1 from yeast are related to synaptobrevin.

    Proteins where this domain is known:
    PY01819    PY03887   


    PF00958 - GMP_synt_C (Pfam link)

    Interpro entry IPR001674 : GMP synthase, C-terminal (Interpro link)

    Pfam description:
    GMP synthetase is a glutamine amidotransferase from the de novo purine biosynthetic pathway. This family is the C-terminal domain specific to GMP synthases (Swiss-Prot P49915; EC 6.3.5.2). In prokaryotes this domain mediates dimerisation, whereas eukaryotic GMP synthases are monomers. This domain in eukaryotes includes several large insertions that may form globular domains.

    Interpro description:

    The amidotransferase family of enzymes uses the ammonia derived from the hydrolysis of glutamine for a subsequent chemical reaction catalysed by the same enzyme. The ammonia intermediate does not dissociate into solution during the chemical transformations. GMP synthetase is a glutamine amidotransferase from the de novo purine biosynthetic pathway. The C-terminal domain is specific to the GMP synthases. In prokaryotes this domain mediates dimerisation, whereas eukaryotic GMP synthases are monomers. This domain in eukaryotes includes several large insertions that may form globular domains.

    Proteins where this domain is known:
    PY03479   


    PF00962 - A_deaminase (Pfam link)

    Interpro entry IPR001365 : Adenosine/AMP deaminase (Interpro link)

    Interpro description:

    Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine and AMP deaminase catalyzes the hydrolytic deamination of AMP into IMP. It has been shown that these two enzymes share three regions of sequence similarities; these regions are centred on residues which are proposed to play an important role in the catalytic mechanism of these two enzymes.

    Proteins where this domain is known:
    PY01044    PY02076   


    PF00970 - FAD_binding_6 (Pfam link)

    Interpro entry IPR008333 : Oxidoreductase FAD-binding region (Interpro link)

    Interpro description:

    These sequences contain an oxidoreductase FAD-binding domain.

    To date, the 3D-structures of the flavoprotein domain of Zea mays (Maize) nitrate reductase and of pig NADH:cytochrome b5 reductase have been solved. The overall fold is similar to that of ferredoxin:NADP+ reductase: the FAD-binding domain (N-terminal) has the topology of an anti-parallel beta-barrel, while the NAD(P)-binding domain (C-terminal) has the topology of a classical pyridine dinucleotide-binding fold (i.e. a central parallel beta-sheet flanked by 2 helices on each side).

    Proteins where this domain is known:
    PY03083   


    PF00986 - DNA_gyraseB_C (Pfam link)

    Interpro entry IPR002288 : DNA topoisomerase, type IIA, subunit B, C-terminal (Interpro link)

    Pfam description:
    The amino termini of eukaryotic and prokaryotic DNA topoisomerase II are similar, but their carboxyl termini differ. The amino-terminal portion of the DNA gyrase B protein is thought to catalyse the ATP-dependent supercoiling of DNA (see Pfam:PF00204). The carboxyl-terminal end supports complex formation with the DNA gyrase A protein and the ATP-independent relaxation. This family also contains topoisomerase IV, a bacterial enzyme that is closely related to DNA gyrase.

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type II topoisomerases are ATP-dependent enzymes, and can be subdivided according to their structure and reaction mechanisms: type IIA (topoisomerase II or gyrase, and topoisomerase IV) and type IIB (topoisomerase VI). These enzymes are responsible for relaxing supercoiled DNA as well as for introducing both negative and positive supercoils.

    Type IIA topoisomerases together manage chromosome integrity and topology in cells. Topoisomerase II (called gyrase in bacteria) primarily introduces negative supercoils into DNA. In bacteria, topoisomerase II consists of two polypeptide subunits, gyrA and gyrB, which form a heterotetramer: (BA)2. In most eukaryotes, topoisomerase II consists of a single polypeptide, where the N- and C-terminal regions correspond to gyrB and gyrA, respectively; this topoisomerase II forms a homodimer that is equivalent to the bacterial heterotetramer. There are four functional domains in topoisomerase II: domain 1 (N-terminal of gyrB) is an ATPase, domain 2 (C-terminal of gyrB) is responsible for subunit interactions, domain 3 (N-terminal of gyrA) is responsible for the breaking-rejoining function through its capacity to form protein-DNA bridges, and domain 4 (C-terminal of gyrA) is able to non-specifically bind DNA.

    Topoisomerase IV primarily decatenates DNA and relaxes positive supercoils, which is important in bacteria, where the circular chromosome becomes catenated, or linked, during replication. Topoisomerase IV consists of two polypeptide subunits, parE and parC, where parC is homologous to gyrA and parE is homologous to gyrB.

    This entry represents the C-terminal region (C-terminal part of domain 2) of subunit B found in topoisomerase II (gyrB) and topoisomerase IV (parE), which are primarily of bacterial origin. It does not include the topoisomerase II enzymes composed of a single polypeptide, as are found in most eukaryotes. This region is involved in subunit interaction, which accounts for the difference between subunit B and single polypeptide topoisomerase II.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY04024   


    PF00988 - CPSase_sm_chain (Pfam link)

    Interpro entry IPR002474 : Carbamoyl phosphate synthase, small subunit, N-terminal (Interpro link)

    Pfam description:
    The carbamoyl-phosphate synthase domain is in the amino terminus of the protein. Carbamoyl-phosphate synthase catalyses the ATP-dependent synthesis of carbamoyl phosphate from glutamine or ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or pyrimidines. The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and a large chain. The small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesise carbamoyl phosphate (see Pfam:PF00289). The small chain has a GATase domain in the carboxyl terminus (see Pfam:PF00117).

    Interpro description:

    Carbamoyl phosphate synthase (CPSase) is a heterodimeric enzyme composed of a small and a large subunit (with the exception of CPSase III, see below). CPSase catalyses the synthesis of carbamoyl phosphate from bicarbonate, ATP and glutamine or ammonia, and represents the first committed step in pyrimidine and arginine biosynthesis in prokaryotes and eukaryotes, and in the urea cycle in most terrestrial vertebrates. CPSase has three active sites, one in the small subunit and two in the large subunit. The small subunit contains the glutamine binding site and catalyses the hydrolysis of glutamine to glutamate and ammonia. The large subunit has two homologous carboxy phosphate domains, both of which have ATP-binding sites; however, the N-terminal carboxy phosphate domain catalyses the phosphorylation of bicarbonate, while the C-terminal domain catalyses the phosphorylation of the carbamate intermediate. The carboxy phosphate domain found duplicated in the large subunit of CPSase is also present as a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase (ACC), propionyl-CoA carboxylase (PCCase), pyruvate carboxylase (PC) and urea carboxylase.

    Most prokaryotes carry one form of CPSase that participates in both arginine and pyrimidine biosynthesis; however, certain bacteria can have separate forms. The large subunit in bacterial CPSase has four structural domains: the carboxy phosphate domain 1, the oligomerisation domain, the carbamoyl phosphate domain 2 and the allosteric domain. CPSase heterodimers from Escherichia coli contain two molecular tunnels: an ammonia tunnel and a carbamate tunnel. These inter-domain tunnels connect the three distinct active sites, and function as conduits for the transport of unstable reaction intermediates (ammonia and carbamate) between successive active sites. The catalytic mechanism of CPSase involves the diffusion of carbamate through the interior of the enzyme from the site of synthesis within the N-terminal domain of the large subunit to the site of phosphorylation within the C-terminal domain.

    Eukaryotes have two distinct forms of CPSase: a mitochondrial enzyme (CPSase I) that participates in both arginine biosynthesis and the urea cycle; and a cytosolic enzyme (CPSase II) involved in pyrimidine biosynthesis. CPSase II occurs as part of a multi-enzyme complex along with aspartate transcarbamoylase and dihydroorotase; this complex is referred to as the CAD protein. The hepatic expression of CPSase is transcriptionally regulated by glucocorticoids and/or cAMP. There is a third form of the enzyme, CPSase III, found in fish, which uses glutamine as a nitrogen source instead of ammonia. CPSase III is closely related to CPSase I, and is composed of a single polypeptide that may have arisen from gene fusion of the glutaminase and synthetase domains.

    This entry represents the N-terminal domain of the small subunit of carbamoyl phosphate synthase. The small subunit catalyses the hydrolysis of glutamine to ammonia, which is in turn used by the large chain to synthesise carbamoyl phosphate. The small subunit has a 3-layer beta/beta/alpha structure, and is thought to be mobile in most proteins that carry it. The C-terminal domain of the small subunit of CPSase has glutamine amidotransferase activity.

    Proteins where this domain is known:
    PY04781   


    PF00992 - Troponin (Pfam link)

    Interpro entry IPR001978 : (Interpro link)

    Pfam description:
    Troponin (Tn) is a complex of three subunits: Ca2+ binding (TnC), inhibitory (TnI) and tropomyosin binding (TnT). The troponin complex regulates Ca2+-induced muscle contraction. This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.

    Interpro description:
    The troponin (Tn) complex regulates Ca2+-induced muscle contraction. Tn contains three subunits: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT). This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.

    Proteins where this domain has been detected by our approach:
    PY05845   


    PF00995 - Sec1 (Pfam link)

    Interpro entry IPR001619 : Sec1-like protein (Interpro link)

    Interpro description:

    Sec1-like molecules have been implicated in a variety of eukaryotic vesicle transport processes including neurotransmitter release by exocytosis. They regulate vesicle transport by binding to a t-SNARE from the syntaxin family. This process is thought to prevent SNARE complex formation, a protein complex required for membrane fusion. Whereas Sec1 molecules are essential for neurotransmitter release and other secretory events, their interaction with syntaxin molecules seems to represent a negative regulatory step in secretion.

    Proteins where this domain is known:
    PY00467    PY01196    PY01251    PY05107   


    PF00996 - GDI (Pfam link)

    Interpro entry IPR018203 : (Interpro link)

    Interpro description:
    Rab proteins constitute a family of small GTPases that serve a regulatory role in vesicular membrane traffic; C-terminal geranylgeranylation is crucial for their membrane association and function. This post-translational modification is catalysed by Rab geranylgeranyl transferase (Rab-GGTase), a multi-subunit enzyme that contains a catalytic heterodimer and an accessory component, termed Rab escort protein (REP)-1. REP-1 presents newly synthesised Rab proteins to the catalytic component, and forms a stable complex with the prenylated proteins following the transfer reaction.

    The mechanism of REP-1-mediated membrane association of Rab5 is similar to that mediated by Rab GDP dissociation inhibitor (GDI). REP-1 and Rab GDI also share other functional properties, including the ability to inhibit the release of GDP and to remove Rab proteins from membranes.

    The crystal structure of the bovine alpha-isoform of Rab GDI has been determined to a resolution of 1.81 Å. The protein is composed of two main structural units: a large complex multi-sheet domain I, and a smaller alpha-helical domain II.

    The structural organisation of domain I is closely related to FAD-containing monooxygenases and oxidases. Conserved regions common to GDI and the choroideraemia gene product, which delivers Rab to catalytic subunits of Rab geranylgeranyltransferase II, are clustered on one face of the domain. The two most conserved regions form a compact structure at the apex of the molecule; site-directed mutagenesis has shown these regions to play a critical role in the binding of Rab proteins.

    Proteins where this domain is known:
    PY04102    PY06809   


    PF00999 - Na_H_Exchanger (Pfam link)

    Interpro entry IPR006153 : Cation/H+ exchanger (Interpro link)

    Pfam description:
    Na/H antiporters are key transporters in maintaining the pH of actively metabolising cells. The molecular mechanisms of antiport are unclear. These antiporters contain 10-12 transmembrane regions (M) at the amino-terminus and a large cytoplasmic region at the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity throughout the family.

    Interpro description:

    Sodium proton exchangers (NHEs) constitute a large family of integral membrane protein transporters that are responsible for the counter-transport of protons and sodium ions across lipid bilayers. These proteins are found in organisms across all domains of life. In archaea, bacteria, yeast and plants, these exchangers provide increased salt tolerance by removing sodium in exchange for extracellular protons. In mammals they participate in the regulation of cell pH, volume, and intracellular sodium concentration, as well as in the reabsorption of NaCl across renal, intestinal, and other epithelia. Human NHE is also involved in heart disease, cell growth and cell differentiation. The removal of intracellular protons in exchange for extracellular sodium effectively eliminates excess acid from actively metabolising cells. In mammalian cells, NHE activity is found in both the plasma membrane and the inner mitochondrial membrane. To date, nine mammalian isoforms have been identified (designated NHE1-NHE9). These exchangers are highly regulated (glyco)phosphoproteins, which, based on their primary structure, appear to contain 10-12 membrane-spanning regions (M) at the N-terminus and a large cytoplasmic region at the C-terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity throughout the family. There is some evidence that the exchangers may exist in the cell membrane as homodimers, but little is currently known about the mechanism of their antiport.

    This entry represents a number of cation/proton exchangers, including Na+/H+ exchangers, K+/H+ exchangers and Na+(K+,Li+,Rb+)/H+ exchangers.

    Proteins where this domain is known:
    PY02931   


    PF01000 - RNA_pol_A_bac (Pfam link)

    Interpro entry IPR011262 : DNA-directed RNA polymerase, insert (Interpro link)

    Pfam description:
    Members of this family include: alpha subunits from eubacteria; alpha subunits from chloroplasts; Rpb3 subunits from eukaryotes; and RpoD subunits from archaea.

    Interpro description:

    DNA-directed RNA polymerases (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. The bacterial core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along its full length. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel.

    RNA synthesis begins after RNA polymerase attaches to a specific site, the promoter, on the template DNA strand, and continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise: RNA polymerase I synthesises the large ribosomal RNA precursor, RNA polymerase II synthesises mRNA precursors, and RNA polymerase III synthesises tRNAs, 5S rRNA and other small RNAs.

    Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    RNA polymerase (RNAP) II, which is responsible for all mRNA synthesis in eukaryotes, consists of 12 subunits. Subunits Rpb3 and Rpb11 form a heterodimer that is functionally analogous to the archaeal RNAP D/L heterodimer, and to the prokaryotic RNAP alpha (RpoA) subunit homodimer. In each case, they play a key role in RNAP assembly by forming a platform on which the catalytic subunits (eukaryotic Rpb1/Rpb2, and prokaryotic beta/beta') can interact.

    The dimerisation domains differ between the subunit families. In eukaryotic Rpb3, archaeal D and bacterial RpoA subunits, the dimerisation domain comprises a central insert domain, which interrupts an Rpb11-like domain, dividing it into two halves. In eukaryotic Rpb11 and archaeal L subunits, the insert domain is lacking, leaving the Rpb11-like domain intact and contiguous.

    Proteins where this domain is known:
    PY00033    PY01971    PY02415    PY04295   


    PF01008 - IF-2B (Pfam link)

    Interpro entry IPR000649 : Initiation factor 2B related (Interpro link)

    Pfam description:
    This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, initiation factor 2B subunits 1 and 2 from archaebacteria and some proteins of unknown function from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit. Members of this family have also been characterised as 5-methylthioribose-1-phosphate isomerases, an enzyme of the methionine salvage pathway. The crystal structure of Ypr118w, a non-essential, low-copy number gene product from Saccharomyces cerevisiae, reveals a dimeric protein with two domains and a putative active site cleft.

    Interpro description:

    Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit. The eukaryotic translation initiation factor EIF-2B is a complex made up of five different subunits, alpha, beta, gamma, delta and epsilon, and catalyses the exchange of EIF-2-bound GDP for GTP. This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes; related proteins from archaebacteria and IF-2 from prokaryotes; and also a subfamily of proteins in eukaryotes, archaea (e.g. Pyrococcus furiosus), or eubacteria such as Bacillus subtilis and Thermotoga maritima. Many of these proteins were initially annotated as putative translation initiation factors despite the fact that there is no evidence for the requirement of an IF2 recycling factor in prokaryotic translation initiation. Recently, one of these proteins from B. subtilis has been functionally characterised as a 5-methylthioribose-1-phosphate isomerase (MTNA). This enzyme participates in the methionine salvage pathway, catalysing the isomerisation of 5-methylthioribose-1-phosphate to 5-methylthioribulose-1-phosphate. The methionine salvage pathway leads to the synthesis of methionine from methylthioadenosine, the end product of spermidine and spermine anabolism in many species.

    Proteins where this domain is known:
    PY01920    PY03599    PY07384   


    PF01011 - PQQ (Pfam link)

    Interpro entry IPR002372 : (Interpro link)

    Pfam description:
    This family represents a single repeat of a beta-propeller. This propeller has been found in several enzymes that use pyrrolo-quinoline quinone as a prosthetic group.

    Interpro description:
    Pyrrolo-quinoline quinone (PQQ) is a redox coenzyme, which serves as a cofactor for a number of enzymes (quinoproteins) and particularly for some bacterial dehydrogenases. A number of bacterial quinoproteins belong to this family.

    Enzymes in this group have repeats of a beta propeller.

    Proteins where this domain has been detected by our approach:
    PY00595    PY01173    PY03677   


    PF01015 - Ribosomal_S3Ae (Pfam link)

    Interpro entry IPR001593 : Ribosomal protein S3Ae (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding, which leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of proteins of 220 to 250 amino acids.

    Proteins where this domain is known:
    PY02290    PY02291   


    PF01016 - Ribosomal_L27 (Pfam link)

    Interpro entry IPR001684 : Ribosomal protein L27 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding, which leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    L27 is a protein from the large (50S) subunit; it is essential for ribosome function, but its exact role is unclear. It belongs to a family of ribosomal proteins, examples of which are found in bacteria, chloroplasts of plants and red algae and the mitochondria of fungi (e.g. MRP7 from yeast mitochondria). The schematic relationship between these groups of proteins is shown below.

    Proteins where this domain is known:
    PY00465   


    PF01018 - GTP1_OBG (Pfam link)

    Interpro entry IPR006169 : GTP1/OBG subdomain (Interpro link)

    Pfam description:
    The N-terminal domain of Swiss:P20964 has the OBG fold, which is formed by three glycine-rich regions inserted into a small 8-stranded beta-sandwich; these regions form six left-handed collagen-like helices that are packed and H-bonded together.

    Interpro description:

    Several proteins have recently been shown to contain the five structural motifs characteristic of GTP-binding proteins. These include the murine DRG protein, the GTP1 protein from Schizosaccharomyces pombe, the OBG protein from Bacillus subtilis, and several others. Although these proteins contain GTP-binding motifs and are similar to each other, they share no sequence similarity with other GTP-binding proteins and have thus been classed as a novel group, the GTP1/OBG family. The functions of these proteins are still uncertain, but they have been shown to be important in development and normal cell metabolism.

    Proteins where this domain is known:
    PY00161    PY02080   

    Proteins where this domain has been detected by our approach:
    PY04853   


    PF01025 - GrpE (Pfam link)

    Interpro entry IPR000740 : GrpE nucleotide exchange factor (Interpro link)

    Interpro description:

    Molecular chaperones are a diverse family of proteins that protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis and ADP release by an N-terminal ATP-hydrolysing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate-binding domain. In prokaryotes, dimeric GrpE is the co-chaperone for DnaK and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. The co-chaperones DnaJ and GrpE are thus capable of tightly regulating the nucleotide-bound and substrate-bound states of DnaK in ways that are necessary for the housekeeping and stress-related functions of the DnaK molecular chaperone cycle.

    The X-ray crystal structure of GrpE in complex with the ATPase domain of DnaK revealed that GrpE is an asymmetric homodimer, bent in a manner that favours extensive contacts with only one DnaK ATPase monomer. GrpE does not actively compete for the atomic positions occupied by the nucleotide. GrpE and ADP mutually reduce one another's affinity for DnaK 200-fold, and ATP instantly dissociates GrpE from DnaK.
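
    The statement that GrpE and ADP reduce one another's affinity by the same factor is what a closed thermodynamic cycle requires. A minimal worked relation, assuming simple equilibrium binding of GrpE and ADP to the DnaK ATPase domain; the notation (K_d for a dissociation constant, K_d^{X|Y} for X binding when Y is already bound, and the coupling factor alpha) is ours, not the source's:

```latex
\begin{aligned}
K_d^{\mathrm{ADP}}\,K_d^{\mathrm{GrpE}\mid\mathrm{ADP}}
  &= K_d^{\mathrm{GrpE}}\,K_d^{\mathrm{ADP}\mid\mathrm{GrpE}}
  && \text{(both binding orders yield the same ternary complex)} \\
\alpha \equiv \frac{K_d^{\mathrm{GrpE}\mid\mathrm{ADP}}}{K_d^{\mathrm{GrpE}}}
  &= \frac{K_d^{\mathrm{ADP}\mid\mathrm{GrpE}}}{K_d^{\mathrm{ADP}}} \approx 200
\end{aligned}
```

    Dividing the first line by K_d^ADP K_d^GrpE gives the second: if bound ADP weakens GrpE binding 200-fold, bound GrpE must weaken ADP binding by the same factor, which is consistent with GrpE acting as a nucleotide exchange factor.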

    Proteins where this domain is known:
    PY00817   


    PF01026 - TatD_DNase (Pfam link)

    Interpro entry IPR001130 : Deoxyribonuclease, TatD-related (Interpro link)

    Pfam description:
    This family of proteins is related to a large superfamily of metalloenzymes. TatD, a member of this family, has been shown experimentally to be a DNase.

    Interpro description:
    This family of proteins is related to a large superfamily of metalloenzymes. TatD, a member of this family, has been shown experimentally to be a DNase. Allantoinase, N-isopropylammelide isopropyl amidohydrolase and the SCN1 protein from fission yeast also belong to this family.

    Proteins where this domain is known:
    PY01446   


    PF01027 - UPF0005 (Pfam link)

    Interpro entry IPR006214 : (Interpro link)

    Pfam description:
    The Pfam entry finds members not in the Prosite definition.

    Interpro description:

    This family of proteins of unknown function contains a subset of Bax inhibitor-1 proteins.

    Proteins where this domain is known:
    PY00378    PY05518   


    PF01028 - Topoisom_I (Pfam link)

    Interpro entry IPR013500 : DNA topoisomerase I, catalytic core, eukaryotic-type (Interpro link)

    Pfam description:
    Topoisomerase I relieves DNA superhelical tension by introducing a transient single-stranded break in duplex DNA, and is vital for the processes of replication, transcription and recombination.

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type I topoisomerases are ATP-independent enzymes (except for reverse gyrase), and can be subdivided according to their structure and reaction mechanisms: type IA (bacterial and archaeal topoisomerase I, topoisomerase III and reverse gyrase) and type IB (eukaryotic topoisomerase I and topoisomerase V). These enzymes are primarily responsible for relaxing positively and/or negatively supercoiled DNA, except for reverse gyrase, which can introduce positive supercoils into DNA.

    This entry represents the catalytic core of eukaryotic and viral topoisomerase I (type IB) enzymes, which occurs near the C-terminal region of the protein.

    Human topoisomerase I has been shown to be inhibited by camptothecin (CPT), a plant alkaloid with antitumour activity. The crystal structures of human topoisomerase I comprising the core and carboxyl-terminal domains in covalent and noncovalent complexes with 22-base pair DNA duplexes reveal an enzyme that "clamps" around essentially B-form DNA. The core domain and the first eight residues of the carboxyl-terminal domain of the enzyme, including the active-site nucleophile tyrosine-723, share significant structural similarity with the bacteriophage family of DNA integrases. A binding mode for the anticancer drug camptothecin has been proposed on the basis of chemical and biochemical information combined with the three-dimensional structures of topoisomerase I-DNA complexes.

    Vaccinia virus, a cytoplasmically-replicating poxvirus, encodes a type I DNA topoisomerase that is biochemically similar to eukaryotic-like DNA topoisomerases I, and which has been widely studied as a model topoisomerase. It is the smallest topoisomerase known and is unusual in that it is resistant to the potent chemotherapeutic agent camptothecin. The crystal structure of an amino-terminal fragment of vaccinia virus DNA topoisomerase I shows that the fragment forms a five-stranded, antiparallel beta-sheet with two short alpha-helices and connecting loops. Residues that are conserved between all eukaryotic-like type I topoisomerases are not clustered in particular regions of the structure.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY05226   


    PF01031 - Dynamin_M (Pfam link)

    Interpro entry IPR000375 : Dynamin central region (Interpro link)

    Pfam description:
    This region lies between the GTPase domain, see Pfam:PF00350, and the pleckstrin homology (PH) domain, see Pfam:PF00169.

    Interpro description:
    Dynamin is a microtubule-associated, force-producing protein of 100 kDa that is involved in the production of microtubule bundles. At the N terminus of dynamin is a GTPase domain, and at the C terminus is a PH domain. Between these two domains lies a central region of unknown function.

    Proteins where this domain is known:
    PY00714    PY04073    PY07647   


    PF01039 - Carboxyl_trans (Pfam link)

    Interpro entry IPR000022 : Carboxyl transferase (Interpro link)

    Pfam description:
    All of the members of this family are biotin-dependent carboxylases. The carboxyl transferase domain catalyses transcarboxylation from biotin to an acceptor molecule. There are two recognised types of carboxyl transferase: one uses acyl-CoA and the other uses a 2-oxoacid as the acceptor of carbon dioxide. All of the members of this family use acyl-CoA as the acceptor molecule.

    Interpro description:

    Members of this domain family include biotin-dependent carboxylases. The carboxyl transferase domain catalyses transcarboxylation from biotin to an acceptor molecule. There are two recognised types of carboxyl transferase: one uses acyl-CoA and the other uses a 2-oxo acid as the acceptor of carbon dioxide. All of the members of this family use acyl-CoA as the acceptor molecule.

    Proteins where this domain is known:
    PY01695   


    PF01040 - UbiA (Pfam link)

    Interpro entry IPR000537 : UbiA prenyltransferase (Interpro link)

    Interpro description:

    The COX10/ctaB/cyoE signature is found in prenyltransferases, including bacterial 4-hydroxybenzoate octaprenyltransferase (gene ubiA), yeast mitochondrial para-hydroxybenzoate--polyprenyltransferase (gene COQ2), and protohaem IX farnesyltransferase (haem O synthase) from yeast and mammals (gene COX10) and from bacteria (genes cyoE or ctaB). These are integral membrane proteins, which probably contain seven transmembrane segments. The signature is also found in the cytochrome c oxidase assembly factor: because cytochrome c oxidase is a complex multi-subunit enzyme, its assembly requires assistance, which this factor provides.

    Proteins where this domain is known:
    PY01399    PY06214   


    PF01048 - PNP_UDP_1 (Pfam link)

    Interpro entry IPR000845 : Nucleoside phosphorylase (Interpro link)

    Pfam description:
    Members of this family include purine nucleoside phosphorylase (PNP), uridine phosphorylase (UdRPase) and 5'-methylthioadenosine phosphorylase (MTA phosphorylase).

    Interpro description:

    Phosphorylases in this entry include:

    Proteins where this domain is known:
    PY04622   


    PF01061 - ABC2_membrane (Pfam link)

    Interpro entry IPR013525 : ABC-2 type transporter (Interpro link)

    Interpro description:

    ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as dimers and therefore consist of four domains: two ABC modules and two TMDs.

    ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes, and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms, and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode all four domains on the same polypeptide chain.

    The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to ABC transporters; and the Q-motif (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide-binding site.
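
    Because the signature motif above is an exact five-residue string, a first-pass scan is a simple substring search. A minimal sketch in Python, assuming a plain one-letter protein sequence; the function name and the toy sequence are hypothetical, and real signature motifs tolerate substitutions that an exact search will miss (Pfam itself relies on profile HMMs rather than exact strings):

```python
def find_signature_motifs(sequence: str, motif: str = "LSGGQ") -> list[int]:
    """Return the 0-based start of every exact (possibly overlapping) motif match."""
    seq = sequence.upper()  # tolerate lower-case input
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one residue later so overlapping matches are not skipped.
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical toy sequence with two signature copies, loosely mimicking a
# protein in which the ABC cassette is duplicated.
toy = "AALSGGQRRRLSGGQK"
print(find_signature_motifs(toy))  # [2, 10]
```

    Counting hits this way only hints at a duplicated cassette; confirming it needs the full Walker A, Walker B and signature context.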

    The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly beta-strand) contains Walker A and Walker B. The residues important for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha-helical subdomain with the signature motif; it seems to be required only for the structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, which contact the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer, the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel beta-sheet of armI and are related by a two-fold axis.

    The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette. More than 50 subfamilies have been described based on a phylogenetic and functional classification; (for further information see http://www.tcdb.org/tcdb/index.php?tc=3.A.1).

    A number of bacterial transport systems have been found to contain integral membrane components that have similar sequences: these systems fit the characteristics of ATP-binding cassette transporters. The proteins form homo- or hetero-oligomeric channels, allowing ATP-mediated transport. Hydropathy analysis of the proteins has revealed the presence of 6 possible transmembrane regions. These proteins belong to family 2 of ABC transporters.

    Proteins where this domain is known:
    PY00207   


    PF01063 - Aminotran_4 (Pfam link)

    Interpro entry IPR001544 : Aminotransferase, class IV (Interpro link)

    Pfam description:
    The D-amino acid transferases (D-AATs) are required by bacteria to catalyse the synthesis of D-glutamic acid and D-alanine, which are essential constituents of the bacterial cell wall and the building blocks for other D-amino acids. Despite the difference in the structure of their substrates, D-AATs and L-AATs are strongly similar.

    Interpro description:

    Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped into subfamilies.

    One of these, called class IV, currently consists of proteins of about 270 to 415 amino acid residues that share a few regions of sequence similarity. Surprisingly, in ilvE the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate group is known to be attached, but is located some 40 residues towards the C terminus from the pyridoxal-phosphate lysine. The D-amino acid transferases (D-AATs), which are among the members of this entry, are required by bacteria to catalyse the synthesis of D-glutamic acid and D-alanine, which are essential constituents of the bacterial cell wall and the building blocks for other D-amino acids. Despite the difference in the structure of their substrates, D-AATs and L-AATs are strongly similar.

    Proteins where this domain is known:
    PY06026   


    PF01064 - Activin_recp (Pfam link)

    Interpro entry IPR000472 : TGF-beta receptor/activin receptor, type I/II (Interpro link)

    Pfam description:
    This Pfam entry covers both TGF-beta receptor types. It is an alignment of the hydrophilic cysteine-rich ligand-binding domains. Both receptor types (type I and II) possess a 9 amino acid cysteine box with the consensus CCX{4-5}CN. The type I receptors also possess 7 extracellular residues preceding the cysteine box.
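
    The cysteine-box consensus above uses X for any residue and {4-5} for a run of four or five such residues. A minimal sketch, assuming Python's standard re module and a hypothetical toy sequence (not a real receptor), that translates the consensus into a regular expression and scans for it:

```python
import re

# Consensus CCX{4-5}CN: two cysteines, any 4 or 5 residues, then Cys-Asn.
# "X" becomes "." (any residue) and "{4-5}" becomes the regex quantifier {4,5}.
CYS_BOX = re.compile(r"CC.{4,5}CN")


def find_cysteine_boxes(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each non-overlapping cysteine box."""
    return [(m.start(), m.group()) for m in CYS_BOX.finditer(sequence.upper())]


# Hypothetical toy sequence containing one box with five spacer residues.
toy = "MKTAYCCAGSTECNLLGQ"
print(find_cysteine_boxes(toy))  # [(5, 'CCAGSTECN')]
```

    Because finditer returns non-overlapping matches, two boxes sharing residues would be reported once; that is adequate for a quick scan but is no substitute for the Pfam alignment.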

    Interpro description:
    Transforming growth factor-beta (TGF-beta) forms a family with other growth factors. The receptors for most of the members of this growth factor family are related. These proteins are receptor kinases of the Ser/Thr type, which have a single transmembrane domain and a specific hydrophilic Cys-rich ligand-binding domain. The C-terminal part of the extracellular domain is conserved. Some of the receptors of this family contain subclass-specific N-terminal extensions of this homology domain. The type I receptors also possess 7 extracellular residues preceding the cysteine box.

    Proteins where this domain has been detected by our approach:
    PY06752   


    PF01066 - CDP-OH_P_transf (Pfam link)

    Interpro entry IPR000462 : CDP-alcohol phosphatidyltransferase (Interpro link)

    Pfam description:
    All of these members can catalyse the displacement of CMP from a CDP-alcohol by a second alcohol, with formation of a phosphodiester bond and concomitant breaking of a phosphoric anhydride bond.

    Interpro description:
    A number of phosphatidyltransferases, all involved in phospholipid biosynthesis, share a conserved sequence region. They catalyse the displacement of CMP from a CDP-alcohol by a second alcohol, with formation of a phosphodiester bond and concomitant breaking of a phosphoric anhydride bond. These enzymes are proteins of 200 to 400 amino acid residues. The conserved region contains three aspartic acid residues and is located in the N-terminal section of the sequences.

    Proteins where this domain is known:
    PY05544   


    PF01067 - Calpain_III (Pfam link)

    Interpro entry IPR001300 : Peptidase C2, calpain (Interpro link)

    Pfam description:
    The functions of domains I and III are currently unknown. Domain II is a cysteine protease and domain IV is a calcium-binding domain. Calpains are believed to participate in intracellular signalling pathways mediated by calcium ions.

    Interpro description:

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins that are evolutionarily related) and further subdivided into families on the basis of the architecture of their catalytic dyad or triad.

    This group of cysteine peptidases belongs to MEROPS peptidase family C2 (calpain family, clan CA). A type example is calpain, an intracellular protease involved in many important cellular functions that are regulated by calcium. The protein is a complex of two polypeptide chains (light and heavy), with three known forms in mammals: a highly calcium-sensitive (i.e., micromolar range) form known as mu-calpain, mu-CANP or calpain I; a form sensitive to calcium in the millimolar range, known as m-calpain, m-CANP or calpain II; and a third form, known as p94, which is found only in skeletal muscle.

    All forms have identical light chains but different heavy chains. Both mu- and m-calpain are heterodimers containing an identical 28-kDa subunit and an 80-kDa subunit that shares 55-65% sequence homology between the two proteases. The crystallographic structure of m-calpain reveals six "domains" in the 80-kDa subunit:

    1. A 19-amino acid NH2-terminal sequence;
    2. Active site domain IIa;
    3. Active site domain IIb;

      Domain II shows low levels of sequence similarity to papain; although the catalytic His has not been located by biochemical means, it is likely that calpain and papain are related.

    4. Domain III;
    5. An 18-amino acid extended sequence linking domain III to domain IV;
    6. Domain IV, which resembles the penta EF-hand family of polypeptides, binds calcium and regulates activity. Ca2+ binding causes a rearrangement of the protein backbone, the net effect of which is that a Trp side chain, which acts as a wedge between catalytic domains IIa and IIb in the apo state, moves away from the active-site cleft, allowing proper formation of the catalytic triad.

    Calpain-like mRNAs have been identified in other organisms, including bacteria, but the molecules encoded by these mRNAs have not been isolated, so little is known about their properties. How calpain activity is regulated in these organisms is still unclear. In metazoans, the activity of calpain is controlled by a single proteinase inhibitor, calpastatin. The calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa through the use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different places on the calpain molecule; binding to at least two of the sites is Ca2+-dependent. The calpains ostensibly participate in a variety of cellular processes, including remodelling of cytoskeletal/membrane attachments, different signal transduction pathways and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke and brain trauma.

    Proteins where this domain has been detected by our approach:
    PY00976   


    PF01068 - DNA_ligase_A_M (Pfam link)

    Interpro entry IPR012310 : ATP dependent DNA ligase, central (Interpro link)

    Pfam description:
    This domain belongs to a more diverse superfamily, including Pfam:PF01331 and Pfam:PF01653.

    Interpro description:

    This domain belongs to a more diverse superfamily, including the catalytic domain of the mRNA capping enzyme and NAD-dependent DNA ligase.

    Proteins where this domain is known:
    PY01533   


    PF01084 - Ribosomal_S18 (Pfam link)

    Interpro entry IPR001648 : Ribosomal protein S18 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding, which leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Evidence suggests that, in prokaryotes, the peptidyl transferase reaction is performed by the large-subunit 23S rRNA, whereas proteins probably have a greater role in eukaryotic ribosomes. Most of the proteins lie close to, or on the surface of, the 30S subunit, arranged peripherally around the rRNA. The small-subunit ribosomal proteins can be categorised as primary binding proteins, which bind directly and independently to 16S rRNA; secondary binding proteins, which display no specific affinity for 16S rRNA but whose assembly is contingent upon the presence of one or more primary binding proteins; and tertiary binding proteins, which require the presence of one or more secondary binding proteins and sometimes other tertiary binding proteins.

    The small ribosomal subunit protein S18 is known to be involved in binding the aminoacyl-tRNA complex in Escherichia coli and appears to be situated at the tRNA A-site. Experimental evidence has revealed that S18 is well exposed on the surface of the E. coli ribosome and is a secondary rRNA-binding protein. S18 belongs to a family of ribosomal proteins that includes eubacterial S18, metazoan mitochondrial S18, algal and plant chloroplast S18, and cyanelle S18.

    Proteins where this domain is known:
    PY01223   


    PF01088 - Peptidase_C12 (Pfam link)

    Interpro entry IPR001578 : Peptidase C12, ubiquitin carboxyl-terminal hydrolase 1 (Interpro link)

    Interpro description:

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad.

    This group of cysteine peptidases belongs to MEROPS peptidase family C12 (ubiquitin C-terminal hydrolase family, clan CA). Families within clan CA are loosely termed papain-like, as the protein fold of the peptidase unit resembles that of papain, the type example for clan CA. The type example for family C12 is the human ubiquitin C-terminal hydrolase UCH-L1.

    Ubiquitin is a highly conserved protein commonly found conjugated to proteins in eukaryotic cells, where it may act as a marker for rapid degradation or may have a chaperone function in protein assembly. Ubiquitin is released from the bound protein by protease cleavage. A number of deubiquitinating proteases are known: all are activated by thiol compounds and inhibited by thiol-blocking agents and ubiquitin aldehyde, and as such have the properties of cysteine proteases.

    The deubiquitinating proteases can be split into two size ranges (20-30 kDa and 100-200 kDa). This family comprises the 20-30 kDa peptidases, which include yeast yuh1. Yeast yuh1 protease is known to be active only against small ubiquitin conjugates, being inactive against conjugated beta-galactosidase. A mammalian homologue, UCH (ubiquitin conjugate hydrolase), is one of the most abundant proteins in the brain. Only one conserved cysteine can be identified, along with two conserved histidines. The spacing between the cysteine and the second histidine is thought to be more representative of the cysteine/histidine spacing of a cysteine protease catalytic dyad.

    Proteins where this domain is known:
    PY01755    PY04400   


    PF01090 - Ribosomal_S19e (Pfam link)

    Interpro entry IPR001266 : Ribosomal protein S19e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This family includes a number of eukaryotic and archaebacterial ribosomal proteins: mammalian S19, Drosophila S19, Ascaris lumbricoides S19g (ALEP-1) and S19s, yeast YS16 (RP55A and RP55B), Aspergillus S16 and Haloarcula marismortui HS12.

    Proteins where this domain is known:
    PY04062   


    PF01092 - Ribosomal_S6e (Pfam link)

    Interpro entry IPR001377 : Ribosomal protein S6e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins have been grouped on the basis of sequence similarities. Ribosomal protein S6 is the major substrate of protein kinases in eukaryotic ribosomes and may play an important role in controlling cell growth and proliferation through the selective translation of particular classes of mRNA.

    Proteins where this domain is known:
    PY06397   


    PF01096 - TFIIS_C (Pfam link)

    Interpro entry IPR001222 : Zinc finger, TFIIS-type (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents a zinc finger motif found in transcription factor IIS (TFIIS). In eukaryotes, the initiation of transcription of protein-encoding genes by RNA polymerase II (Pol II) is modulated by general and specific transcription factors. The general transcription factors operate through common promoter elements (such as the TATA box). At least eight different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID, -IIE, -IIF, -IIG, -IIH and -IIS. During mRNA elongation, Pol II can encounter DNA sequences that cause reverse movement of the enzyme. Such backtracking involves extrusion of the RNA 3'-end into the pore, and can lead to transcriptional arrest. Escape from arrest requires cleavage of the extruded RNA. TFIIS induces this cleavage by enhancing the intrinsic nuclease activity of Pol II, allowing the polymerase to proceed past template-encoded pause sites. TFIIS extends from the polymerase surface via a pore to the internal active site. Two essential and invariant acidic residues in a TFIIS loop complement the Pol II active site and could position a metal ion and a water molecule for hydrolytic RNA cleavage. TFIIS also induces extensive structural changes in Pol II that would realign nucleic acids in the active centre.

    TFIIS is a protein of about 300 amino acids. It contains three regions: a variable N-terminal domain not required for TFIIS activity; a conserved central domain required for Pol II binding; and a conserved C-terminal C4-type zinc finger essential for RNA cleavage. The zinc finger folds into a conformation termed a zinc ribbon, characterised by a three-stranded antiparallel beta-sheet and two beta-hairpins. A backbone model of the Pol II-TFIIS complex, obtained by X-ray analysis, shows that a beta-hairpin protrudes from the zinc finger and complements the Pol II active site.

    Some viral proteins also contain the TFIIS zinc ribbon C-terminal domain. The Vaccinia virus protein, unlike its eukaryotic homologue, is an integral RNA polymerase subunit rather than a readily separable transcription factor.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01302    PY05691   


    PF01105 - EMP24_GP25L (Pfam link)

    Interpro entry IPR000348 : emp24/gp25L/p24 (Interpro link)

    Pfam description:
    Members of this family are implicated in bringing cargo forward from the ER and in binding to coat proteins via their cytoplasmic domains. This domain corresponds closely to the beta-strand-rich GOLD domain, which is always found combined with lipid- or membrane-association domains.

    Interpro description:

    p24 proteins are major membrane components of COPI- and COPII-coated vesicles and are implicated in cargo selectivity of ER to Golgi transport. Multiple members of the p24 family are found in all eukaryotes, from yeast to mammals. Members of the p24 family are type I membrane proteins with a signal peptide at the amino terminus, a lumenal coiled-coil (extracytosolic) domain, a single transmembrane domain with conserved amino acids, and a short cytoplasmic tail. They may be grouped into at least three subfamilies based on primary sequence. One subfamily comprises yeast Emp24p and mammalian p24A. Another subfamily comprises yeast Erv25p and mammalian Tmp21, and the third subfamily comprises mammalian gp25L proteins.

    Proteins where this domain is known:
    PY00476    PY03735    PY05566    PY06414   


    PF01106 - NifU (Pfam link)

    Interpro entry IPR001075 : NIF system FeS cluster assembly, NifU, C-terminal (Interpro link)

    Pfam description:
    This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown.

    Interpro description:

    Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.

    The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.

    The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP)-dependent protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. No specific functions have been assigned to SufB and SufD. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to target apoproteins.

    In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.

    This entry represents the C-terminal domain of NifU and homologous proteins. NifU contains two domains, an N-terminal and a C-terminal domain, which exist either together or on separate polypeptides. Both domains are also found in organisms that do not fix nitrogen (e.g. yeast), so they have a broader significance in the cell than nitrogen fixation.

    Proteins where this domain is known:
    PY02489   


    PF01115 - F_actin_cap_B (Pfam link)

    Interpro entry IPR001698 : F-actin capping protein, beta subunit (Interpro link)

    Interpro description:

    The actin filament system, a prominent part of the cytoskeleton in eukaryotic cells, is both a static structure and a dynamic network that can undergo rearrangements: it is thought to be involved in processes such as cell movement and phagocytosis, as well as muscle contraction.

    The F-actin capping protein binds in a calcium-independent manner to the fast-growing (barbed) ends of actin filaments, thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. Neither of the subunits shows sequence similarity to other filament-capping proteins.

    The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic species.

    Proteins where this domain is known:
    PY03888   


    PF01119 - DNA_mis_repair (Pfam link)

    Interpro entry IPR013507 : DNA mismatch repair protein, C-terminal (Interpro link)

    Pfam description:
    This family represents the C-terminal domain of the mutL/hexB/PMS1 family. This domain has a ribosomal S5 domain 2-like fold.

    Interpro description:

    This entry represents the C-terminal domain of DNA mismatch repair proteins, such as MutL. This domain functions in promoting dimerisation. The dimeric MutL protein has a key function in communicating mismatch recognition by MutS to downstream repair processes. Mismatch repair contributes to the overall fidelity of DNA replication by targeting mispaired bases that arise through replication errors, during homologous recombination, and as a result of DNA damage. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex.

    Proteins where this domain is known:
    PY01438    PY05214   


    PF01121 - CoaE (Pfam link)

    Interpro entry IPR001977 : Dephospho-CoA kinase (Interpro link)

    Pfam description:
    This family catalyses the phosphorylation of the 3'-hydroxyl group of dephosphocoenzyme A to form coenzyme A (EC:2.7.1.24). This enzyme uses ATP in its reaction.

    Interpro description:

    This family contains dephospho-CoA kinases, which catalyse the final step in CoA biosynthesis: the phosphorylation of the 3'-hydroxyl group of the ribose, using ATP as a phosphate donor.

    The crystal structures of a number of proteins in this entry have been determined, including that of the Haemophilus influenzae protein in complex with ATP at 2.0 Å resolution. The protein consists of three domains: the nucleotide-binding domain with a five-stranded parallel beta-sheet, the substrate-binding alpha-helical domain, and the lid domain formed by a pair of alpha-helices; the overall topology of the protein resembles the structures of other nucleotide kinases.

    Proteins where this domain is known:
    PY01909   


    PF01125 - G10 (Pfam link)

    Interpro entry IPR001748 : G10 protein (Interpro link)

    Interpro description:
    A Xenopus protein known as G10 has been found to be highly conserved in a wide range of eukaryotic species. The function of G10 is still unknown. G10 is a hydrophilic protein of about 17 to 18 kDa (143 to 157 residues) whose cysteine-rich C-terminal half could be involved in metal binding.

    Proteins where this domain is known:
    PY00044   


    PF01131 - Topoisom_bac (Pfam link)

    Interpro entry IPR013497 : DNA topoisomerase, type IA, central (Interpro link)

    Pfam description:
    This subfamily of topoisomerases is distinguished by three properties: these enzymes preferentially relax negatively supercoiled DNA, form a 5'-phosphotyrosine linkage in the enzyme-DNA covalent intermediate, and have high affinity for single-stranded DNA.

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type I topoisomerases are ATP-independent enzymes (except for reverse gyrase), and can be subdivided according to their structure and reaction mechanisms: type IA (bacterial and archaeal topoisomerase I, topoisomerase III and reverse gyrase) and type IB (eukaryotic topoisomerase I and topoisomerase V). These enzymes are primarily responsible for relaxing positively and/or negatively supercoiled DNA, except for reverse gyrase, which can introduce positive supercoils into DNA.

    Type IA topoisomerases consist of four domains that together form a toroidal structure with a central hole large enough to accommodate single- and double-stranded DNA: the N-terminal domain 1 is an alpha/beta Toprim domain, domain 2 and the C-terminal domain 4 are winged-helix domains, and domain 3 is a beta-barrel. Domains 1 (Toprim) and 3 form the active site of the enzyme, while the winged-helix domains 2 and 4 form a single-strand DNA-binding groove. This entry represents the central portion of the enzyme, covering domains 2 and 3 of type IA topoisomerases.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY01134   


    PF01133 - ER (Pfam link)

    Interpro entry IPR000781 : (Interpro link)

    Pfam description:
    Enhancer of rudimentary is a protein of unknown function that is highly conserved in plants and animals. This protein is found to be an enhancer of the rudimentary gene Swiss:P05990.

    Interpro description:
    The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function is not yet clear. It is highly conserved in evolution and has been found in probably all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

    Proteins where this domain is known:
    PY04075   


    PF01134 - GIDA (Pfam link)

    Interpro entry IPR002218 : Glucose-inhibited division protein A-related (Interpro link)

    Interpro description:

    GidA is a tRNA modification enzyme found in bacteria and mitochondria. Though the precise molecular function of these proteins is not known, GidA is involved in the 5-carboxymethylaminomethyl modification of the wobble uridine base in some tRNAs. Sequence variations in the human mitochondrial protein may influence the severity of aminoglycoside-induced deafness.

    This entry represents GidA and related proteins, such as Gid, whose functions are not known.

    Proteins where this domain is known:
    PY01214   


    PF01135 - PCMT (Pfam link)

    Interpro entry IPR000682 : Protein-L-isoaspartate(D-aspartate) O-methyltransferase (Interpro link)

    Interpro description:

    Protein-L-isoaspartate(D-aspartate) O-methyltransferase (PCMT), also known as L-isoaspartyl protein carboxyl methyltransferase, is an enzyme that catalyses the transfer of a methyl group from S-adenosylmethionine to the free carboxyl groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins. The enzyme does not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl residues are the products of the spontaneous deamidation and/or isomerisation of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT plays a role in the repair and/or degradation of these damaged proteins; the enzymatic methyl esterification of the abnormal residues can lead to their conversion to normal L-aspartyl residues. The SAM domain is present in most of these proteins.

    Proteins where this domain is known:
    PY05016   


    PF01137 - RTC (Pfam link)

    Interpro entry IPR000228 : (Interpro link)

    Pfam description:
    RNA cyclases are a family of RNA-modifying enzymes that are conserved in all cellular organisms. They catalyse the ATP-dependent conversion of the 3'-phosphate to the 2',3'-cyclic phosphodiester at the end of RNA, in a reaction involving formation of a covalent AMP-cyclase intermediate. The structure of RTC shows that RTCs comprise two domains. The larger domain contains an insert domain of approximately 100 amino acids.

    Interpro description:
    RNA cyclases are a family of RNA-modifying enzymes that are conserved in eukaryotes, bacteria and archaea. RNA 3'-terminal phosphate cyclase catalyses the conversion of 3'-phosphate to a 2',3'-cyclic phosphodiester at the end of RNA.
     ATP + RNA 3'-terminal-phosphate = AMP + diphosphate + RNA terminal-2',3'-cyclic-phosphate 
    These enzymes might be responsible for production of the cyclic phosphate RNA ends that are known to be required by many RNA ligases in both prokaryotes and eukaryotes.

    RNA cyclase is a protein of 36 to 42 kDa. The best-conserved region is a glycine-rich stretch of residues located in the central part of the sequence, reminiscent of various ATP, GTP or AMP glycine-rich loops.

    The crystal structure of RNA 3'-terminal phosphate cyclase shows that each molecule consists of two domains. The larger domain contains three repeats of a folding unit comprising two parallel alpha helices and a four-stranded beta sheet; this fold was previously identified in translation initiation factor 3 (IF3). The large domain is similar to one of the two domains of 5-enolpyruvylshikimate-3-phosphate synthase and UDP-N-acetylglucosamine enolpyruvyl transferase. The smaller domain uses a similar secondary structure element with different topology, observed in many other proteins such as thioredoxin. Although the active site of this enzyme could not be unambiguously assigned, it can be mapped to a region surrounding His309, an adenylate acceptor, in which a number of amino acids are highly conserved in the enzyme from different sources.

    Proteins where this domain is known:
    PY05891   


    PF01138 - RNase_PH (Pfam link)

    Interpro entry IPR001247 : Exoribonuclease, phosphorolytic domain 1 (Interpro link)

    Pfam description:
    This family includes 3'-5' exoribonucleases. Ribonuclease PH contains a single copy of this domain and removes nucleotide residues following the -CCA terminus of tRNA. Polyribonucleotide nucleotidyltransferase (PNPase) contains two tandem copies of the domain; PNPase is involved in mRNA degradation in the 3'-5' direction. The exosome is a 3'-5' exoribonuclease complex that is required for 3' processing of the 5.8S rRNA. Three of its five protein components, Swiss:P46948, Swiss:Q12277 and Swiss:P25359, contain a copy of this domain. Swiss:Q10205, a hypothetical protein from S. pombe, appears to belong to an uncharacterised subfamily. This subfamily is found in both eukaryotes and archaebacteria.

    Interpro description:

    The PH (phosphorolytic) domain is responsible for 3'-5' exoribonuclease activity, although in some proteins this domain has lost its catalytic function. An active PH domain uses inorganic phosphate as a nucleophile, adding it across the phosphodiester bond between the end two nucleotides in order to release ribonucleoside 5'-diphosphate (rNDP) from the 3' end of the RNA substrate.

    PH domains can be found in bacterial/organelle RNases and PNPases (polynucleotide phosphorylases), as well as in archaeal and eukaryotic RNA exosomes, the latter acting as nano-compartments for the degradation or processing of RNA (including mRNA, rRNA, snRNA and snoRNA). Bacterial/organelle PNPases share a common barrel structure with RNA exosomes, consisting of a hexameric ring of PH domains that act as a degradation chamber, and an S1-domain/KH-domain containing cap that binds the RNA substrate (and sometimes accessory proteins) in order to regulate and restrict entry into the degradation chamber. Unstructured RNA substrates feed in through the pore made by the S1 domains, are degraded by the PH domain ring, and exit as nucleotides via the PH pore at the opposite end of the barrel.

    This entry represents the phosphorolytic (PH) domain 1, which has a core 2-layer alpha/beta structure with a left-handed crossover, similar to that found in ribosomal protein S5. This domain is found in bacterial/organelle PNPases and in archaeal/eukaryotic exosomes.

    More information about these proteins can be found at Protein of the Month: RNA Exosomes.

    Proteins where this domain is known:
    PY00320    PY02553    PY06726    PY07159   


    PF01139 - UPF0027 (Pfam link)

    Interpro entry IPR001233 : (Interpro link)

    Interpro description:
    A number of uncharacterised proteins including Escherichia coli rtcB, Mycobacterium tuberculosis MtCY441.01., Caenorhabditis elegans F16A11.2 and Methanocaldococcus jannaschii (Methanococcus jannaschii) MJ0682 belong to this family.

    Proteins where this domain is known:
    PY03374    PY03776   


    PF01142 - TruD (Pfam link)

    Interpro entry IPR001656 : tRNA pseudouridine synthase D (Interpro link)

    Pfam description:
    TruD is responsible for synthesis of pseudouridine from uracil-13 in transfer RNAs. The structure of TruD reveals an overall V-shaped molecule which contains an RNA-binding cleft.

    Interpro description:

    This entry represents tRNA pseudouridine synthase D (TruD) proteins, which appear to be responsible for synthesis of pseudouridine from uracil-13 in transfer RNAs. They are hydrophilic proteins of 39 to 77 kDa, and homologues are found in bacteria, archaea and eukarya.

    Proteins where this domain is known:
    PY03001    PY03893   


    PF01145 - Band_7 (Pfam link)

    Interpro entry IPR001107 : (Interpro link)

    Pfam description:
    This family has been called SPFH, Band 7 or PHB domain.

    Interpro description:
    The band 7 protein is an integral membrane protein which is thought to regulate cation conductance. A variety of proteins belong to this family. These include the prohibitins, cytoplasmic anti-proliferative proteins and stomatin, an erythrocyte membrane protein. Bacterial HflC protein also belongs to this family.

    Proteins where this domain is known:
    PY03188    PY07166    PY07442   


    PF01148 - CTP_transf_1 (Pfam link)

    Interpro entry IPR000374 : Phosphatidate cytidylyltransferase (Interpro link)

    Pfam description:
    The members of this family are integral membrane protein cytidylyltransferases. The family includes phosphatidate cytidylyltransferase EC:2.7.7.41 as well as Sec59 from yeast. Sec59 is a dolichol kinase EC:2.7.1.108.

    Interpro description:
    Phosphatidate cytidylyltransferase (also known as CDP-diacylglycerol synthase, CDS) is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA):
     CTP + phosphatidate = diphosphate + CDP-diacylglycerol 
    CDP-diacylglycerol is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound enzyme.

    Proteins where this domain is known:
    PY01816   


    PF01150 - GDA1_CD39 (Pfam link)

    Interpro entry IPR000407 : Nucleoside phosphatase GDA1/CD39 (Interpro link)

    Interpro description:

    A number of nucleoside diphosphate and triphosphate hydrolases as well as some yet uncharacterised proteins have been found to belong to the same family. The uncharacterised proteins all seem to be membrane-bound.

    CD molecules are leucocyte antigens on cell surfaces. CD antigen nomenclature is updated at Protein Reviews On The Web (http://mpr.nci.nih.gov/prow/).

    Proteins where this domain is known:
    PY06466   


    PF01151 - ELO (Pfam link)

    Interpro entry IPR002076 : GNS1/SUR4 membrane protein (Interpro link)

    Pfam description:
    Members of this family are involved in long chain fatty acid elongation systems that produce the 26-carbon precursors for ceramide and sphingolipid synthesis. Predicted to be integral membrane proteins, in eukaryotes they are probably located on the endoplasmic reticulum. Yeast ELO3 (Swiss:P40319) affects plasma membrane H+-ATPase activity, and may act on a glucose-signaling pathway that controls the expression of several genes that are transcriptionally regulated by glucose such as PMA1.

    Interpro description:

    This group of eukaryotic integral membrane proteins is evolutionarily related, but their exact function has not yet been clearly established. The proteins have from 290 to 435 amino acid residues. Structurally, they seem to be formed of three sections: an N-terminal region with two transmembrane domains, a central hydrophilic loop and a C-terminal region that contains from one to three transmembrane domains. Members of this family are involved in long chain fatty acid elongation systems that produce the 26-carbon precursors for ceramide and sphingolipid synthesis. Predicted to be integral membrane proteins, in eukaryotes they are probably located on the endoplasmic reticulum. Yeast ELO3 affects plasma membrane H+-ATPase activity, and may act on a glucose-signalling pathway that controls the expression of several genes that are transcriptionally regulated by glucose such as PMA1.

    Proteins where this domain is known:
    PY00625    PY03060    PY05248   


    PF01157 - Ribosomal_L21e (Pfam link)

    Interpro entry IPR001147 : Ribosomal protein L21e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The L21E family contains proteins from a number of eukaryotic and archaebacterial organisms, which include: mammalian L21, Entamoeba histolytica L21, Caenorhabditis elegans L21 (C14B9.7), Saccharomyces cerevisiae (Baker's yeast) L21E (URP1) and Haloarcula marismortui HL31.

    Proteins where this domain is known:
    PY05142   


    PF01158 - Ribosomal_L36e (Pfam link)

    Interpro entry IPR000509 : Ribosomal protein L36e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. The L36E ribosomal family consists of mammalian, Caenorhabditis elegans and Drosophila L36, Candida albicans L39, and yeast YL39 ribosomal proteins.

    Proteins where this domain is known:
    PY06787   


    PF01161 - PBP (Pfam link)

    Interpro entry IPR008914 : (Interpro link)

    Interpro description:

    The PEBP (PhosphatidylEthanolamine-Binding Protein) family is a highly conserved group of proteins that have been identified in numerous tissues in a wide variety of organisms, including bacteria, yeast, nematodes, plants, Drosophila and mammals. The various functions described for members of this family include lipid binding, neuronal development, serine protease inhibition, the control of the morphological switch between shoot growth and flower structures, and the regulation of several signalling pathways such as the MAP kinase pathway and the NF-kappaB pathway. The control of the latter two pathways involves the PEBP protein RKIP, which interacts with MEK and Raf-1 to inhibit the MAP kinase pathway, and with TAK1, NIK, IKKalpha and IKKbeta to inhibit the NF-kappaB pathway. Other PEBP-like proteins that show strong structural homology to PEBP include Escherichia coli YBHB and YBCL, the Rattus norvegicus (Rat) neuropeptide HCNP, and Antirrhinum majus (Garden snapdragon) protein centroradialis (CEN).

    Structures have been determined for several members of the PEBP-like family, all of which show extensive fold conservation. The structure consists of a large central beta-sheet flanked by a smaller beta-sheet on one side, and an alpha helix on the other. Sequence alignments show two conserved central regions, CR1 and CR2, that form a consensus signature for the PEBP family. These two regions form part of the ligand-binding site, which can accommodate various anionic groups. The N- and C-terminal regions are the least conserved, and may be involved in interactions with different protein partners. The N-terminal residues 2-12 form the natural cleavage peptide HCNP involved in neuronal development. The C-terminal region is deleted in plant and bacterial PEBP homologues, and may help control accessibility to the active site.

    Proteins where this domain is known:
    PY01852   


    PF01163 - RIO1 (Pfam link)

    Interpro entry IPR018934 : RIO-like kinase (Interpro link)

    Pfam description:
    This is a family of atypical serine kinases which are found in archaea, bacteria and eukaryotes. Activity of Rio1 is vital in Saccharomyces cerevisiae for the processing of ribosomal RNA, as well as for proper cell cycle progression and chromosome maintenance. The structure of RIO1 has been determined.

    Interpro description:

    Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific.

    Protein kinase function has been evolutionarily conserved from Escherichia coli to humans. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins.

    The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases.

    This entry represents RIO kinases, which exhibit little sequence similarity with eukaryotic protein kinases (ePKs) and are classified as atypical protein kinases. The conformation of ATP bound to the RIO kinases is unique compared with ePKs such as serine/threonine kinases or the insulin receptor tyrosine kinase, suggesting that the detailed mechanism by which the catalytic aspartate of RIO kinases participates in phosphoryl transfer may not be identical to that employed in known serine/threonine ePKs. Representatives of the RIO family are present in organisms ranging from Archaea to humans, although to date RIO3 proteins have been identified only in multicellular eukaryotes.

    Yeast Rio1 and Rio2 proteins are required for proper cell cycle progression and chromosome maintenance, and are necessary for cell survival. These proteins are involved in the processing of 20S pre-rRNA, a late step in 18S rRNA maturation.

    Proteins where this domain is known:
    PY01572    PY06287   


    PF01167 - Tub (Pfam link)

    Interpro entry IPR000007 : (Interpro link)

    Interpro description:

    Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was recently found to be the result of a splicing defect in a novel gene of unknown function. This mutation maps to the tub gene. The mouse tubby mutation causes maturity-onset obesity, insulin resistance and sensory deficits. In contrast to the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase in body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes such as Alstrom and Bardet-Biedl. Although these phenotypes indicate a vital role for tubby proteins, no biochemical function has yet been ascribed to any family member; it has, however, been suggested that the phenotypic features of tubby mice may result from cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins (TULPs). TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that indicate their importance in nervous-system function and development.

    Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor in sequence but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution. This domain is arranged as a 12-stranded, all-antiparallel, closed beta-barrel that surrounds a central alpha helix; this helix, located at the extreme carboxyl terminus of the protein, forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors.

    Proteins where this domain is known:
    PY04112   


    PF01170 - UPF0020 (Pfam link)

    Interpro entry IPR000241 : (Interpro link)

    Pfam description:
    This domain is probably a methylase. It is associated with the THUMP domain that also occurs with RNA modification domains.

    Interpro description:
    This domain is probably a methylase. It is associated with the THUMP domain that also occurs with RNA modification domains.

    Proteins where this domain is known:
    PY05490   


    PF01171 - ATP_bind_3 (Pfam link)

    Interpro entry IPR011063 : (Interpro link)

    Pfam description:
    This family of proteins belongs to the PP-loop superfamily.

    Interpro description:

    This entry represents the PP-loop motif superfamily. The PP-loop motif appears to be a modified version of the P-loop of nucleotide-binding domains that is involved in phosphate binding. It was named the PP-motif because it appears to be part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, Escherichia coli NtrL and Bacillus subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain.

    Proteins where this domain is known:
    PY03101    PY04866   


    PF01172 - SBDS (Pfam link)

    Interpro entry IPR002140 : (Interpro link)

    Pfam description:
    This family is highly conserved in species ranging from archaea to vertebrates and plants. The family contains several Shwachman-Bodian-Diamond syndrome (SBDS) proteins from both mouse and humans. Shwachman-Diamond syndrome is an autosomal recessive disorder with clinical features that include pancreatic exocrine insufficiency, haematological dysfunction and skeletal abnormalities. It is characterised by bone marrow failure and leukemia predisposition. Members of this family play a role in RNA metabolism. In yeast these proteins have been shown to be critical for the release and recycling of the nucleolar shuttling factor Tif6 from pre-60S ribosomes, a key step in 60S maturation and translational activation of ribosomes. These data link defective late 60S subunit maturation to an inherited bone marrow failure syndrome associated with leukemia predisposition.

    Interpro description:
    The proteins in this entry are highly conserved in species ranging from archaea to vertebrates and plants. The family contains several Shwachman-Bodian-Diamond syndrome (SBDS) proteins from both mouse and humans. Shwachman-Diamond syndrome is an autosomal recessive disorder with clinical features that include pancreatic exocrine insufficiency, haematological dysfunction and skeletal abnormalities. It is characterised by bone marrow failure and leukemia predisposition. Members of this family play a role in RNA metabolism. In yeast these proteins have been shown to be critical for the release and recycling of the nucleolar shuttling factor Tif6 from pre-60S ribosomes, a key step in 60S maturation and translational activation of ribosomes. These data link defective late 60S subunit maturation to an inherited bone marrow failure syndrome associated with leukemia predisposition.

    A number of uncharacterised hydrophilic proteins of about 30 kDa share regions of similarity. These include,

    Proteins where this domain is known:
    PY04031   


    PF01174 - SNO (Pfam link)

    Interpro entry IPR002161 : (Interpro link)

    Pfam description:
    This family and its amidotransferase domain was first described in. It is predicted that members of this family are involved in the pyridoxine biosynthetic pathway, based on the proximity and co-regulation of the corresponding genes and physical interaction between the members of Pfam:PF01174 and Pfam:PF01680.

    Interpro description:

    Members of this family are involved in the pyridoxine biosynthetic pathway. The regulation of cellular growth and proliferation in response to environmental cues is critical for development and the maintenance of viability in all organisms. In unicellular organisms, such as the budding yeast Saccharomyces cerevisiae (Baker's yeast), growth and proliferation are regulated by nutrient availability.

    Proteins where this domain is known:
    PY02155   


    PF01176 - eIF-1a (Pfam link)

    Interpro entry IPR006196 : S1, IF1 type (Interpro link)

    Pfam description:
    This family includes both the eukaryotic translation factor eIF-1A and the bacterial translation initiation factor IF-1.

    Interpro description:

    The S1 domain of around 70 amino acids, originally identified in ribosomal protein S1, is found in a large number of RNA-associated proteins. It has been shown that S1 proteins bind RNA through their S1 domains with some degree of sequence specificity. This type of S1 domain is found in translation initiation factor 1.

    The solution structure of one S1 RNA-binding domain from Escherichia coli polynucleotide phosphorylase has been determined. It displays some similarity with the cold shock domain (CSD). Both the S1 and the CSD domain consist of an antiparallel beta barrel of the same topology with 5 beta strands. This fold is also shared by many other proteins of unrelated function and is known as the OB fold. However, the S1 and CSD fold can be distinguished from the other OB folds by the presence of a short 3(10) helix at the end of strand 3. This unique feature is likely to form a part of the DNA/RNA-binding site.

    More information about these proteins can be found at Protein of the Month: RNA Exosomes.

    Proteins where this domain is known:
    PY01972    PY01973   


    PF01180 - DHO_dh (Pfam link)

    Interpro entry IPR012135 : Dihydroorotate dehydrogenase, classes 1 and 2 (Interpro link)

    Interpro description:

    Dihydroorotate dehydrogenase (DHOD), also known as dihydroorotate oxidase, catalyses the fourth step in de novo pyrimidine biosynthesis, the stereospecific oxidation of (S)-dihydroorotate to orotate, which is the only redox reaction in this pathway. DHODs can be divided into two main classes: class 1 cytosolic enzymes found primarily in Gram-positive bacteria, and class 2 membrane-associated enzymes found primarily in eukaryotic mitochondria and Gram-negative bacteria.

    The class 1 DHODs can be further divided into subclasses 1A and 1B, which differ in their structural organisation and use of electron acceptors. The 1A enzyme is a homodimer of two PyrD subunits where each subunit forms a TIM barrel fold with a bound FMN cofactor located near the top of the barrel. Fumarate is the natural electron acceptor for this enzyme. The 1B enzyme, in contrast, is a heterotetramer composed of a central, FMN-containing, PyrD homodimer resembling the 1A homodimer, and two additional PyrK subunits which contain FAD and a 2Fe-2S cluster. These additional groups allow the enzyme to use NAD(+) as its natural electron acceptor.

    The class 2 membrane-associated enzymes are monomers which have the FMN-containing TIM barrel domain found in the class 1 PyrD subunit, and an additional N-terminal alpha helical domain. These enzymes use respiratory quinones as the physiological electron acceptor.

    This entry represents the FMN-binding subunit common to all classes of dihydroorotate dehydrogenase.

    Proteins where this domain is known:
    PY02580   


    PF01182 - Glucosamine_iso (Pfam link)

    Interpro entry IPR006148 : Glucosamine/galactosamine-6-phosphate isomerase (Interpro link)

    Interpro description:
    This entry contains 6-phosphogluconolactonase, glucosamine-6-phosphate isomerase and galactosamine-6-phosphate isomerase. 6-phosphogluconolactonase is the enzyme responsible for the hydrolysis of 6-phosphogluconolactone to 6-phosphogluconate, the second step in the pentose phosphate pathway. Glucosamine-6-phosphate isomerase (or glucosamine-6-phosphate deaminase) is the enzyme responsible for the conversion of D-glucosamine 6-phosphate into D-fructose 6-phosphate. It is the last specific step in the pathway for N-acetylglucosamine (GlcNAc) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida albicans (gene NAG1). A region located in the central part of glucosamine-6-phosphate isomerase contains a conserved histidine which has been shown, in nagB, to be important for the pyranose ring-opening step of the catalytic mechanism.

    Proteins where this domain has been detected by our approach:
    PY00793   


    PF01187 - MIF (Pfam link)

    Interpro entry IPR001398 : (Interpro link)

    Interpro description:

    Macrophage migration inhibitory factor (MIF) is a key regulatory cytokine within innate and adaptive immune responses, capable of promoting and modulating the magnitude of the response. MIF is released from T-cells and macrophages, and acts within the neuroendocrine system. MIF is capable of tautomerase activity, although its biological function has not been fully characterised. It is induced by glucocorticoid and is capable of overriding the anti-inflammatory actions of glucocorticoid. MIF regulates cytokine secretion and the expression of receptors involved in the immune response. It can be taken up into target cells in order to interact with intracellular signalling molecules, inhibiting p53 function, and/or activating components of the mitogen-activated protein kinase and Jun-activation domain-binding protein-1 (Jab-1). MIF has been linked to various inflammatory diseases, such as rheumatoid arthritis and atherosclerosis.

    The MIF homologue D-dopachrome tautomerase is involved in detoxification through the conversion of dopaminechrome (and possibly norepinephrinechrome), the toxic quinone product of the neurotransmitter dopamine (and norepinephrine), to an indole derivative that can serve as a precursor to neuromelanin.

    Proteins where this domain is known:
    PY05452   


    PF01189 - Nol1_Nop2_Fmu (Pfam link)

    Interpro entry IPR001678 : (Interpro link)

    Interpro description:

    This domain is found in archaeal, bacterial and eukaryotic proteins.

    In the archaea and bacteria, they are annotated as putative nucleolar protein, Sun (Fmu) family protein or tRNA/rRNA cytosine-C5-methylase. The majority have the S-adenosyl methionine (SAM) binding domain and are related to Escherichia coli Fmu (Sun) protein (16S rRNA m5C 967 methyltransferase) whose structure has been determined.

    In eukaryotes, the majority are annotated as 'hypothetical protein', nucleolar protein or Nop2/Sun (Fmu) family. Unlike their bacterial homologues, few of the eukaryotic members of this family have the SAM-binding signature. Despite this, Saccharomyces cerevisiae (Baker's yeast) Nop2p is a probable RNA m5C methyltransferase. It localises to the nucleolus, is essential for viability, and is required for processing and maturation of 27S pre-rRNA and for large ribosomal subunit biogenesis. Reduced Nop2p expression limits yeast growth and decreases levels of mature 60S ribosomal subunits while altering rRNA processing. There is substantial identity between Nop2p and Homo sapiens (Human) p120 (NOL1), which is also called the proliferation-associated nucleolar antigen.

    Proteins where this domain is known:
    PY01948    PY03774    PY05667   


    PF01191 - RNA_pol_Rpb5_C (Pfam link)

    Interpro entry IPR000783 : RNA polymerase, subunit H/Rpb5 C-terminal (Interpro link)

    Pfam description:
    The assembly domain of Rpb5. The archaeal equivalent to this domain is subunit H. Subunit H lacks the N-terminal domain.

    Interpro description:

    Prokaryotes contain a single DNA-dependent RNA polymerase (RNAP) that is responsible for the transcription of all genes, while eukaryotes have three classes of RNAPs (I-III) that transcribe different sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. Certain subunits of RNAPs, including RPB5 (POLR2E in mammals), are common to all three eukaryotic polymerases. RPB5 plays a role in the transcription activation process. Eukaryotic RPB5 has a bipartite structure consisting of a unique N-terminal region, plus a C-terminal region that is structurally homologous to the prokaryotic RPB5 homologue, subunit H (gene rpoH).

    This entry represents prokaryotic subunit H and the C-terminal domain of eukaryotic RPB5, which share a two-layer alpha/beta fold, with a core structure of beta/alpha/beta/alpha/beta(2).

    Proteins where this domain is known:
    PY02779   


    PF01192 - RNA_pol_Rpb6 (Pfam link)

    Interpro entry IPR006110 : RNA polymerase Rpb6 (Interpro link)

    Pfam description:
    Rpb6 is an essential subunit in the eukaryotic polymerases Pol I, II and III. The bacterial equivalent to Rpb6 is the omega subunit. Rpb6 and omega are structurally conserved and both function in polymerase assembly.

    Interpro description:

    In eukaryotes, there are three different forms of DNA-dependent RNA polymerases transcribing different sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13 polypeptides. A component of 14 to 18 kDa, shared by all three forms of eukaryotic RNA polymerases and sequenced in budding yeast (gene RPB6 or RPO26), in Schizosaccharomyces pombe (Fission yeast) (gene rpb6 or rpo15), in human and in African swine fever virus (ASFV), is evolutionarily related to the archaebacterial subunit K (gene rpoK). The archaebacterial protein is colinear with the C-terminal part of the eukaryotic subunit.

    The structures of the omega subunit and RPB6, and the structures of the omega/beta' and RPB6/RPB1 interfaces, suggest a molecular mechanism for the function of omega and RPB6 in promoting RNAP assembly and/or stability. The conserved regions of omega and RPB6 form a compact structural domain that interacts simultaneously with conserved regions of the largest RNAP subunit and with the C-terminal tail following a conserved region of the largest RNAP subunit. The second half of the conserved region of omega and RPB6 forms an arc that projects away from the remainder of the structural domain and wraps over and around the C-terminal tail of the largest RNAP subunit, clamping it in a crevice, and threading the C-terminal tail of the largest RNAP subunit through the narrow gap between omega and RPB6.

    Proteins where this domain is known:
    PY01352   


    PF01193 - RNA_pol_L (Pfam link)

    Interpro entry IPR011261 : DNA-directed RNA polymerase, dimerisation (Interpro link)

    Pfam description:
    The two eukaryotic subunits Rpb3 and Rpb11 dimerise to form a platform onto which the other subunits of the RNA polymerase assemble (the D/L subunits fill this role in archaea). The prokaryotic equivalent of the Rpb3/Rpb11 platform is the alpha-alpha dimer. The dimerisation domain of the alpha subunit/Rpb3 is interrupted by an insert domain (PFAM:PF01000). Some of the alpha subunits also contain iron-sulphur binding domains (PFAM:PF00037). Rpb11 is found as a continuous domain. Members of this family include: alpha subunits from eubacteria; alpha subunits from chloroplasts; Rpb3 subunits from eukaryotes; Rpb11 subunits from eukaryotes; RpoD subunits from archaea; RpoL subunits from archaea.

    Interpro description:

    DNA-directed RNA polymerases (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. In bacteria, the core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along its full length. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel.

    RNA synthesis follows the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand. The RNA synthesis process continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise:

    Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (<50 kDa) subunits.

    RNA polymerase (RNAP) II, which is responsible for all mRNA synthesis in eukaryotes, consists of 12 subunits. Subunits Rpb3 and Rpb11 form a heterodimer that is functionally analogous to the archaeal RNAP D/L heterodimer, and to the prokaryotic RNAP alpha subunit (RpoA) homodimer. In each case, they play a key role in RNAP assembly by forming a platform on which the catalytic subunits (eukaryotic Rpb1/Rpb2, and prokaryotic beta/beta') can interact. These different subunits share regions of homology required for dimerisation. In eukaryotic Rpb11 and archaeal L subunits, the dimerisation domain consists of a contiguous Rpb11-like domain, whereas in eukaryotic Rpb3, archaeal D and bacterial RpoA subunits, the dimerisation domain consists of the Rpb11-like domain interrupted by an insert domain. In the prokaryotic alpha subunit, this dimerisation domain is the N-terminal domain.

    Proteins where this domain is known:
    PY00033    PY01971    PY02415    PY04295    PY04455   


    PF01194 - RNA_pol_N (Pfam link)

    Interpro entry IPR000268 : RNA polymerases, N/8 Kd subunits (Interpro link)

    Interpro description:
    In eukaryotes, there are three different forms of DNA-dependent RNA polymerases transcribing different sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. Archaebacterial subunit N (gene rpoN) is a small protein of about 8 kDa; it is evolutionarily related to an 8.3 kDa component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J in mammals) as well as to African swine fever virus (ASFV) protein CP80R. A conserved region is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines that bind a zinc ion.

    Proteins where this domain is known:
    PY00134   


    PF01196 - Ribosomal_L17 (Pfam link)

    Interpro entry IPR000456 : Ribosomal protein L17 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have functions outside the ribosome.

    Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. Bacterial L17 is a protein of 120 to 130 amino-acid residues while yeast YmL8 is twice as large (238 residues). The N-terminal half of YmL8 is colinear with the sequence of L17 from Escherichia coli.

    Proteins where this domain is known:
    PY00041    PY01260   


    PF01198 - Ribosomal_L31e (Pfam link)

    Interpro entry IPR000054 : Ribosomal protein L31e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have functions outside the ribosome.

    A number of eukaryotic and archaebacterial large subunit ribosomal proteins can be grouped on the basis of sequence similarities. These proteins have 87 to 128 amino-acid residues. This family consists of:

  • Yeast L34
  • Archaeal L31
  • Plants L31
  • Mammalian L31

    Proteins where this domain is known:
    PY00344   


    PF01199 - Ribosomal_L34e (Pfam link)

    Interpro entry IPR008195 : Ribosomal protein L34e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have functions outside the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins belong to the L34e family. These include vertebrate L34, mosquito L31, plant L34, yeast putative ribosomal protein YIL052c and archaebacterial L34e.

    Proteins where this domain is known:
    PY04861   


    PF01200 - Ribosomal_S28e (Pfam link)

    Interpro entry IPR000289 : Ribosomal protein S28e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have functions outside the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. Examples are:

  • Mammalian S28
  • Plant S28
  • Fungi S33
  • Archaebacterial S28e

    These proteins have from 64 to 78 amino acids and a highly conserved C-terminal region.

    Proteins where this domain is known:
    PY04142   


    PF01201 - Ribosomal_S8e (Pfam link)

    Interpro entry IPR001047 : Ribosomal protein S8e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have functions outside the ribosome.

    A number of eukaryotic and archaeal ribosomal proteins have been grouped based on sequence similarities. One of these families, S8e, consists of a number of proteins with either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archaea).

    Proteins where this domain is known:
    PY00137    PY01682   


    PF01202 - SKI (Pfam link)

    Interpro entry IPR000623 : Shikimate kinase (Interpro link)

    Interpro description:

    Shikimate kinase catalyses the fifth step of the shikimate pathway, which produces chorismate, the common precursor of the aromatic amino acids. The enzyme catalyses the following reaction:

     ATP + shikimate = ADP + shikimate-3-phosphate 

    The protein is found in bacteria (gene aroK or aroL), plants and fungi (where it is part of a multifunctional enzyme that catalyses five consecutive steps in this pathway). In 1994, the 3D structure of shikimate kinase was predicted to be very close to that of adenylate kinase, suggesting a functional similarity as well as an evolutionary relationship. This prediction has since been confirmed experimentally. The protein is reported to possess an alpha/beta fold, consisting of a central sheet of five parallel beta-strands flanked by alpha-helices. Such a topology is very similar to that of adenylate kinase.

    Proteins where this domain has been detected by our approach:
    PY00069   


    PF01207 - Dus (Pfam link)

    Interpro entry IPR001269 : tRNA-dihydrouridine synthase (Interpro link)

    Pfam description:
    Members of this family catalyse the reduction of the 5,6-double bond of a uridine residue on tRNA. Dihydrouridine modification of tRNA is widely observed in prokaryotes and eukaryotes, and also in some archaea. Most dihydrouridines are found in the D loop of tRNAs. The role of dihydrouridine in tRNA is currently unknown, but it may increase the conformational flexibility of the tRNA. It is likely that different family members have different substrate specificities, which may overlap. Dus 1 (Swiss:Q9HGN6) from Saccharomyces cerevisiae acts on pre-tRNA-Phe, while Dus 2 (Swiss:P53720) acts on pre-tRNA-Tyr and pre-tRNA-Leu. Dus 1 is active as a single subunit, requiring NADPH or NADH, and is stimulated by the presence of FAD. Some family members may be targeted to mitochondria and may function there.

    Interpro description:

    Members of this family catalyse the reduction of the 5,6-double bond of a uridine residue on tRNA. Dihydrouridine modification of tRNA is widely observed in prokaryotes and eukaryotes, and also in some archaea. Most dihydrouridines are found in the D loop of tRNAs. The role of dihydrouridine in tRNA is currently unknown, but it may increase the conformational flexibility of the tRNA. It is likely that different family members have different substrate specificities, which may overlap. Dus 1 from Saccharomyces cerevisiae (Baker's yeast) acts on pre-tRNA-Phe, while Dus 2 acts on pre-tRNA-Tyr and pre-tRNA-Leu. Dus 1 is active as a single subunit, requiring NADPH or NADH, and is stimulated by the presence of FAD. Some family members may be targeted to mitochondria and may function there.

    Proteins where this domain is known:
    PY01376    PY06801   


    PF01208 - URO-D (Pfam link)

    Interpro entry IPR000257 : Uroporphyrinogen decarboxylase (URO-D) (Interpro link)

    Interpro description:

    Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the haem biosynthetic pathway, catalyses the sequential decarboxylation of the four acetate side chains of uroporphyrinogen to yield coproporphyrinogen. URO-D deficiency is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is located in the N-terminal section; it contains a perfectly conserved hexapeptide. There are two arginine residues in this hexapeptide which could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate side chains of the substrate.

    The crystal structure of human uroporphyrinogen decarboxylase shows that it consists of a single domain containing a (beta/alpha)8-barrel with a deep active site cleft formed by loops at the C-terminal ends of the barrel strands. URO-D is a dimer in solution. Dimerisation juxtaposes the active site clefts of the monomers, suggesting a functionally important interaction between the catalytic centres.

    Proteins where this domain is known:
    PY02528   


    PF01209 - Ubie_methyltran (Pfam link)

    Interpro entry IPR004033 : UbiE/COQ5 methyltransferase (Interpro link)

    Interpro description:
    A number of methyltransferases have been shown to share regions of similarity. Apart from the ubiquinone/menaquinone biosynthesis methyltransferases (for example, the C-methyltransferase from the ubiE gene of Escherichia coli), the ubiquinone biosynthesis methyltransferases (for example, the C-methyltransferase from the COQ5 gene of Saccharomyces cerevisiae) and the menaquinone biosynthesis methyltransferases (for example, the C-methyltransferase from the MENH gene of Bacillus subtilis), this family also includes methyltransferases involved in biotin and sterol biosynthesis and in phosphatidylethanolamine methylation.

    Proteins where this domain is known:
    PY00901   


    PF01210 - NAD_Gly3P_dh_N (Pfam link)

    Interpro entry IPR011128 : NAD-dependent glycerol-3-phosphate dehydrogenase, N-terminal (Interpro link)

    Pfam description:
    NAD-dependent glycerol-3-phosphate dehydrogenase (GPDH) catalyses the interconversion of dihydroxyacetone phosphate and L-glycerol-3-phosphate. This family represents the N-terminal NAD-binding domain.

    Interpro description:
    NAD-dependent glycerol-3-phosphate dehydrogenase (GPDH) catalyses the interconversion of dihydroxyacetone phosphate and L-glycerol-3-phosphate. This family represents the N-terminal NAD-binding domain.

    Proteins where this domain is known:
    PY00789    PY05585   


    PF01214 - CK_II_beta (Pfam link)

    Interpro entry IPR000704 : Casein kinase II, regulatory subunit (Interpro link)

    Interpro description:

    Protein kinases are a group of enzymes that possess a catalytic subunit which transfers the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. The enzymes fall into two broad classes, characterised with respect to substrate specificity: serine/threonine specific and tyrosine specific.

    Protein kinase function has been evolutionarily conserved from Escherichia coli to human. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins.

    The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases.

    Casein kinase, a ubiquitous, well-conserved protein kinase involved in cell metabolism and differentiation, is characterised by its preference for Ser or Thr in acidic stretches of amino acids. The enzyme is a tetramer of 2 alpha- and 2 beta-subunits. However, some species (e.g., mammals) possess 2 related forms of the alpha-subunit (alpha and alpha'), while others (e.g., fungi) possess 2 related beta-subunits (beta and beta'). The alpha-subunit is the catalytic unit and contains regions characteristic of serine/threonine protein kinases. The beta-subunit is believed to be regulatory, possessing an N-terminal auto-phosphorylation site, an internal acidic domain, and a potential metal-binding motif. The beta subunit is a highly conserved protein of about 25 kDa that contains, in its central section, a cysteine-rich motif, CX(n)C, that could be involved in binding a metal such as zinc. The mammalian beta-subunit gene promoter shares common features with those of other mammalian protein kinases and is closely related to the promoter of the regulatory subunit of cAMP-dependent protein kinase.

    Proteins where this domain is known:
    PY01577    PY01939   


    PF01217 - Clat_adaptor_s (Pfam link)

    Interpro entry IPR000804 : Clathrin adaptor, sigma subunit/coatomer, zeta subunit (Interpro link)

    Interpro description:

    Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer.

    Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis.

    While clathrin mediates endocytic transport and transport from the trans-Golgi network to endosomes, the COPI coatomer mediates intra-Golgi transport as well as the retrograde Golgi-to-ER transport of dilysine-tagged proteins, and COPII mediates anterograde ER-to-Golgi transport. Coatomer reversibly associates with Golgi (non-clathrin-coated) vesicles to mediate protein transport and vesicle budding from Golgi membranes. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.

    This entry represents the small sigma subunit of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and the zeta subunit of various coatomer (COP) adaptors. The small sigma subunit of AP proteins has been characterised in several species. The sigma subunit plays a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or endosomes. The zeta subunit of coatomers (zeta-COP) is required for coatomer binding to Golgi membranes and for coat-vesicle assembly.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY01700    PY01701    PY03598    PY04027    PY05216    PY06451   

    Proteins where this domain has been detected by our approach:
    PY02804    PY05839    PY06523   


    PF01218 - Coprogen_oxidas (Pfam link)

    Interpro entry IPR001260 : Coproporphyrinogen III oxidase (Interpro link)

    Interpro description:
    Coprogen oxidase (i.e. coproporphyrinogen III oxidase or coproporphyrinogenase) catalyses the oxidative decarboxylation of coproporphyrinogen III to protoporphyrinogen IX in the haem and chlorophyll biosynthetic pathways. The protein is a homodimer containing two internally bound iron atoms per molecule of native protein. The enzyme is active in the presence of molecular oxygen, which acts as an electron acceptor. The enzyme is widely distributed, having been found in a variety of eukaryotic and prokaryotic sources.

    Proteins where this domain is known:
    PY05501   


    PF01221 - Dynein_light (Pfam link)

    Interpro entry IPR001372 : Dynein light chain, type 1 and 2 (Interpro link)

    Interpro description:

    Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force-generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules.

    Dynein is composed of a number of ATP-binding large subunits, intermediate-size subunits and small subunits. Among the small subunits is a family of highly conserved proteins, which make up this entry.

    Both type 1 (DLC1) and 2 (DLC2) dynein light chains have a similar two-layer alpha-beta core structure consisting of beta-alpha(2)-beta-X-beta(2).

    Proteins where this domain is known:
    PY02016   


    PF01226 - Form_Nir_trans (Pfam link)

    Interpro entry IPR000292 : Formate/nitrite transporter (Interpro link)

    Interpro description:

    A number of bacterial and archaebacterial proteins involved in transporting formate or nitrite have been shown to be related.

    These transporters are proteins of about 280 residues and seem to contain six transmembrane regions.

    Proteins where this domain is known:
    PY06388   


    PF01227 - GTP_cyclohydroI (Pfam link)

    Interpro entry IPR001474 : GTP cyclohydrolase I (Interpro link)

    Pfam description:
    This family includes GTP cyclohydrolase enzymes and a family of related bacterial proteins including Swiss:Q46920.

    Interpro description:

    GTP cyclohydrolase I catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from GTP. This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of pteridine-containing pigments in insects. Comparison of enzyme sequences from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well conserved throughout evolution.

    Proteins where this domain is known:
    PY03300   


    PF01230 - HIT (Pfam link)

    Interpro entry IPR001310 : (Interpro link)

    Interpro description:

    The Histidine Triad (HIT) motif, His-phi-His-phi-His-phi-phi (phi, a hydrophobic amino acid), was identified as being highly conserved in a variety of organisms. The crystal structure of rabbit Hint, purified as an adenosine- and AMP-binding protein, showed that proteins in the HIT superfamily are conserved as nucleotide-binding proteins and that Hint homologues, which are found in all forms of life, are structurally related to Fhit homologues and GalT-related enzymes, which have more restricted phylogenetic profiles. Hint homologues, including rabbit Hint and yeast Hnt1, hydrolyse adenosine 5' monophosphoramide substrates such as AMP-NH2 and AMP-lysine to AMP plus the amine product, and function as positive regulators of Cdk7/Kin28 in vivo. Fhit homologues are diadenosine polyphosphate hydrolases and function as tumour suppressors in humans and mice, although the tumour-suppressing function of Fhit does not depend on ApppA hydrolysis. The third branch of the HIT superfamily, which includes GalT homologues, contains a related His-X-His-X-Gln motif and transfers nucleoside monophosphate moieties to phosphorylated second substrates rather than hydrolysing them.
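
    The two sequence patterns named in this description, the His-phi-His-phi-His-phi-phi histidine triad and the GalT-type His-X-His-X-Gln variant, can be written as regular expressions. The Python sketch below is illustrative only: the set of residues treated as hydrophobic (phi) is an assumption, `find_motif` is a hypothetical helper, and the toy sequence is invented. A pattern match alone is not evidence of HIT superfamily membership; Pfam assigns PF01230 with a profile HMM, not a regex.

```python
import re

# Assumption: which residues count as "hydrophobic" (phi) is a modelling choice;
# this sketch uses A, V, I, L, M, F, W and C. Adjust to your own definition.
HYDROPHOBIC = "AVILMFWC"
PHI = f"[{HYDROPHOBIC}]"

# Hint/Fhit-type histidine triad: His-phi-His-phi-His-phi-phi
HIT_MOTIF = re.compile(f"H{PHI}H{PHI}H{PHI}{PHI}")
# GalT-type variant: His-X-His-X-Gln (X = any residue)
GALT_MOTIF = re.compile("H.H.Q")


def find_motif(pattern: re.Pattern, sequence: str) -> list:
    """Hypothetical helper: return (0-based start, matched residues) for each
    non-overlapping match of `pattern` in a one-letter protein sequence."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MSAHVHLHVLGE"  # invented toy sequence, not a real protein
    print(find_motif(HIT_MOTIF, toy))   # [(3, 'HVHLHVL')]
    print(find_motif(GALT_MOTIF, toy))  # []
```

    Because histidine is not in the assumed hydrophobic set, two histidine-triad matches cannot overlap, so the non-overlapping scan of `finditer` does not miss any hits.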

    Proteins where this domain is known:
    PY05168    PY07476   


    PF01233 - NMT (Pfam link)

    Interpro entry IPR000903 : Myristoyl-CoA:protein N-myristoyltransferase (Interpro link)

    Pfam description:
    The N and C-terminal domains of NMT are structurally similar, each adopting an acyl-CoA N-acyltransferase-like fold.

    Interpro description:
    Myristoyl-CoA:protein N-myristoyltransferase (Nmt) is the enzyme responsible for transferring a myristate group to the N-terminal glycine of a number of eukaryotic cellular and viral proteins. Nmt is a monomeric protein of about 50 to 60 kDa whose sequence appears to be well conserved.

    Proteins where this domain is known:
    PY01548   


    PF01237 - Oxysterol_BP (Pfam link)

    Interpro entry IPR000648 : Oxysterol-binding protein (Interpro link)

    Interpro description:
    A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found to be evolutionarily related. These include mammalian oxysterol-binding protein (OSBP), a protein of about 800 amino-acid residues that binds a variety of oxysterols (oxygenated derivatives of cholesterol); yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis; yeast proteins HES1 and KES1, highly related proteins of 434 residues that seem to play a role in ergosterol synthesis; and yeast hypothetical proteins YHR001w, YHR073w and YKR003w.

    Proteins where this domain is known:
    PY04188   


    PF01238 - PMI_typeI (Pfam link)

    Interpro entry IPR001250 : Mannose-6-phosphate isomerase, type I (Interpro link)

    Pfam description:
    This is a family of Phosphomannose isomerase type I enzymes (EC 5.3.1.8).

    Interpro description:

    Mannose-6-phosphate isomerase or phosphomannose isomerase (PMI) is the enzyme that catalyses the interconversion of mannose-6-phosphate and fructose-6-phosphate. In eukaryotes PMI is involved in the synthesis of GDP-mannose, a constituent of N- and O-linked glycans and GPI anchors, and in prokaryotes it participates in a variety of pathways, including capsular polysaccharide biosynthesis and D-mannose metabolism. PMIs belong to the cupin superfamily, whose functions range from isomerase and epimerase activities involved in the modification of cell wall carbohydrates in bacteria and plants, to non-enzymatic storage proteins in plant seeds, and transcription factors linked to congenital baldness in mammals. Three classes of PMI have been defined.

    Type I includes eukaryotic PMI and the enzyme encoded by the manA gene in enterobacteria. PMI has a bound zinc ion, which is essential for activity.

    A crystal structure of PMI from Candida albicans shows that the enzyme has three distinct domains. The active site lies in the central domain, contains a single essential zinc atom, and forms a deep, open cavity of suitable dimensions to contain mannose-6-phosphate (M6P) or fructose-6-phosphate (F6P). The central domain is flanked by a helical domain on one side and a jelly-roll-like domain on the other.

    Proteins where this domain is known:
    PY03463   


    PF01239 - PPTA (Pfam link)

    Interpro entry IPR002088 : Protein prenyltransferase, alpha subunit (Interpro link)

    Pfam description:
    Both farnesyltransferase (FT) and geranylgeranyltransferase 1 (GGT1) recognise a CaaX motif on their substrates, where 'a' stands for preferably aliphatic residues, whereas GGT2 recognises a completely different motif. Important substrates for FT include, amongst others, many members of the Ras superfamily. GGT1 substrates include some of the other small GTPases, and GGT2 substrates include the Rab family.
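    As a rough illustration of the CaaX motif described above, the Python sketch below checks whether a one-letter protein sequence ends in C, two aliphatic residues and one final residue of any type. The helper name has_caax_box is hypothetical, and treating 'a' as strictly A, V, L or I is an assumption: the source only says "preferably aliphatic", so this is a simple pattern check, not a predictor of prenylation.

```python
import re

# Assumption: 'a' is approximated as one of A, V, L, I. The source only says
# "preferably aliphatic", so this residue set is illustrative, not definitive.
ALIPHATIC = "AVLI"

# C, then two aliphatic residues, then any residue X, at the very C-terminus.
CAAX_PATTERN = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def has_caax_box(sequence: str) -> bool:
    """Return True if a one-letter protein sequence ends in a CaaX-like motif."""
    return CAAX_PATTERN.search(sequence.strip().upper()) is not None


if __name__ == "__main__":
    print(has_caax_box("MGSGGCVIM"))  # toy sequence ending in CVIM -> True
    print(has_caax_box("MGSGGCC"))    # toy Rab-like C-terminal CC -> False
```

    Anchoring the pattern with $ reflects that the motif sits at the carboxyl terminus; a stricter or looser residue set for 'a' can be substituted in ALIPHATIC.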

    Interpro description:

    Protein prenylation is the posttranslational attachment of either a farnesyl group or a geranylgeranyl group via a thioether linkage (-C-S-C-) to a cysteine at or near the carboxyl terminus of the protein. Farnesyl and geranylgeranyl groups are polyisoprenes, unsaturated hydrocarbons with a multiple of five carbons; the chain is 15 carbons long in the farnesyl moiety and 20 carbons long in the geranylgeranyl moiety. There are three different protein prenyltransferases in humans: farnesyltransferase (FT) and geranylgeranyltransferase 1 (GGT1) share the same motif (the CaaX box) around the cysteine in their substrates, and are thus called CaaX prenyltransferases, whereas geranylgeranyltransferase 2 (GGT2, also called Rab geranylgeranyltransferase) recognises a different motif and is thus called a non-CaaX prenyltransferase. Protein prenyltransferases are currently known only in eukaryotes, but they are widespread, being found in vertebrates, insects, nematodes, plants, fungi and protozoa, including several parasites.

    Each enzyme consists of two subunits, alpha and beta; the alpha subunit of FT and GGT1 is encoded by the same gene, FNTA. The alpha subunit is thought to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. In the alpha subunits of both types of protein prenyltransferases, seven tetratricopeptide repeats are formed by pairs of helices that are stabilized by conserved intercalating residues. The alpha subunits of GGT2 in mammals and plants also have an immunoglobulin-like domain between the fifth and sixth tetratricopeptide repeats, as well as leucine-rich repeats at the carboxyl terminus. The functions of these additional domains in GGT2 are as yet undefined, but they are apparently not directly involved in the interaction with substrates and Rab escort proteins. The tetratricopeptide repeats of the alpha subunit form a right-handed superhelix, which embraces the (alpha-alpha)6 barrel of the beta subunit.

    Proteins where this domain is known:
    PY02453    PY04103   


    PF01245 - Ribosomal_L19 (Pfam link)

    Interpro entry IPR001857 : Ribosomal protein L19 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins, including L19 from bacteria and the chloroplasts of red algae.

    L19 is a protein of 120 to 130 amino-acid residues.

    Proteins where this domain is known:
    PY04391   


    PF01246 - Ribosomal_L24e (Pfam link)

    Interpro entry IPR000988 : Ribosomal protein L24e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of mammalian ribosomal protein L24; yeast ribosomal protein L30A/B (Rp29) (YL21); Kluyveromyces lactis ribosomal protein L30; Arabidopsis thaliana ribosomal protein L24 homolog; Haloarcula marismortui ribosomal protein HL21/HL22; and Methanocaldococcus jannaschii (Methanococcus jannaschii) MJ1201. These proteins have 60 to 160 amino-acid residues.

    Proteins where this domain is known:
    PY00711    PY06973   


    PF01247 - Ribosomal_L35Ae (Pfam link)

    Interpro entry IPR001780 : Ribosomal protein L35Ae (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

    These proteins have 87 to 110 amino-acid residues.

    Proteins where this domain is known:
    PY05736   


    PF01248 - Ribosomal_L7Ae (Pfam link)

    Interpro entry IPR004038 : (Interpro link)

    Pfam description:
    This family includes: Ribosomal L7A from metazoa, Ribosomal L8-A and L8-B from fungi, 30S ribosomal protein HS6 from archaebacteria, 40S ribosomal protein S12 from eukaryotes, ribosomal protein L30 from eukaryotes and archaebacteria, Gadd45 and MyD118.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This family includes: Ribosomal L7A from metazoa, Ribosomal L8-A and L8-B from fungi, 30S ribosomal protein HS6 from archaebacteria, 40S ribosomal protein S12 from eukaryotes, ribosomal protein L30 from eukaryotes and archaebacteria, Gadd45 and MyD118.

    Proteins where this domain is known:
    PY02904    PY03136    PY05200    PY05755    PY06460   


    PF01249 - Ribosomal_S21e (Pfam link)

    Interpro entry IPR001931 : Ribosomal protein S21e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. These proteins have 82 to 87 amino acids. The amino termini are all N-alpha-acetylated. The N-terminal halves of the protein molecules are highly conserved, in contrast to the carboxy-terminal parts.

    Proteins where this domain is known:
    PY06953   


    PF01250 - Ribosomal_S6 (Pfam link)

    Interpro entry IPR000529 : Ribosomal protein S6 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial, red algal chloroplast and cyanelle S6 ribosomal proteins.

    Proteins where this domain is known:
    PY03622   


    PF01251 - Ribosomal_S7e (Pfam link)

    Interpro entry IPR000554 : Ribosomal protein S7e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of Xenopus S8, and mammalian, insect and yeast S7. These proteins have about 200 amino acids.

    Proteins where this domain is known:
    PY03900   


    PF01253 - SUI1 (Pfam link)

    Interpro entry IPR001950 : Translation initiation factor SUI1 (Interpro link)

    Interpro description:
    In Saccharomyces cerevisiae (baker's yeast), SUI1 is a translation initiation factor that functions in concert with eIF-2 and the initiator tRNA-Met in directing the ribosome to the proper start site of translation. SUI1 is a protein of 108 residues. Close homologs of SUI1 have been found in mammals, insects and plants. SUI1 is also evolutionarily related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus vannielii.

    Proteins where this domain is known:
    PY01209    PY04116   


    PF01255 - Prenyltransf (Pfam link)

    Interpro entry IPR001441 : Di-trans-poly-cis-decaprenylcistransferase-like (Interpro link)

    Pfam description:
    Previously known as uncharacterized protein family UPF0015, a single member of this family Swiss:O82827 has been identified as an undecaprenyl diphosphate synthase.

    Interpro description:

    Synonym(s): Di-trans-poly-cis-undecaprenyl-diphosphate synthase, Undecaprenyl pyrophosphate synthetase, Undecaprenyl pyrophosphate synthase, UPP synthetase

    Di-trans-poly-cis-decaprenylcistransferase (UPP synthetase) generates undecaprenyl pyrophosphate (UPP) from isopentenyl pyrophosphate (IPP). This bacterial enzyme is also found in archaebacteria and in a number of uncharacterised proteins including some from yeasts.

    This entry also matches related enzymes that transfer alkyl groups, such as dehydrodolichyl diphosphate synthase.

    Proteins where this domain is known:
    PY05574    PY05575   


    PF01256 - Carb_kinase (Pfam link)

    Interpro entry IPR000631 : (Interpro link)

    Pfam description:
    This family is related to Pfam:PF02110 and Pfam:PF00294, implying that it is also a carbohydrate kinase (personal observation, C. Yeats).

    Interpro description:

    This family is related to hydroxyethylthiazole kinase and PfkB carbohydrate kinase, implying that it is also a carbohydrate kinase.

    Several uncharacterised proteins have been shown to share regions of similarity, including yeast chromosome XI hypothetical protein YKL151c; Caenorhabditis elegans hypothetical protein R107.2; Escherichia coli hypothetical protein yjeF; Bacillus subtilis hypothetical protein yxkO; Helicobacter pylori hypothetical protein HP1363; Mycobacterium tuberculosis hypothetical protein MtCY77.05c; Mycobacterium leprae hypothetical protein B229_C2_201; Synechocystis sp. (strain PCC 6803) hypothetical protein sll1433; and Methanocaldococcus jannaschii (Methanococcus jannaschii) hypothetical protein MJ1586. These are proteins of about 30 to 40 kDa whose central region is well conserved.

    Proteins where this domain is known:
    PY02608   


    PF01261 - AP_endonuc_2 (Pfam link)

    Interpro entry IPR012307 : (Interpro link)

    Pfam description:
    This TIM alpha/beta barrel structure is found in xylose isomerase (Swiss:P19148) and in endonuclease IV (Swiss:P12638, EC:3.1.21.2). This domain is also found in the N termini of bacterial myo-inositol catabolism proteins. These are involved in the myo-inositol catabolism pathway and are required for growth on myo-inositol in Rhizobium leguminosarum bv. viciae.

    Interpro description:

    This TIM alpha/beta barrel structure is found in xylose isomerase and in endonuclease IV. This domain is also found in the N termini of bacterial myo-inositol catabolism proteins. These are involved in the myo-inositol catabolism pathway and are required for growth on myo-inositol in Rhizobium leguminosarum bv. viciae.

    Proteins where this domain is known:
    PY05725   


    PF01262 - AlaDh_PNT_C (Pfam link)

    Interpro entry IPR007698 : Alanine dehydrogenase/PNT, C-terminal (Interpro link)

    Pfam description:
    This family now also contains the lysine 2-oxoglutarate reductases.

    Interpro description:

    Alanine dehydrogenases and pyridine nucleotide transhydrogenase have been shown to share regions of similarity. Alanine dehydrogenase catalyzes the NAD-dependent reversible reductive amination of pyruvate into alanine. Pyridine nucleotide transhydrogenase catalyzes the reduction of NADP+ to NADPH with the concomitant oxidation of NADH to NAD+. This enzyme is located in the plasma membrane of prokaryotes and in the inner membrane of the mitochondria of eukaryotes. The transhydrogenation between NADH and NADP is coupled with the translocation of a proton across the membrane. In prokaryotes the enzyme is composed of two different subunits, an alpha chain (gene pntA) and a beta chain (gene pntB), while in eukaryotes it is a single-chain protein. The sequences of alanine dehydrogenase from several bacterial species are related to those of the alpha subunit of bacterial pyridine nucleotide transhydrogenase and of the N-terminal half of the eukaryotic enzyme. The two most conserved regions correspond respectively to the N-terminal extremity of these proteins and to a central glycine-rich region which is part of the NAD(H)-binding site.

    This is a C-terminal domain of alanine dehydrogenases. This domain is also found in the lysine 2-oxoglutarate reductases.

    Proteins where this domain is known:
    PY05907   


    PF01264 - Chorismate_synt (Pfam link)

    Interpro entry IPR000453 : Chorismate synthase (Interpro link)

    Interpro description:
    Chorismate synthase catalyzes the last of the seven steps in the shikimate pathway, which is used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate, which can then be used in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin cofactor (FMNH2 or FADH2) for its activity. Chorismate synthase from various sources shows a high degree of sequence conservation. It is a protein of about 360 to 400 amino-acid residues.

    Proteins where this domain is known:
    PY04071   


    PF01265 - Cyto_heme_lyase (Pfam link)

    Interpro entry IPR000511 : Cytochrome c and c1 haem-lyase (Interpro link)

    Interpro description:
    Cytochrome c haem-lyase (CCHL) and cytochrome c1 haem-lyase (CC1HL) are mitochondrial enzymes that catalyse the covalent attachment of a haem group to two cysteine residues of cytochrome c and c1. These two enzymes are functionally and evolutionarily related. There are two conserved regions: the first is located in the central section and the second in the C-terminal section. Both contain conserved histidine, tryptophan and acidic residues, which could be important for the interaction of the enzymes with the apoproteins and/or the haem group.

    Proteins where this domain is known:
    PY05914   


    PF01266 - DAO (Pfam link)

    Interpro entry IPR006076 : FAD dependent oxidoreductase (Interpro link)

    Pfam description:
    This family includes various FAD dependent oxidoreductases: Glycerol-3-phosphate dehydrogenase EC:1.1.99.5, Sarcosine oxidase beta subunit EC:1.5.3.1, D-alanine oxidase EC:1.4.99.1, D-aspartate oxidase EC:1.4.3.1.

    Interpro description:
    This entry includes various FAD dependent oxidoreductases: Glycerol-3-phosphate dehydrogenase, Sarcosine oxidase beta subunit, D-alanine oxidase, D-aspartate oxidase.

    D-amino acid oxidase (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterised and sequenced in fungi and vertebrates where they are known to be located in the peroxisomes. D-aspartate oxidase (DASOX) is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown to be important for the enzyme's catalytic activity.

    Proteins where this domain is known:
    PY05233    PY05303   


    PF01269 - Fibrillarin (Pfam link)

    Interpro entry IPR000692 : Fibrillarin (Interpro link)

    Interpro description:
    Fibrillarin is a component of a nucleolar small nuclear ribonucleoprotein (snRNP), functioning in vivo in ribosomal RNA processing. It is associated with U3, U8 and U13 small nuclear RNAs in mammals and is similar to the yeast NOP1 protein. Fibrillarin has a well-conserved sequence of around 320 amino acids and contains three domains: an N-terminal Gly/Arg-rich region; a central domain resembling other RNA-binding proteins and containing an RNP-2-like consensus sequence; and a C-terminal alpha-helical domain. An evolutionarily related pre-rRNA processing protein, which lacks the Gly/Arg-rich domain, has been found in various archaebacteria.

    Proteins where this domain is known:
    PY01071   


    PF01276 - OKR_DC_1 (Pfam link)

    Interpro entry IPR000310 : Orn/Lys/Arg decarboxylase, major region (Interpro link)

    Interpro description:
    Pyridoxal-dependent decarboxylases are bacterial proteins acting on ornithine, lysine, arginine and related substrates. One of the regions of sequence similarity contains a conserved lysine residue, which is the site of attachment of the pyridoxal-phosphate group.

    Proteins where this domain is known:
    PY00349   


    PF01280 - Ribosomal_L19e (Pfam link)

    Interpro entry IPR000196 : Ribosomal protein L19/L19e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This entry represents the ribosomal protein L19 from eukaryotes, as well as L19e from archaea. L19/L19e is absent in bacteria. L19/L19e is part of the large ribosomal subunit, whose structure has been determined in a number of eukaryotic and archaeal species. L19/L19e is a multi-helical protein consisting of two different 3-helical domains connected by a long, partly helical linker.

    Proteins where this domain is known:
    PY05486   


    PF01281 - Ribosomal_L9_N (Pfam link)

    Interpro entry IPR000244 : Ribosomal protein L9 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins grouped on the basis of sequence similarities.

    The crystal structure of Bacillus stearothermophilus L9 shows that the 149-residue protein comprises two globular domains connected by a rigid linker. Each domain contains an rRNA-binding site, and the protein functions as a structural protein in the large subunit of the ribosome. The C-terminal domain consists of two loops, an alpha-helix and a three-stranded mixed parallel/anti-parallel beta-sheet packed against the central alpha-helix. The long central alpha-helix is exposed to solvent in the middle and participates in the hydrophobic cores of the two domains at both ends.

    Proteins where this domain is known:
    PY05802   


    PF01282 - Ribosomal_S24e (Pfam link)

    Interpro entry IPR001976 : Ribosomal protein S24e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This family contains the S24e ribosomal proteins from eukaryotes and archaebacteria. These proteins have 101 to 148 amino acids.

    Proteins where this domain is known:
    PY06213   


    PF01283 - Ribosomal_S26e (Pfam link)

    Interpro entry IPR000892 : Ribosomal protein S26e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families, the S26E family, includes mammalian S26; Octopus S26; Drosophila S26 (DS31); plant cytoplasmic S26; and fungal S26. These proteins have 114 to 127 amino acids.

    Proteins where this domain is known:
    PY04712   


    PF01287 - eIF-5a (Pfam link)

    Interpro entry IPR001884 : Eukaryotic initiation factor 5A hypusine (eIF-5A) (Interpro link)

    Interpro description:

    Translation initiation factor 5A (IF-5A) is reported to be involved in the first step of peptide bond formation in translation, to be involved in cell-cycle regulation and to be a cofactor for the Rev and Rex transactivator proteins of human immunodeficiency virus type 1 and human T-cell leukaemia virus type I, respectively. IF-5A contains an unusual amino acid, hypusine (N-epsilon-(4-amino-2-hydroxybutyl)lysine), that is required for its function. The first step in the post-translational modification of lysine to hypusine is catalysed by the enzyme deoxyhypusine synthase, the structure of which has been reported.

    The crystal structure of IF-5A from the archaeon Pyrobaculum aerophilum has been determined to 1.75 Å resolution. Unmodified P. aerophilum IF-5A is found to be a beta structure with two domains and three separate hydrophobic cores. The lysine (Lys42) that is post-translationally modified by deoxyhypusine synthase is found at one end of the IF-5A molecule in a turn between beta strands beta4 and beta5; this lysine residue is freely solvent-accessible. The C-terminal domain is found to be homologous to the cold-shock protein CspA of E. coli, which has a well-characterised RNA-binding fold, suggesting that IF-5A is involved in RNA binding.

    Proteins where this domain is known:
    PY00983   


    PF01293 - PEPCK_ATP (Pfam link)

    Interpro entry IPR001272 : Phosphoenolpyruvate carboxykinase, ATP-utilising (Interpro link)

    Interpro description:

    Phosphoenolpyruvate carboxykinase (PEPCK) catalyses the first committed (rate-limiting) step in hepatic gluconeogenesis, namely the reversible decarboxylation of oxaloacetate to phosphoenolpyruvate (PEP) and carbon dioxide, using either ATP or GTP as a source of phosphate. The ATP-utilising and GTP-utilising enzymes form two divergent subfamilies, which have little sequence similarity but which retain conserved active site residues. ATP-utilising PEPCKs are monomers or oligomers of identical subunits found in certain bacteria, yeast, trypanosomatids, and plants, while GTP-utilising PEPCKs are mainly monomers found in animals and some bacteria. Both require divalent cations for activity, such as magnesium or manganese. One cation interacts with the enzyme at metal binding site 1 to elicit activation, while the second cation interacts at metal binding site 2 to serve as a metal-nucleotide substrate. In bacteria, fungi and plants, PEPCK is involved in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle.
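    For the ATP-utilising enzymes covered by this entry, the overall reaction is oxaloacetate + ATP <=> phosphoenolpyruvate + ADP + CO2. The short Python check below is a sketch confirming that this equation balances in atoms and charge. The ionisation states are an assumption (the forms usually written near physiological pH), and the Mg2+/Mn2+ cofactors are left out; the names `SPECIES` and `side_totals` are hypothetical.

```python
from collections import Counter

# Elemental composition and net charge of each species. Assumption: the
# ionisation states commonly written near physiological pH; metal ions omitted.
SPECIES = {
    "oxaloacetate": (Counter(C=4, H=2, O=5), -2),
    "ATP": (Counter(C=10, H=12, N=5, O=13, P=3), -4),
    "phosphoenolpyruvate": (Counter(C=3, H=2, O=6, P=1), -3),
    "ADP": (Counter(C=10, H=12, N=5, O=10, P=2), -3),
    "CO2": (Counter(C=1, O=2), 0),
}


def side_totals(names):
    """Sum atom counts and net charge over one side of a reaction."""
    atoms, charge = Counter(), 0
    for name in names:
        composition, z = SPECIES[name]
        atoms.update(composition)  # Counter.update adds counts element by element
        charge += z
    return atoms, charge


left = side_totals(["oxaloacetate", "ATP"])
right = side_totals(["phosphoenolpyruvate", "ADP", "CO2"])
print(left == right)  # True: atoms and charge balance
```

    Both sides come to C14 H14 N5 O18 P3 with a net charge of -6, so the phosphoryl group from ATP and the departing CO2 fully account for the conversion of oxaloacetate to PEP.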

    PEPCK helps to regulate blood glucose levels. The rate of gluconeogenesis can be controlled through transcriptional regulation of the PEPCK gene by cAMP (the mediator of glucagon and catecholamines), glucocorticoids and insulin. In general, PEPCK expression is induced by glucagon, catecholamines and glucocorticoids during periods of fasting and in response to stress, but is inhibited by (glucose-induced) insulin upon feeding. With type II diabetes, this regulation system can fail, resulting in increased gluconeogenesis that in turn raises glucose levels.

    PEPCK consists of an N-terminal and a catalytic C-terminal domain, with the active site and metal ions located in a cleft between them. Both domains have an alpha/beta topology and are partly similar to one another. Substrate binding causes PEPCK to undergo a conformational change, which accelerates catalysis by forcing bulk solvent molecules out of the active site. PEPCK uses an alpha/beta/alpha motif for nucleotide binding, a motif that differs from those of other kinase domains. GTP-utilising PEPCK has a PEP-binding domain and two kinase motifs to bind GTP and magnesium.

    This entry represents ATP-utilising phosphoenolpyruvate carboxykinase enzymes.

    Proteins where this domain is known:
    PY01233   


    PF01294 - Ribosomal_L13e (Pfam link)

    Interpro entry IPR001380 : Ribosomal protein L13e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The ribosomal protein L13e is widely found in vertebrates, Drosophila melanogaster, plants, yeast and others.

    Proteins where this domain is known:
    PY06465   


    PF01300 - Sua5_yciO_yrdC (Pfam link)

    Interpro entry IPR006070 : (Interpro link)

    Pfam description:
    This domain has been shown to preferentially bind to dsRNA. The domain is found in SUA5 Swiss:P32579 as well as HypF and YrdC Swiss:P45748.

    Interpro description:

    The YrdC family of hypothetical proteins is widely distributed in eukaryotes and prokaryotes, and its members occur as: (i) independent proteins, (ii) proteins with C-terminal extensions, and (iii) domains in larger proteins, some of which are implicated in regulation. The YrdC protein, which consists solely of this domain, forms an alpha/beta twisted open-sheet structure composed of seven alpha helices and seven beta strands. YrdC from Escherichia coli preferentially binds to double-stranded RNA and DNA. YrdC is predicted to be an rRNA maturation factor, as deletions in its gene lead to immature ribosomal 30S subunits and, consequently, fewer translating ribosomes. Therefore, YrdC may function by maintaining an rRNA structure needed for proper processing of 16S rRNA, especially at lower temperatures. Sua5 is an example of a multi-domain protein that contains an N-terminal YrdC-like domain and a C-terminal Sua5 domain. Sua5 was identified in Saccharomyces cerevisiae (baker's yeast) as a suppressor of a translation initiation defect in the cytochrome c gene and is required for normal growth in yeast; however, its exact function remains unknown. HypF is involved in the synthesis of the active site of [NiFe]-hydrogenases.

    Proteins where this domain is known:
    PY05913   


    PF01302 - CAP_GLY (Pfam link)

    Interpro entry IPR000938 : (Interpro link)

    Pfam description:
    Cytoskeleton-associated proteins (CAPs) are involved in the organisation of microtubules and transportation of vesicles and organelles along the cytoskeletal network. A conserved motif, CAP-Gly, has been identified in a number of CAPs, including CLIP-170 and dynactins. The crystal structure of Caenorhabditis elegans F53F4.3 protein Swiss:Q20728 CAP-Gly domain was recently solved. The domain contains three beta-strands. The most conserved sequence, GKNDG, is located in two consecutive sharp turns on the surface, forming the entrance to a groove.

    Interpro description:

    Cytoskeleton-associated proteins (CAPs) consist of three distinct parts: an N-terminal section that is most probably globular and contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and, finally, a short C-terminal globular domain. The CAP-Gly domain is a conserved, glycine-rich domain of about 42 residues found in some CAPs. Proteins known to contain this domain include restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 kDa protein associated with intermediate filaments that links endocytic vesicles to microtubules; vertebrate dynactin (150 kDa dynein-associated polypeptide; DAP) and Drosophila glued, a major component of activator I; yeast protein BIK1, which seems to be required for the formation or stabilisation of microtubules during mitosis and for spindle pole body fusion during conjugation; yeast protein NIP100 (NIP80); human protein CKAP1/TFCB; Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical protein F53F4.3. The latter proteins contain an N-terminal ubiquitin domain and a C-terminal CAP-Gly domain.

    The crystal structure of the CAP-Gly domain of C. elegans F53F4.3 protein, solved by single wavelength sulphur-anomalous phasing, revealed a novel protein fold containing three beta-sheets. The most conserved sequence, GKNDG, is located in two consecutive sharp turns on the surface, forming the entrance to a groove. Residues in the groove are highly conserved as measured from the information content of the aligned sequences. The C-terminal tail of another molecule in the crystal is bound in this groove.

    Proteins where this domain is known:
    PY00136    PY05144   


    PF01305 - Ribosomal_L15 (Pfam link)

    Interpro entry IPR001196 : Ribosomal protein L15 (Interpro link)

    Pfam description:
    This family is always associated with Pfam:PF00256. This family is diagnostic of ribosomal L15 proteins.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind the 23S rRNA. Ribosomal protein L15 from bacteria and plant chloroplasts (nuclear-encoded) belongs to this family. Vertebrate L27a, Tetrahymena thermophila L29 and fungal L27a (L29, CRP-1, CYH2) are also members of this group.

    Ribosomal protein L18e from a number of archaebacteria shows homology to both the eukaryotic L18 and eubacterial ribosomal protein L15, an observation that has been taken to support the view that archaea represent an evolutionary stage between bacteria and eukaryotes.

    Proteins where this domain is known:
    PY00688   


    PF01321 - Creatinase_N (Pfam link)

    Interpro entry IPR000587 : Creatinase (Interpro link)

    Pfam description:
    This family includes the N-terminal non-catalytic domains from creatinase and prolidase. The exact function of this domain is uncertain.

    Interpro description:

    Creatinase or creatine amidinohydrolase catalyses the conversion of creatine and water to sarcosine and urea. The enzyme works as a homodimer, and is induced by choline chloride. Each monomer of creatinase has two clearly defined domains, a small N-terminal domain, and a large C-terminal domain.

    The structure of the C-terminal region represents the "pita-bread" fold. The fold contains both alpha helices and an anti-parallel beta sheet within two structurally similar domains that are thought to be derived from an ancient gene duplication. The active site, where conserved, is located between the two domains. The fold is common to methionine aminopeptidase, aminopeptidase P, prolidase, agropine synthase and creatinase. Though many of these peptidases require a divalent cation, creatinase is not a metal-dependent enzyme.

    Proteins where this domain is known:
    PY00855   


    PF01327 - Pep_deformylase (Pfam link)

    Interpro entry IPR000181 : Formylmethionine deformylase (Interpro link)

    Interpro description:

    Peptide deformylase (PDF) is an essential metalloenzyme required for the removal of the formyl group at the N-terminus of nascent polypeptide chains in eubacteria. The enzyme acts as a monomer and binds a single zinc ion, catalysing the reaction:

     N-formyl-L-methionine + H2O = formate + methionyl peptide 
    Catalytic efficiency strongly depends on the identity of the bound metal.

    The structure of these enzymes is known. PDF, a member of the zinc metalloprotease family, comprises an active core domain of 147 residues and a C-terminal tail of 21 residues. The 3D fold of the catalytic core has been determined by X-ray crystallography and NMR. Overall, the structure contains a series of anti-parallel beta-strands that surround two perpendicular alpha-helices. The C-terminal helix contains the characteristic HEXXH motif of metalloenzymes, which is crucial for activity. The helical arrangement, and the way the histidine residues bind the zinc ion, is reminiscent of other metalloproteases, such as thermolysin or metzincins. However, the arrangement of secondary and tertiary structures of PDF, and the positioning of its third zinc ligand (a cysteine residue), are quite different. These discrepancies, together with notable biochemical differences, suggest that PDF constitutes a new class of zinc metalloproteases.
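    A simple way to look for the HEXXH motif in a protein sequence is a regular-expression scan, sketched below in Python. The function name and the toy sequences are hypothetical. A match only flags a candidate zinc-binding site: PDF also relies on a cysteine as its third zinc ligand, and Pfam itself assigns domains with profile hidden Markov models rather than motif matching.

```python
import re

# HEXXH: His, Glu, any two residues, His. The zero-width lookahead lets the
# scan report overlapping occurrences (e.g. two motifs sharing a histidine).
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched residues) for every HEXXH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(sequence.upper())]


# Hypothetical toy sequences, not real PDF proteins.
print(find_hexxh("MAGHELLHQGHEAAHK"))  # [(4, 'HELLH'), (11, 'HEAAH')]
print(find_hexxh("HEAAHEAAH"))         # [(1, 'HEAAH'), (5, 'HEAAH')]
```

    Without the lookahead, the second example would report only the first motif, because an ordinary match consumes the shared histidine.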

    Proteins where this domain is known:
    PY01738   


    PF01331 - mRNA_cap_enzyme (Pfam link)

    Interpro entry IPR001339 : mRNA capping enzyme (Interpro link)

    Pfam description:
    This family represents the ATP binding catalytic domain of the mRNA capping enzyme.

    Interpro description:

    The mRNA capping enzyme in yeasts is composed of two separate chains: alpha, an mRNA guanyltransferase, and beta, an RNA 5'-triphosphatase. X-ray crystallography reveals a large conformational change during guanyl transfer by mRNA capping enzymes. Binding of the enzyme to nucleotides is specific to the GMP moiety of GTP. The viral mRNA capping enzyme is a monomer that transfers a GMP cap onto the end of mRNA that terminates with a 5'-diphosphate tail.

    Proteins where this domain is known:
    PY05095   


    PF01336 - tRNA_anti (Pfam link)

    Interpro entry IPR004365 : Nucleic acid binding, OB-fold, tRNA/helicase-type (Interpro link)

    Pfam description:
    This family contains OB-fold domains that bind to nucleic acids. The family includes the anti-codon binding domain of lysyl, aspartyl, and asparaginyl-tRNA synthetases (see Pfam:PF00152). Aminoacyl-tRNA synthetases catalyse the addition of an amino acid to the appropriate tRNA molecule (EC:6.1.1.-). This family also includes part of RecG helicase, which is involved in DNA repair. Replication factor A is a heterotrimeric complex that contains a subunit in this family. This domain is also found at the C-terminus of bacterial DNA polymerase III alpha chain.

    Interpro description:

    The OB-fold (oligonucleotide/oligosaccharide-binding fold) is found in all three kingdoms and its common architecture presents a binding face that has adapted to bind different ligands. The OB-fold is a five/six-stranded closed beta-barrel formed by 70-80 amino acid residues. The strands are connected by loops of varying length which form the functional appendages of the protein. The majority of OB-fold proteins use the same face for ligand binding or as an active site. Different OB-fold proteins use this 'fold-related binding face' to, variously, bind oligosaccharides, oligonucleotides, proteins, metal ions and catalytic substrates.

    This entry contains OB-fold domains that bind to nucleic acids. It includes the anti-codon binding domain of lysyl, aspartyl, and asparaginyl-tRNA synthetases. Aminoacyl-tRNA synthetases catalyse the addition of an amino acid to the appropriate tRNA molecule. The domain is found in RecG helicase, which is involved in DNA repair. Replication factor A is a heterotrimeric complex that contains a subunit in this family. This domain is also found at the C-terminus of bacterial DNA polymerase III alpha chain.

    Proteins where this domain is known:
    PY00067    PY00115    PY01511    PY01996    PY02504    PY03253    PY04090    PY04594    PY05639    PY05658    PY07197   


    PF01344 - Kelch_1 (Pfam link)

    Interpro entry IPR006652 : (Interpro link)

    Pfam description:
    The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the motif. It has been shown that Swiss:Q04652 is related to galactose oxidase, for which a structure has been solved. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found in Pfam:PF00064, Pfam:PF00400 and Pfam:PF00415.

    Interpro description:

    Kelch is a 50-residue motif, named after the Drosophila mutant in which it was first identified. This sequence motif represents one beta-sheet blade, and several of these repeats can associate to form a beta-propeller. For instance, the motif appears 6 times in Drosophila egg-chamber regulatory protein, creating a 6-bladed beta-propeller. The motif is also found in mouse protein MIPP and in a number of poxviruses. In addition, kelch repeats have been recognised in alpha- and beta-scruin, and in galactose oxidase from the fungus Dactylium dendroides. The structure of galactose oxidase reveals that the repeated sequence corresponds to a 4-stranded anti-parallel beta-sheet motif that forms the repeat unit in a super-barrel structural fold.

    The known functions of kelch-containing proteins are diverse: scruin is an actin cross-linking protein; galactose oxidase catalyses the oxidation of the hydroxyl group at the C6 position in D-galactose; neuraminidase hydrolyses sialic acid residues from glycoproteins; and kelch may have a cytoskeletal function, as it is localised to the actin-rich ring canals that connect the 15 nurse cells to the developing oocyte in Drosophila. Nevertheless, based on the location of the kelch pattern in the catalytic unit in galactose oxidase, functionally important residues have been predicted in glyoxal oxidase.

    This entry represents a type of kelch sequence motif that comprises one beta-sheet blade.

    Proteins where this domain is known:
    PY00078    PY00272    PY00275    PY01661    PY01757    PY02946    PY05008    PY05921    PY06605    PY07399    PY07654   

    Proteins where this domain has been detected by our approach:
    PY01412    PY02289   


    PF01351 - RNase_HII (Pfam link)

    Interpro entry IPR001352 : Ribonuclease HII/HIII (Interpro link)

    Interpro description:

    Ribonuclease HII is involved in the degradation of the ribonucleotide moiety of RNA-DNA hybrid molecules, carrying out endonucleolytic cleavage to produce 5'-phosphomonoesters. Proteins belonging to this family have been found in bacteria, archaea and yeasts. This family also includes ribonuclease HIII.

    Proteins where this domain is known:
    PY06561   


    PF01363 - FYVE (Pfam link)

    Interpro entry IPR000306 : (Interpro link)

    Pfam description:
    The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1, and EEA1. The FYVE finger has been shown to bind two Zn++ ions. The FYVE finger has eight potential zinc coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue. We have included members which do not conserve these histidine residues but are clearly related.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1, and EEA1. The FYVE finger has been shown to bind two zinc ions. The FYVE finger has eight potential zinc coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue. FYVE-type domains are divided into two known classes: FYVE domains that specifically bind to phosphatidylinositol 3-phosphate in lipid bilayers and FYVE-related domains of undetermined function. Those that bind to phosphatidylinositol 3-phosphate are often found in proteins targeted to lipid membranes that are involved in regulating membrane traffic. Most FYVE domains target proteins to endosomes by binding specifically to phosphatidylinositol-3-phosphate at the membrane surface. By contrast, the CARP2 FYVE-like domain is not optimized to bind to phosphoinositides or insert into lipid bilayers. FYVE domains are distinguished from other zinc fingers by three signature sequences: an N-terminal WxxD motif, a basic R(R/K)HHCR patch, and a C-terminal RVC motif.
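    The three FYVE signatures above can be used to screen a sequence. The Python sketch below searches for WxxD, R(R/K)HHCR and RVC and checks that they occur in N- to C-terminal order. `scan_fyve_signatures` and `looks_like_fyve` are hypothetical helpers for illustration only. Pfam detects domains with profile HMMs, not exact motifs, so a real FYVE domain can be missed by this check and a random sequence can pass it.

```python
import re

# The three FYVE signature sequences named in the InterPro text. Lookaheads
# let overlapping hits be reported.
FYVE_SIGNATURES = {
    "WxxD": re.compile(r"(?=(W..D))"),
    "R(R/K)HHCR": re.compile(r"(?=(R[RK]HHCR))"),
    "RVC": re.compile(r"(?=(RVC))"),
}


def scan_fyve_signatures(seq: str) -> dict:
    """Return the 0-based start positions of each signature in seq."""
    seq = seq.upper()
    return {name: [m.start() for m in pat.finditer(seq)]
            for name, pat in FYVE_SIGNATURES.items()}


def looks_like_fyve(seq: str) -> bool:
    """True if WxxD, then the basic patch, then RVC occur in that order."""
    hits = scan_fyve_signatures(seq)
    for w in hits["WxxD"]:
        for b in hits["R(R/K)HHCR"]:
            if b > w and any(r > b for r in hits["RVC"]):
                return True
    return False
```

    For example, the made-up peptide "MWSQDGGGRRHHCRGGRVC" passes, while the same motifs in reverse order do not.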

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01916   

    Proteins where this domain has been detected by our approach:
    PY05522   


    PF01367 - 5_3_exonuc (Pfam link)

    Interpro entry IPR002421 : 5'-3' exonuclease (Interpro link)

    Interpro description:

    The N-terminal and internal 5'-3' exonuclease domains are commonly found together and are most often associated with 5' to 3' nuclease activities. The XPG protein signatures are never found outside the '53EXO' domains, whereas the 53EXO domains themselves occur in a more diverse set of proteins. The number of amino acids separating the two 53EXO domains, together with the presence of accompanying motifs, allows several protein families to be diagnosed.

    In the eubacterial type A DNA polymerases, the N-terminal and internal domains are separated by a few amino acids, usually four, and the DNA_POLYMERASE_A pattern is always present towards the C-terminus. Several eukaryotic structure-dependent endonucleases and exonucleases have their 53EXO domains separated by 24 to 27 amino acids, and the XPG protein signatures are always present. In several proteins from herpesviridae, the two 53EXO domains are separated by 50 to 120 amino acids; these proteins are implicated in inhibiting host gene expression. Eukaryotic DNA repair proteins with 600 to 700 amino acids between the 53EXO domains all carry the XPG protein signatures.
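    The spacing rules above amount to a small decision table. The sketch below encodes them as inclusive ranges, with a flag for families that always carry the XPG signatures. It is a minimal illustration under stated assumptions. `classify_53exo` is a hypothetical name. The 1-10 window for "a few amino acids, usually four" is an assumed cut-off, not part of the source. The DNA_POLYMERASE_A pattern check is left out.

```python
from typing import Optional

# (min gap, max gap, family, XPG signatures required); ranges are inclusive.
# The 1-10 window for type A polymerases is an ASSUMED reading of
# "a few amino acids, usually four".
SPACING_RULES = [
    (1, 10, "eubacterial type A DNA polymerase", False),
    (24, 27, "eukaryotic structure-dependent endo/exonuclease", True),
    (50, 120, "herpesviridae protein (host gene expression inhibition)", False),
    (600, 700, "eukaryotic DNA repair protein", True),
]


def classify_53exo(gap: int, has_xpg: bool) -> Optional[str]:
    """Suggest a family from the residue gap between the two 53EXO domains."""
    for lo, hi, family, needs_xpg in SPACING_RULES:
        if lo <= gap <= hi and (has_xpg or not needs_xpg):
            return family
    return None  # spacing (or XPG status) matches none of the described families
```

    A gap of 25 residues without XPG signatures returns `None`, because the text says that family always carries them.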

    Proteins where this domain is known:
    PY01683    PY02238   


    PF01369 - Sec7 (Pfam link)

    Interpro entry IPR000904 : SEC7-like (Interpro link)

    Pfam description:
    The Sec7 domain is a guanine-nucleotide-exchange-factor (GEF) for the Pfam:PF00025 family.

    Interpro description:
    The SEC7 domain was named after the first protein found to contain such a region. It has been shown to be linked with guanine nucleotide exchange function. The 3D structure of the domain displays several alpha-helices. It was found to be associated with other domains involved in guanine nucleotide exchange (e.g., CDC25, Dbl) in mammalian factors.

    Proteins where this domain is known:
    PY00094   


    PF01370 - Epimerase (Pfam link)

    Interpro entry IPR001509 : NAD-dependent epimerase/dehydratase (Interpro link)

    Pfam description:
    This family of proteins utilise NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates for a variety of chemical reactions.

    Interpro description:

    This family of proteins utilise NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates for a variety of chemical reactions. It contains the NAD(P)-binding domain, a commonly found domain with a core Rossmann-type fold. One of the best studied of these proteins is UDP-galactose 4-epimerase, which catalyses the conversion of UDP-galactose to UDP-glucose during galactose metabolism.

    Proteins where this domain is known:
    PY01921    PY06463   


    PF01379 - Porphobil_deam (Pfam link)

    Interpro entry IPR000860 : Tetrapyrrole biosynthesis, hydroxymethylbilane synthase (Interpro link)

    Interpro description:

    Tetrapyrroles are large macrocyclic compounds derived from a common biosynthetic pathway. The end-product, uroporphyrinogen III, is used to synthesise a number of important molecules, including vitamin B12, haem, sirohaem, chlorophyll, coenzyme F430 and phytochromobilin.

    The first stage in tetrapyrrole synthesis is the synthesis of 5-aminolaevulinic acid (ALA) via two possible routes: (1) condensation of succinyl-CoA and glycine (C4 pathway) using ALA synthase, or (2) conversion of glutamate (C5 pathway) via three different enzymes: glutamyl-tRNA synthetase to charge a tRNA with glutamate, glutamyl-tRNA reductase to reduce glutamyl-tRNA to glutamate-1-semialdehyde (GSA), and GSA aminotransferase to catalyse a transamination reaction to produce ALA.

    The second stage is to convert ALA to uroporphyrinogen III, the first macrocyclic tetrapyrrolic structure in the pathway. This is achieved by the action of three enzymes in one common pathway: porphobilinogen (PBG) synthase (or ALA dehydratase) to condense two ALA molecules to generate porphobilinogen; hydroxymethylbilane synthase (or PBG deaminase) to polymerise four PBG molecules into preuroporphyrinogen (tetrapyrrole structure); and uroporphyrinogen III synthase to link two pyrrole units together (rings A and D) to yield uroporphyrinogen III.
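    The counts in the second stage fix the stoichiometry: two ALA per PBG and four PBG per tetrapyrrole. The short sketch below makes that arithmetic explicit. `second_stage_requirements` is a hypothetical helper. It assumes uroporphyrinogen III synthase converts its substrate 1:1. It also ignores the enzyme-bound cofactor of hydroxymethylbilane synthase, which is not consumed on each turnover.

```python
ALA_PER_PBG = 2      # PBG synthase condenses two ALA molecules into one PBG
PBG_PER_BILANE = 4   # hydroxymethylbilane synthase polymerises four PBG units


def second_stage_requirements(n_uro3: int) -> dict:
    """Molecule counts for making n_uro3 molecules of uroporphyrinogen III.

    Assumes a 1:1 conversion by uroporphyrinogen III synthase (it links
    rings A and D) and ignores the catalytic dipyrromethane cofactor.
    """
    preuro = n_uro3                  # one preuroporphyrinogen per product
    pbg = preuro * PBG_PER_BILANE    # four PBG per tetrapyrrole
    ala = pbg * ALA_PER_PBG          # two ALA per PBG
    return {"ALA": ala, "PBG": pbg, "preuroporphyrinogen": preuro,
            "uroporphyrinogen III": n_uro3}
```

    Under these assumptions, each molecule of uroporphyrinogen III costs eight molecules of ALA.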

    Uroporphyrinogen III is the first branch point of the pathway. To synthesise cobalamin (vitamin B12), sirohaem, and coenzyme F430, uroporphyrinogen III needs to be converted into precorrin-2 by the action of uroporphyrinogen III methyltransferase. To synthesise haem and chlorophyll, uroporphyrinogen III needs to be decarboxylated into coproporphyrinogen III by the action of uroporphyrinogen III decarboxylase.

    This entry represents hydroxymethylbilane synthase (or porphobilinogen deaminase), which functions during the second stage of tetrapyrrole biosynthesis. This enzyme catalyses the polymerisation of four PBG molecules into the tetrapyrrole structure, preuroporphyrinogen, with the concomitant release of four molecules of ammonia. This enzyme uses a unique dipyrromethane cofactor made from two molecules of PBG, which is covalently attached to a cysteine side chain. The tetrapyrrole product is synthesised in an ordered, sequential fashion: the first pyrrole unit (ring A) attaches to the cofactor, and the remaining pyrrole units (rings B, C and D) are then added in turn to the growing pyrrole chain. The link between the pyrrole chain and the cofactor is broken once all the pyrroles have been added. This enzyme is folded into three distinct domains that enclose a single, large active site, which uses an aspartic acid as its one essential catalytic residue, acting as a general acid/base during catalysis. A deficiency of hydroxymethylbilane synthase is implicated in the neuropathic disease acute intermittent porphyria (AIP).
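    The ordered assembly described above can be pictured as a chain that grows on the bound cofactor and is then cut. The Python sketch below is a toy bookkeeping model of one catalytic cycle, not a chemical simulation. The function name and ring labels are hypothetical. The chain is a list, each added PBG releases one ammonia, and the bond between the cofactor and ring A is cleaved at the end.

```python
from typing import List, Tuple


def hmbs_turnover(cofactor: List[str]) -> Tuple[List[str], int, List[str]]:
    """Toy model of one hydroxymethylbilane synthase cycle.

    cofactor: labels for the two PBG-derived pyrroles of the dipyrromethane
    cofactor, which stays covalently bound to the enzyme.
    Returns (released tetrapyrrole, ammonia released, cofactor left bound).
    """
    chain = list(cofactor)            # enzyme-bound chain starts as the cofactor
    ammonia = 0
    for ring in ("A", "B", "C", "D"):  # ordered, sequential addition
        chain.append(ring)            # each PBG extends the chain...
        ammonia += 1                  # ...with release of one NH3
    # The chain is now a hexapyrrole; cut between the cofactor and ring A.
    product = chain[len(cofactor):]
    remaining = chain[:len(cofactor)]
    return product, ammonia, remaining
```

    Calling `hmbs_turnover(["C1", "C2"])` returns the four rings A to D as product, four ammonia, and the intact cofactor, matching the counts in the text.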

    Proteins where this domain is known:
    PY01828   


    PF01380 - SIS (Pfam link)

    Interpro entry IPR001347 : Sugar isomerase (SIS) (Interpro link)

    Pfam description:
    SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins. SIS domains are also found in proteins that regulate the expression of genes involved in synthesis of phosphosugars. Presumably the SIS domains bind to the end-product of the pathway.

    Interpro description:
    The SIS (Sugar ISomerase) domain is a phosphosugar-binding domain found in many phosphosugar isomerases and phosphosugar binding proteins. SIS domains are also found in proteins that regulate the expression of genes involved in synthesis of phosphosugars possibly by binding to the end-product of the pathway.

    Proteins where this domain is known:
    PY00101   


    PF01381 - HTH_3 (Pfam link)

    Interpro entry IPR001387 : Helix-turn-helix type 3 (Interpro link)

    Pfam description:
    This large family of DNA-binding helix-turn-helix proteins includes Cro (Swiss:P03036) and CI (Swiss:P03034).

    Interpro description:

    This is a large family of DNA-binding helix-turn-helix proteins that includes a bacterial plasmid copy control protein, bacterial methylases, various bacteriophage transcription control proteins and a vegetative-specific protein from Dictyostelium discoideum (slime mould).

    Proteins where this domain is known:
    PY05916   


    PF01384 - PHO4 (Pfam link)

    Interpro entry IPR001204 : Phosphate transporter (Interpro link)

    Pfam description:
    This family includes PHO-4 from Neurospora crassa, which is a Na(+)-phosphate symporter. This family also contains the leukaemia virus receptor Swiss:Q08344.

    Interpro description:

    The PHO-4 family of transporters includes the phosphate-repressible phosphate permease (PHO-4) from Neurospora crassa which is probably a sodium-phosphate symporter. This family also includes the human leukemia virus receptor.

    Proteins where this domain is known:
    PY01393   


    PF01391 - Collagen (Pfam link)

    Interpro entry IPR008160 : (Interpro link)

    Pfam description:
    Members of this family belong to the collagen superfamily. Collagens are generally extracellular structural proteins involved in formation of connective tissue structure. The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine; the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause of scurvy. Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple helical structure.

    Interpro description:
    Members of this family belong to the collagen superfamily. Collagens are generally extracellular structural proteins involved in formation of connective tissue structure. The sequence consists predominantly of G-X-Y repeats, and the polypeptide chains form a triple helix. The first position of the repeat is glycine; the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause of scurvy.

    Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple helical structure.
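    Because the repeat is simply glycine at every third position, a regular expression can locate in-phase G-X-Y runs in a sequence. The sketch below is illustrative only. `find_gxy_runs` is a hypothetical helper, and its default threshold of three consecutive triplets is an assumption. Pfam's Collagen family is detected with a profile HMM, not with a pattern like this.

```python
import re


def find_gxy_runs(seq: str, min_repeats: int = 3) -> list:
    """Return (0-based start, number of triplets) for each G-X-Y run.

    A run is at least min_repeats consecutive triplets whose first residue
    is glycine; X and Y are unconstrained (often Pro/Hyp in real collagens).
    """
    pattern = re.compile(r"(?:G..){%d,}" % min_repeats)
    return [(m.start(), (m.end() - m.start()) // 3)
            for m in pattern.finditer(seq.upper())]
```

    For instance, "MK" followed by five GPP triplets yields a single run of five repeats starting at position 2.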

    Proteins where this domain is known:
    PY00324    PY01286    PY03876    PY05674   

    Proteins where this domain has been detected by our approach:
    PY04858   


    PF01394 - Clathrin_propel (Pfam link)

    Interpro entry IPR001473 : Clathrin, heavy chain, propeller, N-terminal (Interpro link)

    Pfam description:
    Clathrin is the scaffold protein of the basket-like coat that surrounds coated vesicles. The soluble assembly unit, a triskelion, contains three heavy chains and three light chains in an extended three-legged structure. Each leg contains one heavy and one light chain. The N-terminus of the heavy chain is known as the globular domain, and is composed of seven repeats which form a beta propeller.

    Interpro description:

    Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors.

    Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion. The heavy chains form the legs, their N-terminal beta-propeller regions extending outwards, while their C-terminal alpha-alpha-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the beta-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process. The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins.

    This entry represents the N-terminal beta-propeller regions of clathrin heavy chains, which extend away from the hub of the triskelion and are responsible for peptide binding.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY01854   


    PF01398 - Mov34 (Pfam link)

    Interpro entry IPR000555 : (Interpro link)

    Pfam description:
    Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors. This family is also known as the MPN domain and PAD-1-like domain. It has been shown that this domain occurs in prokaryotes.

    Interpro description:

    Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors. This family is also known as the MPN domain and PAD-1-like domain. It has been shown that this domain occurs in prokaryotes.

    Mov34 proteins act as the regulatory subunit of the 26S proteasome, which is involved in the ATP-dependent degradation of ubiquitinated proteins. The function of this domain is unclear, but it is found in the N-terminus of the proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors.

    A number of the proteins associated with this family belong to MEROPS peptidase family M67 (clan M-). This includes the Poh1 peptidase of Saccharomyces cerevisiae (Baker's yeast) which is a component of the 19S proteasome regulatory particle.

    Proteins where this domain is known:
    PY02659    PY03078    PY03442    PY05051   

    Proteins where this domain has been detected by our approach:
    PY01669   


    PF01399 - PCI (Pfam link)

    Interpro entry IPR000717 : (Interpro link)

    Pfam description:
    This domain has also been called the PINT motif (Proteasome, Int-6, Nip-1 and TRIP-15).

    Interpro description:
    A homology domain of unclear function that occurs in the C-terminal region of several regulatory components of the 26S proteasome as well as in other proteins. This domain has also been called the PINT motif (Proteasome, Int-6, Nip-1 and TRIP-15). Apparently, all of the characterised proteins containing PCI domains are parts of larger multi-protein complexes. Proteins with PCI domains include budding yeast proteasome regulatory components Rpn3 (Sun2), Rpn5, Rpn6, Rpn7 and Rpn9; mammalian proteasome regulatory components p55, p58 and p44.5; translation initiation factor 3 complex subunits p110 and INT6; Arabidopsis COP9 and FUS6/COP11; the mammalian G-protein pathway suppressor GPS1; and several uncharacterised ORFs from plants, nematodes and mammals. The complete homology domain comprises approximately 200 residues, with the highest conservation in the C-terminal half. Several of the proteins mentioned above have no detectable homology to the N-terminal half of the domain.

    Proteins where this domain is known:
    PY00119    PY00916    PY01281    PY02643    PY02721    PY03267    PY06923   


    PF01400 - Astacin (Pfam link)

    Interpro entry IPR001506 : Peptidase M12A, astacin (Interpro link)

    Pfam description:
    The members of this family are enzymes that cleave peptides. These proteases require zinc for catalysis. Members of this family contain two conserved disulphide bridges, which join the conserved cysteines 1-4 and 2-3. Members of this family have an amino-terminal propeptide, which is cleaved to give the active protease domain. All other linked domains are found C-terminal to this domain. This family includes astacin (Swiss:P07584), a digestive enzyme from crayfish; meprin (Swiss:Q16819), a multidomain membrane component constructed from homologous alpha and beta chains; and proteins involved in morphogenesis, such as Swiss:P13497 and Tolloid from Drosophila (Swiss:P25723).

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
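    Both forms of the motif above can be written as regular expressions. The sketch below is a rough illustration under stated assumptions, not a MEROPS or PROSITE definition. "Uncharged" is taken to mean any residue except D, E, K, R or H. "Hydrophobic" is taken to mean A, V, I, L, M, F or W. Proline is excluded from every position. 'a' is restricted to V or T, although the text only says it is most often one of these. `find_motifs` is a hypothetical helper.

```python
import re

# ASSUMED residue classes (simplifications for illustration):
UNCHARGED = "[ACFGILMNQSTVWY]"   # not D, E, K, R or H, and never P
HYDROPHOBIC = "[AVILMFW]"
NOT_PRO = "[^P]"                 # proline is never found in this site

# Loose motif H-E-x-x-H (lookahead so overlapping hits are all reported).
HEXXH = re.compile(r"(?=(HE" + NOT_PRO + NOT_PRO + "H))")

# Stricter a-b-X-H-E-b-b-H-b-c form, with 'a' restricted to V/T here.
STRICT = re.compile(
    r"(?=([VT]" + UNCHARGED + NOT_PRO + "HE" + UNCHARGED + UNCHARGED
    + "H" + UNCHARGED + HYDROPHOBIC + "))"
)


def find_motifs(seq: str, pattern: re.Pattern) -> list:
    """Return (0-based start, matched residues) for every hit of pattern."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

    On the made-up peptide "MKVVAHELTHAVG", the loose pattern finds "HELTH" at position 5 and the strict pattern finds "VVAHELTHAV" at position 2. A sequence such as "AAHEPAHAA" is rejected because of the proline.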

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to MEROPS peptidase family M12, subfamily M12A (astacin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active-site residues for members of this family and for thermolysin occur in the motif HEXXH.

    The astacin family of metalloendopeptidases encompasses a range of proteins found in organisms from hydra to humans, in mature and developmental systems. Their functions include activation of growth factors, degradation of polypeptides, and processing of extracellular proteins. The proteins are synthesised with N-terminal signal and pro-enzyme sequences, and many contain multiple domains C-terminal to the protease domain. They are either secreted from cells or associated with the plasma membrane.

    The astacin molecule adopts a kidney shape, with a deep active-site cleft between its N- and C-terminal domains. The zinc ion, which lies at the bottom of the cleft, exhibits a unique penta-coordinated mode of binding, involving 3 histidine residues, a tyrosine and a water molecule (which is also bound to the carboxylate side chain of Glu93). The N-terminal domain comprises 2 alpha-helices and a 5-stranded beta-sheet. The overall topology of this domain is shared by the archetypal zinc-endopeptidase thermolysin. Astacin protease domains also share common features with serralysins, matrix metalloendopeptidases, and snake venom proteases; they cleave peptide bonds in polypeptides such as insulin B chain and bradykinin, and in proteins such as casein and gelatin; and they have arylamidase activity.

    Proteins where this domain has been detected by our approach:
    PY07283   


    PF01406 - tRNA-synt_1e (Pfam link)

    Interpro entry IPR015803 : Cysteinyl-tRNA synthetase, class Ia, N-terminal (Interpro link)

    Pfam description:
    This family includes only cysteinyl tRNA synthetases.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II synthetases.

    Cysteinyl-tRNA synthetase is an alpha monomer and belongs to class Ia.
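    The class assignments in the InterPro description above are essentially a lookup table. The sketch below encodes them, along with the 2'/3' attachment site the text associates with each class. `synthetase_class` and `preferred_attachment` are hypothetical names. The attachment site is returned as the typical case only, because the text describes it as a preference rather than a strict rule.

```python
# Amino acids whose synthetases belong to each class, as listed in the text.
CLASS_I = {"arginine", "cysteine", "glutamic acid", "glutamine", "isoleucine",
           "leucine", "methionine", "tyrosine", "tryptophan", "valine"}
CLASS_II = {"alanine", "asparagine", "aspartic acid", "glycine", "histidine",
            "lysine", "phenylalanine", "proline", "serine", "threonine"}


def synthetase_class(amino_acid: str) -> str:
    """Return "I" or "II" for the synthetase specific to amino_acid."""
    aa = amino_acid.strip().lower()
    if aa in CLASS_I:
        return "I"
    if aa in CLASS_II:
        return "II"
    raise ValueError(f"unknown amino acid: {amino_acid!r}")


def preferred_attachment(amino_acid: str) -> str:
    """Typical tRNA ribose hydroxyl used by that class (a default, not a rule)."""
    return "2'-OH" if synthetase_class(amino_acid) == "I" else "3'-OH"
```

    For example, `synthetase_class("Cysteine")` gives "I", consistent with cysteinyl-tRNA synthetase belonging to class Ia.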

    Proteins where this domain is known:
    PY04618   


    PF01409 - tRNA-synt_2d (Pfam link)

    Interpro entry IPR018157 : Phenylalanyl-tRNA synthetase, class IIc, C-terminal (Interpro link)

    Pfam description:
    Other tRNA synthetase sub-families are too dissimilar to be included. This family includes only phenylalanyl-tRNA synthetases. This is the core catalytic domain.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II synthetases.

    Phenylalanyl-tRNA synthetase is an alpha2/beta2 tetramer composed of two different subunits, and belongs to class IIc. In eubacteria, the small subunit (pheS gene) can be designated as beta (E. coli) or alpha (the nomenclature adopted in InterPro). Reciprocally, the large subunit (pheT gene) can be designated as alpha (E. coli) or beta. In all other kingdoms, including eukaryotes, the two subunits have equivalent lengths and can be identified by specific signatures. The enzyme from Thermus thermophilus has an alpha2/beta2 quaternary structure and is one of the most complicated members of the synthetase family. Identification of phenylalanyl-tRNA synthetase as a member of class II aaRSs was based only on sequence alignment of the small alpha subunit with other synthetases.

    Proteins where this domain is known:
    PY00417    PY04422    PY04756   


    PF01411 - tRNA-synt_2c (Pfam link)

    Interpro entry IPR018164 : Alanyl-tRNA synthetase, class IIc, N-terminal (Interpro link)

    Pfam description:
    Other tRNA synthetase sub-families are too dissimilar to be included. This family includes only alanyl-tRNA synthetases.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II synthetases.

    Alanyl-tRNA synthetase is an alpha4 tetramer that belongs to class IIc.

    Proteins where this domain is known:
    PY03081   


    PF01412 - ArfGap (Pfam link)

    Interpro entry IPR001164 : Arf GTPase activating protein (Interpro link)

    Pfam description:
    Putative zinc fingers found in GTPase-activating proteins (GAPs) for the small GTPase Arf. The GAP of ARD1 stimulates GTP hydrolysis by ARD1 but not by ARFs.

    Interpro description:

    This entry describes a family of small GTPase-activating proteins, for example the ARF1-directed GTPase-activating protein and the cycle control GTPase-activating protein (GAP) GCS1, which is important for the regulation of the ADP-ribosylation factor ARF, a member of the Ras superfamily of GTP-binding proteins. The GTP-bound form of ARF is essential for the maintenance of normal Golgi morphology; it participates in the recruitment of coat proteins, which are required for the budding and fission of membranes. Before fusion with an acceptor compartment, the membrane must be uncoated, a step that requires hydrolysis of the GTP bound to ARF. These proteins contain a characteristic zinc finger motif (Cys-x2-Cys-x(16,17)-Cys-x2-Cys), which displays some similarity to the C4-type GATA zinc finger. The ARFGAP domain displays no obvious similarity to other GAP proteins.
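    The ARFGAP zinc is coordinated by four cysteines, so its consensus can be read as the C4 pattern Cys-x2-Cys-x(16,17)-Cys-x2-Cys. That pattern translates directly into a regular expression. The sketch below is a minimal illustration. `has_arfgap_zinc_finger` is a hypothetical helper. A match only shows that the cysteine spacing is compatible with the motif; it does not establish an ArfGAP domain.

```python
import re

# C-x2-C-x(16,17)-C-x2-C: "." is any residue, {16,17} allows 16 or 17 of them.
ARFGAP_ZNF = re.compile(r"C.{2}C.{16,17}C.{2}C")


def has_arfgap_zinc_finger(seq: str) -> bool:
    """True if seq contains the C4 cysteine spacing of the ArfGAP zinc finger."""
    return ARFGAP_ZNF.search(seq.upper()) is not None
```

    Synthetic test peptides with 16 or 17 residues between the two cysteine pairs match, while one with 15 does not.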

    The 3D structure of the ARFGAP domain of the PYK2-associated protein beta has been solved. It consists of a three-stranded beta-sheet surrounded by 5 alpha helices. The domain is organised around a central zinc atom, which is coordinated by 4 cysteines. The ARFGAP domain is clearly unrelated to other GAP protein structures, which are exclusively helical. Classical GAP proteins accelerate GTPase activity by supplying an arginine finger to the active site. The crystal structure of ARFGAP bound to ARF revealed that the ARFGAP domain does not supply an arginine to the active site, which suggests a more indirect role for the ARFGAP domain in GTP hydrolysis.

    The Rev protein of human immunodeficiency virus type 1 (HIV-1) facilitates nuclear export of unspliced and partly-spliced viral RNAs. Rev contains an RNA-binding domain and an effector domain; the latter is believed to interact with a cellular cofactor required for the Rev response and hence HIV-1 replication. Human Rev interacting protein (hRIP) specifically interacts with the Rev effector. The amino acid sequence of hRIP is characterised by an N-terminal, C-4 class zinc finger motif.

    Proteins where this domain is known:
    PY00381    PY03828    PY03864   


    PF01416 - PseudoU_synth_1 (Pfam link)

    Interpro entry IPR001406 : tRNA pseudouridine synthase (Interpro link)

    Pfam description:
    Involved in the formation of pseudouridine at the anticodon stem and loop of transfer RNAs. Pseudouridine, an isomer of uridine (5-(beta-D-ribofuranosyl)uracil), is the most abundant modified nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved aspartic acid, likely involved in catalysis.

    Interpro description:
    Transfer RNA-pseudouridine synthetase contains one atom of zinc essential for its native conformation and tRNA recognition and has a strictly conserved aspartic acid that is likely to be involved in catalysis. It is involved in the formation of pseudouridine at positions 38, 39 and 40 in the anticodon stem and loop of transfer-RNAs. Pseudouridine is the most abundant modified nucleoside found in all cellular RNAs.

    Proteins where this domain is known:
    PY03314   

    Proteins where this domain has been detected by our approach:
    PY05524   


    PF01417 - ENTH (Pfam link)

    Interpro entry IPR001026 : (Interpro link)

    Pfam description:
    The ENTH (Epsin N-terminal homology) domain is found in proteins involved in endocytosis and cytoskeletal machinery. The function of the ENTH domain is unknown.

    Interpro description:

    The ENTH (Epsin N-terminal homology) domain is approximately 150 amino acids in length and is always found located at the N-termini of proteins. The domain forms a compact globular structure, composed of 9 alpha-helices connected by loops of varying length. The general topology is determined by three helical hairpins that are stacked consecutively with a right-handed twist. An N-terminal helix folds back, forming a deep basic groove that forms the binding pocket for the Ins(1,4,5)P3 ligand. The ligand is coordinated by residues from surrounding alpha-helices and all three phosphates are multiply coordinated. The coordination of Ins(1,4,5)P3 suggests that ENTH is specific for particular head groups.

    Proteins containing this domain have been found to bind PtdIns(4,5)P2 and Ins(1,4,5)P3, suggesting that the domain may be a membrane-interacting module. The main function of proteins containing this domain appears to be to act as accessory clathrin adaptors in endocytosis: epsin is able to recruit and promote clathrin polymerisation on a lipid monolayer, but may have additional roles in signalling and actin regulation. Epsin causes a strong degree of membrane curvature and tubulation, and even fragmentation of membranes with a high PtdIns(4,5)P2 content. Epsin binding to membranes facilitates their deformation by insertion of the N-terminal helix into the outer leaflet of the bilayer, pushing the head groups apart. This would reduce the energy needed to curve the membrane into a vesicle, making it easier for the clathrin cage to fix and stabilise the curved membrane. This points to a pioneering role for epsin in vesicle budding, as it provides both a driving force and a link between membrane invagination and clathrin polymerisation.

    Proteins where this domain is known:
    PY07827   


    PF01421 - Reprolysin (Pfam link)

    Interpro entry IPR001590 : Peptidase M12B, ADAM/reprolysin (Interpro link)

    Pfam description:
    The members of this family are enzymes that cleave peptides. These proteases require zinc for catalysis. Members of this family are also known as adamalysins. Most members of this family are snake venom endopeptidases, but there are also some mammalian proteins, such as Swiss:P78325 and fertilin (Swiss:Q28472). Fertilin and closely related proteins appear to lack some active-site residues and may not be active enzymes.

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
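    As an illustration of the motif definitions above, the following minimal Python sketch scans a protein sequence for the loose HEXXH motif and for the stricter 'abXHEbbHbc' form. The residue classes chosen for 'a', 'b' and 'c', and the exclusion of proline from every variable position of the strict form, are assumptions made for this example rather than an official definition; the function and sequence names are hypothetical. A regular-expression hit only flags a candidate site and does not establish that a protein is a metalloprotease.

```python
import re

# Loose zinc-binding motif His-Glu-X-X-His. The pattern sits inside a
# lookahead so that overlapping occurrences are all reported.
HEXXH = re.compile(r"(?=(HE..H))")

# Assumed residue classes for the stricter 'abXHEbbHbc' form (illustrative only):
A = "[VT]"                # 'a': restricted to V or T (the text says "most often")
B = "[ACFGILMNQSTVWY]"    # 'b': uncharged, excluding D, E, K, R, H and proline
C = "[AVLIMFWY]"          # 'c': hydrophobic
X = "[^P]"                # 'X': any residue except proline
STRICT = re.compile(rf"(?=({A}{B}{X}HE{B}{B}H{B}{C}))")


def find_motifs(seq, pattern):
    """Return (0-based start, matched text) for every, possibly overlapping, hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence with two HEXXH copies.
toy = "MKTVVAHELTHAVGSHEAAHPP"
print(find_motifs(toy, HEXXH))   # [(6, 'HELTH'), (15, 'HEAAH')]
print(find_motifs(toy, STRICT))  # [(3, 'VVAHELTHAV')]
```

    Both HEXXH copies in the toy sequence are found, but only the first satisfies the stricter pattern, because the second is followed by prolines where the 'b' and 'c' positions fall.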

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to the MEROPS peptidase family M12, subfamily M12B (adamalysin family, clan MA(M)). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH.

    The adamalysins are zinc-dependent endopeptidases found in snake venom. There are also some mammalian proteins, such as fertilin. Fertilin and closely related proteins appear to lack some active-site residues and may not be active enzymes.

    CD156 (also called ADAM8, or MS2 in human) has been implicated in the extravasation of leukocytes. CD molecules are leucocyte antigens on cell surfaces. CD antigens nomenclature is updated at Protein Reviews On The Web (http://mpr.nci.nih.gov/prow/).

    Proteins where this domain has been detected by our approach:
    PY06415   


    PF01423 - LSM (Pfam link)

    Interpro entry IPR001163 : (Interpro link)

    Pfam description:
    The LSM domain contains Sm proteins as well as other related LSM (Like Sm) proteins. The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of the major spliceosomal small nuclear RNAs. The U6 snRNP binds to the LSM (Like Sm) proteins. Sm proteins are also found in archaebacteria, which do not have any splicing apparatus, suggesting a more general role for Sm proteins. All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker. This family also includes the bacterial Hfq (host factor Q) proteins, which are also RNA-binding proteins and form hexameric rings.

    Interpro description:

    This family is found in Lsm (like-Sm) proteins and in bacterial Lsm-related Hfq proteins. In each case, the domain adopts a core structure consisting of an open beta-barrel with an SH3-like topology.

    Lsm (like-Sm) proteins have diverse functions, and are thought to be important modulators of RNA biogenesis and function. The Sm proteins form part of specific small nuclear ribonucleoproteins (snRNPs) that are involved in the processing of pre-mRNAs to mature mRNAs, and are a major component of the eukaryotic spliceosome. Most snRNPs consist of seven Sm proteins (B/B', D1, D2, D3, E, F and G) arranged in a ring on a uridine-rich sequence (Sm site), plus a small nuclear RNA (snRNA) (either U1, U2, U5 or U4/6). All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker. In other snRNPs, certain Sm proteins are replaced with different Lsm proteins; in U7 snRNPs, for example, the D1 and D2 Sm proteins are replaced with the U7-specific Lsm10 and Lsm11 proteins, and Lsm11 plays a role in U7-specific histone RNA processing. Lsm proteins are also found in archaebacteria, which do not have any splicing apparatus, suggesting a more general role for Lsm proteins.

    The pleiotropic translational regulator Hfq (host factor Q) is a bacterial Lsm-like protein, which modulates the structure of numerous RNA molecules by binding preferentially to A/U-rich sequences in RNA. Hfq adopts an Lsm-like fold; however, unlike the heptameric Sm proteins, Hfq forms a homo-hexameric ring.

    Proteins where this domain is known:
    PY00270    PY00919    PY01248    PY02122    PY02123    PY03069    PY03502    PY03614    PY04489    PY05604    PY07229    PY07662   


    PF01425 - Amidase (Pfam link)

    Interpro entry IPR000120 : Amidase signature enzyme (Interpro link)

    Interpro description:

    Amidase signature (AS) enzymes are a large group of hydrolytic enzymes that contain a conserved stretch of approximately 130 amino acids known as the AS sequence. They are widespread, being found in both prokaryotes and eukaryotes. AS enzymes catalyse the hydrolysis of amide bonds (CO-NH2), although the family has diverged widely with regard to substrate specificity and function. Nonetheless, these enzymes maintain a core alpha/beta/alpha structure, where the topologies of the N- and C-terminal halves are similar. AS enzymes characteristically have a highly conserved C-terminal region rich in serine and glycine residues, but devoid of aspartic acid and histidine residues, and therefore differ from classical serine hydrolases. These enzymes possess a unique, highly conserved Ser-Ser-Lys catalytic triad used for amide hydrolysis, although the catalytic mechanism for acyl-enzyme intermediate formation can differ between enzymes.

    Examples of AS enzymes include:

    Proteins where this domain is known:
    PY02858   


    PF01432 - Peptidase_M3 (Pfam link)

    Interpro entry IPR001567 : Peptidase M3A and M3B, thimet/oligopeptidase F (Interpro link)

    Pfam description:
    This is the thimet oligopeptidase family, a large family of mammalian and bacterial oligopeptidases that cleave medium-sized peptides. The group also contains mitochondrial intermediate peptidase, which is encoded by nuclear DNA but functions within the mitochondria to remove the leader sequence.

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to MEROPS peptidase family M3 (clan MA(E)), subfamilies M3A and M3B. The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    The thimet oligopeptidase family is a large family of archaeal, bacterial and eukaryotic oligopeptidases that cleave medium-sized peptides. The group contains:

    Proteins where this domain is known:
    PY01285    PY06253   


    PF01433 - Peptidase_M1 (Pfam link)

    Interpro entry IPR014782 : Peptidase M1, membrane alanine aminopeptidase, N-terminal (Interpro link)

    Pfam description:
    Members of this family are aminopeptidases. The members differ widely in specificity, hydrolysing acidic, basic or neutral N-terminal residues. This family includes leukotriene-A4 hydrolase (Swiss:P09960), which also has aminopeptidase activity.

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to the MEROPS peptidase family M1 (clan MA(E)), the type example being aminopeptidase N from Homo sapiens (Human). The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA.

    Membrane alanine aminopeptidase is part of the HEXXH+E group; it consists entirely of aminopeptidases, spread across a wide variety of species. Functional studies show that CD13/APN catalyzes the removal of single amino acids from the amino terminus of small peptides and probably plays a role in their final digestion; one family member (leukotriene-A4 hydrolase) is known to hydrolyse the epoxide leukotriene-A4 to form an inflammatory mediator. This hydrolase has been shown to have aminopeptidase activity, and the zinc ligands of the M1 family were identified by site-directed mutagenesis on this enzyme. CD13 participates in trimming peptides bound to MHC class II molecules and cleaves the MIP-1 chemokine, which alters target cell specificity from basophils to eosinophils. CD13 acts as a receptor for specific strains of RNA viruses (coronaviruses), which cause a relatively large percentage of upper respiratory tract infections.

    CD molecules are leucocyte antigens on cell surfaces. CD antigens nomenclature is updated at Protein Reviews On The Web (http://mpr.nci.nih.gov/prow/).

    Proteins where this domain is known:
    PY01557   


    PF01434 - Peptidase_M41 (Pfam link)

    Interpro entry IPR000642 : Peptidase M41 (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to MEROPS peptidase family M41 (FtsH endopeptidase family, clan MA(E)). The predicted active site residues for members of this family and thermolysin, the type example for clan MA, occur in the motif HEXXH.

    The peptidase M41 family belongs to a larger family of zinc metalloproteases. This family includes the cell division protein FtsH and the yeast mitochondrial respiratory chain complexes assembly protein, which is a putative ATP-dependent protease required for assembly of the mitochondrial respiratory chain and ATPase complexes. FtsH is an integral membrane protein, which seems to act as an ATP-dependent zinc metallopeptidase that binds one zinc ion.
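    The MEROPS assignments quoted in this section (M12B in clan MA(M); M3, M1 and M41 in clan MA(E)) form a simple clan-to-family hierarchy. The Python sketch below records only those four assignments, as stated in the text, and groups them by (sub)clan. The dictionary and function names are hypothetical, and a real analysis would take the classification from the MEROPS database itself rather than from a hand-written table.

```python
from collections import defaultdict

# MEROPS assignments as quoted in the InterPro descriptions in this section.
# Pfam accession -> (MEROPS family, clan/subclan). Hand-entered for illustration.
PFAM_TO_MEROPS = {
    "PF01421": ("M12B", "MA(M)"),  # Reprolysin
    "PF01432": ("M3", "MA(E)"),    # Peptidase_M3 (subfamilies M3A and M3B)
    "PF01433": ("M1", "MA(E)"),    # Peptidase_M1
    "PF01434": ("M41", "MA(E)"),   # Peptidase_M41
}


def families_by_clan(mapping):
    """Group MEROPS families under the (sub)clan they are assigned to."""
    clans = defaultdict(list)
    for family, clan in mapping.values():
        clans[clan].append(family)
    return {clan: sorted(families) for clan, families in clans.items()}


def top_level_clan(subclan):
    """Drop the subclan suffix, e.g. 'MA(E)' -> 'MA'."""
    return subclan.split("(")[0]


print(families_by_clan(PFAM_TO_MEROPS))
```

    Collapsing the subclans with `top_level_clan` shows that all four peptidase families listed here share the thermolysin-like fold of clan MA.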

    Proteins where this domain is known:
    PY04402    PY05070    PY05838   


    PF01436 - NHL (Pfam link)

    Interpro entry IPR001258 : (Interpro link)

    Pfam description:
    The NHL (NCL-1, HT2A and LIN-41) repeat is found in multiple tandem copies. It is about 40 residues long and resembles the WD repeat Pfam:PF00400. The repeats have catalytic activity in Swiss:P10731: proteolysis has shown that the peptidyl-alpha-hydroxyglycine alpha-amidating lyase (PAL) activity is localised to the repeats. Swiss:Q13049 interacts with the activation domain of Tat; this interaction is mediated by the NHL repeats.

    Interpro description:

    The NHL repeat, named after NCL-1, HT2A and Lin-41, is found in a large number of eukaryotic and prokaryotic proteins. For example, the repeat is found in a variety of enzymes of the copper type II, ascorbate-dependent monooxygenase family, which catalyse the C-terminal alpha-amidation of biological peptides. In many proteins it occurs in tandem arrays, for example in the RING finger, B-box, coiled-coil (RBCC) eukaryotic growth regulators. The 'Brain Tumor' protein (Brat) is one such growth regulator that contains a six-bladed NHL-repeat beta-propeller.

    NHL repeats are also found in serine/threonine protein kinases (STPKs) in a diverse range of pathogenic bacteria. These STPKs are transmembrane receptors with an intracellular N-terminal kinase domain and an extracellular C-terminal sensor domain. In the STPK PknD from Mycobacterium tuberculosis, the sensor domain forms a rigid, six-bladed beta-propeller composed of NHL repeats, with a flexible tether to the transmembrane domain.

    Proteins where this domain has been detected by our approach:
    PY00076   


    PF01448 - ELM2 (Pfam link)

    Interpro entry IPR000949 : (Interpro link)

    Pfam description:
    The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. It is found in the MTA1 protein that is part of the NuRD complex. The domain is usually found N-terminal to a myb-like DNA-binding domain (Pfam:PF00249). ELM2 is also found associated with an ARID DNA-binding domain (Pfam:PF01388) in Swiss:O82364. This suggests that ELM2 may also be involved in DNA binding, or perhaps is a protein-protein interaction domain.

    Interpro description:

    The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. It is found in the MTA1 protein that is part of the NuRD complex. The domain is usually found N-terminal to a myb-like DNA-binding domain and a GATA-binding domain. ELM2 is, in some instances, also found associated with the ARID DNA-binding domain. This suggests that ELM2 may also be involved in DNA binding, or perhaps is a protein-protein interaction domain.

    Proteins where this domain is known:
    PY00631   

    Proteins where this domain has been detected by our approach:
    PY03412   


    PF01458 - UPF0051 (Pfam link)

    Interpro entry IPR000825 : SUF system FeS cluster assembly, SufBD (Interpro link)

    Interpro description:

    Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.

    The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.

    The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. No specific functions have been assigned to SufB and SufD. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.

    In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.

    This entry represents SufB and SufD proteins that form part of the SufBCD complex in the SUF system. No specific functions have been assigned to these proteins.

    Proteins where this domain is known:
    PY00058    PY00907   


    PF01459 - Porin_3 (Pfam link)

    Interpro entry IPR001925 : Porin, eukaryotic type (Interpro link)

    Interpro description:

    The major protein of the outer mitochondrial membrane of eukaryotes is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore for small hydrophilic molecules. The channel adopts an open conformation at low or zero membrane potential and a closed conformation at potentials above 30-40 mV.

    This protein contains about 280 amino acids, and its sequence is composed of between 12 and 16 beta-strands that span the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates have at least three members (genes VDAC1, VDAC2 and VDAC3).

    Proteins where this domain is known:
    PY05229   


    PF01466 - Skp1 (Pfam link)

    Interpro entry IPR016072 : SKP1 component, dimerisation (Interpro link)

    Interpro description:

    SKP1 (together with SKP2) was identified as an essential component of the cyclin A-CDK2 S phase kinase complex. It was found to bind several F-box-containing proteins (e.g., Cdc4, Skp2, cyclin F) and to be involved in the ubiquitin protein degradation pathway. A yeast homologue of SKP1 (P52286) was identified in the centromere-bound kinetochore complex and is also involved in the ubiquitin pathway. In Dictyostelium discoideum (Slime mold), FP21 was shown to be glycosylated in the cytosol and has homology to SKP1.

    This entry represents a dimerisation domain found at the C terminus of SKP1 proteins, as well as in subunit D of the centromere DNA-binding protein complex Cbf3. This domain is multi-helical in structure, and forms an interlocked heterodimer with F-box proteins.

    Proteins where this domain is known:
    PY00081   


    PF01467 - CTP_transf_2 (Pfam link)

    Interpro entry IPR004820 : Cytidylyltransferase (Interpro link)

    Pfam description:
    This family includes cholinephosphate cytidylyltransferase (Swiss:P49585) and glycerol-3-phosphate cytidylyltransferase (Swiss:P27623).

    Interpro description:

    This family includes:

    CTP:cholinephosphate cytidylyltransferase (CCT) is a key regulatory enzyme in phosphatidylcholine biosynthesis that catalyzes the formation of CDP-choline. A comparison of the catalytic domains of CCTs from a wide variety of organisms reveals a large number of completely conserved residues. There may be a role for the conserved HXGH sequence in catalysis. The membrane-binding domain in rat CCT has been defined, and it has been suggested that lipids may play a role in inactivating the enzyme. A phosphorylation domain has been described.

    Proteins where this domain is known:
    PY02436    PY06073    PY06728   


    PF01469 - Pentapeptide_2 (Pfam link)

    Interpro entry IPR002989 : (Interpro link)

    Pfam description:
    These repeats are found in many mycobacterial proteins. These repeats are most common in the Pfam:PF00823 family of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown. The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to Pfam:PF00805; however, it is not clear if these two families are structurally related.

    Interpro description:
    These repeats are found in many mycobacterial proteins. The repeats are most common in the PPE family of proteins, where they are found in the MPTR subfamily. The function of these repeats is unknown. The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to A(D/N)LXX repeats; however, it is not clear if these two families are structurally related.
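    The approximate repeat definition XNXGX lends itself to a simple pattern scan. The Python sketch below reports runs of two or more consecutive, in-frame XNXGX units; the two-unit minimum, the toy sequence and the helper name are assumptions for illustration. Pfam itself detects repeat families such as this one with profile hidden Markov models, so a plain regular expression will both miss diverged copies and accept chance matches.

```python
import re

# One pentapeptide unit X-N-X-G-X, where X stands for any residue.
UNIT = ".N.G."
# Assumption: a tandem run means at least two consecutive units in the same frame.
TANDEM = re.compile(rf"(?:{UNIT}){{2,}}")


def tandem_runs(seq):
    """Return (0-based start, number of units) for each non-overlapping tandem run.

    The scan is greedy and left-to-right, so only the first reading frame
    that matches at a given position is reported.
    """
    return [(m.start(), len(m.group()) // 5) for m in TANDEM.finditer(seq.upper())]


# Hypothetical toy sequence carrying three XNXGX units starting at index 2.
print(tandem_runs("MAANAGTSNLGAVNIGQKK"))  # [(2, 3)]
```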

    Proteins where this domain is known:
    PY03713   


    PF01472 - PUA (Pfam link)

    Interpro entry IPR002478 : PUA (Interpro link)

    Pfam description:
    The PUA domain, named after pseudouridine synthase and archaeosine transglycosylase, was detected in archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally, the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation initiation factor eIF1/SUI1; these proteins may comprise a novel type of translation factor. Unexpectedly, the PUA domain was also detected in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of these enzymes in the regulation of the expression of other genes. It is predicted that the PUA domain is an RNA-binding domain.

    Interpro description:

    The PUA (PseudoUridine synthase and Archaeosine transglycosylase) domain was named after the proteins in which it was first found. PUA is a highly conserved RNA-binding motif found in a wide range of archaeal, bacterial and eukaryotic proteins, including enzymes that catalyse tRNA and rRNA post-transcriptional modifications, proteins involved in ribosome biogenesis and translation, as well as in enzymes involved in proline biosynthesis. The structures of several PUA-RNA complexes reveal a common RNA recognition surface, but also some versatility in the way in which the motif binds to RNA. PUA motifs are involved in dyskeratosis congenita and cancer, pointing to links between RNA metabolism and human diseases.

    Proteins where this domain is known:
    PY02784    PY05447   

    Proteins where this domain has been detected by our approach:
    PY04116   


    PF01477 - PLAT (Pfam link)

    Interpro entry IPR001024 : (Interpro link)

    Pfam description:
    This domain is found in a variety of membrane or lipid associated proteins. It is called the PLAT (Polycystin-1, Lipoxygenase, Alpha-Toxin) domain or LH2 (Lipoxygenase homology) domain. The known structure of pancreatic lipase shows this domain binds to procolipase Pfam:PF01114, which mediates membrane association. So it appears possible that this domain mediates membrane attachment via other protein binding partners. The structure of this domain is known for many members of the family and is composed of a beta sandwich.

    Interpro description:

    Lipoxygenases are a class of iron-containing dioxygenases that catalyse the hydroperoxidation of lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants, where they may be involved in a number of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or responses to wounding. In mammals, a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins and leukotrienes. Sequence data is available for the following lipoxygenases:

    The iron atom in lipoxygenases is bound by four ligands, three of which are histidine residues. Six histidines are conserved in all lipoxygenase sequences, five of which are clustered in a stretch of 40 amino acids. This region contains two of the three histidine iron ligands; the other histidines have been shown to be important for the activity of lipoxygenases.
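    The observation that five of the six conserved histidines cluster within a 40-residue stretch can be checked with a sliding window. The Python sketch below, using a hypothetical helper name and a synthetic test sequence, finds the 40-residue window that contains the most histidines. It only illustrates the counting; a histidine-dense window does not by itself identify a lipoxygenase.

```python
def densest_window(seq, residue="H", width=40):
    """Return (start, count) of the width-long window holding the most `residue`.

    A sliding window visits each position once (O(n)). Ties keep the
    leftmost window. Sequences no longer than `width` form a single window.
    """
    seq = seq.upper()
    if len(seq) <= width:
        return 0, seq.count(residue)
    count = seq[:width].count(residue)
    best_start, best_count = 0, count
    for start in range(1, len(seq) - width + 1):
        # Slide right by one: add the residue entering on the right,
        # subtract the residue leaving on the left.
        count += (seq[start + width - 1] == residue) - (seq[start - 1] == residue)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Synthetic sequence: five His residues spread over 14 positions.
toy = "A" * 50 + "H" + "A" * 9 + "H" * 4 + "A" * 50
print(densest_window(toy))  # (24, 5)
```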

    This entry represents a domain found in lipoxygenases and other enzymes. Known as the PLAT (Polycystin-1, Lipoxygenase, Alpha-Toxin) domain or LH2 (Lipoxygenase homology) domain, it is found in a variety of membrane- or lipid-associated proteins. Structurally, this domain forms a beta-sandwich composed of two sheets of four strands each. The most highly conserved regions coincide with the beta-strands, with most of the highly conserved residues being buried within the protein. An exception is a lysine or arginine that occurs on the surface of the fifth beta-strand of the eukaryotic domains. In pancreatic lipase, the lysine in this position forms a salt bridge with the procolipase protein. The conservation of a charged surface residue may indicate the location of a conserved ligand-binding site. It is thought that this domain may mediate membrane attachment via other protein binding partners.

    Proteins where this domain is known:
    PY01071   


    PF01479 - S4 (Pfam link)

    Interpro entry IPR002942 : RNA-binding S4 (Interpro link)

    Pfam description:
    The S4 domain is a small domain consisting of 60-65 amino acid residues that was detected in the bacterial ribosomal protein S4, eukaryotic ribosomal S9, two families of pseudouridine synthases, a novel family of predicted RNA methylases, a yeast protein containing a pseudouridine synthetase and a deaminase domain, bacterial tyrosyl-tRNA synthetases, and a number of uncharacterized, small proteins that may be involved in translation regulation. The S4 domain probably mediates binding to RNA.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The S4 domain is a small domain consisting of 60-65 amino acid residues that was detected in the bacterial ribosomal protein S4, eukaryotic ribosomal S9, two families of pseudouridine synthases, a novel family of predicted RNA methylases, a yeast protein containing a pseudouridine synthetase and a deaminase domain, bacterial tyrosyl-tRNA synthetases, and a number of uncharacterised, small proteins that may be involved in translation regulation. The S4 domain probably mediates binding to RNA.

    Proteins where this domain is known:
    PY02191    PY02882    PY04143    PY04495    PY06291   

    Proteins where this domain has been detected by our approach:
    PY02484    PY03779    PY04000   


    PF01480 - PWI (Pfam link)

    Interpro entry IPR002483 : Splicing factor PWI (Interpro link)

    Interpro description:

    The PWI domain, named after a highly conserved PWI tri-peptide located within its N-terminal region, is a ~80 amino acid module found either at the N-terminus or at the C-terminus of eukaryotic proteins involved in pre-mRNA processing. It is generally found in association with other domains such as RRM and RS. The PWI domain is an RNA/DNA-binding domain that has an equal preference for single- and double-stranded nucleic acids and is likely to have multiple important functions in pre-mRNA processing. Proteins containing this domain include the SR-related nuclear matrix protein of 160 kDa (SRm160), a splicing and 3'-end cleavage-stimulatory factor, and the mammalian splicing factor PRP3.

    The PWI domain is a soluble, globular and independently folded domain which consists of a four-helix bundle, with structured N- and C-terminal elements.

    Proteins where this domain is known:
    PY05315   

    Proteins where this domain has been detected by our approach:
    PY04393   


    PF01485 - IBR (Pfam link)

    Interpro entry IPR002867 : Zinc finger, C6HC-type (Interpro link)

    Pfam description:
    The IBR (In Between Ring fingers) domain is often found between pairs of RING fingers (Pfam:PF00097). This domain has also been called the C6HC domain and the DRIL (double RING finger linked) domain. Proteins that contain two RING fingers and an IBR domain (also termed RBR family proteins) are thought to exist in all eukaryotic organisms. RBR family members play roles in protein quality control and can indirectly regulate transcription. Evidence suggests that RBR proteins are often parts of cullin-containing ubiquitin ligase complexes. The ubiquitin ligase Parkin is an RBR family protein whose mutations are involved in forms of familial Parkinson's disease.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents a cysteine-rich (C6HC) zinc finger domain that is present in Triad1, and which is conserved in other proteins encoded by various eukaryotes. The C6HC consensus pattern is:

    The C6HC zinc finger motif is the fourth family member of the zinc-binding RING, LIM, and LAP/PHD fingers. Strikingly, in most of the proteins the C6HC domain is flanked by two RING finger structures. The novel C6HC motif has been called DRIL (double RING finger linked). The strong conservation of the larger tripartite TRIAD (two RING fingers and DRIL) structure indicates that the three subdomains are functionally linked and identifies a novel class of proteins.
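    As a rough illustration of the tripartite TRIAD arrangement described above, the sketch below checks whether a protein's domain annotations, ordered by start position, contain a RING finger, an IBR (DRIL) domain and a second RING finger in direct succession. The `DomainHit` record, the plain labels "RING" and "IBR", and the coordinates in the example are hypothetical; a real pipeline would take its hits and family names from its own domain-search output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One domain annotation on a protein (hypothetical record format)."""
    name: str   # domain label, e.g. "RING" or "IBR" (labels are assumptions)
    start: int  # 1-based start position on the protein
    end: int    # 1-based end position on the protein


def has_triad_architecture(hits: list[DomainHit]) -> bool:
    """Return True if, ordered by start position, RING, IBR and RING occur consecutively."""
    ordered = [hit.name for hit in sorted(hits, key=lambda hit: hit.start)]
    return any(
        ordered[i:i + 3] == ["RING", "IBR", "RING"]
        for i in range(len(ordered) - 2)
    )


if __name__ == "__main__":
    # Hypothetical coordinates, listed out of order on purpose.
    rbr_like = [
        DomainHit("IBR", 150, 210),
        DomainHit("RING", 80, 130),
        DomainHit("RING", 230, 270),
    ]
    print(has_triad_architecture(rbr_like))      # True
    print(has_triad_architecture(rbr_like[:2]))  # False: only RING + IBR
```

    Requiring the three domains to be adjacent in the ordered list is a deliberate simplification; a protein with another annotated domain inserted between them would not be reported.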

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY00504   


    PF01490 - Aa_trans (Pfam link)

    Interpro entry IPR013057 : (Interpro link)

    Pfam description:
    This transmembrane region is found in many amino acid transporters including UNC-47 and MTR. UNC-47 encodes a vesicular gamma-aminobutyric acid (GABA) transporter (VGAT). UNC-47 is predicted to have 10 transmembrane domains (Swiss:P34579). MTR is an N system amino acid transporter protein involved in methyltryptophan resistance (Swiss:P38680). Other members of this family include proline transporters and amino acid permeases.

    Interpro description:
    This transmembrane region is found in many amino acid transporters including UNC-47 and MTR. UNC-47 encodes a vesicular gamma-aminobutyric acid (GABA) transporter (VGAT) and is predicted to have 10 transmembrane domains (UNC47_CAEEL). MTR is an N system amino acid transporter protein involved in methyltryptophan resistance (MTR_NEUCR). Other members of this family include proline transporters and amino acid transporters whose specificity has not yet been identified.

    Proteins where this domain is known:
    PY04168    PY06164   


    PF01493 - GXGXG (Pfam link)

    Interpro entry IPR002489 : Glutamate synthase, alpha subunit, C-terminal (Interpro link)

    Pfam description:
    This domain is found in glutamate synthase, tungsten formylmethanofuran dehydrogenase subunit c (FwdC) and molybdenum formylmethanofuran dehydrogenase subunit c (FmdC). A repeated G-XX-G-XXX-G motif is seen in the alignment.

    Interpro description:

    Glutamate synthase (GltS) is a complex iron-sulphur flavoprotein that catalyses the reductive synthesis of L-glutamate from 2-oxoglutarate and L-glutamine via intramolecular channelling of ammonia, a reaction in the bacterial, yeast and plant pathways for ammonia assimilation. GltS is a multifunctional enzyme that functions through three distinct active centres carrying out multiple reaction steps: L-glutamine hydrolysis, conversion of 2-oxoglutarate into L-glutamate, and electron uptake from an electron donor. The active centres are synchronised to avoid the wasteful consumption of L-glutamine. There are three classes of GltS, which share many functional properties: bacterial NADPH-dependent GltS, ferredoxin-dependent GltS from photosynthetic cells, and NAD(P)H-dependent GltS from yeast, fungi and lower animals.

    The dimeric alpha subunits each consist of four domains: N-terminal amidotransferase domain, the central domain, the FMN binding domain and the C-terminal domain. The C-terminal domain forms a right-handed beta-helix that comprises seven helical turns. Each helical turn has a sharp bend that is associated with a repeated sequence motif consisting of G-XX-G-XXX-G. This domain does not contain any residues directly involved in catalysis, but has a crucial structural role.
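    A minimal sketch of how candidate copies of the repeated G-XX-G-XXX-G motif could be located in a protein sequence with a regular expression. The function name `find_gxgxg_motifs` and the toy sequence are hypothetical. A pattern this short matches by chance in many unrelated proteins, so it only flags candidate positions and is not a substitute for the Pfam profile that defines the domain.

```python
import re

# G-XX-G-XXX-G: Gly, any two residues, Gly, any three residues, Gly.
# The zero-width lookahead lets overlapping occurrences be reported.
GXGXG_PATTERN = re.compile(r"(?=(G..G...G))")


def find_gxgxg_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for every G-XX-G-XXX-G hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in GXGXG_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real GltS, FwdC or FmdC fragment.
    toy = "MKGAAGSTVGLLGRRGAAAG"
    for pos, segment in find_gxgxg_motifs(toy):
        print(pos, segment)
    # 3 GAAGSTVG
    # 13 GRRGAAAG
```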

    This domain is also found in proteins such as subunit C of formylmethanofuran dehydrogenase, which catalyses the first step in methane formation from carbon dioxide in methanogenic archaea. There are two isoenzymes of formylmethanofuran dehydrogenase: a tungsten-containing isoenzyme (FwdC) and a molybdenum-containing isoenzyme (FmdC). The tungsten isoenzyme is constitutively transcribed, whereas transcription of the molybdenum operon is induced by molybdate.

    Proteins where this domain is known:
    PY03719   


    PF01494 - FAD_binding_3 (Pfam link)

    Interpro entry IPR002938 : Monooxygenase, FAD-binding (Interpro link)

    Pfam description:
    This domain is involved in FAD binding in a number of enzymes.

    Interpro description:
    Monooxygenases incorporate one hydroxyl group into substrates and are found in many metabolic pathways. In this reaction, the two atoms of dioxygen are reduced to one hydroxyl group and one H2O molecule by the concomitant oxidation of NAD(P)H. p-Hydroxybenzoate hydroxylase from Pseudomonas fluorescens contains this sequence motif (present in flavoprotein hydroxylases) with a putative dual function in FAD and NADPH binding.

    Proteins where this domain is known:
    PY04441   


    PF01496 - V_ATPase_I (Pfam link)

    Interpro entry IPR002490 : ATPase, V0/A0 complex, 116-kDa subunit (Interpro link)

    Pfam description:
    This family consists of the 116kDa V-type ATPase (vacuolar (H+)-ATPase) subunits, as well as V-type ATP synthase subunit i. The V-type ATPases are proton pumps that acidify intracellular compartments in eukaryotic cells, for example yeast central vacuoles and clathrin-coated and synaptic vesicles. They have important roles in membrane trafficking processes. The 116kDa subunit (subunit a) of the V-type ATPase is part of the V0 functional domain responsible for proton transport. The a subunit is a transmembrane glycoprotein with multiple putative transmembrane helices; it has a hydrophilic amino terminus and a hydrophobic carboxy terminus. It has roles in proton transport and assembly of the V-type ATPase complex. In yeast, this subunit is encoded by two homologous genes, VPH1 and STV1.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    The V-ATPases (or V1V0-ATPase) and A-ATPases (or A1A0-ATPase) are each composed of two linked complexes: the V1 or A1 complex contains the catalytic core that hydrolyses/synthesises ATP, and the V0 or A0 complex forms the membrane-spanning pore. The V- and A-ATPases both contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis. The V- and A-ATPases more closely resemble one another in subunit structure than they do the F-ATPases, although the function of A-ATPases is closer to that of F-ATPases.

    This entry represents the 116-kDa subunit (or subunit a) and subunit I found in the V0 or A0 complex of V- or A-ATPases, respectively. The 116-kDa subunit is a transmembrane glycoprotein required for the assembly and proton transport activity of the ATPase complex. Several isoforms of the 116-kDa subunit exist, suggesting a role in the differential targeting and regulation of the V-ATPase for specific organelles.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY05420   


    PF01504 - PIP5K (Pfam link)

    Interpro entry IPR002498 : Phosphatidylinositol-4-phosphate 5-kinase, core (Interpro link)

    Pfam description:
    This family contains a region from the common kinase core found in the type I phosphatidylinositol-4-phosphate 5-kinase (PIP5K) family. The family consists of various type I, II and III PIP5K enzymes. PIP5K catalyses the formation of phosphatidylinositol-4,5-bisphosphate via the phosphorylation of phosphatidylinositol-4-phosphate, a precursor in the phosphoinositide signaling pathway.

    Interpro description:
    This entry represents a conserved region from the common kinase core found in the type I phosphatidylinositol-4-phosphate 5-kinase (PIP5K) family. This region is found in type I, II and III phosphatidylinositol-4-phosphate 5-kinases (PIP5K enzymes). PIP5K catalyses the formation of phosphatidylinositol-4,5-bisphosphate via the phosphorylation of phosphatidylinositol-4-phosphate, a precursor in the phosphoinositide signalling pathway.

    Proteins where this domain is known:
    PY02398    PY07237   


    PF01507 - PAPS_reduct (Pfam link)

    Interpro entry IPR002500 : Phosphoadenosine phosphosulphate reductase (Interpro link)

    Pfam description:
    This domain is found in phosphoadenosine phosphosulfate (PAPS) reductase enzymes or PAPS sulfotransferase. PAPS reductase is part of the adenine nucleotide alpha hydrolases superfamily, which also includes N type ATP PPases and ATP sulphurylases. The enzyme uses thioredoxin as an electron donor for the reduction of PAPS to phospho-adenosine-phosphate (PAP). It is also found in NodP nodulation protein P from Rhizobium, which has ATP sulfurylase activity (sulfate adenylate transferase).

    Interpro description:
    This domain is found in phosphoadenosine phosphosulphate (PAPS) reductase enzymes or PAPS sulphotransferase. PAPS reductase is part of the adenine nucleotide alpha hydrolases superfamily also including N type ATP PPases and ATP sulphurylases. The enzyme uses thioredoxin as an electron donor for the reduction of PAPS to phospho-adenosine-phosphate (PAP). It is also found in NodP nodulation protein P from Rhizobium meliloti (Sinorhizobium meliloti) which has ATP sulphurylase activity (sulphate adenylate transferase).

    Proteins where this domain is known:
    PY07439   


    PF01509 - TruB_N (Pfam link)

    Interpro entry IPR002501 : tRNA pseudouridine synthase B, N-terminal (Interpro link)

    Pfam description:
    Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to pseudouridine in most tRNAs. This family also includes Cbf5p that modifies rRNA.

    Interpro description:

    Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil bases to pseudouridine, specifically converting uracil-55 to pseudouridine in most tRNAs. This family also includes Cbf5p that modifies rRNA.

    Proteins where this domain is known:
    PY02575    PY02644    PY05447   


    PF01521 - Fe-S_biosyn (Pfam link)

    Interpro entry IPR000361 : (Interpro link)

    Pfam description:
    This family is involved in iron-sulphur cluster biosynthesis. Its members include proteins that are involved in nitrogen fixation such as the HesB and HesB-like proteins.

    Interpro description:

    The proteins in this entry are variously annotated as iron-sulphur cluster insertion protein or Fe/S biogenesis protein. They appear to be involved in Fe-S cluster biogenesis. This family includes IscA, HesB, YadR and YfhF-like proteins. The hesB gene is expressed only under nitrogen fixation conditions. IscA, an 11 kDa member of the hesB family of proteins, binds iron and [2Fe-2S] clusters, and participates in the biosynthesis of iron-sulphur proteins. IscA is able to bind at least 2 iron ions per dimer. Other members of this family include various hypothetical proteins that also contain the NifU-like domain suggesting that they too are able to bind iron and are involved in Fe-S cluster biogenesis. The HesB family are found in species as divergent as Homo sapiens (Human) and Haemophilus influenzae suggesting that these proteins are involved in basic cellular functions.

    Proteins where this domain is known:
    PY00043    PY01258    PY02064   


    PF01529 - zf-DHHC (Pfam link)

    Interpro entry IPR001594 : Zinc finger, DHHC-type (Interpro link)

    Pfam description:
    This domain is also known as NEW1. This domain is predicted to be a zinc binding domain. The function of this domain is unknown, but it has been predicted to be involved in protein-protein or protein-DNA interactions, and palmitoyltransferase activity.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the DHHC-type zinc finger domain, which is also known as NEW1. The DHHC Zn-finger was first isolated in the Drosophila putative transcription factor DNZ1. The function of this domain is unknown, but it has been predicted to be involved in protein-protein or protein-DNA interactions.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY00520    PY01955    PY02160    PY02619    PY03577    PY03989    PY05972    PY06597    PY07037   


    PF01535 - PPR (Pfam link)

    Interpro entry IPR002885 : (Interpro link)

    Pfam description:
    This repeat has no known function. It is about 35 amino acids long and found in up to 18 copies in some proteins. This family appears to be greatly expanded in plants. This repeat occurs in PET309 Swiss:P32522 that may be involved in RNA stabilisation. This domain occurs in crp1 that is involved in RNA processing. This repeat is associated with a predicted plant protein Swiss:O49549 that has a domain organisation similar to the human BRCA1 protein. The repeat has been called PPR.

    Interpro description:

    This entry represents the PPR repeat.

    Pentatricopeptide repeat (PPR) proteins are characterised by tandem repeats of a degenerate 35 amino acid motif. Most PPR proteins have roles in mitochondria or plastids. PPR repeats were discovered while screening Arabidopsis proteins for those predicted to be targeted to mitochondria or chloroplasts. Some of these proteins have been shown to play a role in post-transcriptional processes within organelles, and they are thought to be sequence-specific RNA-binding proteins. Plant genomes encode between one hundred and five hundred PPR genes, whereas non-plant genomes encode two to six PPR proteins.

    Although no PPR structures are yet known, the motif is predicted to fold into a helix-turn-helix structure similar to those found in the tetratricopeptide repeat (TPR) family.

    The plant PPR protein family has been divided into two subfamilies on the basis of their motif content and organisation.

    Examples of PPR repeat-containing proteins include PET309, which may be involved in RNA stabilisation, and crp1, which is involved in RNA processing. The repeat is associated with a predicted plant protein that has a domain organisation similar to the human BRCA1 protein.

    Proteins where this domain is known:
    PY03222    PY04935   

    Proteins where this domain has been detected by our approach:
    PY00025   


    PF01536 - SAM_decarbox (Pfam link)

    Interpro entry IPR001985 : S-adenosylmethionine decarboxylase (Interpro link)

    Pfam description:
    This is a family of S-adenosylmethionine decarboxylase (SAMDC) proenzymes. In the biosynthesis of polyamines SAMDC produces decarboxylated S-adenosylmethionine, which serves as the aminopropyl moiety necessary for spermidine and spermine biosynthesis from putrescine. The Pfam alignment contains both the alpha and beta chains that are cleaved to form the active enzyme.

    Interpro description:

    S-adenosylmethionine decarboxylase (AdoMetDC) catalyzes the removal of the carboxylate group of S-adenosylmethionine to form S-adenosyl-5'-3-methylthiopropylamine, which then acts as the aminopropyl group donor in the synthesis of the polyamines spermidine and spermine from putrescine.

    The catalytic mechanism of AdoMetDC involves a covalently bound pyruvoyl group. This group is generated post-translationally by a self-catalyzed intramolecular proteolytic cleavage reaction between a glutamate and a serine. This cleavage generates two chains, beta (N-terminal) and alpha (C-terminal). The N-terminal serine residue of the alpha chain is then converted by nonhydrolytic serinolysis into a pyruvoyl group.

    Proteins where this domain is known:
    PY04754   


    PF01541 - GIY-YIG (Pfam link)

    Interpro entry IPR000305 : Excinuclease ABC, C subunit, N-terminal (Interpro link)

    Pfam description:
    This domain, called GIY-YIG, is found in the amino terminal region of excinuclease ABC subunit C (UvrC) and bacteriophage T4 endonucleases segA, segB, segC, segD and segE; it is also found in putative endonucleases encoded by group I introns of fungi and phage. The structure of I-TevI, a GIY-YIG endonuclease, reveals a novel alpha/beta-fold with a central three-stranded antiparallel beta-sheet flanked by three helices. The most conserved and putative catalytic residues are located on a shallow, concave surface and include a metal coordination site.

    Interpro description:

    During the process of Escherichia coli nucleotide excision repair, DNA damage recognition and processing are achieved by the action of the uvrA, uvrB, and uvrC gene products. The UvrC proteins contain 4 conserved regions: a central region which interacts with UvrB (Uvr domain), a Helix hairpin Helix (HhH) domain important for 5' incision of damaged DNA, and homology regions 1 and 2 of unknown function. UvrC homology region 2 is specific for UvrC proteins, whereas UvrC homology region 1 is also shared by a few other nucleases.

    This domain is found in the amino terminal region of excinuclease ABC subunit C (UvrC) and bacteriophage T4 endonucleases segA, segB, segC, segD and segE; it is also found in putative endonucleases encoded by group I introns of fungi and phage.

    Proteins where this domain has been detected by our approach:
    PY03673   


    PF01544 - CorA (Pfam link)

    Interpro entry IPR002523 : Mg2+ transporter protein, CorA-like (Interpro link)

    Pfam description:
    The CorA transport system is the primary Mg2+ influx system of Salmonella typhimurium and Escherichia coli. CorA is virtually ubiquitous in the Bacteria and Archaea. There are also eukaryotic relatives of this protein. The family includes the MRS2 protein Swiss:Q01926 from yeast that is thought to be an RNA splicing protein. However, its membership in this family suggests that its effect on splicing is due to altered magnesium levels in the cell.

    Interpro description:
    The CorA transport system is the primary Mg2+ influx system of Salmonella typhimurium and Escherichia coli. CorA is virtually ubiquitous in the Bacteria and Archaea. There are also eukaryotic relatives of this protein.

    Proteins where this domain is known:
    PY00890    PY07158   


    PF01545 - Cation_efflux (Pfam link)

    Interpro entry IPR002524 : Cation efflux protein (Interpro link)

    Pfam description:
    Members of this family are integral membrane proteins that are found to increase tolerance to divalent metal ions such as cadmium, zinc, and cobalt. These proteins are thought to be efflux pumps that remove these ions from cells.

    Interpro description:

    Members of this family are integral membrane proteins that are found to increase tolerance to divalent metal ions such as cadmium, zinc, and cobalt. Many are considered to be efflux pumps that remove these ions from cells, whereas others are implicated in ion uptake. The family has six predicted transmembrane domains. Members of the family are variable in length because of variably sized inserts, often containing low-complexity sequence.

    Proteins where this domain is known:
    PY07157   


    PF01553 - Acyltransferase (Pfam link)

    Interpro entry IPR002123 : Phospholipid/glycerol acyltransferase (Interpro link)

    Pfam description:
    This family contains acyltransferases involved in phospholipid biosynthesis and other proteins of unknown function. This family also includes tafazzin Swiss:Q16635, the Barth syndrome gene.

    Interpro description:

    This family contains acyltransferases involved in phospholipid biosynthesis and other proteins of unknown function. This domain is found in tafazzins, defects in which cause Barth syndrome, a severe inherited disorder that is often fatal in childhood and is characterised by cardiac and skeletal abnormalities. Phospholipid/glycerol acyltransferase is not found in viruses or archaea and is under-represented in bacteria. Bacterial glycerol-phosphate acyltransferases are involved in membrane biogenesis since they use fatty acid chains to form the first membrane phospholipids.

    Proteins where this domain is known:
    PY01678    PY02486    PY06015   


    PF01555 - N6_N4_Mtase (Pfam link)

    Interpro entry IPR002941 : DNA methylase N-4/N-6 (Interpro link)

    Pfam description:
    Members of this family are DNA methylases. The family contains both N-4 cytosine-specific DNA methylases and N-6 adenine-specific DNA methylases.

    Interpro description:

    This domain is found in DNA methylases. In prokaryotes, the major role of DNA methylation is to protect host DNA against degradation by restriction enzymes. This family contains both N-4 cytosine-specific DNA methylases and N-6 adenine-specific DNA methylases. N-4 cytosine-specific DNA methylases are enzymes that specifically methylate the amino group at the C-4 position of cytosines in DNA. Such enzymes are found as components of type II restriction-modification systems in prokaryotes. They recognise a specific sequence in DNA and methylate a cytosine in that sequence. By this action they protect DNA from cleavage by type II restriction enzymes that recognise the same sequence. N-6 adenine-specific DNA methylases (A-Mtase) are enzymes that specifically methylate the amino group at the C-6 position of adenines in DNA. Such enzymes are found in the three existing types of bacterial restriction-modification systems (in type I systems the A-Mtase is the product of the hsdM gene, and in type III it is the product of the mod gene). All of these enzymes recognise a specific sequence in DNA and methylate an adenine in that sequence.
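    The protective logic of a restriction-modification system described above can be sketched in a few lines: the methylase and the restriction enzyme recognise the same sequence, and only occurrences left unmethylated remain open to cleavage. The helper names, the use of exact string matching, and the example recognition sequence GAATTC are illustrative assumptions; the sketch ignores the complementary strand, which base within the site is methylated, and enzyme kinetics.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `dna` (overlaps allowed)."""
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions


def cleavable_sites(dna: str, site: str, methylated: set[int]) -> list[int]:
    """Sites recognised by the restriction enzyme that carry no protective methylation."""
    return [pos for pos in find_sites(dna, site) if pos not in methylated]


if __name__ == "__main__":
    dna = "TTGAATTCAAGAATTCGG"  # toy sequence with two copies of the example site
    site = "GAATTC"             # example recognition sequence (assumption)
    print(find_sites(dna, site))            # [2, 10]
    print(cleavable_sites(dna, site, {2}))  # [10]: the site at 2 is methylated
```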

    Proteins where this domain has been detected by our approach:
    PY05313   


    PF01556 - DnaJ_C (Pfam link)

    Interpro entry IPR002939 : Chaperone DnaJ, C-terminal (Interpro link)

    Pfam description:
    This family consists of the C-terminal region from the DnaJ protein. Although the function of this region is unknown, it is always found associated with Pfam:PF00226 and Pfam:PF00684. DnaJ is a chaperone associated with the Hsp70 heat-shock system involved in protein folding and renaturation after stress.

    Interpro description:

    Molecular chaperones are a diverse family of proteins that function to protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis, and ADP release by an N-terminal ATP-hydrolysing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate binding domain. Dimeric GrpE is the co-chaperone for DnaK, and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. Thus the co-chaperones DnaJ and GrpE are capable of tightly regulating the nucleotide-bound and substrate-bound state of DnaK in ways that are necessary for the normal housekeeping functions and stress-related functions of the DnaK molecular chaperone cycle.

    Besides stimulating the ATPase activity of DnaK through its J-domain, DnaJ also associates with unfolded polypeptide chains and prevents their aggregation. Thus, DnaK and DnaJ may bind to one and the same polypeptide chain to form a ternary complex. The formation of a ternary complex may result in cis-interaction of the J-domain of DnaJ with the ATPase domain of DnaK. An unfolded polypeptide may enter the chaperone cycle by associating first either with ATP-liganded DnaK or with DnaJ. DnaK interacts with both the backbone and side chains of a peptide substrate; it thus shows binding polarity and admits only L-peptide segments. In contrast, DnaJ has been shown to bind both L- and D-peptides and is assumed to interact only with the side chains of the substrate.

    This domain consists of the C-terminal region of the DnaJ protein. Although the function of this region is unknown, it is always found associated with Pfam:PF00226 and Pfam:PF00684.

    Proteins where this domain is known:
    PY02476    PY02986    PY03544    PY04093    PY07104   


    PF01564 - Spermine_synth (Pfam link)

    Interpro entry IPR001045 : Spermine synthase (Interpro link)

    Pfam description:
    Spermine and spermidine are polyamines. This family includes spermidine synthase that catalyses the fifth (last) step in the biosynthesis of spermidine from arginine, and spermine synthase.

    Interpro description:
    Synonym(s): Spermidine aminopropyltransferase

    A group of polyamine biosynthetic enzymes involved in the fifth (last) step in the biosynthesis of spermidine from arginine and methionine, which includes spermidine synthase, spermine synthase and putrescine N-methyltransferase.

    The Thermotoga maritima spermidine synthase monomer consists of two domains: an N-terminal domain composed of six beta-strands, and a Rossmann-like C-terminal domain. The larger C-terminal catalytic core domain consists of a seven-stranded beta-sheet flanked by nine alpha helices. This domain resembles a topology observed in a number of nucleotide- and dinucleotide-binding enzymes, and in S-adenosyl-L-methionine (AdoMet)-dependent methyltransferases (MTases).

    Proteins where this domain is known:
    PY04976    PY04977   


    PF01566 - Nramp (Pfam link)

    Interpro entry IPR001046 : Natural resistance-associated macrophage protein (Interpro link)

    Pfam description:
    The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of functional related proteins defined by a conserved hydrophobic core of ten transmembrane domains. This family of membrane proteins are divalent cation transporters. Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the membrane of a phagosome upon phagocytosis. By controlling divalent cation concentrations Nramp1 may regulate the interphagosomal replication of bacteria. Mutations in Nramp1 may genetically predispose an individual to susceptibility to diseases including leprosy and tuberculosis conversely this might however provide protection form rheumatoid arthritis. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+ amongst others it is expressed at high levels in the intestine; and is major transferrin-independent iron uptake system in mammals. The yeast proteins Smf1 and Smf2 may also transport divalent cations.

    Interpro description:

    The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved hydrophobic core of ten transmembrane domains. Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the membrane of a phagosome upon phagocytosis. Nramp2 transports multiple divalent cations, including Fe2+, Mn2+ and Zn2+. It is expressed at high levels in the intestine and is the major transferrin-independent iron uptake system in mammals. The yeast proteins Smf1 and Smf2 may also transport divalent cations.

    The natural resistance of mice to infection with intracellular parasites is controlled by the Bcg locus, which modulates the cytostatic/cytocidal activity of phagocytes. Nramp1, the gene responsible, is expressed exclusively in macrophages and polymorphonuclear leukocytes, and encodes a polypeptide (natural resistance-associated macrophage protein) with features typical of integral membrane proteins. Other transporter proteins from a variety of sources also belong to this family.

    Proteins where this domain is known:
    PY04720   


    PF01571 - GCV_T (Pfam link)

    Interpro entry IPR006222 : Glycine cleavage T-protein, N-terminal (Interpro link)

    Pfam description:
    This is a family of glycine cleavage T-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. The T-protein is an aminomethyl transferase.

    Interpro description:
    This is a family of glycine cleavage T-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. The T-protein is an aminomethyl transferase that catalyses the following reaction:
     (6S)-tetrahydrofolate + S-aminomethyldihydrolipoylprotein = (6R)-5,10-methylenetetrahydrofolate + NH3 + dihydrolipoylprotein 

    Proteins where this domain is known:
    PY02845    PY05620   


    PF01585 - G-patch (Pfam link)

    Interpro entry IPR000467 : D111/G-patch (Interpro link)

    Pfam description:
    This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly conserved glycines.

    Interpro description:
    The D111/G-patch domain is a short conserved region of about 40 amino acids which occurs in a number of putative RNA-binding proteins, including tumor suppressor and DNA-damage-repair proteins, suggesting that this domain may have an RNA binding function. This domain has seven highly conserved glycines. A multiple alignment of a small subset of D111/G-patch domains is shown in Fig. 2b of.

    Proteins where this domain is known:
    PY00795    PY03703   


    PF01588 - tRNA_bind (Pfam link)

    Interpro entry IPR002547 : tRNA-binding region (Interpro link)

    Pfam description:
    This domain is found in prokaryotic methionyl-tRNA synthetases, prokaryotic phenylalanyl-tRNA synthetases, the yeast GU4 nucleic-binding protein (G4p1 or p42, ARC1), human tyrosyl-tRNA synthetase, and endothelial-monocyte activating polypeptide II. G4p1 binds specifically to tRNA and forms a complex with methionyl-tRNA synthetases. In human tyrosyl-tRNA synthetase this domain may direct tRNA to the active site of the enzyme. This domain may perform a common function in tRNA aminoacylation.

    Interpro description:
    This domain is found in prokaryotic methionyl-tRNA synthetases, prokaryotic phenylalanyl-tRNA synthetases, the yeast GU4 nucleic-binding protein (G4p1 or p42, ARC1), human tyrosyl-tRNA synthetase, and endothelial-monocyte activating polypeptide II. G4p1 binds specifically to tRNA and forms a complex with methionyl-tRNA synthetases. In human tyrosyl-tRNA synthetase this domain may direct tRNA to the active site of the enzyme. This domain may perform a common function in tRNA aminoacylation.

    Proteins where this domain is known:
    PY02994   


    PF01590 - GAF (Pfam link)

    Interpro entry IPR003018 : (Interpro link)

    Pfam description:
    Domain present in phytochromes and cGMP-specific phosphodiesterases.

    Interpro description:
    This domain is present in phytochromes and cGMP-specific phosphodiesterases. cGMP-dependent 3',5'-cyclic phosphodiesterase catalyses the conversion of guanosine 3',5'-cyclic phosphate to guanosine 5'-phosphate. A phytochrome is a regulatory photoreceptor that exists in two forms that are reversibly interconvertible by light: the PR form, which absorbs maximally in the red region of the spectrum, and the PFR form, which absorbs maximally in the far-red region. This domain is also found in NifA, a transcriptional activator required for activation of most nif operons, which are directly involved in nitrogen fixation. NifA interacts with sigma-54.

    Proteins where this domain is known:
    PY04596   

    Proteins where this domain has been detected by our approach:
    PY06545    PY07272   


    PF01592 - NifU_N (Pfam link)

    Interpro entry IPR002871 : NIF system FeS cluster assembly, NifU, N-terminal (Interpro link)

    Pfam description:
    This domain is found in NifU in combination with Pfam:PF01106. This domain is also found in isolation in several bacterial species, such as Swiss:O53156. The nif genes are responsible for nitrogen fixation. However, this domain is found in bacteria that do not fix nitrogen, so it may have a broader significance in the cell than nitrogen fixation. These proteins appear to be scaffold proteins for iron-sulfur clusters.

    Interpro description:

    Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.

    The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.

    The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. No specific functions have been assigned to SufB and SufD. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.

    In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii, and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminal, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.

    This entry represents the N-terminal domain of NifU and homologous proteins. NifU contains two domains: an N-terminal and a C-terminal domain. These domains exist either together or on different polypeptides, and both are found in organisms that do not fix nitrogen (e.g. yeast), so they have a broader significance in the cell than nitrogen fixation.

    Proteins where this domain is known:
    PY00856   


    PF01593 - Amino_oxidase (Pfam link)

    Interpro entry IPR002937 : Amine oxidase (Interpro link)

    Pfam description:
    This family consists of various amine oxidases, including maize polyamine oxidase (PAO) and various flavin-containing monoamine oxidases (MAO). The aligned region includes the flavin binding site of these enzymes. The family also contains phytoene dehydrogenases and related enzymes. In vertebrates MAO plays an important role in regulating the intracellular levels of amines via their oxidation; these include various neurotransmitters, neurotoxins and trace amines. In lower eukaryotes such as Aspergillus and in bacteria the main role of amine oxidases is to provide a source of ammonium. PAOs in plants, bacteria and protozoa oxidise spermidine and spermine to an aminobutyral, diaminopropane and hydrogen peroxide and are involved in the catabolism of polyamines. Other members of this family include tryptophan 2-monooxygenase, putrescine oxidase, corticosteroid binding proteins and antibacterial glycoproteins.

    Interpro description:
    This entry consists of various amine oxidases, including maize polyamine oxidase (PAO), L-amino acid oxidases (LAO) and various flavin-containing monoamine oxidases (MAO). The aligned region includes the flavin binding site of these enzymes. In vertebrates MAO plays an important role in regulating the intracellular levels of amines via their oxidation; these include various neurotransmitters, neurotoxins and trace amines. In lower eukaryotes such as Aspergillus and in bacteria the main role of amine oxidases is to provide a source of ammonium. PAOs in plants, bacteria and protozoa oxidise spermidine and spermine to an aminobutyral, diaminopropane and hydrogen peroxide and are involved in the catabolism of polyamines. Other members of this family include tryptophan 2-monooxygenase, putrescine oxidase, corticosteroid binding proteins and antibacterial glycoproteins.

    Proteins where this domain is known:
    PY02606    PY03791   


    PF01595 - DUF21 (Pfam link)

    Interpro entry IPR002550 : (Interpro link)

    Pfam description:
    This transmembrane region has no known function. Many of the sequences in this family are annotated as hemolysins, however this is due to a similarity to Swiss:Q54318 that does not contain this domain. This domain is found in the N-terminus of the proteins adjacent to two intracellular CBS domains Pfam:PF00571.

    Interpro description:
    This transmembrane region has no known function. Many of the sequences in this family are annotated as hemolysins; however, this is due to a similarity to a protein that does not contain this domain. This domain is found in the N terminus of the proteins adjacent to two intracellular CBS domains.

    Proteins where this domain is known:
    PY03208   


    PF01597 - GCV_H (Pfam link)

    Interpro entry IPR002930 : Glycine cleavage H-protein (Interpro link)

    Pfam description:
    This is a family of glycine cleavage H-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. A lipoyl group is attached to a completely conserved lysine residue. The H protein shuttles the methylamine group of glycine from the P protein to the T protein.

    Interpro description:

    This is a family of glycine cleavage H-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. A lipoyl group is attached to a completely conserved lysine residue. The H protein shuttles the methylamine group of glycine from the P protein to the T protein.

    Proteins where this domain is known:
    PY05949   


    PF01599 - Ribosomal_S27 (Pfam link)

    Interpro entry IPR002906 : Ribosomal protein S27a (Interpro link)

    Pfam description:
    This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesised as a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a transient metabolic stabilisation and is required for efficient ribosome biogenesis. The ribosomal extension protein S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to maintain a fixed ratio between ubiquitin, which is necessary for degrading proteins, and ribosomes, which are a source of proteins.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a transient metabolic stabilisation and is required for efficient ribosome biogenesis. The ribosomal extension protein S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to maintain a fixed ratio between ubiquitin, which is necessary for degrading proteins, and ribosomes, which are a source of proteins.

    Proteins where this domain is known:
    PY00122   


    PF01602 - Adaptin_N (Pfam link)

    Interpro entry IPR002553 : Clathrin/coatomer adaptor, adaptin-like, N-terminal (Interpro link)

    Pfam description:
    This family consists of the N-terminal region of various alpha, beta and gamma subunits of the AP-1, AP-2 and AP-3 adaptor protein complexes. The adaptor protein (AP) complexes are involved in the formation of clathrin-coated pits and vesicles. The N-terminal region of the various adaptor proteins (APs) is constant by comparison with the C-terminal region, which is variable within members of the AP-2 family, and it has been proposed that this constant region interacts with another uniform component of the coated vesicles.

    Interpro description:

    Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer.

    Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis.

    While clathrin mediates endocytic protein transport from ER to Golgi, coatomers (COPI, COPII) primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.

    This entry represents the N-terminal domain of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and from the beta and gamma subunits of various coatomer (COP) adaptors. This domain has a 2-layer alpha/alpha fold that forms a right-handed superhelix, and is a member of the ARM repeat superfamily. The N-terminal region of the various AP adaptor proteins share strong sequence identity; by contrast, the C-terminal domains of different adaptins share similar structural folds, but have little sequence identity. It has been proposed that the N-terminal domain interacts with another uniform component of the coated vesicles.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY00718    PY01150    PY01272    PY01282    PY01387    PY01672    PY03000    PY04230    PY05746   


    PF01612 - 3_5_exonuc (Pfam link)

    Interpro entry IPR002562 : 3'-5' exonuclease (Interpro link)

    Pfam description:
    This domain is responsible for the 3'-5' exonuclease proofreading activity of E. coli DNA polymerase I (polI) and other enzymes; it catalyses the hydrolysis of unpaired or mismatched nucleotides. This domain consists of the amino-terminal half of the Klenow fragment in E. coli polI; it is also found in the Werner syndrome helicase (WRN), focus forming activity 1 protein (FFA-1) and ribonuclease D (RNase D). Werner syndrome is a human genetic disorder causing premature aging; the WRN protein has helicase activity in the 3'-5' direction. The FFA-1 protein is required for the formation of replication foci and also has helicase activity; it is a homologue of the WRN protein. RNase D is a 3'-5' exonuclease involved in tRNA processing. Also found in this family is the autoantigen PM/Scl, thought to be involved in polymyositis-scleroderma overlap syndrome.

    Interpro description:

    This domain is responsible for the 3'-5' exonuclease proofreading activity of Escherichia coli DNA polymerase I (polI) and other enzymes; it catalyses the hydrolysis of unpaired or mismatched nucleotides. This domain consists of the amino-terminal half of the Klenow fragment in E. coli polI; it is also found in the Werner syndrome helicase (WRN), focus forming activity 1 protein (FFA-1) and ribonuclease D (RNase D).

    Proteins where this domain is known:
    PY00092    PY00163    PY00900    PY00977    PY02082    PY06866   


    PF01624 - MutS_I (Pfam link)

    Interpro entry IPR007695 : DNA mismatch repair protein MutS-like, N-terminal (Interpro link)

    Pfam description:
    This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with Pfam:PF00488, Pfam:PF05188, Pfam:PF05192 and Pfam:PF05190. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein. The aligned region corresponds with globular domain I, which is involved in DNA binding, in Thermus aquaticus MutS as characterised in.

    Interpro description:

    Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.

    MutS is a modular protein with a complex structure, and is composed of:

    Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts.

    This entry represents the N-terminal domain of proteins in the MutS family of DNA mismatch repair proteins, as well as closely related proteins. The N-terminal domain of MutS is responsible for mismatch recognition and forms a 6-stranded mixed beta-sheet surrounded by three alpha-helices, which is similar to the structure of tRNA endonuclease. Yeast MSH3, bacterial proteins involved in DNA mismatch repair, and the predicted protein product of the Rep-3 gene of mouse share extensive sequence similarity. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein.

    Proteins where this domain is known:
    PY02936    PY03673   


    PF01632 - Ribosomal_L35p (Pfam link)

    Interpro entry IPR001706 : Ribosomal protein L35 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    L35 is a basic protein of 60 to 70 amino-acid residues from the large (50S) subunit. Like many basic polypeptides, L35 completely inhibits ornithine decarboxylase when present unbound in the cell, but the inhibitory function is abolished upon its incorporation into ribosomes. It belongs to a family of ribosomal proteins, including L35 from bacteria, plant chloroplast, red algae chloroplasts and cyanelles. In plants it is a nuclear encoded gene product, which suggests a chloroplast-to-nucleus relocation during the evolution of higher plants.

    Proteins where this domain is known:
    PY01737   


    PF01633 - Choline_kinase (Pfam link)

    Interpro entry IPR002573 : (Interpro link)

    Pfam description:
    Choline kinase catalyses the committed step in the synthesis of phosphatidylcholine by the CDP-choline pathway. This alignment covers the protein kinase portion of the protein. The divergence of this family makes it very difficult to create a model that specifically predicts choline/ethanolamine kinases only. However if is also present then it is definitely a member of this family.

    Interpro description:

    Choline kinase (ATP:choline phosphotransferase) belongs to the choline/ethanolamine kinase family.

    Ethanolamine and choline are major membrane phospholipids, in the form of glycerophosphoethanolamine and glycerophosphocholine. Ethanolamine is also a component of the glycosylphosphatidylinositol (GPI) anchor, which is necessary for cell-surface protein attachment. The de novo synthesis of these phospholipids begins with the creation of phosphoethanolamine and phosphocholine by ethanolamine and choline kinases in the first step of the CDP-ethanolamine pathway. There are two putative choline/ethanolamine kinases (C/EKs) in the Trypanosoma brucei genome.

    Ethanolamine kinase has no choline kinase activity and its activity is inhibited by ADP. Inositol supplementation represses ethanolamine kinase, decreasing the incorporation of ethanolamine into the CDP-ethanolamine pathway and into phosphatidylethanolamine and phosphatidylcholine.

    Proteins where this domain is known:
    PY00116    PY00818   


    PF01645 - Glu_synthase (Pfam link)

    Interpro entry IPR002932 : Glutamate synthase, central-C (Interpro link)

    Pfam description:
    This family represents a region of the glutamate synthase protein. This region is expressed as a separate subunit in the glutamate synthase alpha subunit from archaebacteria, or part of a large multidomain enzyme in other organisms. The aligned region of these proteins contains a putative FMN binding site and Fe-S cluster.

    Interpro description:

    Ferredoxin-dependent glutamate synthases have been implicated in a number of functions including photorespiration in Arabidopsis, where they may also play a role in primary nitrogen assimilation in roots. This region is expressed as a separate subunit in the glutamate synthase alpha subunit from archaebacteria, or as part of a large multidomain enzyme in other organisms.

    The aligned region of these proteins contains a putative FMN binding site and Fe-S cluster.

    Proteins where this domain is known:
    PY03719   


    PF01648 - ACPS (Pfam link)

    Interpro entry IPR008278 : 4'-phosphopantetheinyl transferase (Interpro link)

    Pfam description:
    Members of this family transfer the 4'-phosphopantetheine (4'-PP) moiety from coenzyme A (CoA) to the invariant serine of Pfam:PF00550. This post-translational modification renders holo-ACP capable of acyl group activation via thioesterification of the cysteamine thiol of 4'-PP. This superfamily consists of two subtypes: the ACPS type, such as Swiss:P24224, and the Sfp type, such as Swiss:P39135. The structure of the Sfp type is known and shows that the active site accommodates a magnesium ion. The most highly conserved regions of the alignment are involved in binding the magnesium ion.

    Interpro description:

    These proteins transfer the 4'-phosphopantetheine (4'-PP) moiety from coenzyme A (CoA) to the invariant serine of the PP-binding domain. This post-translational modification renders holo-ACP capable of acyl group activation via thioesterification of the cysteamine thiol of 4'-PP. This superfamily consists of two subtypes: the ACPS type, such as ACPS_ECOLI, and the Sfp type, such as SFP_BACSU. The structure of the Sfp type is known and shows that the active site accommodates a magnesium ion. The most highly conserved regions of the alignment are involved in binding the magnesium ion.

    Proteins where this domain is known:
    PY06285   


    PF01652 - IF4E (Pfam link)

    Interpro entry IPR001040 : Eukaryotic translation initiation factor 4E (eIF-4E) (Interpro link)

    Interpro description:
    Eukaryotic translation initiation factor 4E (eIF-4E) is a protein that binds to the cap structure of eukaryotic cellular mRNAs. eIF-4E recognises and binds the 7-methylguanosine-containing (m7Gppp) cap during an early step in the initiation of protein synthesis and facilitates ribosome binding to an mRNA by inducing the unwinding of its secondary structures. A tryptophan in the central part of the sequence of human eIF-4E seems to be implicated in cap binding.

    Proteins where this domain is known:
    PY04859   


    PF01655 - Ribosomal_L32e (Pfam link)

    Interpro entry IPR001515 : Ribosomal protein L32e (Interpro link)

    Pfam description:
    This family includes ribosomal protein L32 from eukaryotes and archaebacteria.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The L32e family consists of proteins that have 135 to 240 amino-acid residues.

    Proteins where this domain is known:
    PY01152   


    PF01658 - Inos-1-P_synth (Pfam link)

    Interpro entry IPR013021 : (Interpro link)

    Pfam description:
    This is a family of myo-inositol-1-phosphate synthases. Inositol-1-phosphate synthase catalyses the conversion of glucose-6-phosphate to inositol-1-phosphate, which is then dephosphorylated to inositol. Inositol phosphates play an important role in signal transduction.

    Interpro description:

    This is a region of myo-inositol-1-phosphate synthases that is related to the glyceraldehyde-3-phosphate dehydrogenase-like, C-terminal domain.

    1L-myo-Inositol-1-phosphate synthase catalyzes the conversion of D-glucose 6-phosphate to 1L-myo-inositol-1-phosphate, the first committed step in the production of all inositol-containing compounds, including phospholipids, either directly or by salvage. The enzyme exists in a cytoplasmic form in a wide range of plants, animals, and fungi. It has also been detected in several bacteria, and a chloroplast form is observed in algae and higher plants. Inositol phosphates play an important role in signal transduction.

    In Saccharomyces cerevisiae (Baker's yeast), the transcriptional regulation of the INO1 gene has been studied in detail and its expression is sensitive to the availability of phospholipid precursors as well as growth phase. The regulation of the structural gene encoding 1L-myo-inositol-1-phosphate synthase has also been analyzed at the transcriptional level in the aquatic angiosperm, Spirodela polyrrhiza (Giant duckweed) and the halophyte, Mesembryanthemum crystallinum (Common ice plant).

    Proteins where this domain is known:
    PY03995   


    PF01661 - Macro (Pfam link)

    Interpro entry IPR002589 : (Interpro link)

    Pfam description:
    This domain is an ADP-ribose binding module. It is found in a number of otherwise unrelated proteins. It is found at the C-terminus of the macro-H2A histone protein Swiss:Q02874. This domain is found in the non-structural proteins of several types of ssRNA viruses such as NSP3 from alphaviruses Swiss:P03317. This domain is also found on its own in a family of proteins from bacteria Swiss:P75918, archaebacteria Swiss:O59182 and eukaryotes Swiss:Q17432.

    Interpro description:

    The Macro or A1pp domain is a module of about 180 amino acids which can bind ADP-ribose, an NAD metabolite or related ligands. The domain was described originally in association with ADP-ribose 1''-phosphate (Appr-1''-P) processing activity (A1pp) of the yeast YBR022W protein. The domain is also called Macro domain as it is the C-terminal domain of mammalian core histone macro-H2A. Macro domain proteins can be found in eukaryotes, in (mostly pathogenic) bacteria, in archaea and in ssRNA viruses, such as coronaviruses, Rubella and Hepatitis E viruses. In vertebrates the domain occurs e.g. in histone macroH2A, in predicted poly-ADP-ribose polymerases (PARPs) and in B aggressive lymphoma (BAL) protein. The macro domain can be associated with catalytic domains, such as PARP, or sirtuin. The Macro domain can recognize ADP-ribose or in some cases poly-ADP-ribose, which can be involved in ADP-ribosylation reactions that occur in important processes, such as chromatin biology, DNA repair and transcription regulation. The human macroH2A1.1 Macro domain binds an NAD metabolite O-acetyl-ADP-ribose. The Macro domain has been suggested to play a regulatory role in ADP-ribosylation, which is involved in inter- and intracellular signaling, transcriptional regulation, DNA repair pathways and maintenance of genomic stability, telomere dynamics, cell differentiation and proliferation, and necrosis and apoptosis.

    The 3D structure of the Macro domain has a mixed alpha/beta fold, consisting of a mixed beta sheet sandwiched between four helices. Several proteins that consist only of a Macro domain are shorter than the structure of AF1521 and lack either the first strand or the C-terminal helix 5. Well-conserved residues form a hydrophobic cleft and cluster around the AF1521 ADP-ribose binding site.

    Proteins where this domain is known:
    PY00478    PY05109   


    PF01667 - Ribosomal_S27e (Pfam link)

    Interpro entry IPR000592 : Ribosomal protein S27e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes mammalian, yeast, Chlamydomonas reinhardtii and Entamoeba histolytica S27, and Methanocaldococcus jannaschii (Methanococcus jannaschii) MJ0250. These proteins have 62 to 87 amino acids. In their central section they contain a putative zinc-finger region of the type C-x(2)-C-x(14)-C-x(2)-C.
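    The spacing rule above can be checked mechanically against a sequence. The following is a minimal Python sketch, assuming a plain one-letter amino-acid string as input; it translates C-x(2)-C-x(14)-C-x(2)-C into a regular expression, where each x(n) becomes a run of any n residues. The function name find_s27e_zinc_finger and the test sequence are hypothetical, and a match only shows that the cysteine spacing fits, not that the region actually binds zinc.

```python
import re

# C-x(2)-C-x(14)-C-x(2)-C: "x(n)" means any n residues, written ".{n}" in regex form.
S27E_ZINC_FINGER = re.compile(r"C.{2}C.{14}C.{2}C")


def find_s27e_zinc_finger(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    # Upper-case first so lower-case input sequences are scanned the same way.
    return [(m.start(), m.group()) for m in S27E_ZINC_FINGER.finditer(sequence.upper())]


# Hypothetical test sequence: two Cys pairs separated by a 14-residue spacer.
example = "MA" + "CAAC" + "G" * 14 + "CKKC" + "TT"
print(find_s27e_zinc_finger(example))
```

    Changing the spacer to 13 or 15 residues makes the match disappear, which is the point of writing the spacing as fixed repeat counts rather than a looser pattern.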

    Proteins where this domain is known:
    PY03153   


    PF01680 - SOR_SNZ (Pfam link)

    Interpro entry IPR001852 : (Interpro link)

    Pfam description:
    Members of this family are enzymes involved in a new pathway of pyridoxine/pyridoxal 5'-phosphate biosynthesis. This family was formerly known as UPF0019.

    Interpro description:

    Snz1p is a highly conserved protein involved in growth arrest in Saccharomyces cerevisiae (Baker's yeast). Sor1 (singlet oxygen resistance) is essential in pyridoxine (vitamin B6) synthesis in Cercospora nicotianae and Aspergillus flavus. Pyridoxine quenches singlet oxygen at a rate comparable to that of vitamins C and E, two of the most highly efficient biological antioxidants, suggesting a previously unknown role for pyridoxine in active oxygen resistance.

    Proteins where this domain is known:
    PY04475   


    PF01687 - Flavokinase (Pfam link)

    Interpro entry IPR015865 : Riboflavin kinase (Interpro link)

    Pfam description:
    This family represents the C-terminal region of the bifunctional riboflavin biosynthesis protein known as RibC in Bacillus subtilis. The RibC protein from Bacillus subtilis has both flavokinase and flavin adenine dinucleotide synthetase (FAD-synthetase) activities. RibC plays an essential role in the flavin metabolism. This domain is thought to have kinase activity.

    Interpro description:

    Riboflavin is converted into catalytically active cofactors (FAD and FMN) by the actions of riboflavin kinase, which converts it into FMN, and FAD synthetase, which adenylates FMN to FAD. Eukaryotes usually have two separate enzymes, while most prokaryotes have a single bifunctional protein that can carry out both catalyses, although exceptions occur in both cases. While eukaryotic monofunctional riboflavin kinase is orthologous to the bifunctional prokaryotic enzyme, the monofunctional FAD synthetase differs from its prokaryotic counterpart, and is instead related to the PAPS-reductase family. The bacterial FAD synthetase that is part of the bifunctional enzyme has remote similarity to nucleotidyl transferases and, hence, it may be involved in the adenylylation reaction of FAD synthetases.

    This entry represents riboflavin kinase, which occurs as part of a bifunctional enzyme or a stand-alone enzyme.

    Proteins where this domain is known:
    PY00264   


    PF01694 - Rhomboid (Pfam link)

    Interpro entry IPR002610 : Peptidase S54, rhomboid (Interpro link)

    Pfam description:
    This family contains integral membrane proteins that are related to Drosophila rhomboid protein Swiss:P20350. Members of this family are found in bacteria and eukaryotes. Rhomboid promotes the cleavage of the membrane-anchored TGF-alpha-like growth factor Spitz, allowing it to activate the Drosophila EGF receptor. Analysis has shown that Rhomboid-1 is an intramembrane serine protease (EC:3.4.21.105). Parasite-encoded rhomboid enzymes are also important for invasion of host cells by Toxoplasma and the malaria parasite.

    Interpro description:

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of proteins contains serine peptidases belonging to MEROPS peptidase family S54 (Rhomboid, clan S-). They are integral membrane proteins related to the Drosophila melanogaster (Fruit fly) rhomboid protein. Members of this family are found in archaea, bacteria and eukaryotes.

    The D. melanogaster rhomboid protease cleaves type-1 transmembrane domains using a catalytic triad composed of serine, histidine and asparagine contributed by different transmembrane domains. It cleaves the transmembrane proteins Spitz, Gurken and Keren within their transmembrane domains to release a soluble TGFalpha-like growth factor. Cleavage occurs in the Golgi, following translocation of the substrates from the endoplasmic reticulum membrane by Star, another transmembrane protein. The growth factors are then able to activate the epidermal growth factor receptor.

    Few substrates of mammalian rhomboid homologues have been determined, but rhomboid-like protein 2 (MEROPS S54.002) has been shown to cleave ephrin B3. Parasite-encoded rhomboid enzymes are also important for invasion of host cells by Toxoplasma and the malaria parasite.

    In Saccharomyces cerevisiae (Baker's yeast), the Pcp1 (MDM37) protein (MEROPS S54.007) is a mitochondrial endopeptidase required for the activation of cytochrome c peroxidase and for the processing of the mitochondrial dynamin-like protein Mgm1. Mutations in Pcp1 result in cells with fragmented mitochondria that have very few short tubules.

    Proteins where this domain is known:
    PY00165    PY00587    PY00729    PY01364    PY01566    PY03223    PY04351   


    PF01699 - Na_Ca_ex (Pfam link)

    Interpro entry IPR004837 : Sodium/calcium exchanger membrane region (Interpro link)

    Pfam description:
    This is a family of sodium/calcium exchanger integral membrane proteins. This family covers the integral membrane regions of the proteins. Sodium/calcium exchangers regulate intracellular Ca2+ concentrations in many cells: cardiac myocytes, epithelial cells, neurons, retinal rod photoreceptors and smooth muscle cells. Ca2+ is moved into or out of the cytosol depending on Na+ concentration. In humans and rats there are 3 isoforms, NCX1, NCX2 and NCX3; see Swiss:Q01728, Swiss:P48768 and Swiss:P70549 respectively.

    Interpro description:
    The sodium/calcium exchangers are a family of integral membrane proteins. This domain covers the integral membrane regions of these proteins. Sodium/calcium exchangers regulate intracellular Ca2+ concentrations in many cells: cardiac myocytes, epithelial cells, neurons, retinal rod photoreceptors and smooth muscle cells. Ca2+ is moved into or out of the cytosol depending on Na+ concentration. In humans and rats there are 3 isoforms: NCX1, NCX2 and NCX3.

    Proteins where this domain is known:
    PY00061   


    PF01702 - TGT (Pfam link)

    Interpro entry IPR002616 : Queuine/other tRNA-ribosyltransferase (Interpro link)

    Pfam description:
    This is a family of queuine tRNA-ribosyltransferases (EC:2.4.2.29), also known as tRNA-guanine transglycosylase and guanine insertion enzyme. Queuine tRNA-ribosyltransferase modifies tRNAs for asparagine, aspartic acid, histidine and tyrosine with queuine. It catalyses the exchange of guanine-34 at the wobble position with 7-aminomethyl-7-deazaguanine, and the addition of a cyclopentenediol moiety to 7-aminomethyl-7-deazaguanine-34 tRNA, giving the hypermodified base queuine in the wobble position. The aligned region contains a zinc-binding motif, C-x-C-x2-C-x29-H, and important tRNA and 7-aminomethyl-7-deazaguanine binding residues.

    Interpro description:
    This is a family of queuine, archaeosine and general tRNA-ribosyltransferases, also known as tRNA-guanine transglycosylase and guanine insertion enzyme. Queuine tRNA-ribosyltransferase modifies tRNAs for asparagine, aspartic acid, histidine and tyrosine with queuine at position 34, and with archaeosine at position 15 in archaeal tRNAs. In bacteria it catalyses the exchange of guanine-34 at the wobble position with 7-aminomethyl-7-deazaguanine, and the addition of a cyclopentenediol moiety to 7-aminomethyl-7-deazaguanine-34 tRNA, giving the hypermodified base queuine in the wobble position. The aligned region contains a zinc-binding motif, C-x-C-x2-C-x29-H, and important tRNA and 7-aminomethyl-7-deazaguanine binding residues.
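    The zinc-binding motif above uses the same spacing notation, with "x" standing for one arbitrary residue and "xN" for N of them. As a minimal Python sketch, assuming a one-letter amino-acid string as input, the motif can be written as a regular expression and searched for; the function name has_tgt_zinc_motif and the test fragment are hypothetical, and the pattern captures only the residue spacing described here, not any further constraints a curated motif database might impose.

```python
import re

# C-x-C-x2-C-x29-H: "x" is one residue of any kind and "xN" is any N residues.
TGT_ZINC_MOTIF = re.compile(r"C.C.{2}C.{29}H")


def has_tgt_zinc_motif(sequence: str) -> bool:
    """True if the sequence contains at least one C-x-C-x2-C-x29-H spacing."""
    return TGT_ZINC_MOTIF.search(sequence.upper()) is not None


# Hypothetical fragment built to fit the spacing exactly (36 residues in total).
fragment = "CACAAC" + "G" * 29 + "H"
print(has_tgt_zinc_motif(fragment))
```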

    Proteins where this domain is known:
    PY00825    PY01488    PY06909   


    PF01715 - IPPT (Pfam link)

    Interpro entry IPR002627 : tRNA isopentenyltransferase (Interpro link)

    Pfam description:
    This is a family of IPP transferases EC:2.5.1.8 also known as tRNA delta(2)-isopentenylpyrophosphate transferase. These enzymes modify both cytoplasmic and mitochondrial tRNAs at A(37) to give isopentenyl A(37).

    Interpro description:
    tRNA isopentenyltransferases are also known as tRNA delta(2)-isopentenylpyrophosphate transferases or IPP transferases. These enzymes modify both cytoplasmic and mitochondrial tRNAs at A(37) to give isopentenyl A(37).

    Proteins where this domain is known:
    PY04164   


    PF01722 - BolA (Pfam link)

    Interpro entry IPR002634 : (Interpro link)

    Pfam description:
    This family consists of the morphoprotein BolA from E. coli and its various homologues. In E. coli, overexpression of this protein causes round morphology, and BolA may be involved in switching the cell between elongation and septation systems during cell division. The expression of BolA is growth-rate regulated and is induced during the transition into the stationary phase. BolA is also induced by stress during early stages of growth and may have a general role in stress response. It has also been suggested that BolA can induce the transcription of penicillin-binding proteins 6 and 5.

    Interpro description:
    This family consists of the morpho-protein BolA from Escherichia coli and its various homologs. In E. coli, over-expression of this protein causes round morphology, and BolA may be involved in switching the cell between elongation and septation systems during cell division. The expression of BolA is growth-rate regulated and is induced during the transition into the stationary phase. BolA is also induced by stress during early stages of growth and may have a general role in stress response. It has also been suggested that BolA can induce the transcription of penicillin-binding proteins 6 and 5.

    Proteins where this domain is known:
    PY00231   


    PF01725 - Ham1p_like (Pfam link)

    Interpro entry IPR002637 : Ham1-like protein (Interpro link)

    Pfam description:
    This family consists of the HAM1 protein Swiss:P47119 and hypothetical archaeal, bacterial and C. elegans proteins. HAM1 controls 6-N-hydroxylaminopurine (HAP) sensitivity and mutagenesis in S. cerevisiae. The HAM1 protein protects the cell from HAP, either at the level of the deoxynucleoside triphosphate or at the DNA level, by a yet unidentified set of reactions.

    Interpro description:

    This family contains the Saccharomyces cerevisiae (Baker's yeast) HAM1 protein and other hypothetical archaeal, bacterial and Caenorhabditis elegans proteins. S. cerevisiae HAM1 protects against the mutagenic effects of the base analog 6-N-hydroxylaminopurine (HAP), which can be a natural product of monooxygenase activity on adenine. HAM1 protein protects the cell from HAP, either at the level of the deoxynucleoside triphosphate or at the DNA level, by a yet unidentified set of reactions.

    Proteins where this domain is known:
    PY01726   


    PF01728 - FtsJ (Pfam link)

    Interpro entry IPR002877 : (Interpro link)

    Pfam description:
    This family consists of FtsJ from various bacterial and archaeal sources. FtsJ is a methyltransferase, but it actually has no effect on cell division. FtsJ's substrate is the 23S rRNA. The 1.5 A crystal structure of FtsJ in complex with its cofactor S-adenosylmethionine revealed that FtsJ has a methyltransferase fold. This family also includes the N terminus of flaviviral NS5 protein. It has been hypothesised that the N-terminal domain of NS5 is a methyltransferase involved in viral RNA capping.

    Interpro description:

    RrmJ (FtsJ) is a well-conserved heat shock protein present in prokaryotes, archaea, and eukaryotes. RrmJ is responsible for methylating 23 S rRNA at position U2552 in the aminoacyl (A) site of the ribosome. U2552 is one of the five universally conserved A-loop residues and has been shown to be methylated at the ribose 2'-OH group in the majority of organisms investigated so far. This suggests that this modification plays an important role in A-loop function. RrmJ recognises its methylation target only when the 23 S rRNA is present in 50 S ribosomal subunits. This suggests that RrmJ-mediated methylation must occur late in the maturation process of the ribosome, in contrast to other known 23 S rRNA modifications that occur in earlier maturation steps.

    The 1.5 A crystal structure of RrmJ in complex with its cofactor S-adenosylmethionine revealed that RrmJ has a methyltransferase fold. The active site of RrmJ appears to be formed by a catalytic triad consisting of two lysine residues and a negatively charged aspartate residue. Another highly conserved glutamate residue present in the active site of RrmJ appears to play only a minor role in the methyltransfer reaction in vivo.

    Proteins where this domain is known:
    PY02347    PY03106    PY05388    PY05898   


    PF01734 - Patatin (Pfam link)

    Interpro entry IPR002641 : Patatin (Interpro link)

    Pfam description:
    This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40% of the total soluble protein in potato tubers. Patatin is a storage protein but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids. Members of this family have been found also in vertebrates.

    Interpro description:
    This family consists of various patatin glycoproteins from plants. Patatin accounts for up to 40% of the total soluble protein in potato tubers. Patatin is a storage protein, but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids.

    Proteins where this domain is known:
    PY02960    PY06683   


    PF01743 - PolyA_pol (Pfam link)

    Interpro entry IPR002646 : Polynucleotide adenylyltransferase region (Interpro link)

    Pfam description:
    This family includes nucleic acid-independent RNA polymerases, such as poly(A) polymerase (EC:2.7.7.19), which adds the poly(A) tail to mRNA. This family also includes the tRNA nucleotidyltransferase (EC:2.7.7.25) that adds CCA to the 3' end of the tRNA.

    Interpro description:

    This group includes nucleic acid-independent RNA polymerases, such as polynucleotide adenylyltransferase, which adds the poly(A) tail to mRNA. This group also includes the tRNA nucleotidyltransferase that adds CCA to the 3' end of the tRNA.

    Proteins where this domain is known:
    PY00891   


    PF01746 - tRNA_m1G_MT (Pfam link)

    Interpro entry IPR016009 : (Interpro link)

    Pfam description:
    This is a family of tRNA (guanine-1)-methyltransferases (EC:2.1.1.31). In E. coli K12 this enzyme catalyses the conversion of a guanosine residue to N1-methylguanine at position 37, next to the anticodon, in tRNA.

    Interpro description:

    In transfer RNA, many different modified nucleosides are found, especially in the anticodon region. tRNA (guanine-N1-)-methyltransferase is one of several nucleases operating together with the tRNA-modifying enzymes before the formation of the mature tRNA. It catalyses the reaction:

        S-adenosyl-L-methionine + tRNA -> S-adenosyl-L-homocysteine + tRNA containing N1-methylguanine

    thereby methylating guanosine (G) to N1-methylguanine (1-methylguanosine, m1G) at position 37 of tRNAs that read CUN (leucine), CCN (proline) and CGG (arginine) codons. The presence of m1G improves the cellular growth rate and the polypeptide step time, and also prevents the tRNA from shifting the reading frame.

    The mechanism of the trmD3-induced frameshift involving mutant tRNA(Pro) and tRNA(Leu) species has been investigated. It has been suggested that the conformation of the anticodon loop may be a major determining element for the formation of m1G37 in vivo.

    Proteins where this domain is known:
    PY04675   


    PF01749 - IBB (Pfam link)

    Interpro entry IPR002652 : Importin-alpha-like, importin-beta-binding region (Interpro link)

    Pfam description:
    This family consists of the importin alpha (karyopherin alpha), importin beta (karyopherin beta) binding domain. The domain mediates formation of the importin alpha beta complex; required for classical NLS import of proteins into the nucleus, through the nuclear pore complex and across the nuclear envelope. Also in the alignment is the NLS of importin alpha which overlaps with the IBB domain.

    Interpro description:

    The exchange of macromolecules between the nucleus and cytoplasm takes place through nuclear pore complexes within the nuclear membrane. Active transport of large molecules through these pore complexes requires carrier proteins, called karyopherins (importins and exportins), which shuttle between the two compartments.

    Members of the importin-alpha (karyopherin-alpha) family can form heterodimers with importin-beta. As part of a heterodimer, importin-beta mediates interactions with the pore complex, while importin-alpha acts as an adaptor protein to bind the nuclear localisation signal (NLS) on the cargo through the classical NLS import of proteins. Proteins can contain one (monopartite) or two (bipartite) NLS motifs. Importin-alpha contains several armadillo (ARM) repeats, which produce a curving structure with two NLS-binding sites, a major one close to the N-terminus and a minor one close to the C-terminus.

    Ran GTPase helps to control the unidirectional transfer of cargo. The cytoplasm contains primarily RanGDP and the nucleus RanGTP through the actions of RanGAP and RanGEF, respectively. In the nucleus, RanGTP binds to importin-beta within the importin/cargo complex, causing a conformational change in importin-beta that releases it from importin-alpha-bound cargo. The N-terminal importin-beta-binding (IBB) domain of importin-alpha contains an auto-regulatory region that mimics the NLS motif. The release of importin-beta frees the auto-regulatory region on importin-alpha to loop back and bind to the major NLS-binding site, causing the cargo to be released.

    This entry represents the N-terminal IBB domain of importin-alpha that contains the auto-regulatory region.

    More information about these proteins can be found at Protein of the Month: Importins.

    Proteins where this domain has been detected by our approach:
    PY01795   


    PF01751 - Toprim (Pfam link)

    Interpro entry IPR006171 : (Interpro link)

    Pfam description:
    This is a conserved region from DNA primase. It corresponds to the Toprim domain common to DnaG primases, topoisomerases, OLD family nucleases and RecR proteins. Both DnaG motifs IV and V are present in the alignment; the DxD (V) motif may be involved in Mg2+ binding, and mutations to the conserved glutamate (IV) completely abolish DnaG-type primase activity. DNA primase (EC:2.7.7.6) is a nucleotidyltransferase that synthesises the oligoribonucleotide primers required for DNA replication on the lagging strand of the replication fork; it can also prime the leading strand and has been implicated in cell division. This family also includes the atypical archaeal A subunit from type II DNA topoisomerases. Type II DNA topoisomerases catalyse the relaxation of DNA supercoiling by causing transient double-strand breaks.

    Interpro description:

    This is a conserved region from DNA primase. It corresponds to the Toprim (topoisomerase-primase) domain common to DnaG primases, topoisomerases, OLD family nucleases and RecR/M DNA repair proteins. Both DnaG motifs IV and V are present in the alignment; the DxD (V) motif may be involved in Mg2+ binding, and mutations to the conserved glutamate (IV) completely abolish DnaG-type primase activity. DNA primase is a nucleotidyltransferase that synthesizes the oligoribonucleotide primers required for DNA replication on the lagging strand of the replication fork; it can also prime the leading strand and has been implicated in cell division. This family also includes the atypical archaeal A subunit from type II DNA topoisomerases. Type II DNA topoisomerases catalyse the relaxation of DNA supercoiling by causing transient double-strand breaks.

    Proteins where this domain is known:
    PY01134    PY04024   

    Proteins where this domain has been detected by our approach:
    PY00163   


    PF01753 - zf-MYND (Pfam link)

    Interpro entry IPR002893 : Zinc finger, MYND-type (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents MYND-type zinc finger domains. The MYND domain (myeloid, Nervy, and DEAF-1) is present in a large group of proteins that includes RP-8 (PDCD2), Nervy, and predicted proteins from Drosophila, mammals, Caenorhabditis elegans, yeast, and plants. The MYND domain consists of a cluster of cysteine and histidine residues, arranged with an invariant spacing to form a potential zinc-binding motif. Mutating conserved cysteine residues in the DEAF-1 MYND domain does not abolish DNA binding, which suggests that the MYND domain might be involved in protein-protein interactions. Indeed, the MYND domain of ETO/MTG8 interacts directly with the N-CoR and SMRT co-repressors. Aberrant recruitment of co-repressor complexes and inappropriate transcriptional repression is believed to be a general mechanism of leukemogenesis caused by the t(8;21) translocations that fuse ETO with the acute myelogenous leukemia 1 (AML1) protein. ETO has been shown to be a co-repressor recruited by the promyelocytic leukemia zinc finger (PLZF) protein. A divergent MYND domain present in the adenovirus E1A binding protein BS69 was also shown to interact with N-CoR and mediate transcriptional repression. The current evidence suggests that the MYND motif in mammalian proteins constitutes a protein-protein interaction domain that functions as a co-repressor-recruiting interface.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY04885   


    PF01754 - zf-A20 (Pfam link)

    Interpro entry IPR002653 : Zinc finger, A20-type (Interpro link)

    Pfam description:
    Zinc fingers similar to those of A20, an inhibitor of cell death. These zinc fingers mediate self-association in A20 and also mediate IL-1-induced NF-kappa B activation.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms and usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), but they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the zinc finger domain found in A20. A20 is an inhibitor of cell death that inhibits NF-kappaB activation via the tumour necrosis factor receptor associated factor pathway. The zinc finger domains appear to mediate self-association in A20. These fingers also mediate IL-1-induced NF-kappa B activation.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY00088   


    PF01764 - Lipase_3 (Pfam link)

    Interpro entry IPR002921 : Lipase, class 3 (Interpro link)

    Interpro description:

    Triglyceride lipases are lipolytic enzymes that hydrolyse ester linkages of triglycerides. Lipases are widely distributed in animals, plants and prokaryotes. This family of lipases has been called Class 3 because it is not closely related to other lipase families.

    Proteins where this domain is known:
    PY07137   


    PF01765 - RRF (Pfam link)

    Interpro entry IPR002661 : Ribosome recycling factor (Interpro link)

    Pfam description:
    The ribosome recycling factor (RRF / ribosome release factor) dissociates the ribosome from the mRNA after termination of translation, and is essential for bacterial growth. Thus ribosomes are "recycled" and ready for another round of protein synthesis.

    Interpro description:

    The ribosome recycling factor or ribosome release factor (RRF) dissociates ribosomes from mRNA after termination of translation, and is essential for bacterial growth. Thus ribosomes are 'recycled' and ready for another round of protein synthesis.

    Proteins where this domain is known:
    PY06329   


    PF01775 - Ribosomal_L18ae (Pfam link)

    Interpro entry IPR002670 : Ribosomal protein L18ae (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L18ae forms part of the 60S ribosomal subunit. This family is found in eukaryotes. Rat ribosomal protein L18 is homologous to Xenopus laevis L14.

    Proteins where this domain is known:
    PY01391   


    PF01776 - Ribosomal_L22e (Pfam link)

    Interpro entry IPR002671 : Ribosomal protein L22e (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L22e forms part of the 60S ribosomal subunit. This family is found in eukaryotes. Rattus norvegicus (rat) L22 is related to ribosomal proteins from other eukaryotes and is identical in amino acid sequence to human EAP, the protein associated with EBER 1, an RNA encoded by Epstein-Barr virus (strain GD1; human herpesvirus 4, HHV-4).

    Proteins where this domain is known:
    PY04247   


    PF01777 - Ribosomal_L27e (Pfam link)

    Interpro entry IPR001141 : Ribosomal protein L27e (Interpro link)

    Pfam description:
    The N-terminal region of eukaryotic ribosomal protein L27 contains the KOW motif; this family represents the C-terminal region.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L27 is found in fungi, plants, algae and vertebrates. The family has a specific signature at the C terminus.

    Proteins where this domain is known:
    PY01750   


    PF01780 - Ribosomal_L37ae (Pfam link)

    Interpro entry IPR002674 : Ribosomal protein L37ae (Interpro link)

    Pfam description:
    This ribosomal protein is found in archaebacteria and eukaryotes. It contains four conserved cysteine residues that may bind to zinc.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This ribosomal protein is found in archaebacteria and eukaryotes. Ribosomal protein L37 has a single zinc finger-like motif of the C2-C2 type.

    Proteins where this domain is known:
    PY06446   


    PF01782 - RimM (Pfam link)

    Interpro entry IPR002676 : RimM protein (Interpro link)

    Pfam description:
    The RimM protein is essential for efficient processing of 16S rRNA. The RimM protein was shown to have affinity for free ribosomal 30S subunits but not for 30S subunits in the 70S ribosomes. This N-terminal domain is found associated with a PRC-barrel domain.

    Interpro description:

    The RimM protein is essential for efficient processing of 16S rRNA. The RimM protein was shown to have affinity for free ribosomal 30S subunits but not for 30S subunits in the 70S ribosomes.

    Proteins where this domain is known:
    PY04560   


    PF01791 - DeoC (Pfam link)

    Interpro entry IPR002915 : Deoxyribose-phosphate aldolase/phospho-2-dehydro-3-deoxyheptonate aldolase (Interpro link)

    Pfam description:
    This family includes diverse aldolase enzymes, among them deoxyribose-phosphate aldolase (EC:4.1.2.4), which is involved in nucleotide metabolism. The family also includes a group of related bacterial proteins of unknown function (see, for example, Swiss:Q57843 and Swiss:P76143), as well as tagatose 1,6-diphosphate aldolase (EC:4.1.2.40), which is part of the tagatose-6-phosphate pathway of galactose-6-phosphate degradation.

    Interpro description:
    This family includes the enzyme deoxyribose-phosphate aldolase, which is involved in nucleotide metabolism and catalyses the reaction:
    2-deoxy-D-ribose 5-phosphate = D-glyceraldehyde 3-phosphate + acetaldehyde
    The family also includes a group of related bacterial proteins of unknown function.
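    The deoxyribose-phosphate aldolase reaction is a simple C5 to C3 + C2 cleavage, which can be confirmed by checking that it is atom-balanced. The Python sketch below is only an illustration: it uses the neutral, fully protonated molecular formulas (ionisation at physiological pH is ignored), and the helper name `atoms` is hypothetical.

```python
import re
from collections import Counter

def atoms(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C5H11O7P'.

    Handles element symbols with optional counts only; no brackets or charges.
    """
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

# Neutral formulas (assumption: charge states are ignored)
substrate = atoms("C5H11O7P")                 # 2-deoxy-D-ribose 5-phosphate
products = atoms("C3H7O6P") + atoms("C2H4O")  # D-glyceraldehyde 3-phosphate + acetaldehyde

print(substrate == products)
```

    Running it prints True: carbon (5 = 3 + 2), hydrogen (11 = 7 + 4), oxygen (7 = 6 + 1) and phosphorus (1 = 1) all balance.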

    Proteins where this domain is known:
    PY02252   


    PF01795 - Methyltransf_5 (Pfam link)

    Interpro entry IPR002903 : Bacterial methyltransferase (Interpro link)

    Pfam description:
    Members of this family are probably SAM-dependent methyltransferases, based on similarity to Swiss:P18595. This family appears to be related to Pfam:PF01596.

    Interpro description:

    This is a family of methyltransferases, so called because they are responsible for the transfer of methyl groups between molecules. Despite its name, it does not occur solely in bacteria. This protein is essential in Escherichia coli and has been linked to peptidoglycan biosynthesis.

    Proteins where this domain is known:
    PY05542   


    PF01798 - Nop (Pfam link)

    Interpro entry IPR002687 : (Interpro link)

    Pfam description:
    This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is unknown; however, it may be a common RNA, snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499 from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is suggested to function with Nop1p in a snoRNA complex. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and are required for ribosome biogenesis. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae.

    Interpro description:
    This domain is present in various pre-mRNA processing ribonucleoproteins. The function of the domain is unknown; however, it may be a common RNA, snoRNA or Nop1p binding domain.

    Proteins have been implicated in an expanding variety of functions during pre-mRNA splicing. Molecular cloning has identified genes encoding spliceosomal proteins that potentially act as novel RNA helicases, GTPases, or protein isomerases. Novel protein-protein and protein-RNA interactions that are required for functional spliceosome formation have also been described. Finally, growing evidence suggests that proteins may contribute directly to the spliceosome's active sites.

    Proteins where this domain is known:
    PY00908    PY01565    PY04097    PY05710   


    PF01805 - Surp (Pfam link)

    Interpro entry IPR000061 : SWAP/Surp (Interpro link)

    Pfam description:
    This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been suggested that these domains may be RNA binding.

    Interpro description:
    SWAP is derived from the Suppressor-of-White-APricot splicing regulator from Drosophila melanogaster. The domain is found in regulators responsible for pervasive, nonsex-specific alternative pre-mRNA splicing characteristics and has been found in splicing regulatory proteins. These ancient, conserved SWAP proteins share a colinearly arrayed series of novel sequence motifs.

    Proteins where this domain is known:
    PY00123    PY02637   


    PF01813 - ATP-synt_D (Pfam link)

    Interpro entry IPR002699 : ATPase, V1/A1 complex, subunit D (Interpro link)

    Pfam description:
    This is a family of subunit D from various ATP synthases, including V-type H+-transporting and Na+-dependent ones. Subunit D is suggested to be an integral part of the catalytic sector of the V-ATPase.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    The V-ATPases (or V1V0-ATPase) and A-ATPases (or A1A0-ATPase) are each composed of two linked complexes: the V1 or A1 complex contains the catalytic core that hydrolyses/synthesizes ATP, and the V0 or A0 complex forms the membrane-spanning pore. The V- and A-ATPases both contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis. The V- and A-ATPases more closely resemble one another in subunit structure than they do the F-ATPases, although the function of A-ATPases is closer to that of F-ATPases.

    This entry represents the D subunit found in V1 and A1 complexes of V- and A-ATPases, respectively. Subunit D appears to be located in the central stalk, whereas subunits E and G form part of the peripheral stalk connecting V1 and V0. This subunit is the most likely homologue to the gamma subunit of the F1 complex in F-ATPases, which undergoes rotation during ATP hydrolysis and serves an essential function in rotary catalysis.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY03087   


    PF01823 - MACPF (Pfam link)

    Interpro entry IPR001862 : (Interpro link)

    Pfam description:
    The membrane-attack complex (MAC) of the complement system forms transmembrane channels. These channels disrupt the phospholipid bilayer of target cells, leading to cell lysis and death. A number of proteins participate in the assembly of the MAC. Freshly activated C5b binds to C6 to form a C5b-6 complex, then to C7, forming the C5b-7 complex. The C5b-7 complex binds to C8, which is composed of three chains (alpha, beta, and gamma), thus forming the C5b-8 complex. C5b-8 subsequently binds to C9 and acts as a catalyst in the polymerisation of C9. Active MAC has a subunit composition of C5b-C6-C7-C8-C9{n}. Perforin is a protein found in cytolytic T cells and killer cells. In the presence of calcium, perforin polymerises into transmembrane tubules and is capable of lysing, non-specifically, a variety of target cells. There are a number of regions of similarity in the sequences of complement components C6, C7, C8-alpha, C8-beta, C9 and perforin. The X-ray crystal structure of a MACPF domain reveals that it shares a common fold with bacterial cholesterol-dependent cytolysins (CDCs; Pfam:PF01289) such as perfringolysin O. Three key pieces of evidence suggest that MACPF domains and CDCs are homologous: functional similarity (pore formation), conservation of three glycine residues at a hinge in both families, and conservation of a complex core fold.

    Interpro description:

    The membrane-attack complex (MAC) of the complement system forms transmembrane channels. These channels disrupt the phospholipid bilayer of target cells, leading to cell lysis and death. A number of proteins participate in the assembly of the MAC. Freshly activated C5b binds to C6 to form a C5b-6 complex, then to C7, forming the C5b-7 complex. The C5b-7 complex binds to C8, which is composed of three chains (alpha, beta, and gamma), thus forming the C5b-8 complex. C5b-8 subsequently binds to C9 and acts as a catalyst in the polymerization of C9. Active MAC has a subunit composition of C5b-C6-C7-C8-C9{n}.

    Perforin is a protein found in cytolytic T cells and killer cells. In the presence of calcium, perforin polymerizes into transmembrane tubules and is capable of lysing, non-specifically, a variety of target cells.

    There are a number of regions of similarity in the sequences of complement components C6, C7, C8-alpha, C8-beta, C9 and perforin.

    Proteins where this domain is known:
    PY00181    PY00454    PY03076    PY03943    PY05180   


    PF01826 - TIL (Pfam link)

    Interpro entry IPR018453 : (Interpro link)

    Pfam description:
    This family contains trypsin inhibitors as well as a domain found in many extracellular proteins. The domain typically contains ten cysteine residues that form five disulphide bonds. The cysteine residues that form the disulphide bonds are 1-7, 2-6, 3-5, 4-10 and 8-9.

    Interpro description:

    This domain is found in proteinase inhibitors as well as in many extracellular proteins. The domain typically contains ten cysteine residues that form five disulphide bonds. The cysteine residues that form the disulphide bonds are 1-7, 2-6, 3-5, 4-10 and 8-9.
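    Because the disulphide pairing is given in terms of cysteine order (1st with 7th, 2nd with 6th, and so on), it can be mapped onto residue positions once the cysteines in a domain are located. The Python sketch below does this under two stated assumptions: the input string is only the TIL domain, and it contains exactly ten cysteines that follow the canonical pattern. The function name `til_disulphides` and the toy sequence are hypothetical; actual connectivity in a given protein has to be confirmed from structural data.

```python
# Canonical TIL disulphide connectivity, as 1-based ordinal cysteine numbers
TIL_PAIRS = [(1, 7), (2, 6), (3, 5), (4, 10), (8, 9)]

def til_disulphides(domain_seq: str) -> list[tuple[int, int]]:
    """Map the ordinal TIL pairing onto 1-based residue positions in domain_seq.

    Assumes domain_seq covers only the TIL domain and holds exactly ten
    cysteines; raises ValueError otherwise.
    """
    cys = [i + 1 for i, aa in enumerate(domain_seq.upper()) if aa == "C"]
    if len(cys) != 10:
        raise ValueError(f"expected 10 cysteines, found {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in TIL_PAIRS]

print(til_disulphides("CA" * 10))  # hypothetical toy sequence
```

    In the toy sequence the cysteines sit at positions 1, 3, 5, ..., 19, so the call prints [(1, 13), (3, 11), (5, 9), (7, 19), (15, 17)].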

    This inhibitor domain belongs to MEROPS inhibitor family I8 (clan IA). Proteins containing this domain inhibit peptidases belonging to families S1, S8 and M4 and are restricted to the Chordata, Nematoda, Arthropoda and Echinodermata.

    Proteins where this domain has been detected by our approach:
    PY00357   


    PF01846 - FF (Pfam link)

    Interpro entry IPR002713 : (Interpro link)

    Pfam description:
    This domain has been predicted to be involved in protein-protein interaction. This domain was recently shown to bind the hyperphosphorylated C-terminal repeat domain of RNA polymerase II, confirming its role in protein-protein interactions.

    Interpro description:
    The FF domain may be involved in protein-protein interaction. It often occurs as multiple copies and often accompanies WW domains. PRP40 from yeast encodes a novel, essential splicing component that associates with the yeast U1 small nuclear ribonucleoprotein particle.

    Proteins where this domain is known:
    PY03239   

    Proteins where this domain has been detected by our approach:
    PY02705   


    PF01849 - NAC (Pfam link)

    Interpro entry IPR002715 : (Interpro link)

    Interpro description:

    Nascent polypeptide-associated complex (NAC) is among the first ribosome-associated entities to bind the nascent polypeptide after peptide bond formation. In yeast, NAC functions in targeting ribosomes to the ER membrane. NAC may prevent ribosome nascent chains (RNCs) that lack a signal sequence from binding to yeast membranes.

    Proteins where this domain is known:
    PY01462    PY05141    PY06212   


    PF01851 - PC_rep (Pfam link)

    Interpro entry IPR002015 : (Interpro link)

    Interpro description:
    A weakly conserved repeat module of unknown function, which occurs in two regulatory subunits of the 26S-proteasome and in one subunit of the APC-complex (cyclosome).

    Proteins where this domain is known:
    PY00248    PY01123   


    PF01853 - MOZ_SAS (Pfam link)

    Interpro entry IPR002717 : MOZ/SAS-like protein (Interpro link)

    Pfam description:
    This region of these proteins has been suggested to be homologous to acetyltransferases.

    Interpro description:

    MOZ (monocytic leukemia zinc finger protein) and the SAS protein from Saccharomyces cerevisiae (Baker's yeast), which is involved in silencing the HMR locus, were reported to be homologous to acetyltransferases, but this similarity is not supported by standard sequence analysis.

    Proteins where this domain is known:
    PY00909   


    PF01866 - Diphthamide_syn (Pfam link)

    Interpro entry IPR002728 : (Interpro link)

    Pfam description:
    Swiss:Q16439 is a candidate tumour suppressor gene. DPH2 from yeast Swiss:P32461, which confers resistance to diphtheria toxin has been found to be involved in diphthamide synthesis. Diphtheria toxin inhibits eukaryotic protein synthesis by ADP-ribosylating diphthamide, a posttranslationally modified histidine residue present in EF2. The exact function of the members of this family is unknown.

    Interpro description:
    Members of this family include a candidate tumour suppressor gene, and DPH2 from yeast which confers resistance to diphtheria toxin and has been found to be involved in diphthamide synthesis. Diphtheria toxin inhibits eukaryotic protein synthesis by ADP-ribosylating diphthamide, a posttranslationally modified histidine residue present in EF2. The exact function of the members of this family is unknown.

    Proteins where this domain is known:
    PY01400    PY06443   


    PF01871 - AMMECR1 (Pfam link)

    Interpro entry IPR002733 : (Interpro link)

    Pfam description:
    This family consists of several AMMECR1 as well as several uncharacterised proteins. The contiguous gene deletion syndrome AMME is characterised by Alport syndrome, midface hypoplasia, mental retardation and elliptocytosis and is caused by a deletion in Xq22.3, comprising several genes including COL4A5, FACL4 and AMMECR1. This family contains sequences from several eukaryotic species as well as archaebacteria and it has been suggested that the AMMECR1 protein may have a basic cellular function, potentially in either the transcription, replication, repair or translation machinery.

    Interpro description:

    The contiguous gene deletion syndrome AMME is characterised by Alport syndrome (A), mental retardation (M), midface hypoplasia (M), and elliptocytosis (E), as well as generalized hypoplasia and cardiac abnormalities. It is caused by a deletion in Xq22.3, comprising several genes including AMME chromosomal region gene 1 (AMMECR1), which encodes a protein with a nuclear location and presently unknown function. The C-terminal region of AMMECR1 (from residue 122 to 333) is well conserved, and homologues appear in species ranging from bacteria and archaea to eukaryotes. The high level of conservation of the AMMECR1 domain points to a basic cellular function, potentially in either the transcription, replication, repair or translation machinery.

    The AMMECR1 domain contains a 6-amino-acid motif (LRGCIG) that might be functionally important since it is strikingly conserved throughout evolution. The AMMECR1 domain consists of two distinct subdomains of different sizes. The large subdomain, which contains both the N- and C-terminal regions, consists of five alpha-helices and five beta-strands. These five beta-strands form an antiparallel beta-sheet. The small subdomain consists of four alpha-helices and three beta-strands, and these beta-strands also form an antiparallel beta-sheet. The conserved 'LRGCIG' motif is located at beta(2) and its N-terminal loop, and most of the side chains of these residues point toward the interface of the two subdomains. The two subdomains are connected by only two loops, and the interaction between the two subdomains is not strong. Thus, these subdomains may move dynamically when the substrate enters the cleft. The size of the cleft suggests that the substrate is large, e.g., the substrate may be a nucleic acid or protein. However, the inner side of the cleft is not filled with positively charged residues, and therefore it is unlikely that negatively charged nucleic acids such as DNA or RNA interact at this site.
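    The conserved LRGCIG motif can be located in a candidate sequence with a simple exact-match search. This Python sketch is only illustrative: the function name `find_lrgcig` and the test sequence are hypothetical, and an exact string match will miss diverged homologues that a profile-based Pfam search would still detect.

```python
import re

# Exact form of the six-residue motif described as strikingly conserved in AMMECR1
LRGCIG = re.compile("LRGCIG")

def find_lrgcig(seq: str) -> list[int]:
    """Return the 1-based start position of every exact LRGCIG match in seq."""
    return [m.start() + 1 for m in LRGCIG.finditer(seq.upper())]

print(find_lrgcig("MKLRGCIGAALRGCIG"))  # hypothetical toy sequence
```

    For the toy sequence the call prints [3, 11]. Because LRGCIG cannot overlap with itself, a non-overlapping search finds every occurrence.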

    Proteins where this domain is known:
    PY02102   


    PF01873 - eIF-5_eIF-2B (Pfam link)

    Interpro entry IPR002735 : Translation initiation factor IF2/IF5 (Interpro link)

    Pfam description:
    This family includes the N terminus of eIF-5 Swiss:P55010, and the C terminus of eIF-2 beta Swiss:P20042. This region corresponds to the whole of the archaebacterial eIF-2 beta homologue. The region contains a putative zinc binding C4 finger.

    Interpro description:

    The beta subunit of archaeal and eukaryotic translation initiation factor 2 (IF2beta) and the N-terminal domain of translation initiation factor 5 (IF5) show significant sequence homology. Archaeal IF2beta contains two independent structural domains: an N-terminal mixed alpha/beta core domain (topological similarity to the common core of ribosomal proteins L23 and L15e), and a C-terminal domain consisting of a zinc-binding C4 finger. Archaeal IF2beta is a ribosome-dependent GTPase that stimulates the binding of initiator Met-tRNA(i)(Met) to the ribosomes, even in the absence of other factors. The C-terminal domain of eukaryotic IF5 is involved in the formation of the multi-factor complex (MFC), an important intermediate for the 43S pre-initiation complex assembly. IF5 interacts directly with IF1, IF2beta and IF3c, which together with IF2-bound Met-tRNA(i)(Met) form the MFC.

    This entry represents both the N-terminal and zinc-binding domains of IF2beta, as well as a domain in IF5.

    Proteins where this domain is known:
    PY02091    PY07150   


    PF01875 - Memo (Pfam link)

    Interpro entry IPR002737 : (Interpro link)

    Pfam description:
    This family contains members from all branches of life. The molecular function of this protein is unknown, but the human protein Memo (mediator of ErbB2-driven cell motility) is included in this family. It has been suggested that Memo controls cell migration by relaying extracellular chemotactic signals to the microtubule cytoskeleton.

    Interpro description:

    This entry contains proteins from all branches of life. The molecular function of these proteins is unknown, but the human protein Memo (mediator of ErbB2-driven cell motility) is included in this family. It has been suggested that Memo controls cell migration by relaying extracellular chemotactic signals to the microtubule cytoskeleton.

    Proteins where this domain is known:
    PY04533   


    PF01883 - DUF59 (Pfam link)

    Interpro entry IPR002744 : (Interpro link)

    Pfam description:
    This family includes prokaryotic proteins of unknown function. The family also includes PhaH Swiss:O84984 from Pseudomonas putida. PhaH forms a complex with PhaF Swiss:O84982, PhaG Swiss:O84983 and PhaI Swiss:O84985, which hydroxylates phenylacetic acid to 2-hydroxyphenylacetic acid. Members of this family may therefore all be components of ring-hydroxylating complexes.

    Interpro description:
    This family includes prokaryotic proteins of unknown function. The family also includes PhaH from Pseudomonas putida. PhaH forms a complex with PhaF, PhaG and PhaI, which hydroxylates phenylacetic acid to 2-hydroxyphenylacetic acid. Members of this family may therefore all be components of ring-hydroxylating complexes.

    Proteins where this domain is known:
    PY02319    PY03468   


    PF01896 - DNA_primase_S (Pfam link)

    Interpro entry IPR002755 : DNA primase, small subunit (Interpro link)

    Pfam description:
    DNA primase synthesises the RNA primers for the Okazaki fragments in lagging strand DNA synthesis. DNA primase is a heterodimer of large and small subunits. This family also includes baculovirus late expression factor 1 or LEF-1 proteins. Baculovirus LEF-1 is a DNA primase enzyme. Bacterial DNA primase adopts a different fold to archaeal and eukaryotic primases.

    Interpro description:

    DNA primase synthesizes the RNA primers for the Okazaki fragments in lagging strand DNA synthesis. DNA primase is a heterodimer of large (p60) and small (p50) subunits in eukaryotes. This family represents sequences of the small subunit and the DNA primase sequences of the Archaea. No sequence similarity can be detected between the eukaryotic p50 and p60 subunits and the primases purified from bacteriophage and bacteria.

    Proteins where this domain is known:
    PY00667   


    PF01907 - Ribosomal_L37e (Pfam link)

    Interpro entry IPR001569 : Ribosomal protein L37e (Interpro link)

    Pfam description:
    This family includes ribosomal protein L37 from eukaryotes and archaebacteria. The family contains many conserved cysteines and histidines suggesting that this protein may bind to zinc.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of proteins of 56 to 96 amino-acid residues that share a highly conserved region located in the N-terminal part.

    Proteins where this domain is known:
    PY02795   


    PF01909 - NTP_transf_2 (Pfam link)

    Interpro entry IPR002934 : Nucleotidyltransferase (Interpro link)

    Pfam description:
    Members of this family belong to a large family of nucleotidyltransferases. This family includes kanamycin nucleotidyltransferase (KNTase), a plasmid-encoded enzyme responsible for some types of bacterial resistance to aminoglycosides. KNTase inactivates antibiotics by catalysing the addition of a nucleotidyl group onto the drug.

    Interpro description:

    In poly(A) polymerase, a small region that overlaps with a nuclear localization signal and binds to the RNA primer contains three aspartates that are essential for catalysis. Sequence and secondary structure comparisons of the regions surrounding these aspartates with sequences of other polymerases revealed significant homology to the palm structure of DNA polymerase beta, terminal deoxynucleotidyltransferase and DNA polymerase IV of Saccharomyces cerevisiae, all members of the family X polymerases. This homology extends as far as CCA:tRNA nucleotidyltransferase and streptomycin adenylyltransferase, an antibiotic resistance factor.

    Proteins containing this domain include kanamycin nucleotidyltransferase (KNTase), a plasmid-encoded enzyme responsible for some types of bacterial resistance to aminoglycosides. KNTase inactivates antibiotics by catalysing the addition of a nucleotidyl group onto the drug. In cross-linking experiments with the ATP analogue 8-azido-ATP and bovine poly(A) polymerase, Mn2+ strongly stimulated cross-linking owing to a 50-fold lower Ki for 8-azido-ATP in the presence of Mn2+. Mutations of the highly conserved Asp residues 113, 115 and 167, which are critical for metal binding in the catalytic domain of bovine poly(A) polymerase, led to a strong reduction in cross-linking efficiency, and Mn2+ no longer stimulated the reaction. Mutations in the "helical turn motif" (a region that binds the triphosphate moiety of the nucleotide) and in the suspected nucleotide-binding helix of bovine poly(A) polymerase impaired ATP binding and catalysis. The results indicate that ATP is bound partly by the helical turn motif and partly by a region that may be a structural analogue of the fingers domain found in many polymerases.

    Proteins where this domain is known:
    PY05727   

    Proteins where this domain has been detected by our approach:
    PY04615   


    PF01912 - eIF-6 (Pfam link)

    Interpro entry IPR002769 : Translation initiation factor IF6 (Interpro link)

    Pfam description:
    This family includes eukaryotic translation initiation factor 6 as well as presumed archaebacterial homologues.

    Interpro description:

    This family includes eukaryotic translation initiation factor 6 (eIF6) as well as presumed archaeal homologues.

    The assembly of 80S ribosomes requires joining of the 40S and 60S subunits, which is triggered by the formation of an initiation complex on the 40S subunit. This event is rate-limiting for translation and depends on external stimuli and the status of the cell. Eukaryotic translation initiation factor 6 (eIF6) binds specifically to the free 60S ribosomal subunit and prevents its association with the 40S subunit. Furthermore, eIF6 interacts in the cytoplasm with RACK1, a receptor for activated protein kinase C (PKC). RACK1 is a major component of translating ribosomes, which harbour significant amounts of PKC. Loading 60S subunits with eIF6 caused a dose-dependent translational block and impairment of 80S formation, which were reversed by expression of RACK1 and stimulation of PKC in vivo and in vitro. PKC stimulation leads to eIF6 phosphorylation and release, promoting 80S ribosome formation. RACK1 provides a physical and functional link between PKC signalling and ribosome activation.

    Proteins where this domain is known:
    PY01848   


    PF01916 - DS (Pfam link)

    Interpro entry IPR002773 : Deoxyhypusine synthase (Interpro link)

    Pfam description:
    Eukaryotic initiation factor 5A (eIF-5A) contains an unusual amino acid, hypusine. The first step in the post-translational formation of hypusine is catalysed by the enzyme deoxyhypusine synthase (DS) EC:1.1.1.249. The modified version of eIF-5A, and DS, are required for eukaryotic cell proliferation.

    Interpro description:
    Eukaryotic initiation factor 5A (eIF-5A) contains an unusual amino acid, hypusine [N epsilon-(4-aminobutyl-2-hydroxy)lysine]. The first step in the post-translational formation of hypusine is catalysed by the enzyme deoxyhypusine synthase (DS), which catalyses the following reaction:
     Spermidine + [eIF-5A]-lysine = 1,3-diaminopropane + [eIF-5A]-deoxyhypusine 
    The modified version of eIF-5A, and DS, are required for eukaryotic cell proliferation. The structure is known for this enzyme in complex with its NAD+ cofactor.

    Proteins where this domain is known:
    PY01546   


    PF01918 - Alba (Pfam link)

    Interpro entry IPR002775 : Alba, DNA/RNA-binding protein (Interpro link)

    Pfam description:
    Alba is a novel chromosomal protein that coats archaeal DNA without compacting it.

    Interpro description:

    Members of this family include the archaeal protein Alba and a number of eukaryotic proteins with no known function. The DNA/RNA-binding protein Alba binds double-stranded DNA tightly but without sequence specificity. It binds rRNA and mRNA in vivo, and may play a role in maintaining the structural and functional stability of RNA and, perhaps, ribosomes. It is distributed uniformly and abundantly on the chromosome. Alba has been shown to bind DNA and affect DNA supercoiling in a temperature-dependent manner. Its binding is regulated by acetylation (the name Alba stands for "acetylation lowers binding affinity"), which is reversed by the Sir2 deacetylase. Alba is proposed to play a role in the establishment or maintenance of chromatin architecture and thereby in transcriptional repression.

    Proteins where this domain is known:
    PY01330    PY07825   


    PF01920 - Prefoldin_2 (Pfam link)

    Interpro entry IPR002777 : Prefoldin beta-like (Interpro link)

    Pfam description:
    This family includes prefoldin subunits that are not detected by Pfam:PF02996.

    Interpro description:

    Prefoldin (PFD) is a chaperone that interacts exclusively with type II chaperonins, hetero-oligomers lacking an obligate co-chaperonin that are found only in eukaryotes (chaperonin-containing T-complex polypeptide-1 (CCT)) and archaea. Eukaryotic PFD is a multi-subunit complex containing six polypeptides in the molecular mass range of 14-23 kDa. In archaea, on the other hand, PFD is composed of two types of subunits, two alpha and four beta. The six subunits associate to form two back-to-back up-and-down eight-stranded barrels, from which hang six coiled coils. Each subunit contributes one (beta subunits) or two (alpha subunits) beta hairpin turns to the barrels. The coiled coils are formed by the N and C termini of an individual subunit. Overall, this unique arrangement resembles a jellyfish. The eukaryotic PFD hexamer is composed of six different subunits; however, these can be grouped into two alpha-like (PFD3 and -5) and four beta-like (PFD1, -2, -4, and -6) subunits based on amino acid sequence similarity with their archaeal counterparts. Eukaryotic PFD has a six-legged structure similar to that seen in the archaeal homologue. This family contains the archaeal beta subunit and eukaryotic prefoldin subunits 1, 2, 4 and 6.

    Eukaryotic PFD has been shown to bind both actin and tubulin co-translationally. The chaperone then delivers the target protein to CCT, interacting with the chaperonin through the tips of the coiled coils. To date, no authentic target proteins of any archaeal PFD have been identified.

    Proteins where this domain is known:
    PY05659    PY06424   


    PF01922 - SRP19 (Pfam link)

    Interpro entry IPR002778 : Signal recognition particle, SRP19 subunit (Interpro link)

    Pfam description:
    The signal recognition particle (SRP) binds to the signal peptide of proteins as they are being translated. The binding of the SRP halts translation and the complex is then transported to the endoplasmic reticulum's cytoplasmic surface. The SRP then aids translocation of the protein through the ER membrane. The SRP is a ribonucleoprotein that is composed of a small RNA and several proteins. One of these proteins is the SRP19 protein (Sec65 in yeast).

    Interpro description:

    The signal recognition particle (SRP) is a multimeric protein, which along with its conjugate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes. SRP recognises the signal sequence of the nascent polypeptide on the ribosome, retards its elongation, and docks the SRP-ribosome-polypeptide complex to the RER membrane via the SR receptor. SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300 nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane.

    This entry represents the SRP19 subunit. The SRP19 protein is unstructured but forms a compact core domain and two extended RNA-binding loops upon binding the signal recognition particle (SRP) RNA.

    Proteins where this domain is known:
    PY05584   


    PF01926 - MMR_HSR1 (Pfam link)

    Interpro entry IPR002917 : GTP-binding protein, HSR1-related (Interpro link)

    Interpro description:
    Human HSR1 has been localized to the human MHC class I region and is highly homologous to a putative GTP-binding protein, MMR1, from mouse. These proteins represent a new subfamily of GTP-binding proteins that has both prokaryote and eukaryote members.

    Proteins where this domain is known:
    PY00019    PY00161    PY00616    PY00694    PY00735    PY02080    PY02266    PY02303    PY02306    PY02995    PY03875    PY03928    PY04363    PY04621    PY05368    PY06180    PY06396    PY06530    PY07173    PY07455   

    Proteins where this domain has been detected by our approach:
    PY03663    PY04853   


    PF01929 - Ribosomal_L14e (Pfam link)

    Interpro entry IPR002784 : Ribosomal protein L14 (Interpro link)

    Pfam description:
    This family includes the eukaryotic ribosomal protein L14.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This entry includes the eukaryotic ribosomal protein L14, which binds to the 60S ribosomal subunit, and archaebacterial ribosomal protein L14E, which binds to the 50S ribosomal subunit.

    Proteins where this domain is known:
    PY02173   


    PF01937 - DUF89 (Pfam link)

    Interpro entry IPR002791 : (Interpro link)

    Pfam description:
    This family has no known function.

    Interpro description:

    This entry contains uncharacterised proteins. Those with structural information consist of two domains: an all-alpha domain with a 3-helical bundle fold, and an alpha-beta domain in 3 layers, alpha/beta/alpha.

    Proteins where this domain is known:
    PY00322    PY00323   


    PF01938 - TRAM (Pfam link)

    Interpro entry IPR002792 : (Interpro link)

    Pfam description:
    This small domain has no known function. However it may perform a nucleic acid binding role (Bateman A. unpublished observation).

    Interpro description:

    The TRAM (after TRM2 and miaB) domain is a 60-70-residue-long module that is found in:

    The TRAM domain can be found alone or in association with other domains, such as the catalytic biotin/lipoate synthetase-like domain, the RNA methylase domain, the ribosomal S2 domain and the eIF2-beta domain. The TRAM domain is predicted to bind tRNA and deliver the RNA-modifying enzymatic domain to their targets. Secondary structure prediction indicates that the TRAM domain adopts a simple beta-barrel fold. The conservation pattern of the TRAM domain consists primarily of small and hydrophobic residues that correspond to five beta-strands in the predicted secondary structure.

    Proteins where this domain has been detected by our approach:
    PY01291   


    PF01951 - DUF101 (Pfam link)

    Interpro entry IPR002804 : (Interpro link)

    Pfam description:
    The members of this family are uncharacterised. The alignment of these proteins contains several conserved polar residues that might be potential catalytic residues.

    Interpro description:

    Proteins in this entry are found in archaea, bacteria and eukaryotes. Their function is unknown, but alignment shows several conserved polar residues that are potential catalytic residues. The structure of one of these proteins has been determined and shows homology to heat shock protein 33, a chaperone that inhibits the aggregation of partially denatured proteins.

    Proteins where this domain is known:
    PY00687   


    PF01965 - DJ-1_PfpI (Pfam link)

    Interpro entry IPR002818 : (Interpro link)

    Pfam description:
    The family includes the protease PfpI Swiss:Q51732. This domain is also found in transcriptional regulators such as Swiss:Q9RJG8.

    Interpro description:

    This signature defines a diverse group of protein families which include proteins involved in RNA-protein interaction regulation, thiamine biosynthesis, Ras-related signal transduction, and those with protease activity. Examples of annotation are:

    Proteins where this domain is known:
    PY04638   


    PF01966 - HD (Pfam link)

    Interpro entry IPR006674 : (Interpro link)

    Pfam description:
    HD domains are metal dependent phosphohydrolases.

    Interpro description:
    This domain is found in a superfamily of enzymes with a predicted or known phosphohydrolase activity. These enzymes appear to be involved in nucleic acid metabolism, signal transduction and possibly other functions in bacteria, archaea and eukaryotes. The fact that all the highly conserved residues in the HD superfamily are histidines or aspartates suggests that coordination of divalent cations is essential for the activity of these proteins.

    Proteins where this domain is known:
    PY05854   


    PF01974 - tRNA_int_endo (Pfam link)

    Interpro entry IPR006677 : tRNA intron endonuclease, catalytic domain-like (Interpro link)

    Pfam description:
    Members of this family cleave pre-tRNA at the 5' and 3' splice sites to release the intron EC:3.1.27.9.

    Interpro description:

    This entry represents a 3-layer alpha/beta/alpha domain found as the C-terminal catalytic domain in homotetrameric tRNA-intron endonucleases, and as domains 2 and 4 (C-terminal) in the homodimeric enzymes. tRNA-intron endonucleases remove tRNA introns by cleaving pre-tRNA at the 5'- and 3'-splice sites to release the intron. The products are an intron and two tRNA half-molecules bearing 2',3' cyclic phosphate and 5'-hydroxyl termini. These enzymes recognise a pseudosymmetric substrate in which 2 bulged loops of 3 bases are separated by a stem of 4 bp. Although homotetrameric enzymes contain four active sites, only two participate in cleavage, so these enzymes should be considered a dimer of dimers.

    Proteins where this domain is known:
    PY01570   


    PF01979 - Amidohydro_1 (Pfam link)

    Interpro entry IPR006680 : Amidohydrolase 1 (Interpro link)

    Pfam description:
    This family of enzymes is a large metal-dependent hydrolase superfamily. The family includes adenine deaminase (EC:3.5.4.2), which hydrolyses adenine to form hypoxanthine and ammonia. The adenine deaminase reaction is important for adenine utilisation both as a purine and as a nitrogen source. This family also includes dihydroorotase (EC:3.5.2.3) and N-acetylglucosamine-6-phosphate deacetylase (EC:3.5.1.25), which catalyses the reaction N-acetyl-D-glucosamine 6-phosphate + H2O <=> D-glucosamine 6-phosphate + acetate. The family also includes the catalytic domain of the urease alpha subunit.

    Interpro description:

    This group of enzymes represents a large metal dependent hydrolase superfamily. The family includes adenine deaminase that hydrolyses adenine to form hypoxanthine and ammonia. The adenine deaminase reaction is important for adenine utilization as a purine and also as a nitrogen source. This family also includes dihydroorotase and N-acetylglucosamine-6-phosphate deacetylases. These enzymes catalyse the reaction:

     N-acetyl-D-glucosamine 6-phosphate + H2O = D-glucosamine 6-phosphate + acetate
    This family includes dihydroorotase and urease which belong to MEROPS peptidase family M38 (beta-aspartyl dipeptidase, clan MJ), where they are classified as non-peptidase homologs.

    Proteins where this domain is known:
    PY04293   


    PF01981 - PTH2 (Pfam link)

    Interpro entry IPR002833 : Peptidyl-tRNA hydrolase, PTH2 (Interpro link)

    Pfam description:
    Peptidyl-tRNA hydrolases are enzymes that release tRNAs from peptidyl-tRNA during translation.

    Interpro description:
    Peptidyl-tRNA hydrolases are enzymes that release tRNAs from peptidyl-tRNA during translation.

    Proteins where this domain is known:
    PY00497    PY05692   


    PF01984 - dsDNA_bind (Pfam link)

    Interpro entry IPR002836 : DNA-binding TFAR19-related protein (Interpro link)

    Pfam description:
    This domain is believed to bind double-stranded DNA 20 base pairs in length.

    Interpro description:

    This protein family is found in archaea and eukaryotes. The human TFAR19 gene encodes a protein that shares significant homology with the corresponding proteins of species ranging from yeast to mice. TFAR19 exhibits a ubiquitous expression pattern, and its expression is up-regulated in tumour cells undergoing apoptosis. TFAR19 may play a general role in the apoptotic process. Also included in this family is a DNA-binding protein from the archaeon Methanobacterium thermoautotrophicum.

    Proteins where this domain is known:
    PY03132   


    PF01988 - DUF125 (Pfam link)

    Interpro entry IPR008217 : (Interpro link)

    Pfam description:
    This family of predicted integral membrane proteins has no known function. However it does include Swiss:P47818, that may have a role in regulating calcium levels.

    Interpro description:

    Proteins in this entry have no known function and are predicted to be integral membrane proteins. They include the Ccc1 protein from Saccharomyces cerevisiae (Baker's yeast), which may have a role in regulating calcium levels.

    Proteins where this domain is known:
    PY06735   


    PF01990 - ATP-synt_F (Pfam link)

    Interpro entry IPR008218 : ATPase, V1/A1 complex, subunit F (Interpro link)

    Pfam description:
    This family includes the 14-kDa subunit of V-ATPases, which is located in the peripheral catalytic part of the complex. The family also includes archaebacterial ATP synthase subunit F.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    The V-ATPases (or V1V0-ATPase) and A-ATPases (or A1A0-ATPase) are each composed of two linked complexes: the V1 or A1 complex contains the catalytic core that hydrolyses/synthesizes ATP, and the V0 or A0 complex forms the membrane-spanning pore. The V- and A-ATPases both contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis. The V- and A-ATPases more closely resemble one another in subunit structure than they do the F-ATPases, although the function of A-ATPases is closer to that of F-ATPases.

    This entry represents subunit F found in the V1 complex of V-ATPases (both eukaryotic and bacterial), as well as in the A1 complex of A-ATPases. Subunit F is a 16 kDa protein that is required for the assembly and activity of V-ATPase, and has a potential role in the differential targeting and regulation of the enzyme for specific organelles. This subunit is not necessary for the rotation of the ATPase V1 rotor, but it does promote catalysis.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY00743   


    PF01991 - vATP-synt_E (Pfam link)

    Interpro entry IPR002842 : ATPase, V1/A1 complex, subunit E (Interpro link)

    Pfam description:
    This family includes the vacuolar ATP synthase E subunit, as well as the archaebacterial ATP synthase E subunit.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    The V-ATPases (or V1V0-ATPase) and A-ATPases (or A1A0-ATPase) are each composed of two linked complexes: the V1 or A1 complex contains the catalytic core that hydrolyses/synthesizes ATP, and the V0 or A0 complex forms the membrane-spanning pore. The V- and A-ATPases both contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis. The V- and A-ATPases more closely resemble one another in subunit structure than they do the F-ATPases, although the function of A-ATPases is closer to that of F-ATPases.

    This entry represents subunit E from the V1 and A1 complexes of V- and A-ATPases, respectively. Subunit E appears to form a tight interaction with subunit G in the F0 complex, which together may act as stators to prevent certain subunits from rotating with the central rotary element, much in the same way as the F0 complex subunit B does in F-ATPases. In addition to its key role in stator structure, subunit E appears to have a role in mediating interactions with putative regulatory subunits.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY06101   


    PF01992 - vATP-synt_AC39 (Pfam link)

    Interpro entry IPR002843 : ATPase, V0/A0 complex, subunit C/D (Interpro link)

    Pfam description:
    This family includes the AC39 subunit from vacuolar ATP synthase Swiss:P32366, and the C subunit from archaebacterial ATP synthase. The family also includes subunit C from the sodium-transporting ATP synthase from Enterococcus hirae Swiss:P43456.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    The V-ATPases (or V1V0-ATPase) and A-ATPases (or A1A0-ATPase) are each composed of two linked complexes: the V1 or A1 complex, which contains the catalytic core that hydrolyses/synthesizes ATP, and the V0 or A0 complex, which forms the membrane-spanning pore. The V- and A-ATPases both contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis. The V- and A-ATPases more closely resemble one another in subunit structure than they do the F-ATPases, although the function of A-ATPases is closer to that of F-ATPases.

    This entry represents subunit C from the A0 complex of A-ATPases, and subunits C and D from the V0 complex of V-ATPases, all of which are involved in the translocation of protons across a membrane. There is more than one type of D subunit in V-ATPases: the D1 subunit is ubiquitous, while the D2 subunit has a limited tissue distribution, possibly to account for differential functions, targeting or regulation of V-ATPase activity.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY04403   


    PF02002 - TFIIE_alpha (Pfam link)

    Interpro entry IPR002853 : Transcription factor TFIIE, alpha subunit (Interpro link)

    Pfam description:
    The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF. This family consists of the conserved amino-terminal region of eukaryotic TFIIE-alpha and of proteins from archaebacteria that are presumed to be TFIIE-alpha subunits as well (e.g. Swiss:O29501).

    Interpro description:

    Initiation of eukaryotic mRNA transcription requires melting of promoter DNA with the help of the general transcription factors TFIIE and TFIIH. In higher eukaryotes, the general transcription factor TFIIE consists of two subunits: the large alpha subunit and the small beta subunit. TFIIE beta has been found to bind to the region where the promoter begins to open into single-stranded DNA upon transcription initiation by RNA polymerase II. The approximately 120-residue central core domain of TFIIE beta plays a role in double-stranded DNA binding of TFIIE.

    The TFIIE beta central core DNA-binding domain consists of three helices with a beta hairpin at the C-terminus, resembling the winged helix proteins. It shows a novel double-stranded DNA-binding activity in which the DNA-binding surface, a positively charged furrow, is located on the opposite side from the previously reported winged helix motif.

    This entry represents the conserved amino terminal region of eukaryotic TFIIE-alpha and proteins from archaebacteria (TFE) that are also presumed to be TFIIE-alpha subunits.

    Proteins where this domain is known:
    PY00824   


    PF02005 - TRM (Pfam link)

    Interpro entry IPR002905 : N2,N2-dimethylguanosine tRNA methyltransferase (Interpro link)

    Pfam description:
    This enzyme (EC:2.1.1.32) uses S-AdoMet to methylate tRNA. The TRM1 gene of Saccharomyces cerevisiae is necessary for the N2,N2-dimethylguanosine modification of both mitochondrial and cytoplasmic tRNAs. The enzyme is found in both eukaryotes and archaebacteria.

    Interpro description:
    This enzyme uses S-adenosyl-L-methionine to methylate tRNA:
     S-AdoMet + tRNA = S-adenosyl-L-homocysteine + tRNA containing N2-methylguanine
    The TRM1 gene of Saccharomyces cerevisiae is necessary for the N2,N2-dimethylguanosine modification of both mitochondrial and cytoplasmic tRNAs. The enzyme is found in both eukaryotes and archaea.

    Proteins where this domain is known:
    PY04503    PY07448   


    PF02020 - W2 (Pfam link)

    Interpro entry IPR003307 : (Interpro link)

    Pfam description:
    This domain of unknown function is found at the C-terminus of several translation initiation factors.

    Interpro description:

    This entry represents the W2 domain (named for its two invariant tryptophans), a region of ~165 amino acids found at the C-terminus of the following eIFs:

    Translation initiation is a sophisticated, well regulated and highly coordinated cellular process in eukaryotes that involves at least 11 eukaryotic initiation factors (eIFs).

    The W2 domain has a globular fold and is composed exclusively of alpha-helices. The structure can be divided into a structural C-terminal core onto which the two N-terminal helices are attached. The core contains two aromatic/acidic residue-rich regions (AA boxes), which are important for mediating protein-protein interactions.

    The entry covers the entire W2 domain.

    Proteins where this domain has been detected by our approach:
    PY07150   


    PF02022 - Integrase_Zn (Pfam link)

    Interpro entry IPR003308 : Integrase, N-terminal zinc-binding domain (Interpro link)

    Pfam description:
    Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. This domain is the amino-terminal zinc-binding domain. The central domain is the catalytic domain Pfam:PF00665. The carboxyl-terminal domain is a DNA-binding domain Pfam:PF00552.

    Interpro description:

    Retroviral integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains: an N-terminal zinc-binding domain, a central catalytic core and a C-terminal DNA-binding domain. It is often found as part of the POL polyprotein.

    Proteins where this domain is known:
    PY07014   


    PF02045 - CBFB_NFYA (Pfam link)

    Interpro entry IPR001289 : CCAAT-binding transcription factor, subunit B (Interpro link)

    Interpro description:
    The CCAAT-binding factor (CBFB/NF-YA) is a mammalian transcription factor that binds to a CCAAT motif in the promoters of a wide variety of genes, including type I collagen and albumin. The factor is a heteromeric complex of A and B subunits, both of which are required for DNA binding. The subunits can interact in the absence of DNA binding; conserved regions in each are important in mediating this interaction.

    The B subunit contains a region of similarity with the yeast protein HAP2. For the B subunit it has been suggested that the N-terminal portion of the conserved region is involved in subunit interaction and the C-terminal region involved in DNA-binding.
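    As a minimal sketch of what binding "a CCAAT motif" means at the sequence level, the Python function below scans a DNA string for the pentamer CCAAT on either strand. The function names are hypothetical, and the scan is purely lexical: it does not model flanking-sequence preferences or the A/B subunit interaction described above, so each hit is only a candidate site.

```python
from __future__ import annotations

# Complement table for an unambiguous A/C/G/T alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MOTIF = "CCAAT"


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper- or lower-case A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_ccaat(dna: str) -> list[tuple[int, str]]:
    """(0-based start, strand) for each CCAAT box on either strand.

    A box on the minus strand appears on the given strand as ATTGG, the
    reverse complement of CCAAT; its start is reported in plus-strand
    coordinates.
    """
    dna = dna.upper()
    rc_motif = reverse_complement(MOTIF)  # "ATTGG"
    hits = []
    for i in range(len(dna) - len(MOTIF) + 1):
        window = dna[i:i + len(MOTIF)]
        if window == MOTIF:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


print(find_ccaat("GGCCAATCTATTGGA"))  # [(2, '+'), (9, '-')]
```

    Scanning both strands matters because a double-stranded promoter can present the motif in either orientation.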

    Proteins where this domain is known:
    PY04520   


    PF02099 - Josephin (Pfam link)

    Interpro entry IPR006155 : (Interpro link)

    Interpro description:
    Human genes containing triplet repeats can markedly expand in length, leading to neuropsychiatric disease. Expansion of triplet repeats explains the phenomenon of anticipation, i.e. the increasing severity or earlier age of onset in successive generations in a pedigree. A novel gene containing CAG repeats has been identified and mapped to chromosome 14q32.1, the genetic locus for Machado-Joseph disease (MJD). Normally, the gene contains 13-36 CAG repeats, but most clinically diagnosed patients and all affected members of a family with the clinical and pathological diagnosis of MJD show an expansion of the repeat number to 68-79. Similar abnormalities in related genes may give rise to diseases similar to MJD. MJD is a neurodegenerative disorder characterised by cerebellar ataxia, pyramidal and extrapyramidal signs, peripheral nerve palsy, external ophthalmoplegia, and facial and lingual fasciculation and bulging. The disease is autosomal dominant, with late onset of symptoms, generally after the fourth decade.
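    The repeat ranges quoted above can be made concrete with a short Python sketch that measures the longest uninterrupted (CAG)n tract in a DNA string and compares it with the 13-36 (normal) and 68-79 (MJD) ranges given in the text. The function names are hypothetical, the thresholds are only those quoted here, and counts falling between the two ranges are deliberately left unclassified; this illustrates the counting, not a diagnostic rule.

```python
from __future__ import annotations

import re

# (CAG)+ matches a maximal run of back-to-back CAG triplets. Because "CAG"
# cannot overlap itself, runs found by a left-to-right scan cannot hide a
# longer run in a shifted reading frame.
CAG_RUN = re.compile(r"(?:CAG)+")


def longest_cag_run(dna: str) -> int:
    """Number of units in the longest uninterrupted CAG tract (0 if none)."""
    runs = CAG_RUN.findall(dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(n: int) -> str:
    """Compare a repeat count with the ranges quoted in the text only."""
    if 13 <= n <= 36:
        return "normal range (13-36)"
    if 68 <= n <= 79:
        return "expanded range reported in MJD (68-79)"
    return "not covered by the quoted ranges"


# A CAA triplet interrupts the tract, so only the first 20 units count.
allele = "TTC" + "CAG" * 20 + "CAA" + "CAG" * 5
n = longest_cag_run(allele)
print(n, classify_repeat_count(n))  # 20 normal range (13-36)
```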

    Proteins where this domain is known:
    PY06998   


    PF02114 - Phosducin (Pfam link)

    Interpro entry IPR001200 : (Interpro link)

    Interpro description:
    The outer and inner segments of vertebrate rod photoreceptor cells contain phosducin, a soluble phosphoprotein that complexes with the beta/gamma-subunits of the GTP-binding protein, transducin. Light-induced changes in cyclic nucleotide levels modulate the phosphorylation of phosducin by protein kinase A. The protein is thought to participate in the regulation of visual phototransduction or in the integration of photoreceptor metabolism. Similar proteins have been isolated from the pineal gland and it is believed that the functional role of the protein is the same in both retina and pineal gland.

    Proteins where this domain is known:
    PY01011    PY03313   


    PF02121 - IP_trans (Pfam link)

    Interpro entry IPR001666 : Phosphatidylinositol transfer protein (Interpro link)

    Pfam description:
    Along with the structurally unrelated Sec14p family (found in Pfam:PF00650), this family can bind/exchange one molecule of phosphatidylinositol (PI) or phosphatidylcholine (PC) and thus aids their transfer between different membrane compartments. There are three subfamilies, all of which share an N-terminal PITP-like domain whose sequence is highly conserved. This domain is described as consisting of three regions. The N-terminal region is thought to bind the lipid and contains two helices and an eight-stranded, mostly antiparallel beta-sheet. An intervening loop region, which is thought to play a role in protein-protein interactions, separates this from the C-terminal region, which exhibits the greatest sequence variation and may be involved in membrane binding. PITP alpha (Swiss:Q00169) has a 16-fold greater affinity for PI than PC. Together with PITP beta (Swiss:P48739), it is expressed in all tissues.

    Interpro description:
    Phosphatidylinositol transfer protein (PITP) is a ubiquitous cytosolic protein, thought to be involved in transport of phospholipids from their site of synthesis in the endoplasmic reticulum and Golgi to other cell membranes. More recently, PITP has been shown to be an essential component of the polyphosphoinositide synthesis machinery and is hence required for proper signalling by epidermal growth factor and f-Met-Leu-Phe, as well as for exocytosis. The role of PITP in polyphosphoinositide synthesis may also explain its involvement in intracellular vesicular traffic.

    Proteins where this domain is known:
    PY00102   


    PF02127 - Peptidase_M18 (Pfam link)

    Interpro entry IPR001948 : Peptidase M18, aminopeptidase I (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
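    The HEXXH and 'abXHEbbHbc' patterns translate naturally into regular expressions. The Python sketch below is illustrative only, and its residue classes are assumptions: 'a' is restricted to Val/Thr (the text says only "most often"), 'b' is taken as any residue other than Asp, Glu, Lys, Arg, His or Pro, 'c' as one of Ala, Val, Ile, Leu, Met, Phe, Trp or Cys, and Pro is excluded from every position of the stricter pattern. A zero-width lookahead is used so that overlapping occurrences are all reported.

```python
from __future__ import annotations

import re

# Loose pattern: His, Glu, any two residues, His. Wrapping it in a lookahead
# makes each match zero-width, so overlapping motifs are all reported.
HEXXH = re.compile(r"(?=HE..H)")

# Stricter 'abXHEbbHbc' pattern. Residue classes are assumptions:
#   a = V or T, b = uncharged (not D, E, K, R, H) and not P,
#   X = any residue except P, c = hydrophobic (A, V, I, L, M, F, W, C).
STRICT = re.compile(r"(?=[VT][^DEKRHP][^P]HE[^DEKRHP]{2}H[^DEKRHP][AVILMFWC])")


def find_hexxh(protein: str) -> list[int]:
    """0-based start positions of every HEXXH occurrence."""
    return [m.start() for m in HEXXH.finditer(protein.upper())]


def find_strict_motif(protein: str) -> list[int]:
    """0-based start positions (of residue 'a') of the stricter motif."""
    return [m.start() for m in STRICT.finditer(protein.upper())]


print(find_hexxh("HEAAHEAAH"))              # [0, 4]  (motifs share an H)
print(find_strict_motif("GGVVAHELTHAVGG"))  # [2]
print(find_strict_motif("GGVVAHEPTHAVGG"))  # []  (Pro in a 'b' position)
```

    The toy sequences are made up for illustration; the last one still matches the loose HEXXH pattern, which shows how the stricter definition filters out proline-containing sites.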

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belongs to MEROPS peptidase family M18 (clan MH). The proteins have two catalytic zinc ions at the active site, bound by His/Asp, Asp, Glu, Asp/Glu and His. The catalysed reaction involves the release of an N-terminal amino acid, usually neutral or hydrophobic, from a polypeptide.

    The type example is aminopeptidase I from Saccharomyces cerevisiae (Baker's yeast), the sequence of which has been deduced, and the mature protein shown to consist of 469 amino acids. A 45-residue presequence contains both positively- and negatively-charged and hydrophobic residues, which could be arranged in an N-terminal amphiphilic alpha-helix. The presequence differs from signal sequences that direct proteins across bacterial plasma membranes and endoplasmic reticulum or into mitochondria. It is unclear how this unique presequence targets aminopeptidase I to yeast vacuoles, and how this sorting utilises classical protein secretory pathways.

    Proteins where this domain is known:
    PY03205   


    PF02130 - UPF0054 (Pfam link)

    Interpro entry IPR002036 : (Interpro link)

    Interpro description:

    These as-yet uncharacterised proteins are 17 to 21 kDa in size. They contain a conserved region with three histidines at the C terminus. In nearly every bacterium, the family is represented by only a single member sequence.

    The crystal structure of the protein from the hyperthermophilic bacterium Aquifex aeolicus has been determined. The overall fold consists of one central alpha-helix surrounded by a four-stranded beta-sheet and four other alpha-helices. Structure-based homology analysis reveals a good resemblance to metal-dependent proteinases such as collagenases and gelatinases. However, experimental tests for collagenase- and gelatinase-type function show no detectable activity under standard assay conditions.

    Proteins where this domain is known:
    PY06285   


    PF02134 - UBACT (Pfam link)

    Interpro entry IPR000127 : Ubiquitin-activating enzyme repeat (Interpro link)

    Interpro description:

    The post-translational attachment of ubiquitin to proteins (ubiquitinylation) alters the function, location or trafficking of a protein, or targets it to the 26S proteasome for degradation. Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), and a ubiquitin ligase (E3), which work sequentially in a cascade. The E1 enzyme is responsible for activating ubiquitin, the first step in ubiquitinylation. The E1 enzyme hydrolyses ATP and adenylates the C-terminal glycine residue of ubiquitin, and then links this residue to the active site cysteine of E1, yielding a ubiquitin-thioester and free AMP. To be fully active, E1 must non-covalently bind to and adenylate a second ubiquitin molecule. The E1 enzyme can then transfer the thioester-linked ubiquitin molecule to a cysteine residue on the ubiquitin-conjugating enzyme, E2, in an ATP-dependent reaction.

    This domain is found twice in each member of the ubiquitin-activating enzyme family and is located downstream of the active site cysteine.

    Proteins where this domain is known:
    PY01879    PY05539   

    Proteins where this domain has been detected by our approach:
    PY06413   


    PF02136 - NTF2 (Pfam link)

    Interpro entry IPR002075 : Nuclear transport factor 2 (Interpro link)

    Pfam description:
    This family includes the NTF2-like Delta-5-3-ketosteroid isomerase proteins.

    Interpro description:

    Ran is an evolutionarily conserved member of the Ras superfamily of small GTPases that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Import receptors bind their cargos in the cytoplasm where the concentration of RanGTP is low and release their cargos in the nucleus where the concentration of RanGTP is high. Export receptors respond to RanGTP in the opposite manner.

    Nuclear transport factor 2 (NTF2) is a homodimer of approximately 14 kDa subunits that stimulates efficient nuclear import of a cargo protein. NTF2 binds to both RanGDP and FxFG repeat-containing nucleoporins. NTF2 binds to RanGDP strongly enough for the complex to remain intact during transport through nuclear pore complexes (NPCs), but the interaction between NTF2 and FxFG nucleoporins is much more transient, which would enable NTF2 to move through the NPC by hopping from one repeat to another.

    NTF2 folds into a cone with a deep hydrophobic cavity, the opening of which is surrounded by several negatively charged residues. RanGDP binds to NTF2 by inserting a conserved phenylalanine residue into the hydrophobic pocket of NTF2 and making electrostatic interactions with the conserved negatively charged residues that surround the cavity.

    A structurally similar domain appears in other nuclear import proteins.

    Proteins where this domain is known:
    PY00424   


    PF02138 - Beach (Pfam link)

    Interpro entry IPR000409 : (Interpro link)

    Interpro description:

    The "beige" mouse is established as an animal model of Chediak-Higashi Syndrome (CHS). The BEACH domain was described in the BEIGE protein (D1035670) and in the highly homologous CHS protein. It is also found in distantly related proteins, such as factor associated with neutral sphingomyelinase activation (FAN).

    The BEACH domain is usually followed by a series of WD repeats. The function of the BEACH domain is unknown.

    Proteins where this domain is known:
    PY01158   


    PF02142 - MGS (Pfam link)

    Interpro entry IPR011607 : (Interpro link)

    Pfam description:
    This domain composes the whole protein of methylglyoxal synthetase and the domain is also found in Carbamoyl phosphate synthetase (CPS) where it forms a regulatory domain that binds to the allosteric effector ornithine. This family also includes inosicase. The known structures in this family show a common phosphate binding site.

    Interpro description:

    This domain composes the whole protein of methylglyoxal synthetase and the domain is also found in Carbamoyl phosphate synthetase (CPS) where it forms a regulatory domain that binds to the allosteric effector ornithine. This family also includes inosicase. The known structures in this family show a common phosphate binding site.

    Proteins where this domain is known:
    PY06257   


    PF02146 - SIR2 (Pfam link)

    Interpro entry IPR003000 : NAD-dependent histone deacetylase, silent information regulator Sir2 (Interpro link)

    Pfam description:
    This region is characteristic of Silent information regulator 2 (Sir2) proteins, or sirtuins. These are protein deacetylases that depend on nicotinamide adenine dinucleotide (NAD). They are found in many subcellular locations, including the nucleus, cytoplasm and mitochondria. Eukaryotic forms play an important role in the regulation of transcriptional repression. They are also involved in microtubule organisation and DNA damage repair processes.

    Interpro description:
    These sequences represent the Sir2 family of NAD+-dependent deacetylases. Silent Information Regulator protein of Saccharomyces cerevisiae (Sir2p) is one of several factors critical for silencing at least three loci. Among them, it is unique because it silences the rDNA as well as the mating type loci and telomeres. Sir2p interacts in a complex with itself and with Sir3p and Sir4p, two proteins that are able to interact with nucleosomes. Sir2p also interacts with ubiquitination factors and/or complexes. Unlike Sir3p and Sir4p, for which no homologues are known, Sir2p is part of a multigene family in yeast, the homologues being HST1, HST2, HST3 and HST4.

    Highly conserved structural homologues also occur in other organisms ranging from bacteria to humans and plants. Proteins of this family have been proposed to play a role in silencing, chromosome stability and ageing. In addition, an in vitro ADP-ribosyltransferase activity has been associated with Escherichia coli and human members of this family. Homologues of Sir2 share a core domain including the GAG and NID motifs and a putative C4 zinc finger. The regions containing these three conserved motifs are individually essential for Sir2 silencing function, as are the four cysteines. In addition, the conserved residues HG next to the putative zinc finger have been shown to be essential for the ADP-ribosyltransferase activity.

    Sir2-like enzymes catalyse a reaction in which the cleavage of NAD(+) and histone and/or protein deacetylation are coupled to the formation of O-acetyl-ADP-ribose, a novel metabolite. The dependence of the reaction on both NAD(+) and the generation of this potential second messenger offers new clues to understanding the function and regulation of nuclear, cytoplasmic and mitochondrial Sir2-like enzymes.

    Proteins where this domain is known:
    PY01554    PY01626   


    PF02148 - zf-UBP (Pfam link)

    Interpro entry IPR001607 : Zinc finger, UBP-type (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents UBP-type zinc finger domains, which display some similarity with the Zn-binding domain of the insulinase family. The UBP-type zinc finger domain is found only in a small subfamily of ubiquitin C-terminal hydrolases (deubiquitinases or UBPs). All members of this subfamily are isopeptidase T enzymes, which are known to cleave isopeptide bonds between ubiquitin moieties.

    Some of the proteins containing a UBP zinc finger include:

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY00546   

    Proteins where this domain has been detected by our approach:
    PY03410   


    PF02150 - RNA_POL_M_15KD (Pfam link)

    Interpro entry IPR001529 : DNA-directed RNA polymerase, M/15 kDa subunit (Interpro link)

    Interpro description:

    DNA-directed RNA polymerases (also known as DNA-dependent RNA polymerases) are responsible for the polymerisation of ribonucleotides into a sequence complementary to the template DNA. In eukaryotes, there are three different forms of DNA-directed RNA polymerases transcribing different sets of genes. Most RNA polymerases are multimeric enzymes and are composed of a variable number of subunits. The core RNA polymerase complex consists of five subunits (two alpha, one beta, one beta-prime and one omega) and is sufficient for transcription elongation and termination but is unable to initiate transcription. Transcription initiation from promoter elements requires a sixth, dissociable subunit called a sigma factor, which reversibly associates with the core RNA polymerase complex to form a holoenzyme. The core RNA polymerase complex forms a "crab claw"-like structure with an internal channel running along the full length. The key functional sites of the enzyme, as defined by mutational and cross-linking analysis, are located on the inner wall of this channel.

    RNA synthesis follows the attachment of RNA polymerase to a specific site, the promoter, on the template DNA strand. The RNA synthesis process continues until a termination sequence is reached. The RNA product, which is synthesised in the 5' to 3' direction, is known as the primary transcript. Eukaryotic nuclei contain three distinct types of RNA polymerases that differ in the RNA they synthesise:

    Eukaryotic cells are also known to contain separate mitochondrial and chloroplast RNA polymerases. Eukaryotic RNA polymerases, whose molecular masses range from 500 to 700 kDa, contain two non-identical large (>100 kDa) subunits and an array of up to 12 different small (less than 50 kDa) subunits.

    In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. It has recently been shown that small subunits of about 15 kDa, found in polymerase types I and II, are highly conserved. These proteins contain a probable zinc finger in their N-terminal region and a C-terminal zinc ribbon domain.

    Proteins where this domain is known:
    PY00422   


    PF02163 - Peptidase_M50 (Pfam link)

    Interpro entry IPR008915 : Peptidase M50 (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This entry contains metallopeptidases belonging to MEROPS peptidase family M50 (S2P protease family, clan MM).

    Members of the M50 metallopeptidase family include: mammalian sterol-regulatory element binding protein (SREBP) site 2 protease, Escherichia coli protease EcfE, stage IV sporulation protein FB and various hypothetical bacterial and eukaryotic homologues. A number of proteins are classified as non-peptidase homologues as they either have been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity.

    Proteins where this domain is known:
    PY00562   


    PF02167 - Cytochrom_C1 (Pfam link)

    Interpro entry IPR002326 : Cytochrome c1 (Interpro link)

    Interpro description:
    Cytochrome bc1 complex (ubiquinol:ferricytochrome c oxidoreductase) is found in mitochondria, photosynthetic bacteria and other prokaryotes. It is minimally composed of three subunits: cytochrome b, carrying a low- and a high-potential haem group; cytochrome c1 (cyt c1); and a high-potential Rieske iron-sulphur protein. The general function of the complex is electron transfer between two mobile redox carriers, ubiquinol and cytochrome c; the electron transfer is coupled with proton translocation across the membrane, thus generating proton-motive force in the form of an electrochemical potential that can drive ATP synthesis. In its structure and functions, the cytochrome bc1 complex bears extensive analogy to the cytochrome b6f complex of chloroplasts and cyanobacteria; cyt c1 plays an analogous role to cytochrome f, in spite of their different structures.

    Proteins where this domain is known:
    PY03278   


    PF02178 - AT_hook (Pfam link)

    Interpro entry IPR017956 : AT hook, DNA-binding, conserved site (Interpro link)

    Pfam description:
    AT hooks are DNA-binding motifs with a preference for A/T-rich regions.

    Interpro description:

    AT hooks are DNA-binding motifs with a preference for A/T rich regions. These motifs are found in a variety of proteins, including the high mobility group (HMG) proteins, in DNA-binding proteins from plants and in hBRG1 protein, a central ATPase of the human switching/sucrose non-fermenting (SWI/SNF) remodeling complex.

    High mobility group (HMG) proteins are a family of relatively low molecular weight non-histone components in chromatin. HMG-I and HMG-Y (HMGA) are proteins of about 100 amino acid residues which are produced by the alternative splicing of a single gene. HMG-I/Y proteins bind preferentially to the minor groove of AT-rich regions in double-stranded DNA in a non-sequence specific manner. It is suggested that these proteins could function in nucleosome phasing and in the 3' end processing of mRNA transcripts. They are also involved in the transcription regulation of genes containing, or in close proximity to, AT-rich regions.

    Proteins where this domain is known:
    PY00764    PY05495    PY07351   

    Proteins where this domain has been detected by our approach:
    PY03808    PY05553   


    PF02181 - FH2 (Pfam link)

    Interpro entry IPR015425 : (Interpro link)

    Interpro description:

    Formin homology (FH) proteins play a crucial role in the reorganization of the actin cytoskeleton, which mediates various functions of the cell cortex including motility, adhesion, and cytokinesis. Formins are multidomain proteins that interact with diverse signalling molecules and cytoskeletal proteins, although some formins have been assigned functions within the nucleus. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain is required for the self-association of formin proteins through the ability of FH2 domains to directly bind each other, and may also act to inhibit actin polymerisation. The FH3 domain is less well conserved and may be important for determining intracellular localisation of formin family proteins. In addition, some formins can contain a GTPase-binding domain (GBD) required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD).

    This entry represents the FH2 domain, which was shown by X-ray crystallography to have an elongated, crescent shape containing three helical subdomains.

    Proteins where this domain is known:
    PY01292    PY01855    PY07114   


    PF02182 - YDG_SRA (Pfam link)

    Interpro entry IPR003105 : (Interpro link)

    Pfam description:
    The function of this domain is unknown; it contains a conserved YDG motif, after which it has been named.

    Interpro description:

    This domain has been termed SRA-YDG, for SET and Ring finger Associated, and because of the conserved YDG motif within the domain. Further characteristics of the domain are the conservation of up to 13 evenly spaced glycine residues and a VRV(I/V)RG motif. The domain is found mainly in plants, animals and bacteria. In animals, this domain is associated with the Np95-like ring finger protein and the related gene product Np97, which contains PHD and RING FINGER domains and which is an important determinant in cell cycle progression. Np95 is a chromatin-associated ubiquitin ligase whose binding to histones is direct and shows a remarkable preference for histone H3 and its N-terminal tail. The SRA-YDG domain contained in Np95 is indispensable both for the interaction with histones and for chromatin binding in vivo. In plants the SRA-YDG domain is associated with the SET domain, found in a family of histone methyltransferases, and in bacteria it is found in association with HNH, a non-specific nuclease motif.

    Proteins where this domain has been detected by our approach:
    PY00637   


    PF02184 - HAT (Pfam link)

    Interpro entry IPR003107 : RNA-processing protein, HAT helix (Interpro link)

    Pfam description:
    The HAT (Half A TPR) repeat is found in several RNA processing proteins.

    Interpro description:

    The HAT (Half A TPR) repeat has a repetitive pattern characterised by three aromatic residues with a conserved spacing. They are structurally and sequentially similar to TPRs (tetratricopeptide repeats), though they lack the highly conserved alanine and glycine residues found in TPRs. The number of HAT repeats found in different proteins varies between 9 and 12. HAT-repeat-containing proteins appear to be components of macromolecular complexes that are required for RNA processing. The repeats may be involved in protein-protein interactions. The HAT motif has striking structural similarities to HEAT repeats, being of a similar length and consisting of two short helices connected by a loop domain, as in HEAT repeats.

    Proteins where this domain is known:
    PY02447   

    Proteins where this domain has been detected by our approach:
    PY03704   


    PF02190 - LON (Pfam link)

    Interpro entry IPR003111 : Peptidase S16, lon N-terminal (Interpro link)

    Interpro description:

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
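    To illustrate how an ordering such as HDS or DHS is read off a sequence, the short Python sketch below sorts the three catalytic residues by their position in the primary sequence, from N- to C-terminus. The residue numbers used (His57, Asp102 and Ser195 for bovine chymotrypsin; Asp32, His64 and Ser221 for subtilisin BPN') are the conventional numberings and are not taken from this list; the function name triad_order is hypothetical.

```python
# Minimal sketch: derive the linear order of a Ser/Asp/His catalytic triad
# from residue positions. Positions use the conventional numbering for
# bovine chymotrypsin and subtilisin BPN' (illustrative, not from this list).

def triad_order(positions: dict[str, int]) -> str:
    """Return the one-letter codes of the triad residues sorted N- to C-terminal."""
    ordered = sorted(positions.items(), key=lambda item: item[1])
    return "".join(residue for residue, _ in ordered)

chymotrypsin = {"S": 195, "H": 57, "D": 102}  # chymotrypsin clan (PA)
subtilisin = {"S": 221, "H": 64, "D": 32}     # subtilisin clan (SB)

print(triad_order(chymotrypsin))  # HDS
print(triad_order(subtilisin))    # DHS
```

    The ordering depends only on relative positions, so the same function works for any numbering scheme as long as all three residues come from the same sequence.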

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This signature defines the N-terminal domain of the archaeal, bacterial and eukaryotic lon proteases, which are ATP-dependent serine peptidases belonging to the MEROPS peptidase family S16 (lon protease family, clan SF). In eukaryotes, the majority of these proteins are located in the mitochondrial matrix. In yeast, Pim1 is located in the mitochondrial matrix and is required for mitochondrial function; it is constitutively expressed, but its expression increases after thermal stress, suggesting that Pim1 may play a role in the heat shock response.

    Proteins where this domain is known:
    PY04458   

    Proteins where this domain has been detected by our approach:
    PY06314   


    PF02191 - OLF (Pfam link)

    Interpro entry IPR003112 : (Interpro link)

    Interpro description:

    The olfactomedin-domain was first identified in olfactomedin, an extracellular matrix protein of the olfactory neuroepithelium. Members of this extracellular domain-family have since been shown to be present in several metazoan proteins, such as latrophilins, myocilins, optimedins and noelins, the latter being involved in the generation of neural crest cells. Myocilin is of considerable interest, as mutations in its olfactomedin-domain can lead to glaucoma. The olfactomedin-domains in myocilin and optimedin are essential for the interaction between these two proteins.

    Proteins where this domain has been detected by our approach:
    PY03876   


    PF02201 - SWIB (Pfam link)

    Interpro entry IPR003121 : (Interpro link)

    Pfam description:
    This family includes the SWIB domain and the MDM2 domain. The p53-associated protein MDM2 is an inhibitor of the p53 tumour suppressor: it binds the p53 transactivation domain and down-regulates the ability of p53 to activate transcription. This family contains the p53-binding domain of MDM2.

    Interpro description:

    The SWI/SNF family of complexes, which are conserved from yeast to humans, are ATP-dependent chromatin-remodelling proteins that facilitate transcription activation. The mammalian complexes are made up of 9-12 proteins called BAFs (BRG1-associated factors). The BAF60 family have at least three members: BAF60a, which is ubiquitous, BAF60b and BAF60c, which are expressed in muscle and pancreatic tissues, respectively. BAF60b is present in alternative forms of the SWI/SNF complex, including complex B (SWIB), which lacks BAF60a. The SWIB domain is a conserved region found within the BAF60b proteins, and can be found fused to the C-terminus of DNA topoisomerase in Chlamydia.

    MDM2 is an oncoprotein that acts as a cellular inhibitor of the p53 tumour suppressor by binding to the transactivation domain of p53 and suppressing its ability to activate transcription. p53 acts in response to DNA damage, inducing cell cycle arrest and apoptosis. Inactivation of p53 is a common occurrence in neoplastic transformations. The core of MDM2 folds into an open bundle of four helices, which is capped by two small 3-stranded beta-sheets. It consists of a duplication of two structural repeats. MDM2 has a deep hydrophobic cleft on which the p53 alpha-helix binds; p53 residues involved in transactivation are buried deep within the cleft of MDM2, thereby concealing the p53 transactivation domain.

    The SWIB and MDM2 domains are homologous and share a common fold.

    Proteins where this domain is known:
    PY02684    PY03119   


    PF02204 - VPS9 (Pfam link)

    Interpro entry IPR003123 : (Interpro link)

    Interpro description:
    This domain is present in yeast vacuolar sorting protein 9 and other proteins.

    Proteins where this domain is known:
    PY00538    PY05865   


    PF02207 - zf-UBR (Pfam link)

    Interpro entry IPR003126 : Zinc finger, N-recognin (Interpro link)

    Pfam description:
    This region is found in N-recognins, the E3 ubiquitin ligases that recognise N-terminal degradation signals.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    The N-end rule-based degradation signal, which targets a protein for ubiquitin-dependent proteolysis, comprises a destabilizing amino-terminal residue and a specific internal lysine residue. This entry describes a putative zinc finger in N-recognin, a recognition component of the N-end rule pathway.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY03832   


    PF02212 - GED (Pfam link)

    Interpro entry IPR003130 : Dynamin GTPase effector (Interpro link)

    Interpro description:

    The dynamin GTPase effector domain is found in proteins related to dynamin.

    Dynamin is a GTP-hydrolysing protein that is an essential participant in clathrin-mediated endocytosis by cells. It self-assembles into 'collars' in vivo at the necks of invaginated coated pits; the self-assembly of dynamin being coordinated by the GTPase domain. Mutation studies indicate that dynamin functions as a molecular regulator of receptor-mediated endocytosis.

    Proteins where this domain is known:
    PY00714    PY04073   


    PF02213 - GYF (Pfam link)

    Interpro entry IPR003169 : (Interpro link)

    Pfam description:
    The GYF domain is named after its conserved Gly-Tyr-Phe residues. The GYF domain is a proline-binding domain in CD2-binding protein Swiss:O95400.

    Interpro description:

    The glycine-tyrosine-phenylalanine (GYF) domain is a domain of around 60 amino acids which contains a conserved GP[YF]xxxx[MV]xxWxxx[GN]YF motif. It was identified in the human intracellular protein termed CD2 binding protein 2 (CD2BP2), which binds to a site containing two tandem PPPGHR segments within the cytoplasmic region of CD2. Binding experiments and mutational analyses have demonstrated the critical importance of the GYF tripeptide in ligand binding. A GYF domain is also found in several other eukaryotic proteins of unknown function. It has been proposed that the GYF domain found in these proteins could also be involved in proline-rich sequence recognition. Resolution of the structure of the CD2BP2 GYF domain by NMR spectroscopy revealed a compact domain with a beta-beta-alpha-beta-beta topology, where the single alpha-helix is tilted away from the twisted, anti-parallel beta-sheet. The conserved residues of the GYF domain create a contiguous patch of predominantly hydrophobic nature which forms an integral part of the ligand-binding site. There is limited homology within the C-terminal 20-30 amino acids of various GYF domains, supporting the idea that this part of the domain is structurally but not functionally important.
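    One way to scan a sequence for the GYF consensus quoted above is to translate it into a regular expression, writing each x (any residue) as "." and a run of n x's as ".{n}". The sketch below assumes plain one-letter amino-acid strings; the helper name find_gyf_motifs and the test sequence are hypothetical, and a regex match is only a rough screen, not a substitute for a profile search with the Pfam model.

```python
import re

# GP[YF]xxxx[MV]xxWxxx[GN]YF rewritten as a Python regex:
# each "x" (any residue) becomes ".", and "xxxx" becomes ".{4}".
GYF_MOTIF = re.compile(r"GP[YF].{4}[MV].{2}W.{3}[GN]YF")

def find_gyf_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in GYF_MOTIF.finditer(sequence.upper())]

# Hypothetical test sequence, not a real protein.
demo = "MKLGPYAAAAMAAWAAAGYFKK"
print(find_gyf_motifs(demo))  # [(3, 'GPYAAAAMAAWAAAGYF')]
```

    finditer reports non-overlapping matches only, which is adequate here because a second GYF motif overlapping the first within ~17 residues is not expected.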

    Proteins where this domain is known:
    PY02334    PY04040    PY04822   


    PF02214 - K_tetra (Pfam link)

    Interpro entry IPR003131 : Potassium channel, voltage dependent, Kv, tetramerisation (Interpro link)

    Pfam description:
    The N-terminal, cytoplasmic tetramerisation domain (T1) of voltage-gated K+ channels encodes molecular determinants for subfamily-specific assembly of alpha-subunits into functional tetrameric channels. It is distantly related to the BTB/POZ domain Pfam:PF00651.

    Interpro description:

    Potassium channels are the most diverse group of the ion channel family. They are important in shaping the action potential, and in neuronal excitability and plasticity. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on their type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; while others are regulated by GTP-binding proteins or other second messengers. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis.

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how the 2 P-domain subunits assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned into one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK). The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha-subunits that possess two P-domains. These are usually highly regulated K+ selective leak channels.
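    The K+ selectivity sequence (T/SxxTxGxG) can likewise be expressed as a regular expression, with T/S written as a character class [TS] and each x as ".". This is a minimal sketch for one-letter sequences; the function name has_selectivity_signature is hypothetical, and the two test strings are synthetic P-loop fragments built around the familiar TVGYG signature, not sequences taken from this list.

```python
import re

# (T/S)xxTxGxG as a Python regex: [TS] for T/S, "." for any residue x.
K_SELECTIVITY = re.compile(r"[TS].{2}T.G.G")

def has_selectivity_signature(p_loop: str) -> bool:
    """True if the sequence contains the K+ selectivity sequence anywhere."""
    return K_SELECTIVITY.search(p_loop.upper()) is not None

# Synthetic fragments (hypothetical): the second has its final G replaced by A.
print(has_selectivity_signature("AAATMTTVGYGDAA"))  # True
print(has_selectivity_signature("AAATMTTVGYADAA"))  # False
```

    A match only indicates the short consensus; deciding whether a protein actually has one or two P-domains would require locating the full pore loops.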

    The Kv family can be divided into several subfamilies on the basis of sequence similarity and function. Four of these subfamilies, Kv1 (Shaker), Kv2 (Shab), Kv3 (Shaw) and Kv4 (Shal), consist of pore-forming alpha subunits that associate with different types of beta subunit. Each alpha subunit comprises six hydrophobic TM domains with a P-domain between the fifth and sixth, which partially resides in the membrane. The fourth TM domain has positively charged residues at every third residue and acts as a voltage sensor, which triggers the conformational change that opens the channel pore in response to a displacement in membrane potential. More recently, 4 new electrically-silent alpha subunits have been cloned: Kv5 (KCNF), Kv6 (KCNG), Kv8 and Kv9 (KCNS). These subunits do not themselves possess any functional activity, but appear to form heteromeric channels with Kv2 subunits, and thus modulate Shab channel activity. When highly expressed, they inhibit channel activity, but at lower levels show more specific modulatory actions.

    The N-terminal, cytoplasmic tetramerization domain (T1) of voltage-gated potassium channels encodes molecular determinants for subfamily-specific assembly of alpha-subunits into functional tetrameric channels. This domain is found in a subset of a larger group of proteins that contain the BTB/POZ domain.

    Proteins where this domain is known:
    PY02946   


    PF02223 - Thymidylate_kin (Pfam link)

    Interpro entry IPR000062 : Thymidylate kinase-like (Interpro link)

    Interpro description:

    Thymidylate kinase (dTMP kinase) catalyzes the phosphorylation of thymidine 5'-monophosphate (dTMP) to form thymidine 5'-diphosphate (dTDP) in the presence of ATP and magnesium:

     ATP + thymidine 5'-phosphate = ADP + thymidine 5'-diphosphate  

    Thymidylate kinase is a ubiquitous enzyme of about 25 kDa and is important in the dTTP synthesis pathway for DNA synthesis. Much of what is known about the function of dTMP kinase in eukaryotes comes from the study of a cell cycle mutant, cdc8, in Saccharomyces cerevisiae. Structural and functional analyses suggest that a cloned human cDNA encodes authentic human dTMP kinase; its mRNA levels and enzyme activities corresponded to cell cycle progression and cell growth stages.

    This entry represents known and predicted thymidylate kinases, and related enzymes such as UMP-CMP kinase.

    Proteins where this domain is known:
    PY01336   


    PF02233 - PNTB (Pfam link)

    Interpro entry IPR004003 : NAD(P) transhydrogenase, beta subunit (Interpro link)

    Pfam description:
    This family corresponds to the beta subunit of NADP transhydrogenase in prokaryotes, and to either the N- or C-terminal region of the protein in eukaryotes. The domain is often found in conjunction with Pfam:PF01262. Pyridine nucleotide transhydrogenase catalyses the reduction of NADP+ to NADPH. A complete loss of activity occurs upon mutation of Gly314 in E. coli.

    Interpro description:
    Alanine dehydrogenases and pyridine nucleotide transhydrogenase have been shown to share regions of similarity. Alanine dehydrogenase catalyzes the NAD-dependent reversible reductive amination of pyruvate into alanine. Pyridine nucleotide transhydrogenase catalyzes the reduction of NADP+ to NADPH with the concomitant oxidation of NADH to NAD+. This enzyme is located in the plasma membrane of prokaryotes and in the inner membrane of the mitochondria of eukaryotes. The transhydrogenation between NADH and NADP is coupled with the translocation of a proton across the membrane. In prokaryotes the enzyme is composed of two different subunits, an alpha chain (gene pntA) and a beta chain (gene pntB), while in eukaryotes it is a single chain protein. The sequences of alanine dehydrogenase from several bacterial species are related to those of the alpha subunit of bacterial pyridine nucleotide transhydrogenase and of the N-terminal half of the eukaryotic enzyme. The two most conserved regions correspond respectively to the N-terminal extremity of these proteins and to a central glycine-rich region which is part of the NAD(H)-binding site.

    Proteins where this domain is known:
    PY05907   


    PF02237 - BPL_C (Pfam link)

    Interpro entry IPR003142 : Biotin protein ligase, C-terminal (Interpro link)

    Pfam description:
    The function of this structural domain is unknown. It is found C-terminal to the biotin protein ligase catalytic domain Pfam:PF01317.

    Interpro description:

    This C-terminal domain has an SH3-like barrel fold, the function of which is unknown. It is found associated with prokaryotic bifunctional transcriptional repressors and eukaryotic enzymes involved in biotin utilization.

    In Escherichia coli the biotin operon repressor (BirA) is a bifunctional protein. BirA acts both as the acetyl-CoA carboxylase biotin holoenzyme synthetase and as the biotin operon repressor. DNA sequence analysis of mutations indicates that the helix-turn-helix DNA-binding region is located at the N-terminus, while mutations affecting enzyme function, although mapping over a large region, are found mainly in the central part of the protein's primary sequence.

    Proteins where this domain has been detected by our approach:
    PY01917   


    PF02245 - Pur_DNA_glyco (Pfam link)

    Interpro entry IPR003180 : Methylpurine-DNA glycosylase (MPG) (Interpro link)

    Pfam description:
    Methylpurine-DNA glycosylase is a base excision-repair protein. It is responsible for the hydrolysis of the deoxyribose N-glycosidic bond, excising 3-methyladenine and 3-methylguanine from damaged DNA.

    Interpro description:

    Methylpurine-DNA glycosylase is a base excision-repair protein. It is responsible for the hydrolysis of the deoxyribose N-glycosidic bond, excising 3-methyladenine and 3-methylguanine from damaged DNA. Its action is induced by alkylating chemotherapeutics, as well as deaminated and lipid peroxidation-induced purine adducts. MPG without an N-terminal extension excises hypoxanthine with one-third of the efficiency of full-length MPG under similar conditions, suggesting that its function may largely be attributable to the N-terminal extension.

    Proteins where this domain is known:
    PY05580   


    PF02252 - PA28_beta (Pfam link)

    Interpro entry IPR003186 : Proteasome activator pa28, REG beta subunit (Interpro link)

    Pfam description:
    PA28 activator complex (also known as 11S regulator of 20S proteasome) is a ring-shaped hexameric structure of alternating alpha and beta subunits. This family represents the beta subunit. The activator complex binds to the 20S proteasome and stimulates peptidase activity in an ATP-independent manner.

    Interpro description:

    PA28 activator complex (also known as 11S regulator of 20S proteasome) is a ring-shaped hexameric structure of alternating alpha (PA28alpha) and beta (PA28beta) subunits. The catalytic properties of PA28alpha- and PA28beta-activated proteasome are similar. This entry represents the beta subunit. The activator complex binds to the 20S proteasome and stimulates peptidase activity in an ATP-independent manner.

    Proteins where this domain is known:
    PY04117   


    PF02263 - GBP (Pfam link)

    Interpro entry IPR015894 : Guanylate-binding protein, N-terminal (Interpro link)

    Pfam description:
    Transcription of the anti-viral guanylate-binding protein (GBP) is induced by interferon-gamma during macrophage induction. This family contains GBP1 and GBP2, both GTPases capable of binding GTP, GDP and GMP.

    Interpro description:

    Guanylate-binding protein is a GTPase that is induced by interferon (IFN)-gamma. GTPases induced by IFN-gamma are key to the protective immunity against microbial and viral pathogens. These GTPases are classified into three groups: the small 47-kDa GTPases, the Mx proteins, and the large 65- to 67-kDa GTPases. Guanylate-binding proteins (GBP) fall into the last class. In humans, there are seven GBPs (hGBP1-7). Structurally, hGBP1 consists of two domains: a compact globular N-terminal domain harbouring the GTPase function, and an alpha-helical finger-like C-terminal domain. Human GBP1 is secreted from cells without the need for a leader peptide, and has been shown to exhibit antiviral activity against Vesicular stomatitis virus and Encephalomyocarditis virus, as well as being able to regulate the inhibition of proliferation and invasion of endothelial cells in response to IFN-gamma.

    Proteins where this domain is known:
    PY05870   


    PF02272 - DHHA1 (Pfam link)

    Interpro entry IPR003156 : Phosphoesterase, DHHA1 (Interpro link)

    Pfam description:
    This domain is often found adjacent to the DHH domain Pfam:PF01368 and is called DHHA1 for DHH associated domain. This domain is diagnostic of DHH subfamily 1 members. This domain is also found in alanyl tRNA synthetase e.g. Swiss:P00957, suggesting that this domain may have an RNA binding function. The domain is about 60 residues long and contains a conserved GG motif.

    Interpro description:

    This domain is often found adjacent to the DHH domain, found in the RecJ-like phosphoesterase family, and is called DHHA1 for DHH associated domain. DHHA1 is diagnostic of DHH subfamily 1 members. This domain is also found in alanyl tRNA synthetase, suggesting that it may have an RNA binding function. The domain is about 60 residues long and contains a conserved GG motif.

    Proteins where this domain has been detected by our approach:
    PY03081   


    PF02290 - SRP14 (Pfam link)

    Interpro entry IPR003210 : Signal recognition particle, SRP14 subunit (Interpro link)

    Pfam description:
    The signal recognition particle (SRP) is a multimeric protein involved in targeting secretory proteins to the rough endoplasmic reticulum membrane. SRP14 and SRP9 form a complex essential for SRP RNA binding.

    Interpro description:

    The signal recognition particle (SRP) is a multimeric protein, which along with its conjugate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes. SRP recognises the signal sequence of the nascent polypeptide on the ribosome, retards its elongation, and docks the SRP-ribosome-polypeptide complex to the RER membrane via the SR receptor. SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300 nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane.

    This entry represents the 14 kDa SRP14 component. Both SRP9 and SRP14 have the same (beta)-alpha-beta(3)-alpha fold. The heterodimer has pseudo two-fold symmetry and is saddle-like, consisting of a curved six-stranded beta-sheet that has four helices packed on the convex side and an exposed concave surface lined with positively charged residues. The SRP9/SRP14 heterodimer is essential for SRP RNA binding, mediating the pausing of synthesis of ribosome associated nascent polypeptides that have been engaged by the targeting domain of SRP.

    Proteins where this domain is known:
    PY05285   


    PF02301 - HORMA (Pfam link)

    Interpro entry IPR003511 : DNA-binding HORMA (Interpro link)

    Pfam description:
    The HORMA (for Hop1p, Rev7p and MAD2) domain has been suggested to recognise chromatin states that result from DNA adducts, double-stranded breaks or non-attachment to the spindle, and to act as an adaptor that recruits other proteins. MAD2 is a spindle checkpoint protein which prevents progression of the cell cycle upon detection of a defect in mitotic spindle integrity.

    Interpro description:
    The HORMA (for Hop1p, Rev7p and MAD2) domain has been suggested to recognise chromatin states that result from DNA adducts, double-stranded breaks or non-attachment to the spindle, and to act as an adaptor that recruits other proteins. Hop1 is a meiosis-specific protein, Rev7 is required for DNA damage induced mutagenesis, and MAD2 is a spindle checkpoint protein which prevents progression of the cell cycle upon detection of a defect in mitotic spindle integrity.

    Proteins where this domain is known:
    PY02325   


    PF02320 - UCR_hinge (Pfam link)

    Interpro entry IPR003422 : Ubiquinol-cytochrome C reductase hinge (Interpro link)

    Pfam description:
    The ubiquinol-cytochrome C reductase complex (cytochrome bc1 complex) is a respiratory multienzyme complex. This Pfam family represents the 'hinge' protein of the complex which is thought to mediate formation of the cytochrome c1 and cytochrome c complex.

    Interpro description:
    The ubiquinol-cytochrome C reductase complex (cytochrome bc1 complex) is a respiratory multienzyme complex. The bc1 complex contains 11 subunits; 3 respiratory subunits (cytochrome B, cytochrome C1, Rieske protein), 2 core proteins and 6 low molecular weight proteins. This family represents the 'hinge' protein of the complex which is thought to mediate formation of the cytochrome c1 and cytochrome c complex.

    Proteins where this domain is known:
    PY01064   


    PF02330 - MAM33 (Pfam link)

    Interpro entry IPR003428 : Mitochondrial glycoprotein (Interpro link)

    Pfam description:
    This mitochondrial matrix protein family contains members of the MAM33 family which bind to the globular 'heads' of C1Q. It is thought to be involved in mitochondrial oxidative phosphorylation and in nucleus-mitochondrion interactions.

    Interpro description:
    This mitochondrial matrix protein family contains members of the MAM33 family which bind to the globular 'heads' of C1Q.

    Proteins where this domain is known:
    PY02557   


    PF02338 - OTU (Pfam link)

    Interpro entry IPR003323 : (Interpro link)

    Pfam description:
    This family comprises a group of predicted cysteine proteases, homologous to the Ovarian Tumour (OTU) gene in Drosophila. Members include proteins from eukaryotes, viruses and a pathogenic bacterium. The conserved cysteine and histidine, and possibly the aspartate, represent the catalytic residues in this putative group of proteases.

    Interpro description:

    This is a group of proteins found primarily in viruses, eukaryotes and in the pathogenic bacterium Chlamydia pneumoniae. In viruses they are annotated as replicase or RNA-dependent RNA polymerase. The eukaryotic sequences are related to the Ovarian Tumour (OTU) gene in Drosophila, cezanne deubiquitinating peptidase and tumor necrosis factor, alpha-induced protein 3 (MEROPS peptidase family C64) and otubain 1 and otubain 2 (MEROPS peptidase family C65).

    None of these proteins has a known biochemical function, but low sequence similarity with the polyprotein regions of arteriviruses, together with the conserved cysteine and histidine (and possibly aspartate) residues, suggests that those not yet recognised as peptidases could possess cysteine protease activity.

    Proteins where this domain is known:
    PY02414    PY05983   


    PF02359 - CDC48_N (Pfam link)

    Interpro entry IPR003338 : ATPase, AAA-type, VAT, N-terminal (Interpro link)

    Pfam description:
    This domain has a double psi-beta barrel fold and includes VCP-like ATPase and N-ethylmaleimide sensitive fusion protein N-terminal domains. Both the VAT and NSF N-terminal functional domains consist of two structural domains, of which this is the N-terminal one. The VAT-N domain found in AAA ATPases Pfam:PF00004 is a 185-residue substrate-recognition domain.

    Interpro description:

    AAA ATPases (ATPases Associated with diverse cellular Activities) form a large protein family and play a number of roles in the cell including cell-cycle regulation, protein proteolysis and disaggregation, organelle biogenesis and intracellular transport. Some of them function as molecular chaperones, subunits of proteolytic complexes or independent proteases (FtsH, Lon). They also act as DNA helicases and transcription factors.

    AAA ATPases belong to the AAA+ superfamily of ring-shaped P-loop NTPases, which act via the energy-dependent unfolding of macromolecules. There are six major clades of AAA domains (proteasome subunits, metalloproteases, domains D1 and D2 of ATPases with two AAA domains, the MSP1/katanin/spastin group, and BCS1 and its homologues), as well as a number of deeply branching minor clades.

    They form oligomeric assemblies (often hexamers) with a ring-shaped structure and a central pore. These proteins act as molecular motors that couple ATP binding and hydrolysis to changes in conformational state that act upon a target substrate, either translocating or remodelling it.

    They are found in all living organisms and share a highly conserved AAA domain called the AAA module. This domain is responsible for ATP binding and hydrolysis. It contains 200-250 residues, including two classical motifs, Walker A (GX4GKT) and Walker B (HyDE).

    The VAT protein of the archaebacterium Thermoplasma acidophilum, like all other members of the Cdc48/p97 family of AAA ATPases, has two ATPase domains and a 185-residue amino-terminal substrate-recognition domain, VAT-N. VAT shows activity in protein folding and unfolding and thus shares the common function of these ATPases in disassembly and/or degradation of protein complexes.

    VAT-N is composed of two equally sized subdomains. The amino-terminal subdomain VAT-Nn forms a double-psi beta-barrel whose pseudo-twofold symmetry is mirrored by an internal sequence repeat of 42 residues. The carboxy-terminal subdomain VAT-Nc forms a novel six-stranded beta-clam fold. Together, VAT-Nn and VAT-Nc form a kidney-shaped structure, in close agreement with results from electron microscopy. VAT-Nn is related to numerous proteins including prokaryotic transcription factors, metabolic enzymes, the protease cofactors UFD1 and PrlF, and aspartic proteinases.

    Proteins where this domain is known:
    PY03639   

    Proteins where this domain has been detected by our approach:
    PY05787   


    PF02374 - ArsA_ATPase (Pfam link)

    Interpro entry IPR003348 : ATPase, anion-transporting (Interpro link)

    Pfam description:
    This Pfam family represents a conserved domain, which is sometimes repeated, in an anion-transporting ATPase. The ATPase is involved in the removal of arsenite, antimonite, and arsenate from the cell.

    Interpro description:

    This ATPase is involved in the removal of arsenite, antimonite, and arsenate from the cell.

    In Escherichia coli an anion-translocating ATPase has been identified as the product of the arsenical resistance operon of resistance plasmid R773. This ATP-driven oxyanion pump catalyses extrusion of the oxyanions arsenite, antimonite and arsenate. Maintenance of a low intracellular concentration of oxyanion produces resistance to the toxic agents. The pump is composed of two polypeptides, the products of the arsA and arsB genes. This two-subunit enzyme produces resistance to arsenite and antimonite. A third gene, arsC, expands the substrate specificity to allow for arsenate pumping and resistance.

    The ArsA and ArsB proteins form a membrane-bound pump that functions as an oxyanion-translocating ATPase. The ArsC protein is an arsenate reductase that reduces arsenate to arsenite, which is subsequently pumped out of the cell.

    Proteins where this domain is known:
    PY02198   


    PF02383 - Syja_N (Pfam link)

    Interpro entry IPR002013 : (Interpro link)

    Pfam description:
    This Pfam family represents a protein domain which shows homology to the yeast protein SacI Swiss:P32368. The SacI homology domain is most notably found at the amino terminus of the inositol 5'-phosphatase synaptojanin.

    Interpro description:
    Synaptic vesicles are recycled with remarkable speed and precision in nerve terminals. A major recycling pathway involves clathrin-mediated endocytosis at endocytic zones located around sites of release. Different 'accessory' proteins linked to this pathway have been shown to alter the shape and composition of lipid membranes, to modify membrane-coat protein interactions, and to influence actin polymerization. These include the GTPase dynamin, the lysophosphatidic acid acyl transferase endophilin, and the phosphoinositide phosphatase synaptojanin.

    The yeast protein SacI ("recessive suppressor of secretory defect in yeast Golgi and yeast actin function") belongs to this family. This protein may be involved in coordinating the activities of the secretory pathway and the actin cytoskeleton.

    Human synaptojanin, which may be localised on coated endocytic intermediates in nerve terminals, also belongs to this family.

    Proteins where this domain is known:
    PY02348    PY03237    PY04281   


    PF02390 - Methyltransf_4 (Pfam link)

    Interpro entry IPR003358 : tRNA (guanine-N-7) methyltransferase (Interpro link)

    Pfam description:
    This is a family of putative methyltransferases. The aligned region contains the GXGXG S-AdoMet binding site suggesting a putative methyltransferase activity.

    Interpro description:

    This entry represents tRNA (guanine-N-7) methyltransferase, which catalyses the formation of N(7)-methylguanine at position 46 (m7G46) in tRNA. Capping of the pre-mRNA 5' end by addition of a monomethylated guanosine cap (m(7)G) is essential and is the earliest modification in the biogenesis of mRNA. The capping reaction is catalysed by three enzymes: triphosphatase, guanylyltransferase, and mRNA (guanine-N-7) methyltransferase.

    Proteins where this domain is known:
    PY05818   


    PF02401 - LYTB (Pfam link)

    Interpro entry IPR003451 : LytB protein (Interpro link)

    Pfam description:
    The mevalonate-independent 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway for isoprenoid biosynthesis is essential in many eubacteria, plants, and the malaria parasite. The LytB gene is involved in the trunk line of the MEP pathway.

    Interpro description:

    Terpenes are among the largest groups of natural products and include compounds such as vitamins, cholesterol and carotenoids. The biosynthesis of all terpenoids begins with one or both of the two C5 precursors of the pathway: isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). In animals, fungi, and certain bacteria, IPP and DMAPP are synthesised via the well-known mevalonate pathway; however, a second, nonmevalonate terpenoid pathway has been identified in many eubacteria, algae and the chloroplasts of higher plants.

    LytB (IspH) catalyses the conversion of 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate into IPP and DMAPP in this second pathway. The enzyme appears to be responsible for a branch step in the nonmevalonate pathway, in that IPP and DMAPP are produced in parallel from a single precursor, although the exact mechanism is not yet fully understood. Escherichia coli LytB protein has been found to regulate the activity of RelA (guanosine 3',5'-bispyrophosphate synthetase I), which in turn controls the level of a regulatory metabolite. It is involved in penicillin tolerance and the stringent response.

    Proteins where this domain is known:
    PY01243   


    PF02403 - Seryl_tRNA_N (Pfam link)

    Interpro entry IPR015866 : Seryl-tRNA synthetase, class IIa, N-terminal (Interpro link)

    Pfam description:
    This domain is found associated with the Pfam tRNA synthetase class II domain (Pfam:PF00587) and represents the N-terminal domain of seryl-tRNA synthetase.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while in class II reactions the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II.
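    The class assignments above amount to a simple lookup table. The Python sketch below encodes them exactly as listed in this description, keyed by one-letter amino acid code; the function names are illustrative only and not part of any database API. The 2'/3'-hydroxyl rule is just the general tendency stated above, and exceptions exist (phenylalanyl-tRNA synthetase, a class II enzyme, acylates the 2'-hydroxyl).

```python
# Synthetase classes as listed in the description (one-letter amino acid codes).
CLASS_I = set("RCEQILMYWV")   # Arg, Cys, Glu, Gln, Ile, Leu, Met, Tyr, Trp, Val
CLASS_II = set("ANDGHKFPST")  # Ala, Asn, Asp, Gly, His, Lys, Phe, Pro, Ser, Thr

# The two classes partition the 20 standard amino acids.
assert CLASS_I.isdisjoint(CLASS_II) and len(CLASS_I | CLASS_II) == 20


def synthetase_class(aa: str) -> str:
    """Return 'I' or 'II' for the synthetase that charges amino acid `aa`."""
    code = aa.upper()
    if code in CLASS_I:
        return "I"
    if code in CLASS_II:
        return "II"
    raise ValueError(f"not a standard one-letter amino acid code: {aa!r}")


def acylated_hydroxyl(aa: str) -> str:
    """Preferred tRNA ribose hydroxyl for the aminoacyl group.

    General tendency only; some synthetases are known exceptions.
    """
    return "2'-OH" if synthetase_class(aa) == "I" else "3'-OH"


print(synthetase_class("S"), acylated_hydroxyl("S"))  # prints: II 3'-OH
```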

    This entry represents the N-terminal domain of seryl-tRNA synthetase, which consists of two helices in a long alpha-hairpin. Seryl-tRNA synthetase is a homodimer and belongs to class IIa.

    Proteins where this domain is known:
    PY03295   


    PF02430 - AMA-1 (Pfam link)

    Interpro entry IPR003298 : Apical membrane antigen 1 (Interpro link)

    Pfam description:
    Apical membrane antigen 1 (AMA-1) is a Plasmodium asexual blood-stage antigen. It has been suggested that positive selection operates on the AMA-1 gene in regions coding for antigenic sites.

    Interpro description:

    A novel antigen of Plasmodium falciparum has been cloned that contains a hydrophobic domain typical of an integral membrane protein. The antigen is designated apical membrane antigen 1 (AMA-1) by virtue of appearing to be located in the apical complex. AMA-1 appears to be transported to the merozoite surface close to the time of schizont rupture.

    The 66 kDa merozoite surface antigen (PK66) of Plasmodium knowlesi, a simian malaria parasite, possesses vaccine-related properties believed to originate from a receptor-like role in parasite invasion of erythrocytes. The sequence of PK66 is conserved throughout Plasmodium, and shows high similarity to P. falciparum AMA-1. Following schizont rupture, the distribution of PK66 changes in a coordinated manner associated with merozoite invasion. Prior to rupture, the protein is concentrated at the apical end, after which it distributes itself across the entire surface of the free merozoite. Immunofluorescence studies suggest that, during invasion, PK66 is excluded from the erythrocyte at, and behind, the invasion interface.

    Proteins where this domain is known:
    PY01581   


    PF02441 - Flavoprotein (Pfam link)

    Interpro entry IPR003382 : Flavoprotein (Interpro link)

    Pfam description:
    This family contains diverse flavoprotein enzymes. It includes the epidermin biosynthesis protein EpiD (Swiss:P30197), which has been shown to be a flavoprotein that binds FMN. This enzyme catalyses the removal of two reducing equivalents from the cysteine residue of the C-terminal meso-lanthionine of epidermin to form a C=C double bond. This family also includes the B chain of dipicolinate synthase; dipicolinate is a small polar molecule that accumulates to high concentrations in bacterial endospores and is thought to play a role in spore heat resistance, or in the maintenance of heat resistance. Dipicolinate synthase catalyses the formation of dipicolinic acid from dihydroxydipicolinic acid. This family also includes phenylacrylic acid decarboxylase Swiss:P33751 (EC:4.1.1.-).

    Interpro description:
    This entry contains a diverse range of flavoprotein enzymes, including the epidermin biosynthesis protein EpiD, which has been shown to be a flavoprotein that binds FMN. This enzyme catalyzes the removal of two reducing equivalents from the cysteine residue of the C-terminal meso-lanthionine of epidermin to form a C=C double bond. This family also includes the B chain of dipicolinate synthase; dipicolinate is a small polar molecule that accumulates to high concentrations in bacterial endospores and is thought to play a role in spore heat resistance, or in the maintenance of heat resistance. Dipicolinate synthase catalyses the formation of dipicolinic acid from dihydroxydipicolinic acid. This family also includes phenylacrylic acid decarboxylase.

    Proteins where this domain is known:
    PY00571   


    PF02450 - LACT (Pfam link)

    Interpro entry IPR003386 : Lecithin:cholesterol acyltransferase (Interpro link)

    Pfam description:
    Lecithin:cholesterol acyltransferase (LACT) is involved in extracellular metabolism of plasma lipoproteins, including cholesterol.

    Interpro description:
    Lecithin:cholesterol acyltransferase (LACT), also known as phosphatidylcholine-sterol acyltransferase, is involved in extracellular metabolism of plasma lipoproteins, including cholesterol. It esterifies the free cholesterol transported in plasma lipoproteins, and is activated by apolipoprotein A-I. Defects in LACT cause Norum disease and fish-eye disease.

    Proteins where this domain is known:
    PY04170   


    PF02460 - Patched (Pfam link)

    Interpro entry IPR003392 : Patched (Interpro link)

    Pfam description:
    The transmembrane protein Patched (Swiss:P18502) is a receptor for the morphogen Sonic Hedgehog. This protein associates with the smoothened protein to transduce hedgehog signals.

    Interpro description:
    The transmembrane protein patched is a receptor for the morphogen Sonic Hedgehog. In Drosophila melanogaster, this protein associates with the smoothened protein to transduce hedgehog signals, leading to the activation of wingless, decapentaplegic and patched itself. It participates in cell interactions that establish pattern within the segment and imaginal disks during development. The mouse homolog may play a role in epidermal development. The human Niemann-Pick C1 protein, defects in which cause Niemann-Pick type II disease, is also a member of this family. This protein is involved in the intracellular trafficking of cholesterol, and may play a role in vesicular trafficking in glia, a process that may be crucial for maintaining the structural and functional integrity of nerve terminals.

    Proteins where this domain is known:
    PY03953   


    PF02463 - SMC_N (Pfam link)

    Interpro entry IPR003395 : RecF/RecN/SMC protein, N-terminal (Interpro link)

    Pfam description:
    This domain is found at the N terminus of SMC proteins. The SMC (structural maintenance of chromosomes) superfamily proteins have ATP-binding domains at the N- and C-termini, and two extended coiled-coil domains separated by a hinge in the middle. The eukaryotic SMC proteins form two kinds of heterodimer: the SMC1/SMC3 and the SMC2/SMC4 types. These heterodimers constitute an essential part of higher-order complexes, which are involved in chromatin and DNA dynamics. This family also includes the RecF and RecN proteins that are involved in DNA metabolism and recombination.

    Interpro description:

    The SMC (structural maintenance of chromosomes) family of proteins exists in virtually all organisms including both bacteria and archaea. The SMC proteins are essential for successful chromosome transmission during replication and segregation of the genome in all organisms and form three types of heterodimer (SMC1/SMC3, SMC2/SMC4, SMC5/SMC6), which are core components of large multiprotein complexes. The best known complexes are cohesin, which is responsible for sister-chromatid cohesion, and condensin, which is required for full chromosome condensation in mitosis.

    SMCs are generally present as single proteins in bacteria, and as at least six distinct proteins in eukaryotes. The proteins range in size from approximately 110 to 170 kDa, and share a five-domain structure, with globular N- and C-terminal domains separated by a long (circa 100 nm or 900 residues) coiled-coil segment, in the centre of which is a globular 'hinge' domain characterised by a set of four highly conserved glycine residues that are typical of flexible regions in a protein. The amino-terminal domain contains a 'Walker A' nucleotide-binding motif (GxxGxGKS/T), which mutational studies have shown to be essential in several proteins. The carboxy-terminal domain contains a sequence (the DA-box) that resembles a 'Walker B' motif (XXXXD, where X is any hydrophobic residue), and an LSGG motif with homology to the signature sequence of the ATP-binding cassette (ABC) family of ATPases.
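    The three sequence motifs named above can be searched for with regular expressions. The Python sketch below is a hypothetical illustration, not a validated SMC detector: it transcribes GxxGxGKS/T, XXXXD and LSGG into patterns, approximates "hydrophobic" with the residue set AVLIMFWC (an assumption), and uses lookahead patterns so that overlapping hits are all reported. The test sequence is invented.

```python
from __future__ import annotations

import re

# Assumption: "hydrophobic" is approximated by this residue set; curated
# signatures may define it differently.
HYDROPHOBIC = "AVLIMFWC"

# Zero-width lookahead patterns, so overlapping occurrences are all found.
MOTIFS = {
    "walker_a": re.compile(r"(?=G..G.GK[ST])"),                   # GxxGxGKS/T
    "walker_b_da_box": re.compile(rf"(?=[{HYDROPHOBIC}]{{4}}D)"),  # XXXXD, X hydrophobic
    "abc_signature": re.compile(r"(?=LSGG)"),                      # LSGG
}


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif in a protein sequence."""
    seq = seq.upper()
    return {name: [m.start() for m in pat.finditer(seq)] for name, pat in MOTIFS.items()}


# Invented toy sequence, not a real Plasmodium protein.
toy = "AAGKNGSGKSAAALSGGAAVLIVDAA"
print(find_motifs(toy))
# {'walker_a': [2], 'walker_b_da_box': [19], 'abc_signature': [13]}
```

    Short patterns like these also match by chance in unrelated proteins, so a hit is at most a hint, not a domain assignment.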

    All SMC proteins appear to form dimers, either homodimers, as in the case of prokaryotic SMC proteins, or heterodimers between different but related SMC proteins. The dimers are arranged in an antiparallel alignment. This orientation brings the N- and C-terminal globular domains (from either different or identical protomers) together, which unites an ATP-binding site (Walker A motif) within the N-terminal domain with a Walker B motif (DA box) within the C-terminal domain, to form a potentially functional ATPase. Protein interaction and microscopy data suggest that SMC dimers form a ring-like structure which might embrace DNA molecules. Non-SMC subunits associate with the SMC amino- and carboxy-terminal domains. The sequence homology within the carboxy-terminal domain is relatively high within the SMC1-SMC4 group, whereas SMC5 and SMC6 show some divergence in both of these sequences.

    SMCs share not only sequence similarity but also structural similarity with ABC proteins. SMC proteins function together with other proteins in a range of chromosomal transactions, including chromosome condensation, sister-chromatid cohesion, recombination, DNA repair and epigenetic silencing of gene expression.

    This domain is found at the N terminus of SMC proteins.

    Proteins where this domain is known:
    PY00547    PY00549    PY00626    PY00653    PY01780    PY03258    PY06472   


    PF02466 - Tim17 (Pfam link)

    Interpro entry IPR003397 : Mitochondrial import inner membrane translocase, subunit Tim17/22 (Interpro link)

    Pfam description:
    The pre-protein translocase of the mitochondrial outer membrane (Tom) allows the import of pre-proteins from the cytoplasm. Tom forms a complex with a number of proteins, including Tim17. Tim17 and Tim23 are thought to form the translocation channel of the inner membrane. This family includes Tim17, Tim22 and Tim23.

    Interpro description:

    The membrane-embedded multi-protein complexes of mitochondria mediate the transport of nuclear-encoded proteins across and into the outer or inner mitochondrial membranes. The TOM (translocase of the outer mitochondrial membrane) complex consists of cytosol-exposed receptors and a pore-forming core, and mediates the transport of proteins from the cytosol across and into the outer mitochondrial membrane. A novel protein complex in the outer membrane of mitochondria, called the SAM complex (sorting and assembly machinery), is involved in the biogenesis of beta-barrel proteins of the outer membrane. Two translocases of the inner mitochondrial membrane (TIM complexes) mediate protein transport at the inner membrane.

    The TIM23 complex (a presequence translocase) mediates the transport of presequence-containing proteins across and into the inner membrane. TIM17 forms part of this complex, although its role is not yet fully understood. The TIM22 complex (a twin-pore carrier translocase) catalyses the insertion into the inner membrane of multi-spanning proteins that have internal targeting signals, using the membrane potential (Δψ) as an external driving force. The Tim22 subunit of the mitochondrial import inner membrane translocase is included in this family.

    Proteins where this domain is known:
    PY02558    PY04639   


    PF02475 - Met_10 (Pfam link)

    Interpro entry IPR003402 : (Interpro link)

    Pfam description:
    The methionine-10 mutant allele of N. crassa codes for a protein of unknown function, Swiss:O27901. However, homologous proteins have been found in yeast (Swiss:P38793) suggesting this protein may be involved in methionine biosynthesis, transport and/or utilisation.

    Interpro description:
    The methionine-10 mutant allele of Neurospora crassa codes for a protein of unknown function. However, homologous proteins have been found in yeast, suggesting this protein may be involved in methionine biosynthesis, transport and/or utilization.

    Proteins where this domain is known:
    PY04160    PY05008   


    PF02492 - cobW (Pfam link)

    Interpro entry IPR003495 : (Interpro link)

    Pfam description:
    This domain is found in HypB, a hydrogenase expression/formation protein, and UreG, a urease accessory protein. Both these proteins contain a P-loop nucleotide-binding motif. HypB has GTPase activity and is a guanine nucleotide-binding protein. It is not known whether UreG binds GTP or some other nucleotide. Both enzymes are involved in nickel binding. HypB can store nickel and is required for nickel-dependent hydrogenase expression. UreG is required for functional incorporation of the urease nickel metallocenter. GTP hydrolysis may be required by these proteins for nickel incorporation into other nickel proteins. This family of domains also contains P47K (Swiss:P31521), a Pseudomonas chlororaphis protein needed for nitrile hydratase expression, and the cobW gene product (Swiss:P29937), which may be involved in cobalamin biosynthesis in Pseudomonas denitrificans.

    Interpro description:

    Cobalamin (vitamin B12) is a structurally complex cofactor, consisting of a modified tetrapyrrole with a centrally chelated cobalt. Cobalamin is usually found in one of two biologically active forms: methylcobalamin and adocobalamin. Most prokaryotes, as well as animals, have cobalamin-dependent enzymes, whereas plants and fungi do not appear to use it. In bacteria and archaea, these include methionine synthase, ribonucleotide reductase, glutamate and methylmalonyl-CoA mutases, ethanolamine ammonia lyase, and diol dehydratase. In mammals, cobalamin is obtained through the diet, and is required for methionine synthase and methylmalonyl-CoA mutase.

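    There are at least two distinct cobalamin biosynthetic pathways in bacteria: an aerobic pathway, in which cobalt is inserted late (as in Pseudomonas denitrificans), and an anaerobic pathway, in which cobalt insertion is an early step (as in Salmonella typhimurium).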

    Either pathway can be divided into two parts: (1) corrin ring synthesis (differs in aerobic and anaerobic pathways) and (2) adenosylation of corrin ring, attachment of aminopropanol arm, and assembly of the nucleotide loop (common to both pathways). There are about 30 enzymes involved in either pathway, where those involved in the aerobic pathway are prefixed Cob and those of the anaerobic pathway Cbi. Several of these enzymes are pathway-specific: CbiD, CbiG, and CbiK are specific to the anaerobic route of S. typhimurium, whereas CobE, CobF, CobG, CobN, CobS, CobT, and CobW are unique to the aerobic pathway of P. denitrificans.

    CobW proteins are generally found proximal to the trimeric cobaltochelatase subunit CobN, which is essential for vitamin B12 (cobalamin) biosynthesis. They contain a P-loop nucleotide-binding motif in the N-terminal domain and a histidine-rich region in the C-terminal portion, suggesting a role in metal binding, possibly as an intermediary between the cobalt transport and chelation systems. CobW might be involved in cobalt reduction leading to cobalt(I) corrinoids.

    This entry represents CobW-like proteins, including P47K, a Pseudomonas chlororaphis protein needed for nitrile hydratase expression, and urease accessory protein UreG, which acts as a chaperone in the activation of urease upon insertion of nickel into the active site.

    Proteins where this domain is known:
    PY03672   


    PF02493 - MORN (Pfam link)

    Interpro entry IPR003409 : (Interpro link)

    Pfam description:
    The MORN (Membrane Occupation and Recognition Nexus) repeat is found in multiple copies in several proteins including junctophilins (see Takeshima et al. Mol. Cell 2000;6:11-22). A MORN-repeat protein has been identified in the parasite Toxoplasma gondii as a dynamic component of the cell division apparatus. It has been hypothesised to function as a linker protein between certain membrane regions and the parasite's cytoskeleton.

    Interpro description:
    The MORN (Membrane Occupation and Recognition Nexus) motif is found in multiple copies in several proteins including junctophilins. The function of this motif is unknown.

    Proteins where this domain is known:
    PY00208    PY00425    PY02503    PY02591    PY02790    PY04141    PY04894    PY05981    PY06184   

    Proteins where this domain has been detected by our approach:
    PY00472    PY00762    PY04013   


    PF02516 - STT3 (Pfam link)

    Interpro entry IPR003674 : Oligosaccharyl transferase, STT3 subunit (Interpro link)

    Pfam description:
    This family consists of the oligosaccharyl transferase STT3 subunit and related proteins. The STT3 subunit is part of the oligosaccharyl transferase (OTase) complex of proteins and is required for its activity. OTase transfers a lipid-linked core-oligosaccharide to selected asparagine residues in the ER.

    Interpro description:

    N-linked glycosylation is a ubiquitous protein modification, and is essential for viability in eukaryotic cells. A lipid-linked core-oligosaccharide is assembled at the membrane of the endoplasmic reticulum and transferred to selected asparagine residues of nascent polypeptide chains by the oligosaccharyl transferase (OTase) complex.

    This family consists of the oligosaccharyl transferase STT3 subunit and related proteins. The STT3 subunit is part of the oligosaccharyl transferase (OTase) complex of proteins and is required for its activity.

    Proteins where this domain is known:
    PY02151   


    PF02518 - HATPase_c (Pfam link)

    Interpro entry IPR003594 : ATP-binding region, ATPase-like (Interpro link)

    Pfam description:
    This family represents the structurally related ATPase domains of histidine kinase, DNA gyrase B and HSP90.

    Interpro description:

    This domain is found in several ATP-binding proteins for example: histidine kinase, DNA gyrase B, topoisomerases, heat shock protein HSP90, phytochrome-like ATPases and DNA mismatch repair proteins.

    Proteins where this domain is known:
    PY00131    PY00582    PY01412    PY01438    PY01906    PY02844    PY03394    PY04024    PY05214    PY05217   


    PF02524 - KID (Pfam link)

    Interpro entry IPR003900 : (Interpro link)

    Pfam description:
    This family contains the KID repeat as found in Borrelia spirochete RepA / Rep+ proteins. The function of these proteins is unknown. RepA and related Borrelia proteins have been suggested to play an important genus-wide role in the biology of the Borrelia.

    Interpro description:
    This group of proteins contains the KID repeat as found in Borrelia spirochete RepA / Rep+ proteins. The function of these proteins is unknown. RepA and related Borrelia proteins have been suggested to play an important genus-wide role in the biology of the Borrelia.

    Proteins where this domain is known:
    PY04214    PY04930   


    PF02535 - Zip (Pfam link)

    Interpro entry IPR003689 : Zinc/iron permease (Interpro link)

    Pfam description:
    The ZIP family consists of zinc transport proteins and many putative metal transporters. The main contribution to this family is from the Arabidopsis thaliana ZIP protein family; these proteins are responsible for zinc uptake in the plant. Also found within this family are C. elegans proteins of unknown function which are annotated as being similar to the human growth arrest inducible gene product, although that protein is not found within this family.

    Interpro description:
    These ZIP zinc transporter proteins define a family of metal ion transporters that are found in plants, protozoa, fungi, invertebrates, and vertebrates, making it now possible to address questions of metal ion accumulation and homeostasis in diverse organisms.

    Proteins where this domain is known:
    PY01325   


    PF02536 - mTERF (Pfam link)

    Interpro entry IPR003690 : (Interpro link)

    Pfam description:
    This family contains one sequence of known function, human mitochondrial transcription termination factor (mTERF); the rest of the family consists of hypothetical proteins, none of which have any functional information. mTERF is a multizipper protein possessing three putative leucine zippers, one of which is bipartite. The protein binds DNA as a monomer. The leucine zippers are not implicated in a dimerisation role as in other leucine zippers.

    Interpro description:

    This family currently contains one sequence of known function, human mitochondrial transcription termination factor (mTERF), a multizipper protein that nevertheless binds to DNA as a monomer, with evidence pointing to intramolecular leucine zipper interactions. The precursor contains a mitochondrial targeting sequence, and the mature mTERF exhibits three leucine zippers, one of which is bipartite, and two widely spaced basic domains. Both basic domains and the three leucine zipper motifs are necessary for DNA binding. The leucine zippers are not implicated in a dimerisation role as in other leucine zippers.

    The rest of the family consists of hypothetical proteins none of which have any functional information.

    Proteins where this domain is known:
    PY02093   


    PF02540 - NAD_synthase (Pfam link)

    Interpro entry IPR003694 : NAD+ synthase (Interpro link)

    Pfam description:
    NAD synthase (EC:6.3.5.1) is involved in the de novo synthesis of NAD and is induced by stress factors such as heat shock and glucose limitation.

    Interpro description:
    NAD+ synthase catalyzes the last step in the biosynthesis of nicotinamide adenine dinucleotide and is induced by stress factors such as heat shock and glucose limitation. The three-dimensional structure of NH3-dependent NAD+ synthetase from Bacillus subtilis, in its free form and in complex with ATP, shows that the enzyme consists of a tight homodimer with alpha/beta subunit topology.

    Proteins where this domain is known:
    PY00252   


    PF02542 - YgbB (Pfam link)

    Interpro entry IPR003526 : 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase, core (Interpro link)

    Pfam description:
    The ygbB protein is a putative enzyme of deoxy-xylulose pathway (terpenoid biosynthesis).

    Interpro description:

    This entry represents MECDP (2-C-methyl-D-erythritol 2,4-cyclodiphosphate) synthetase, an enzyme in the non-mevalonate pathway of isoprenoid synthesis, isoprenoids being essential in all organisms. Isoprenoids can also be synthesized through the mevalonate pathway. The non-mevalonate route is used by many bacteria and human pathogens, including Mycobacterium tuberculosis and Plasmodium falciparum. This route appears to involve seven enzymes. MECDP synthetase catalyses the intramolecular attack by a phosphate group on a diphosphate, with cytidine monophosphate (CMP) acting as the leaving group to give the cyclic diphosphate product MECDP. The enzyme is a trimer with three active sites shared between adjacent copies of the protein. The enzyme also has two metal-binding sites, the metals playing key roles in catalysis.

    A number of proteins from eukaryotes and prokaryotes share this common N-terminal signature and appear to be involved in terpenoid biosynthesis. The ygbB protein is a putative enzyme of this type.

    Proteins where this domain is known:
    PY00321   


    PF02544 - Steroid_dh (Pfam link)

    Interpro entry IPR001104 : 3-oxo-5-alpha-steroid 4-dehydrogenase, C-terminal (Interpro link)

    Pfam description:
    This family consists of 3-oxo-5-alpha-steroid 4-dehydrogenases, EC:1.3.99.5. Also known as steroid 5-alpha-reductase, this enzyme catalyses the reaction: 3-oxo-5-alpha-steroid + acceptor <=> 3-oxo-delta(4)-steroid + reduced acceptor. The steroid 5-alpha-reductase enzyme is responsible for the formation of dihydrotestosterone; this hormone promotes the differentiation of male external genitalia and the prostate during fetal development. In humans, mutations in this enzyme can cause a form of male pseudohermaphroditism in which the external genitalia and prostate fail to develop normally. A related enzyme is also found in plants: Swiss:Q38944 (DET2), a steroid reductase from Arabidopsis. Mutations in this enzyme cause defects in light-regulated development.

    Interpro description:

    Synonym(s): Steroid 5-alpha-reductase

    3-oxo-5-alpha-steroid 4-dehydrogenases catalyse the conversion of 3-oxo-5-alpha-steroid + acceptor to 3-oxo-delta(4)-steroid + reduced acceptor. The steroid 5-alpha-reductase enzyme is responsible for the formation of dihydrotestosterone; this hormone promotes the differentiation of male external genitalia and the prostate during foetal development. In humans, mutations in this enzyme can cause a form of male pseudohermaphroditism in which the external genitalia and prostate fail to develop normally. A related steroid reductase enzyme, DET2, is found in plants such as Arabidopsis. Mutations in this enzyme cause defects in light-regulated development. This domain is present in both type 1 and type 2 forms.

    Proteins where this domain is known:
    PY01579   


    PF02545 - Maf (Pfam link)

    Interpro entry IPR003697 : Maf-like protein (Interpro link)

    Pfam description:
    Maf is a putative inhibitor of septum formation in eukaryotes, bacteria, and archaea.

    Interpro description:

    Maf is a putative inhibitor of septum formation in eukaryotes, bacteria, and archaea. The Maf protein shares substantial amino acid sequence identity with the Escherichia coli OrfE protein.

    Proteins where this domain is known:
    PY02834    PY05017    PY07153   


    PF02580 - Tyr_Deacylase (Pfam link)

    Interpro entry IPR003732 : D-tyrosyl-tRNA(Tyr) deacylase (Interpro link)

    Pfam description:
    This family comprises several D-Tyr-tRNA(Tyr) deacylase proteins. Cell growth inhibition by several D-amino acids can be explained by an in vivo production of D-aminoacyl-tRNA molecules. Escherichia coli and yeast cells express an enzyme, D-Tyr-tRNA(Tyr) deacylase, capable of recycling such D-aminoacyl-tRNA molecules into free tRNA and D-amino acid. Accordingly, upon inactivation of the genes encoding these deacylases, the toxicity of D-amino acids increases. Orthologues of the deacylase are found in many cells.

    Interpro description:

    This homodimeric enzyme appears able to cleave any D-amino acid (and glycine, which does not have distinct D/L forms) from charged tRNA. The name reflects characterization with respect to D-Tyr on tRNA(Tyr) as established in the literature, but substrate specificity seems much broader.

    Proteins where this domain is known:
    PY00917   


    PF02582 - DUF155 (Pfam link)

    Interpro entry IPR003734 : (Interpro link)

    Interpro description:

    This entry describes proteins of unknown function.

    Proteins where this domain is known:
    PY02494    PY02602    PY03075   


    PF02585 - PIG-L (Pfam link)

    Interpro entry IPR003737 : (Interpro link)

    Pfam description:
    Members of this family are related to PIG-L, an N-acetylglucosaminylphosphatidylinositol de-N-acetylase (EC:3.5.1.89) that catalyses the second step in GPI biosynthesis.

    Interpro description:

    A number of the members of this family have been characterised as probable N-acetylglucosaminyl-phosphatidylinositol de-N-acetylases, which catalyse the second step in glycosylphosphatidylinositol (GPI) biosynthesis.

    Proteins where this domain is known:
    PY06379   


    PF02598 - DUF171 (Pfam link)

    Interpro entry IPR003750 : (Interpro link)

    Interpro description:

    This entry describes proteins of unknown function.

    Proteins where this domain is known:
    PY04902   


    PF02617 - ClpS (Pfam link)

    Interpro entry IPR003769 : Adaptor protein ClpS, core (Interpro link)

    Pfam description:
    In the bacterial cytosol, ATP-dependent protein degradation is performed by several different chaperone-protease pairs, including ClpAP. ClpS directly influences the ClpAP machine by binding to the N-terminal domain of the chaperone ClpA. The degradation of ClpAP substrates, both SsrA-tagged proteins and ClpA itself, is specifically inhibited by ClpS. ClpS modifies ClpA substrate specificity, potentially redirecting degradation by ClpAP toward aggregated proteins.

    Interpro description:

    In the bacterial cytosol, ATP-dependent protein degradation is performed by several different chaperone-protease pairs, including ClpAP. ClpS directly influences the ClpAP machine by binding to the N-terminal domain of the chaperone ClpA. The degradation of ClpAP substrates, both SsrA-tagged proteins and ClpA itself, is specifically inhibited by ClpS. ClpS modifies ClpA substrate specificity, potentially redirecting degradation by ClpAP toward aggregated proteins.

    ClpS is a small alpha/beta protein that consists of three alpha-helices connected to three antiparallel beta-strands. The protein has a globular shape, with a curved layer of three antiparallel alpha-helices over a twisted antiparallel beta-sheet. Dimerization of ClpS may occur through its N-terminal domain. This short extended N-terminal region in ClpS is followed by the central seven-residue beta-strand, which is flanked by two other beta-strands in a small beta-sheet.

    Proteins where this domain is known:
    PY03872   


    PF02628 - COX15-CtaA (Pfam link)

    Interpro entry IPR003780 : Cytochrome oxidase assembly (Interpro link)

    Pfam description:
    This is a family of integral membrane proteins. CtaA is required for cytochrome aa3 oxidase assembly in Bacillus subtilis. COX15 is required for cytochrome c oxidase assembly in yeast (Swiss:P40086).

    Interpro description:
    Cytochrome aa3 is one of two terminal oxidase complexes in the Bacillus subtilis electron transport chain. CtaA is required for cytochrome aa3 biosynthesis and sporulation in B. subtilis. In yeast the COX15 protein is required for cytochrome c oxidase assembly.

    Proteins where this domain is known:
    PY02555   


    PF02629 - CoA_binding (Pfam link)

    Interpro entry IPR003781 : (Interpro link)

    Pfam description:
    This domain has a Rossmann fold and is found in a number of proteins including succinyl CoA synthetases, malate and ATP-citrate ligases.

    Interpro description:
    This domain has a Rossmann fold and is found in a number of proteins including succinyl CoA synthetases, malate and ATP-citrate ligases.

    Proteins where this domain is known:
    PY05049    PY05175   


    PF02630 - SCO1-SenC (Pfam link)

    Interpro entry IPR003782 : (Interpro link)

    Pfam description:
    This family is involved in biogenesis of respiratory and photosynthetic systems. SCO1 (Swiss:P23833) is required for a post-translational step in the accumulation of subunits COXI and COXII of cytochrome c oxidase. SenC (Swiss:Q52720) is required for optimal cytochrome c oxidase activity and maximal induction of genes encoding the light-harvesting and reaction centre complexes of R. capsulatus.

    Interpro description:

    This family is involved in biogenesis of respiratory and photosynthetic systems. In yeast the SCO1 protein is specifically required for a post-translational step in the accumulation of subunits 1 and 2 of cytochrome c oxidase (COXI and COXII). It is a mitochondrion-associated cytochrome c oxidase assembly factor.

    The purple nonsulphur photosynthetic eubacterium Rhodobacter capsulatus is a versatile organism that can obtain cellular energy by several means, including the capture of light energy for photosynthesis as well as the use of light-independent respiration, in which molecular oxygen serves as a terminal electron acceptor. The SenC protein is required for optimal cytochrome c oxidase activity in aerobically grown R. capsulatus cells and is involved in the induction of structural polypeptides of the light-harvesting and reaction centre complexes.

    Proteins where this domain is known:
    PY05062   


    PF02637 - GatB_Yqey (Pfam link)

    Interpro entry IPR018027 : (Interpro link)

    Pfam description:
    This domain, about 140 amino acid residues long, is found at the C terminus of GatB (Swiss:O30509), which transamidates Glu-tRNA to Gln-tRNA.

    Interpro description:

    The GatB domain, the function of which is uncertain, is associated with aspartyl/glutamyl amidotransferase subunit B and glutamyl amidotransferase subunit E. These are involved in the formation of correctly charged Asn-tRNA(Asn) or Gln-tRNA(Gln) through the transamidation of misacylated Asp-tRNA(Asn) or Glu-tRNA(Gln) in organisms which lack either or both of asparaginyl-tRNA or glutaminyl-tRNA synthetases. The reaction takes place in the presence of glutamine and ATP through an activated phospho-Asp-tRNA(Asn) or phospho-Glu-tRNA(Gln).

    Proteins where this domain has been detected by our approach:
    PY03547   


    PF02656 - DUF202 (Pfam link)

    Interpro entry IPR003807 : (Interpro link)

    Pfam description:
    This family consists of hypothetical proteins some of which are putative membrane proteins. No functional information or experimental verification of function is known. This domain is around 100 amino acids long.

    Interpro description:

    This entry describes proteins of unknown function.

    Proteins where this domain is known:
    PY03770   


    PF02657 - SufE (Pfam link)

    Interpro entry IPR003808 : (Interpro link)

    Pfam description:
    This family consists of the SufE-related proteins. These have been implicated in Fe-S metabolism and export.

    Interpro description:

    This family consists of the SufE-related proteins. These have been implicated in Fe-S metabolism and export.

    Proteins where this domain is known:
    PY05037   


    PF02666 - PS_Dcarbxylase (Pfam link)

    Interpro entry IPR003817 : Phosphatidylserine decarboxylase-related (Interpro link)

    Pfam description:
    This is a family of phosphatidylserine decarboxylases, EC:4.1.1.65. These enzymes catalyse the reaction: Phosphatidyl-L-serine <=> phosphatidylethanolamine + CO2. Phosphatidylserine decarboxylase plays a central role in the biosynthesis of aminophospholipids by converting phosphatidylserine to phosphatidylethanolamine.

    Interpro description:
    Phosphatidylserine decarboxylase plays a pivotal role in the synthesis of phospholipid by the mitochondria. The substrate phosphatidylserine is synthesized extramitochondrially and must be translocated to the mitochondria prior to decarboxylation. Phosphatidylserine decarboxylase is responsible for the conversion of phosphatidylserine to phosphatidylethanolamine and plays a central role in the biosynthesis of aminophospholipids.

    Proteins where this domain is known:
    PY05588   


    PF02670 - DXP_reductoisom (Pfam link)

    Interpro entry IPR013512 : 1-deoxy-D-xylulose 5-phosphate reductoisomerase, N-terminal (Interpro link)

    Interpro description:
    1-deoxy-D-xylulose 5-phosphate reductoisomerase synthesises 2-C-methyl-D-erythritol 4-phosphate from 1-deoxy-D-xylulose 5-phosphate in a single step by intramolecular rearrangement and reduction and is responsible for terpenoid biosynthesis in some organisms. In Arabidopsis thaliana 1-deoxy-D-xylulose 5-phosphate reductoisomerase is the first committed enzyme of the non-mevalonate pathway for isoprenoid biosynthesis. The enzyme requires Mn2+, Co2+ or Mg2+ for activity, with the first being most effective. This domain is found at the N terminus of bacterial and plant 1-deoxy-D-xylulose 5-phosphate reductoisomerases.

    Proteins where this domain is known:
    PY05578   


    PF02714 - DUF221 (Pfam link)

    Interpro entry IPR003864 : Protein of unknown function DUF221 (Interpro link)

    Pfam description:
    This family consists of hypothetical transmembrane proteins, none of which has a known function. The aligned region is at most 538 residues long.

    Interpro description:
    This domain is found in a family of hypothetical transmembrane proteins, none of which has a known function. The aligned region is at most 538 residues long.

    Proteins where this domain is known:
    PY06730   


    PF02729 - OTCace_N (Pfam link)

    Interpro entry IPR006132 : Aspartate/ornithine carbamoyltransferase, carbamoyl-P binding (Interpro link)

    Interpro description:

    This entry contains two related enzymes:

    1. Aspartate carbamoyltransferase (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides. In prokaryotes ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes it is a domain in a multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals) that also catalyzes other steps of the biosynthesis of pyrimidines.
    2. Ornithine carbamoyltransferase (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate to citrulline. In mammals this enzyme participates in the urea cycle and is located in the mitochondrial matrix. In prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it is also involved in the degradation of arginine (the arginine deaminase pathway).
    It has been shown that these two enzymes are evolutionarily related. The predicted secondary structures of both enzymes are similar, and there are some regions of sequence similarity. One of these regions includes three residues which have been shown, by crystallographic studies, to be implicated in binding the phosphoryl group of carbamoyl phosphate and may also play a role in trimerization of the molecules. The carboxyl-terminal, aspartate/ornithine-binding domain is described in a separate entry.

    Proteins where this domain is known:
    PY06210   


    PF02731 - SKIP_SNW (Pfam link)

    Interpro entry IPR004015 : SKI-interacting protein SKIP, SNW domain (Interpro link)

    Pfam description:
    This domain is found in chromatin proteins.

    Interpro description:

    SKIP (SKI-interacting protein) is an essential spliceosomal component and transcriptional coregulator, which may provide regulatory coupling of transcription initiation and splicing. SKIP was identified in a yeast 2-hybrid screen, where it was shown to interact with both the cellular and viral forms of SKI through the highly conserved region on SKIP known as the SNW domain. SKIP is now known to interact with a number of other proteins as well. SKIP potentiates the activity of important transcription factors, such as vitamin D receptor, CBF1 (RBP-Jkappa), Smad2/3, and MyoD. It works with Ski in overcoming pRb-mediated cell cycle arrest, and it is targeted by the viral transactivators EBNA2 and E7.

    This entry represents the SNW domain.

    Proteins where this domain is known:
    PY04746   


    PF02732 - ERCC4 (Pfam link)

    Interpro entry IPR006166 : DNA repair nuclease, XPF-type/Helicase (Interpro link)

    Pfam description:
    This domain defines a family of nucleases. The family includes EME1, which is an essential component of a Holliday junction resolvase. EME1 interacts with MUS81 to form a DNA structure-specific endonuclease.

    Interpro description:

    This entry represents a structural motif found in several DNA repair nucleases, such as Rad1/Mus81/XPF endonucleases, and in ATP-dependent helicases. The XPF/Rad1/Mus81-dependent nuclease family specifically cleaves branched structures generated during DNA repair, replication, and recombination, and is essential for maintaining genome stability. The nuclease domain architecture exhibits remarkable similarity to those of restriction endonucleases.

    Proteins where this domain is known:
    PY03254    PY03745    PY04288   


    PF02739 - 5_3_exonuc_N (Pfam link)

    Interpro entry IPR002421 : 5'-3' exonuclease (Interpro link)

    Interpro description:

    The N-terminal and internal 5'-3' exonuclease domains are commonly found together and are most often associated with 5' to 3' nuclease activities. The XPG protein signatures are never found outside the '53EXO' domains. The latter are found in more diverse proteins. The number of amino acids that separate the two 53EXO domains, and the presence of accompanying motifs, allow the diagnosis of several protein families.

    In the eubacterial type A DNA polymerases, the N-terminal and internal domains are separated by a few amino acids, usually four. The pattern DNA_POLYMERASE_A is always present towards the C-terminus. Several eukaryotic structure-dependent endonucleases and exonucleases have the 53EXO domains separated by 24 to 27 amino acids, and the XPG protein signatures are always present. In several proteins from Herpesviridae, the two 53EXO domains are separated by 50 to 120 amino acids. These proteins are implicated in inhibiting the expression of host genes. Eukaryotic DNA repair proteins with 600 to 700 amino acids between the 53EXO domains all carry the XPG protein signatures.
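    The spacing rules in this description amount to a small lookup from inter-domain distance to a candidate protein family. The Python sketch below encodes them. The function name, the family labels, and the 1-10 residue window used for "a few amino acids, usually four" are assumptions made for illustration, not part of the Pfam or InterPro entries; a real assignment would also check for the DNA_POLYMERASE_A and XPG signatures.

```python
from typing import Optional

# (min, max, family) rules: number of amino acids between the N-terminal and
# internal 53EXO domains. The text only says "a few amino acids, usually four"
# for eubacterial type A DNA polymerases; the 1-10 window is an ASSUMED cutoff.
SPACING_RULES = [
    (1, 10, "eubacterial type A DNA polymerase"),
    (24, 27, "eukaryotic structure-dependent nuclease"),
    (50, 120, "herpesvirus host-expression inhibitor"),
    (600, 700, "eukaryotic DNA repair protein"),
]

# Families for which the text says the XPG protein signatures are always present.
XPG_EXPECTED = {"eukaryotic structure-dependent nuclease", "eukaryotic DNA repair protein"}


def classify_53exo_spacing(spacing: int) -> Optional[str]:
    """Return the family suggested by the inter-domain spacing, or None if no rule matches."""
    for low, high, family in SPACING_RULES:
        if low <= spacing <= high:
            return family
    return None


# Usage: spacings that fall between the documented ranges (e.g. 40) are unclassified.
for n in (4, 25, 80, 650, 40):
    family = classify_53exo_spacing(n)
    print(n, family, family in XPG_EXPECTED)
```

    Returning None for spacings outside every documented range keeps the sketch from guessing beyond what the description states.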

    Proteins where this domain is known:
    PY01683   


    PF02747 - PCNA_C (Pfam link)

    Interpro entry IPR000730 : Proliferating cell nuclear antigen, PCNA (Interpro link)

    Pfam description:
    N-terminal and C-terminal domains of PCNA are topologically identical. Three PCNA molecules are tightly associated to form a closed ring encircling duplex DNA.

    Interpro description:

    Proliferating cell nuclear antigen (PCNA), or cyclin, is a non-histone acidic nuclear protein that plays a key role in the control of eukaryotic DNA replication. It acts as a cofactor for DNA polymerase delta, which is responsible for leading strand DNA replication. The sequence of PCNA is well conserved between plants and animals, indicating a strong selective pressure for structure conservation, and suggesting that this type of DNA replication mechanism is conserved throughout eukaryotes. In Saccharomyces cerevisiae (baker's yeast), POL30 is associated with polymerase III, the yeast analog of polymerase delta.

    Homologues of PCNA have also been identified in the archaea (Euryarchaeota and Crenarchaeota) and in Paramecium bursaria Chlorella virus 1 (PBCV-1) and in nuclear polyhedrosis viruses.

    Proteins where this domain is known:
    PY01758    PY06718   


    PF02769 - AIRS_C (Pfam link)

    Interpro entry IPR010918 : (Interpro link)

    Pfam description:
    This family includes Hydrogen expression/formation protein HypE Swiss:P24193, AIR synthases Swiss:P08178 EC:6.3.3.1, FGAM synthase Swiss:P35852 EC:6.3.5.3 and selenide, water dikinase Swiss:P16456 EC:2.7.9.3. The function of the C-terminal domain of AIR synthase is unclear, but the cleft formed between N and C domains is postulated as a sulphate binding site.

    Interpro description:

    This entry includes Hydrogen expression/formation protein, HypE, which may be involved in the maturation of NifE hydrogenase; AIR synthase and FGAM synthase, which are involved in de novo purine biosynthesis; and selenide, water dikinase, an enzyme which synthesizes selenophosphate from selenide and ATP.

    Proteins where this domain is known:
    PY05530   


    PF02772 - S-AdoMet_synt_M (Pfam link)

    Interpro entry IPR002133 : S-adenosylmethionine synthetase (Interpro link)

    Pfam description:
    The three domains of S-adenosylmethionine synthetase have the same alpha+beta fold.

    Interpro description:

    S-adenosylmethionine synthetase (MAT) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis.

    In bacteria there is a single isoform of AdoMet synthetase (gene metK); there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family.

    The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. The active sites of both the Escherichia coli and rat liver MAT reside between two subunits, with contributions from side chains of residues from both subunits, resulting in a dimer as the minimal catalytic entity. The side chains that contribute to the ligand binding sites are conserved between the two proteins. In the structures of complexes with the E. coli enzyme, the phosphate groups have the same positions in the (PPi plus Pi) complex and the (ADP plus Pi) complex, and are located at the bottom of a deep cavity with the adenosyl group nearer the entrance.

    Proteins where this domain is known:
    PY06246   


    PF02773 - S-AdoMet_synt_C (Pfam link)

    Interpro entry IPR002133 : S-adenosylmethionine synthetase (Interpro link)

    Pfam description:
    The three domains of S-adenosylmethionine synthetase have the same alpha+beta fold.

    Interpro description:

    S-adenosylmethionine synthetase (MAT) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis.

    In bacteria there is a single isoform of AdoMet synthetase (gene metK); there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family.

    The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. The active sites of both the Escherichia coli and rat liver MAT reside between two subunits, with contributions from side chains of residues from both subunits, resulting in a dimer as the minimal catalytic entity. The side chains that contribute to the ligand binding sites are conserved between the two proteins. In the structures of complexes with the E. coli enzyme, the phosphate groups have the same positions in the (PPi plus Pi) complex and the (ADP plus Pi) complex, and are located at the bottom of a deep cavity with the adenosyl group nearer the entrance.

    Proteins where this domain is known:
    PY06246   


    PF02775 - TPP_enzyme_C (Pfam link)

    Interpro entry IPR011766 : Thiamine pyrophosphate enzyme, C-terminal TPP-binding (Interpro link)

    Interpro description:

    A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown that some of these enzymes are structurally related. This represents the C-terminal TPP binding domain of TPP enzymes.

    Proteins where this domain is known:
    PY05696   


    PF02776 - TPP_enzyme_N (Pfam link)

    Interpro entry IPR012001 : Thiamine pyrophosphate enzyme, N-terminal TPP binding region (Interpro link)

    Interpro description:

    A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown that some of these enzymes are structurally related. This represents the N-terminal TPP binding domain of TPP enzymes.

    Proteins where this domain is known:
    PY05696   


    PF02777 - Sod_Fe_C (Pfam link)

    Interpro entry IPR001189 : Manganese and iron superoxide dismutase (Interpro link)

    Pfam description:
    Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to hydrogen peroxide and molecular oxygen. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one. In humans, there is a cytoplasmic Cu/Zn SOD and a mitochondrial Mn/Fe SOD. The C-terminal domain has a mixed alpha/beta fold.

    Interpro description:

    Superoxide dismutases (SODs) catalyse the conversion of superoxide radicals to hydrogen peroxide and molecular oxygen. Their function is to destroy the radicals that are normally produced within cells and are toxic to biological systems. Three evolutionarily distinct families of SODs are known, of which the Mn/Fe-binding family is one. This family includes both single metal-binding SODs and cambialistic SODs, which can bind either Mn or Fe. Fe/MnSODs are ubiquitous enzymes that are responsible for the majority of SOD activity in prokaryotes, fungi, blue-green algae and mitochondria. Fe/MnSODs are found as homodimers or homotetramers.

    The structure of Fe/MnSODs can be divided into two domains, an alpha N-terminal domain and an alpha/beta C-terminal domain, connected by a loop. The structure of the N-terminal domain consists of two helices in an antiparallel hairpin, with a left-handed twist. The structure of the C-terminal domain is of the alpha/beta type and consists of a three-stranded antiparallel beta-sheet in the order 213, along with four helices in the arrangement alpha/beta(2)/alpha/beta/alpha(2).

    Proteins where this domain is known:
    PY04892    PY05422   


    PF02779 - Transket_pyr (Pfam link)

    Interpro entry IPR005475 : (Interpro link)

    Pfam description:
    This family includes transketolase enzymes, pyruvate dehydrogenases, and branched chain alpha-keto acid decarboxylases.

    Interpro description:

    Transketolase (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences from a variety of eukaryotic and prokaryotic sources show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Pichia angusta (Hansenula polymorpha), there is a highly related enzyme, dihydroxyacetone synthase (DHAS, also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.

    1-deoxyxylulose-5-phosphate synthase (DXP synthase) is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamine pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamine (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK. The N-terminal section contains a histidine residue which appears to function in proton transfer during catalysis. In the central section there are conserved acidic residues that are part of the active cleft and may participate in substrate binding. This family includes transketolase enzymes and also partially matches the 2-oxoisovalerate dehydrogenase beta subunit. Both these enzymes utilise thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.
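    Both the TK and DXP synthase reactions described here conserve atoms, which can be checked with a simple element count. The Python sketch below assumes neutral, fully protonated molecular formulas (charge and protonation state are ignored), parses only formulas without parentheses, and uses hypothetical helper names. It also shows that the ketol unit moved by TK carries two carbons: the donor loses exactly as many carbons as the acceptor gains.

```python
import re
from collections import Counter
from typing import List


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula without parentheses, e.g. 'C5H11O8P'."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(formulas: List[str]) -> Counter:
    """Sum atom counts over one side of a reaction."""
    total: Counter = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total


def is_balanced(lhs: List[str], rhs: List[str]) -> bool:
    """True if both sides contain the same number of atoms of every element."""
    return side_total(lhs) == side_total(rhs)


# Neutral molecular formulas (charge/protonation state ignored by assumption).
XU5P = "C5H11O8P"      # xylulose 5-phosphate (ketol donor)
R5P = "C5H11O8P"       # ribose 5-phosphate (aldose acceptor)
G3P = "C3H7O6P"        # glyceraldehyde 3-phosphate
S7P = "C7H15O10P"      # sedoheptulose 7-phosphate
PYRUVATE = "C3H4O3"    # pyruvic acid
DXP = "C5H11O7P"       # 1-deoxy-D-xylulose 5-phosphate
CO2 = "CO2"

transketolase_ok = is_balanced([XU5P, R5P], [G3P, S7P])
dxp_synthase_ok = is_balanced([PYRUVATE, G3P], [DXP, CO2])

# Carbons removed from the donor equal carbons added to the acceptor: the ketol unit.
ketol_carbons = parse_formula(XU5P)["C"] - parse_formula(G3P)["C"]
assert ketol_carbons == parse_formula(S7P)["C"] - parse_formula(R5P)["C"]

print(transketolase_ok, dxp_synthase_ok, ketol_carbons)  # True True 2
```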

    Proteins where this domain is known:
    PY02421    PY03111    PY03843    PY04970    PY07062   


    PF02780 - Transketolase_C (Pfam link)

    Interpro entry IPR005476 : Transketolase, C-terminal (Interpro link)

    Pfam description:
    The C-terminal domain of transketolase has been proposed as a regulatory molecule binding site.

    Interpro description:

    Transketolase (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences from a variety of eukaryotic and prokaryotic sources show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Pichia angusta (Hansenula polymorpha), there is a highly related enzyme, dihydroxyacetone synthase (DHAS, also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.

    1-deoxyxylulose-5-phosphate synthase (DXP synthase) is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamine pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamine (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK. The N-terminal section contains a histidine residue which appears to function in proton transfer during catalysis. In the central section there are conserved acidic residues that are part of the active cleft and may participate in substrate binding. This family includes transketolase enzymes and also partially matches the 2-oxoisovalerate dehydrogenase beta subunit. Both these enzymes utilise thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.

    Proteins where this domain is known:
    PY03843    PY04970    PY07062   

    Proteins where this domain has been detected by our approach:
    PY03111   


    PF02781 - G6PD_C (Pfam link)

    Interpro entry IPR001282 : Glucose-6-phosphate dehydrogenase (Interpro link)

    Interpro description:

    Glucose-6-phosphate dehydrogenase (G6PDH) is a ubiquitous protein, present in bacteria and all eukaryotic cell types. The enzyme catalyses the first step in the pentose phosphate pathway, i.e. the conversion of glucose-6-phosphate to gluconolactone 6-phosphate in the presence of NADP+, producing NADPH. The ubiquitous expression of the enzyme gives it a major role in the production of NADPH for the many NADPH-mediated reductive processes in all cells. Deficiency of G6PDH is a common genetic abnormality affecting millions of people worldwide. Many sequence variants, most caused by single point mutations, are known, exhibiting a wide variety of phenotypes.

    Proteins where this domain is known:
    PY00793   


    PF02782 - FGGY_C (Pfam link)

    Interpro entry IPR018485 : Carbohydrate kinase, FGGY, C-terminal (Interpro link)

    Pfam description:
    This domain adopts a ribonuclease H-like fold and is structurally related to the N-terminal domain.

    Interpro description:
    It has been shown that several different types of carbohydrate kinases appear to be evolutionarily related. These enzymes include L-fuculokinase (gene fucK); gluconokinase (gene gntK); glycerol kinase (gene glpK); xylulokinase (gene xylB); and L-xylulose kinase (gene lyxK). These proteins are 480 to 520 amino acid residues long.

    This entry represents the C-terminal domain of these proteins. It adopts a ribonuclease H-like fold and is structurally related to the N-terminal domain.

    Proteins where this domain is known:
    PY00935   


    PF02784 - Orn_Arg_deC_N (Pfam link)

    Interpro entry IPR000183 : Orn/DAP/Arg decarboxylase 2 (Interpro link)

    Pfam description:
    These pyridoxal-dependent decarboxylases act on ornithine, lysine, arginine and related substrates. This domain has a TIM barrel fold.

    Interpro description:
    These enzymes are collectively known as group IV decarboxylases. Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into two different families on the basis of sequence similarities. Members of this family, while most probably evolutionarily related, do not share extensive regions of sequence similarity. The proteins contain a conserved lysine residue which is known, in mouse ODC, to be the site of attachment of the pyridoxal-phosphate group. The proteins also contain a stretch of three consecutive glycine residues, which has been proposed to be part of a substrate-binding region.

    Proteins where this domain is known:
    PY04754   


    PF02785 - Biotin_carb_C (Pfam link)

    Interpro entry IPR005482 : Biotin carboxylase, C-terminal (Interpro link)

    Pfam description:
    Biotin carboxylase is a component of the acetyl-CoA carboxylase multi-component enzyme which catalyses the first committed step in fatty acid synthesis in animals, plants and bacteria. Most of the active site residues reported in reference are in this C-terminal domain.

    Interpro description:

    Acetyl-CoA carboxylase is found in all animals, plants, and bacteria and catalyzes the first committed step in fatty acid synthesis. It is a multicomponent enzyme containing a biotin carboxylase activity, a biotin carboxyl carrier protein, and a carboxyltransferase functionality. The "B-domain" extends from the main body of the subunit where it folds into two alpha-helical regions and three strands of beta-sheet. Following the excursion into the B-domain, the polypeptide chain folds back into the body of the protein where it forms an eight-stranded antiparallel beta-sheet. In addition to this major secondary structural element, the C-terminal domain also contains a smaller three-stranded antiparallel beta-sheet and seven alpha-helices.

    Proteins where this domain is known:
    PY01695   


    PF02786 - CPSase_L_D2 (Pfam link)

    Interpro entry IPR005479 : Carbamoyl phosphate synthetase, large subunit, ATP-binding (Interpro link)

    Pfam description:
    Carbamoyl-phosphate synthase catalyses the ATP-dependent synthesis of carbamyl-phosphate from glutamine or ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or pyrimidines. The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesise carbamoyl phosphate. See Pfam:PF00988. The small chain has a GATase domain in the carboxyl terminus. See Pfam:PF00117. The ATP binding domain (this one) has an ATP-grasp fold.

    Interpro description:

    Carbamoyl phosphate synthase (CPSase) is a heterodimeric enzyme composed of a small and a large subunit (with the exception of CPSase III, see below). CPSase catalyses the synthesis of carbamoyl phosphate from bicarbonate, ATP and glutamine or ammonia, and represents the first committed step in pyrimidine and arginine biosynthesis in prokaryotes and eukaryotes, and in the urea cycle in most terrestrial vertebrates. CPSase has three active sites, one in the small subunit and two in the large subunit. The small subunit contains the glutamine binding site and catalyses the hydrolysis of glutamine to glutamate and ammonia. The large subunit has two homologous carboxy phosphate domains, both of which have ATP-binding sites; however, the N-terminal carboxy phosphate domain catalyses the phosphorylation of bicarbonate, while the C-terminal domain catalyses the phosphorylation of the carbamate intermediate. The carboxy phosphate domain found duplicated in the large subunit of CPSase is also present as a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase (ACC), propionyl-CoA carboxylase (PCCase), pyruvate carboxylase (PC) and urea carboxylase.

    Most prokaryotes carry one form of CPSase that participates in both arginine and pyrimidine biosynthesis; however, certain bacteria can have separate forms. The large subunit in bacterial CPSase has four structural domains: the carboxy phosphate domain 1, the oligomerisation domain, the carbamoyl phosphate domain 2 and the allosteric domain. CPSase heterodimers from Escherichia coli contain two molecular tunnels: an ammonia tunnel and a carbamate tunnel. These inter-domain tunnels connect the three distinct active sites, and function as conduits for the transport of unstable reaction intermediates (ammonia and carbamate) between successive active sites. The catalytic mechanism of CPSase involves the diffusion of carbamate through the interior of the enzyme from the site of synthesis within the N-terminal domain of the large subunit to the site of phosphorylation within the C-terminal domain.

    Eukaryotes have two distinct forms of CPSase: a mitochondrial enzyme (CPSase I) that participates in both arginine biosynthesis and the urea cycle; and a cytosolic enzyme (CPSase II) involved in pyrimidine biosynthesis. CPSase II occurs as part of a multi-enzyme complex along with aspartate transcarbamoylase and dihydroorotase; this complex is referred to as the CAD protein. The hepatic expression of CPSase is transcriptionally regulated by glucocorticoids and/or cAMP. There is a third form of the enzyme, CPSase III, found in fish, which uses glutamine as a nitrogen source instead of ammonia. CPSase III is closely related to CPSase I, and is composed of a single polypeptide that may have arisen from gene fusion of the glutaminase and synthetase domains.

    This entry represents the ATP-binding domain found in the large subunit of carbamoyl phosphate synthase, as well as in related proteins.

    Proteins where this domain is known:
    PY01695    PY04781    PY06257   


    PF02787 - CPSase_L_D3 (Pfam link)

    Interpro entry IPR005480 : Carbamoyl phosphate synthetase, large subunit, oligomerisation (Interpro link)

    Pfam description:
    Carbamoyl-phosphate synthase catalyses the ATP-dependent synthesis of carbamyl-phosphate from glutamine or ammonia and bicarbonate. The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain.

    Interpro description:

    Carbamoyl phosphate synthase (CPSase) is a heterodimeric enzyme composed of a small and a large subunit (with the exception of CPSase III, see below). CPSase catalyses the synthesis of carbamoyl phosphate from bicarbonate, ATP and glutamine or ammonia, and represents the first committed step in pyrimidine and arginine biosynthesis in prokaryotes and eukaryotes, and in the urea cycle in most terrestrial vertebrates. CPSase has three active sites, one in the small subunit and two in the large subunit. The small subunit contains the glutamine binding site and catalyses the hydrolysis of glutamine to glutamate and ammonia. The large subunit has two homologous carboxy phosphate domains, both of which have ATP-binding sites; however, the N-terminal carboxy phosphate domain catalyses the phosphorylation of bicarbonate, while the C-terminal domain catalyses the phosphorylation of the carbamate intermediate. The carboxy phosphate domain found duplicated in the large subunit of CPSase is also present as a single copy in the biotin-dependent enzymes acetyl-CoA carboxylase (ACC), propionyl-CoA carboxylase (PCCase), pyruvate carboxylase (PC) and urea carboxylase.

    Most prokaryotes carry one form of CPSase that participates in both arginine and pyrimidine biosynthesis; however, certain bacteria can have separate forms. The large subunit in bacterial CPSase has four structural domains: the carboxy phosphate domain 1, the oligomerisation domain, the carbamoyl phosphate domain 2 and the allosteric domain. CPSase heterodimers from Escherichia coli contain two molecular tunnels: an ammonia tunnel and a carbamate tunnel. These inter-domain tunnels connect the three distinct active sites, and function as conduits for the transport of unstable reaction intermediates (ammonia and carbamate) between successive active sites. The catalytic mechanism of CPSase involves the diffusion of carbamate through the interior of the enzyme from the site of synthesis within the N-terminal domain of the large subunit to the site of phosphorylation within the C-terminal domain.

    Eukaryotes have two distinct forms of CPSase: a mitochondrial enzyme (CPSase I) that participates in both arginine biosynthesis and the urea cycle; and a cytosolic enzyme (CPSase II) involved in pyrimidine biosynthesis. CPSase II occurs as part of a multi-enzyme complex along with aspartate transcarbamoylase and dihydroorotase; this complex is referred to as the CAD protein. The hepatic expression of CPSase is transcriptionally regulated by glucocorticoids and/or cAMP. There is a third form of the enzyme, CPSase III, found in fish, which uses glutamine as a nitrogen source instead of ammonia. CPSase III is closely related to CPSase I, and is composed of a single polypeptide that may have arisen from gene fusion of the glutaminase and synthetase domains.

    This entry represents the oligomerisation domain found in the large subunit of carbamoyl phosphate synthases as well as in certain other carboxy phosphate domain-containing enzymes.

    Proteins where this domain is known:
    PY04781    PY06257   


    PF02789 - Peptidase_M17_N (Pfam link)

    Interpro entry IPR008283 : Peptidase M17, leucyl aminopeptidase, N-terminal (Interpro link)

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
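    The two motif definitions above can be illustrated as sequence patterns. The sketch below is not part of the Pfam, InterPro or MEROPS annotation. It assumes concrete residue sets for the classes 'a', 'b' and 'c': 'a' is restricted to V or T even though the text says only "most often"; 'b' is taken as any residue other than D, E, K, R, H and P; 'c' is taken as A, F, I, L, M, V or W; and Pro is excluded from 'X'. The toy sequence is made up and does not come from a real protein. A regex hit is only a candidate site, not evidence of metalloprotease activity.

```python
import re

# Loose motif: His-Glu-any-any-His. The zero-width lookahead allows overlapping hits.
HEXXH = re.compile(r"(?=HE..H)")

# Assumed residue classes for the stricter 'abXHEbbHbc' form (illustrative only):
A_RES = "[VT]"              # 'a': "most often" V or T in the text; treated as required here
B_RES = "[ACFGILMNQSTVWY]"  # 'b': uncharged; D, E, K, R, H and Pro excluded (assumption)
X_RES = "[^P]"              # 'X': any residue except Pro ("Proline is never found in this site")
C_RES = "[AFILMVW]"         # 'c': hydrophobic (one common, assumed definition)

ABXHEBBHBC = re.compile(f"(?={A_RES}{B_RES}{X_RES}HE{B_RES}{B_RES}H{B_RES}{C_RES})")

# Made-up toy sequence, not a real protein.
TOY = "MKTVVAHELTHAVGGSHEPAHK"


def find_motifs(seq: str, pattern: re.Pattern) -> list[int]:
    """Return the 0-based start position of every (possibly overlapping) match."""
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_motifs(TOY, HEXXH))       # [6, 16]
    print(find_motifs(TOY, ABXHEBBHBC))  # [3]
```

    In the toy sequence, both HELTH (position 6) and HEPAH (position 16) satisfy the loose HEXXH pattern. Only the first occurs in a context matching the stricter 'abXHEbbHbc' definition, which starts three residues earlier at position 3. The second site is rejected because the residue before its first His is G rather than V or T, and because a Pro sits inside the motif.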

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This group of metallopeptidases belong to the MEROPS peptidase family M17 (leucyl aminopeptidase family, clan MF), the type example being leucyl aminopeptidase from Bos taurus (Bovine).

    Aminopeptidases are exopeptidases involved in the processing and regular turnover of intracellular proteins, although their precise role in cellular metabolism is unclear. Leucine aminopeptidases cleave leucine residues from the N-terminus of polypeptide chains, although substantial rates are observed for all amino acids.

    The enzymes exist as homo-hexamers, comprising 2 trimers stacked on top of one another. Each monomer binds 2 zinc ions and folds into 2 alpha/beta-type quasi-spherical globular domains, producing a comma-like shape. The N-terminal 150 residues form a 5-stranded beta-sheet with 4 parallel and 1 anti-parallel strand sandwiched between 4 alpha-helices. An alpha-helix extends into the C-terminal domain, which comprises a central 8-stranded saddle-shaped beta-sheet sandwiched between groups of helices, forming the monomer hydrophobic core. A 3-stranded beta-sheet resides on the surface of the monomer, where it interacts with other members of the hexamer. The two zinc ions and the active site are entirely located in the C-terminal catalytic domain.

    Proteins where this domain has been detected by our approach:
    PY01898   


    PF02792 - Mago_nashi (Pfam link)

    Interpro entry IPR004023 : Mago nashi protein (Interpro link)

    Pfam description:
    This family was originally identified in Drosophila and called mago nashi; it is a strict maternal-effect, grandchildless-like gene. The human homologue has been shown to interact with an RNA binding protein Swiss:Q9Y5S9. An RNAi knockout of the C. elegans homologue causes masculinization of the germ line (Mog phenotype) in hermaphrodites, suggesting it is involved in hermaphrodite germ-line sex determination. Mago nashi has been found to be part of the exon-exon junction complex that binds 20 nucleotides upstream of exon-exon junctions.

    Interpro description:
    This family was originally identified in Drosophila and called mago nashi; it is a strict maternal-effect, grandchildless-like gene. The human homologue has been shown to interact with an RNA binding protein, ribonucleoprotein rbm8. An RNAi knockout of the Caenorhabditis elegans homologue causes masculinization of the germ line (Mog phenotype) in hermaphrodites, suggesting it is involved in hermaphrodite germ-line sex determination, but the protein is also found in non-hermaphrodites and in other organisms without sexual differentiation.

    Proteins where this domain is known:
    PY03676   


    PF02798 - GST_N (Pfam link)

    Interpro entry IPR004045 : (Interpro link)

    Pfam description:
    Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but not GSTs:
    * S-crystallins from squid. Similarity to GST previously noted.
    * Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognised.
    * HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation proteins in E. coli. Not known to have GST activity; similarity not previously recognised.
    The glutathione molecule binds in a cleft between the N- and C-terminal domains; the catalytically important residues are proposed to reside in the N-terminal domain.

    Interpro description:

    In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification of reactive electrophilic compounds by catalysing their conjugation to glutathione. The GST domain is also found in S-crystallins from squid, and proteins with no known GST activity, such as eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include auxin-regulated proteins in plants and stringent starvation proteins in Escherichia coli. The major lens polypeptide of Cephalopoda is also a GST.

    Bacterial GSTs of known function often have a specific, growth-supporting role in biodegradative metabolism: epoxide ring opening and tetrachlorohydroquinone reductive dehalogenation are two examples of the reactions catalysed by these bacterial GSTs. Some regulatory proteins, like the stringent starvation proteins, also belong to the GST family. GSTs seem to be absent from Archaea, in which gamma-glutamylcysteine substitutes for glutathione as the major thiol.

    Soluble GSTs activate glutathione (GSH) to GS-. In many GSTs, this is accomplished by a Tyr at H-bonding distance from the sulphur of GSH. These enzymes catalyse nucleophilic attack by reduced glutathione (GSH) on nonpolar compounds that contain an electrophilic carbon, nitrogen, or sulphur atom.

    Glutathione S-transferases form homodimers, but in eukaryotes can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural fold, with each monomer composed of two distinct domains. The N-terminal domain forms a thioredoxin-like fold that binds the glutathione moiety, while the C-terminal domain contains several hydrophobic alpha-helices that specifically bind hydrophobic substrates.

    This entry represents the N-terminal domain of GST.

    Proteins where this domain is known:
    PY05088   

    Proteins where this domain has been detected by our approach:
    PY07121   


    PF02799 - NMT_C (Pfam link)

    Interpro entry IPR000903 : Myristoyl-CoA:protein N-myristoyltransferase (Interpro link)

    Pfam description:
    The N and C-terminal domains of NMT are structurally similar, each adopting an acyl-CoA N-acyltransferase-like fold.

    Interpro description:
    Myristoyl-CoA:protein N-myristoyltransferase (Nmt) is the enzyme responsible for transferring a myristate group to the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to 60 kDa whose sequence appears to be well conserved.

    Proteins where this domain is known:
    PY01548   


    PF02800 - Gp_dh_C (Pfam link)

    Interpro entry IPR000173 : Glyceraldehyde 3-phosphate dehydrogenase (Interpro link)

    Pfam description:
    GAPDH is a tetrameric NAD-binding enzyme involved in glycolysis and gluconeogenesis. The C-terminal domain has a mixed alpha/antiparallel beta fold.

    Interpro description:

    Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) plays an important role in glycolysis and gluconeogenesis by reversibly catalysing the oxidation and phosphorylation of D-glyceraldehyde-3-phosphate to 1,3-diphospho-glycerate. The enzyme exists as a tetramer of identical subunits, each containing 2 conserved functional domains: an NAD-binding domain, and a highly conserved catalytic domain. The enzyme has been found to bind to actin and tropomyosin, and may thus have a role in cytoskeleton assembly. Alternatively, the cytoskeleton may provide a framework for precise positioning of the glycolytic enzymes, thus permitting efficient passage of metabolites from enzyme to enzyme.

    GAPDH displays diverse non-glycolytic functions as well, its role depending upon its subcellular location. For instance, the translocation of GAPDH to the nucleus acts as a signalling mechanism for programmed cell death, or apoptosis. The accumulation of GAPDH within the nucleus is involved in the induction of apoptosis, where GAPDH functions in the activation of transcription. The presence of GAPDH is associated with the synthesis of pro-apoptotic proteins like BAX, c-JUN and GAPDH itself.

    GAPDH has been implicated in certain neurological diseases: GAPDH is able to bind to the gene products from neurodegenerative disorders such as Huntington's disease, Alzheimer's disease, Parkinson's disease and Machado-Joseph disease through stretches encoded by their CAG repeats. Abnormal neuronal apoptosis is associated with these diseases. Propargylamines such as deprenyl increase neuronal survival by interfering with apoptosis signalling pathways via their binding to GAPDH, which decreases the synthesis of pro-apoptotic proteins.

    Proteins where this domain is known:
    PY03280   


    PF02801 - Ketoacyl-synt_C (Pfam link)

    Interpro entry IPR014031 : (Interpro link)

    Pfam description:
    The structure of beta-ketoacyl synthase is similar to that of the thiolase family (Pfam:PF00108) and also chalcone synthase. The active site of beta-ketoacyl synthase is located between the N- and C-terminal domains.

    Interpro description:

    Beta-ketoacyl-ACP synthase (KAS) is the enzyme that catalyzes the condensation of malonyl-ACP with the growing fatty acid chain. It is found as a component of a number of enzymatic systems, including fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH; the multi-functional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum, which is involved in the biosynthesis of a polyketide antibiotic; polyketide antibiotic synthase enzyme systems; Emericella nidulans multifunctional protein Wa, which is involved in the biosynthesis of conidial green pigment; Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl chain; and yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: first, the acyl component of an activated acyl primer is transferred to a cysteine residue of the enzyme; it is then condensed with an activated malonyl donor with the concomitant release of carbon dioxide.

    This entry represents the C-terminal domain of beta-ketoacyl-ACP synthases. The active site is contained in a cleft between the N- and C-terminal domains, with residues from both domains contributing to substrate binding and catalysis.

    Proteins where this domain is known:
    PY04452   


    PF02803 - Thiolase_C (Pfam link)

    Interpro entry IPR002155 : (Interpro link)

    Pfam description:
    Thiolase is reported to be structurally related to beta-ketoacyl synthase (Pfam:PF00109), and also chalcone synthase.

    Interpro description:

    Two different types of thiolase are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase and 3-ketoacyl-CoA thiolase. 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis.

    In eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes.

    There are two conserved cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base involved in deprotonation in the condensation reaction.

    Mammalian nonspecific lipid-transfer protein (nsL-TP) (also known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 kDa protein (SCP-2) and a larger 58 kDa protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport; the latter is found in peroxisomes. The C-terminal part of SCP-x is identical to SCP-2, while the N-terminal portion is evolutionarily related to thiolases.

    Proteins where this domain is known:
    PY01991   


    PF02809 - UIM (Pfam link)

    Interpro entry IPR003903 : (Interpro link)

    Pfam description:
    This motif is called the ubiquitin interaction motif. One of the proteins containing this motif is a proteasomal receptor for polyubiquitin chains. This motif has a pattern of conservation characteristic of an alpha helix.

    Interpro description:

    The Ubiquitin Interacting Motif (UIM), or 'LALAL-motif', is a stretch of about 20 amino acid residues, which was first described in the 26S proteasome subunit PSD4/RPN-10 that is known to recognise ubiquitin. In addition, the UIM is found, often in tandem or triplet arrays, in a variety of proteins either involved in ubiquitination and ubiquitin metabolism, or known to interact with ubiquitin-like modifiers. Among the UIM proteins are two different subgroups of the UBP (ubiquitin carboxy-terminal hydrolase) family of deubiquitinating enzymes, one F-box protein, one family of HECT-containing ubiquitin-ligases (E3s) from plants, and several proteins containing ubiquitin-associated UBA and/or UBX domains. In most of these proteins, the UIM occurs in multiple copies and in association with other domains such as UBA, UBX, ENTH, EH, VHS, SH3, HECT, VWFA, EF-hand calcium-binding, WD-40, F-box, LIM, protein kinase, ankyrin, PX, phosphatidylinositol 3- and 4-kinase, C2, OTU, dnaJ, RING-finger or FYVE-finger. UIMs have been shown to bind ubiquitin and to serve as a specific targeting signal important for monoubiquitination. Thus, UIMs may have several functions in ubiquitin metabolism each of which may require different numbers of UIMs.

    The UIM is unlikely to form an independent folding domain. Instead, based on the spacing of the conserved residues, the motif probably forms a short alpha-helix that can be embedded into different protein folds.

    Proteins where this domain is known:
    PY04518   

    Proteins where this domain has been detected by our approach:
    PY04193    PY06998   


    PF02810 - SEC-C (Pfam link)

    Interpro entry IPR004027 : (Interpro link)

    Pfam description:
    The SEC-C motif is found in the C-terminus of the SecA protein, in the middle of some SWI2 ATPases and also solo in several proteins. The motif is predicted to chelate zinc with the CXC and C[HC] pairs that constitute the most conserved feature of the motif. It is predicted to be a potential nucleic acid binding domain.

    Interpro description:
    The SEC-C motif is found in the C-terminus of the SecA protein, in the middle of some SWI2 ATPases and also solo in several proteins. The motif is predicted to chelate zinc with the CXC and C[HC] pairs that constitute the most conserved feature of the motif. It is predicted to be a potential nucleic acid binding domain.

    Proteins where this domain has been detected by our approach:
    PY01291   


    PF02812 - ELFV_dehydrog_N (Pfam link)

    Interpro entry IPR006097 : Glutamate/phenylalanine/leucine/valine dehydrogenase, dimerisation region (Interpro link)

    Interpro description:

    Glutamate, leucine, phenylalanine and valine dehydrogenases are structurally and functionally related. They contain a Gly-rich region containing a conserved Lys residue, which has been implicated in the catalytic activity, in each case a reversible oxidative deamination reaction.

    Glutamate dehydrogenases (GluDH) are enzymes that catalyse the NAD- and/or NADP-dependent reversible deamination of L-glutamate into alpha-ketoglutarate. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism. Two separate enzymes are present in yeasts: the NADP-dependent enzyme, which catalyses the amination of alpha-ketoglutarate to L-glutamate; and the NAD-dependent enzyme, which catalyses the reverse reaction. The latter form links the L-amino acids with the Krebs cycle, which provides a major pathway for metabolic interconversion of alpha-amino acids and alpha-keto acids.

    Leucine dehydrogenase (LeuDH) is a NAD-dependent enzyme that catalyses the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues. Each subunit of this octameric enzyme from Bacillus sphaericus contains 364 amino acids and folds into two domains, separated by a deep cleft. The nicotinamide ring of the NAD+ cofactor binds deep in this cleft, which is thought to close during the hydride transfer step of the catalytic cycle.

    Phenylalanine dehydrogenase (PheDH) is an NAD-dependent enzyme that catalyses the reversible deamination of L-phenylalanine into phenylpyruvate.

    Valine dehydrogenase (ValDH) is an NADP-dependent enzyme that catalyses the reversible deamination of L-valine into 3-methyl-2-oxobutanoate.

    This entry represents the dimerisation region of these enzymes.

    Proteins where this domain is known:
    PY01264    PY04261   


    PF02817 - E3_binding (Pfam link)

    Interpro entry IPR004167 : E3 binding (Interpro link)

    Pfam description:
    This family represents a small domain of the E2 subunit of 2-oxo-acid dehydrogenases responsible for the binding of the E3 subunit.

    Interpro description:
    A small domain of the E2 subunit of 2-oxo-acid dehydrogenases that is responsible for the binding of the E3 subunit. Proteins containing this domain include the branched-chain alpha-keto acid dehydrogenase complex of bacteria, which catalyses the overall conversion of alpha-keto acids to acyl-CoA and carbon dioxide; and the E3-binding protein of eukaryotic pyruvate dehydrogenase.

    Proteins where this domain has been detected by our approach:
    PY00503    PY04573   


    PF02823 - ATP-synt_DE_N (Pfam link)

    Interpro entry IPR001469 : ATPase, F1 complex, delta/epsilon subunit (Interpro link)

    Pfam description:
    Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase. The subunit is called epsilon in bacteria and delta in mitochondria. In bacteria the delta (D) subunit is equivalent to the mitochondrial Oligomycin sensitive subunit, OSCP (Pfam:PF00213).

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    F-ATPases (also known as F1F0-ATPase, or H(+)-transporting two-sector ATPase) are composed of two linked complexes: the F1 ATPase complex is the catalytic core and is composed of 5 subunits (alpha, beta, gamma, delta, epsilon), while the F0 ATPase complex is the membrane-embedded proton channel that is composed of at least 3 subunits (A-C), nine in mitochondria (A-G, F6, F8). Both the F1 and F0 complexes are rotary motors that are coupled back-to-back. In the F1 complex, the central gamma subunit forms the rotor inside the cylinder made of the alpha(3)beta(3) subunits, while in the F0 complex, the ring-shaped C subunits form the rotor. The two rotors rotate in opposite directions, but the F0 rotor is usually stronger, using the force from the proton gradient to push the F1 rotor in reverse in order to drive ATP synthesis. These ATPases can also work in reverse to hydrolyse ATP to create a proton gradient.

    This family represents subunits called delta (in mitochondrial ATPase) or epsilon (in bacteria or chloroplast ATPase). The interaction site of subunit C of the F0 complex with the delta or epsilon subunit of the F1 complex may be important for connecting the rotor of F1 (gamma subunit) to the rotor of F0 (C subunit). In bacterial species, the delta subunit is the equivalent of the Oligomycin sensitive subunit (OSCP) in metazoans. The C-terminal domain of the epsilon subunit appears to act as an inhibitor of ATPase activity.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY05978   


    PF02824 - TGS (Pfam link)

    Interpro entry IPR004095 : (Interpro link)

    Pfam description:
    The TGS domain is named after ThrRS, GTPase, and SpoT. Interestingly, the TGS domain was also detected at the amino terminus of the uridine kinase from the spirochaete Treponema pallidum (but not in any other organism, including the related spirochaete Borrelia burgdorferi). TGS is a small domain that consists of ~50 amino acid residues and is predicted to possess a predominantly beta-sheet structure. There is no direct information on the functions of the TGS domain, but its presence in two types of regulatory proteins (the GTPases and guanosine polyphosphate phosphohydrolases/synthetases) suggests a ligand (most likely nucleotide)-binding, regulatory role.

    Interpro description:

    The TGS domain is present in a number of enzymes, for example, in threonyl-tRNA synthetase (ThrRS), GTPase, and guanosine-3',5'-bis(diphosphate) 3'-pyrophosphohydrolase (SpoT). The TGS domain is also present at the amino terminus of the uridine kinase from the spirochaete Treponema pallidum (but not any other organism, including the related spirochaete Borrelia burgdorferi).

    TGS is a small domain that consists of ~50 amino acid residues and is predicted to possess a predominantly beta-sheet structure. There is no direct information on the functions of the TGS domain, but its presence in two types of regulatory proteins (the GTPases and guanosine polyphosphate phosphohydrolases/synthetases) suggests a ligand (most likely nucleotide)-binding, regulatory role.

    Proteins where this domain is known:
    PY03928    PY06396   


    PF02839 - CBM_5_12 (Pfam link)

    Interpro entry IPR003610 : Carbohydrate-binding family V/XII (Interpro link)

    Pfam description:
    This short domain is found in many different glycosyl hydrolase enzymes and is presumed to have a carbohydrate binding function. The domain has six aromatic groups that may be important for binding.

    Interpro description:

    The carbohydrate-binding domain (CBD) is a short domain found in many different glycosyl hydrolase enzymes, such as the C-terminal cellulose-binding domain of endoglucanase Z. The domain has a core structure consisting of a 3-stranded meander beta-sheet, which contains six aromatic groups that may be important for binding.

    The overall topology of the CBD is structurally similar to the C-terminal chitin-binding domains (ChBD) of chitinase A1 and chitinase B; however, the binding mechanism of the ChBD may differ from that of the CBD.

    Proteins where this domain has been detected by our approach:
    PY07200   


    PF02847 - MA3 (Pfam link)

    Interpro entry IPR003891 : (Interpro link)

    Pfam description:
    Domain in DAP-5, eIF4G, MA-3 and other proteins. Highly alpha-helical. May contain repeats and/or regions similar to MIF4G domains.

    Interpro description:

    This entry represents the MI domain (named after MA-3 and eIF4G), a protein-protein interaction module of ~130 amino acids that appears in several translation factors.

    The MI domain consists of seven alpha-helices, which pack into a globular form. The packing arrangement consists of repeating pairs of antiparallel helices packed one upon the other such that a superhelical axis is generated perpendicular to the alpha-helical axes.

    The MI domain has also been named MA3 domain.

    Proteins where this domain is known:
    PY04186    PY04825   

    Proteins where this domain has been detected by our approach:
    PY00162   


    PF02852 - Pyr_redox_dim (Pfam link)

    Interpro entry IPR004099 : Pyridine nucleotide-disulphide oxidoreductase, dimerisation (Interpro link)

    Pfam description:
    This family includes both class I and class II oxidoreductases and also NADH oxidases and peroxidases.

    Interpro description:

    This entry represents a dimerisation domain that is usually found at the C terminus of both class I and class II oxidoreductases, as well as in NADH oxidases and peroxidases.

    Proteins where this domain is known:
    PY00573    PY01204    PY02397    PY04793   


    PF02854 - MIF4G (Pfam link)

    Interpro entry IPR003890 : MIF4G-like, type 3 (Interpro link)

    Pfam description:
    MIF4G is named after the middle domain of eukaryotic initiation factor 4G (eIF4G). It also occurs in NMD2p and CBP80. The domain is rich in alpha-helices and may contain multiple alpha-helical repeats. In eIF4G, this domain binds eIF4A, eIF3, RNA and DNA.

    Interpro description:

    This entry represents an MIF4G-like domain. MIF4G domains share a common structure but can differ in sequence. This entry is designated "type 3", and is found in nuclear cap-binding proteins, eIF4G, and UPF2.

    The MIF4G domain is a structural motif with an ARM (Armadillo) repeat-type fold, consisting of a 2-layer alpha/alpha right-handed superhelix. Proteins usually contain two or more structurally similar MIF4G domains connected by unstructured linkers. MIF4G domains are found in several proteins involved in RNA metabolism, including eIF4G (eukaryotic initiation factor 4-gamma), eIF-2b (translation initiation factor), UPF2 (regulator of nonsense transcripts 2), and nuclear cap-binding proteins (CBP80, CBC1, NCBP1), although the sequence identity between them may be low.

    The nuclear cap-binding complex (CBC) is a heterodimer. Human CBC consists of a large CBP80 subunit and a small CBP20 subunit, the latter being critical for cap binding. CBP80 contains three MIF4G domains connected with long linkers, while CBP20 has an RNP (ribonucleoprotein)-type domain that associates with domains 2 and 3 of CBP80. The complex binds to the 5' cap of eukaryotic RNA polymerase II transcripts, such as mRNA and U snRNA. The binding is important for several mRNA nuclear maturation steps and for nonsense-mediated decay. It is also essential for nuclear export of U snRNAs in metazoans.

    Eukaryotic translation initiation factor 4 gamma (eIF4G) plays a critical role in protein expression, and is at the centre of a complex regulatory network. Together with the cap-binding protein eIF4E, it recruits the small ribosomal subunit to the 5'-end of mRNA and promotes the assembly of a functional translation initiation complex, which scans along the mRNA to the translation start codon. The activity of eIF4G in translation initiation could be regulated through intra- and inter-protein interactions involving the ARM repeats. In eIF4G, the MIF4G domain binds eIF4A, eIF3, RNA and DNA.

    Nonsense-mediated mRNA decay (NMD) in eukaryotes involves UPF1, UPF2 and UPF3 to accelerate the decay rate of two unique classes of transcripts: (1) nonsense mRNAs that arise through errors in gene expression, and (2) naturally occurring transcripts that lack coding errors but have built-in features that target them for accelerated decay (error-free mRNAs). NMD can trigger decay during any round of translation and can target CBC-bound or eIF-4E-bound transcripts. UPF2 contains MIF4G domains, while UPF3 contains an RNP domain.

    Proteins where this domain is known:
    PY00162    PY03626    PY03876    PY04825    PY07184   


    PF02861 - Clp_N (Pfam link)

    Interpro entry IPR004176 : Clp, N-terminal (Interpro link)

    Pfam description:
    This short domain is found in one or two copies at the amino terminus of ClpA and ClpB proteins from bacteria and eukaryotes. The function of these domains is uncertain but they may form a protein binding site.

    Interpro description:
    This short domain is found in one or two copies at the amino terminus of ClpA and ClpB proteins from bacteria and eukaryotes. The function of these domains is uncertain but they may form a protein binding site. The proteins are thought to be subunits of ATP-dependent proteases which act as chaperones to target the proteases to substrates.

    Proteins where this domain is known:
    PY00565    PY05364   

    Proteins where this domain has been detected by our approach:
    PY06430   


    PF02862 - DDHD (Pfam link)

    Interpro entry IPR004177 : DDHD (Interpro link)

    Pfam description:
    The DDHD domain is about 180 residues long and contains four conserved residues that may form a metal binding site. The domain is named after these four residues. This pattern of conservation of metal binding residues is often seen in phosphoesterase domains. This domain is found in retinal degeneration B proteins, as well as in a family of probable phospholipases. It has been shown that this domain is found in a longer C-terminal region that binds to the PYK2 tyrosine kinase. These proteins have been called N-terminal domain-interacting receptors (Nir1, Nir2 and Nir3). This suggests that this region is involved in functionally important interactions in other members of this family.

    Interpro description:
    The DDHD domain is 180 residues long and contains four conserved residues that may form a metal binding site. The domain is named after these four residues. This pattern of conservation of metal binding residues is often seen in phosphoesterase domains. This domain is found in retinal degeneration B proteins, as well as a family of probable phospholipases.

    Proteins where this domain is known:
    PY02403   


    PF02866 - Ldh_1_C (Pfam link)

    Interpro entry IPR001236 : Lactate/malate dehydrogenase (Interpro link)

    Pfam description:
    L-lactate dehydrogenases are metabolic enzymes which catalyse the interconversion of L-lactate and pyruvate; the reduction of pyruvate to L-lactate is the last step in anaerobic glycolysis. L-2-hydroxyisocaproate dehydrogenases are also members of the family. Malate dehydrogenases catalyse the interconversion of malate and oxaloacetate and participate in the citric acid cycle. L-lactate dehydrogenase is also found as a lens crystallin in bird and crocodile eyes.

    Interpro description:

    L-lactate dehydrogenases are metabolic enzymes which catalyse the interconversion of L-lactate and pyruvate; the reduction of pyruvate to L-lactate is the last step in anaerobic glycolysis. L-lactate dehydrogenase is also found as a lens crystallin in bird and crocodile eyes. L-2-hydroxyisocaproate dehydrogenases are also members of the family. Malate dehydrogenases catalyse the interconversion of malate and oxaloacetate and participate in the citric acid cycle.

    Proteins where this domain is known:
    PY03376    PY03397    PY03885   


    PF02867 - Ribonuc_red_lgC (Pfam link)

    Interpro entry IPR000788 : Ribonucleotide reductase large subunit, C-terminal (Interpro link)

    Interpro description:

    Ribonucleotide reductase (RNR) catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. RNRs are divided into three classes on the basis of their metallocofactor usage. Class I RNRs, found in eukaryotes, bacteria, bacteriophage and viruses, use a diiron-tyrosyl radical; class II RNRs, found in bacteria, bacteriophage, algae and archaea, use coenzyme B12 (adenosylcobalamin, AdoCbl); class III RNRs, found in anaerobic bacteria and bacteriophage, use an FeS cluster and S-adenosylmethionine to generate a glycyl radical. Many organisms have more than one class of RNR present in their genomes.
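    The three-way classification above can be held as a small lookup table, which makes the overlap between classes (for example, bacteriophage appearing under all three) easy to query. This is a minimal Python sketch that encodes only the cofactors and organism groups listed in the text; the names `RNRClass`, `RNR_CLASSES` and `classes_found_in` are hypothetical, and group names are matched as literal strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RNRClass:
    name: str
    radical_source: str       # cofactor used to generate the active-site thiyl radical
    found_in: frozenset[str]  # organism groups exactly as listed in the text


RNR_CLASSES = (
    RNRClass("I", "diiron-tyrosyl radical",
             frozenset({"eukaryotes", "bacteria", "bacteriophage", "viruses"})),
    RNRClass("II", "coenzyme B12 (adenosylcobalamin, AdoCbl)",
             frozenset({"bacteria", "bacteriophage", "algae", "archaea"})),
    RNRClass("III", "FeS cluster + S-adenosylmethionine (glycyl radical)",
             frozenset({"anaerobic bacteria", "bacteriophage"})),
)


def classes_found_in(group: str) -> list[str]:
    """Return the RNR classes reported for an organism group (literal string match,
    so "bacteria" does not match "anaerobic bacteria")."""
    return [c.name for c in RNR_CLASSES if group in c.found_in]


print(classes_found_in("bacteriophage"))  # ['I', 'II', 'III']
print(classes_found_in("archaea"))        # ['II']
```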

    Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues) - class II RNRs are less complex, using the small molecule B12 in place of the small chain.

    The reduction of ribonucleotides to deoxyribonucleotides involves the transfer of free radicals; the function of each metallocofactor is to generate an active-site thiyl radical. This thiyl radical then initiates the nucleotide reduction process by hydrogen atom abstraction from the ribonucleotide. The radical-based reaction involves five cysteines: two of these are located at adjacent anti-parallel strands in a new type of ten-stranded alpha/beta-barrel; two others reside at the carboxyl end in a flexible arm; and the fifth, in a loop in the centre of the barrel, is positioned to initiate the radical reaction. There are several regions of similarity in the sequence of the large chain of prokaryotes, eukaryotes and viruses spread across 3 domains: an N-terminal domain common to the mammalian and bacterial enzymes; a C-terminal domain common to the mammalian and viral ribonucleotide reductases; and a central domain common to all three.

    Proteins where this domain is known:
    PY03473   


    PF02874 - ATP-synt_ab_N (Pfam link)

    Interpro entry IPR004100 : ATPase, F1/V1/A1 complex, alpha/beta subunit, N-terminal (Interpro link)

    Pfam description:
    This family includes the ATP synthase alpha and beta subunits and the ATP synthase associated with flagella.

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    This entry represents the alpha and beta subunits found in the F1, V1, and A1 complexes of F-, V- and A-ATPases, respectively (sometimes called the A and B subunits in V- and A-ATPases). The F-ATPases (or F1F0-ATPases), V-ATPases (or V1V0-ATPases) and A-ATPases (or A1A0-ATPases) are composed of two linked complexes: the F1, V1 or A1 complex contains the catalytic core that synthesizes/hydrolyses ATP, and the F0, V0 or A0 complex forms the membrane-spanning pore. The F-, V- and A-ATPases all contain rotary motors, one that drives proton translocation across the membrane and one that drives ATP synthesis/hydrolysis.

    In F-ATPases, there are three copies each of the alpha and beta subunits that form the catalytic core of the F1 complex, while the remaining F1 subunits (gamma, delta, epsilon) form part of the stalks. There is a substrate-binding site on each of the alpha and beta subunits, those on the beta subunits being catalytic, while those on the alpha subunits are regulatory. The alpha and beta subunits form a cylinder that is attached to the central stalk. The alpha/beta subunits undergo a sequence of conformational changes, induced by the rotation of the gamma subunit, that lead to the formation of ATP from ADP; the gamma subunit is itself driven by the movement of protons through the C subunits of the F0 complex.

    In V- and A-ATPases, the alpha/A and beta/B subunits of the V1 or A1 complex are homologous to the alpha and beta subunits in the F1 complex of F-ATPases, except that the alpha subunit is catalytic and the beta subunit is regulatory.

    The alpha/A and beta/B subunits can each be divided into three regions, or domains, centred around the ATP-binding pocket, and based on structure and function, where the central region is the nucleotide-binding domain. This entry represents the N-terminal domain of the alpha/A/beta/B subunits, which forms a closed beta-barrel with Greek-key topology.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY00963    PY01556    PY05102    PY05971   


    PF02878 - PGM_PMM_I (Pfam link)

    Interpro entry IPR005844 : Alpha-D-phosphohexomutase, alpha/beta/alpha domain I (Interpro link)

    Interpro description:

    The alpha-D-phosphohexomutase superfamily is composed of four related enzymes, each of which catalyses a phosphoryl transfer on its sugar substrates: phosphoglucomutase (PGM), phosphoglucomutase/phosphomannomutase (PGM/PMM), phosphoglucosamine mutase (PNGM), and phosphoacetylglucosamine mutase (PAGM). PGM converts D-glucose 1-phosphate into D-glucose 6-phosphate, and participates in both the breakdown and synthesis of glucose. PGM/PMM are primarily bacterial enzymes that use either glucose or mannose as substrate, participating in the biosynthesis of a variety of carbohydrates such as lipopolysaccharides and alginate. Both PNGM and PAGM are involved in the biosynthesis of UDP-N-acetylglucosamine.

    Despite differences in substrate specificity, these enzymes share a similar catalytic mechanism, converting 1-phospho-sugars to 6-phospho-sugars via a 1,6-bisphospho-sugar intermediate. The active enzyme is phosphorylated at a conserved serine residue and binds one magnesium ion; residues around the active-site serine are well conserved among family members. The reaction mechanism involves phosphoryl transfer from the phosphoserine to the substrate to create a bisphosphorylated sugar, followed by a phosphoryl transfer from the substrate back to the enzyme.
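    The phosphoryl bookkeeping of this two-step mechanism can be traced with a toy Python model. It is only a sketch under simplifying assumptions: a sugar is represented as the set of its phosphorylated carbon positions, the enzyme as a single flag for the phosphoserine, and the names (`Phosphohexomutase`, `mutase_cycle`, etc.) are hypothetical; kinetics, the magnesium ion and structural detail are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Phosphohexomutase:
    # True when the conserved active-site serine carries a phosphoryl group
    phosphoserine: bool = True


def transfer_to_sugar(enzyme: Phosphohexomutase, sugar: frozenset[int], position: int) -> frozenset[int]:
    """Step 1: phosphoryl transfer from the phosphoserine to a free sugar hydroxyl."""
    if not enzyme.phosphoserine:
        raise ValueError("enzyme must be phosphorylated to donate a phosphoryl group")
    if position in sugar:
        raise ValueError(f"C{position} is already phosphorylated")
    enzyme.phosphoserine = False
    return sugar | {position}


def transfer_to_enzyme(enzyme: Phosphohexomutase, sugar: frozenset[int], position: int) -> frozenset[int]:
    """Step 2: phosphoryl transfer from the bisphosphorylated sugar back to the serine."""
    if enzyme.phosphoserine:
        raise ValueError("serine is already phosphorylated")
    if position not in sugar:
        raise ValueError(f"C{position} carries no phosphate")
    enzyme.phosphoserine = True
    return sugar - {position}


def mutase_cycle(enzyme: Phosphohexomutase, sugar: frozenset[int]) -> tuple[frozenset[int], frozenset[int]]:
    """Convert a 1-phospho-sugar to a 6-phospho-sugar via the 1,6-bisphospho intermediate."""
    bisphospho = transfer_to_sugar(enzyme, sugar, 6)        # 1-P -> 1,6-bisP
    # The intermediate is assumed to bind in the other orientation, placing the
    # C1 phosphate next to the now dephosphorylated serine.
    product = transfer_to_enzyme(enzyme, bisphospho, 1)     # 1,6-bisP -> 6-P
    return bisphospho, product


enzyme = Phosphohexomutase()
intermediate, product = mutase_cycle(enzyme, frozenset({1}))
print(sorted(intermediate), sorted(product), enzyme.phosphoserine)  # [1, 6] [6] True
```

    The enzyme ends the cycle phosphorylated again, which is why the conserved serine must be phosphorylated for the enzyme to be active.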

    The structures of PGM and PGM/PMM have been determined, and were found to be very similar in topology. These enzymes are both composed of four domains and a large central active site cleft, where each domain contains residues essential for catalysis and/or substrate recognition. Domain I contains the catalytic phosphoserine, domain II contains a metal-binding loop to coordinate the magnesium ion, domain III contains the sugar-binding loop that recognises the two different binding orientations of the 1- and 6-phospho-sugars, and domain IV contains a phosphate-binding site required for orienting the incoming phospho-sugar substrate.

    This entry represents domain I found in alpha-D-phosphohexomutase enzymes. This domain has a 3-layer alpha/beta/alpha topology.

    Proteins where this domain is known:
    PY02130    PY03478   


    PF02879 - PGM_PMM_II (Pfam link)

    Interpro entry IPR005845 : Alpha-D-phosphohexomutase, alpha/beta/alpha domain II (Interpro link)

    Interpro description:

    The alpha-D-phosphohexomutase superfamily is composed of four related enzymes, each of which catalyses a phosphoryl transfer on its sugar substrates: phosphoglucomutase (PGM), phosphoglucomutase/phosphomannomutase (PGM/PMM), phosphoglucosamine mutase (PNGM), and phosphoacetylglucosamine mutase (PAGM). PGM converts D-glucose 1-phosphate into D-glucose 6-phosphate, and participates in both the breakdown and synthesis of glucose. PGM/PMM are primarily bacterial enzymes that use either glucose or mannose as substrate, participating in the biosynthesis of a variety of carbohydrates such as lipopolysaccharides and alginate. Both PNGM and PAGM are involved in the biosynthesis of UDP-N-acetylglucosamine.

    Despite differences in substrate specificity, these enzymes share a similar catalytic mechanism, converting 1-phospho-sugars to 6-phospho-sugars via a 1,6-bisphospho-sugar intermediate. The active enzyme is phosphorylated at a conserved serine residue and binds one magnesium ion; residues around the active-site serine are well conserved among family members. The reaction mechanism involves phosphoryl transfer from the phosphoserine to the substrate to create a bisphosphorylated sugar, followed by a phosphoryl transfer from the substrate back to the enzyme.

    The structures of PGM and PGM/PMM have been determined, and were found to be very similar in topology. These enzymes are both composed of four domains and a large central active site cleft, where each domain contains residues essential for catalysis and/or substrate recognition. Domain I contains the catalytic phosphoserine, domain II contains a metal-binding loop to coordinate the magnesium ion, domain III contains the sugar-binding loop that recognises the two different binding orientations of the 1- and 6-phospho-sugars, and domain IV contains a phosphate-binding site required for orienting the incoming phospho-sugar substrate.

    This entry represents domain II found in alpha-D-phosphohexomutase enzymes. This domain has a 3-layer alpha/beta/alpha topology.

    Proteins where this domain is known:
    PY03478   

    Proteins where this domain has been detected by our approach:
    PY02130   


    PF02880 - PGM_PMM_III (Pfam link)

    Interpro entry IPR005846 : Alpha-D-phosphohexomutase, alpha/beta/alpha domain III (Interpro link)

    Interpro description:

    The alpha-D-phosphohexomutase superfamily is composed of four related enzymes, each of which catalyses a phosphoryl transfer on its sugar substrates: phosphoglucomutase (PGM), phosphoglucomutase/phosphomannomutase (PGM/PMM), phosphoglucosamine mutase (PNGM), and phosphoacetylglucosamine mutase (PAGM). PGM converts D-glucose 1-phosphate into D-glucose 6-phosphate, and participates in both the breakdown and synthesis of glucose. PGM/PMM are primarily bacterial enzymes that use either glucose or mannose as substrate, participating in the biosynthesis of a variety of carbohydrates such as lipopolysaccharides and alginate. Both PNGM and PAGM are involved in the biosynthesis of UDP-N-acetylglucosamine.

    Despite differences in substrate specificity, these enzymes share a similar catalytic mechanism, converting 1-phospho-sugars to 6-phospho-sugars via a 1,6-bisphospho-sugar intermediate. The active enzyme is phosphorylated at a conserved serine residue and binds one magnesium ion; residues around the active-site serine are well conserved among family members. The reaction mechanism involves phosphoryl transfer from the phosphoserine to the substrate to create a bisphosphorylated sugar, followed by a phosphoryl transfer from the substrate back to the enzyme.

    The structures of PGM and PGM/PMM have been determined, and were found to be very similar in topology. These enzymes are both composed of four domains and a large central active site cleft, where each domain contains residues essential for catalysis and/or substrate recognition. Domain I contains the catalytic phosphoserine, domain II contains a metal-binding loop to coordinate the magnesium ion, domain III contains the sugar-binding loop that recognises the two different binding orientations of the 1- and 6-phospho-sugars, and domain IV contains a phosphate-binding site required for orienting the incoming phospho-sugar substrate.

    This entry represents domain III found in alpha-D-phosphohexomutase enzymes. This domain has a 3-layer alpha/beta/alpha topology.

    Proteins where this domain has been detected by our approach:
    PY03478   


    PF02881 - SRP54_N (Pfam link)

    Interpro entry IPR013822 : Signal recognition particle, SRP54 subunit, helical bundle (Interpro link)

    Interpro description:

    The signal recognition particle (SRP) is a multimeric ribonucleoprotein complex which, along with its conjugate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes. SRP recognises the signal sequence of the nascent polypeptide on the ribosome, retards its elongation, and docks the SRP-ribosome-polypeptide complex to the RER membrane via the SR receptor. SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300-nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane.

    This entry represents the N-terminal helical bundle domain of the 54 kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 of the signal recognition particle has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M-domain that binds the 7S RNA and also binds the signal sequence. The extreme C-terminal region is glycine-rich, of low complexity and poorly conserved between species.

    These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homolog of ftsY; and bacterial flagellar biosynthesis protein flhF.

    Proteins where this domain is known:
    PY06341   

    Proteins where this domain has been detected by our approach:
    PY04912   


    PF02883 - Alpha_adaptinC2 (Pfam link)

    Interpro entry IPR008152 : Clathrin adaptor, alpha/beta/gamma-adaptin, appendage, Ig-like subdomain (Interpro link)

    Pfam description:
    Alpha-adaptin is a subunit of the heterotetrameric AP2 adaptor complex, which regulates clathrin-bud formation. The carboxyl-terminal appendage of the alpha subunit regulates translocation of endocytic accessory proteins to the bud site. This Ig-fold domain is found in alpha, beta and gamma adaptins.

    Interpro description:

    Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors.

    AP (adaptor protein) complexes are found in coated vesicles and clathrin-coated pits. AP complexes connect cargo proteins and lipids to clathrin at vesicle budding sites, as well as binding accessory proteins that regulate coat assembly and disassembly (such as AP180, epsins and auxilin). Mammals have several AP complexes. AP1 is responsible for the transport of lysosomal hydrolases between the TGN and endosomes. AP2 associates with the plasma membrane and is responsible for endocytosis. AP3 is responsible for protein trafficking to lysosomes and other related organelles. AP4 is less well characterised. AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). For example, in AP1 these subunits are gamma-1-adaptin, beta-1-adaptin, mu-1 and sigma-1, while in AP2 they are alpha-adaptin, beta-2-adaptin, mu-2 and sigma-2. Each subunit has a specific function. Adaptins recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal ear (appendage) domains. Mu recognises tyrosine-based sorting signals within the cytoplasmic domains of transmembrane cargo proteins. One function of clathrin and AP2 complex-mediated endocytosis is to regulate the number of GABA(A) receptors available at the cell surface.

    GGAs (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) are a family of monomeric clathrin adaptor proteins that are conserved from yeasts to humans. GGAs regulate the clathrin-mediated transport of proteins (such as mannose 6-phosphate receptors) from the TGN to endosomes and lysosomes through interactions with TGN-sorting receptors, sometimes in conjunction with AP-1. GGAs bind cargo, membranes, clathrin and accessory factors. GGA1, GGA2 and GGA3 all contain a domain homologous to the ear domain of gamma-adaptin. GGAs are composed of a single polypeptide with four domains: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The VHS domain is responsible for endocytosis and signal transduction, recognising transmembrane cargo through the ACLL sequence in the cytoplasmic domains of sorting receptors. The GAT domain (also found in Tom1 proteins) interacts with ARF (ADP-ribosylation factor) to regulate membrane trafficking, and with ubiquitin for receptor sorting. The hinge region contains a clathrin box for recognition and binding to clathrin, similar to that found in AP adaptins. The GAE domain is similar to the AP gamma-adaptin ear domain, and is responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis.

    This entry represents a beta-sandwich structural motif found in the appendage (ear) domain of alpha-, beta- and gamma-adaptin from AP clathrin adaptor complexes, and the GAE (gamma-adaptin ear) domain of GGA adaptor proteins. These domains have an immunoglobulin-like beta-sandwich fold containing 7 or 8 strands in 2 beta-sheets in a Greek key topology. Although these domains share a similar fold, there is little sequence identity between the alpha/beta-adaptins and gamma-adaptin/GAE.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY05746   


    PF02887 - PK_C (Pfam link)

    Interpro entry IPR015794 : Pyruvate kinase, alpha/beta (Interpro link)

    Interpro description:

    Pyruvate kinase (PK) catalyses the final step in glycolysis, the conversion of phosphoenolpyruvate to pyruvate with concomitant phosphorylation of ADP to ATP:

     ADP + phosphoenolpyruvate = ATP + pyruvate 

    The enzyme, which is found in all living organisms, requires both magnesium and potassium ions for its activity. In vertebrates, there are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle, heart and brain), and M2 (early foetal tissue). In plants, PK exists as cytoplasmic and plastid isozymes, while most bacteria and lower eukaryotes have one form, except in certain bacteria, such as Escherichia coli, that have two isozymes. All isozymes appear to be tetramers of identical subunits of ~500 residues.

    PK helps control the rate of glycolysis, along with phosphofructokinase and hexokinase. PK possesses allosteric sites for numerous effectors, yet the isozymes respond differently, in keeping with their different tissue distributions. The activity of L-type (liver) PK is increased by fructose-1,6-bisphosphate (F1,6BP) and lowered by ATP and alanine (a gluconeogenic precursor); therefore, when glucose levels are high, glycolysis is promoted, and when they are low, gluconeogenesis is promoted. L-type PK is also hormonally regulated, being activated by insulin and inhibited by glucagon, which covalently modifies the PK enzyme. M1-type (muscle, brain) PK is inhibited by ATP, but F1,6BP and alanine have no effect, which correlates with the function of muscle and brain, as opposed to the liver.
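    The contrast between the L-type and M1-type isozymes can be summarised as a lookup table of qualitative allosteric effects. This minimal Python sketch encodes only the effects stated above; hormonal regulation by covalent modification and the R and M2 isozymes are deliberately left out, and the names `Effect`, `PK_EFFECTORS` and `distinguishing_effectors` are illustrative, not taken from any library.

```python
from enum import Enum


class Effect(Enum):
    ACTIVATES = "activates"
    INHIBITS = "inhibits"
    NO_EFFECT = "no effect"


# Qualitative allosteric effects as described in the text (assumed complete only
# for these three effectors; other isozymes are not encoded).
PK_EFFECTORS: dict[str, dict[str, Effect]] = {
    "L": {"F1,6BP": Effect.ACTIVATES, "ATP": Effect.INHIBITS, "alanine": Effect.INHIBITS},
    "M1": {"F1,6BP": Effect.NO_EFFECT, "ATP": Effect.INHIBITS, "alanine": Effect.NO_EFFECT},
}


def distinguishing_effectors(a: str, b: str) -> list[str]:
    """Effectors whose qualitative effect differs between isozymes a and b."""
    return sorted(e for e, eff in PK_EFFECTORS[a].items() if PK_EFFECTORS[b].get(e) != eff)


print(distinguishing_effectors("L", "M1"))  # ['F1,6BP', 'alanine']
```

    ATP drops out because it inhibits both isozymes; the remaining differences are the ones the text links to the liver's role in switching between glycolysis and gluconeogenesis.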

    The structures of several pyruvate kinases from various organisms have been determined. The protein comprises three or four domains: a small N-terminal helical domain (absent in bacterial PK), a beta/alpha-barrel domain, a beta-barrel domain (inserted within the beta/alpha-barrel domain), and a 3-layer alpha/beta/alpha sandwich domain.

    This entry represents the 3-layer alpha/beta/alpha sandwich domain.

    Proteins where this domain is known:
    PY04645   

    Proteins where this domain has been detected by our approach:
    PY03879   


    PF02889 - Sec63 (Pfam link)

    Interpro entry IPR004179 : (Interpro link)

    Pfam description:
    This domain (also known as the Brl domain) is required for assembly of functional endoplasmic reticulum translocons.

    Interpro description:

    This domain (also known as the Brl domain) was named after the yeast protein Sec63 (or NPL1), in which it was found. This protein is required for assembly of functional endoplasmic reticulum translocons. Other yeast proteins containing this domain include the pre-mRNA splicing helicase BRR2, the HFM1 protein and putative helicases.

    Proteins where this domain is known:
    PY00412    PY03272    PY05116   


    PF02891 - zf-MIZ (Pfam link)

    Interpro entry IPR004181 : Zinc finger, MIZ-type (Interpro link)

    Pfam description:
    This domain has SUMO (small ubiquitin-like modifier) ligase activity and is involved in DNA repair and chromosome organisation.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents MIZ-type zinc finger domains. Miz1 (Msx-interacting-zinc finger) is a zinc finger-containing protein with homology to the yeast protein Nfi-1. Miz1 is a sequence-specific DNA-binding protein that can function as a positive-acting transcription factor. Miz1 binds to the homeobox protein Msx2, enhancing the specific DNA-binding ability of Msx2. Other proteins containing this domain include the human PIAS family (protein inhibitors of activated STAT).

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY02724   


    PF02892 - zf-BED (Pfam link)

    Interpro entry IPR003656 : Zinc finger, BED-type predicted (Interpro link)

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents predicted BED-type zinc finger domains. The BED finger, which was named after the Drosophila proteins BEAF and DREF, is found in one or more copies in cellular regulatory factors and transposases from plants, animals and fungi. The BED finger is a domain of about 50 to 60 amino acid residues that contains a characteristic motif with two highly conserved aromatic positions, as well as a shared pattern of cysteines and histidines that is predicted to form a zinc finger. As diverse BED fingers are able to bind DNA, it has been suggested that DNA binding is the general function of this domain. Some proteins known to contain a BED domain include animal, plant and fungal AC1 and Hobo-like transposases; Caenorhabditis elegans Dpy-20 protein, a predicted cuticular gene transcriptional regulator; Drosophila BEAF (boundary element-associated factor), thought to be involved in chromatin insulation; Drosophila DREF, a transcriptional regulator for S-phase genes; and tobacco 3AF1 and tomato E4/E8-BP1, light- and ethylene-regulated DNA-binding proteins that contain two BED fingers.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain has been detected by our approach:
    PY04088   


    PF02893 - GRAM (Pfam link)

    Interpro entry IPR004182 : (Interpro link)

    Pfam description:
    The GRAM domain is found in glucosyltransferases, myotubularins and other putative membrane-associated proteins.

    Interpro description:

    The GRAM domain is found in glucosyltransferases, myotubularins and other putative membrane-associated proteins. It is normally about 70 amino acids in length. It is thought to be an intracellular protein-binding or lipid-binding signalling domain, which has an important function in membrane-associated processes. Mutations in the GRAM domain of myotubularins cause a muscle disease, which suggests that the domain is essential for the full function of the enzyme. Myotubularin-related proteins are a large subfamily of protein tyrosine phosphatases (PTPs) that dephosphorylate D3-phosphorylated inositol lipids.

    Proteins where this domain has been detected by our approach:
    PY01158   


    PF02902 - Peptidase_C48 (Pfam link)

    Interpro entry IPR003653 : Peptidase C48, SUMO/Sentrin/Ubl1 (Interpro link)

    Pfam description:
    This domain contains the catalytic triad Cys-His-Asn.

    Interpro description:

    In the MEROPS database, peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in their two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad.

    This group of proteins contains cysteine peptidases belonging to MEROPS peptidase family C48 (Ulp1 endopeptidase family, clan CE). The protein fold of the peptidase domain for members of this family resembles that of adenain, the type example for clan CE. This group of sequences also contains a number of hypothetical proteins, which have not yet been characterised, and non-peptidase homologues. These are proteins that have either been found experimentally to be without peptidase activity, or lack amino acid residues that are believed to be essential for the catalytic activity of the peptidases in the family.

    The Ulp1 endopeptidase family contains deubiquitinating enzymes (DUBs) that can de-conjugate ubiquitin or ubiquitin-like proteins from ubiquitin-conjugated proteins. DUBs can be classified into three families according to sequence homology: ubiquitin carboxyl-terminal hydrolases (UCH), ubiquitin-specific processing proteases (UBP), and ubiquitin-like proteases (ULP) specific for de-conjugating ubiquitin-like proteins. In contrast to the UBP pathway, which is very redundant (16 UBP enzymes in yeast), there are few ubiquitin-like proteases (only one in yeast, Ulp1).

    Ulp1 catalyses two critical functions in the SUMO/Smt3 pathway via its cysteine protease activity. Ulp1 processes the Smt3 C-terminal sequence (-GGATY) to its mature form (-GG), and it de-conjugates Smt3 from the lysine epsilon-amino group of the target protein.
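    The maturation step (-GGATY to -GG) can be illustrated as a simple string operation on a one-letter sequence. This is a toy model under a stated assumption: the function name `mature_c_terminus` is hypothetical, the cleavage rule "cut after the last Gly-Gly" is a simplification, and the precursor used below is a placeholder, not the real Smt3 sequence.

```python
def mature_c_terminus(precursor: str) -> str:
    """Toy model of Ulp1 processing: keep everything up to and including the
    last Gly-Gly (GG) dipeptide and drop the C-terminal extension after it.

    Assumes the relevant GG is the last one in the sequence; real substrate
    recognition depends on the protein's structure, not on this string rule.
    """
    idx = precursor.rfind("GG")
    if idx == -1:
        raise ValueError("no Gly-Gly motif in precursor")
    return precursor[: idx + 2]


# Hypothetical precursor: placeholder residues followed by the -GGATY tail.
print(mature_c_terminus("XXXXGGATY"))
```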

    The crystal structure of yeast Ulp1 bound to Smt3 revealed that the catalytic and interaction interface is situated in a shallow, narrow cleft where conserved residues recognise the Gly-Gly motif at the C-terminal extremity of the Smt3 protein. Ulp1 adopts a novel architecture despite some structural similarity with other cysteine proteases. The secondary structure is composed of seven alpha helices and seven beta strands. The catalytic domain includes the central alpha helix, beta-strands 4 to 6, and the catalytic triad (Cys-His-Asp). This profile is directed against the C-terminal part of ULP proteins that displays full proteolytic activity.

    Proteins where this domain is known:
    PY02388    PY03464   


    PF02910 - Succ_DH_flav_C (Pfam link)

    Interpro entry IPR004112 : Fumarate reductase/succinate dehydrogenase flavoprotein, C-terminal (Interpro link)

    Pfam description:
    This family contains fumarate reductases, succinate dehydrogenases and L-aspartate oxidases.

    Interpro description:

    In bacteria, two distinct, membrane-bound, enzyme complexes are responsible for the interconversion of fumarate and succinate: fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulphur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome b.

    In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) is an enzyme composed of two subunits: an FAD flavoprotein and an iron-sulphur protein.

    The flavoprotein subunit is a protein of about 60 to 70 kDa, in which FAD is covalently bound to a histidine residue located in the N-terminal section of the protein. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species.

    This family includes members that bind FAD such as the flavoprotein subunits from succinate and fumarate dehydrogenase, aspartate oxidase and the alpha subunit of adenylylsulphate reductase.

    Proteins where this domain is known:
    PY05468   


    PF02911 - Formyl_trans_C (Pfam link)

    Interpro entry IPR005793 : Formyl transferase, C-terminal (Interpro link)

    Interpro description:

    Methionyl-tRNA formyltransferase transfers a formyl group onto the amino terminus of the acyl moiety of methionyl-tRNA. The formyl group appears to play a dual role in the initiator identity of N-formylmethionyl-tRNA by promoting its recognition by IF2 and by impairing its binding to EF-Tu-GTP. This family also includes formyltetrahydrofolate dehydrogenases, which produce formate from formyl-tetrahydrofolate. These enzymes contain an N-terminal domain in common with other formyl transferase enzymes. The C-terminal domain has an open beta-barrel fold.

    Proteins where this domain is known:
    PY03768   


    PF02919 - Topoisom_I_N (Pfam link)

    Interpro entry IPR008336 : DNA topoisomerase I, DNA binding, eukaryotic-type (Interpro link)

    Pfam description:
    Topoisomerase I promotes the relaxation of DNA superhelical tension by introducing a transient single-stranded break in duplex DNA and is vital for the processes of replication, transcription and recombination. This family may comprise more than one structural domain.

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type I topoisomerases are ATP-independent enzymes (except for reverse gyrase), and can be subdivided according to their structure and reaction mechanisms: type IA (bacterial and archaeal topoisomerase I, topoisomerase III and reverse gyrase) and type IB (eukaryotic topoisomerase I and topoisomerase V). These enzymes are primarily responsible for relaxing positively and/or negatively supercoiled DNA, except for reverse gyrase, which can introduce positive supercoils into DNA.
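    The classification in the two paragraphs above can be summarised as a small table. The sketch below is a hypothetical encoding (the names `Topo`, `TOPOISOMERASES` and `members` are illustrative); fields the text does not specify, such as the ATP dependence of type II enzymes, are left as `None` rather than filled in.

```python
from typing import NamedTuple, Optional


class Topo(NamedTuple):
    family: str                    # "I" or "II"
    subtype: Optional[str]         # "IA" or "IB"; None where the text gives none
    strands_cut: int               # 1 = single-strand break, 2 = double-strand break
    atp_dependent: Optional[bool]  # None where the text above is silent


# Classification transcribed from the text above.
TOPOISOMERASES = {
    "bacterial/archaeal topoisomerase I": Topo("I", "IA", 1, False),
    "topoisomerase III": Topo("I", "IA", 1, False),
    "reverse gyrase": Topo("I", "IA", 1, True),
    "eukaryotic topoisomerase I": Topo("I", "IB", 1, False),
    "topoisomerase V": Topo("I", "IB", 1, False),
    "topoisomerase II": Topo("II", None, 2, None),
    "topoisomerase IV": Topo("II", None, 2, None),
    "topoisomerase VI": Topo("II", None, 2, None),
}


def members(subtype: str) -> list:
    """Names of the enzymes assigned to a given type I subtype, sorted."""
    return sorted(n for n, t in TOPOISOMERASES.items() if t.subtype == subtype)


print(members("IB"))
```

    Querying `members("IB")` returns the two enzymes the text places in that subtype, eukaryotic topoisomerase I (the subject of this entry) and topoisomerase V.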

    This entry represents the N-terminal DNA-binding domain found in eukaryotic topoisomerase I, which is a type IB enzyme. To cleave the DNA backbone, these enzymes form a transient phosphotyrosine bond. The N-terminal domain of human topoisomerase I is thought to coordinate the restriction of free strand rotation during the topoisomerisation step of catalysis. A conserved tryptophan residue may be important for the DNA-interaction ability of the N-terminal domain. Human topoisomerase I has been shown to be inhibited by camptothecin (CPT), a plant alkaloid with antitumour activity. A binding mode for the anticancer drug camptothecin has been proposed on the basis of chemical and biochemical information combined with the three-dimensional structures of topoisomerase I-DNA complexes.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY05226   


    PF02933 - CDC48_2 (Pfam link)

    Interpro entry IPR004201 : Cell division protein 48, CDC48, domain 2 (Interpro link)

    Pfam description:
    This domain has a double psi-beta barrel fold and includes VCP-like ATPase and N-ethylmaleimide sensitive fusion protein N-terminal domains. Both the VAT and NSF N-terminal functional domains consist of two structural domains, of which this is the C-terminal one. The VAT-N domain found in AAA ATPases Pfam:PF00004 is a 185-residue substrate recognition domain.

    Interpro description:
    This domain has a double psi-beta barrel fold and includes VCP-like ATPase and N-ethylmaleimide sensitive fusion protein N-terminal domains. Both the VAT and NSF N-terminal functional domains consist of two structural domains, of which this is the C-terminal one. The VAT-N domain found in AAA ATPases is a 185-residue substrate recognition domain.

    Proteins where this domain is known:
    PY03639   

    Proteins where this domain has been detected by our approach:
    PY05628    PY05787   


    PF02934 - GatB_N (Pfam link)

    Interpro entry IPR006075 : Glutamyl-tRNA(Gln) amidotransferase, subunit B/E, N-terminal region (Interpro link)

    Interpro description:

    Glutamyl-tRNA(Gln) amidotransferase is a microbial enzyme that furnishes a means for formation of correctly charged Gln-tRNA(Gln) through the transamidation of misacylated Glu-tRNA(Gln) in organisms that lack glutaminyl-tRNA synthetase. The reaction takes place in the presence of glutamine and ATP through an activated gamma-phospho-Glu-tRNA(Gln). The enzyme is composed of three subunits: A (an amidase), B and C. Subunit B also exists in eukaryotes as a protein targeted to the mitochondria.
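    As a worked summary, and assuming the commonly described scheme in which the amidase subunit A releases ammonia from glutamine (a step not spelled out above), the transamidation via the gamma-phospho intermediate can be written as follows (charges and protons omitted):

```latex
\begin{aligned}
\text{Glu-tRNA}^{\text{Gln}} + \text{ATP} &\rightarrow \gamma\text{-phospho-Glu-tRNA}^{\text{Gln}} + \text{ADP} \\
\text{Gln} + \text{H}_2\text{O} &\rightarrow \text{Glu} + \text{NH}_3 \\
\gamma\text{-phospho-Glu-tRNA}^{\text{Gln}} + \text{NH}_3 &\rightarrow \text{Gln-tRNA}^{\text{Gln}} + \text{P}_i \\[4pt]
\text{net:}\quad \text{Glu-tRNA}^{\text{Gln}} + \text{Gln} + \text{ATP} + \text{H}_2\text{O} &\rightarrow \text{Gln-tRNA}^{\text{Gln}} + \text{Glu} + \text{ADP} + \text{P}_i
\end{aligned}
```

    Summing the three steps cancels the activated intermediate and the ammonia, leaving the net equation on the last line.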

    Proteins where this domain is known:
    PY03547   


    PF02940 - mRNA_triPase (Pfam link)

    Interpro entry IPR004206 : mRNA capping enzyme, beta subunit (Interpro link)

    Pfam description:
    The beta chain of mRNA capping enzyme has triphosphatase activity. The function of the capping enzyme also depends on the guanylyltransferase activity conferred by the alpha chain (see Pfam:PF01331).

    Interpro description:
    The mRNA capping enzyme in yeasts is composed of two separate subunits, an mRNA guanylyltransferase and an RNA 5'-triphosphatase. This is the beta subunit of mRNA capping enzyme, which has triphosphatase activity. The beta chain (polynucleotide 5'-phosphatase) converts the 5'-triphosphate end of a nascent mRNA chain into a diphosphate in the first step of mRNA capping. The function of the capping enzyme also depends on the guanylyltransferase activity conferred by the alpha chain.
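    The triphosphatase step described above, followed by the guanylyltransferase step it enables, can be summarised as below. Here pppN-RNA is a shorthand (introduced for this sketch) for the 5'-triphosphate end of the nascent transcript; charges and protons are omitted.

```latex
\begin{aligned}
\text{pppN-RNA} + \text{H}_2\text{O} &\rightarrow \text{ppN-RNA} + \text{P}_i && \text{(beta chain, triphosphatase)} \\
\text{GTP} + \text{ppN-RNA} &\rightarrow \text{GpppN-RNA} + \text{PP}_i && \text{(alpha chain, guanylyltransferase)}
\end{aligned}
```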

    Proteins where this domain is known:
    PY03669   


    PF02953 - zf-Tim10_DDP (Pfam link)

    Interpro entry IPR004217 : Zinc finger, Tim10/DDP-type (Interpro link)

    Pfam description:
    Putative zinc binding domain with four conserved cysteine residues. This domain is found in the human disease protein Swiss:O60220. Members of this family such as Tim9 and Tim10 are involved in mitochondrial protein import. Members of this family seem to be localised to the mitochondrial intermembrane space.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents a putative zinc binding domain with four conserved cysteine residues. This domain is found in the human disease protein Deafness Dystonia Protein 1. Members of this family such as Tim9 and Tim10 are involved in mitochondrial protein import. Members of this family seem to be localised to the mitochondrial intermembrane space.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY04101    PY04662    PY05705    PY06161    PY07259   


    PF02966 - DIM1 (Pfam link)

    Interpro entry IPR004123 : mRNA splicing factor, thioredoxin-like U5 snRNP (Interpro link)

    Interpro description:

    Thioredoxins are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of 2 cysteine thiol groups to a disulphide, accompanied by the transfer of 2 electrons and 2 protons. The net result is the covalent interconversion of a disulphide and a dithiol.
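    The net interconversion described above can be written as a half-reaction and, to illustrate thioredoxin (Trx) acting as a disulphide reductase, as a thiol-disulphide exchange with a target protein. R and R' are generic placeholders for the two cysteine-bearing chains, introduced for this sketch.

```latex
\begin{aligned}
\text{R-S-S-R'} + 2e^- + 2\text{H}^+ &\rightleftharpoons \text{R-SH} + \text{R'-SH} \\
\text{Trx-(SH)}_2 + \text{protein-S}_2 &\rightleftharpoons \text{Trx-S}_2 + \text{protein-(SH)}_2
\end{aligned}
```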

    Compared to human thioredoxin, human U5 snRNP-specific protein U5-15kD contains 37 additional residues that may cause structural changes which most likely form putative binding sites for other spliceosomal proteins or RNA. Although U5-15kD apparently lacks protein disulphide isomerase activity, it is strictly required for pre-mRNA splicing.

    Proteins where this domain is known:
    PY07357   

    Proteins where this domain has been detected by our approach:
    PY03715   


    PF02978 - SRP_SPB (Pfam link)

    Interpro entry IPR004125 : Signal recognition particle, SRP54 subunit, M-domain (Interpro link)

    Interpro description:

    The signal recognition particle (SRP) is a multimeric ribonucleoprotein complex which, along with its cognate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes. SRP recognises the signal sequence of the nascent polypeptide on the ribosome, retards its elongation, and docks the SRP-ribosome-polypeptide complex to the RER membrane via SR. SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300-nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with SR. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane.

    This entry represents the M domain of the 54 kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 of the signal recognition particle has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M domain, which binds both the 7S RNA and the signal sequence. The extreme C-terminal region is glycine-rich, of low complexity and poorly conserved between species.

    These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homolog of ftsY; and bacterial flagellar biosynthesis protein flhF.

    Proteins where this domain is known:
    PY06341   


    PF02985 - HEAT (Pfam link)

    Interpro entry IPR000357 : (Interpro link)

    Pfam description:
    The HEAT repeat family is related to armadillo/beta-catenin-like repeats (see Pfam:PF00514).

    Interpro description:

    The HEAT repeat is a tandemly repeated module, 37-47 amino acids long, occurring in a number of cytoplasmic proteins, including the four name-giving proteins: huntingtin, elongation factor 3 (EF3), the 65 kDa alpha regulatory subunit of protein phosphatase 2A (PP2A) and the yeast PI3-kinase TOR1. Arrays of HEAT repeats consist of 3 to 36 units forming a rod-like helical structure and appear to function as protein-protein interaction surfaces. It has been noted that many HEAT repeat-containing proteins are involved in intracellular transport processes.
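    The quoted ranges (37-47 residues per repeat, 3 to 36 repeats per array) imply simple length bounds. The sketch below, with hypothetical helper names, computes the shortest and longest possible array and, for a region of a given length, which repeat counts are compatible with those ranges. It assumes the region is covered end to end by repeats, with no linkers or insertions.

```python
import math

# Bounds quoted in the text above.
REPEAT_MIN, REPEAT_MAX = 37, 47  # residues per HEAT repeat
UNITS_MIN, UNITS_MAX = 3, 36     # repeats per array


def array_length_bounds() -> tuple:
    """Shortest and longest HEAT array implied by the quoted ranges."""
    return UNITS_MIN * REPEAT_MIN, UNITS_MAX * REPEAT_MAX


def plausible_repeat_counts(region_len: int) -> range:
    """Repeat counts n with 37*n <= region_len <= 47*n, restricted to 3..36.

    Assumes the region is tiled by repeats with no linkers between them.
    """
    lo = max(UNITS_MIN, math.ceil(region_len / REPEAT_MAX))
    hi = min(UNITS_MAX, region_len // REPEAT_MIN)
    return range(lo, hi + 1)  # empty when lo > hi


print(array_length_bounds())
print(list(plausible_repeat_counts(400)))
```

    For example, a 400-residue region is compatible with 9 or 10 repeats, while a 100-residue region is too short for the minimum of three.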

    In the crystal structure of PP2A PR65/A, the HEAT repeats consist of pairs of antiparallel alpha helices, as predicted.

    Proteins where this domain is known:
    PY00155    PY01063    PY01164    PY01941    PY02189    PY02341    PY02706    PY04443    PY04701    PY05481   

    Proteins where this domain has been detected by our approach:
    PY01200    PY01672    PY06655   


    PF02990 - EMP70 (Pfam link)

    Interpro entry IPR004240 : Nonaspanin (TM9SF) (Interpro link)

    Interpro description:
    The transmembrane 9 superfamily protein (TM9SF) may function as a channel or small molecule transporter. Proteins in this group are endosomal integral membrane proteins.

    Proteins where this domain is known:
    PY02254   


    PF02991 - MAP1_LC3 (Pfam link)

    Interpro entry IPR004241 : (Interpro link)

    Pfam description:
    Light chain 3 is proposed to function primarily as a subunit of microtubule associated proteins 1A and 1B and that its expression may regulate microtubule binding activity.

    Interpro description:
    Light chain 3 (LC3) may function primarily as a subunit of the neuronal microtubule-associated proteins (MAPs) MAP1A and MAP1B, and its expression may regulate their microtubule-binding activity. Related proteins that belong to this group include the human ganglioside expression factor and a symbiosis-related fungal protein.

    Proteins where this domain is known:
    PY00606   


    PF02994 - Transposase_22 (Pfam link)

    Interpro entry IPR004244 : (Interpro link)

    Interpro description:

    Many human L1 elements are capable of retrotransposition. Some of these have been shown to exhibit reverse transcriptase (RT) activity, although the function of many is as yet unknown.

    More information about these proteins can be found at Protein of the Month: Transposase.

    Proteins where this domain is known:
    PY06984    PY07579   


    PF02996 - Prefoldin (Pfam link)

    Interpro entry IPR004127 : Prefoldin alpha-like (Interpro link)

    Pfam description:
    This family comprises several prefoldin subunits. The biogenesis of the cytoskeletal proteins actin and tubulin involves interaction of nascent chains of each of the two proteins with the oligomeric protein prefoldin (PFD) and their subsequent transfer to the cytosolic chaperonin CCT (chaperonin containing TCP-1). Electron microscopy shows that eukaryotic PFD, which has a similar structure to its archaeal counterpart, interacts with unfolded actin along the tips of its projecting arms. In its PFD-bound state, actin seems to acquire a conformation similar to that adopted when it is bound to CCT.

    Interpro description:

    Prefoldin (PFD) is a chaperone that interacts exclusively with type II chaperonins, hetero-oligomers lacking an obligate co-chaperonin that are found only in eukaryotes (chaperonin-containing T-complex polypeptide-1 (CCT)) and archaea. Eukaryotic PFD is a multi-subunit complex containing six polypeptides in the molecular mass range of 14-23 kDa. In archaea, on the other hand, PFD is composed of two types of subunits, two alpha and four beta. The six subunits associate to form two back-to-back up-and-down eight-stranded barrels, from which hang six coiled coils. Each subunit contributes one (beta subunits) or two (alpha subunits) beta hairpin turns to the barrels. The coiled coils are formed by the N and C termini of an individual subunit. Overall, this unique arrangement resembles a jellyfish. The eukaryotic PFD hexamer is composed of six different subunits; however, these can be grouped into two alpha-like (PFD3 and -5) and four beta-like (PFD1, -2, -4, and -6) subunits based on amino acid sequence similarity with their archaeal counterparts. Eukaryotic PFD has a six-legged structure similar to that seen in the archaeal homologue. This family contains the archaeal alpha subunit, eukaryotic prefoldin subunits 3 and 5 and the UXT (ubiquitously expressed transcript) family.
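    The subunit stoichiometry can be checked against the barrel description: two alpha subunits contributing two hairpins each plus four beta subunits contributing one each give eight hairpins, i.e. 16 strands, matching two eight-stranded barrels. The sketch below encodes that count. The names are hypothetical, and applying the same hairpin numbers to the eukaryotic alpha-like and beta-like subunits is an assumption by analogy, not something the text states.

```python
# Beta hairpins contributed to the barrels per subunit type (from the text).
HAIRPINS_PER_SUBUNIT = {"alpha": 2, "beta": 1}

# Eukaryotic subunits grouped by similarity to the archaeal alpha/beta types.
EUKARYOTIC_PFD = {
    "PFD1": "beta", "PFD2": "beta", "PFD3": "alpha",
    "PFD4": "beta", "PFD5": "alpha", "PFD6": "beta",
}


def barrel_strands(subunit_types) -> int:
    """Total beta strands in the barrels: each hairpin contributes two strands."""
    hairpins = sum(HAIRPINS_PER_SUBUNIT[t] for t in subunit_types)
    return 2 * hairpins


archaeal = ["alpha"] * 2 + ["beta"] * 4
print(barrel_strands(archaeal))  # 16, i.e. two eight-stranded barrels
print(barrel_strands(EUKARYOTIC_PFD.values()))
```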

    Eukaryotic PFD has been shown to bind both actin and tubulin co-translationally. The chaperone then delivers the target protein to CCT, interacting with the chaperonin through the tips of the coiled coils. No authentic target proteins of any archaeal PFD have been identified, to date.

    Proteins where this domain is known:
    PY01952   


    PF03009 - GDPD (Pfam link)

    Interpro entry IPR004129 : Glycerophosphoryl diester phosphodiesterase (Interpro link)

    Pfam description:
    E. coli has two sequence-related isozymes of glycerophosphoryl diester phosphodiesterase (GDPD): periplasmic and cytosolic. This family also includes agrocinopine synthase, whose similarity to GDPD has been noted. This family appears to have weak but not significant matches to mammalian phospholipase C Pfam:PF00388, which suggests that this family may adopt a TIM barrel fold.

    Interpro description:
    Glycerophosphoryl diester phosphodiesterases display broad specificity for glycerophosphodiesters: glycerophosphocholine, glycerophosphoethanolamine, glycerophosphoglycerol and bis(glycerophosphoglycerol) are all hydrolysed by this enzyme.
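    As a worked example of the hydrolysis, taking glycerophosphocholine as the substrate: the general reaction yields sn-glycerol 3-phosphate and the corresponding alcohol. The products are given here from general biochemistry rather than from the description above; charges and protons are omitted.

```latex
\begin{aligned}
\text{glycerophosphodiester} + \text{H}_2\text{O} &\rightarrow \textit{sn}\text{-glycerol 3-phosphate} + \text{alcohol} \\
\text{glycerophosphocholine} + \text{H}_2\text{O} &\rightarrow \textit{sn}\text{-glycerol 3-phosphate} + \text{choline}
\end{aligned}
```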

    Proteins where this domain is known:
    PY03220   


    PF03028 - Dynein_heavy (Pfam link)

    Interpro entry IPR004273 : Dynein heavy chain (Interpro link)

    Pfam description:
    This family represents the C-terminal region of the dynein heavy chain. The chain also has ATPase activity and microtubule-binding ability and acts as a motor for the movement of organelles and vesicles along microtubules. Dynein is also involved in cilia and flagella movement. The dynein subunit consists of at least two heavy chains and a number of intermediate and light chains (see Pfam:PF01221).

    Interpro description:

    Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules.

    Dynein is composed of a number of ATP-binding large subunits, intermediate size subunits and small subunits. This family represents the C-terminal region of dynein heavy chain. The dynein heavy chain also exhibits ATPase activity and microtubule binding ability and acts as a motor for the movement of organelles and vesicles along microtubules.

    Proteins where this domain is known:
    PY00078    PY01468    PY01526    PY02768    PY03506    PY04544    PY04739   


    PF03029 - ATP_bind_1 (Pfam link)

    Interpro entry IPR004130 : Protein of unknown function, ATP binding (Interpro link)

    Pfam description:
    Members of this family are found in a range of archaea and eukaryotes and have hypothesised ATP binding activity.

    Interpro description:
    Members of this family are found in a range of archaea and eukaryotes and have hypothesised ATP binding activity.

    Proteins where this domain is known:
    PY00114    PY01480    PY06079   


    PF03030 - H_PPase (Pfam link)

    Interpro entry IPR004131 : Inorganic H+ pyrophosphatase (Interpro link)

    Pfam description:
    The H+ pyrophosphatase is a transmembrane proton pump involved in establishing the H+ electrochemical potential difference between the vacuole lumen and the cell cytosol. Vacuolar-type H(+)-translocating inorganic pyrophosphatases were long considered to be restricted to plants and a few species of phototrophic bacteria. However, more recent investigations have found these pyrophosphatases in organisms as disparate as thermophilic Archaea and parasitic protists.

    Interpro description:

    Two types of proteins that hydrolyse inorganic pyrophosphate (PPi), very different in both amino acid sequence and structure, have been characterised to date: soluble and membrane-bound proton-pumping pyrophosphatases (sPPases and H(+)-PPases, respectively). sPPases are ubiquitous proteins that hydrolyse PPi to release heat, whereas H+-PPases, so far unidentified in animal and fungal cells, couple the energy of PPi hydrolysis to proton movement across biological membranes. The latter type is represented by this group of proteins. H+-PPases are also called vacuolar-type inorganic pyrophosphatases (V-PPase) or pyrophosphate-energised vacuolar membrane proton pumps. In plants, vacuoles contain two enzymes for acidifying the interior of the vacuole, the V-ATPase and the V-PPase (V is for vacuolar).

    Two distinct biochemical subclasses of H+-PPases have been characterised to date: K+-stimulated and K+-insensitive.

    Proteins where this domain is known:
    PY00513    PY06858   


    PF03031 - NIF (Pfam link)

    Interpro entry IPR004274 : (Interpro link)

    Pfam description:
    This family contains a number of NLI interacting factor isoforms (e.g. Swiss:Q9PTJ8) and also the N-terminal regions of RNA polymerase II CTD phosphatase (Swiss:Q9Y5B0) and FCP1 serine phosphatase (Swiss:Q9PT70). This region has been identified as the minimal phosphatase domain.

    Interpro description:
    The function of this domain is unclear. It is found in proteins of diverse function, including phosphatases (some of which may be active in ternary elongation complexes) and a number of NLI interacting factors. In the phosphatases, this domain is often N-terminal to the BRCT domain.

    Proteins where this domain is known:
    PY00226    PY00230    PY02099    PY03480    PY04698   

    Proteins where this domain has been detected by our approach:
    PY04958   


    PF03034 - PSS (Pfam link)

    Interpro entry IPR004277 : Phosphatidyl serine synthase (Interpro link)

    Pfam description:
    Phosphatidyl serine synthase is also known as serine exchange enzyme. This family represents eukaryotic PSS I and II, membrane-bound proteins that catalyse the replacement of the head group of a phospholipid (phosphatidylcholine or phosphatidylethanolamine) by L-serine.

    Interpro description:
    Phosphatidyl serine synthase is also known as serine exchange enzyme. This family represents eukaryotic PSS I and II, membrane-bound proteins that catalyse the replacement of the head group of a phospholipid (phosphatidylcholine or phosphatidylethanolamine) by L-serine.

    Proteins where this domain is known:
    PY06104   


    PF03054 - tRNA_Me_trans (Pfam link)

    Interpro entry IPR018318 : (Interpro link)

    Pfam description:
    This family represents tRNA(5-methylaminomethyl-2-thiouridine)-methyltransferase which is involved in the biosynthesis of the modified nucleoside 5-methylaminomethyl-2-thiouridine present in the wobble position of some tRNAs.

    Interpro description:
    tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase catalyses the addition of 5-methylaminomethyl-2-thiouridylate to tRNAs using S-adenosyl-L-methionine as a substrate and releasing S-adenosyl-L-homocysteine. The enzyme is cytoplasmic and is involved in tRNA processing.

    Proteins where this domain is known:
    PY00607    PY03479   


    PF03062 - MBOAT (Pfam link)

    Interpro entry IPR004299 : (Interpro link)

    Pfam description:
    The MBOAT (membrane bound O-acyl transferase) family of membrane proteins contains a variety of acyltransferase enzymes. A conserved histidine has been suggested to be the active site residue.

    Interpro description:
    The MBOAT (membrane bound O-acyl transferase) family of membrane proteins contains a variety of acyltransferase enzymes. A conserved histidine has been suggested to be the active site residue.

    Proteins where this domain is known:
    PY01256   


    PF03074 - GCS (Pfam link)

    Interpro entry IPR004308 : Glutamate-cysteine ligase catalytic subunit (Interpro link)

    Pfam description:
    This family represents the catalytic subunit of glutamate-cysteine ligase (E.C. 6.3.2.2), also known as gamma-glutamylcysteine synthetase (GCS). This enzyme catalyses the rate-limiting step in the biosynthesis of glutathione. The eukaryotic enzyme is a dimer of a heavy chain and a light chain, with all the catalytic activity exhibited by the heavy chain (this family).

    Interpro description:
    This family represents the catalytic subunit of glutamate-cysteine ligase, also known as gamma-glutamylcysteine synthetase (GCS). This enzyme catalyses the rate-limiting step in the biosynthesis of glutathione. The eukaryotic enzyme is a dimer of a heavy chain and a light chain, with all the catalytic activity exhibited by the heavy chain.

    Proteins where this domain is known:
    PY01606   


    PF03083 - MtN3_slv (Pfam link)

    Interpro entry IPR018169 : (Interpro link)

    Pfam description:
    This family includes proteins such as Drosophila saliva, MtN3 involved in root nodule development and a protein involved in activation and expression of recombination activation genes (RAGs). Although the molecular function of these proteins is unknown, they are almost certainly transmembrane proteins. This family contains a region of two transmembrane helices that is found in two copies in most members of the family.

    Interpro description:
    This family includes proteins such as Drosophila saliva, MtN3 involved in root nodule development and a protein involved in activation and expression of recombination activation genes (RAGs). Although the molecular function of these proteins is unknown, they are almost certainly transmembrane proteins. This family contains a region of two transmembrane helices that is found in two copies in most members of the family.

    Proteins where this domain is known:
    PY01249   


    PF03092 - BT1 (Pfam link)

    Interpro entry IPR004324 : (Interpro link)

    Pfam description:
    Members of this family are transmembrane proteins. Several are Leishmania putative proteins that are thought to be pteridine transporters. One such protein, Swiss:Q25272, previously termed (and still annotated as) ORFG, was shown by analysis of null mutants to encode a biopterin transport protein, and was subsequently renamed BT1. The significant similarity of ORFG/BT1 to Trypanosoma brucei ESAG10 (a putative transmembrane protein and another member of this family) was previously noted. This family also contains five putative Arabidopsis thaliana proteins of unknown function. In addition, it contains two predicted prokaryotic proteins (from the cyanobacteria Synechocystis and Synechococcus).

    Interpro description:
    Members of this family are transmembrane proteins. Several are Leishmania putative proteins that are thought to be pteridine transporters. This family also contains five putative Arabidopsis thaliana proteins of unknown function as well as two predicted prokaryotic proteins (from the cyanobacteria Synechocystis and Synechococcus).

    Proteins where this domain is known:
    PY02152    PY07338   


    PF03095 - PTPA (Pfam link)

    Interpro entry IPR004327 : Phosphotyrosyl phosphatase activator, PTPA (Interpro link)

    Pfam description:
    Phosphotyrosyl phosphatase activator (PTPA) proteins stimulate the phosphotyrosyl phosphatase (PTPase) activity of the dimeric form of protein phosphatase 2A (PP2A). PTPase activity in PP2A (in vitro) is relatively low compared to the better-recognised phosphoserine/threonine protein phosphatase activity. The specific biological role of PTPA is unknown. Basal expression of PTPA depends on the activity of a ubiquitous transcription factor, Yin Yang 1 (YY1). The tumour suppressor protein p53 can inhibit PTPA expression through an unknown mechanism that negatively controls YY1.

    Interpro description:
    Phosphotyrosyl phosphatase activator (PTPA) proteins stimulate the phosphotyrosyl phosphatase (PTPase) activity of the dimeric form of protein phosphatase 2A (PP2A). PTPase activity in PP2A (in vitro) is relatively low compared to the better-recognized phosphoserine/threonine protein phosphatase activity. The specific biological role of PTPA is unknown. Basal expression of PTPA depends on the activity of a ubiquitous transcription factor, Yin Yang 1 (YY1). The tumour suppressor protein p53 can inhibit PTPA expression through an unknown mechanism that negatively controls YY1.

    Proteins where this domain is known:
    PY01269   


    PF03099 - BPL_LipA_LipB (Pfam link)

    Interpro entry IPR004143 : Biotin/lipoate A/B protein ligase (Interpro link)

    Pfam description:
    This family includes biotin protein ligase and lipoate-protein ligases A and B. Biotin is covalently attached at the active site of certain enzymes that transfer carbon dioxide from bicarbonate to organic acids to form cellular metabolites. Biotin protein ligase (BPL) is the enzyme responsible for attaching biotin to a specific lysine at the active site of biotin enzymes. Each organism probably has only one BPL. Biotin attachment is a two-step reaction that results in the formation of an amide linkage between the carboxyl group of biotin and the epsilon-amino group of the modified lysine. Lipoate-protein ligase A (LPLA) catalyses the formation of an amide linkage between lipoic acid and a specific lysine residue in lipoate-dependent enzymes.

    Interpro description:
    This domain is found in biotin protein ligase and lipoate-protein ligases A and B. Biotin is covalently attached at the active site of certain enzymes that transfer carbon dioxide from bicarbonate to organic acids to form cellular metabolites. Biotin protein ligase (BPL) is the enzyme responsible for attaching biotin to a specific lysine at the active site of biotin enzymes. Each organism probably has only one BPL. Biotin attachment is a two-step reaction that results in the formation of an amide linkage between the carboxyl group of biotin and the epsilon-amino group of the modified lysine. Lipoate-protein ligase A (LPLA) catalyses the formation of an amide linkage between lipoic acid and a specific lysine residue in lipoate-dependent enzymes.

    Proteins where this domain is known:
    PY00475    PY01917    PY02395    PY04501    PY06060   


    PF03104 - DNA_pol_B_exo (Pfam link)

    Interpro entry IPR006133 : DNA-directed DNA polymerase, family B, exonuclease (Interpro link)

    Pfam description:
    This domain has 3' to 5' exonuclease activity and adopts a ribonuclease H type fold.

    Interpro description:

    DNA is the biological information that instructs cells how to exist in an ordered fashion: accurate replication is thus one of the most important events in the life cycle of a cell. This function is performed by DNA-directed DNA polymerases, which add deoxyribonucleoside triphosphate (dNTP) residues to the 3'-end of the growing DNA chain, using a complementary DNA chain as a template. Small RNA molecules are generally used as primers for chain elongation, although terminal proteins may also be used for the de novo synthesis of a DNA chain. Even though there are two different methods of priming, these are mediated by two very similar polymerase classes, A and B, with similar methods of chain elongation. A number of DNA polymerases have been grouped under the designation of DNA polymerase family B. Six regions of similarity (numbered from I to VI) are found in all or a subset of the B family polymerases. The most conserved region (I) includes a conserved tetrapeptide with two aspartate residues. Its function is not yet known; however, it has been suggested that it may be involved in binding a magnesium ion. All sequences in the B family contain a characteristic DTDS motif, and possess many functional domains, including a 5'-3' elongation domain, a 3'-5' exonuclease domain, a DNA binding domain, and binding domains for both dNTPs and pyrophosphate.

    This domain has 3' to 5' exonuclease activity and adopts a ribonuclease H type fold.

    Proteins where this domain is known:
    PY00203    PY05353   


    PF03105 - SPX (Pfam link)

    Interpro entry IPR004331 : (Interpro link)

    Pfam description:
    We have named this region the SPX domain after SYG1, Pho81 and XPR1. This 180-residue domain is found at the amino terminus of a variety of proteins. In the yeast protein SYG1, the N-terminus directly binds to the G-protein beta subunit and inhibits transduction of the mating pheromone signal. This finding suggests that all the members of this family are involved in G-protein associated signal transduction. The N-termini of several proteins involved in the regulation of phosphate transport, including the putative phosphate level sensors PHO81 Swiss:P17442 from Saccharomyces cerevisiae and NUC-2 Swiss:Q01317 from Neurospora crassa, are also members of this family. NUC-2 contains several ankyrin repeats (Pfam:PF00023). Several members of this family are annotated as XPR1 proteins: the xenotropic and polytropic retrovirus receptor confers susceptibility to infection with murine leukaemia viruses (MLV). The similarity between SYG1, phosphate regulators and XPR1 sequences has been previously noted, as has the additional similarity to several predicted proteins, of unknown function, from Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. In addition, given the similarities between XPR1 and SYG1 and phosphate regulatory proteins, it has been proposed that XPR1 might be involved in G-protein associated signal transduction and may itself function as a phosphate sensor.

    Interpro description:

    The SPX domain is named after the SYG1/Pho81/XPR1 proteins. This 180-residue domain is found at the amino terminus of a variety of proteins. In the yeast protein SYG1, the N-terminus directly binds to the G-protein beta subunit and inhibits transduction of the mating pheromone signal, suggesting that all the members of this family are involved in G-protein associated signal transduction. The C-termini of these proteins often contain an EXS domain.

    The N-termini of several proteins involved in the regulation of phosphate transport, including the putative phosphate level sensors PHO81 from Saccharomyces cerevisiae and NUC-2 from Neurospora crassa, are also members of this family. NUC-2 contains several ankyrin repeats.

    Several members of this family are the XPR1 proteins: the xenotropic and polytropic retrovirus receptor confers susceptibility to infection with Murine leukemia virus (MLV). The similarity between SYG1, phosphate regulators and XPR1 sequences has been previously noted, as has the additional similarity to several predicted proteins, of unknown function, from Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. In addition, given the similarities between XPR1 and SYG1 and phosphate regulatory proteins, it has been proposed that XPR1 might be involved in G-protein associated signal transduction and may itself function as a phosphate sensor.

    Proteins where this domain is known:
    PY06375   


    PF03107 - C1_2 (Pfam link)

    Interpro entry IPR004146 : (Interpro link)

    Pfam description:
    This short domain is rich in cysteines and histidines. The pattern of conservation is similar to that found in Pfam:PF00130, therefore we have termed this domain DC1 for divergent C1 domain. This domain probably also binds two zinc ions. The function of proteins with this domain is uncertain; however, this domain may bind to molecules such as diacylglycerol (A Bateman pers. obs.). Members of this family are found in plant proteins.

    Interpro description:
    This short domain is rich in cysteines and histidines. The pattern of conservation is similar to that found in DAG_PE-bind, therefore we have termed this domain DC1 for divergent C1 domain. This domain probably also binds two zinc ions. The function of proteins with this domain is uncertain; however, this domain may bind to molecules such as diacylglycerol. Members of this family are found in plant proteins.

    Proteins where this domain has been detected by our approach:
    PY05680   


    PF03109 - ABC1 (Pfam link)

    Interpro entry IPR004147 : (Interpro link)

    Pfam description:
    This family includes ABC1 from yeast and AarF from E. coli. These proteins have a nuclear or mitochondrial subcellular location in eukaryotes. The exact molecular functions of these proteins are not clear; however, yeast ABC1 suppresses a cytochrome b mRNA translation defect and is essential for electron transfer in the bc1 complex, and E. coli AarF is required for ubiquinone production. It has been suggested that members of the ABC1 family are novel chaperonins. These proteins are unrelated to the ABC transporter proteins.

    Interpro description:

    This entry includes ABC1 from yeast and AarF from Escherichia coli. These proteins have a nuclear or mitochondrial subcellular location in eukaryotes. The exact molecular functions of these proteins are not clear; however, yeast ABC1 suppresses a cytochrome b mRNA translation defect and is essential for electron transfer in the bc1 complex, and E. coli AarF is required for ubiquinone production. It has been suggested that members of the ABC1 family are novel chaperonins. These proteins are unrelated to the ABC transporter proteins.

    Proteins where this domain is known:
    PY01655    PY04349   


    PF03129 - HGTP_anticodon (Pfam link)

    Interpro entry IPR004154 : Anticodon-binding (Interpro link)

    Pfam description:
    This domain is found in histidyl, glycyl, threonyl and prolyl tRNA synthetases; it is probably the anticodon-binding domain.

    Interpro description:
    tRNA synthetases, or tRNA ligases, are involved in protein synthesis. This domain is found in histidyl, glycyl, threonyl and prolyl tRNA synthetases; it is probably the anticodon-binding domain.

    Proteins where this domain is known:
    PY01198    PY02018    PY03706    PY06957   

    Proteins where this domain has been detected by our approach:
    PY00927   


    PF03130 - HEAT_PBS (Pfam link)

    Interpro entry IPR004155 : (Interpro link)

    Pfam description:
    This family contains a short bi-helical repeat that is related to Pfam:PF02985. Cyanobacteria and red algae harvest light energy using macromolecular complexes known as phycobilisomes (PBS), peripherally attached to the photosynthetic membrane. The major components of PBS are the phycobiliproteins. These heterodimeric proteins are covalently attached to phycobilins: open-chain tetrapyrrole chromophores, which function as the photosynthetic light-harvesting pigments. Phycobiliproteins differ in sequence and in the nature and number of attached phycobilins to each of their subunits. This family includes the lyase enzymes that specifically attach particular phycobilins to apophycobiliprotein subunits. The most comprehensively studied of these is the CpcE/F lyase (Swiss:P31967, Swiss:P31968), which attaches phycocyanobilin (PCB) to the alpha subunit of apophycocyanin. Similarly, MpeU/V attaches phycoerythrobilin to phycoerythrin II, while CpeY/Z is thought to be involved in phycoerythrobilin (PEB) attachment to phycoerythrin (PE) I (PEs I and II differ in sequence and in the number of attached molecules of PEB: PE I has five, PE II has six).

    All the reactions of the above lyases involve an apoprotein cysteine SH addition to a terminal delta 3,3'-double bond. Such a reaction is not possible in the case of phycoviolobilin (PVB), the phycobilin of alpha-phycoerythrocyanin (alpha-PEC). It is thought that in this case, PCB, not PVB, is first added to apo-alpha-PEC, and is then isomerised to PVB. The addition reaction has been shown to occur in the presence of either of the components of alpha-PEC-PVB lyase PecE or PecF (or both). The isomerisation reaction occurs only when both PecE and PecF components are present, i.e. the PecE/F phycobiliprotein lyase is also a phycobilin isomerase. Another member of this family is the NblB protein Swiss:Q9Z3G5, whose similarity to the phycobiliprotein lyases was previously noted. This constitutively expressed protein is not known to have any lyase activity. It is thought to be involved in the coordination of PBS degradation with environmental nutrient limitation. It has been suggested that the similarity of NblB to the phycobiliprotein lyases is due to the ability to bind tetrapyrrole phycobilins via the common repeated motif.

    Interpro description:

    These proteins contain a short bi-helical repeat that is related to HEAT. Cyanobacteria and red algae harvest light energy using macromolecular complexes known as phycobilisomes (PBS), peripherally attached to the photosynthetic membrane. The major components of PBS are the phycobiliproteins. These heterodimeric proteins are covalently attached to phycobilins: open-chain tetrapyrrole chromophores, which function as the photosynthetic light-harvesting pigments. Phycobiliproteins differ in sequence and in the nature and number of attached phycobilins to each of their subunits. These proteins include the lyase enzymes that specifically attach particular phycobilins to apophycobiliprotein subunits. The most comprehensively studied of these is the CpcE/F lyase, which attaches phycocyanobilin (PCB) to the alpha subunit of apophycocyanin. Similarly, MpeU/V attaches phycoerythrobilin to phycoerythrin II, while CpeY/Z is thought to be involved in phycoerythrobilin (PEB) attachment to phycoerythrin (PE) I (PEs I and II differ in sequence and in the number of attached molecules of PEB: PE I has five, PE II has six).

    All the reactions of the above lyases involve an apoprotein cysteine SH addition to a terminal delta 3,3'-double bond. Such a reaction is not possible in the case of phycoviolobilin (PVB), the phycobilin of alpha-phycoerythrocyanin (alpha-PEC). It is thought that in this case, PCB, not PVB, is first added to apo-alpha-PEC, and is then isomerized to PVB. The addition reaction has been shown to occur in the presence of either of the components of alpha-PEC-PVB lyase PecE or PecF (or both). The isomerisation reaction occurs only when both PecE and PecF components are present, i.e. the PecE/F phycobiliprotein lyase is also a phycobilin isomerase. Another member of this family is the NblB protein, whose similarity to the phycobiliprotein lyases was previously noted. This constitutively expressed protein is not known to have any lyase activity. It is thought to be involved in the coordination of PBS degradation with environmental nutrient limitation. It has been suggested that the similarity of NblB to the phycobiliprotein lyases is due to the ability to bind tetrapyrrole phycobilins via the common repeated motif.

    Proteins where this domain is known:
    PY00741   


    PF03133 - TTL (Pfam link)

    Interpro entry IPR004344 : Tubulin-tyrosine ligase (Interpro link)

    Pfam description:
    Tubulins and microtubules are subjected to several post-translational modifications of which the reversible detyrosination/tyrosination of the carboxy-terminal end of most alpha-tubulins has been extensively analysed. This modification cycle involves a specific carboxypeptidase and the activity of the tubulin-tyrosine ligase (TTL). The true physiological function of TTL has so far not been established. Tubulin-tyrosine ligase (TTL) catalyses the ATP-dependent post-translational addition of a tyrosine to the carboxy terminal end of detyrosinated alpha-tubulin. In normally cycling cells, the tyrosinated form of tubulin predominates. However, in breast cancer cells, the detyrosinated form frequently predominates, with a correlation to tumour aggressiveness. On the other hand, 3-nitrotyrosine has been shown to be incorporated, by TTL, into the carboxy terminal end of detyrosinated alpha-tubulin. This reaction is not reversible by the carboxypeptidase enzyme. Cells cultured in 3-nitrotyrosine rich medium showed evidence of altered microtubule structure and function, including altered cell morphology, epithelial barrier dysfunction, and apoptosis.

    Interpro description:

    Tubulins and microtubules are subjected to several post-translational modifications of which the reversible detyrosination/tyrosination of the carboxy-terminal end of most alpha-tubulins has been extensively analysed. This modification cycle involves a specific carboxypeptidase and the activity of the tubulin-tyrosine ligase (TTL). Tubulin-tyrosine ligase (TTL) catalyses the ATP-dependent post-translational addition of a tyrosine to the carboxy terminal end of detyrosinated alpha-tubulin. The true physiological function of TTL has so far not been established. In normally cycling cells, the tyrosinated form of tubulin predominates. However, in breast cancer cells, the detyrosinated form frequently predominates, with a correlation to tumour aggressiveness.

    3-nitrotyrosine has been shown to be incorporated, by TTL, into the carboxy terminal end of detyrosinated alpha-tubulin. This reaction is not reversible by the carboxypeptidase enzyme. Cells cultured in 3-nitrotyrosine rich medium showed evidence of altered microtubule structure and function, including altered cell morphology, epithelial barrier dysfunction, and apoptosis.

    Proteins where this domain is known:
    PY02412    PY02532    PY02818   


    PF03143 - GTP_EFTU_D3 (Pfam link)

    Interpro entry IPR004160 : Translation elongation factor EFTu/EF1A, C-terminal (Interpro link)

    Pfam description:
    Elongation factor Tu consists of three structural domains; this is the third domain. It adopts a beta barrel structure and is involved in binding both charged tRNA and EF-Ts (Pfam:PF00889).

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    EF1A (also known as EF-1alpha or EF-Tu) is a G-protein. It forms a ternary complex of EF1A-GTP-aminoacyl-tRNA. The binding of aminoacyl-tRNA stimulates GTP hydrolysis by EF1A, causing a conformational change in EF1A that causes EF1A-GDP to detach from the ribosome, leaving the aminoacyl-tRNA attached at the A-site. Only the cognate aminoacyl-tRNA can induce the required conformational change in EF1A through its tight anticodon-codon binding. EF1A-GDP is returned to its active state, EF1A-GTP, through the action of another elongation factor, EF1B (also known as EF-Ts or EF-1beta/gamma/delta).

    EF1A consists of three structural domains. This entry represents the C-terminal domain, which adopts a beta-barrel structure, and is involved in binding to both charged tRNA and to EF1B (or EF-Ts).

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY00361    PY00362    PY02338    PY04385    PY05361    PY06134   


    PF03144 - GTP_EFTU_D2 (Pfam link)

    Interpro entry IPR004161 : Translation elongation factor EFTu/EF1A, domain 2 (Interpro link)

    Pfam description:
    Elongation factor Tu consists of three structural domains; this is the second domain. It adopts a beta barrel structure and is involved in binding charged tRNA. This domain is also found in other proteins such as elongation factor G and translation initiation factor IF-2. It is structurally related to Pfam:PF03143, and in fact has weak sequence matches to that domain.

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    EF1A (also known as EF-1alpha or EF-Tu) is a G-protein. It forms a ternary complex of EF1A-GTP-aminoacyl-tRNA. The binding of aminoacyl-tRNA stimulates GTP hydrolysis by EF1A, causing a conformational change in EF1A that causes EF1A-GDP to detach from the ribosome, leaving the aminoacyl-tRNA attached at the A-site. Only the cognate aminoacyl-tRNA can induce the required conformational change in EF1A through its tight anticodon-codon binding. EF1A-GDP is returned to its active state, EF1A-GTP, through the action of another elongation factor, EF1B (also known as EF-Ts or EF-1beta/gamma/delta).

    EF1A consists of three structural domains. This entry represents domain 2 of EF1A, which adopts a beta-barrel structure and is involved in binding charged tRNA. This domain is structurally related to the C-terminal domain of EF1A, to which it displays weak sequence matches. This domain is also found in other proteins such as translation initiation factor IF-2 and tetracycline-resistance proteins.

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY00361    PY00362    PY00420    PY01864    PY02338    PY02627    PY02880    PY03426    PY04028    PY04385    PY04706    PY05356    PY05361    PY05417    PY06134    PY06191   

    Proteins where this domain has been detected by our approach:
    PY00960    PY02337    PY03311    PY05837   


    PF03147 - FDX-ACB (Pfam link)

    Interpro entry IPR005121 : Phenylalanyl-tRNA synthetase, beta subunit, ferrodoxin-fold anticodon-binding (Interpro link)

    Pfam description:
    This is the anticodon binding domain found in some phenylalanyl tRNA synthetases. The domain has a ferredoxin fold.

    Interpro description:

    This is the anticodon binding domain found in some phenylalanyl tRNA synthetases. The domain has a ferredoxin fold, consisting of an alpha+beta sandwich with anti-parallel beta-sheets (beta-alpha-beta x2).

    Proteins where this domain is known:
    PY04422   

    Proteins where this domain has been detected by our approach:
    PY02857    PY04756   


    PF03151 - TPT (Pfam link)

    Interpro entry IPR004853 : (Interpro link)

    Pfam description:
    This family includes transporters with a specificity for triose phosphate.

    Interpro description:
    This family consists entirely of aligned regions from Drosophila melanogaster proteins. One of these proteins contains three repeats of this region. In other proteins, the aligned region is located towards the C-terminus. The function of the aligned region is unknown.

    Proteins where this domain is known:
    PY00389    PY01812   


    PF03152 - UFD1 (Pfam link)

    Interpro entry IPR004854 : Ubiquitin fusion degradation protein UFD1 (Interpro link)

    Pfam description:
    Post-translational ubiquitin-protein conjugates are recognised for degradation by the ubiquitin fusion degradation (UFD) pathway. Several proteins involved in this pathway have been identified. This family includes UFD1, a 40 kDa protein that is essential for vegetative cell viability. The human UFD1 gene is expressed at high levels during embryogenesis, especially in the eyes and in the inner ear primordia, and is thought to be important in the determination of ectoderm-derived structures, including neural crest cells. In addition, this gene is deleted in the CATCH-22 (cardiac defects, abnormal facies, thymic hypoplasia, cleft palate and hypocalcaemia with deletions on chromosome 22) syndrome. This clinical syndrome is associated with a variety of developmental defects, all characterised by microdeletions on 22q11.2. Two such developmental defects are DiGeorge syndrome (OMIM:188400) and velo-cardio-facial syndrome (OMIM:145410). Several of the abnormalities associated with these conditions are thought to be due to defective neural crest cell differentiation.

    Interpro description:
    Post-translational ubiquitin-protein conjugates are recognized for degradation by the ubiquitin fusion degradation (UFD) pathway. Several proteins involved in this pathway have been identified. This family includes UFD1, a 40 kDa protein that is essential for vegetative cell viability. The human UFD1 gene is expressed at high levels during embryogenesis, especially in the eyes and in the inner ear primordia, and is thought to be important in the determination of ectoderm-derived structures, including neural crest cells. In addition, this gene is deleted in the CATCH-22 (cardiac defects, abnormal facies, thymic hypoplasia, cleft palate and hypocalcaemia with deletions on chromosome 22) syndrome. This clinical syndrome is associated with a variety of developmental defects, all characterised by microdeletions on 22q11.2. Two such developmental defects are DiGeorge syndrome (OMIM:188400) and velo-cardio-facial syndrome (OMIM:145410). Several of the abnormalities associated with these conditions are thought to be due to defective neural crest cell differentiation.

    Proteins where this domain is known:
    PY01640    PY01641    PY04576    PY07395   


    PF03159 - XRN_N (Pfam link)

    Interpro entry IPR004859 : Putative 5-3 exonuclease (Interpro link)

    Pfam description:
    This family aligns residues towards the N-terminus of several proteins with multiple functions. The members of this family all appear to possess 5'-3' exonuclease activity EC:3.1.11.-. Thus, the aligned region may be necessary for 5'-3' exonuclease function. The family also contains several Xrn1 and Xrn2 proteins. The 5'-3' exoribonucleases Xrn1p and Xrn2p/Rat1p function in the degradation and processing of several classes of RNA in Saccharomyces cerevisiae. Xrn1p is the main enzyme catalysing cytoplasmic mRNA degradation in multiple decay pathways, whereas Xrn2p/Rat1p functions in the processing of rRNAs and small nucleolar RNAs (snoRNAs) in the nucleus.

    Interpro description:
    Signatures of this entry align residues towards the N-terminus of several proteins with multiple functions. The members of this family all appear to possess 5'-3' exonuclease activity. Thus, the aligned region may be necessary for 5'-3' exonuclease function.

    Proteins where this domain is known:
    PY03131    PY03368   


    PF03164 - Mon1 (Pfam link)

    Interpro entry IPR004353 : (Interpro link)

    Pfam description:
    Members of this family have been called SAND proteins although these proteins do not contain a SAND domain. In Saccharomyces cerevisiae a protein complex of Mon1 and Ccz1 functions with the small GTPase Ypt7 to mediate vesicle trafficking to the vacuole. The Mon1/Ccz1 complex is conserved in eukaryotic evolution and members of this family (previously known as DUF254) are distant homologues to domains of known structure that assemble into cargo vesicle adapter (AP) complexes. Orthologues have also been described in Fugu rubripes.

    Interpro description:

    Members of this family have been called SAND proteins although these proteins do not contain a SAND domain. In Saccharomyces cerevisiae a protein complex of Mon1 and Ccz1 functions with the small GTPase Ypt7 to mediate vesicle trafficking to the vacuole. The Mon1/Ccz1 complex is conserved in eukaryotic evolution and members of this family (previously known as DUF254) are distant homologues to domains of known structure that assemble into cargo vesicle adapter (AP) complexes.

    Proteins where this domain is known:
    PY02732   


    PF03167 - UDG (Pfam link)

    Interpro entry IPR005122 : (Interpro link)

    Interpro description:

    This entry represents various uracil-DNA glycosylases and related DNA glycosylases, such as uracil-DNA glycosylase, thermophilic uracil-DNA glycosylase, G:T/U mismatch-specific DNA glycosylase (Mug), and single-strand selective monofunctional uracil-DNA glycosylase (SMUG1). These proteins have a 3-layer alpha/beta/alpha structure. Uracil-DNA glycosylases are DNA repair enzymes that excise uracil residues from DNA by cleaving the N-glycosylic bond, initiating the base excision repair pathway. Uracil in DNA can arise either through the deamination of cytosine to form mutagenic U:G mispairs, or through the incorporation of dUMP by DNA polymerase to form U:A pairs. These aberrant uracil residues are genotoxic. The sequence of uracil-DNA glycosylase is extremely well conserved in bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found in poxviruses. In eukaryotic cells, uracil-DNA glycosylase (UNG) activity is found in both the nucleus and the mitochondria. Human UNG1 protein is transported to both the mitochondria and the nucleus. The N-terminal 77 amino acids of UNG1 seem to be required for mitochondrial localization, but the presence of a mitochondrial transit peptide has not been directly demonstrated. The most N-terminal conserved region contains an aspartic acid residue which has been proposed, based on X-ray structures, to act as a general base in the catalytic mechanism.

    Proteins where this domain is known:
    PY04457   


    PF03178 - CPSF_A (Pfam link)

    Interpro entry IPR004871 : Cleavage and polyadenylation specificity factor, A subunit, C-terminal (Interpro link)

    Pfam description:
    This family includes a region that lies towards the C-terminus of the cleavage and polyadenylation specificity factor (CPSF) A (160 kDa) subunit. CPSF is involved in mRNA polyadenylation and binds the AAUAAA conserved sequence in pre-mRNA. CPSF has also been found to be necessary for splicing of single-intron pre-mRNAs. The function of the aligned region is unknown, but it may be involved in RNA/DNA binding.

    Interpro description:
    This family includes a region that lies towards the C-terminus of the cleavage and polyadenylation specificity factor (CPSF) A (160 kDa) subunit. CPSF is involved in mRNA polyadenylation and binds the AAUAAA conserved sequence in pre-mRNA. CPSF has also been found to be necessary for splicing of single-intron pre-mRNAs. The function of the aligned region is unknown, but it may be involved in RNA/DNA binding.

    Proteins where this domain is known:
    PY02648    PY03514    PY04624   


    PF03179 - V-ATPase_G (Pfam link)

    Interpro entry IPR005124 : Vacuolar (H+)-ATPase G subunit (Interpro link)

    Pfam description:
    This family represents the eukaryotic vacuolar (H+)-ATPase (V-ATPase) G subunit. V-ATPases generate an acidic environment in several intracellular compartments. Correspondingly, they are found as membrane-attached proteins in several organelles. They are also found in the plasma membranes of some specialised cells. V-ATPases consist of peripheral (V1) and membrane integral (V0) heteromultimeric complexes. The G subunit is part of the V1 complex, but is also thought to be strongly attached to the V0 complex. It may be involved in the coupling of ATP degradation to H+ translocation.

    Interpro description:
    This family represents the eukaryotic vacuolar (H+)-ATPase (V-ATPase) G subunit. V-ATPases generate an acidic environment in several intracellular compartments. Correspondingly, they are found as membrane-attached proteins in several organelles. They are also found in the plasma membranes of some specialised cells. V-ATPases consist of peripheral (V1) and membrane integral (V0) heteromultimeric complexes. The G subunit is part of the V1 complex, but is also thought to be strongly attached to the V0 complex. It may be involved in the coupling of ATP degradation to H+ translocation.

    Proteins where this domain is known:
    PY01589   


    PF03194 - LUC7 (Pfam link)

    Interpro entry IPR004882 : (Interpro link)

    Pfam description:
    This family contains the N-terminal region of several LUC7 protein homologues and only contains eukaryotic proteins. LUC7 has been shown to be a U1 snRNA-associated protein with a role in splice site recognition. The family also contains human and mouse LUC7-like (LUC7L) proteins and human cisplatin resistance-associated overexpressed protein (CROP).

    Interpro description:

    This family consists of several LUC7 protein homologues that are restricted to eukaryotes. LUC7 has been shown to be a U1 snRNA associated protein with a role in splice site recognition. The entry contains human and mouse LUC7 like (LUC7L) proteins and human cisplatin resistance-associated overexpressed protein (CROP).

    Proteins where this domain is known:
    PY04945   


    PF03199 - GSH_synthase (Pfam link)

    Interpro entry IPR004887 : Glutathione synthase, substrate-binding, eukaryotic (Interpro link)

    Interpro description:

    This entry represents the substrate-binding domain of glutathione synthetase (GSS), a homodimeric enzyme that catalyses the conversion of gamma-L-glutamyl-L-cysteine and glycine to phosphate and glutathione in the presence of ATP. This is the second step in glutathione biosynthesis, the first step being catalysed by gamma-glutamylcysteine synthetase. In humans, defects in GSS are inherited in an autosomal recessive manner and cause severe metabolic acidosis, 5-oxoprolinuria, an increased rate of haemolysis and defective function of the central nervous system. The substrate-binding domain has a 3-layer alpha/beta/alpha structure.

    Proteins where this domain is known:
    PY07248   


    PF03223 - V-ATPase_C (Pfam link)

    Interpro entry IPR004907 : ATPase, V1 complex, subunit C (Interpro link)

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    V-ATPases (also known as V1V0-ATPase or vacuolar ATPase) are found in the eukaryotic endomembrane system, and in the plasma membrane of prokaryotes and certain specialised eukaryotic cells. V-ATPases hydrolyse ATP to drive a proton pump, and are involved in a variety of vital intra- and inter-cellular processes such as receptor-mediated endocytosis, protein trafficking, active transport of metabolites, homeostasis and neurotransmitter release. V-ATPases are composed of two linked complexes: the V1 complex (subunits A-H) contains the catalytic core that hydrolyses ATP, while the V0 complex (subunits a, c, c', c'', d) forms the membrane-spanning pore. V-ATPases may have an additional role in membrane fusion through binding to t-SNARE proteins.

    This entry represents the C subunit that is part of the V1 complex, and is localised to the interface between the V1 and V0 complexes. This subunit does not show any homology with F-ATPase subunits. The C subunit plays an essential role in controlling the assembly of V-ATPase, acting as a flexible stator that holds together the catalytic (V1) and membrane (V0) sectors of the enzyme. The release of subunit C from the ATPase complex results in the dissociation of the V1 and V0 subcomplexes, which is an important mechanism in controlling V-ATPase activity in cells.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY05774   


    PF03224 - V-ATPase_H (Pfam link)

    Interpro entry IPR004908 : ATPase, V1 complex, subunit H (Interpro link)

    Interpro description:

    ATPases (or ATP synthases) are membrane-bound enzyme complexes/ion transporters that combine ATP synthesis and/or hydrolysis with the transport of protons across a membrane. ATPases can harness the energy from a proton gradient, using the flux of ions across the membrane via the ATPase proton channel to drive the synthesis of ATP. Some ATPases work in reverse, using the energy from the hydrolysis of ATP to create a proton gradient. There are different types of ATPases, which can differ in function (ATP synthesis and/or hydrolysis), structure (F-, V- and A-ATPases contain rotary motors) and in the type of ions they transport.

    V-ATPases (also known as V1V0-ATPase or vacuolar ATPase) are found in the eukaryotic endomembrane system, and in the plasma membrane of prokaryotes and certain specialised eukaryotic cells. V-ATPases hydrolyse ATP to drive a proton pump, and are involved in a variety of vital intra- and inter-cellular processes such as receptor-mediated endocytosis, protein trafficking, active transport of metabolites, homeostasis and neurotransmitter release. V-ATPases are composed of two linked complexes: the V1 complex (subunits A-H) contains the catalytic core that hydrolyses ATP, while the V0 complex (subunits a, c, c', c'', d) forms the membrane-spanning pore. V-ATPases may have an additional role in membrane fusion through binding to t-SNARE proteins.

    This entry represents subunit H (also known as Vma13p) found in the V1 complex of V-ATPases. This subunit has a regulatory function, being responsible for activating ATPase activity and coupling ATPase activity to proton flow. The yeast enzyme contains five motifs similar to the HEAT or Armadillo repeats seen in the importins, and can be divided into two distinct domains: a large N-terminal domain consisting of stacked alpha helices, and a smaller C-terminal alpha-helical domain with a similar superhelical topology to an armadillo repeat.

    More information about this protein can be found at Protein of the Month: ATP Synthases.

    Proteins where this domain is known:
    PY02592   


    PF03226 - Yippee (Pfam link)

    Interpro entry IPR004910 : (Interpro link)

    Interpro description:

    This entry represents the Yippee-like (YPEL) family of putative zinc-binding proteins which is highly conserved among eukaryotes. The first protein in this family to be characterised, the Yippee protein from Drosophila, was identified by yeast interaction trap screen as a protein that physically interacts with moth hemolin. It was subsequently found to be a member of a highly conserved family of proteins found in diverse eukaryotes including plants, animals and fungi. Mammals contain five members of this family, YPEL1 to YPEL5, while other organisms tend to contain only two or three members. The mammalian proteins all appear to localise in the nucleus. YPEL1-4 are found in an unidentified structure on or close to the mitotic apparatus in the mitotic phase, whereas in interphase they are located in the nuclei and nucleoli. In contrast, YPEL5 is localised to the centrosome and nucleus during interphase and at the mitotic spindle during mitosis, suggesting a function distinct from that of YPEL1-4. The localisation of the YPEL proteins suggests a novel, though still unknown, function involved in cell division.

    Proteins where this domain is known:
    PY03878   


    PF03248 - Rer1 (Pfam link)

    Interpro entry IPR004932 : Retrieval of early ER protein Rer1 (Interpro link)

    Pfam description:
    RER1 family proteins are involved in the retrieval of some endoplasmic reticulum membrane proteins from the early Golgi compartment. The C terminus of yeast Rer1p interacts with a coatomer complex.

    Interpro description:

    RER1 family proteins are involved in the retrieval of some endoplasmic reticulum membrane proteins from the early Golgi compartment. The C terminus of yeast Rer1p interacts with a coatomer complex.

    Proteins where this domain is known:
    PY01823   


    PF03256 - APC10 (Pfam link)

    Interpro entry IPR004939 : Anaphase-promoting complex, subunit 10 (Interpro link)

    Interpro description:

    The anaphase-promoting complex (APC) is a multi-subunit E3 protein ubiquitin ligase that is responsible for the metaphase to anaphase transition and the exit from mitosis. Anaphase is initiated when the APC triggers the destruction of securin, thereby allowing the protease, separase, to disrupt sister-chromatid cohesion. Securin ubiquitination by the APC is inhibited by cyclin-dependent kinase 1 (Cdk1)-dependent phosphorylation.

    Forkhead Box M1 (FoxM1), which is a transcription factor that is over-expressed in many cancers, is degraded in late mitosis and early G1 phase by the APC/cyclosome (APC/C) E3 ubiquitin ligase. The APC/C targets mitotic cyclins for destruction in mitosis and G1 phase and is then inactivated at S phase. It thereby generates alternating states of high and low cyclin-Cdk activity, which is required for the alternation of mitosis and DNA replication.

    APC from Schizosaccharomyces pombe and Saccharomyces cerevisiae was previously thought to have 11 subunits, but more sensitive techniques have identified 13 subunits in both yeasts.

    One of the subunits of the APC that is required for ubiquitination activity is APC10, a one-domain protein homologous to a sequence element, termed the DOC domain, found in several hypothetical proteins that may also mediate ubiquitination reactions, because they contain combinations of RING finger, cullin or HECT domains.

    The DOC domain consists of a beta-sandwich, in which a five-stranded antiparallel beta-sheet is packed on top of a three stranded antiparallel beta-sheet, exhibiting a 'jellyroll' fold.

    Proteins where this domain is known:
    PY04728   


    PF03259 - Robl_LC7 (Pfam link)

    Interpro entry IPR004942 : (Interpro link)

    Pfam description:
    This family includes proteins that are about 100 amino acids long and have been shown to be related. Members of this family of proteins are associated with both flagellar outer arm dynein and Drosophila and rat brain cytoplasmic dynein. It is proposed that roadblock/LC7 family members may modulate specific dynein functions. This family also includes Swiss:Q9Y2Q5 Golgi-associated MP1 adapter protein and MglB from Myxococcus xanthus Swiss:Q50883, a protein involved in gliding motility. However, the family also includes members from non-motile bacteria such as Streptomyces coelicolor, suggesting that the protein may play a structural or regulatory role.

    Interpro description:

    This family includes proteins that are about 100 amino acids long and have been shown to be related. Members of this family of proteins are associated with both flagellar outer arm dynein and Drosophila and rat brain cytoplasmic dynein. It is proposed that roadblock/LC7 family members may modulate specific dynein functions. This family also includes Golgi-associated MP1 adapter protein and MglB from Myxococcus xanthus, a protein involved in gliding motility. However, the family also includes members from non-motile bacteria such as Streptomyces coelicolor, suggesting that the protein may play a structural or regulatory role.

    Proteins where this domain is known:
    PY00604   


    PF03271 - EB1 (Pfam link)

    Interpro entry IPR004953 : EB1, C-terminal (Interpro link)

    Pfam description:
    This motif is found at the C-terminus of proteins that are related to the EB1 protein. The EB1 proteins contain an N-terminal CH domain Pfam:PF00307. The human EB1 protein was originally discovered as a protein interacting with the C-terminus of the APC protein. This interaction is often disrupted in colon cancer, due to deletions affecting the APC C-terminus. Several EB1 orthologues are also included in this family. The interaction between EB1 and APC has been shown to have a potent synergistic effect on microtubule polymerisation. Neither EB1 nor APC alone has this effect. It is thought that EB1 targets APC to the plus (+) ends of microtubules, where APC promotes microtubule polymerisation. This process is regulated by APC phosphorylation by Cdc2, which disrupts APC-EB1 binding. Human EB1 protein can functionally substitute for the yeast EB1 homologue Mal3. In addition, Mal3 can substitute for human EB1 in promoting microtubule polymerisation with APC.

    Interpro description:

    A group of microtubule-associated proteins called +TIPs (plus end tracking proteins), including EB1 (end-binding protein 1) family proteins, label growing microtubule ends specifically in diverse organisms and are implicated in spindle dynamics, chromosome segregation, and directing microtubules toward cortical sites. EB1 members have a bipartite composition: an N-terminal CH domain that mediates microtubule plus end localization and a C-terminal cargo-binding domain (EB1-C) that captures cell polarity determinants. The EB1-C domain comprises a unique EB1-like sequence motif that acts as a binding site for other +TIP proteins. It interacts with the carboxy terminus of the adenomatous polyposis coli (APC) tumor suppressor, a well conserved +TIP phosphoprotein with a pivotal function in cell cycle regulation. Another binding partner of the EB1-C domain is the well conserved +TIP protein dynactin, a component of the large cytoplasmic dynein/dynactin complex.

    The ~80-residue EB1-C domain starts with a long smoothly curved helix (alpha1), which is followed by a hairpin connection leading to a short second helix (alpha2) running antiparallel to alpha1. The two parallel alpha1 helices of the EB1-C domain dimer wrap around each other in a slightly left-handed supercoil. The two alpha2 helices run antiparallel to helices alpha1 and form a similar fork in the opposite orientation and rotated by 90°. As a result, two helical segments from each monomer form a four-helix bundle. The side chains forming the hydrophobic core of this bundle are highly conserved.

    Proteins where this domain is known:
    PY06631   


    PF03291 - Pox_MCEL (Pfam link)

    Interpro entry IPR004971 : mRNA capping enzyme, large subunit (Interpro link)

    Pfam description:
    This family of enzymes are related to Pfam:PF03919.

    Interpro description:
    This is a family of viral mRNA capping enzymes. The enzyme catalyses the first two reactions in the mRNA cap formation pathway. It is a heterodimer consisting of a large and small subunit. This entry is the large subunit.

    Proteins where this domain is known:
    PY04146   


    PF03332 - PMM (Pfam link)

    Interpro entry IPR005002 : Eukaryotic phosphomannomutase (Interpro link)

    Pfam description:
    This enzyme EC:5.4.2.8 is involved in the synthesis of the GDP-mannose and dolichol-phosphate-mannose required for a number of critical mannosyl transfer reactions.

    Interpro description:
    This enzyme is involved in the synthesis of the GDP-mannose and dolichol-phosphate-mannose required for a number of critical mannosyl transfer reactions.

    Proteins where this domain is known:
    PY00199   


    PF03343 - SART-1 (Pfam link)

    Interpro entry IPR005011 : (Interpro link)

    Pfam description:
    SART-1 is a protein involved in cell cycle arrest and pre-mRNA splicing. It has been shown to be a component of the U4/U6 x U5 tri-snRNP complex in human, Schizosaccharomyces pombe and Saccharomyces cerevisiae. SART-1 is a known tumour antigen, recognised by T cells, in a range of cancers.

    Interpro description:
    This family of proteins appear to contain a leucine zipper and may therefore be a family of transcription factors.

    Proteins where this domain is known:
    PY00360   


    PF03357 - Snf7 (Pfam link)

    Interpro entry IPR005024 : Snf7 (Interpro link)

    Pfam description:
    This family of proteins is involved in protein sorting and transport from the endosome to the vacuole/lysosome in eukaryotic cells. Vacuoles/lysosomes play an important role in the degradation of both lipids and cellular proteins. In order to perform this degradative function, vacuoles/lysosomes contain numerous hydrolases which have been transported in the form of inactive precursors via the biosynthetic pathway and are proteolytically activated upon delivery to the vacuole/lysosome. The delivery of transmembrane proteins, such as activated cell surface receptors, to the lumen of the vacuole/lysosome, either for degradation/downregulation, or in the case of hydrolases, for proper localisation, requires the formation of multivesicular bodies (MVBs). These late endosomal structures are formed by invaginating and budding of the limiting membrane into the lumen of the compartment. During this process, a subset of the endosomal membrane proteins is sorted into the forming vesicles. Mature MVBs fuse with the vacuole/lysosome, thereby releasing cargo-containing vesicles into its hydrolytic lumen for degradation. Endosomal proteins that are not sorted into the intralumenal MVB vesicles are either recycled back to the plasma membrane or Golgi complex, or remain in the limiting membrane of the MVB and are thereby transported to the limiting membrane of the vacuole/lysosome as a consequence of fusion. Therefore, the MVB sorting pathway plays a critical role in the decision between recycling and degradation of membrane proteins. A few archaeal sequences are also present within this family.

    Interpro description:

    This is a family of eukaryotic proteins which are variously described as either hypothetical protein, developmental protein or related to yeast SNF7. The family contains human CHMP1. CHMP1 (CHromatin Modifying Protein; CHarged Multivesicular body Protein), is encoded by an alternative open reading frame in the PRSM1 gene and is conserved in both complex and simple eukaryotes. CHMP1 contains a predicted bipartite nuclear localisation signal and distributes as distinct forms to the cytoplasm and the nuclear matrix in all cell lines tested.

    Human CHMP1 is strongly implicated in multivesicular body formation. A multivesicular body is a vesicle-filled endosome that targets proteins to the interior of lysosomes. Immunocytochemistry and biochemical fractionation localise CHMP1 to early endosomes, and CHMP1 physically interacts with SKD1/VPS4, a highly conserved protein directly linked to multivesicular body sorting in yeast. Similar to the action of a mutant SKD1 protein, overexpression of a fusion derivative of human CHMP1 dilates endosomal compartments and disrupts the normal distribution of several endosomal markers. Genetic studies in Saccharomyces cerevisiae (Baker's yeast) further support a conserved role of CHMP1 in vesicle trafficking. Deletion of CHM1, the budding yeast homolog of CHMP1, results in defective sorting of carboxypeptidases S and Y and produces abnormal, multi-lamellar prevacuolar compartments. This phenotype classifies CHM1 as a member of the class E vacuolar protein sorting genes.

    Proteins where this domain is known:
    PY00570    PY01275    PY01293   


    PF03366 - YEATS (Pfam link)

    Interpro entry IPR005033 : YEATS (Interpro link)

    Pfam description:
    We have named this family the YEATS family, after 'YNK7', 'ENL', 'AF-9', and 'TFIIF small subunit'. This family also contains the GAS41 protein. All these proteins are thought to have a transcription stimulatory activity.

    Interpro description:

    Named the YEATS family, after 'YNK7', 'ENL', 'AF-9', and 'TFIIF small subunit', this family also contains the GAS41 protein. All these proteins are thought to have a transcription stimulatory activity.

    Proteins where this domain is known:
    PY07255   


    PF03367 - zf-ZPR1 (Pfam link)

    Interpro entry IPR004457 : Zinc finger, ZPR1-type (Interpro link)

    Pfam description:
    The zinc-finger protein ZPR1 is ubiquitous among eukaryotes and is known to be essential in yeast. In quiescent cells, ZPR1 is localised to the cytoplasm. In proliferating cells treated with EGF or with other mitogens, however, ZPR1 accumulates in the nucleolus. ZPR1 interacts with the cytoplasmic domain of the inactive EGF receptor (EGFR) and is thought to inhibit the basal protein tyrosine kinase activity of EGFR. This interaction is disrupted when cells are treated with EGF, though by themselves, inactive EGFRs are not sufficient to sequester ZPR1 to the cytoplasm. Upon stimulation by EGF, ZPR1 directly binds the eukaryotic translation elongation factor-1alpha (eEF-1alpha) to form ZPR1/eEF-1alpha complexes. These move into the nucleus, localising particularly at the nucleolus. Indeed, the interaction between ZPR1 and eEF-1alpha has been shown to be essential for normal cellular proliferation, and ZPR1 is thought to be involved in pre-ribosomal RNA expression. The ZPR1 domain consists of an elongation initiation factor 2-like zinc finger and a double-stranded beta helix with a helical hairpin insertion. ZPR1 binds preferentially to GDP-bound eEF1A but does not directly influence the kinetics of nucleotide exchange or GTP hydrolysis. The alignment for this family shows a domain of which there are two copies in ZPR1 proteins. This family also includes several hypothetical archaeal proteins (from both Crenarchaeota and Euryarchaeota), which only contain one copy of the aligned region. This similarity between ZPR1 and archaeal proteins was not previously noted.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that some Znf domains have diverged such that they still maintain their core structure but have lost the ability to bind zinc, instead using other means, such as salt bridges or binding to other metals, to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs and may therefore be included in Znf entries.)

    This entry represents ZPR1-type zinc finger domains. An orthologous protein found once in each of the completed archaeal genomes corresponds to a zinc finger-containing domain repeated as the N-terminal and C-terminal halves of the mouse protein ZPR1. ZPR1 is an experimentally proven zinc-binding protein that binds the tyrosine kinase domain of the epidermal growth factor receptor (EGFR); binding is inhibited by EGF stimulation and tyrosine phosphorylation, and activation by EGF is followed by some redistribution of ZPR1 to the nucleus. By analogy, other proteins with the ZPR1 zinc finger domain may be regulatory proteins that sense protein phosphorylation state and/or participate in signal transduction.

    Deficiencies in ZPR1 may contribute to neurodegenerative disorders. ZPR1 appears to be down-regulated in patients with spinal muscular atrophy (SMA), a disease characterised by degeneration of the alpha-motor neurons in the spinal cord that can arise from mutations affecting the expression of the survival motor neuron (SMN) protein. ZPR1 interacts with complexes formed by SMN and may act as a modifier that affects the severity of SMA.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01182   


    PF03371 - PRP38 (Pfam link)

    Interpro entry IPR005037 : (Interpro link)

    Pfam description:
    Members of this family are related to the yeast pre-mRNA splicing factor PRP38, so all members of this family may be involved in splicing. This conserved region may be involved in RNA binding. The putative domain is about 180 amino acids long. PRP38 is a unique component of the U4/U6.U5 tri-small nuclear ribonucleoprotein (snRNP) particle and is necessary for an essential step late in spliceosome maturation.

    Interpro description:

    Members of this family are related to the yeast pre-mRNA splicing factor PRP38, so all members of this family may be involved in splicing. This conserved region may be involved in RNA binding. The putative domain is about 180 amino acids long. PRP38 is a unique component of the U4/U6.U5 tri-small nuclear ribonucleoprotein (snRNP) particle and is necessary for an essential step late in spliceosome maturation.

    Proteins where this domain is known:
    PY05953   


    PF03372 - Exo_endo_phos (Pfam link)

    Interpro entry IPR005135 : (Interpro link)

    Pfam description:
    This large family of proteins includes magnesium-dependent endonucleases and a large number of phosphatases involved in intracellular signalling. The family includes AP endonuclease proteins (EC:4.2.99.18, e.g. Swiss:P27695), DNase I proteins (EC:3.1.21.1, e.g. Swiss:P24855), Synaptojanin, an inositol-1,4,5-trisphosphate phosphatase (EC:3.1.3.56, Swiss:O43426), Sphingomyelinase (EC:3.1.4.12, Swiss:P11889) and Nocturnin (Swiss:O35710).

    Interpro description:

    This domain is found in a large number of proteins, including magnesium-dependent endonucleases and phosphatases involved in intracellular signalling. Proteins containing this domain include AP endonucleases, DNase I proteins, Synaptojanin (an inositol-1,4,5-trisphosphate phosphatase) and Sphingomyelinase.

    Proteins where this domain is known:
    PY00970    PY01264    PY02645    PY03237    PY04723    PY06583    PY06685    PY06895    PY06985    PY07668   


    PF03381 - CDC50 (Pfam link)

    Interpro entry IPR005045 : Protein of unknown function DUF284, transmembrane eukaryotic (Interpro link)

    Pfam description:
    Members of this family are predicted to contain transmembrane helices. The family member LEM3 (Swiss:P42838) is a ligand-effect modulator; its mutation increases glucocorticoid receptor activity in response to dexamethasone and also confers increased activity on other intracellular receptors, including the progesterone, oestrogen and mineralocorticoid receptors. LEM3 is thought to affect a downstream step in the glucocorticoid receptor pathway. Factors that modulate ligand responsiveness are likely to contribute to the context-specific actions of the glucocorticoid receptor in mammalian cells. The products of genes YNR048w (Swiss:P53740), YNL323w (Swiss:P42838) and YCR094w (Swiss:P25656) (CDC50) are functionally redundant and are involved in regulation of transcription via CDC39. CDC39 (also known as NOT1) is normally a negative regulator of transcription, acting either on the general RNA polymerase II machinery or by altering chromatin structure. One function of CDC39 is to block activation of the mating response pathway in the absence of pheromone, and its mutation causes arrest in G1 by activating the pathway. The cold-sensitive G1 arrest observed in CDC50 mutants may be due to inactivation of CDC39, and the effects of LEM3 on glucocorticoid receptor activity may likewise reflect effects on transcription via CDC39.

    Interpro description:
    Members of this family have no known function. They have predicted transmembrane helices.

    Proteins where this domain is known:
    PY02074    PY02459   


    PF03399 - SAC3_GANP (Pfam link)

    Interpro entry IPR005062 : (Interpro link)

    Pfam description:
    This large family includes diverse proteins involved in large complexes. The alignment contains one highly conserved negatively charged residue and one highly conserved positively charged residue that are probably important for the function of these proteins. The family includes the yeast nuclear export factor Sac3 (Swiss:P46674) and mammalian GANP/MCM3-associated proteins, which facilitate the nuclear localisation of MCM3, a protein that associates with chromatin in the G1 phase of the cell cycle. It also includes the Nin1-like regulatory subunit of the 26S protease (26S proteasome), in which the conserved region occurs at the C-terminus. The 26S proteasome is responsible for degrading ubiquitin conjugates and consists of 19S regulatory complexes associated with the ends of 20S proteasomes; the 19S regulatory complex is composed of about 20 different polypeptides and confers ATP-dependence and substrate specificity to the 26S enzyme. This family also includes several eukaryotic translation initiation factor 3 subunit 11 (eIF-3 p25) proteins. Eukaryotic initiation factor 3 (eIF3) is a multisubunit complex that is required for binding of mRNA to 40S ribosomal subunits, stabilisation of ternary complex binding to 40S subunits, and dissociation of 40S and 60S subunits.

    Interpro description:

    This large family includes diverse proteins involved in large complexes. The alignment contains one highly conserved negatively charged residue and one highly conserved positively charged residue that are probably important for the function of these proteins. The family includes the yeast nuclear export factor Sac3 and mammalian GANP/MCM3-associated proteins, which facilitate the nuclear localisation of MCM3, a protein that associates with chromatin in the G1 phase of the cell cycle. It also includes the Nin1-like regulatory subunit of the 26S protease (26S proteasome), in which the conserved region occurs at the C-terminus. The 26S proteasome is responsible for degrading ubiquitin conjugates and consists of 19S regulatory complexes associated with the ends of 20S proteasomes; the 19S regulatory complex is composed of about 20 different polypeptides and confers ATP-dependence and substrate specificity to the 26S enzyme. This family also includes several eukaryotic translation initiation factor 3 subunit 11 (eIF-3 p25) proteins. Eukaryotic initiation factor 3 (eIF3) is a multisubunit complex that is required for binding of mRNA to 40S ribosomal subunits, stabilisation of ternary complex binding to 40S subunits, and dissociation of 40S and 60S subunits.

    Proteins where this domain is known:
    PY00220    PY01718    PY04581   


    PF03406 - Phage_fiber_2 (Pfam link)

    Interpro entry IPR005068 : (Interpro link)

    Pfam description:
    This repeat is found in the tail fibres of phage, for example protein K (Swiss:Q37842). The repeats are about 40 residues long.

    Interpro description:

    This repeat is found in the tail fibres of phage, for example protein K, but bacterial homologues have also been identified. The repeats are about 40 residues long.

    Proteins where this domain is known:
    PY03080   


    PF03416 - Peptidase_C54 (Pfam link)

    Interpro entry IPR005078 : (Interpro link)

    Interpro description:

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures but commonly also in their two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (groups of evolutionarily related proteins), and further subdivided into families, on the basis of the architecture of their catalytic dyad or triad.

    These cysteine peptidases constitute MEROPS peptidase family C54 (Aut2 peptidase family, clan CA), a group of proteins of unknown function.

    Proteins where this domain is known:
    PY03056   


    PF03441 - FAD_binding_7 (Pfam link)

    Interpro entry IPR005101 : DNA photolyase, FAD-binding/Cryptochrome, C-terminal (Interpro link)

    Interpro description:

    This entry represents a multi-helical domain composed of two all-alpha subdomains. It is found as the C-terminal domain in cryptochrome proteins, and also at the C-terminus of DNA photolyase, where it acts as an FAD-binding domain (the N-terminal region of DNA photolyase binds a light-harvesting cofactor).

    Photolyases and cryptochromes are related flavoproteins that bind FAD. Photolyases harness the energy of blue light to repair DNA damage by removing pyrimidine dimers. Cryptochromes (CRY1 and CRY2) are blue light photoreceptors that mediate blue light-induced gene expression.

    DNA photolyases are DNA repair enzymes that repair pyrimidine dimers induced by exposure to ultraviolet light. They bind to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), catalyse dimer splitting, breaking the cyclobutane ring joining the two pyrimidines of the dimer so as to split them into the constituent monomers; this process is called photoreactivation. DNA photolyases require two chromophore cofactors for their activity. All monomers contain a reduced FAD moiety and, in addition, either a reduced pterin or 8-hydroxy-5-deazaflavin as a second chromophore. Either chromophore may act as the primary photon acceptor, with peak absorptions occurring in the blue region of the spectrum and in the UV-B region, at a wavelength around 290 nm.

    Proteins where this domain is known:
    PY02821   


    PF03446 - NAD_binding_2 (Pfam link)

    Interpro entry IPR006115 : 6-phosphogluconate dehydrogenase, NAD-binding (Interpro link)

    Pfam description:
    The NAD binding domain of 6-phosphogluconate dehydrogenase adopts a Rossmann fold.

    Interpro description:

    6-Phosphogluconate dehydrogenase (6PGD) catalyses the NADP-dependent oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate. This reaction is a component of the hexose monophosphate shunt, also known as the pentose phosphate pathway (PPP). Prokaryotic and eukaryotic 6PGDs are proteins of about 470 amino acids whose sequences are highly conserved. The protein is a homodimer in which the monomers act independently: each contains a large, mainly alpha-helical domain and a smaller beta-alpha-beta domain containing a mixed parallel and antiparallel 6-stranded beta sheet. NADP is bound in a cleft in the small domain, with the substrate binding in an adjacent pocket.
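    For reference, the overall reaction catalysed by 6PGD can be written as the standard NADP+-dependent oxidative decarboxylation below. This is a summary of the textbook stoichiometry rather than something stated in the entry itself.

```latex
\text{6-phospho-D-gluconate} + \mathrm{NADP^{+}} \longrightarrow \text{D-ribulose 5-phosphate} + \mathrm{CO_{2}} + \mathrm{NADPH}
```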

    This family represents the NAD-binding domain of 6-phosphogluconate dehydrogenase, which adopts a Rossmann fold. The C-terminal domain is described in a separate entry.

    Proteins where this domain is known:
    PY00858   


    PF03462 - PCRF (Pfam link)

    Interpro entry IPR005139 : Peptide chain release factor (Interpro link)

    Pfam description:
    This domain is found in peptide chain release factors.

    Interpro description:

    This domain is found in peptide chain release factors. Peptide chain release factors are important for protein synthesis since they direct the termination of translation in response to the stop codons UAA, UAG and UGA. In bacteria, release factor RF1 recognises UAA and UAG, whereas RF2 recognises UAA and UGA; both contain the PCRF domain.

    Proteins where this domain is known:
    PY04145   

    Proteins where this domain has been detected by our approach:
    PY03620    PY04471   


    PF03463 - eRF1_1 (Pfam link)

    Interpro entry IPR005140 : (Interpro link)

    Pfam description:
    The release factor eRF1 terminates protein biosynthesis by recognising stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase centre. The crystal structure of human eRF1 is known. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase centre. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site. This family also includes other proteins for which the precise molecular function is unknown, many of them from archaea. These proteins may also be involved in translation termination, but this awaits experimental verification.

    Interpro description:

    This domain is found in the release factor eRF1, which terminates protein biosynthesis by recognising stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase centre. The crystal structure of human eRF1 is known. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase centre. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site.

    This domain is also found in other proteins for which the precise molecular function is unknown, many of them from archaea. These proteins may also be involved in translation termination, but this awaits experimental verification.
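    Because eRF1 recognises all three stop codons (UAA, UAG and UGA) at the A site, the point at which it acts on a message can be illustrated by scanning an open reading frame codon by codon. The sketch below is a hypothetical helper, not part of any database or library; it assumes the input sequence starts in frame and accepts either RNA or DNA letters.

```python
from typing import Optional, Tuple

# eRF1 decodes all three standard stop codons.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_codon(seq: str, start: int = 0) -> Optional[Tuple[int, str]]:
    """Return (offset, codon) for the first in-frame stop codon at or after
    `start`, or None if the reading frame runs off the end of the sequence."""
    rna = seq.upper().replace("T", "U")  # accept DNA input as well
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            return i, codon
    return None


print(first_stop_codon("AUGGCCUAAGG"))  # (6, 'UAA')
print(first_stop_codon("ATGTGA"))       # (3, 'UGA')
```

    Reading in steps of three from `start` keeps the scan in a single frame, mirroring how the ribosome presents one codon at a time to the A site.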

    Proteins where this domain is known:
    PY02838    PY03558   


    PF03464 - eRF1_2 (Pfam link)

    Interpro entry IPR005141 : (Interpro link)

    Pfam description:
    The release factor eRF1 terminates protein biosynthesis by recognising stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase centre. The crystal structure of human eRF1 is known. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase centre. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site. This family also includes other proteins for which the precise molecular function is unknown, many of them from archaea. These proteins may also be involved in translation termination, but this awaits experimental verification.

    Interpro description:

    This domain is found in the release factor eRF1, which terminates protein biosynthesis by recognising stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase centre. The crystal structure of human eRF1 is known. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase centre. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site.

    This domain is also found in other proteins that may also be involved in translation termination.

    Proteins where this domain is known:
    PY03558   


    PF03465 - eRF1_3 (Pfam link)

    Interpro entry IPR005142 : (Interpro link)

    Pfam description:
    The release factor eRF1 terminates protein biosynthesis by recognising stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase centre. The crystal structure of human eRF1 is known. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase centre. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site. This family also includes other proteins for which the precise molecular function is unknown, many of them from archaea. These proteins may also be involved in translation termination, but this awaits experimental verification.

    Interpro description:

    This domain is found in the release factor eRF1, which terminates protein biosynthesis by recognising stop codons at the A site of the ribosome and stimulating peptidyl-tRNA bond hydrolysis at the peptidyl transferase centre. The crystal structure of human eRF1 is known. The overall shape and dimensions of eRF1 resemble a tRNA molecule, with domains 1, 2 and 3 of eRF1 corresponding to the anticodon loop, aminoacyl acceptor stem and T stem of a tRNA molecule, respectively. The position of the essential GGQ motif at an exposed tip of domain 2 suggests that the Gln residue coordinates a water molecule to mediate the hydrolytic activity at the peptidyl transferase centre. A conserved groove on domain 1, 80 Å from the GGQ motif, is proposed to form the codon recognition site.

    This domain is also found in other proteins that may also be involved in translation termination, although this awaits experimental verification.

    Proteins where this domain is known:
    PY03558   


    PF03467 - Smg4_UPF3 (Pfam link)

    Interpro entry IPR005120 : (Interpro link)

    Pfam description:
    This family contains proteins involved in nonsense-mediated mRNA decay, a process triggered by premature stop codons in mRNA. The family includes Smg-4 and UPF3.

    Interpro description:

    Nonsense-mediated mRNA decay (NMD) is a surveillance mechanism by which eukaryotic cells detect and degrade transcripts containing premature termination codons. Three 'up-frameshift' proteins, UPF1, UPF2 and UPF3, are essential for this process in organisms ranging from yeast to humans and plants. After splicing, exon junction complexes (EJCs) are deposited ~24 nucleotides upstream of exon-exon junctions. Translation displaces the EJCs; however, premature translation termination upstream of one or more EJCs triggers the recruitment of UPF1, UPF2 and UPF3 and activates the NMD pathway.
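    The EJC-based trigger described above reduces to a simple position check. The sketch below is a minimal illustration only: the function names are hypothetical, positions are assumed to be 0-based nucleotide offsets on the spliced mRNA, and the EJC offset is assumed to be exactly 24 nt. Real NMD decisions depend on additional factors that are not modelled here.

```python
from typing import List

# Assumption: each EJC sits a fixed ~24 nt upstream of its exon-exon junction.
EJC_OFFSET_NT = 24


def ejc_positions(junctions: List[int]) -> List[int]:
    """Approximate EJC positions (0-based, spliced-mRNA coordinates)."""
    return [j - EJC_OFFSET_NT for j in junctions]


def triggers_nmd(stop_codon_pos: int, junctions: List[int]) -> bool:
    """Simplified rule: the translating ribosome displaces every EJC it
    passes, so if termination happens upstream of at least one EJC, that
    EJC remains bound and the NMD pathway is activated."""
    return any(stop_codon_pos < ejc for ejc in ejc_positions(junctions))


# Hypothetical transcript with exon-exon junctions at 300 and 600 nt.
junctions = [300, 600]
print(triggers_nmd(450, junctions))  # True: stop lies upstream of the EJC at 576
print(triggers_nmd(650, junctions))  # False: no EJC remains downstream
```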

    This family contains UPF3. The crystal structure of the complex between the MIF4G (middle portion of eIF4G) domain of human UPF2 and the RNP (ribonucleoprotein-type RNA-binding) domain of UPF3b has been determined at 1.95 Å resolution. The protein-protein interface is mediated by highly conserved charged residues in UPF2 and UPF3b and involves the beta-sheet surface of the UPF3b RNP domain, the surface that such domains generally use to bind nucleic acids. The UPF3b RNP domain does not bind RNA, whereas the UPF2 construct and the complex do, indicating that some RNP domains have evolved for specific protein-protein interactions rather than as nucleic acid binding modules.

    Proteins where this domain is known:
    PY06727   


    PF03477 - ATP-cone (Pfam link)

    Interpro entry IPR005144 : (Interpro link)

    Interpro description:

    The ATP-cone is an evolutionarily mobile, ATP-binding regulatory domain which is found in a variety of proteins including ribonucleotide reductases, phosphoglycerate kinases and transcriptional regulators.

    In ribonucleotide reductase protein R1 from Escherichia coli this domain is located at the N-terminus and is composed mostly of helices. It forms part of the allosteric effector region and contains the general allosteric activity site in a cleft located at the tip of the N-terminal region. This site binds either ATP (activating) or dATP (inhibitory), with the base bound in a hydrophobic pocket and the phosphates bound to basic residues. Nucleotide binding to this site is thought to affect enzyme activity by altering the relative positions of the two subunits of ribonucleotide reductase.

    Proteins where this domain is known:
    PY03473   


    PF03481 - SUA5 (Pfam link)

    Interpro entry IPR005145 : (Interpro link)

    Pfam description:
    The function of this domain is unknown. It is found in Swiss:P32579 and its relatives, C-terminal to Pfam:PF01300.

    Interpro description:

    The function of this domain is unknown. It is found in Swiss:P32579 and its relatives, C-terminal to the Pfam:PF01300 domain.

    Proteins where this domain has been detected by our approach:
    PY05913   


    PF03483 - B3_4 (Pfam link)

    Interpro entry IPR005146 : B3/B4 tRNA-binding domain (Interpro link)

    Pfam description:
    This domain is found in tRNA synthetase beta subunits as well as in some non-tRNA synthetase proteins.

    Interpro description:

    This entry represents the B3/B4 domain found in tRNA synthetase beta subunits as well as in some non-tRNA synthetase proteins. This domain has a 3-layer structure containing a beta-sandwich fold of unusual topology and a putative tRNA-binding structural motif. In Thermus thermophilus, both the catalytic alpha subunit and the non-catalytic beta subunit comprise the characteristic fold of the class II active-site domains. The presence of an RNA-binding domain, similar to that of the U1A spliceosomal protein, in the beta subunit of tRNA synthetase suggests structural relationships among different families of RNA-binding proteins.

    Aminoacyl-tRNA synthetases can catalyse editing reactions to correct errors produced during amino acid activation and tRNA esterification, in order to prevent the attachment of incorrect amino acids to tRNA. The B3/B4 domain of the beta subunit contains an editing site, which lies close to the active site on the alpha subunit. Disruption of this site abolishes tRNA editing, a process that is essential for faithful translation of the genetic code.

    Proteins where this domain is known:
    PY02380    PY02381   


    PF03484 - B5 (Pfam link)

    Interpro entry IPR005147 : tRNA synthetase, B5 (Interpro link)

    Pfam description:
    This domain is found in phenylalanine-tRNA synthetase beta subunits.

    Interpro description:

    Domain B5 is found in phenylalanine-tRNA synthetase beta subunits. This domain has been shown to bind DNA through a winged helix-turn-helix motif. Phenylalanine-tRNA synthetase may influence common cellular processes via DNA binding, in addition to its aminoacylation function.

    Proteins where this domain is known:
    PY02380    PY02381   


    PF03485 - Arg_tRNA_synt_N (Pfam link)

    Interpro entry IPR005148 : Arginyl tRNA synthetase, class Ic, N-terminal (Interpro link)

    Pfam description:
    This domain is found at the amino terminus of Arginyl tRNA synthetase, where it is also called additional domain 1 (Add-1). It is about 140 residues long and has been suggested to be involved in tRNA recognition.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an antiparallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, whereas in class II reactions the 3'-hydroxyl is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine belong to class II.
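    The class assignments listed above amount to a simple lookup table. The sketch below encodes exactly that list using one-letter amino acid codes; the function names are hypothetical. The 2'/3'-hydroxyl rule is only the general tendency stated in the text, and known exceptions exist (for example, class II phenylalanyl-tRNA synthetase charges the 2'-hydroxyl, and some archaea and bacteria use a class I lysyl-tRNA synthetase).

```python
# Class membership as listed in the description (one-letter amino acid codes).
CLASS_I = frozenset("RCEQILMYWV")   # Arg, Cys, Glu, Gln, Ile, Leu, Met, Tyr, Trp, Val
CLASS_II = frozenset("ANDGHKFPST")  # Ala, Asn, Asp, Gly, His, Lys, Phe, Pro, Ser, Thr
# Note: Lys is listed as class II here; class I LysRS enzymes also exist in some organisms.


def synthetase_class(aa: str) -> str:
    """Return 'I' or 'II' for the aminoacyl-tRNA synthetase of amino acid `aa`."""
    aa = aa.upper()
    if aa in CLASS_I:
        return "I"
    if aa in CLASS_II:
        return "II"
    raise ValueError(f"not one of the 20 standard amino acids: {aa!r}")


def preferred_hydroxyl(aa: str) -> str:
    """General tendency only: class I charges the 2'-OH, class II the 3'-OH."""
    return "2'-OH" if synthetase_class(aa) == "I" else "3'-OH"


print(synthetase_class("R"), preferred_hydroxyl("R"))  # I 2'-OH
print(synthetase_class("K"), preferred_hydroxyl("K"))  # II 3'-OH
```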

    This domain is found at the N-terminus of Arginyl tRNA synthetase, where it is also called additional domain 1 (Add-1). It is about 140 residues long and has been suggested to be involved in tRNA recognition.

    Proteins where this domain is known:
    PY01800   


    PF03493 - BK_channel_a (Pfam link)

    Interpro entry IPR003929 : Potassium channel, calcium-activated, BK, alpha subunit (Interpro link)

    Interpro description:

    Potassium channels are the most diverse group of the ion channel family. They are important in shaping the action potential, and in neuronal excitability and plasticity. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group.

    These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on their type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; while others are regulated by GTP-binding proteins or other second messengers. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis.

    All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how subunits containing two P-domains assemble to form a selective pore. The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned into one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK). The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha subunits that possess two P-domains. These are usually highly regulated K+ selective leak channels.
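    The K+ selectivity sequence (T/S)xxTxGxG, where x is any residue, maps directly onto a regular expression. The sketch below locates candidate matches in a protein sequence; the function name and the test peptide are hypothetical, and a motif hit on its own does not establish that a protein is a K+ channel.

```python
import re
from typing import List, Tuple

# (T/S)xxTxGxG with x = any residue; the lookahead also reports overlapping hits.
SELECTIVITY_MOTIF = re.compile(r"(?=([TS]..T.G.G))")


def find_selectivity_motifs(protein_seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for every motif occurrence."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in SELECTIVITY_MOTIF.finditer(seq)]


print(find_selectivity_motifs("MLETATTVGYGDLY"))  # [(3, 'TATTVGYG')]
```

    Wrapping the pattern in a lookahead makes each match zero-width, so `finditer` can report motifs that overlap one another instead of skipping past the first hit.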

    Ca2+-activated K+ channels are a diverse group of channels that are activated by an increase in intracellular Ca2+ concentration. They are found in the majority of nerve cells, where they modulate cell excitability and action potential firing. Three types of Ca2+-activated K+ channel have been characterised, termed small-conductance (SK), intermediate-conductance (IK) and large-conductance (BK) channels respectively.

    BK channels (also referred to as maxi-K channels) are widely expressed in the body, being found in glandular tissue, smooth and skeletal muscle, as well as in neural tissues. They have been demonstrated to regulate arteriolar and airway diameter, and also neurotransmitter release. Each channel complex is thought to be composed of two types of subunit: the pore-forming (alpha) subunits and smaller accessory (beta) subunits.

    The alpha subunit of the BK channel was initially thought to share the characteristic 6TM organisation of the voltage-gated K+ channels. However, the molecule is now thought to possess an additional TM domain, with an extracellular N-terminus and intracellular C-terminus. This C-terminal region contains 4 predominantly hydrophobic domains, which are also thought to lie intracellularly. The extracellular N-terminus and the first TM region are required for modulation by the beta subunit. The precise location of the Ca2+-binding site that modulates channel activation remains unknown, but it is thought to lie within the C-terminal hydrophobic domains.

    Proteins where this domain is known:
    PY00619    PY02006   


    PF03501 - S10_plectin (Pfam link)

    Interpro entry IPR005326 : (Interpro link)

    Pfam description:
    This presumed domain is found at the N-terminus of some isoforms of the cytoskeletal muscle protein plectin as well as the ribosomal S10 protein. This domain may be involved in RNA binding.

    Interpro description:

    This presumed domain is found at the N terminus of some isoforms of the cytoskeletal muscle protein plectin as well as the ribosomal S10 protein. This domain may be involved in RNA binding.

    Proteins where this domain is known:
    PY02462   


    PF03531 - SSrecog (Pfam link)

    Interpro entry IPR000969 : Structure-specific recognition protein (Interpro link)

    Pfam description:
    SSRP1 has been implicated in transcriptional initiation and elongation and in DNA replication and repair.

    Interpro description:
    Human structure-specific recognition protein, SSRP1, binds specifically to DNA modified with the anti-cancer drug cisplatin. An 81 kDa protein is predicted, containing several highly-charged domains and a stretch of 75 residues that share 47% identity with a portion of the high mobility group (HMG) protein HMG1. This HMG box probably constitutes the structure recognition element for cisplatin-modified DNA, the probable recognition motif being the local duplex unwinding and bending that occurs on formation of intra-strand cross-links. SSRP1 is the human homologue of a recently identified mouse protein that binds to recombination signal sequences. These sequences have been postulated to form stem-loop structures, further implicating local bends and unwinding in DNA as a recognition target for HMG-box proteins. A Drosophila melanogaster cDNA encoding an HMG-box-containing protein has also been isolated. This protein shares 50% sequence identity with human SSRP1. In vitro binding studies using Drosophila SSRP showed that the protein binds to single-stranded DNA and RNA, with highest affinity for nucleotides G and U. Comparison of the predicted amino acid sequences among SSRP family members reveals 48% identity, with structural conservation in the C-terminus of the HMG box, as well as domains of highly charged residues. The most highly conserved regions lie in the poorly understood N-terminus, suggesting that this portion of the protein is critical for its function.

    This entry contains Pob3 which is a subunit of the heterodimeric yeast FACT complex (Spt16p-Pob3p). The FACT complex facilitates RNA Polymerase II transcription elongation through nucleosomes by destabilizing and then reassembling nucleosome structure.

    Proteins where this domain is known:
    PY06012   


    PF03561 - Allantoicase (Pfam link)

    Interpro entry IPR015908 : Allantoicase region (Interpro link)

    Pfam description:
    This family is found in pairs in allantoicases, forming the majority of the protein. These proteins allow the use of purines as secondary nitrogen sources in nitrogen-limiting conditions through the reaction: allantoate + H(2)O = (-)-ureidoglycolate + urea.

    Interpro description:

    Allantoicase (also known as allantoate amidinohydrolase) is involved in purine degradation, facilitating the utilization of purines as secondary nitrogen sources under nitrogen-limiting conditions. While purine degradation converges to uric acid in all vertebrates, its further degradation varies from species to species. Uric acid is excreted by birds, reptiles, and some mammals that do not have a functional uricase gene, whereas other mammals produce allantoin. Amphibians and microorganisms produce ammonia and carbon dioxide using the uricolytic pathway. Allantoicase performs the second step in this pathway catalyzing the conversion of allantoate into ureidoglycolate and urea.

     allantoate + H(2)O = (S)-ureidoglycolate + urea
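    As a quick sanity check on the reaction above, the sketch below counts atoms on each side. It assumes the neutral (protonated) forms allantoic acid (C4H8N4O4) and ureidoglycolic acid (C3H6N2O4) in place of the anions named in the equation, plus urea (CH4N2O); the simple formula parser is a hypothetical helper that handles only bracket-free formulas.

```python
import re
from collections import Counter

def parse_formula(formula: str) -> Counter:
    """Count atoms in a bracket-free molecular formula such as 'C4H8N4O4'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(formulas: list[str]) -> Counter:
    """Sum the atom counts of all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total

# Assumption: neutral acid forms are used, so no charges need balancing.
left = side_total(["C4H8N4O4", "H2O"])       # allantoic acid + water
right = side_total(["C3H6N2O4", "CH4N2O"])   # ureidoglycolic acid + urea

print(left == right)  # True: both sides contain C4 H10 N4 O5
```

    Using neutral species keeps the check to plain atom bookkeeping; balancing the ionic forms would also require tracking charge.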

    The structure of allantoicase is best described as being composed of two repeats (the allantoicase repeats: AR1 and AR2), which are connected by a flexible linker. The crystal structure, determined at 2.4 Å resolution, reveals that AR1 has a very similar fold to AR2, both repeats being jelly-roll motifs, composed of four-stranded and five-stranded antiparallel beta-sheets. Each jelly-roll motif has two conserved surface patches that probably constitute the active site.

    Proteins where this domain is known:
    PY01476    PY03526   


    PF03568 - Peptidase_C50 (Pfam link)

    Interpro entry IPR005314 : Peptidase C50, separase (Interpro link)

    Interpro description:

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad.

    This group of cysteine peptidases belong to MEROPS peptidase family C50 (separase family, clan CD). The active site residues for members of this family and family C14 occur in the same order in the sequence: H,C.

    The separases are caspase-like proteases, which play a central role in chromosome segregation. In yeast they cleave the rad21 subunit of the cohesin complex at the onset of anaphase. During most of the cell cycle, separase is inactivated by the securin/cut2 protein, which probably covers its active site.

    Proteins where this domain is known:
    PY06348   


    PF03587 - EMG1 (Pfam link)

    Interpro entry IPR005304 : Ribosomal biogenesis, methyltransferase, EMG1/NEP1 (Interpro link)

    Pfam description:
    Members of this family are essential for 40S ribosomal biogenesis. The structure of EMG1 has revealed that it is a novel member of the superfamily of alpha/beta knot fold methyltransferases.

    Interpro description:

    Members of this family are essential for 40S ribosomal biogenesis. They play a role in the methylation reaction of pre-rRNA processing. The structure of EMG1 has revealed that it is a novel member of the superfamily of alpha/beta knot fold methyltransferases.

    Proteins where this domain is known:
    PY05849   


    PF03588 - Leu_Phe_trans (Pfam link)

    Interpro entry IPR004616 : Leucyl/phenylalanyl-tRNA-protein transferase (Interpro link)

    Interpro description:
    Leucyl/phenylalanyl-tRNA--protein transferase transfers a Leu or Phe to the amino end of certain proteins to enable degradation. The N-terminal residue controls the biological half-life of many proteins via the N-end rule pathway.

    Proteins where this domain is known:
    PY02988   


    PF03602 - Cons_hypoth95 (Pfam link)

    Interpro entry IPR016065 : (Interpro link)

    Interpro description:
    This is a family of conserved hypothetical proteins, which includes a putative methylase.

    Proteins where this domain is known:
    PY06977   


    PF03630 - Fumble (Pfam link)

    Interpro entry IPR011602 : Fumble (Interpro link)

    Pfam description:
    Fumble is required for cell division in Drosophila. Mutants lacking fumble exhibit abnormalities in bipolar spindle organisation, chromosome segregation, and contractile ring formation. Analyses have demonstrated that it encodes three protein isoforms, all of which contain a domain with high similarity to the pantothenate kinases of A. nidulans and mouse. A role of fumble in membrane synthesis has been proposed.

    Interpro description:

    Fumble is required for cell division in Drosophila. Mutants lacking fumble exhibit abnormalities in bipolar spindle organisation, chromosome segregation, and contractile ring formation. Analyses have demonstrated that it encodes three protein isoforms, all of which contain a domain with high similarity to the pantothenate kinases of Emericella nidulans and mouse. A role of fumble in membrane synthesis has been proposed.

    Proteins where this domain is known:
    PY02917    PY06793   


    PF03635 - Vps35 (Pfam link)

    Interpro entry IPR005378 : (Interpro link)

    Pfam description:
    Vacuolar protein sorting-associated protein (Vps) 35 is one of around 50 proteins involved in protein trafficking. In particular, Vps35 assembles into a retromer complex with at least four other proteins: Vps5, Vps17, Vps26 and Vps29. Vps35 contains a central region of weaker sequence similarity, thought to indicate the presence of at least three domains.

    Interpro description:

    The movement of lipid and protein components between intracellular organelles requires the regulated interactions of many molecules. Vacuolar protein sorting-associated protein (Vps)5 is a yeast protein that is a subunit of a large multimeric complex, termed the retromer complex, involved in retrograde transport of proteins from endosomes to the trans-Golgi network. Sorting nexin (SNX) 1 and SNX2 are its mammalian orthologs.

    To carry out its biological functions, Vps5 forms the retromer complex with at least four other proteins: Vps17, Vps26, Vps29, and Vps35. Vps35 contains a central region of weaker sequence similarity, thought to indicate the presence of at least three domains.

    Proteins where this domain is known:
    PY01746   


    PF03643 - Vps26 (Pfam link)

    Interpro entry IPR005377 : Vacuolar protein sorting-associated protein 26 (Interpro link)

    Pfam description:
    Vacuolar protein sorting-associated protein (Vps) 26 is one of around 50 proteins involved in protein trafficking. In particular, Vps26 assembles into a retromer complex with at least four other proteins: Vps5, Vps17, Vps29 and Vps35. This family also contains Down syndrome critical region 3/A.

    Interpro description:

    The movement of lipid and protein components between intracellular organelles requires the regulated interactions of many molecules. Vacuolar protein sorting-associated protein (Vps)5 is a yeast protein that is a subunit of a large multimeric complex, termed the retromer complex, involved in retrograde transport of proteins from endosomes to the trans-Golgi network. Sorting nexin (SNX) 1 and SNX2 are its mammalian orthologs.

    To carry out its biological functions, Vps5 forms the retromer complex with at least four other proteins: Vps17, Vps26, Vps29, and Vps35. This family of Vps26-proteins also contains Down syndrome critical region 3/A.

    Proteins where this domain is known:
    PY05367   


    PF03650 - UPF0041 (Pfam link)

    Interpro entry IPR005336 : (Interpro link)

    Interpro description:

    This is a family of proteins of unknown function.

    Proteins where this domain is known:
    PY06652    PY07365   


    PF03660 - PHF5 (Pfam link)

    Interpro entry IPR005345 : (Interpro link)

    Pfam description:
    This family of proteins belongs to the superfamily of PHD-finger proteins. At least one example, from mouse, may act as a chromatin-associated protein. The S. pombe ini1 gene is essential and is required for splicing. It is localised in the nucleus, but not detected in the nucleolus, and can be complemented by human ini1.

    Interpro description:

    Phf5 is a member of a novel murine multigene family that is highly conserved during evolution and belongs to the superfamily of PHD-finger proteins. At least one example, from Mus musculus (Mouse), may act as a chromatin-associated protein. The Schizosaccharomyces pombe (Fission yeast) ini1 gene is essential and is required for splicing. It is localised in the nucleus, but not detected in the nucleolus, and can be complemented by human ini1. The proteins of this family contain five CXXC motifs.

    Proteins where this domain is known:
    PY05008   


    PF03690 - UPF0160 (Pfam link)

    Interpro entry IPR003226 : (Interpro link)

    Pfam description:
    This family of proteins contains a large number of metal binding residues. The patterns are suggestive of a phosphoesterase function. The conserved DHH motif may mean this family is related to Pfam:PF01368.

    Interpro description:
    The function of this domain is not known, but it is found in several uncharacterised proteins and a probable metal dependent protein hydrolase.

    Proteins where this domain is known:
    PY07532   


    PF03711 - OKR_DC_1_C (Pfam link)

    Interpro entry IPR008286 : Orn/Lys/Arg decarboxylase, C-terminal (Interpro link)

    Interpro description:
    Pyridoxal-dependent decarboxylases are bacterial proteins acting on ornithine, lysine, arginine and related substrates. One of the regions of sequence similarity contains a conserved lysine residue, which is the site of attachment of the pyridoxal-phosphate group.

    Proteins where this domain is known:
    PY00349   


    PF03715 - Noc2 (Pfam link)

    Interpro entry IPR005343 : (Interpro link)

    Pfam description:
    At least one member, Noc2p from yeast, is required for a late step in 60S subunit export from the nucleus. It has also been shown to co-precipitate with Nug1p, a nuclear GTPase also required for ribosome nucleus export. This family was formerly known as UPF0120.

    Interpro description:

    This is a small family of mainly hypothetical proteins of unknown function.

    Proteins where this domain is known:
    PY04556   


    PF03719 - Ribosomal_S5_C (Pfam link)

    Interpro entry IPR005324 : Ribosomal protein S5, C-terminal (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    This is a family of proteins related to the 30S ribosomal protein S5P from Sulfolobus acidocaldarius. Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies.

    Proteins where this domain is known:
    PY06704   


    PF03725 - RNase_PH_C (Pfam link)

    Interpro entry IPR015847 : Exoribonuclease, phosphorolytic domain 2 (Interpro link)

    Pfam description:
    This family includes 3'-5' exoribonucleases. Ribonuclease PH contains a single copy of this domain, and removes nucleotide residues following the -CCA terminus of tRNA. Polyribonucleotide nucleotidyltransferase (PNPase) contains two tandem copies of the domain. PNPase is involved in mRNA degradation in a 3'-5' direction. The exosome is a 3'-5' exoribonuclease complex that is required for 3' processing of the 5.8S rRNA. Three of its five protein components, Swiss:P46948, Swiss:Q12277 and Swiss:P25359, contain a copy of this domain. Swiss:Q10205, a hypothetical protein from S. pombe, appears to belong to an uncharacterised subfamily. This subfamily is found in both eukaryotes and archaebacteria.

    Interpro description:

    The PH (phosphorolytic) domain is responsible for 3'-5' exoribonuclease activity, although in some proteins this domain has lost its catalytic function. An active PH domain uses inorganic phosphate as a nucleophile, adding it across the phosphodiester bond between the end two nucleotides in order to release ribonucleoside 5'-diphosphate (rNDP) from the 3' end of the RNA substrate.

    PH domains can be found in bacterial/organelle RNases and PNPases (polynucleotide phosphorylases), as well as in archaeal and eukaryotic RNA exosomes, the latter acting as nano-compartments for the degradation or processing of RNA (including mRNA, rRNA, snRNA and snoRNA). Bacterial/organelle PNPases share a common barrel structure with RNA exosomes, consisting of a hexameric ring of PH domains that act as a degradation chamber, and an S1-domain/KH-domain containing cap that binds the RNA substrate (and sometimes accessory proteins) in order to regulate and restrict entry into the degradation chamber. Unstructured RNA substrates feed in through the pore made by the S1 domains, are degraded by the PH domain ring, and exit as nucleotides via the PH pore at the opposite end of the barrel.

    This entry represents the phosphorolytic (PH) domain 2, which has a core 3-layer alpha/beta/alpha structure. This domain is found in bacterial/organelle PNPases and in archaeal/eukaryotic exosomes.

    More information about these proteins can be found at Protein of the Month: RNA Exosomes.

    Proteins where this domain has been detected by our approach:
    PY02553    PY06726    PY07159   


    PF03727 - Hexokinase_2 (Pfam link)

    Interpro entry IPR001312 : Hexokinase (Interpro link)

    Pfam description:
    Hexokinase (EC:2.7.1.1) contains two structurally similar domains represented by this family and PFAM:PF00349. Some members of the family have two copies of each of these domains.

    Interpro description:

    Hexokinase is an important enzyme that catalyses the ATP-dependent conversion of aldo- and keto-hexose sugars to hexose-6-phosphate (H6P). The enzyme can catalyse this reaction on glucose, fructose, sorbitol and glucosamine, and as such is the first step in a number of metabolic pathways. The addition of a phosphate group to the sugar acts to trap it in a cell, since the negatively charged phosphate cannot easily traverse the plasma membrane.

    The enzyme is widely distributed in eukaryotes. There are three isozymes of hexokinase in yeast (PI, PII and glucokinase): isozymes PI and PII phosphorylate both aldo- and keto-sugars; glucokinase is specific for aldo-hexoses. All three isozymes contain two domains. Structural studies of yeast hexokinase reveal a well-defined catalytic pocket that binds ATP and hexose, allowing easy transfer of the phosphate from ATP to the sugar. Vertebrates contain four hexokinase isozymes, designated I to IV, where types I to III contain a duplication of the two-domain yeast-type hexokinases. Both the N- and C-terminal halves bind hexose and H6P, though in types I and III only the C-terminal half supports catalysis, while both halves support catalysis in type II. The N-terminal half is the regulatory region. Type IV hexokinase is similar to the yeast enzyme in containing only the two domains, and is sometimes incorrectly referred to as glucokinase.

    The different vertebrate isozymes differ in their catalysis, localisation and regulation, thereby contributing to the different patterns of glucose metabolism in different tissues. Whereas types I to III can phosphorylate a variety of hexose sugars and are inhibited by glucose-6-phosphate (G6P), type IV is specific for glucose and shows no G6P inhibition. Type I enzyme may have a catabolic function, producing H6P for energy production in glycolysis; it is bound to the mitochondrial membrane, which enables the coordination of glycolysis with the TCA cycle. Types II and III enzyme may have anabolic functions, providing H6P for glycogen or lipid synthesis. Type IV enzyme is found in the liver and pancreatic beta-cells, where it is controlled by insulin (activation) and glucagon (inhibition). In pancreatic beta-cells, type IV enzyme acts as a glucose sensor to modify insulin secretion. Mutations in type IV hexokinase have been associated with diabetes mellitus.

    Proteins where this domain is known:
    PY02030   


    PF03764 - EFG_IV (Pfam link)

    Interpro entry IPR005517 : Translation elongation factor EFG/EF2, domain IV (Interpro link)

    Pfam description:
    This domain is found in elongation factor G, elongation factor 2 and some tetracycline resistance proteins and adopts a ribosomal protein S5 domain 2-like fold.

    Interpro description:

    Translation elongation factors are responsible for two main processes during protein synthesis on the ribosome. EF1A (or EF-Tu) is responsible for the selection and binding of the cognate aminoacyl-tRNA to the A-site (acceptor site) of the ribosome. EF2 (or EF-G) is responsible for the translocation of the peptidyl-tRNA from the A-site to the P-site (peptidyl-tRNA site) of the ribosome, thereby freeing the A-site for the next aminoacyl-tRNA to bind. Elongation factors are responsible for achieving accuracy of translation and both EF1A and EF2 are remarkably conserved throughout evolution.

    Elongation factor EF2 (EF-G) is a G-protein. It brings about the translocation of peptidyl-tRNA and mRNA through a ratchet-like mechanism: the binding of GTP-EF2 to the ribosome causes a counter-clockwise rotation in the small ribosomal subunit; the hydrolysis of GTP to GDP by EF2 and the subsequent release of EF2 causes a clockwise rotation of the small subunit back to the starting position. This twisting action destabilises tRNA-ribosome interactions, freeing the tRNA to translocate along the ribosome upon GTP-hydrolysis by EF2. EF2 binding also affects the entry and exit channel openings for the mRNA, widening it when bound to enable the mRNA to translocate along the ribosome.

    EF2 has five domains. This entry represents domain IV found in EF2 (or EF-G) of both prokaryotes and eukaryotes. The EF2-GTP-ribosome complex undergoes extensive structural rearrangement for tRNA-mRNA movement to occur. Domain IV, which extends from the 'body' of the EF2 molecule much like a lever arm, appears to be essential for the structural transition to take place.

    More information about these proteins can be found at Protein of the Month: Elongation Factors.

    Proteins where this domain is known:
    PY01864    PY04706    PY05356    PY05417   


    PF03770 - IPK (Pfam link)

    Interpro entry IPR005522 : Inositol polyphosphate kinase (Interpro link)

    Pfam description:
    ArgRIII has been demonstrated to be an inositol polyphosphate kinase.

    Interpro description:

    ArgRIII has been demonstrated to be an inositol polyphosphate kinase which catalyses the reaction:

    ATP + 1D-myo-inositol 1,4,5-trisphosphate = ADP + 1D-myo-inositol 1,3,4,5-tetrakisphosphate

    Proteins where this domain is known:
    PY06556    PY07179   


    PF03798 - TRAM_LAG1_CLN8 (Pfam link)

    Interpro entry IPR006634 : TRAM, LAG1 and CLN8 homology (Interpro link)

    Interpro description:

    TLC is a protein domain with at least 5 transmembrane alpha-helices. Lag1p and Lac1p are essential for acyl-CoA-dependent ceramide synthesis, TRAM is a subunit of the translocon and the CLN8 gene is mutated in Northern epilepsy syndrome. Proteins containing this domain may possess multiple functions such as lipid trafficking, metabolism, or sensing. Trh homologues possess additional homeobox domains.

    Proteins where this domain is known:
    PY00812   


    PF03801 - Ndc80_HEC (Pfam link)

    Interpro entry IPR005550 : (Interpro link)

    Pfam description:
    Members of this family are components of the mitotic spindle. It has been shown that Ndc80/HEC from yeast is part of a complex called the Ndc80p complex. This complex is thought to bind to the microtubules of the spindle.

    Interpro description:

    Members of this family are components of the mitotic spindle. It has been shown that Ndc80 from yeast is part of a complex called the Ndc80p complex. This complex is thought to bind to the microtubules of the spindle.

    Proteins where this domain is known:
    PY01957   


    PF03805 - CLAG (Pfam link)

    Interpro entry IPR005553 : (Interpro link)

    Pfam description:
    Clag (cytoadherence linked asexual gene) is a malaria surface protein which has been shown to be involved in the binding of Plasmodium falciparum-infected erythrocytes to host endothelial cells, a process termed cytoadherence. The cytoadherence phenomenon is associated with the sequestration of infected erythrocytes in the blood vessels of the brain (cerebral malaria). Clag is a multi-gene family in Plasmodium falciparum with at least 9 members identified to date. Orthologous proteins in the rodent malaria species Plasmodium chabaudi (Lawson D Unpubl. obs.) suggest that the gene family is found in other malaria species and may play a more generic role in cytoadherence.

    Interpro description:

    Clag (cytoadherence linked asexual gene) is a malaria surface protein which has been shown to be involved in the binding of Plasmodium falciparum-infected erythrocytes to host endothelial cells, a process termed cytoadherence. The cytoadherence phenomenon is associated with the sequestration of infected erythrocytes in the blood vessels of the brain (cerebral malaria). Clag is a multi-gene family in P. falciparum with at least 9 members identified to date. Orthologous proteins in the rodent malaria species Plasmodium chabaudi suggest that the gene family is found in other malaria species and may play a more generic role in cytoadherence.

    Proteins where this domain is known:
    PY02932    PY04666    PY06117   


    PF03807 - F420_oxidored (Pfam link)

    Interpro entry IPR004455 : (Interpro link)

    Interpro description:
    The function of F420-dependent NADP reductase is the transfer of electrons from reduced coenzyme F420 into an electron transport chain. It catalyses the reduction of F420 with NADP(+) and the reduction of NADP(+) with F420H(2).

    Proteins where this domain is known:
    PY02938   


    PF03810 - IBN_N (Pfam link)

    Interpro entry IPR001494 : Importin-beta, N-terminal (Interpro link)

    Interpro description:

    The exchange of macromolecules between the nucleus and cytoplasm takes place through nuclear pore complexes within the nuclear membrane. Active transport of large molecules through these pore complexes requires carrier proteins, called karyopherins (importins and exportins), which shuttle between the two compartments.

    Members of the importin-beta (karyopherin-beta) family can bind and transport cargo by themselves, or can form heterodimers with importin-alpha. As part of a heterodimer, importin-beta mediates interactions with the pore complex, while importin-alpha acts as an adaptor protein to bind the nuclear localisation signal (NLS) on the cargo in the classical NLS protein import pathway. Importin-beta is a helicoidal molecule constructed from 19 HEAT repeats. Many nuclear pore proteins contain FG sequence repeats that can bind to HEAT repeats within importins, which is important for importin-beta mediated transport.

    Ran GTPase helps to control the unidirectional transfer of cargo. The cytoplasm contains primarily RanGDP and the nucleus RanGTP through the actions of RanGAP and RanGEF, respectively. In the nucleus, RanGTP binds to importin-beta within the importin/cargo complex, causing a conformational change in importin-beta that releases it from importin-alpha-bound cargo. As a result, the N-terminal auto-inhibitory region on importin-alpha is free to loop back and bind to the major NLS-binding site, causing the cargo to be released. There are additional release factors as well.

    More information about these proteins can be found at Protein of the Month: Importins.

    Proteins where this domain is known:
    PY02706    PY04054    PY04443    PY05627   

    Proteins where this domain has been detected by our approach:
    PY00155    PY06655   


    PF03815 - LCCL (Pfam link)

    Interpro entry IPR004043 : (Interpro link)

    Interpro description:

    The LCCL domain has been named after the best characterised proteins that were found to contain it, namely Limulus factor C, Coch-5b2 and Lgl1. It is a domain of about 100 amino acids whose C-terminal part contains a highly conserved histidine in the conserved motif YxxxSxxCxAAVHxGVI. The LCCL module is thought to be an autonomously folding domain that has been used for the construction of various modular proteins through exon-shuffling. It has been found in various metazoan proteins in association with complement B-type domains, C-type lectin domains, von Willebrand type A domains, CUB domains, discoidin lectin domains or CAP domains. It has been proposed that the LCCL domain could be involved in lipopolysaccharide (LPS) binding. Secondary structure prediction suggests that the LCCL domain contains six beta strands and two alpha helices.

    Some proteins known to contain an LCCL domain include Limulus factor C, an LPS endotoxin-sensitive trypsin-type serine protease which serves to protect the organism from bacterial infection; the vertebrate cochlear protein cochlin or coch-5b2 (cochlin is probably a secreted protein, and mutations affecting the LCCL domain of coch-5b2 cause the deafness disorder DFNA9 in humans); and the mammalian late gestation lung protein Lgl1, which contains two tandem copies of the LCCL domain.
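    To illustrate how a consensus string such as the LCCL motif YxxxSxxCxAAVHxGVI can be scanned against a sequence, the sketch below reads "x" as "any residue" (the usual convention, assumed here) and converts the motif into a regular expression. The helper names and the test sequence are hypothetical; the sequence is synthetic and is not taken from any real LCCL protein.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Translate a consensus motif, where 'x' means any residue, into a regex."""
    return "".join("." if ch == "x" else re.escape(ch) for ch in motif)

def find_motif(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical, synthetic sequence containing one copy of the motif.
print(find_motif("MKYAAASGGCAAAVHAGVILL", "YxxxSxxCxAAVHxGVI"))  # [2]
```

    A plain regular expression is enough for fixed-spacing consensus motifs like this one; profile-based tools such as those used to build Pfam families are needed when the spacing or residues vary.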

    Proteins where this domain is known:
    PY01071    PY01154    PY01580    PY05554   


    PF03828 - PAP_assoc (Pfam link)

    Interpro entry IPR002058 : (Interpro link)

    Pfam description:
    This domain is found in poly(A) polymerases and has been shown to have polynucleotide adenylyltransferase activity. Proteins in this family have been located to both the nucleus and cytoplasm.

    Interpro description:

    These PAP/25A associated domains are found in uncharacterised eukaryotic proteins, a number of which are described as 'topoisomerase 1-related' though they appear to have little or no homology to topoisomerase 1. The signatures that define this group of sequences often occur towards the C-terminus after the PAP/25A core domain.

    Proteins where this domain has been detected by our approach:
    PY04615    PY05727   


    PF03834 - Rad10 (Pfam link)

    Interpro entry IPR004579 : DNA repair protein rad10 (Interpro link)

    Pfam description:
    Ercc1 and XPF (xeroderma pigmentosum group F-complementing protein) are two structure-specific endonucleases of a class of seven containing an ERCC4 domain. Together they form an obligate complex that functions primarily in nucleotide excision repair (NER), a versatile pathway able to detect and remove a variety of DNA lesions induced by UV light and environmental carcinogens, and secondarily in DNA interstrand cross-link repair and telomere maintenance. This domain in fact binds simultaneously to both XPF and single-stranded DNA; this ternary complex explains the important role of Ercc1 in targeting its catalytic XPF partner to the NER pre-incision complex.

    Interpro description:

    All proteins in this family for which functions are known are components in a multiprotein endonuclease complex (usually made up of Rad1 and Rad10 homologs). This complex is used primarily for nucleotide excision repair but also for some aspects of recombination repair. In yeast, Rad10 works as a heterodimer with Rad1, and is involved in nucleotide excision repair of DNA damaged with UV light, bulky adducts or cross-linking agents. The complex forms an endonuclease which specifically degrades single-stranded DNA.

    Ercc1 and XPF (xeroderma pigmentosum group F-complementing protein) are two structure-specific endonucleases of a class of seven containing an ERCC4 domain. Together they form an obligate complex that functions primarily in nucleotide excision repair (NER), a versatile pathway able to detect and remove a variety of DNA lesions induced by UV light and environmental carcinogens, and secondarily in DNA inter-strand cross-link repair and telomere maintenance. This domain in fact binds simultaneously to both XPF and single-stranded DNA; this ternary complex explains the important role of Ercc1 in targeting its catalytic XPF partner to the NER pre-incision complex.

    Proteins where this domain is known:
    PY06905   


    PF03839 - Sec62 (Pfam link)

    Interpro entry IPR004728 : Translocation protein Sec62 (Interpro link)

    Interpro description:
    Members of the NSCC2 family have been sequenced from various yeast, fungal and animal species, including Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens. These proteins are the Sec62 proteins, believed to be associated with the Sec61 and Sec63 constituents of the general protein secretory systems of yeast microsomes. They are also the non-selective cation (NS) channels of the mammalian cytoplasmic membrane. The yeast Sec62 protein has been shown to be essential for cell growth. The mammalian NS channel proteins have been implicated in platelet-derived growth factor (PDGF)-dependent single-channel currents in fibroblasts. These channels are essentially closed in serum-deprived tissue-culture cells and are specifically opened by exposure to PDGF. They are reported to exhibit equal selectivity for Na+, K+ and Cs+, with low permeability to Ca2+ and no permeability to anions.

    Proteins where this domain is known:
    PY02255   


    PF03849 - Tfb2 (Pfam link)

    Interpro entry IPR004598 : Transcription factor Tfb2 (Interpro link)

    Interpro description:
    Members of this family are part of the TFIIH complex, which is involved in the initiation of transcription and in nucleotide excision repair. The core-TFIIH basal transcription factor complex has six subunits; this is the p52 subunit.

    Proteins where this domain is known:
    PY06009   


    PF03850 - Tfb4 (Pfam link)

    Interpro entry IPR004600 : Transcription factor Tfb4 (Interpro link)

    Interpro description:
    Members of this family are part of the TFIIH complex, which is involved in the initiation of transcription and in nucleotide excision repair. The core-TFIIH basal transcription factor complex has six subunits; this is the p34 subunit.

    Proteins where this domain is known:
    PY04660   


    PF03853 - YjeF_N (Pfam link)

    Interpro entry IPR004443 : (Interpro link)

    Interpro description:

    The YjeF N-terminal domains occur either as single proteins or fusions with other domains and are commonly associated with enzymes. In bacteria and archaea, YjeF N-terminal domains are often fused to a YjeF C-terminal domain with high structural homology to the members of a ribokinase-like superfamily and/or belong to operons that encode enzymes of diverse functions: pyridoxal phosphate biosynthetic protein PdxJ; phosphopantetheine-protein transferase; ATP/GTP hydrolase; and pyruvate-formate lyase 1-activating enzyme. In plants, the YjeF N-terminal domain is fused to a C-terminal putative pyridoxamine 5'-phosphate oxidase. In eukaryotes, proteins that consist of (Sm)-FDF-YjeF N-terminal domains may be involved in RNA processing.

    The YjeF N-terminal domains represent a novel version of the Rossmann fold, one of the most common protein folds in nature observed in numerous enzyme families, that has acquired a set of catalytic residues and structural features that distinguish them from the conventional dehydrogenases. The YjeF N-terminal domain comprises a three-layer alpha-beta-alpha sandwich with a central beta-sheet surrounded by helices. The conservation of the acidic residues in the predicted active site of the YjeF N-terminal domains is reminiscent of the presence of such residues in the active sites of diverse hydrolases.

    Proteins where this domain is known:
    PY05519   


    PF03870 - RNA_pol_Rpb8 (Pfam link)

    Interpro entry IPR005570 : RNA polymerase, Rpb8 (Interpro link)

    Pfam description:
    Rpb8 is a subunit common to the three yeast RNA polymerases, pol I, II and III. Rpb8 interacts with the largest subunit Rpb1, and with Rpb3 and Rpb11, two smaller subunits.

    Interpro description:
    Rpb8 is a subunit common to the three yeast RNA polymerases, pol I, II and III. Rpb8 interacts with the largest subunit Rpb1, and with Rpb3 and Rpb11, two smaller subunits.

    Proteins where this domain is known:
    PY02017   


    PF03871 - RNA_pol_Rpb5_N (Pfam link)

    Interpro entry IPR005571 : RNA polymerase, Rpb5, N-terminal (Interpro link)

    Pfam description:
    Rpb5 has a bipartite structure which includes a eukaryote-specific N-terminal domain and a C-terminal domain resembling the archaeal RNAP subunit H. The N-terminal domain is involved in DNA binding and is part of the jaw module in the RNA pol II structure. This module is important for positioning the downstream DNA.

    Interpro description:

    Prokaryotes contain a single DNA-dependent RNA polymerase (RNAP) that is responsible for the transcription of all genes, while eukaryotes have three classes of RNAPs (I-III) that transcribe different sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. Certain subunits of RNAPs, including RPB5 (POLR2E in mammals), are common to all three eukaryotic polymerases. RPB5 plays a role in the transcription activation process. Eukaryotic RPB5 has a bipartite structure consisting of a unique N-terminal region, plus a C-terminal region that is structurally homologous to the prokaryotic RPB5 homologue, subunit H (gene rpoH).

    This entry represents the N-terminal domain of eukaryotic RPB5, which has a core structure consisting of 3 layers alpha/beta/alpha. The N-terminal domain is involved in DNA binding and is part of the jaw module in the RNA pol II structure. This module is important for positioning the downstream DNA.

    Proteins where this domain is known:
    PY02778   


    PF03874 - RNA_pol_Rpb4 (Pfam link)

    Interpro entry IPR005574 : RNA polymerase II, Rpb4 (Interpro link)

    Interpro description:

    The eukaryotic RNA polymerase subunits RPB4 and RPB7 form a heterodimer that reversibly associates with the RNA polymerase II core. Archaeal cells contain a single RNAP made up of about 12 subunits, displaying considerable homology to the eukaryotic RNAPII subunits. The RPB4 and RPB7 homologs are called subunits F and E, respectively, and have been shown to form a stable heterodimer. While the RPB7 homolog is reasonably well conserved, the similarity between the eukaryotic RPB4 and the archaeal F subunit is barely detectable.

    Proteins where this domain is known:
    PY02525   


    PF03876 - RNA_pol_Rpb7_N (Pfam link)

    Interpro entry IPR005576 : RNA polymerase Rpb7, N-terminal (Interpro link)

    Pfam description:
    Rpb7 binds to Rpb4 to form a heterodimer. This complex is thought to interact with the nascent RNA strand during RNA polymerase II elongation. This family includes the homologs from RNA polymerase I and III. In RNA polymerase I, Rpa43 is at least one of the subunits contacted by the transcription factor TIF-IA.

    Interpro description:

    The eukaryotic RNA polymerase subunits RPB4 and RPB7 form a heterodimer that reversibly associates with the RNA polymerase II core. Archaeal cells contain a single RNAP made up of about 12 subunits, displaying considerable homology to the eukaryotic RNAPII subunits. The RPB4 and RPB7 homologs are called subunits F and E, respectively, and have been shown to form a stable heterodimer. While the RPB7 homolog is reasonably well conserved, the similarity between the eukaryotic RPB4 and the archaeal F subunit is barely detectable.

    Proteins where this domain is known:
    PY02385    PY05877   


    PF03900 - Porphobil_deamC (Pfam link)

    Interpro entry IPR000860 : Tetrapyrrole biosynthesis, hydroxymethylbilane synthase (Interpro link)

    Interpro description:

    Tetrapyrroles are large macrocyclic compounds derived from a common biosynthetic pathway. The end-product, uroporphyrinogen III, is used to synthesise a number of important molecules, including vitamin B12, haem, sirohaem, chlorophyll, coenzyme F430 and phytochromobilin.

    The first stage in tetrapyrrole synthesis is the synthesis of 5-aminolaevulinic acid (ALA) via two possible routes: (1) condensation of succinyl CoA and glycine (C4 pathway) using ALA synthase, or (2) decarboxylation of glutamate (C5 pathway) via three different enzymes, glutamyl-tRNA synthetase to charge a tRNA with glutamate, glutamyl-tRNA reductase to reduce glutamyl-tRNA to glutamate-1-semialdehyde (GSA), and GSA aminotransferase to catalyse a transamination reaction to produce ALA.

    The second stage is to convert ALA to uroporphyrinogen III, the first macrocyclic tetrapyrrolic structure in the pathway. This is achieved by the action of three enzymes in one common pathway: porphobilinogen (PBG) synthase (or ALA dehydratase) to condense two ALA molecules to generate porphobilinogen; hydroxymethylbilane synthase (or PBG deaminase) to polymerise four PBG molecules into preuroporphyrinogen (tetrapyrrole structure); and uroporphyrinogen III synthase to link two pyrrole units together (rings A and D) to yield uroporphyrinogen III.

    Uroporphyrinogen III is the first branch point of the pathway. To synthesise cobalamin (vitamin B12), sirohaem, and coenzyme F430, uroporphyrinogen III needs to be converted into precorrin-2 by the action of uroporphyrinogen III methyltransferase. To synthesise haem and chlorophyll, uroporphyrinogen III needs to be decarboxylated into coproporphyrinogen III by the action of uroporphyrinogen III decarboxylase.

    This entry represents hydroxymethylbilane synthase (or porphobilinogen deaminase), which functions during the second stage of tetrapyrrole biosynthesis. This enzyme catalyses the polymerisation of four PBG molecules into the tetrapyrrole structure, preuroporphyrinogen, with the concomitant release of four molecules of ammonia. This enzyme uses a unique dipyrromethane cofactor made from two molecules of PBG, which is covalently attached to a cysteine side chain. The tetrapyrrole product is synthesised in an ordered, sequential fashion, by initial attachment of the first pyrrole unit (ring A) to the cofactor, followed by subsequent additions of the remaining pyrrole units (rings B, C, D) to the growing pyrrole chain. The link between the pyrrole ring and the cofactor is broken once all the pyrroles have been added. This enzyme is folded into three distinct domains that enclose a single, large active site that makes use of an aspartic acid as its one essential catalytic residue, acting as a general acid/base during catalysis. A deficiency of hydroxymethylbilane synthase is implicated in the neuropathic disease acute intermittent porphyria (AIP).

    Proteins where this domain is known:
    PY01828   


    PF03901 - Glyco_transf_22 (Pfam link)

    Interpro entry IPR005599 : Alg9-like mannosyltransferase (Interpro link)

    Pfam description:
    Members of this family are mannosyltransferase enzymes. At least some members are localised in endoplasmic reticulum and involved in GPI anchor biosynthesis.

    Interpro description:

    Members of this family are mannosyltransferase enzymes. At least some members are localised in the endoplasmic reticulum and involved in GPI anchor biosynthesis. In yeast, SMP3 (YOR149C) has been implicated in plasmid stability.

    Proteins where this domain is known:
    PY01388   


    PF03908 - Sec20 (Pfam link)

    Interpro entry IPR005606 : (Interpro link)

    Pfam description:
    Sec20 is a membrane glycoprotein associated with the secretory pathway.

    Interpro description:

    Sec20 is a membrane glycoprotein associated with the secretory pathway.

    Proteins where this domain is known:
    PY03407   


    PF03909 - BSD (Pfam link)

    Interpro entry IPR005607 : (Interpro link)

    Pfam description:
    This domain contains a distinctive -FW- motif. It is found in a family of eukaryotic transcription factors as well as a set of proteins of unknown function.

    Interpro description:

    The BSD domain is an approximately 60-residue domain named after the BTF2-like transcription factors, Synapse-associated proteins and DOS2-like proteins in which it is found. It is also found in several hypothetical proteins. The BSD domain occurs in one or two copies in a variety of species ranging from primitive protozoa to humans. It can be found associated with other domains, such as the BTB domain or the U-box, in multidomain proteins. The function of the BSD domain is not yet known.

    Secondary structure prediction indicates the presence of three predicted alpha helices, which probably form a three-helical bundle in small domains. The third predicted helix contains neighbouring phenylalanine and tryptophan residues - less common amino acids that are invariant in all the BSD domains identified and that are the most striking sequence features of the domain.

    Some proteins known to contain one or two BSD domains are listed below:
  • Mammalian TFIIH basal transcription factor complex p62 subunit (GTF2H1).
  • Yeast RNA polymerase II transcription factor B 73 kDa subunit (TFB1), the homologue of BTF2.
  • Yeast DOS2 protein. It is involved in single-copy DNA replication and ubiquitination.
  • Drosophila synapse-associated protein SAP47.
  • Mammalian SYAP1.
  • Various Arabidopsis thaliana (Mouse-ear cress) hypothetical proteins.

    Proteins where this domain is known:
    PY03420    PY06452   

    Proteins where this domain has been detected by our approach:
    PY00359   


    PF03914 - CBF (Pfam link)

    Interpro entry IPR005612 : (Interpro link)

    Interpro description:

    This domain is present in the CCAAT-binding protein, which is essential for growth and necessary for 60S ribosomal subunit biogenesis. Other proteins containing this domain stimulate transcription from the HSP70 promoter.

    Proteins where this domain is known:
    PY00187   


    PF03917 - GSH_synth_ATP (Pfam link)

    Interpro entry IPR005615 : Glutathione synthase, eukaryotic (Interpro link)

    Interpro description:

    This entry represents eukaryotic glutathione synthetase (GSS), a homodimeric enzyme that catalyses the conversion of gamma-L-glutamyl-L-cysteine and glycine to phosphate and glutathione in the presence of ATP. This is the second step in glutathione biosynthesis, the first step being catalysed by gamma-glutamylcysteine synthetase. In humans, defects in GSS are inherited in an autosomal recessive way and are the cause of severe metabolic acidosis, 5-oxoprolinuria, and increased rate of haemolysis and defective function of the central nervous system.

    Proteins where this domain is known:
    PY07248   


    PF03919 - mRNA_cap_C (Pfam link)

    Interpro entry IPR013846 : (Interpro link)

    Interpro description:

    This domain is found at the C terminus of the mRNA capping enzyme. The mRNA capping enzyme in yeasts is composed of two separate chains: alpha, an mRNA guanylyltransferase, and beta, an RNA 5'-triphosphatase. X-ray crystallography reveals a large conformational change during guanyl transfer by mRNA capping enzymes. Binding of the enzyme to nucleotides is specific to the GMP moiety of GTP. The viral mRNA capping enzyme is a monomer that transfers a GMP cap onto the end of mRNA that terminates with a 5'-diphosphate tail.

    Proteins where this domain is known:
    PY05095   


    PF03931 - Skp1_POZ (Pfam link)

    Interpro entry IPR016073 : SKP1 component, POZ (Interpro link)

    Interpro description:

    SKP1 (together with SKP2) was identified as an essential component of the cyclin A-CDK2 S phase kinase complex. It was found to bind several F-box-containing proteins (e.g., Cdc4, Skp2, cyclin F) and to be involved in the ubiquitin protein degradation pathway. A yeast homologue of SKP1 (P52286) was identified in the centromere-bound kinetochore complex and is also involved in the ubiquitin pathway. In Dictyostelium discoideum (slime mould), FP21 was shown to be glycosylated in the cytosol and has homology to SKP1.

    This entry represents a POZ domain with a core structure consisting of beta(2)/alpha(2)/beta(2)/alpha(2) in two layers, alpha/beta. This domain is found at the N terminus of SKP1 proteins as well as in subunit D of the centromere DNA-binding protein complex Cbf3.

    Proteins where this domain is known:
    PY00081   


    PF03939 - Ribosomal_L23eN (Pfam link)

    Interpro entry IPR005633 : (Interpro link)

    Pfam description:
    The N-terminal domain appears to be specific to the eukaryotic ribosomal proteins L25, L23, and L23a.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    The N-terminal domain appears to be specific to the eukaryotic ribosomal proteins L25, L23, and L23a.

    Proteins where this domain has been detected by our approach:
    PY04600   


    PF03946 - Ribosomal_L11_N (Pfam link)

    Interpro entry IPR000911 : Ribosomal protein L11 (Interpro link)

    Pfam description:
    The N-terminal domain of Ribosomal protein L11 adopts an alpha/beta fold and is followed by the RNA binding C-terminal domain.

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial, plant chloroplast, red algal chloroplast, cyanelle and archaebacterial L11; and mammalian, plant and yeast L12 (YL15). L11 is a protein of 140 to 165 amino-acid residues. In E. coli, the C-terminal half of L11 has been shown to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.

    Proteins where this domain is known:
    PY01745    PY04344   


    PF03947 - Ribosomal_L2_C (Pfam link)

    Interpro entry IPR002171 : Ribosomal protein L2 (Interpro link)

    Interpro description:

    Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits.

    Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.

    Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

    Proteins where this domain is known:
    PY04343    PY04762    PY05952   


    PF03950 - tRNA-synt_1c_C (Pfam link)

    Interpro entry IPR000924 : Glutamyl/glutaminyl-tRNA synthetase, class Ic (Interpro link)

    Pfam description:
    Other tRNA synthetase sub-families are too dissimilar to be included. This family includes only glutamyl and glutaminyl tRNA synthetases. In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel beta-sheet fold flanked by alpha-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an alpha-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine belong to class I synthetases; these synthetases are further divided into three subclasses, a, b and c, according to sequence homology. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine belong to class II synthetases.

    Glutamyl-tRNA synthetase is a class Ic synthetase and shows several similarities with glutaminyl-tRNA synthetase concerning structure and catalytic properties. It is an alpha2 dimer. To date one crystal structure of a glutamyl-tRNA synthetase (Thermus thermophilus) has been solved. The molecule has the form of a bent cylinder and consists of four domains. The N-terminal half (domains 1 and 2) contains the Rossmann fold typical of class I synthetases and resembles the corresponding part of Escherichia coli GlnRS, whereas the C-terminal half exhibits a GluRS-specific structure.

    Proteins where this domain is known:
    PY02178    PY02891   


    PF03951 - Gln-synt_N (Pfam link)

    Interpro entry IPR008147 : Glutamine synthetase, beta-Grasp (Interpro link)

    Interpro description:

    Glutamine synthetase (GS) plays an essential role in the metabolism of nitrogen by catalyzing the condensation of glutamate and ammonia to form glutamine.

    There seem to be three different classes of GS:

    While the three classes of GS are clearly structurally related, the sequence similarities are not so extensive.

    Proteins where this domain is known:
    PY04688   


    PF03952 - Enolase_N (Pfam link)

    Interpro entry IPR000941 : Enolase (Interpro link)

    Interpro description:

    Enolase (2-phospho-D-glycerate hydro-lyase) is an essential glycolytic enzyme that catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate. In vertebrates, there are 3 different, tissue-specific isoenzymes, designated alpha, beta and gamma. Alpha is present in most tissues, beta is localised in muscle tissue, and gamma is found only in nervous tissue. The functional enzyme exists as a dimer of any 2 isoforms. In immature organs and in adult liver, it is usually an alpha homodimer; in adult skeletal muscle, a beta homodimer; and in adult neurons, a gamma homodimer. In developing muscle, it is usually an alpha/beta heterodimer, and in the developing nervous system, an alpha/gamma heterodimer. The tissue-specific forms display minor kinetic differences. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown to be evolutionarily related to enolase.

    Neuron-specific enolase is released in a variety of neurological diseases, such as multiple sclerosis, and after seizures or acute stroke. Several tumour cells have also been found positive for neuron-specific enolase. Beta-enolase deficiency is associated with glycogenosis type XIII.

    Proteins where this domain is known:
    PY06644   


    PF03953 - Tubulin_C (Pfam link)

    Interpro entry IPR018316 : Tubulin/FtsZ, 2-layer sandwich domain (Interpro link)

    Pfam description:
    This family includes the tubulin alpha, beta and gamma chains, as well as the bacterial FtsZ family of proteins. Members of this family are involved in polymer formation. FtsZ is the polymer-forming protein of bacterial cell division. It is part of a ring in the middle of the dividing cell that is required for constriction of cell membrane and cell envelope to yield two daughter cells. FtsZ and tubulin are GTPases. FtsZ can polymerise into tubes, sheets, and rings in vitro and is ubiquitous in eubacteria and archaea. Tubulin is the major component of microtubules.

    Interpro description:

    This domain is found in the tubulin alpha, beta and gamma chains, as well as the bacterial FtsZ family of proteins. These proteins are GTPases and are involved in polymer formation. Tubulin is the major component of microtubules, while FtsZ is the polymer-forming protein of bacterial cell division; it is part of a ring in the middle of the dividing cell that is required for constriction of cell membrane and cell envelope to yield two daughter cells. FtsZ can polymerise into tubes, sheets, and rings in vitro and is ubiquitous in bacteria and archaea. This is the C-terminal domain.

    Proteins where this domain is known:
    PY00808    PY01155    PY04063    PY05711    PY05777   


    PF03962 - Mnd1 (Pfam link)

    Interpro entry IPR005647 : (Interpro link)

    Pfam description:
    This family of proteins includes MND1 from S. cerevisiae. The mnd1 protein forms a complex with hop2 to promote homologous chromosome pairing and meiotic double-strand break repair.

    Interpro description:
    This family of proteins includes meiotic nuclear division protein 1 (MND1) from Saccharomyces cerevisiae (Baker's yeast). The mnd1 protein forms a complex with hop2 to promote homologous chromosome pairing and meiotic double-strand break repair.

    Proteins where this domain is known:
    PY04140   


    PF03966 - Trm112p (Pfam link)

    Interpro entry IPR005651 : (Interpro link)

    Pfam description:
    The function of this family is uncertain. The bacterial members are about 60-70 amino acids in length and the eukaryotic examples are about 120 amino acids in length. The C terminus contains the strongest conservation. Trm112p is required for tRNA methylation in S. cerevisiae and is found in complexes with 2 tRNA methylases (TRM9 and TRM11) and also with the putative methyltransferase YDR140W. The zinc-finger protein Ynr046w is plurifunctional and a component of the eRF1 methyltransferase in yeast. The crystal structure of Ynr046w has been determined to 1.7 Å resolution. It comprises a zinc-binding domain built from both the N- and C-terminal sequences and an inserted domain, absent from bacterial and archaeal orthologs of the protein, composed of three alpha-helices.

    Interpro description:

    This family of short proteins has no known function. The bacterial members are about 60-70 amino acids in length and the eukaryotic examples are about 120 amino acids in length. The C-terminus contains the strongest conservation.

    Proteins where this domain is known:
    PY05264    PY05265   


    PF03969 - AFG1_ATPase (Pfam link)

    Interpro entry IPR005654 : ATPase, AFG1-like (Interpro link)

    Pfam description:
    This family of proteins contains a P-loop motif, and its members are predicted to be ATPases.

    Interpro description:

    ATPase family gene 1 (AFG1) ATPase is a putative 377-amino-acid protein with an ATPase motif typical of the protein family that includes SEC18p, PAS1, CDC48-VCP and TBP. AFG1 also has substantial homology to these proteins outside the ATPase domain. This family of proteins contains a P-loop motif.

    Proteins where this domain is known:
    PY01368   


    PF03986 - Autophagy_N (Pfam link)

    Interpro entry IPR007134 : (Interpro link)

    Pfam description:
    Autophagocytosis is a starvation-induced process responsible for transport of cytoplasmic proteins to the lysosome/vacuole. Atg3 is an E2-like conjugating enzyme that is topologically similar to canonical E2 (ubiquitin-conjugating) enzymes. It catalyses the conjugation of Atg8 and phosphatidylethanolamine.

    Interpro description:

    Proteins in this entry belong to the Atg3 group of proteins and the Atg3 conjugation enzymes.

    Autophagy is a degradative transport pathway that delivers cytosolic proteins to the lysosome (vacuole) and is induced by starvation. Cytosolic proteins appear inside the vacuole enclosed in autophagic vesicles. Autophagy differs significantly from other transport pathways in using double-membrane transport intermediates, called autophagosomes. The breakdown of vesicular transport intermediates is a unique feature of autophagy. Autophagy can also function in the elimination of invading bacteria and antigens.

    Atg3 is the E2 enzyme for the LC3 lipidation process and is essential for autophagocytosis. The large Atg16L complex consists of multiple Atg12-Atg5 conjugates and has an E3-like role in the LC3 lipidation reaction. The activated intermediate, LC3-Atg3 (E2), is recruited to the site where lipidation takes place.

    Atg3 catalyses the conjugation of Atg8 and phosphatidylethanolamine (PE). Atg3 has an alpha/beta fold, and its core region is topologically similar to canonical E2 enzymes. Atg3 has two regions inserted in the core region, and another with a long alpha-helical structure that protrudes as far as 30 Å from the core region. It interacts with Atg8 through an intermediate thioester bond between Cys-288 and the C-terminal Gly of Atg8. It also interacts with the C-terminal region of the E1-like enzyme Atg7.

    This domain is the N-terminal of Atg3 while the C-terminal is represented by

    Proteins where this domain is known:
    PY04568   


    PF03987 - Autophagy_act_C (Pfam link)

    Interpro entry IPR007135 : (Interpro link)

    Pfam description:
    Autophagocytosis is a starvation-induced process responsible for transport of cytoplasmic proteins to the vacuole. The cysteine residue within the HPC motif is the putative active-site residue for recognition of the Apg5 subunit of the autophagosome complex.

    Interpro description:

    Proteins in this entry belong to the Atg3 group of proteins and the Atg3 conjugation enzymes.

    Autophagy is a degradative transport pathway that delivers cytosolic proteins to the lysosome (vacuole) and is induced by starvation. Cytosolic proteins appear inside the vacuole enclosed in autophagic vesicles. Autophagy differs significantly from other transport pathways in using double-membrane transport intermediates, called autophagosomes. The breakdown of vesicular transport intermediates is a unique feature of autophagy. Autophagy can also function in the elimination of invading bacteria and antigens.

    Atg3 is the E2 enzyme for the LC3 lipidation process and is essential for autophagocytosis. The large Atg16L complex consists of multiple Atg12-Atg5 conjugates and has an E3-like role in the LC3 lipidation reaction. The activated intermediate, LC3-Atg3 (E2), is recruited to the site where lipidation takes place.

    Atg3 catalyses the conjugation of Atg8 and phosphatidylethanolamine (PE). Atg3 has an alpha/beta fold, and its core region is topologically similar to canonical E2 enzymes. Atg3 has two regions inserted in the core region, and another with a long alpha-helical structure that protrudes as far as 30 Å from the core region. It interacts with Atg8 through an intermediate thioester bond between Cys-288 and the C-terminal Gly of Atg8. It also interacts with the C-terminal region of the E1-like enzyme Atg7.

    Proteins where this domain is known:
    PY04567   


    PF03989 - DNA_gyraseA_C (Pfam link)

    Interpro entry IPR006691 : DNA gyrase/topoisomerase IV, subunit A, C-terminal beta-pinwheel (Interpro link)

    Pfam description:
    This repeat is found as six tandem copies at the C-termini of GyrA and ParC, the A subunits of DNA gyrase and topoisomerase IV respectively. Each repeat is predicted to form four beta strands, and the repeats probably assemble into a beta-propeller structure. This region has been shown to bind DNA non-specifically and may stabilise the DNA-topoisomerase complex.

    Interpro description:

    DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. These enzymes have several functions: to remove DNA supercoils during transcription and DNA replication; for strand breakage during recombination; for chromosome condensation; and to disentangle intertwined DNA during mitosis. DNA topoisomerases are divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA.

    Type II topoisomerases are ATP-dependent enzymes, and can be subdivided according to their structure and reaction mechanisms: type IIA (topoisomerase II or gyrase, and topoisomerase IV) and type IIB (topoisomerase VI). These enzymes are responsible for relaxing supercoiled DNA as well as for introducing both negative and positive supercoils.

    This entry represents the beta-pinwheel repeat found at the C-terminal end of subunit A of topoisomerase IV (ParC) and subunit A of DNA gyrase (GyrA). DNA gyrase is the topoisomerase II found primarily in bacteria and archaea that consists of two polypeptide subunits, gyrA and gyrB, which form a heterotetramer: (BA)2. This is distinct from the topoisomerase II found in most eukaryotes, which consists of a single polypeptide, with the N- and C-terminal regions corresponding to gyrB and gyrA, respectively, and which is not represented in this entry.

    The ability of DNA gyrase to introduce negative supercoils into DNA is mediated in part by the C-terminal domain of subunit A, which forms a beta-pinwheel fold that is similar to a beta-propeller but with a different blade topology, and which forms a superhelical spiral domain. This beta-pinwheel is capable of bending DNA by over 180 degrees over a 40 bp region, possibly by wrapping the DNA around the GyrA C-terminal beta-pinwheel domain.

    In topoisomerase IV, although the C-terminal domain forms a superhelical spiral similar to that of DNA gyrase A, it assembles into a broken form of the beta-pinwheel, distinct from that of GyrA, because it lacks the DNA gyrase-specific GyrA box motif. This difference may account for ParC being less efficient than GyrA in mediating DNA bending, and for the divergence in their activities: topoisomerase IV acts to relax positive supercoils, whereas DNA gyrase acts to introduce negative supercoils.

    More information about this protein can be found at Protein of the Month: DNA Topoisomerase.

    Proteins where this domain is known:
    PY03453   

    Proteins where this domain has been detected by our approach:
    PY07326   


    PF03998 - Utp11 (Pfam link)

    Interpro entry IPR007144 : Small-subunit processome, Utp11 (Interpro link)

    Pfam description:
    This protein is found to be part of a large ribonucleoprotein complex containing the U3 snoRNA. Depletion of the Utp proteins impedes production of the 18S rRNA, indicating that they are part of the active pre-rRNA processing complex. This large RNP complex has been termed the small subunit (SSU) processome.

    Interpro description:

    A large ribonucleoprotein complex, the small-subunit (SSU) processome, is required for the processing of the small-ribosomal-subunit rRNA. This preribosomal complex contains the U3 snoRNA and at least 40 proteins.

    There appears to be a linkage between polymerase I transcription and the formation of the SSU processome, as some, but not all, of the SSU processome components are required for pre-rRNA transcription initiation. These SSU processome components have been termed t-Utps. They form a pre-complex with pre-18S rRNA in the absence of snoRNA U3 and other SSU processome components. It has been proposed that the t-Utp complex proteins are both rDNA- and rRNA-binding proteins involved in the initiation of pre-18S rRNA transcription, initially binding to rDNA and then associating with the 5' end of the nascent pre-18S rRNA. The t-Utp complex forms the nucleus around which the rest of the SSU processome components, including snoRNA U3, assemble. Electron microscopy suggests that the SSU processome may correspond to the terminal knobs visualized at the 5' ends of nascent 18S rRNA.

    This entry contains Utp11, a protein of the SSU processome that associates with snoRNA U3.

    Proteins where this domain is known:
    PY01569   


    PF04000 - Sas10_Utp3 (Pfam link)

    Interpro entry IPR007146 : (Interpro link)

    Pfam description:
    This family contains Utp3 and LCP5, which are components of the U3 ribonucleoprotein complex. It also includes the human C1D protein and Saccharomyces cerevisiae YHR081W (rrp47), an exosome-associated protein required for the 3' processing of stable RNAs, and Sas10, which has been identified as a regulator of chromatin silencing. This family also includes the human protein Neuroguidin, an initiation factor 4E (eIF4E)-binding protein.

    Interpro description:

    This family contains Utp3 and LCP5 which are components of the U3 ribonucleoprotein complex. It also includes the Homo sapiens (Human) C1D protein and Saccharomyces cerevisiae (Baker's yeast) YHR081W (rrp47), an exosome-associated protein required for the 3' processing of stable RNAs and Sas10 which has been identified as a regulator of chromatin silencing. This entry also includes the human protein Neuroguidin, an initiation factor 4E (eIF4E)-binding protein.

    Proteins where this domain is known:
    PY06014   


    PF04003 - Utp12 (Pfam link)

    Interpro entry IPR007148 : Small-subunit processome, Utp12 (Interpro link)

    Pfam description:
    This domain is found at the C-terminus of proteins containing WD40 repeats. These proteins are part of the U3 ribonucleoprotein; the yeast protein is called Utp12 or DIP2 (Swiss:Q12220).

    Interpro description:

    A large ribonucleoprotein complex, the small-subunit (SSU) processome, is required for the processing of the small-ribosomal-subunit rRNA. This preribosomal complex contains the U3 snoRNA and at least 40 proteins.

    There appears to be a linkage between polymerase I transcription and the formation of the SSU processome, as some, but not all, of the SSU processome components are required for pre-rRNA transcription initiation. These SSU processome components have been termed t-Utps. They form a pre-complex with pre-18S rRNA in the absence of snoRNA U3 and other SSU processome components. It has been proposed that the t-Utp complex proteins are both rDNA- and rRNA-binding proteins involved in the initiation of pre-18S rRNA transcription, initially binding to rDNA and then associating with the 5' end of the nascent pre-18S rRNA. The t-Utp complex forms the nucleus around which the rest of the SSU processome components, including snoRNA U3, assemble. Electron microscopy suggests that the SSU processome may correspond to the terminal knobs visualized at the 5' ends of nascent 18S rRNA.

    This domain is found at the C-terminus of proteins containing WD40 repeats. These proteins are part of the U3 ribonucleoprotein, and the yeast protein is called Utp12 or DIP2. Utp12 specifically interacts with snoRNA U3 and with MPP10.

    Proteins where this domain has been detected by our approach:
    PY01106   


    PF04006 - Mpp10 (Pfam link)

    Interpro entry IPR007151 : (Interpro link)

    Pfam description:
    This family includes proteins related to Mpp10 (M phase phosphoprotein 10). The U3 small nucleolar ribonucleoprotein (snoRNP) is required for three cleavage events that generate the mature 18S rRNA from the pre-rRNA. In Saccharomyces cerevisiae, depletion of Mpp10, a U3 snoRNP-specific protein, halts 18S rRNA production and impairs cleavage at the three U3 snoRNP-dependent sites.

    Interpro description:
    This family includes proteins related to Mpp10 (M phase phosphoprotein 10). The U3 small nucleolar ribonucleoprotein (snoRNP) is required for three cleavage events that generate the mature 18S rRNA from the pre-rRNA. In Saccharomyces cerevisiae, depletion of Mpp10, a U3 snoRNP-specific protein, halts 18S rRNA production and impairs cleavage at the three U3 snoRNP-dependent sites.

    Proteins where this domain is known:
    PY06994   


    PF04032 - Rpr2 (Pfam link)

    Interpro entry IPR007175 : (Interpro link)

    Pfam description:
    This family contains a ribonuclease P subunit of humans and yeast. Other members of the family include the probable archaeal homologues. This family includes SNM1. It is a subunit of RNase MRP (mitochondrial RNA processing), a ribonucleoprotein endoribonuclease that has roles in both mitochondrial DNA replication and nuclear 5.8S rRNA processing. SNM1 is an RNA binding protein that binds the MRP RNA specifically. This subunit possibly binds the precursor tRNA.

    Interpro description:
    This family contains a ribonuclease P subunit of human and yeast. Other members of the family include the probable archaeal homologues. This subunit possibly binds the precursor tRNA.

    Proteins where this domain is known:
    PY00262    PY00734   


    PF04034 - DUF367 (Pfam link)

    Interpro entry IPR007177 : (Interpro link)

    Interpro description:

    This domain is found in a family of proteins of unknown function. It appears to be found in eukaryotes and archaebacteria, and occurs associated with a potential metal-binding region in RNase L inhibitor, RLI.

    Proteins where this domain is known:
    PY06626   


    PF04037 - DUF382 (Pfam link)

    Interpro entry IPR007180 : Protein of unknown function DUF382 (Interpro link)

    Pfam description:
    This domain is specific to the human splicing factor 3b subunit 2 and its orthologues. Splicing factor 3b subunit 2, or SAP145, is a suppressor of U2 snRNA mutations. Pre-mRNA splicing is catalysed by a large ribonucleoprotein complex called the spliceosome. Spliceosomes are multi-component enzymes that catalyse pre-mRNA splicing and form step-wise by the ordered interaction of U snRNPs and non-snRNP proteins with short conserved regions of the pre-mRNA at the 5' and 3' splice sites and the branch site.

    Interpro description:
    This domain is specific to the human splicing factor 3b subunit 2 and its orthologs.

    Proteins where this domain is known:
    PY04139   


    PF04042 - DNA_pol_E_B (Pfam link)

    Interpro entry IPR007185 : DNA polymerase alpha/epsilon, subunit B (Interpro link)

    Pfam description:
    This family contains a number of DNA polymerase subunits. The B subunit of DNA polymerase alpha plays an essential role at the initial stage of DNA replication in S. cerevisiae and is phosphorylated in a cell cycle-dependent manner. DNA polymerase epsilon is essential for cell viability and chromosomal DNA replication in budding yeast. In addition, DNA polymerase epsilon may be involved in DNA repair and cell-cycle checkpoint control. The enzyme consists of at least four subunits in mammalian cells as well as in yeast. The largest subunit of DNA polymerase epsilon is responsible for polymerase activity. In mouse, DNA polymerase epsilon subunit B is the second largest subunit of the polymerase. Part of its N-terminal region was found to be responsible for the interaction with SAP18. Experimental evidence suggests that this subunit may recruit histone deacetylase to the replication fork to modify chromatin structure.

    Interpro description:
    DNA polymerase epsilon is essential for cell viability and chromosomal DNA replication in budding yeast. In addition, DNA polymerase epsilon may be involved in DNA repair and cell-cycle checkpoint control. The enzyme consists of at least four subunits in mammalian cells as well as in yeast. The largest subunit of DNA polymerase epsilon is responsible for polymerase activity. In mouse, DNA polymerase epsilon subunit B is the second largest subunit of the polymerase. Part of its N-terminal region was found to be responsible for the interaction with SAP18. Experimental evidence suggests that this subunit may recruit histone deacetylase to the replication fork to modify chromatin structure.

    Proteins where this domain is known:
    PY01936    PY04256    PY06200   


    PF04046 - PSP (Pfam link)

    Interpro entry IPR006568 : (Interpro link)

    Pfam description:
    Proline rich domain found in numerous spliceosome associated proteins.

    Interpro description:

    PSP is a proline-rich domain of unknown function found in spliceosome associated proteins.

    Proteins where this domain is known:
    PY04139   


    PF04047 - PWP2 (Pfam link)

    Interpro entry IPR007190 : (Interpro link)

    Interpro description:
    This domain is found in PWP2, a member of the WD-repeat family of proteins, which is an essential Saccharomyces cerevisiae (Baker's yeast) protein involved in cell separation.

    Proteins where this domain has been detected by our approach:
    PY04284   


    PF04051 - TRAPP (Pfam link)

    Interpro entry IPR007194 : (Interpro link)

    Pfam description:
    TRAPP plays a key role in the targeting and/or fusion of ER-to-Golgi transport vesicles with their acceptor compartment. TRAPP is a large multimeric protein that contains at least 10 subunits. This family contains many TRAPP family proteins. The Bet3 subunit is one of the better characterised TRAPP proteins and has a dimeric structure with hydrophobic channels. The channel entrances are located on a putative membrane-interacting surface that is distinctively flat, wide and decorated with positively charged residues. Bet3 is proposed to localise TRAPP to the Golgi.

    Interpro description:

    TRAPP plays a key role in the targeting and/or fusion of ER-to-Golgi transport vesicles with their acceptor compartment. TRAPP is a large multimeric protein that contains at least 10 subunits. This family contains many TRAPP family proteins. The Bet3 subunit is one of the better characterised TRAPP proteins and has a dimeric structure with hydrophobic channels. The channel entrances are located on a putative membrane-interacting surface that is distinctively flat, wide and decorated with positively charged residues. Bet3 is proposed to localise TRAPP to the Golgi.

    Proteins where this domain is known:
    PY02417    PY02475    PY03378    PY05174   


    PF04053 - Coatomer_WDAD (Pfam link)

    Interpro entry IPR006692 : Coatomer, WD associated region (Interpro link)

    Interpro description:

    Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, ensuring that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. While clathrin mediates endocytic protein transport and transport from the TGN, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi.

    Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Activated small guanine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.

    This entry represents the WD-associated region found in the coatomer alpha, beta and beta' subunits. The alpha subunit (RET1P) of the coatomer complex in Saccharomyces cerevisiae (Baker's yeast) participates in membrane transport between the endoplasmic reticulum and the Golgi apparatus. The protein contains six WD-40 repeat motifs in its N-terminal region.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY01045    PY05987   


    PF04054 - Not1 (Pfam link)

    Interpro entry IPR007196 : (Interpro link)

    Pfam description:
    The Ccr4-Not complex is a global regulator of transcription that affects genes positively and negatively and is thought to regulate transcription factor TFIID.

    Interpro description:

    The Ccr4-Not complex is a global regulator of gene expression that is conserved from yeast to human. It affects genes positively and negatively and is thought to regulate transcription factor IID function. In Saccharomyces cerevisiae, it exists in two prominent forms and consists of at least nine core subunits: the five Not proteins (Not1p to Not5p), Caf1p, Caf40p, Caf130p and Ccr4p. The Ccr4-Not complex regulates many different cellular functions, including RNA degradation and transcription initiation, and may be a regulatory platform that senses nutrient levels and stress. Caf1p and Ccr4p are directly involved in mRNA deadenylation, and Caf1p is associated with Dhh1p, a putative RNA helicase thought to be a component of the decapping complex. Pop2, a component of the Ccr4-Not complex, functions as a deadenylase.

    Proteins where this domain is known:
    PY01432    PY02383    PY03055   


    PF04055 - Radical_SAM (Pfam link)

    Interpro entry IPR007197 : Radical SAM (Interpro link)

    Pfam description:
    Radical SAM proteins catalyse diverse reactions, including unusual methylations, isomerisation, sulphur insertion, ring formation, anaerobic oxidation and protein radical formation.

    Interpro description:

    Radical SAM proteins catalyze diverse reactions, including unusual methylations, isomerization, sulphur insertion, ring formation, anaerobic oxidation and protein radical formation. Evidence exists that these proteins generate a radical species by reductive cleavage of S-adenosylmethionine (SAM) through an unusual Fe-S centre.

    Proteins where this domain is known:
    PY01072    PY01291    PY03999    PY05984    PY06208   


    PF04056 - Ssl1 (Pfam link)

    Interpro entry IPR007198 : Ssl1-like (Interpro link)

    Pfam description:
    Ssl1-like proteins are 40 kDa subunits of the transcription factor II H complex.

    Interpro description:

    Ssl1-like proteins are 40 kDa subunits of the transcription factor II H complex. This domain is often found associated with the C2H2 type Zn-finger.

    Proteins where this domain is known:
    PY06921   


    PF04059 - RRM_2 (Pfam link)

    Interpro entry IPR007201 : (Interpro link)

    Interpro description:

    This RNA recognition motif 2 domain is found in the meiosis protein Mei2. It is found C-terminal to the RNA-binding region RNP-1.

    Proteins where this domain is known:
    PY04891   


    PF04068 - RLI (Pfam link)

    Interpro entry IPR007209 : (Interpro link)

    Pfam description:
    Possible metal-binding domain in endoribonuclease RNase L inhibitor. Found at the N-terminal end of RNase L inhibitor proteins, adjacent to the 4Fe-4S binding domain, fer4, Pfam:PF00037. Also often found adjacent to the DUF367 domain Pfam:PF04034 in uncharacterised proteins. The RNase L system plays a major role in the anti-viral and anti-proliferative activities of interferons, and could possibly play a more general role in the regulation of RNA stability in mammalian cells. Inhibitory activity requires concentration-dependent association of RLI with RNase L.

    Interpro description:
    This is a possible metal-binding domain in endoribonuclease RNase L inhibitor. It is found at the N-terminal end of RNase L inhibitor proteins, adjacent to the 4Fe-4S binding domain, fer4. It is also often found adjacent to the DUF367 domain in uncharacterised proteins. The RNase L system plays a major role in the anti-viral and anti-proliferative activities of interferons, and could possibly play a more general role in the regulation of RNA stability in mammalian cells. Inhibitory activity requires concentration-dependent association of RLI with RNase L.

    Proteins where this domain is known:
    PY04219    PY06626   


    PF04072 - LCM (Pfam link)

    Interpro entry IPR007213 : Leucine carboxyl methyltransferase (Interpro link)

    Pfam description:
    Family of leucine carboxyl methyltransferases (EC:2.1.1.-). This family may need to be divided, as the full alignment contains a significantly shorter mouse sequence.

    Interpro description:

    This entry represents a group of leucine carboxyl methyltransferases, which methylate the carboxyl group of leucine residues to form alpha-leucine ester residues. It includes LCMT1, which regulates the activity of serine/threonine phosphatase 2A (PP2A) through methylation of the C-terminal leucine residue of the catalytic subunit of PP2A. This affects the heteromultimeric composition of PP2A, which in turn affects protein recognition and substrate specificity. Like many other methyltransferases, LCMT1 uses S-adenosylmethionine (SAM) as the methyl donor. LCMT1 contains the common SAM-dependent methyltransferase core fold, with various insertions and additions creating a specific PP2A binding site. This entry also contains LCMT2, a homologue of LCMT1 that is not necessary for PP2A methylation and whose function is not clear.

    Proteins where this domain is known:
    PY00828   


    PF04073 - YbaK (Pfam link)

    Interpro entry IPR007214 : (Interpro link)

    Pfam description:
    This domain of unknown function is found in numerous prokaryotic organisms. The structure of YbaK shows a novel fold. This domain also occurs in a number of prolyl-tRNA synthetases (proRS) from prokaryotes. Thus, the domain is thought to be involved in oligonucleotide binding, with possible roles in recognition/discrimination or editing of prolyl-tRNA.

    Interpro description:
    This domain of unknown function is found in numerous prokaryotic organisms. The structure of YbaK shows a novel fold. This domain also occurs in a number of prolyl-tRNA synthetases (proRS) from prokaryotes. Thus, the domain is thought to be involved in oligonucleotide binding, with possible roles in recognition/discrimination or editing of prolyl-tRNA.

    Proteins where this domain is known:
    PY02018   


    PF04078 - Rcd1 (Pfam link)

    Interpro entry IPR007216 : (Interpro link)

    Pfam description:
    Two of the members in this family have been characterised as being involved in regulation of Ste11 regulated sex genes. Mammalian Rcd1 is a novel transcriptional cofactor that mediates retinoic acid-induced cell differentiation.

    Interpro description:

    Rcd1 (required for cell differentiation 1)-like proteins are found in a wide range of organisms. Rcd1 was initially identified as an essential factor in nitrogen starvation-induced differentiation in fission yeast. The differentiation defect of rcd1 mutants results largely from a failure of nitrogen starvation-induced expression of ste11+, a key transcription factor gene required for the onset of sexual development. Rcd1 is one of the most conserved proteins in eukaryotes, and its mammalian homologue is expressed in a variety of differentiating tissues. Mammalian Rcd1 is a novel transcriptional cofactor and is critical for retinoic acid-induced differentiation of F9 mouse teratocarcinoma cells, at least in part by forming complexes with the retinoic acid receptor and activating transcription factor 2 (ATF-2). Two of the members in this family have been characterised as being involved in regulation of Ste11-regulated sex genes.

    Proteins where this domain is known:
    PY05128   


    PF04083 - Abhydro_lipase (Pfam link)

    Interpro entry IPR006693 : AB-hydrolase associated lipase region (Interpro link)

    Interpro description:

    The alpha/beta hydrolase fold is common to several hydrolytic enzymes of widely differing phylogenetic origin and catalytic function. The core of each enzyme is similar: an alpha/beta sheet (not a barrel) of eight beta-strands connected by alpha-helices. This entry describes a closely associated region that is found in a number of lipases.

    Proteins where this domain is known:
    PY04938   


    PF04084 - ORC2 (Pfam link)

    Interpro entry IPR007220 : Origin recognition complex subunit 2 (Interpro link)

    Pfam description:
    All DNA replication initiation is driven by a single conserved eukaryotic initiator complex termed the origin recognition complex (ORC). The ORC is a six-protein complex. The function of ORC is reviewed in.

    Interpro description:

    All DNA replication initiation is driven by a single conserved eukaryotic initiator complex termed the origin recognition complex (ORC). The ORC is a six-protein complex. The function of ORC is reviewed in. This entry represents subunit 2, which binds the origin of replication. It plays a role in chromosome replication and mating-type transcriptional silencing.

    Proteins where this domain is known:
    PY03235   


    PF04086 - SRP-alpha_N (Pfam link)

    Interpro entry IPR007222 : Signal recognition particle receptor, alpha subunit, N-terminal (Interpro link)

    Pfam description:
    SRP is a complex of six distinct polypeptides and a 7S RNA that is essential for transferring nascent polypeptide chains that are destined for export from the cell to the translocation apparatus of the endoplasmic reticulum (ER) membrane. SRP binds hydrophobic signal sequences as they emerge from the ribosome, and arrests translation.

    Interpro description:

    The signal recognition particle (SRP) is a multimeric protein that, along with its conjugate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes. SRP recognises the signal sequence of the nascent polypeptide on the ribosome, retards its elongation, and docks the SRP-ribosome-polypeptide complex to the RER membrane via the SR receptor. SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300-nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane.

    In prokaryotes, the SR receptor is a monomer consisting of the loosely membrane-associated SR-alpha homologue FtsY, while the eukaryotic SR receptor is a heterodimer of SR-alpha (70 kDa) and SR-beta (25 kDa), both of which contain a GTP-binding domain. SR-alpha regulates the targeting of SRP-ribosome-nascent polypeptide complexes to the translocon. SR-alpha binds to the SRP54 subunit of the SRP complex. The SR-beta subunit is a transmembrane GTPase that anchors the SR-alpha subunit (a peripheral membrane GTPase) to the ER membrane. SR-beta interacts with the N-terminal SRX-domain of SR-alpha, which is not present in the bacterial FtsY homologue. SR-beta also functions in recruiting the SRP-nascent polypeptide to the protein-conducting channel.

    This entry represents the alpha subunit of the SR receptor.

    Proteins where this domain has been detected by our approach:
    PY04912   


    PF04095 - NAPRTase (Pfam link)

    Interpro entry IPR015977 : (Interpro link)

    Pfam description:
    Nicotinate phosphoribosyltransferase (EC:2.4.2.11) is the rate-limiting enzyme that catalyses the first reaction in the NAD salvage synthesis. This family also includes pre-B cell enhancing factor, a cytokine (Swiss:P43490). This family is related to quinolinate phosphoribosyltransferase (Pfam:PF01729).

    Interpro description:
    Nicotinate phosphoribosyltransferase is the rate-limiting enzyme that catalyses the first reaction in the NAD salvage synthesis. This family also contains a number of closely related proteins for which a catalytic activity has not been experimentally demonstrated.

    Proteins where this domain is known:
    PY03546   


    PF04099 - Sybindin (Pfam link)

    Interpro entry IPR007233 : Sybindin-like protein (Interpro link)

    Pfam description:
    Synbindin is a physiological syndecan-2 ligand on dendritic spines, the small protrusions on the surface of dendrites that receive the vast majority of excitatory synapses.

    Interpro description:
    Synbindin is a physiological syndecan-2 ligand on dendritic spines, the small protrusions on the surface of dendrites that receive the vast majority of excitatory synapses. Syndecan-2 induces spine formation by recruiting intracellular vesicles toward postsynaptic sites through its interaction with synbindin.

    Proteins where this domain is known:
    PY04582    PY07060   


    PF04100 - Vps53_N (Pfam link)

    Interpro entry IPR007234 : (Interpro link)

    Pfam description:
    Vps53 complexes with Vps52 and Vps54 to form a multi-subunit complex involved in regulating membrane trafficking events.

    Interpro description:
    Vps53 complexes with Vps52 and Vps54 to form a multi-subunit complex involved in regulating membrane trafficking events.

    Proteins where this domain is known:
    PY02098   


    PF04104 - DNA_primase_lrg (Pfam link)

    Interpro entry IPR007238 : DNA primase, large subunit, eukaryotic/archaeal (Interpro link)

    Pfam description:
    DNA primase is the polymerase that synthesises small RNA primers for the Okazaki fragments made during discontinuous DNA replication. DNA primase is a heterodimer of two subunits, the small subunit Pri1 (48 kDa in yeast), and the large subunit Pri2 (58 kDa in the yeast S. cerevisiae). The large subunit of DNA primase interacts with the small subunit, and its structure suggests that it is not directly involved in catalysis but plays roles in correctly positioning the primase/DNA complex and in the transfer of RNA to DNA polymerase.

    Interpro description:
    DNA primase is the polymerase that synthesises small RNA primers for the Okazaki fragments made during discontinuous DNA replication. DNA primase is a heterodimer of two subunits, the small subunit Pri1 (48 kDa in yeast), and the large subunit Pri2 (58 kDa in the yeast Saccharomyces cerevisiae). Both subunits participate in the formation of the active site, but the ATP binding site is located on the small subunit. Primase function has also been demonstrated for human and mouse primase subunits.

    Proteins where this domain is known:
    PY00851   


    PF04106 - APG5 (Pfam link)

    Interpro entry IPR007239 : Autophagy protein 5 (Interpro link)

    Pfam description:
    Apg5 is directly required for the import of aminopeptidase I via the cytoplasm-to-vacuole targeting pathway.

    Interpro description:
    Macroautophagy is a bulk degradation process induced by starvation in eukaryotic cells. In yeast, 15 Apg proteins coordinate the formation of autophagosomes. No molecule involved in autophagy has yet been identified in higher eukaryotes. The pre-autophagosomal structure contains at least five Apg proteins: Apg1p, Apg2p, Apg5p, Aut7p/Apg8p and Apg16p. It is found in the vacuole. The C-terminal glycine of Apg12p is conjugated to a lysine residue of Apg5p via an isopeptide bond. During autophagy, cytoplasmic components are enclosed in autophagosomes and delivered to lysosomes/vacuoles. Autophagy protein 16 (Apg16) has been shown to bind to Apg5 and is required for the function of the Apg12p-Apg5p conjugate. Autophagy protein 5 (Apg5) is directly required for the import of aminopeptidase I via the cytoplasm-to-vacuole targeting pathway. This entry represents autophagy protein 5 (Apg5).

    Proteins where this domain is known:
    PY01266   


    PF04117 - Mpv17_PMP22 (Pfam link)

    Interpro entry IPR007248 : Mpv17/PMP22 (Interpro link)

    Pfam description:
    The 22-kDa peroxisomal membrane protein (PMP22) is a major component of peroxisomal membranes. PMP22 seems to be involved in pore-forming activity and may contribute to the unspecific permeability of the organelle membrane. PMP22 is synthesised on free cytosolic ribosomes and then directed to the peroxisome membrane by specific targeting information. Mpv17 is a closely related peroxisomal protein. In mouse, the Mpv17 protein is involved in the development of early-onset glomerulosclerosis. More recently, a homologue of Mpv17 in S. cerevisiae has been found to be an integral membrane protein of the inner mitochondrial membrane, where it has been proposed to have a role in ethanol metabolism and tolerance during heat shock. Defects in MPV17 are associated with mitochondrial DNA depletion syndrome (MDDS) and Navajo neurohepatopathy (NNH). MDDS is a clinically heterogeneous group of disorders characterised by a reduction in mitochondrial DNA (mtDNA) copy number. Primary mtDNA depletion is inherited as an autosomal recessive trait and may affect single organs, typically muscle or liver, or multiple tissues. Individuals with the hepatocerebral form of mitochondrial DNA depletion syndrome have early progressive liver failure and neurologic abnormalities, hypoglycemia, and increased lactate in body fluids. NNH is an autosomal recessive disease that is prevalent among Navajo children in the southwestern United States. The major clinical features are hepatopathy, peripheral neuropathy, corneal anesthesia and scarring, acral mutilation, cerebral leukoencephalopathy, failure to thrive, and recurrent metabolic acidosis with intercurrent infections. Infantile, childhood, and classic forms of NNH have been described. Mitochondrial DNA depletion was detected in the livers of patients, suggesting a primary defect in mtDNA maintenance.

    Interpro description:

    The 22 kDa peroxisomal membrane protein (PMP22) is a major component of peroxisomal membranes. PMP22 seems to be involved in pore-forming activity and may contribute to the unspecific permeability of the organelle membrane. PMP22 is synthesised on free cytosolic ribosomes and then directed to the peroxisome membrane by specific targeting information. Mpv17 is a closely related peroxisomal protein involved in the development of early-onset glomerulosclerosis.

    A member of this family found in Saccharomyces cerevisiae (Baker's yeast) is an integral membrane protein of the inner mitochondrial membrane and has been suggested to play a role in mitochondrial function during heat shock.

    Proteins where this domain is known:
    PY05805   


    PF04118 - Dopey_N (Pfam link)

    Interpro entry IPR007249 : (Interpro link)

    Pfam description:
    DopA is the founding member of the Dopey family and is required for correct cell morphology and spatiotemporal organisation of multicellular structures in the filamentous fungus Aspergillus nidulans. DopA homologues are found in mammals. S. cerevisiae DOP1 is essential for viability and affects cellular morphogenesis.

    Interpro description:
    DopA is the founding member of the Dopey family and is required for correct cell morphology and spatiotemporal organisation of multicellular structures in the filamentous fungus Emericella nidulans (Aspergillus nidulans). DopA homologues are found in mammals. Saccharomyces cerevisiae DOP1 is essential for viability and affects cellular morphogenesis.

    Proteins where this domain is known:
    PY02646   


    PF04127 - DFP (Pfam link)

    Interpro entry IPR007085 : (Interpro link)

    Pfam description:
    The DNA/pantothenate metabolism flavoprotein (EC:4.1.1.36) affects synthesis of DNA, and pantothenate metabolism.

    Interpro description:

    This entry represents the C-terminal domain found in DNA/pantothenate metabolism flavoproteins, which affect synthesis of DNA and pantothenate metabolism. These proteins contain ATP, phosphopantothenate, and cysteine binding sites. The structure of this domain has been determined in human phosphopantothenoylcysteine (PPC) synthetase and as the PPC synthase domain (CoaB) from the Escherichia coli coenzyme A bifunctional protein CoaBC. This domain adopts a 3-layer alpha/beta/alpha fold with mixed beta-sheets, which topologically resembles a combination of Rossmann-like and ribokinase-like folds. The structure of these proteins suggests a ping-pong mechanism with initial formation of an acyladenylate intermediate, followed by release of pyrophosphate and attack by cysteine to form the final products PPC and AMP.

    Proteins where this domain is known:
    PY05326   


    PF04128 - Psf2 (Pfam link)

    Interpro entry IPR007257 : GINS complex, Psf2 component (Interpro link)

    Pfam description:
    A eukaryote-specific domain of undetermined function. The GINS complex is essential for initiation of DNA replication in Xenopus egg extracts. This 100 kDa stable complex includes Sld5, Psf1, Psf2, and Psf3. Homologues of these components are also found in yeasts and in humans.

    Interpro description:

    DNA replication in eukaryotes results from a highly coordinated interaction between proteins, often as part of protein complexes, and the DNA template. One of the key early steps leading to DNA replication is formation of the prereplication complex, or pre-RC. The pre-RC is formed by the sequential binding of the origin recognition complex (ORC), Cdc6 and Cdt1 proteins, and the MCM complex. Activation of the pre-RC into the initiation complex (IC) is achieved via the action of S-phase kinases, eventually leading to the loading of the replication machinery.

    Recently, a novel replication complex, GINS (for Go, Ichi, Nii, and San; five, one, two, and three in Japanese), has been identified. The precise function of GINS is not known. However, genetic and two-hybrid interactions indicate that it mediates the loading of the enzymatic replication machinery at a step after the action of the S-phase kinases. Furthermore, GINS may be a part of the replication machinery itself, since it is found associated with replicating DNA. Electron microscopy of GINS shows that it forms a ring-like structure, reminiscent of the structure of PCNA, the DNA polymerase delta replication clamp. This observation, coupled with the observed interactions for GINS, indicates that the complex may represent the replication clamp for DNA polymerase epsilon.

    The GINS complex is essential for initiation of DNA replication in Xenopus egg extracts. This 100 kDa stable complex includes Sld5, Psf1, Psf2, and Psf3. Homologues of these components are found also in other eukaryotes. This family of proteins represents the Psf2 component.

    Proteins where this domain is known:
    PY03048    PY06505   


    PF04129 - Vps52 (Pfam link)

    Interpro entry IPR007258 : (Interpro link)

    Pfam description:
    Vps52 complexes with Vps53 and Vps54 to form a multi-subunit complex involved in regulating membrane trafficking events.

    Interpro description:
    Vps52 complexes with Vps53 and Vps54 to form a multi-subunit complex involved in regulating membrane trafficking events.

    Proteins where this domain is known:
    PY06640   


    PF04130 - Spc97_Spc98 (Pfam link)

    Interpro entry IPR007259 : Spc97/Spc98 (Interpro link)

    Pfam description:
    The spindle pole body (SPB) functions as the microtubule-organising centre in yeast. Members of this family are SPB components, such as Spc97 and Spc98, that form a complex with gamma-tubulin. This family of proteins includes grip motif 1 and grip motif 2.

    Interpro description:

    Members of this family are spindle pole body (SPB) components, such as Spc97 and Spc98, that associate with gamma-tubulin. The SPB functions as the microtubule-organising centre in yeast, with the microtubule cytoskeleton playing an essential role in chromosome segregation, cellular organisation and vesicle trafficking in eukaryotic cells. In most cells, the centrosome is the primary microtubule-organising centre that nucleates and organises microtubules. Gamma-tubulin localises to centrosomes and is required for microtubule nucleation. In Saccharomyces cerevisiae, gamma-tubulin forms a stable complex with Spc97 and Spc98.

    Proteins where this domain is known:
    PY00501    PY01833    PY01910    PY05908   


    PF04135 - Nop10p (Pfam link)

    Interpro entry IPR007264 : (Interpro link)

    Pfam description:
    Nop10p is a nucleolar protein that is specifically associated with H/ACA snoRNAs. It is essential for normal 18S rRNA production and rRNA pseudouridylation by the ribonucleoprotein particles containing H/ACA snoRNAs (H/ACA snoRNPs). Nop10p is probably necessary for the stability of these RNPs.

    Interpro description:
    Nop10p is a nucleolar protein that is specifically associated with H/ACA snoRNAs. It is essential for normal 18S rRNA production and rRNA pseudouridylation by the ribonucleoprotein particles containing H/ACA snoRNAs (H/ACA snoRNPs). Nop10p is probably necessary for the stability of these RNPs.

    Proteins where this domain is known:
    PY05263   


    PF04137 - ERO1 (Pfam link)

    Interpro entry IPR007266 : Endoplasmic reticulum oxidoreductin 1 (Interpro link)

    Pfam description:
    Members of this family are required for the formation of disulphide bonds in the ER.

    Interpro description:
    Members of this family are required for the formation of disulphide bonds in the endoplasmic reticulum.

    Proteins where this domain is known:
    PY07009   


    PF04140 - ICMT (Pfam link)

    Interpro entry IPR007269 : Isoprenylcysteine carboxyl methyltransferase (Interpro link)

    Pfam description:
    Members of the isoprenylcysteine O-methyltransferase (EC:2.1.1.100) family carry out carboxyl methylation of cleaved eukaryotic proteins that terminate in a CaaX motif. In Saccharomyces cerevisiae this methylation is carried out by Ste14p, an integral endoplasmic reticulum membrane protein. Ste14p is the founding member of the isoprenylcysteine carboxyl methyltransferase (ICMT) family, whose members share significant sequence homology.

    Interpro description:
    The isoprenylcysteine O-methyltransferase carries out carboxyl methylation of cleaved eukaryotic proteins that terminate in a CaaX motif. In Saccharomyces cerevisiae (Baker's yeast) this methylation is carried out by Ste14p, an integral endoplasmic reticulum membrane protein. Ste14p is the founding member of the isoprenylcysteine carboxyl methyltransferase (ICMT) family, whose members share significant sequence homology.
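    As a rough illustration of what "terminate in a CaaX motif" means at the sequence level, the Python sketch below tests whether a protein ends in a cysteine, two aliphatic residues and one further residue. It is a simplification under stated assumptions: the aliphatic set (A, V, L, I), the function names and the toy sequences are hypothetical choices for illustration, and real CaaX substrate recognition is likely more permissive than a fixed regular expression.

```python
import re

# Simplified CaaX pattern (an assumption for illustration only):
# C, then two "aliphatic" residues taken here as A, V, L or I,
# then any residue X, anchored at the C-terminus of the sequence.
CAAX_PATTERN = re.compile(r"C[AVLI]{2}[A-Z]$")


def _normalise(sequence: str) -> str:
    """Upper-case the sequence and drop surrounding whitespace and a trailing stop '*'."""
    return sequence.strip().upper().rstrip("*")


def ends_in_caax(sequence: str) -> bool:
    """Return True if the protein sequence terminates in a CaaX-like motif."""
    return CAAX_PATTERN.search(_normalise(sequence)) is not None


def caax_tail(sequence: str):
    """Return the four C-terminal residues if they match the motif, else None."""
    match = CAAX_PATTERN.search(_normalise(sequence))
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical toy sequences, not real P. yoelii proteins.
    print(ends_in_caax("MKTLLVGGSCVIM"))   # True: ends in CVIM
    print(ends_in_caax("MKTLLVGGSCVIMK"))  # False: last four residues are VIMK
```

    Anchoring the pattern with `$` reflects the requirement that the motif sits at the very end of the protein; a CaaX-like tetrapeptide in the middle of a sequence is deliberately not reported.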

    Proteins where this domain is known:
    PY05541   


    PF04145 - Ctr (Pfam link)

    Interpro entry IPR007274 : Ctr copper transporter (Interpro link)

    Pfam description:
    The redox active metal copper is an essential cofactor in critical biological processes such as respiration, iron transport, oxidative stress protection, hormone production, and pigmentation. A widely conserved family of high-affinity copper transport proteins (Ctr proteins) mediates copper uptake at the plasma membrane. A series of clustered methionine residues in the hydrophilic extracellular domain, and an MXXXM motif in the second transmembrane domain, are important for copper uptake. These methionines probably coordinate copper during the process of metal transport.

    Interpro description:

    The redox active metal copper is an essential cofactor in critical biological processes such as respiration, iron transport, oxidative stress protection, hormone production, and pigmentation. A widely conserved family of high-affinity copper transport proteins (Ctr proteins) mediates copper uptake at the plasma membrane. A series of clustered methionine residues in the hydrophilic extracellular domain, and an MXXXM motif in the second transmembrane domain, are important for copper uptake. These methionines probably coordinate copper during the process of metal transport.
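    The MXXXM motif can be read as a simple sequence pattern: a methionine, any three residues, then another methionine. The Python sketch below reports every occurrence, including overlapping ones, with 1-based start positions. It only scans the raw sequence and does not check whether a hit lies in the second transmembrane domain, which would need a separate topology prediction; the function name and example sequence are hypothetical.

```python
import re

# M, any three residues (X), M. The lookahead makes finditer report
# overlapping occurrences, e.g. both motifs in "MxxxMxxxM".
MXXXM = re.compile(r"(?=(M[A-Z]{3}M))")


def find_mxxxm(sequence: str):
    """Return (1-based start, motif) for every MXXXM occurrence in a protein sequence.

    Note: this is a plain pattern scan; it does not test whether the
    motif falls within a transmembrane segment.
    """
    seq = sequence.strip().upper()
    return [(m.start() + 1, m.group(1)) for m in MXXXM.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence with two overlapping MXXXM motifs.
    print(find_mxxxm("AMLLGMAAVM"))  # [(2, 'MLLGM'), (6, 'MAAVM')]
```

    Using a zero-width lookahead instead of a plain `M[A-Z]{3}M` pattern matters here: without it, the shared methionine at position 6 would be consumed by the first match and the second motif would be missed.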

    Proteins where this domain is known:
    PY00413   


    PF04146 - YTH (Pfam link)

    Interpro entry IPR007275 : (Interpro link)

    Pfam description:
    A protein of the YTH family has been shown to selectively remove transcripts of meiosis-specific genes expressed in mitotic cells. It has been speculated that in higher eukaryotes YTH-family members may be involved in similar mechanisms to suppress gene expression during gametogenesis or in general silencing. The rat protein Swiss:Q9QY02 YT521-B is a tyrosine-phosphorylated nuclear protein that interacts with the nuclear transcriptosomal component scaffold attachment factor B and with the 68-kDa Src substrate associated during mitosis, Sam68. In vivo splicing assays demonstrated that YT521-B modulates alternative splice site selection in a concentration-dependent manner. The domain is predicted to have four alpha helices and six beta strands.

    Interpro description:
    This family of poorly characterised proteins contains YT521-B, a putative splicing factor from rat. YT521-B is a tyrosine-phosphorylated nuclear protein that interacts with the nuclear transcriptosomal component scaffold attachment factor B and with the 68 kDa Src substrate associated during mitosis, Sam68. In vivo splicing assays demonstrated that YT521-B modulates alternative splice site selection in a concentration-dependent manner.

    Proteins where this domain is known:
    PY01203    PY07308   


    PF04152 - Mre11_DNA_bind (Pfam link)

    Interpro entry IPR007281 : Mre11, DNA-binding (Interpro link)

    Pfam description:
    The Mre11 complex is a multi-subunit nuclease that is composed of Mre11, Rad50 and Nbs1/Xrs2, and is involved in checkpoint signalling and DNA replication. Mre11 has an intrinsic DNA-binding activity that is stimulated by Rad50 on its own or in combination with Nbs1.

    Interpro description:
    The Mre11 complex is a multi-subunit nuclease that is composed of Mre11, Rad50 and Nbs1/Xrs2, and is involved in checkpoint signalling and DNA replication. Mre11 has an intrinsic DNA-binding activity that is stimulated by Rad50 on its own or in combination with Nbs1.

    Proteins where this domain is known:
    PY06177   


    PF04153 - NOT2_3_5 (Pfam link)

    Interpro entry IPR007282 : NOT2/NOT3/NOT5 (Interpro link)

    Pfam description:
    NOT1, NOT2, NOT3, NOT4 and NOT5 form a nuclear complex that negatively regulates the basal and activated transcription of many genes. This family includes NOT2, NOT3 and NOT5.

    Interpro description:
    NOT1, NOT2, NOT3, NOT4 and NOT5 form a nuclear complex that negatively regulates the basal and activated transcription of many genes. This family includes NOT2, NOT3 and NOT5.

    Proteins where this domain is known:
    PY03469    PY06599   


    PF04158 - Sof1 (Pfam link)

    Interpro entry IPR007287 : (Interpro link)

    Pfam description:
    Sof1 is essential for cell growth and is a component of the nucleolar rRNA processing machinery.

    Interpro description:
    Sof1 is essential for cell growth and is a component of the nucleolar rRNA processing machinery.

    Proteins where this domain is known:
    PY04096   


    PF04177 - TAP42 (Pfam link)

    Interpro entry IPR007304 : TAP42-like protein (Interpro link)

    Pfam description:
    The TOR signalling pathway activates a cell-growth program in response to nutrients. TIP41 (PFAM:PF04176) interacts with TAP42 and negatively regulates the TOR signaling pathway.

    Interpro description:
    The TOR signalling pathway activates a cell-growth program in response to nutrients. TIP41 interacts with TAP42 and negatively regulates the TOR signalling pathway.

    Proteins where this domain is known:
    PY01610   


    PF04178 - Got1 (Pfam link)

    Interpro entry IPR007305 : Got1-like protein (Interpro link)

    Pfam description:
    Traffic through the yeast Golgi complex depends on a member of the syntaxin family of SNARE proteins, Sed5, present in early Golgi cisternae. Got1 is thought to facilitate Sed5-dependent fusion events.

    Interpro description:
    Traffic through the yeast Golgi complex depends on a member of the syntaxin family of SNARE proteins, Sed5, present in early Golgi cisternae. Got1 is thought to facilitate Sed5-dependent fusion events.

    Proteins where this domain is known:
    PY07464   


    PF04181 - RPAP2_Rtr1 (Pfam link)

    Interpro entry IPR007308 : (Interpro link)

    Pfam description:
    This family includes the human RPAP2 (RNAP II associated polypeptide) protein and the yeast Rtr1 protein. It has been suggested that members of this family are regulators of core RNA polymerase II function.

    Interpro description:
    This is a protein of unknown function.

    Proteins where this domain is known:
    PY03933   


    PF04182 - B-block_TFIIIC (Pfam link)

    Interpro entry IPR007309 : (Interpro link)

    Pfam description:
    Yeast transcription factor IIIC (TFIIIC) is a multi-subunit protein complex that interacts with two control elements of class III promoters called the A and B blocks. This family represents the subunit within TFIIIC involved in B-block binding.

    Interpro description:

    Yeast transcription factor IIIC (TFIIIC) is a multisubunit protein complex that interacts with two control elements of class III promoters called the A and B blocks. This family represents the subunit within TFIIIC involved in B-block binding. Although defined as a yeast protein, it is also found in a number of other organisms.

    Proteins where this domain is known:
    PY02442   


    PF04189 - Gcd10p (Pfam link)

    Interpro entry IPR007316 : Eukaryotic initiation factor 3, gamma subunit (Interpro link)

    Pfam description:
    eIF-3 is a multi-subunit complex that stimulates translation initiation in vitro at several different steps. This family corresponds to the gamma subunit of eIF3. The yeast protein Gcd10p has also been shown to be part of a complex with the methyltransferase Gcd14p that is involved in modifying tRNA.

    Interpro description:
    eIF-3 is a multisubunit complex that stimulates translation initiation in vitro at several different steps. This family corresponds to the gamma subunit of eIF3.

    Proteins where this domain is known:
    PY03441   


    PF04192 - Utp21 (Pfam link)

    Interpro entry IPR007319 : Small-subunit processome, Utp21 (Interpro link)

    Pfam description:
    Utp21 is a subunit of U3 snoRNP, which is essential for synthesis of 18S rRNA.

    Interpro description:

    A large ribonucleoprotein complex, the small-subunit (SSU) processome, is required for the processing of the small-ribosomal-subunit rRNA. This preribosomal complex contains the U3 snoRNA and at least 40 proteins, which have the following properties:

    There appears to be a linkage between polymerase I transcription and the formation of the SSU processome, as some, but not all, of the SSU processome components are required for pre-rRNA transcription initiation. These SSU processome components have been termed t-Utps. They form a pre-complex with pre-18S rRNA in the absence of snoRNA U3 and other SSU processome components. It has been proposed that the t-Utp complex proteins are both rDNA- and rRNA-binding proteins that are involved in the initiation of pre-18S rRNA transcription, initially binding to rDNA and then associating with the 5' end of the nascent pre-18S rRNA. The t-Utp complex forms the nucleus around which the rest of the SSU processome components, including snoRNA U3, assemble. Electron microscopy suggests that the SSU processome may correspond to the terminal knobs visualized at the 5' ends of nascent 18S rRNA.

    Utp21 is a component of the SSU processome, which is required for pre-18S rRNA processing. It interacts with Utp18.

    Proteins where this domain has been detected by our approach:
    PY05885   


    PF04194 - PDCD2_C (Pfam link)

    Interpro entry IPR007320 : Programmed cell death protein 2, C-terminal (Interpro link)

    Interpro description:

    PDCD2 is localized predominantly in the cytosol of cells situated at the opposite pole of the germinal centre from the centroblasts as well as in cells in the mantle zone. It has been shown to interact with BCL6, an evolutionarily conserved Kruppel-type zinc finger protein that functions as a strong transcriptional repressor and is required for germinal centre development. The rat homologue, Rp8, is associated with programmed cell death in thymocytes.

    Proteins where this domain is known:
    PY01026   


    PF04212 - MIT (Pfam link)

    Interpro entry IPR007330 : (Interpro link)

    Pfam description:
    The MIT domain forms an asymmetric three-helix bundle and binds ESCRT-III (endosomal sorting complexes required for transport) substrates.

    Interpro description:

    The MIT domain is found in vacuolar sorting proteins, spastin (probable ATPase involved in the assembly or function of nuclear protein complexes), and a sorting nexin, which may play a role in intracellular trafficking.

    Proteins where this domain is known:
    PY00672   


    PF04252 - RNA_Me_trans (Pfam link)

    Interpro entry IPR007364 : (Interpro link)

    Pfam description:
    Members of this family are predicted to be alpha/beta-knot SAM-dependent RNA methyltransferases.

    Interpro description:

    Members of this family are predicted to be alpha/beta-knot SAM-dependent RNA methyltransferases.

    Proteins where this domain is known:
    PY05726   


    PF04258 - Peptidase_A22B (Pfam link)

    Interpro entry IPR007369 : Peptidase A22B, signal peptide peptidase (Interpro link)

    Pfam description:
    The members of this family are membrane proteins. In some proteins this region is found associated with Pfam:PF02225. This family corresponds with Merops subfamily A22B, the type example of which is signal peptide peptidase. There is a sequence-similarity relationship with Pfam:PF01080.

    Interpro description:

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised. More recently, aspartic endopeptidases associated with the processing of bacterial type 4 prepilin and archaean preflagellin have been described.

    Structurally, aspartic endopeptidases are bilobal enzymes, each lobe contributing a catalytic Asp residue, with an extended active site cleft localised between the two lobes of the molecule. One lobe has probably evolved from the other through a gene duplication event in the distant past. In modern-day enzymes, although the three-dimensional structures are very similar, the amino acid sequences are more divergent, except for the catalytic site motif, which is very conserved. The presence and position of disulphide bridges are other conserved features of aspartic peptidases. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned into clans (proteins which are evolutionarily related), and further sub-divided into families, largely on the basis of their tertiary structure.

    This group of sequences contains aspartic endopeptidases belonging to MEROPS peptidase family A22 (presenilin family, clan AD): subfamily A22B.

    The peptidases were originally classified by hierarchical homology to the most conserved member, IMPAS 1. They are also known as signal peptide peptidases (SPPs). They belong to the I-CLiP family of peptidases. SPP cleaves remnant signal peptides left behind in the membrane by the action of signal peptidase and also plays key roles in immune surveillance and the maturation of certain viral proteins. SPPs do not require cofactors, as demonstrated by expression in bacteria and purification of a proteolytically active form. The C-terminal region defines the functional domain, which is in itself sufficient for proteolytic activity.

    Proteins where this domain is known:
    PY06507   


    PF04263 - TPK_catalytic (Pfam link)

    Interpro entry IPR007371 : Thiamin pyrophosphokinase, catalytic region (Interpro link)

    Pfam description:
    Family of thiamin pyrophosphokinase (EC:2.7.6.2). Thiamin pyrophosphokinase (TPK) catalyses the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamin) to form the coenzyme thiamin pyrophosphate (TPP). Thus, TPK is important for the formation of a coenzyme required for central metabolic functions. The structure of thiamin pyrophosphokinase suggests that the enzyme may operate by a mechanism of pyrophosphoryl transfer similar to those described for pyrophosphokinases functioning in nucleotide biosynthesis.

    Interpro description:
    Thiamin pyrophosphokinase (TPK) catalyzes the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamin) to form the coenzyme thiamin pyrophosphate (TPP). Thus, TPK is important for the formation of a coenzyme required for central metabolic functions. The structure of thiamin pyrophosphokinase suggests that the enzyme may operate by a mechanism of pyrophosphoryl transfer similar to those described for pyrophosphokinases functioning in nucleotide biosynthesis.

    Proteins where this domain is known:
    PY05196   


    PF04265 - TPK_B1_binding (Pfam link)

    Interpro entry IPR007373 : Thiamin pyrophosphokinase, vitamin B1-binding region (Interpro link)

    Pfam description:
    Family of thiamin pyrophosphokinase (EC:2.7.6.2). Thiamin pyrophosphokinase (TPK) catalyses the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamin) to form the coenzyme thiamin pyrophosphate (TPP). Thus, TPK is important for the formation of a coenzyme required for central metabolic functions. The structure of thiamin pyrophosphokinase suggests that the enzyme may operate by a mechanism of pyrophosphoryl transfer similar to those described for pyrophosphokinases functioning in nucleotide biosynthesis.

    Interpro description:
    Thiamin pyrophosphokinase (TPK) catalyzes the transfer of a pyrophosphate group from ATP to vitamin B1 (thiamin) to form the coenzyme thiamin pyrophosphate (TPP). Thus, TPK is important for the formation of a coenzyme required for central metabolic functions. The structure of thiamin pyrophosphokinase suggests that the enzyme may operate by a mechanism of pyrophosphoryl transfer similar to those described for pyrophosphokinases functioning in nucleotide biosynthesis.

    Proteins where this domain is known:
    PY05196   


    PF04280 - Tim44 (Pfam link)

    Interpro entry IPR007379 : Mitochondrial import inner membrane translocase, subunit Tim44 (Interpro link)

    Pfam description:
    Tim44 is an essential component of the machinery that mediates the translocation of nuclear-encoded proteins across the mitochondrial inner membrane. Tim44 is thought to bind phospholipids of the mitochondrial inner membrane both by electrostatic interactions and by penetrating the polar head group region. This family includes the C-terminal region of Tim44 that has been shown to form a stable proteolytic fragment in yeast. This region is also found in a set of smaller bacterial proteins. The molecular function of the bacterial members of this family is unknown, but transport seems likely. The crystal structure of the C-terminal region of Tim44 has revealed a large hydrophobic pocket which might play an important role in interacting with the acyl chains of lipid molecules in the mitochondrial membrane.

    Interpro description:

    Tim44 is an essential component of the machinery that mediates the translocation of nuclear-encoded proteins across the mitochondrial inner membrane. Tim44 is thought to bind phospholipids of the mitochondrial inner membrane both by electrostatic interactions and by penetrating the polar head group region.

    Proteins where this domain is known:
    PY06558   


    PF04325 - DUF465 (Pfam link)

    Interpro entry IPR007420 : (Interpro link)

    Pfam description:
    Family members are found in small bacterial proteins, and also in the heavy chains of eukaryotic myosin and kinesin, C-terminal to the motor domain (myosin Pfam:PF00063, kinesin Pfam:PF00225). Members of this family may form coiled-coil structures.

    Interpro description:
    Family members are found in small bacterial proteins, and also in the heavy chains of eukaryotic myosin and kinesin, C-terminal to the motor domain. Members of this family may form coiled-coil structures.

    Proteins where this domain has been detected by our approach:
    PY05257   


    PF04383 - KilA-N (Pfam link)

    Interpro entry IPR018004 : (Interpro link)

    Pfam description:
    The amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain family also includes the previously defined APSES domain. The KilA-N and APSES domains may also share a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease.

    Interpro description:

    The amino-terminal module of the poxvirus D6R/N1R proteins defines a novel conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. Putative proteins with homology to the KilA-N domain have also been identified in Maverick transposable elements of the parabasalid protozoan Trichomonas vaginalis. The KilA-N domain has been suggested to be homologous to the fungal DNA-binding APSES domain. In all proteins shown to contain the KilA-N domain, it occurs at the extreme amino terminus, accompanied by a wide range of distinct carboxy-terminal domains. These carboxy-terminal modules may be enzymes, such as nuclease domains, or might mediate additional, specific interactions with nucleic acids or proteins, like the RING or CCCH fingers in the poxviruses. The KilA-N domain is predicted to adopt an alpha-beta fold with four conserved strands and at least two conserved helices.

    Proteins where this domain has been detected by our approach:
    PY06922   


    PF04384 - DUF528 (Pfam link)

    Interpro entry IPR007479 : ISC system FeS cluster assembly, IscX (Interpro link)

    Pfam description:
    Small bacterial protein of unknown function.

    Interpro description:

    Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.

    The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.

    The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. No specific functions have been assigned to SufB and SufD. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.

    In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.

    This entry represents IscX proteins (also known as hypothetical protein YfhJ) that are part of the ISC system. IscX is active as a monomer. The structure of YfhJ is an orthogonal alpha-bundle. YfhJ is a small acidic protein that binds IscS, and contains a modified winged helix motif that is usually found in DNA-binding proteins. YfhJ/IscX can bind Fe, and may function as an Fe donor in the assembly of FeS clusters.

    Proteins where this domain is known:
    PY01090   


    PF04387 - PTPLA (Pfam link)

    Interpro entry IPR007482 : (Interpro link)

    Pfam description:
    This family includes the mammalian protein tyrosine phosphatase-like protein, PTPLA. A significant variation of PTPLA from other protein tyrosine phosphatases is the presence of proline instead of catalytic arginine at the active site. It is thought that PTPLA proteins have a role in the development, differentiation, and maintenance of a number of tissue types.

    Interpro description:

    Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPases) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation. The PTP superfamily can be divided into four subfamilies.

    PTPases are also classified on the basis of their cellular localisation.

    All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel beta-sheet with flanking alpha-helices containing a beta-loop-alpha-loop that encompasses the PTP signature motif. Functional diversity between PTPases is endowed by regulatory domains and subunits.

    This family includes the mammalian protein tyrosine phosphatase-like protein, PTPLA. A significant variation of PTPLA from other protein tyrosine phosphatases is the presence of proline instead of catalytic arginine at the active site. It is thought that PTPLA proteins have a role in the development, differentiation, and maintenance of a number of tissue types.
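    The C(X)5R signature described above can be located in a protein sequence with a simple pattern search. The following is a minimal sketch and not how Pfam or InterPro detect this domain: Pfam uses profile hidden Markov models, not motif patterns. The function name find_ptp_signature and its terminal parameter are hypothetical. Passing terminal="P" searches for the proline-substituted form mentioned for PTPLA. This assumes that the proline occupies the same position as the catalytic arginine in the pattern.

```python
from __future__ import annotations

import re


def find_ptp_signature(sequence: str, terminal: str = "R") -> list[int]:
    """Return 0-based start positions of C(X)5<terminal> matches.

    terminal="R" gives the classical PTP signature C(X)5R.
    terminal="P" gives a PTPLA-like variant (hypothetical: assumes the
    proline sits where the catalytic arginine would be).
    """
    # A zero-width lookahead lets overlapping matches all be reported.
    pattern = re.compile(r"(?=C[A-Z]{5}" + re.escape(terminal) + r")")
    # Upper-case the input so lower-case sequences are handled too.
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Usage: a toy sequence, not a real P. yoelii protein.
print(find_ptp_signature("AAHCDAGKGRSG"))
```

    A pattern hit is only a hint. A C(X)5R match can occur by chance in unrelated proteins, so it does not on its own indicate a PTP fold or catalytic activity.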

    Proteins where this domain is known:
    PY02178   


    PF04408 - HA2 (Pfam link)

    Interpro entry IPR007502 : Helicase-associated region (Interpro link)

    Pfam description:
    This presumed domain is about 90 amino acid residues in length. It is found in a diverse set of RNA helicases. Its function is unknown, although it seems likely to be involved in nucleic acid binding.

    Interpro description:
    This presumed domain is about 90 amino acid residues in length. It is found in a diverse set of RNA helicases. Its function is unknown, although it seems likely to be involved in nucleic acid binding.

    Proteins where this domain is known:
    PY00707    PY00835    PY03686    PY04107    PY06080    PY07358   

    Proteins where this domain has been detected by our approach:
    PY00742   


    PF04410 - Gar1 (Pfam link)

    Interpro entry IPR007504 : Gar1 protein RNA-binding region (Interpro link)

    Pfam description:
    Gar1 is a small nucleolar RNP that is required for pre-rRNA processing and pseudouridylation. It is co-immunoprecipitated with the H/ACA families of snoRNAs. This family represents the conserved central region of Gar1. This region is necessary and sufficient for normal cell growth, and specifically binds two snoRNAs, snR10 and snR30. This region is also necessary for nucleolar targeting, and it is thought that the protein is co-transported to the nucleolus as part of a nucleoprotein complex. In humans, Gar1 is also a component of telomerase in vivo.

    Interpro description:
    Gar1 is a small nucleolar RNP that is required for pre-rRNA processing and pseudouridylation. It is co-immunoprecipitated with the H/ACA families of snoRNAs. This family represents the conserved central region of Gar1. This region is necessary and sufficient for normal cell growth, and specifically binds two snoRNAs, snR10 and snR30. This region is also necessary for nucleolar targeting, and it is thought that the protein is co-transported to the nucleolus as part of a nucleoprotein complex. In humans, Gar1 is also a component of telomerase in vivo.

    Proteins where this domain is known:
    PY02326   


    PF04423 - Rad50_zn_hook (Pfam link)

    Interpro entry IPR007517 : Rad50 zinc hook (Interpro link)

    Pfam description:
    The Mre11 complex (Mre11 Rad50 Nbs1) is central to chromosomal maintenance and functions in homologous recombination, telomere maintenance and sister chromatid association. The Rad50 coiled-coil region contains a dimer interface at the apex of the coiled coils in which pairs of conserved Cys-X-X-Cys motifs form interlocking hooks that bind one Zn ion. This alignment includes the zinc hook motif and a short stretch of coiled-coil on either side.

    Interpro description:
    The Mre11 complex (Mre11 Rad50 Nbs1) is central to chromosomal maintenance and functions in homologous recombination, telomere maintenance and sister chromatid association. The Rad50 coiled-coil region contains a dimer interface at the apex of the coiled coils in which pairs of conserved Cys-X-X-Cys motifs form interlocking hooks that bind one Zn ion. This alignment includes the zinc hook motif and a short stretch of coiled-coil on either side.
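    Each Cys-X-X-Cys half-site of the zinc hook can be found in a sequence with a simple pattern search. The following is a hypothetical helper (find_cxxc is not from any library). A match alone does not show that a hook forms, because that also depends on the coiled-coil context and on dimerisation at the apex, as described above.

```python
from __future__ import annotations

import re

# Cys, any two residues, Cys; wrapped in a lookahead so overlapping
# motifs (e.g. CxxCxxC) are all reported.
CXXC = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every Cys-X-X-Cys motif."""
    return [(m.start(), m.group(1)) for m in CXXC.finditer(sequence.upper())]


# Usage: a toy sequence, not a real Rad50 sequence.
print(find_cxxc("KLCPECGK"))
```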

    Proteins where this domain has been detected by our approach:
    PY00626   


    PF04424 - DUF544 (Pfam link)

    Interpro entry IPR007518 : (Interpro link)

    Pfam description:
    Eukaryotic protein of unknown function.

    Interpro description:
    This is a eukaryotic protein of unknown function.

    Proteins where this domain is known:
    PY03275   


    PF04427 - Brix (Pfam link)

    Interpro entry IPR007109 : (Interpro link)

    Interpro description:

    The Brix domain is found in a number of eukaryotic proteins, including proteins from Saccharomyces cerevisiae and Homo sapiens, the Arabidopsis thaliana Peter Pan-like protein, and several hypothetical proteins.

    There are six protein families (one archaeal and five eukaryotic) that have a similar domain architecture with a central globular Brix domain. They have an optional N-terminal segment and an obligatory C-terminal segment, both of which contain charged low-complexity regions.

    Proteins from the Imp4/Brix superfamily appear to be involved in ribosomal RNA processing, which is essential for the functioning of all cells. The N- and C-terminal halves of a member of the superfamily, Mil, show significant structural similarity to one another. This suggests an origin by means of an ancestral duplication. Both halves have the same fold as the anticodon-binding domain of class IIa aminoacyl-tRNA synthetases, with greater conservation seen in the N-terminal half. Structural evidence suggests that the Imp4/Brix superfamily proteins could bind single-stranded segments of RNA along a concave surface formed by the N-terminal half of their beta-sheet and a central alpha-helix.

    Proteins where this domain is known:
    PY01473    PY03839    PY05510   


    PF04433 - SWIRM (Pfam link)

    Interpro entry IPR007526 : (Interpro link)

    Pfam description:
    The SWIRM domain is a small alpha-helical domain of about 85 amino acid residues found in chromosomal proteins. It contains a helix-turn-helix motif and binds to DNA.

    Interpro description:

    The SWIRM domain is a small alpha-helical domain of about 85 amino acid residues found in eukaryotic chromosomal proteins. It is named after the proteins SWI3, RSC8 and MOIRA in which it was first recognised. This domain is predicted to mediate protein-protein interactions in the assembly of chromatin-protein complexes. The SWIRM domain can be linked to different domains, such as the ZZ-type zinc finger, the Myb DNA-binding domain, the HORMA domain, the amino-oxidase domain, the chromo domain, and the JAB1/PAD1 domain.

    Proteins where this domain has been detected by our approach:
    PY03808   


    PF04434 - SWIM (Pfam link)

    Interpro entry IPR007527 : Zinc finger, SWIM-type (Interpro link)

    Pfam description:
    This domain is found in bacterial, archaeal and eukaryotic proteins. It is predicted to be organised into two N-terminal beta-strands and a C-terminal alpha helix, thus possibly adopting a fold similar to that of the C2H2 zinc finger (Pfam:PF00096). SWIM is thought to be a versatile domain that can interact with DNA or proteins in different contexts.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the SWIM (SWI2/SNF2 and MuDR) zinc-binding domain, which is found in a variety of prokaryotic and eukaryotic proteins, such as mitogen-activated protein kinase kinase kinase 1 (or MEKK1). It is also found in the related protein MEX (MEKK1-related protein X), a testis-expressed protein that acts as an E3 ubiquitin ligase through the action of E2 ubiquitin-conjugating enzymes in the proteasome degradation pathway; the SWIM domain is critical for MEX ubiquitination. SWIM domains are also found in the homologous recombination protein Sws1, as well as in several hypothetical proteins.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY04860   


    PF04438 - zf-HIT (Pfam link)

    Interpro entry IPR007529 : (Interpro link)

    Pfam description:
    This presumed zinc finger contains up to 6 cysteine residues that could coordinate zinc. The domain is named after the HIT protein Swiss:P46973. This domain is also found in the thyroid receptor interacting protein 3 (TRIP-3) Swiss:Q15649, which specifically interacts with the ligand-binding domain of the thyroid receptor.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the HIT-type zinc finger, which contains 7 conserved cysteines and one histidine that can potentially coordinate two zinc atoms. It has been named after the first protein that originally defined the domain: the yeast HIT1 protein. The HIT-type zinc finger displays some sequence similarities to the MYND-type zinc finger. The function of this domain is unknown but it is mainly found in nuclear proteins involved in gene regulation and chromatin remodeling. This domain is also found in the thyroid receptor interacting protein 3 (TRIP-3) that specifically interacts with the ligand binding domain of the thyroid receptor.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY03423   


    PF04442 - CtaG_Cox11 (Pfam link)

    Interpro entry IPR007533 : Cytochrome c oxidase assembly protein CtaG/Cox11 (Interpro link)

    Pfam description:
    Cytochrome c oxidase assembly protein is essential for the assembly of functional cytochrome oxidase protein. In eukaryotes it is an integral protein of the mitochondrial inner membrane. Cox11 is essential for the insertion of Cu(I) ions to form the CuB site. This is essential for the stability of other structures in subunit I, for example haems a and a3, and the magnesium/manganese centre. Cox11 is probably only required in sub-stoichiometric amounts relative to the structural units. The C-terminal region of the protein is known to form a dimer. Each monomer coordinates one Cu(I) ion via three conserved cysteine residues (111, 208 and 210 in Saccharomyces cerevisiae, Swiss:P19516). Met 224 is also thought to play a role in copper transfer or stabilising the copper site.

    Interpro description:
    Cytochrome c oxidase assembly protein is essential for the assembly of functional cytochrome oxidase protein. In eukaryotes it is an integral protein of the mitochondrial inner membrane. Cox11 is essential for the insertion of Cu(I) ions to form the CuB site. This is essential for the stability of other structures in subunit I, for example haems a and a3, and the magnesium/manganese centre. Cox11 is probably only required in sub-stoichiometric amounts relative to the structural units. The C-terminal region of the protein is known to form a dimer. Each monomer coordinates one Cu(I) ion via three conserved cysteine residues (111, 208 and 210 in Saccharomyces cerevisiae). Met 224 is also thought to play a role in copper transfer or stabilising the copper site.

    Proteins where this domain is known:
    PY00708   


    PF04446 - Thg1 (Pfam link)

    Interpro entry IPR007537 : (Interpro link)

    Pfam description:
    The Thg1 protein from Saccharomyces cerevisiae is responsible for adding a GMP residue to the 5' end of tRNA His.

    Interpro description:

    The Thg1 protein from Saccharomyces cerevisiae (Baker's yeast) is responsible for adding a GMP residue to the 5' end of tRNA His.

    Proteins where this domain is known:
    PY02052   


    PF04452 - Methyltrans_RNA (Pfam link)

    Interpro entry IPR006700 : Ribosomal RNA small subunit methyltransferase E (Interpro link)

    Pfam description:
    RNA methyltransferases modify nucleotides during ribosomal RNA maturation in a site-specific manner. The Escherichia coli member is specific for U1498 methylation.

    Interpro description:

    Methyltransferases (Mtases) are responsible for the transfer of methyl groups between two molecules. The transfer of the methyl group from the ubiquitous S-adenosyl-L-methionine (AdoMet) to nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms. The reaction is catalyzed by Mtases and modifies DNA, RNA, proteins or small molecules, such as catechol, for regulatory purposes. Proteins in this entry belong to the RsmE family of Mtases, as supported by crystal structure studies, which show close structural homology to other known methyltransferases.

    This entry contains RsmE of Escherichia coli, which specifically methylates the uridine in position 1498 of 16S rRNA in the fully assembled 30S ribosomal subunit.

    Proteins where this domain is known:
    PY02213   


    PF04502 - DUF572 (Pfam link)

    Interpro entry IPR007590 : (Interpro link)

    Pfam description:
    Family of eukaryotic proteins with undetermined function.

    Interpro description:
    This is a family of eukaryotic proteins with undetermined function.

    Proteins where this domain is known:
    PY07440   


    PF04511 - DER1 (Pfam link)

    Interpro entry IPR007599 : (Interpro link)

    Pfam description:
    The endoplasmic reticulum (ER) of the yeast Saccharomyces cerevisiae contains a proteolytic system able to selectively degrade misfolded lumenal secretory proteins. To examine the components involved in this degradation process, mutants were isolated. They could be divided into four complementation groups. The mutations led to stabilisation of two different substrates for this process. The mutant classes were called 'der' for 'degradation in the ER'. DER1 was cloned by complementation of the der1-2 mutation. The DER1 gene codes for a novel hydrophobic protein that is localised to the ER. Deletion of DER1 abolished degradation of the substrate proteins. The function of the Der1 protein seems to be specifically required for the degradation process associated with the ER. Interestingly, this family seems distantly related to the Rhomboid family of membrane peptidases, suggesting that this family may also mediate degradation of misfolded proteins (Bateman A pers. obs.).

    Interpro description:

    The endoplasmic reticulum (ER) of the yeast Saccharomyces cerevisiae (Baker's yeast) contains a proteolytic system able to selectively degrade misfolded lumenal secretory proteins. For examination of the components involved in this degradation process, mutants were isolated. They could be divided into four complementation groups. The mutations led to stabilisation of two different substrates for this process, and the classes were called der for degradation in the ER. DER1 was cloned by complementation of the der1-2 mutation. The DER1 gene codes for a novel, hydrophobic protein that is localized to the ER. Deletion of DER1 abolished degradation of the substrate proteins, suggesting that the function of the Der1 protein may be specifically required for the degradation process associated with the ER. Interestingly this family seems distantly related to the Rhomboid family of membrane peptidases. This family may also mediate degradation of misfolded proteins.

    Proteins where this domain is known:
    PY02283    PY02870    PY03142   


    PF04551 - GcpE (Pfam link)

    Interpro entry IPR004588 : 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase, bacterial-type (Interpro link)

    Pfam description:
    In a variety of organisms, including plants and several eubacteria, isoprenoids are synthesised by the mevalonate-independent 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. Although different enzymes of this pathway have been described, the terminal biosynthetic steps of the MEP pathway have not been fully elucidated. The GcpE gene of Escherichia coli is involved in this pathway.

    Interpro description:

    This protein, previously of unknown biochemical function, is essential in Escherichia coli. It has now been characterised as 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase, which converts 2C-methyl-D-erythritol 2,4-cyclodiphosphate (ME-2,4CPP) into 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate in the sixth step of nonmevalonate terpenoid biosynthesis. The family is restricted to bacteria, where it is widely but not universally distributed. No homology can be detected between this family and other proteins.

    Proteins where this domain is known:
    PY01664   


    PF04560 - RNA_pol_Rpb2_7 (Pfam link)

    Interpro entry IPR007641 : RNA polymerase Rpb2, domain 7 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Rpb2 is the second largest subunit of the RNA polymerase. This domain is composed of the structural domains anchor and clamp. The clamp region (C-terminal) contains a zinc-binding motif. The clamp region is named for its interaction with the clamp domain found in Rpb1. The domain also contains a region termed "switch 4". The switches within the polymerase are thought to signal different stages of transcription.

    Interpro description:

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Rpb2 is the second largest subunit of the RNA polymerase. This domain is composed of the structural domains anchor and clamp. The clamp region (C-terminal) contains a zinc-binding motif. The clamp region is named for its interaction with the clamp domain found in Rpb1. The domain also contains a region termed switch 4. The switches within the polymerase are thought to signal different stages of transcription.

    Proteins where this domain is known:
    PY01115    PY01847    PY05002   


    PF04561 - RNA_pol_Rpb2_2 (Pfam link)

    Interpro entry IPR007642 : RNA polymerase Rpb2, domain 2 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Rpb2 is the second largest subunit of the RNA polymerase. This domain forms one of the two distinctive lobes of the Rpb2 structure. This domain is also known as the lobe domain. DNA has been demonstrated to bind to the concave surface of the lobe domain, and plays a role in maintaining the transcription bubble. Many of the bacterial members contain large insertions within this domain, a region known as dispensable region 1 (DRI).

    Interpro description:

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Rpb2 is the second largest subunit of the RNA polymerase. This domain forms one of the two distinctive lobes of the Rpb2 structure. This domain is also known as the lobe domain. DNA has been demonstrated to bind to the concave surface of the lobe domain, and plays a role in maintaining the transcription bubble. Many of the bacterial members contain large insertions within this domain, a region known as dispensable region 1 (DRI).

    Proteins where this domain is known:
    PY01115    PY01847   


    PF04563 - RNA_pol_Rpb2_1 (Pfam link)

    Interpro entry IPR007644 : RNA polymerase, beta subunit, protrusion (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain forms one of the two distinctive lobes of the Rpb2 structure. This domain is also known as the protrusion domain. The other lobe (PFAM:PF04561) is nested within this domain.

    Interpro description:

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain forms one of the two distinctive lobes of the Rpb2 structure. This domain is also known as the protrusion domain. The other lobe, RNA polymerase Rpb2, domain 2, is nested within this domain.

    Proteins where this domain is known:
    PY01115    PY01847    PY05002   


    PF04564 - U-box (Pfam link)

    Interpro entry IPR003613 : U box (Interpro link)

    Pfam description:
    This domain is related to the RING finger (Pfam:PF00097) but lacks the zinc-binding residues.

    Interpro description:

    Quality control of intracellular proteins is essential for cellular homeostasis. Molecular chaperones recognise and contribute to the refolding of misfolded or unfolded proteins, whereas the ubiquitin-proteasome system mediates the degradation of such abnormal proteins. Ubiquitin-protein ligases (E3s) determine the substrate specificity for ubiquitylation and have been classified into HECT and RING-finger families. More recently, however, U-box proteins, which contain a domain (the U box) of about 70 amino acids that is conserved from yeast to humans, have been identified as a new type of E3.

    Members of the U-box family of proteins constitute a class of ubiquitin-protein ligases (E3s) distinct from the HECT-type and RING finger-containing E3 families. Using yeast two-hybrid technology, all mammalian U-box proteins have been reported to interact with molecular chaperones or co-chaperones, including Hsp90, Hsp70, DnaJc7, EKN1, CRN, and VCP. This suggests that the function of U box-type E3s is to mediate the degradation of unfolded or misfolded proteins in conjunction with molecular chaperones as receptors that recognise such abnormal proteins.

    Unlike the RING finger domain that is stabilised by Zn2+ ions coordinated by the cysteines and a histidine, the U-box scaffold is probably stabilised by a system of salt-bridges and hydrogen bonds. The charged and polar residues that participate in this network of bonds are more strongly conserved in the U-box proteins than in classic RING fingers, which supports their role in maintaining the stability of the U box. Thus, the U box appears to have evolved from a RING finger domain by appropriation of a new set of residues required to stabilise its structure, concomitant with the loss of the original, metal-chelating residues.

    Proteins where this domain is known:
    PY00139    PY00518   


    PF04565 - RNA_pol_Rpb2_3 (Pfam link)

    Interpro entry IPR007645 : RNA polymerase Rpb2, domain 3 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 3 is also known as the fork domain and is proximal to the catalytic site.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 3 is also known as the fork domain and is proximal to the catalytic site.

    Proteins where this domain is known:
    PY01115    PY01847    PY05002   


    PF04566 - RNA_pol_Rpb2_4 (Pfam link)

    Interpro entry IPR007646 : RNA polymerase Rpb2, domain 4 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 4 is also known as the external 1 domain.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 4 is also known as the external 1 domain.

    Proteins where this domain is known:
    PY01115    PY01847   


    PF04567 - RNA_pol_Rpb2_5 (Pfam link)

    Interpro entry IPR007647 : RNA polymerase Rpb2, domain 5 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 5 is also known as the external 2 domain.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Domain 5 is also known as the external 2 domain.

    Proteins where this domain is known:
    PY01115    PY01847   


    PF04571 - Lipin_N (Pfam link)

    Interpro entry IPR007651 : (Interpro link)

    Pfam description:
    Mutations in the lipin gene lead to fatty liver dystrophy in mice. The protein has been shown to be phosphorylated by the TOR Ser/Thr protein kinases in response to insulin stimulation. The conserved region is found at the N-terminus of the member proteins.

    Interpro description:
    Mutations in the lipin gene lead to fatty liver dystrophy in mice. The protein has been shown to be phosphorylated by the TOR Ser/Thr protein kinases in response to insulin stimulation. The conserved region is found at the N terminus of the member proteins.

    Proteins where this domain is known:
    PY01351   


    PF04573 - SPC22 (Pfam link)

    Interpro entry IPR007653 : Signal peptidase 22 kDa subunit (Interpro link)

    Pfam description:
    Translocation of polypeptide chains across the endoplasmic reticulum membrane is triggered by signal sequences. During translocation of the nascent chain through the membrane, the signal sequence of most secretory and membrane proteins is cleaved off. Cleavage occurs by the signal peptidase complex (SPC), which consists of four subunits in yeast and five in mammals. This family is common to yeast and mammals.

    Interpro description:
    Translocation of polypeptide chains across the endoplasmic reticulum membrane is triggered by signal sequences. During translocation of the nascent chain through the membrane, the signal sequence of most secretory and membrane proteins is cleaved off. Cleavage occurs by the signal peptidase complex (SPC), which consists of four subunits in yeast and five in mammals. This family is described as similar to microsomal signal peptidase 23 kDa subunit. Found in eukaryotes.

    Proteins where this domain is known:
    PY04820   


    PF04597 - Ribophorin_I (Pfam link)

    Interpro entry IPR007676 : Ribophorin I (Interpro link)

    Pfam description:
    Ribophorin I is an essential subunit of oligosaccharyltransferase (OST), which is also known as dolichyl-diphosphooligosaccharide--protein glycosyltransferase (EC:2.4.1.119). OST catalyses the transfer of an oligosaccharide from dolichol pyrophosphate to selected asparagine residues of nascent polypeptides as they are translocated into the lumen of the rough endoplasmic reticulum. Ribophorin I and OST48 are thought to be responsible for OST catalytic activity. Both yeast and mammalian proteins are glycosylated but the sites are not conserved. Glycosylation may contribute towards general solubility but is unlikely to be involved in a specific biochemical function. Most family members are predicted to have a transmembrane helix at the C terminus of this region.

    Interpro description:
    Ribophorin I is an essential subunit of oligosaccharyltransferase (OST), which is also known as dolichyl-diphosphooligosaccharide--protein glycosyltransferase. OST catalyses the transfer of an oligosaccharide from dolichol pyrophosphate to selected asparagine residues of nascent polypeptides as they are translocated into the lumen of the rough endoplasmic reticulum. Ribophorin I and OST48 are thought to be responsible for OST catalytic activity. Both yeast and mammalian proteins are glycosylated but the sites are not conserved. Glycosylation may contribute towards general solubility but is unlikely to be involved in a specific biochemical function. Most family members are predicted to have a transmembrane helix at the C terminus of this region.

    Proteins where this domain is known:
    PY01717   


    PF04603 - Mog1 (Pfam link)

    Interpro entry IPR007681 : (Interpro link)

    Pfam description:
    Segregation of nuclear and cytoplasmic processes facilitates regulation of many eukaryotic cellular functions such as gene expression and cell cycle progression. Trafficking through the nuclear pore requires a number of highly conserved soluble factors that escort macromolecular substrates into and out of the nucleus. The Mog1 protein has been shown to interact with RanGTP, which stimulates guanine nucleotide release, suggesting Mog1 regulates the nuclear transport functions of Ran. The human homologue of Mog1 is thought to be alternatively spliced.

    Interpro description:
    Segregation of nuclear and cytoplasmic processes facilitates regulation of many eukaryotic cellular functions such as gene expression and cell cycle progression. Trafficking through the nuclear pore requires a number of highly conserved soluble factors that escort macromolecular substrates into and out of the nucleus. The Mog1 protein has been shown to interact with RanGTP, which stimulates guanine nucleotide release, suggesting Mog1 regulates the nuclear transport functions of Ran. The human homologue of Mog1 is thought to be alternatively spliced.

    Proteins where this domain is known:
    PY02511   


    PF04628 - Sedlin_N (Pfam link)

    Interpro entry IPR006722 : Sedlin (Interpro link)

    Pfam description:
    Mutations in this protein are associated with the X-linked spondyloepiphyseal dysplasia tarda syndrome (OMIM:313400). This family represents an N-terminal conserved region.

    Interpro description:

    Sedlin is a 140-amino-acid protein with a putative role in endoplasmic reticulum-to-Golgi transport. Several missense mutations and deletion mutations in the SEDL gene, which result in protein truncation by frame shift, are responsible for spondyloepiphyseal dysplasia tarda, a progressive skeletal disorder (OMIM:313400).

    Proteins where this domain is known:
    PY05729   


    PF04641 - DUF602 (Pfam link)

    Interpro entry IPR006735 : (Interpro link)

    Pfam description:
    This family represents several uncharacterised eukaryotic proteins.

    Interpro description:
    This family represents several uncharacterised eukaryotic proteins.

    Proteins where this domain is known:
    PY00254   


    PF04652 - DUF605 (Pfam link)

    Interpro entry IPR006745 : (Interpro link)

    Pfam description:
    Vta1 (VPS20-associated protein 1) is a positive regulator of Vps4. Vps4 is an ATPase that is required in the multivesicular body (MVB) sorting pathway to dissociate the endosomal sorting complex required for transport (ESCRT). Vta1 promotes correct assembly of Vps4 and stimulates its ATPase activity through its conserved Vta1/SBP1/LIP5 region.

    Interpro description:

    This family contains proteins from the Eukaryota; functionally they are uncharacterised.

    Proteins where this domain is known:
    PY01512   


    PF04658 - TAFII55_N (Pfam link)

    Interpro entry IPR006751 : TAFII55 protein conserved region (Interpro link)

    Pfam description:
    The general transcription factor, TFIID, consists of the TATA-binding protein (TBP) associated with a series of TBP-associated factors (TAFs) that together participate in the assembly of the transcription preinitiation complex. TAFII55 binds to TAFII250 and inhibits its acetyltransferase activity. The exact role of TAFII55 is currently unknown. The conserved region is situated towards the N-terminus of the protein.

    Interpro description:
    The general transcription factor, TFIID, consists of the TATA-binding protein (TBP) associated with a series of TBP-associated factors (TAFs) that together participate in the assembly of the transcription preinitiation complex. TAFII55 binds to TAFII250 and inhibits its acetyltransferase activity. The exact role of TAFII55 is currently unknown. The conserved region is situated towards the N terminus of the protein.

    Proteins where this domain is known:
    PY04173   


    PF04675 - DNA_ligase_A_N (Pfam link)

    Interpro entry IPR012308 : DNA ligase, N-terminal (Interpro link)

    Pfam description:
    This region is found in many but not all ATP-dependent DNA ligase enzymes (EC:6.5.1.1). It is thought to be involved in DNA binding and in catalysis. In human DNA ligase I (Swiss:P18858), and in Saccharomyces cerevisiae (Swiss:P04819), this region was necessary for catalysis, and separated from the amino terminus by targeting elements. In vaccinia virus (Swiss:P16272) this region was not essential for catalysis, but its deletion decreased the affinity for nicked DNA and decreased the rate of strand joining at a step subsequent to enzyme-adenylate formation.

    Interpro description:

    This region is found in many but not all ATP-dependent DNA ligase enzymes. It is thought to be involved in DNA binding and in catalysis. In human DNA ligase I, and in Saccharomyces cerevisiae (Baker's yeast), this region was necessary for catalysis, and separated from the amino terminus by targeting elements. In Vaccinia virus this region was not essential for catalysis, but its deletion decreased the affinity for nicked DNA and decreased the rate of strand joining at a step subsequent to enzyme-adenylate formation.

    Proteins where this domain is known:
    PY01533   


    PF04676 - CwfJ_C_2 (Pfam link)

    Interpro entry IPR006767 : (Interpro link)

    Pfam description:
    This region is found in the N terminus of Schizosaccharomyces pombe protein CwfJ (Swiss:Q09909). CwfJ is part of the Cdc5p complex involved in mRNA splicing.

    Interpro description:

    This group of sequences contains a conserved C-terminal domain which is found in the Schizosaccharomyces pombe (Fission yeast) protein CwfJ. CwfJ is part of the Cdc5p complex involved in mRNA splicing. This domain is found in association with another domain, which is generally N-terminal and adjacent to this domain.

    Proteins where this domain has been detected by our approach:
    PY06625   


    PF04677 - CwfJ_C_1 (Pfam link)

    Interpro entry IPR006768 : (Interpro link)

    Pfam description:
    This region is found in the N terminus of Schizosaccharomyces pombe protein CwfJ (Swiss:Q09909). CwfJ is part of the Cdc5p complex involved in mRNA splicing.

    Interpro description:

    This group of sequences contains a conserved C-terminal domain which is found in the Schizosaccharomyces pombe (Fission yeast) protein CwfJ. CwfJ is part of the Cdc5p complex involved in mRNA splicing. This domain is found in association with another domain, which is generally C-terminal and adjacent to this domain.

    Proteins where this domain is known:
    PY06625   


    PF04679 - DNA_ligase_A_C (Pfam link)

    Interpro entry IPR012309 : ATP dependent DNA ligase, C-terminal (Interpro link)

    Pfam description:
    This region is found in many but not all ATP-dependent DNA ligase enzymes (EC:6.5.1.1). It is thought to constitute part of the catalytic core of ATP-dependent DNA ligase.

    Interpro description:

    This region is found in many but not all ATP-dependent DNA ligase enzymes. It is thought to constitute part of the catalytic core of ATP-dependent DNA ligase.

    Proteins where this domain is known:
    PY01533   


    PF04707 - PRELI (Pfam link)

    Interpro entry IPR006797 : (Interpro link)

    Pfam description:
    This family includes a conserved region found in the PRELI protein and yeast YLR168C gene MSF1 product. The function of this protein is unknown, though it is thought to be involved in intra-mitochondrial protein sorting. This region is also found in a number of other eukaryotic proteins.

    Interpro description:

    These proteins contain a conserved region found in the yeast YLR168C gene MSF1 product. The function of this protein is unknown, though it is thought to be involved in intra-mitochondrial protein sorting. GFP-tagged MSF1 localizes to mitochondria and is required for wild-type respiratory growth. This region is also found in a number of other eukaryotic proteins. The PRELI/MSF1 domain is a eukaryotic protein module which occurs in stand-alone form in several proteins, including the human PRELI protein and the yeast MSF1 protein, and as an amino-terminal domain in an orthologous group of proteins typified by human SEC14L1, which is conserved in all animals. In this group of proteins, the PRELI/MSF1 domain co-occurs with the CRAL-TRIO and GOLD domains. The PRELI/MSF1 domain is approximately 170 residues long and is predicted to assume a globular alpha + beta fold with six beta strands and four alpha helices. It has been suggested that the PRELI/MSF1 domain may have a function associated with cellular membranes.

    Proteins where this domain is known:
    PY02716   


    PF04712 - Radial_spoke (Pfam link)

    Interpro entry IPR006802 : (Interpro link)

    Pfam description:
    This family includes the radial spoke head proteins RSP4 and RSP6 from Chlamydomonas reinhardtii, and several eukaryotic homologues, including mammalian RSHL1, the protein product of a familial ciliary dyskinesia candidate gene.

    Interpro description:
    This family includes the radial spoke head proteins RSP4 and RSP6 from Chlamydomonas reinhardtii, and several eukaryotic homologues, including mammalian RSHL1, the protein product of a familial ciliary dyskinesia candidate gene.

    Proteins where this domain is known:
    PY02386   


    PF04729 - ASF1_hist_chap (Pfam link)

    Interpro entry IPR006818 : Histone chaperone, ASF1-like (Interpro link)

    Pfam description:
    This family includes the yeast and human ASF1 proteins. These proteins have histone chaperone activity. ASF1 participates in both the replication-dependent and replication-independent pathways. The three-dimensional structure has been determined as a compact immunoglobulin-like beta sandwich fold topped by three helical linkers.

    Interpro description:

    This family includes the yeast and human ASF1 proteins. These proteins have histone chaperone activity. ASF1 participates in both the replication-dependent and replication-independent pathways. The three-dimensional structure has been determined as a compact immunoglobulin-like beta sandwich fold topped by three helical linkers.

    Proteins where this domain is known:
    PY04448   


    PF04733 - Coatomer_E (Pfam link)

    Interpro entry IPR006822 : Coatomer, epsilon subunit (Interpro link)

    Pfam description:
    This family represents the epsilon subunit of the coatomer complex, which is involved in the regulation of intracellular protein trafficking between the endoplasmic reticulum and the Golgi complex.

    Interpro description:

    Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. While clathrin mediates endocytic protein transport and transport from the trans-Golgi network, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Activated small guanine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.

    This entry represents the epsilon subunit of the coatomer complex, which is involved in the regulation of intracellular protein trafficking between the endoplasmic reticulum and the Golgi complex.

    More information about these proteins can be found at Protein of the Month: Clathrin.

    Proteins where this domain is known:
    PY01892   


    PF04760 - IF2_N (Pfam link)

    Interpro entry IPR006847 : Translation initiation factor IF-2, N-terminal (Interpro link)

    Pfam description:
    This conserved feature at the N-terminus of bacterial translation initiation factor IF2 has recently had its structure solved. It shows structural similarity to the tRNA anticodon Stem Contact Fold domains of the methionyl-tRNA and glutaminyl-tRNA synthetases, and a similar fold is also found in the B5 domain of the phenylalanine-tRNA synthetase.

    Interpro description:
    This region is found in the N-terminal half of translation initiation factor IF-2. It is found in two copies in IF-2 alpha isoforms, and in only one copy in the N-terminally truncated beta and gamma isoforms. Its function is unknown.

    Proteins where this domain has been detected by our approach:
    PY06191   


    PF04777 - Evr1_Alr (Pfam link)

    Interpro entry IPR006863 : Erv1/Alr (Interpro link)

    Pfam description:
    Biogenesis of Fe/S clusters involves a number of essential mitochondrial proteins. Erv1p of Saccharomyces cerevisiae mitochondria is required for the maturation of Fe/S proteins in the cytosol. The ALR (augmenter of liver regeneration) represents a mammalian orthologue of yeast Erv1p. Both Erv1p and full-length ALR are located in the mitochondrial intermembrane space and are thought to operate downstream of the mitochondrial ABC transporter.

    Interpro description:
    Biogenesis of Fe/S clusters involves a number of essential mitochondrial proteins. Erv1p of Saccharomyces cerevisiae (Baker's yeast) mitochondria is required for the maturation of Fe/S proteins in the cytosol. The ALR (augmenter of liver regeneration) represents a mammalian ortholog of yeast Erv1p. Both Erv1p and full-length ALR are located in the mitochondrial intermembrane space and are thought to operate downstream of the mitochondrial ABC transporter.

    Proteins where this domain is known:
    PY00421   

    Proteins where this domain has been detected by our approach:
    PY01490   


    PF04810 - zf-Sec23_Sec24 (Pfam link)

    Interpro entry IPR006895 : Zinc finger, Sec23/Sec24-type (Interpro link)

    Pfam description:
    COPII-coated vesicles carry proteins from the endoplasmic reticulum to the Golgi complex. This vesicular transport can be reconstituted by using three cytosolic components containing five proteins: the small GTPase Sar1p, the Sec23p/24p complex, and the Sec13p/Sec31p complex. This domain is a zinc-binding domain.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    COPII (coat protein complex II)-coated vesicles carry proteins from the endoplasmic reticulum (ER) to the Golgi complex. COPII-coated vesicles form on the ER by the stepwise recruitment of three cytosolic components: Sar1-GTP to initiate coat formation, Sec23/24 heterodimer to select SNARE and cargo molecules, and Sec13/31 to induce coat polymerisation and membrane deformation.

    Sec23p and Sec24p are structurally related, folding into five distinct domains: a beta-barrel, a zinc-finger, an alpha/beta trunk domain, an all-helical region, and a C-terminal gelsolin-like domain. This entry describes an approximately 55-residue Sec23/24 zinc-binding domain, which lies against the beta-barrel at the periphery of the complex.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01094    PY02496   

    Proteins where this domain has been detected by our approach:
    PY03895   


    PF04811 - Sec23_trunk (Pfam link)

    Interpro entry IPR006896 : Sec23/Sec24 trunk region (Interpro link)

    Pfam description:
    COPII-coated vesicles carry proteins from the endoplasmic reticulum to the Golgi complex. This vesicular transport can be reconstituted by using three cytosolic components containing five proteins: the small GTPase Sar1p, the Sec23p/24p complex, and the Sec13p/Sec31p complex. This domain, known as the trunk domain, has an alpha/beta vWA fold and forms the dimer interface.

    Interpro description:

    COPII (coat protein complex II)-coated vesicles carry proteins from the endoplasmic reticulum (ER) to the Golgi complex. COPII-coated vesicles form on the ER by the stepwise recruitment of three cytosolic components: Sar1-GTP to initiate coat formation, Sec23/24 heterodimer to select SNARE and cargo molecules, and Sec13/31 to induce coat polymerisation and membrane deformation.

    Sec23p and Sec24p are structurally related, folding into five distinct domains: a beta-barrel, a zinc-finger, an alpha/beta trunk domain, an all-helical region, and a C-terminal gelsolin-like domain. This entry describes the Sec23/24 alpha/beta trunk domain, which is formed from a single, approximately 250-residue segment plugged into the beta-barrel between strands beta-1 and beta-19. The trunk has an alpha/beta fold with a vWA topology, and it forms the dimer interface, primarily involving strand beta-14 on Sec23 and Sec24; in addition, the trunk domain of Sec23 contacts Sar1.

    Proteins where this domain is known:
    PY01094    PY02497    PY03895   


    PF04815 - Sec23_helical (Pfam link)

    Interpro entry IPR006900 : Sec23/Sec24 helical region (Interpro link)

    Pfam description:
    COPII-coated vesicles carry proteins from the endoplasmic reticulum to the Golgi complex. This vesicular transport can be reconstituted by using three cytosolic components containing five proteins: the small GTPase Sar1p, the Sec23p/24p complex, and the Sec13p/Sec31p complex. This domain is composed of five alpha helices.

    Interpro description:

    COPII (coat protein complex II)-coated vesicles carry proteins from the endoplasmic reticulum (ER) to the Golgi complex. COPII-coated vesicles form on the ER by the stepwise recruitment of three cytosolic components: Sar1-GTP to initiate coat formation, Sec23/24 heterodimer to select SNARE and cargo molecules, and Sec13/31 to induce coat polymerisation and membrane deformation.

    Sec23p and Sec24p are structurally related, folding into five distinct domains: a beta-barrel, a zinc-finger, an alpha/beta trunk domain, an all-helical region, and a C-terminal gelsolin-like domain. This entry describes the all-helical domain, which forms an approximately 105-residue segment with the C-terminal 30 residues. The linker between alpha-M and alpha-N contacts Sar1.

    Proteins where this domain is known:
    PY01094    PY02497    PY03895   


    PF04818 - DUF618 (Pfam link)

    Interpro entry IPR006903 : (Interpro link)

    Pfam description:
    This family represents a conserved region found in a number of uncharacterised eukaryotic proteins.

    Interpro description:
    This entry represents a conserved region found in a number of uncharacterised eukaryotic proteins.

    Proteins where this domain is known:
    PY05532   

    Proteins where this domain has been detected by our approach:
    PY00123   


    PF04824 - Rad21_Rec8 (Pfam link)

    Interpro entry IPR006909 : Rad21/Rec8 like protein, C-terminal (Interpro link)

    Pfam description:
    This family represents a conserved region found in eukaryotic cohesins of the Rad21, Rec8 and Scc1 families. Members of this family mediate sister chromatid cohesion during mitosis and meiosis, as part of the cohesin complex. Cohesion is necessary for homologous recombination (including double-strand break repair) and correct chromatid segregation. These proteins may also be involved in chromosome condensation. Dissociation at the metaphase to anaphase transition causes loss of cohesion and chromatid segregation.

    Interpro description:
    This family represents a conserved C-terminal region found in eukaryotic cohesins of the Rad21, Rec8 and Scc1 families. Rad21/Rec8 like proteins mediate sister chromatid cohesion during mitosis and meiosis, as part of the cohesin complex. Cohesion is necessary for homologous recombination (including double-strand break repair) and correct chromatid segregation. These proteins may also be involved in chromosome condensation. Dissociation at the metaphase to anaphase transition causes loss of cohesion and chromatid segregation.

    Proteins where this domain has been detected by our approach:
    PY04303   


    PF04825 - Rad21_Rec8_N (Pfam link)

    Interpro entry IPR006910 : (Interpro link)

    Pfam description:
    This family represents a conserved N-terminal region found in eukaryotic cohesins of the Rad21, Rec8 and Scc1 families. Members of this family mediate sister chromatid cohesion during mitosis and meiosis, as part of the cohesin complex. Cohesion is necessary for homologous recombination (including double-strand break repair) and correct chromatid segregation. These proteins may also be involved in chromosome condensation. Dissociation at the metaphase to anaphase transition causes loss of cohesion and chromatid segregation.

    Interpro description:
    This domain represents a conserved N-terminal region found in eukaryotic cohesins of the Rad21, Rec8 and Scc1 families. Rad21/Rec8-like proteins mediate sister chromatid cohesion during mitosis and meiosis, as part of the cohesin complex. Cohesion is necessary for homologous recombination (including double-strand break repair) and correct chromatid segregation. These proteins may also be involved in chromosome condensation. Dissociation at the metaphase to anaphase transition causes loss of cohesion and chromatid segregation.

    Proteins where this domain has been detected by our approach:
    PY04303   


    PF04840 - Vps16_C (Pfam link)

    Interpro entry IPR006925 : Vps16, C-terminal (Interpro link)

    Pfam description:
    This protein forms part of the Class C vacuolar protein sorting (Vps) complex. Vps16 is essential for vacuolar protein sorting, a process that is essential for viability in plants but not in yeast. The Class C Vps complex is required for SNARE-mediated membrane fusion at the lysosome-like yeast vacuole and is thought to play essential roles in membrane docking and fusion at the Golgi-to-endosome and endosome-to-vacuole stages of transport. The specific role of Vps16 within this complex is not known.

    Interpro description:
    This protein forms part of the Class C vacuolar protein sorting (Vps) complex. Vps16 is essential for vacuolar protein sorting, a process that is essential for viability in plants but not in yeast. The Class C Vps complex is required for SNARE-mediated membrane fusion at the lysosome-like yeast vacuole and is thought to play essential roles in membrane docking and fusion at the Golgi-to-endosome and endosome-to-vacuole stages of transport. The specific role of Vps16 within this complex is not known.

    Proteins where this domain is known:
    PY02218   


    PF04841 - Vps16_N (Pfam link)

    Interpro entry IPR006926 : Vps16, N-terminal (Interpro link)

    Pfam description:
    This protein forms part of the Class C vacuolar protein sorting (Vps) complex. Vps16 is essential for vacuolar protein sorting, a process that is essential for viability in plants but not in yeast. The Class C Vps complex is required for SNARE-mediated membrane fusion at the lysosome-like yeast vacuole and is thought to play essential roles in membrane docking and fusion at the Golgi-to-endosome and endosome-to-vacuole stages of transport. The specific role of Vps16 within this complex is not known.

    Interpro description:
    This protein forms part of the Class C vacuolar protein sorting (Vps) complex. Vps16 is essential for vacuolar protein sorting, a process that is essential for viability in plants but not in yeast. The Class C Vps complex is required for SNARE-mediated membrane fusion at the lysosome-like yeast vacuole and is thought to play essential roles in membrane docking and fusion at the Golgi-to-endosome and endosome-to-vacuole stages of transport. The specific role of Vps16 within this complex is not known.

    Proteins where this domain is known:
    PY02218   


    PF04851 - ResIII (Pfam link)

    Interpro entry IPR006935 : Restriction endonuclease, type I, R subunit/Type III, Res subunit (Interpro link)

    Interpro description:

    There are four classes of restriction endonucleases: types I, II, III and IV. All types recognise specific short DNA sequences and carry out endonucleolytic cleavage of DNA to give specific double-stranded fragments with terminal 5'-phosphates. They differ in their recognition sequence, subunit composition, cleavage position and cofactor requirements, as summarised below for types I and III:

    Type I restriction endonucleases are components of prokaryotic DNA restriction-modification mechanisms that protect the organism against invading foreign DNA. Type I enzymes have three different subunits, M (modification), S (specificity) and R (restriction), that form multifunctional enzymes with restriction, methylase and ATPase activities. The S subunit is required for both restriction and modification and is responsible for recognition of the DNA sequence specific for the system. The M subunit is necessary for modification, and the R subunit is required for restriction. These enzymes use S-adenosyl-L-methionine (AdoMet) as the methyl group donor in the methylation reaction, and they require ATP. They recognise asymmetric DNA sequences split into two domains of specific sequence, one 3-4 bp long and another 4-5 bp long, separated by a nonspecific spacer 6-8 bp in length. Cleavage occurs a considerable distance from the recognition sites, rarely less than 400 bp away and up to 7000 bp away. Adenosyl residues are methylated, one on each strand of the recognition sequence. These enzymes are widespread in eubacteria and archaea. In enteric bacteria they have been subdivided into four families: types IA, IB, IC and ID.
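
    The bipartite recognition rule described above (two specific half-sites separated by a nonspecific spacer of fixed length) can be sketched as a regular-expression scan over both strands. This is a minimal sketch under stated assumptions: `find_bipartite_sites` is a hypothetical helper, the spacer length is fixed per call, methylation state and the distant cleavage step are ignored, and the example uses AAC(N6)GTGC, the site usually given for EcoKI, purely as an instance of a 3 bp + 6 bp spacer + 4 bp pattern.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_bipartite_sites(seq: str, left: str, right: str, spacer: int) -> list[tuple[int, str]]:
    """Find left + (spacer x N) + right on both strands (hypothetical helper).

    Returns sorted (start, strand) pairs, where start is the 0-based
    top-strand coordinate of the leftmost base covered by the site.
    """
    seq = seq.upper()
    site_len = len(left) + spacer + len(right)
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({left}[ACGT]{{{spacer}}}{right}))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # A match at rc[p:p+L] covers seq[n-p-L : n-p] on the top strand.
        hits.append((len(seq) - m.start() - site_len, "-"))
    return sorted(hits)
```

    For example, `find_bipartite_sites("TTAACACGTACGTGCTT", "AAC", "GTGC", 6)` reports a single top-strand site starting at position 2. Covering the full 6-8 bp spacer range described above would simply mean calling the function once per spacer length.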

    Type III restriction endonucleases are components of prokaryotic DNA restriction-modification mechanisms that protect the organism against invading foreign DNA. Type III enzymes are hetero-oligomeric, multifunctional proteins composed of two subunits, Res and Mod. The Mod subunit recognises the DNA sequence specific for the system and is a modification methyltransferase; as such it is functionally equivalent to the M and S subunits of type I restriction endonuclease. Res is required for restriction, although it has no enzymatic activity on its own. Type III enzymes recognise short 5-6 bp long asymmetric DNA sequences and cleave 25-27 bp downstream to leave short, single-stranded 5' protrusions. They require the presence of two inversely oriented unmethylated recognition sites for restriction to occur. These enzymes methylate only one strand of the DNA, at the N-6 position of adenosyl residues, so newly replicated DNA will have only one strand methylated, which is sufficient to protect against restriction. Type III enzymes belong to the beta-subfamily of N6 adenine methyltransferases, containing the nine motifs that characterise this family, including motif I, the AdoMet binding pocket (FXGXG), and motif IV, the catalytic region (S/D/N (PP) Y/F).
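
    The requirement for two inversely oriented recognition sites can be illustrated with a short scan. This sketch rests on assumptions not stated in the text above: `inverted_pairs` is a hypothetical helper, the 6 bp site CAGCAG is only an example, "inversely oriented" is read here as a top-strand site lying 5' of a bottom-strand site (the two facing each other), and methylation, spacing limits and the 25-27 bp cleavage offset are ignored.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> tuple[list[int], list[int]]:
    """Return 0-based top-strand start positions of `site` read on the top
    strand and of `site` read on the bottom strand."""
    seq = seq.upper()
    top = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    # A bottom-strand copy of the site appears on the top strand as its
    # reverse complement.
    rc_site = reverse_complement(site)
    bottom = [m.start() for m in re.finditer(f"(?={re.escape(rc_site)})", seq)]
    return top, bottom


def inverted_pairs(seq: str, site: str) -> list[tuple[int, int]]:
    """Pairs (top_start, bottom_start) in which a top-strand site lies
    upstream of a non-overlapping bottom-strand site (assumed geometry)."""
    top, bottom = site_positions(seq, site)
    k = len(site)
    return [(t, b) for t in top for b in bottom if b >= t + k]
```

    Two copies in the same orientation, e.g. `"CAGCAG" + "A" * 10 + "CAGCAG"`, give no pair, reflecting the statement above that a single orientation is not sufficient for restriction.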

    This entry represents the R subunit (HsdR) of type I restriction endonucleases, the Res subunit of type III endonucleases, and the B subunit of excinuclease ABC (uvrB).

    Proteins where this domain is known:
    PY04074   

    Proteins where this domain has been detected by our approach:
    PY00683    PY00738    PY01287   


    PF04857 - CAF1 (Pfam link)

    Interpro entry IPR006941 : Ribonuclease CAF1 (Interpro link)

    Pfam description:
    The major pathways of mRNA turnover in eukaryotes initiate with shortening of the polyA tail. CAF1 (Swiss:P39008) encodes a critical component of the major cytoplasmic deadenylase in yeast. Caf1p is required for normal mRNA deadenylation in vivo and localises to the cytoplasm. Caf1p copurifies with a Ccr4p-dependent polyA-specific exonuclease activity. Some members of this family include an inserted RNA binding domain (Pfam:PF01424). This family of proteins is related to other exonucleases (Pfam:PF00929) (Bateman A pers. obs.). The crystal structure of Saccharomyces cerevisiae Pop2 (Swiss:P39008) has been resolved at 2.3 Angstrom resolution.

    Interpro description:
    CAF1 is an RNase of the DEDD superfamily, and a subunit of the Ccr4-Not complex that mediates 3' to 5' mRNA deadenylation. The major pathways of mRNA turnover in eukaryotes initiate with shortening of the poly(A) tail. CAF1 encodes a critical component of the major cytoplasmic deadenylase in yeast. Caf1p is required for normal mRNA deadenylation in vivo and localises to the cytoplasm. Caf1p copurifies with a Ccr4p-dependent poly(A)-specific exonuclease activity. Some members of this family contain a single-stranded nucleic acid binding domain, R3H.

    Proteins where this domain is known:
    PY01168    PY01911   


    PF04874 - Mak16 (Pfam link)

    Interpro entry IPR006958 : (Interpro link)

    Pfam description:
    The precise function of this eukaryotic protein family is unknown. The yeast orthologues have been implicated in cell cycle progression and biogenesis of 60S ribosomal subunits. The Schistosoma mansoni Mak16 has been shown to target protein transport to the nucleolus.

    Interpro description:
    The function of these proteins is unknown. The yeast orthologues have been implicated in cell cycle progression and biogenesis of 60S ribosomal subunits. The Schistosoma mansoni (Blood fluke) Mak16 has been shown to target protein transport to the nucleolus.

    Proteins where this domain is known:
    PY01531   


    PF04889 - Cwf_Cwc_15 (Pfam link)

    Interpro entry IPR006973 : Cwf15/Cwc15 cell cycle control protein (Interpro link)

    Pfam description:
    This family represents Cwf15/Cwc15 (from Schizosaccharomyces pombe and Saccharomyces cerevisiae respectively) and their homologues. The function of these proteins is unknown, but they form part of the spliceosome and are thus thought to be involved in mRNA splicing.

    Interpro description:
    This family represents Cwf15/Cwc15 (from Schizosaccharomyces pombe and Saccharomyces cerevisiae respectively) and their homologues. The function of these proteins is unknown, but they form part of the spliceosome and are thus thought to be involved in mRNA splicing.

    Proteins where this domain is known:
    PY02267   


    PF04898 - Glu_syn_central (Pfam link)

    Interpro entry IPR006982 : Glutamate synthase, central-N (Interpro link)

    Pfam description:
    The central domain of glutamate synthase connects the amino terminal amidotransferase domain with the FMN-binding domain and has an alpha / beta overall topology.

    Interpro description:

    Glutamate synthase (GltS) is a key enzyme in the early stages of the assimilation of ammonia in bacteria, yeasts, and plants. In bacteria, L-glutamate is involved in osmoregulation, is the precursor for other amino acids, and can be the precursor for haem biosynthesis. In plants, GltS is especially essential in the reassimilation of ammonia released by photorespiration. On the basis of the amino acid sequence and the nature of the electron donor, three different classes of GltS can be defined: 1) ferredoxin-dependent GltS (Fd-GltS), 2) NADPH-dependent GltS (NADPH-GltS), and 3) NADH-dependent GltS (the properties of the three classes have been reviewed extensively). The enzyme is a complex iron-sulphur flavoprotein catalysing the reductive transfer of the amido nitrogen from L-glutamine to 2-oxoglutarate to form two molecules of L-glutamate via intramolecular channelling of ammonia from the amidotransferase domain to the FMN-binding domain.

    Reaction of amidotransferase domain:

      L-glutamine + H2O = L-glutamate + NH3

    Reactions of FMN-binding domain:

      2-oxoglutarate + NH3 = 2-iminoglutarate + H2O
      2e + FMNox = FMNred
      2-iminoglutarate + FMNred = L-glutamate + FMNox

    The central domain of glutamate synthase connects the N-terminal amidotransferase domain with the FMN-binding domain and has an alpha/beta overall topology.
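
    Summing the partial reactions listed above and cancelling the shared intermediates (NH3, H2O, 2-iminoglutarate and the FMN redox pair) recovers the overall GltS reaction, L-glutamine + 2-oxoglutarate + 2e = 2 L-glutamate, consistent with the two molecules of L-glutamate mentioned above. The bookkeeping can be checked with a short multiset calculation; as in the listing, protons are not tracked, the electron pair is treated as a single token "2e", and the helper names are hypothetical.

```python
from collections import Counter


def side(*species: str) -> Counter:
    """Multiset of species on one side of a reaction (hypothetical helper)."""
    return Counter(species)


# Partial reactions as listed above, as (left side, right side) pairs.
REACTIONS = [
    (side("L-glutamine", "H2O"), side("L-glutamate", "NH3")),          # amidotransferase
    (side("2-oxoglutarate", "NH3"), side("2-iminoglutarate", "H2O")),  # FMN-binding domain
    (side("2e", "FMNox"), side("FMNred")),
    (side("2-iminoglutarate", "FMNred"), side("L-glutamate", "FMNox")),
]


def net_reaction(reactions):
    """Sum partial reactions and cancel species present on both sides."""
    left, right = Counter(), Counter()
    for lhs, rhs in reactions:
        left.update(lhs)
        right.update(rhs)
    # Counter subtraction drops zero and negative counts, so intermediates cancel.
    return left - right, right - left


net_left, net_right = net_reaction(REACTIONS)
```

    Here `net_left` holds one each of L-glutamine, 2-oxoglutarate and 2e, and `net_right` holds two L-glutamate, i.e. channelled NH3 never appears in the net equation.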

    Proteins where this domain is known:
    PY03719   


    PF04900 - Fcf1 (Pfam link)

    Interpro entry IPR006984 : (Interpro link)

    Pfam description:
    Fcf1 is a nucleolar protein involved in pre-rRNA processing. Depletion of yeast Fcf1 and Fcf2 leads to a decrease in synthesis of the 18S rRNA and results in a deficit in 40S ribosomal subunits.

    Interpro description:
    This family comprises uncharacterised eukaryotic proteins.

    Proteins where this domain is known:
    PY05993   


    PF04910 - DUF654 (Pfam link)

    Interpro entry IPR006994 : (Interpro link)

    Pfam description:
    This family includes a number of poorly characterised eukaryotic proteins.

    Interpro description:

    This entry appears to represent a novel family of basic helix-loop-helix (bHLH) proteins that control differentiation and development of a variety of organs.

    Human Nulp1 is a basic helix-loop-helix protein expressed broadly during early embryonic organogenesis. Overexpression of human Nulp1 in COS-7 cells inhibits the transcriptional activity of serum response factor (SRF), suggesting that Nulp1 may act as a novel bHLH transcriptional repressor in the SRF signalling pathway to mediate cellular functions.

    Proteins where this domain is known:
    PY04352   


    PF04921 - XAP5 (Pfam link)

    Interpro entry IPR007005 : XAP5 protein (Interpro link)

    Pfam description:
    This protein is found in a wide range of eukaryotes. Its function is uncertain. It is a nuclear protein and is suggested to be DNA binding.

    Interpro description:
    These proteins are found in a wide range of eukaryotes. Their function is uncertain though they are nuclear proteins, possibly with DNA-binding activity.

    Proteins where this domain is known:
    PY02808   


    PF04926 - PAP_RNA-bind (Pfam link)

    Interpro entry IPR007010 : Poly(A) polymerase, RNA-binding region (Interpro link)

    Pfam description:
    Based on its structural similarity to the RNA recognition motif, this domain is thought to bind RNA.

    Interpro description:

    In eukaryotes, polyadenylation of pre-mRNA plays an essential role in the initiation step of protein synthesis, as well as in the export and stability of mRNAs. Poly(A) polymerase, the enzyme at the heart of the polyadenylation machinery, is a template-independent RNA polymerase that specifically incorporates ATP at the 3' end of mRNA. The crystal structure of bovine poly(A) polymerase bound to an ATP analogue has been determined at 2.5 Å resolution. The structure revealed expected and unexpected similarities to other proteins. As expected, the catalytic domain of poly(A) polymerase shares substantial structural homology with other nucleotidyl transferases such as DNA polymerase beta and kanamycin transferase.

    The C-terminal domain unexpectedly folds into a compact domain reminiscent of the RNA-recognition motif fold. The three invariant aspartates of the catalytic triad ligate two of the three active site metals. One of these metals also contacts the adenine ring. Furthermore, conserved, catalytically important residues contact the nucleotide. These contacts, taken together with metal coordination of the adenine base, provide a structural basis for ATP selection by poly(A) polymerase.

    Proteins where this domain is known:
    PY02044   


    PF04928 - PAP_central (Pfam link)

    Interpro entry IPR007012 : Poly(A) polymerase, central region (Interpro link)

    Pfam description:
    The central domain of Poly(A) polymerase shares structural similarity with the allosteric activity domain of ribonucleotide reductase R1, which comprises a four-helix bundle and a three-stranded mixed beta-sheet. Although both enzymes bind ATP, their ATP-recognition motifs are different.

    Interpro description:

    In eukaryotes, polyadenylation of pre-mRNA plays an essential role in the initiation step of protein synthesis, as well as in the export and stability of mRNAs. Poly(A) polymerase, the enzyme at the heart of the polyadenylation machinery, is a template-independent RNA polymerase that specifically incorporates ATP at the 3' end of mRNA. The crystal structure of bovine poly(A) polymerase bound to an ATP analogue has been determined at 2.5 Å resolution. The structure revealed expected and unexpected similarities to other proteins. As expected, the catalytic domain of poly(A) polymerase shares substantial structural homology with other nucleotidyl transferases such as DNA polymerase beta and kanamycin transferase.

    The central domain of Poly(A) polymerase shares structural similarity with the allosteric activity domain of ribonucleotide reductase R1, which comprises a four-helix bundle and a three-stranded mixed beta-sheet. Although both enzymes bind ATP, their ATP-recognition motifs are different.

    Proteins where this domain is known:
    PY02044   


    PF04939 - RRS1 (Pfam link)

    Interpro entry IPR007023 : Ribosomal biogenesis regulatory protein (Interpro link)

    Pfam description:
    This family consists of several eukaryotic ribosome biogenesis regulatory (RRS1) proteins. RRS1 is a nuclear protein that is essential for the maturation of 25S rRNA and the assembly of the 60S ribosomal subunit in Saccharomyces cerevisiae.

    Interpro description:

    This is a family of eukaryotic ribosomal biogenesis regulatory proteins.

    Proteins where this domain is known:
    PY00816   


    PF04950 - DUF663 (Pfam link)

    Interpro entry IPR007034 : (Interpro link)

    Pfam description:
    This family contains several uncharacterised eukaryotic proteins.

    Interpro description:

    This conserved region is found in a number of eukaryotic proteins, including the ribosome biogenesis protein (BMS) which may act as a molecular switch during maturation of the 40S ribosomal subunit in the nucleolus.

    Proteins where this domain is known:
    PY01621    PY04551   


    PF04969 - CS (Pfam link)

    Interpro entry IPR017447 : (Interpro link)

    Pfam description:
    The CS and CHORD (Pfam:PF04968) domains are fused into a single polypeptide chain in metazoans but are found in separate proteins in plants; this is thought to be indicative of an interaction between CS and CHORD. It has been suggested that the CS domain is a binding module for HSP90, implying that CS domain-containing proteins are involved in recruiting heat shock proteins to multiprotein assemblies.

    Interpro description:
    The function of the CS domain is unknown. The CS domain is sometimes found C-terminal to the CHORD domain in metazoan proteins, but occurs separately from the CHORD domain in plants. This association is thought to be indicative of a functional interaction between CS and CHORD domains.

    Proteins where this domain is known:
    PY00792    PY02847    PY02910    PY04294    PY04829    PY05249    PY05996    PY06251   

    Proteins where this domain has been detected by our approach:
    PY02599   


    PF04981 - NMD3 (Pfam link)

    Interpro entry IPR007064 : (Interpro link)

    Pfam description:
    The NMD3 protein is involved in nonsense-mediated mRNA decay. This amino-terminal region contains four conserved CXXC motifs that could be metal binding. NMD3 is also involved in export of the 60S ribosomal subunit, which is mediated by the adapter protein Nmd3p in a Crm1p-dependent pathway.

    Interpro description:
    The NMD3 protein is involved in nonsense-mediated mRNA decay. This N-terminal region contains four conserved CXXC motifs that could be metal binding. NMD3 is also involved in export of the 60S ribosomal subunit, which is mediated by the adapter protein Nmd3p in a Crm1p-dependent pathway.
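
    Candidate CXXC motifs of the kind mentioned above can be located with a simple pattern scan. This is a sketch only: `find_cxxc` is a hypothetical helper, X matches any residue, and a pattern match says nothing about whether the cysteines actually coordinate a metal.

```python
import re


def find_cxxc(protein: str) -> list[int]:
    """Return 0-based start positions of every C-X-X-C motif.

    A lookahead is used so that overlapping motifs (e.g. in CXXCXXC) are
    all reported; X is any residue.
    """
    return [m.start() for m in re.finditer(r"(?=C..C)", protein.upper())]
```

    A check such as `len(find_cxxc(seq)) >= 4` mirrors the description of the NMD3 N-terminal region, although a real annotation would restrict the scan to that region rather than the whole protein.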

    Proteins where this domain is known:
    PY01474   


    PF04983 - RNA_pol_Rpb1_3 (Pfam link)

    Interpro entry IPR007066 : RNA polymerase Rpb1, domain 3 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 3, represents the pore domain. The 3' end of RNA is positioned close to this domain. The pore delimited by this domain is thought to act as a channel through which nucleotides enter the active site and/or where the 3' end of the RNA may be extruded during backtracking.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 3, represents the pore domain. The 3' end of RNA is positioned close to this domain. The pore delimited by this domain is thought to act as a channel through which nucleotides enter the active site and/or where the 3' end of the RNA may be extruded during backtracking.

    Proteins where this domain is known:
    PY01037    PY03187    PY03255   


    PF04990 - RNA_pol_Rpb1_7 (Pfam link)

    Interpro entry IPR007073 : RNA polymerase Rpb1, domain 7 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 7, represents a mobile module of the RNA polymerase. Domain 7 forms a substantial interaction with the lobe domain of Rpb2 (Pfam:PF04561).

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 7, represents a mobile module of the RNA polymerase. Domain 7 interacts with the lobe domain of Rpb2.

    Proteins where this domain is known:
    PY03187   


    PF04992 - RNA_pol_Rpb1_6 (Pfam link)

    Interpro entry IPR007075 : RNA polymerase Rpb1, domain 6 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 6, represents a mobile module of the RNA polymerase. Domain 6 forms part of the shelf module. This family appears to be specific to the largest subunit of RNA polymerase II.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 6, represents a mobile module of the RNA polymerase. Domain 6 forms part of the shelf module. This family appears to be specific to the largest subunit of RNA polymerase II.

    Proteins where this domain is known:
    PY03187   


    PF04997 - RNA_pol_Rpb1_1 (Pfam link)

    Interpro entry IPR007080 : RNA polymerase Rpb1, domain 1 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 1, represents the clamp domain, which is a mobile domain involved in positioning the DNA, maintenance of the transcription bubble and positioning of the nascent RNA strand.

    Interpro description:

    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 1, represents the clamp domain, which is a mobile domain involved in positioning the DNA, maintenance of the transcription bubble and positioning of the nascent RNA strand.

    Proteins where this domain is known:
    PY01037    PY03187    PY03255    PY04439   


    PF04998 - RNA_pol_Rpb1_5 (Pfam link)

    Interpro entry IPR007081 : RNA polymerase Rpb1, domain 5 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 5, represents the discontinuous cleft domain that is required to form the central cleft or channel where the DNA is bound.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 5, represents the discontinuous cleft domain that is required to form the central cleft or channel where the DNA is bound.

    Proteins where this domain is known:
    PY01037    PY03187    PY03255    PY04440   


    PF05000 - RNA_pol_Rpb1_4 (Pfam link)

    Interpro entry IPR007083 : RNA polymerase Rpb1, domain 4 (Interpro link)

    Pfam description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This domain, domain 4, represents the funnel domain. The funnel contains the binding site for some elongation factors.

    Interpro description:
    RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). This entry, domain 4, represents the funnel domain. The funnel domain contains the binding site for some elongation factors.

    Proteins where this domain is known:
    PY01037    PY03187    PY03255   

    Proteins where this domain has been detected by our approach:
    PY04440   


    PF05001 - RNA_pol_Rpb1_R (Pfam link)

    Interpro entry IPR000684 : RNA polymerase II, heptapeptide repeat, eukaryotic (Interpro link)

    Pfam description:
    The repetitive C-terminal domain (CTD) of Rpb1, the largest subunit of RNA polymerase II (Pol II), plays a critical role in the regulation of gene expression. The activity of the CTD is dependent on its state of phosphorylation.

    Interpro description:

    RNA polymerase II is one of the three forms of RNA polymerase that exist in eukaryotic nuclei. The C-terminal region of the largest subunit of this oligomeric enzyme consists of the tandem repeat of a conserved heptapeptide. The number of repeats varies according to the species (for example there are 17 in Plasmodium, 26 in yeast, 44 in Drosophila, and 52 in mammals). The region containing these repeats is essential for the function of polymerase II. This repeated heptapeptide (called CT7n or CTD) is rich in hydroxyl groups. It probably projects out of the globular catalytic domain and may interact with the acidic activator domains of transcriptional regulatory proteins. It is also known to bind by intercalation to DNA. RNA polymerase II is activated by phosphorylation. The serine and threonine residues in the CT7n repeats are the target of such phosphorylation.
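
    The tandem heptapeptide organisation described above lends itself to a rough repeat count. The consensus is not spelled out in the text; the sketch assumes the commonly cited CTD consensus YSPTSPS (Tyr-Ser-Pro-Thr-Ser-Pro-Ser), and `count_heptads` with its mismatch threshold is a hypothetical helper. Because many CTD repeats diverge from the consensus, the count depends heavily on the threshold and should not be expected to reproduce the curated repeat numbers quoted above.

```python
CONSENSUS = "YSPTSPS"  # assumed CTD heptad consensus (not given in the text)


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_heptads(protein: str, consensus: str = CONSENSUS, max_mismatches: int = 2) -> int:
    """Greedy, non-overlapping count of windows within `max_mismatches`
    of the consensus. A rough proxy, not a substitute for alignment."""
    seq = protein.upper()
    k = len(consensus)
    i = count = 0
    while i + k <= len(seq):
        if mismatches(seq[i:i + k], consensus) <= max_mismatches:
            count += 1
            i += k  # jump past the accepted heptad
        else:
            i += 1  # slide one residue and try again
    return count
```

    The greedy jump keeps accepted repeats from overlapping; a stricter threshold (`max_mismatches=0`) counts only exact consensus copies.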

    Proteins where this domain is known:
    PY03187   


    PF05002 - SGS (Pfam link)

    Interpro entry IPR007699 : (Interpro link)

    Pfam description:
    This domain was thought to be unique to the SGT1-like proteins, but is also found in calcyclin binding proteins.

    Interpro description:
    This domain was thought to be unique to the SGT1-like proteins, but is also found in calcyclin binding proteins. Sgt1p is a highly conserved eukaryotic protein that is required for both SCF (Skp1p/Cdc53p-Cullin-F-box)-mediated ubiquitination and kinetochore function in yeast and also plays a role in the cAMP pathway. Calcyclin (S100A6) is a member of the S100A family of calcium binding proteins and appears to play a role in cell proliferation.

    Proteins where this domain is known:
    PY04829   


    PF05007 - Mannosyl_trans (Pfam link)

    Interpro entry IPR007704 : Mannosyltransferase, DXD (Interpro link)

    Pfam description:
    PIG-M has a DXD motif. The DXD motif is found in many glycosyltransferases that utilise nucleotide sugars. It is thought that the motif is involved in the binding of a manganese ion that is required for association of the enzymes with nucleotide sugar substrates.

    Interpro description:
    PIG-M has a DXD motif. The DXD motif is found in many glycosyltransferases that utilise nucleotide sugars. It is thought that the motif is involved in the binding of a manganese ion that is required for association of the enzymes with nucleotide sugar substrates.
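
    Candidate DXD motifs can be listed with a pattern scan. This is a minimal sketch: `find_dxd` is a hypothetical helper, X matches any residue, and because a three-residue motif occurs frequently by chance, hits are only candidates to be weighed against the glycosyltransferase context described above.

```python
import re


def find_dxd(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched triplet) for each D-X-D motif."""
    seq = protein.upper()
    # A lookahead with a capture group reports overlapping motifs
    # such as the two in DXDXD.
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=(D.D))", seq)]
```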

    Proteins where this domain is known:
    PY02427   


    PF05011 - DBR1 (Pfam link)

    Interpro entry IPR007708 : Lariat debranching enzyme, C-terminal (Interpro link)

    Pfam description:
    This presumed domain is found at the C-terminus of lariat debranching enzyme. This domain is always found in association with Pfam:PF00149.

    Interpro description:

    This presumed domain is found at the C terminus of lariat debranching enzyme and is always found in association with a metallo-phosphoesterase domain. RNA lariat debranching enzyme is capable of digesting a variety of branched nucleic acid substrates and multicopy single-stranded DNAs. The enzyme degrades intron lariat structures during splicing.

    Proteins where this domain has been detected by our approach:
    PY04559   


    PF05018 - DUF667 (Pfam link)

    Interpro entry IPR007714 : (Interpro link)

    Pfam description:
    This family of proteins is highly conserved in eukaryotes. Some proteins in the family are annotated as transcription factors; however, there is currently no support for this in the literature.

    Interpro description:
    This family of proteins is highly conserved in eukaryotes. Some proteins in the family are annotated as transcription factors; however, there is currently no support for this in the literature.

    Proteins where this domain is known:
    PY00388   


    PF05019 - Coq4 (Pfam link)

    Interpro entry IPR007715 : Coenzyme Q biosynthesis Coq4 (Interpro link)

    Pfam description:
    Coq4p was shown to peripherally associate with the matrix face of the mitochondrial inner membrane. The putative mitochondrial-targeting sequence present at the amino-terminus of the polypeptide efficiently imported it to mitochondria. The function of Coq4p is unknown, although its presence is required to maintain a steady-state level of Coq7p, another component of the Q biosynthetic pathway.

    Interpro description:
    Coq4p was shown to peripherally associate with the matrix face of the mitochondrial inner membrane. The putative mitochondrial-targeting sequence present at the N terminus of the polypeptide efficiently imports it to mitochondria. The function of Coq4p is unknown, although its presence is required to maintain a steady-state level of Coq7p, another component of the Q biosynthetic pathway.

    Proteins where this domain is known:
    PY05764   


    PF05020 - zf-NPL4 (Pfam link)

    Interpro entry IPR007716 : (Interpro link)

    Pfam description:
    The HRD4 gene is identical to NPL4, a gene previously implicated in nuclear transport. Using a diverse set of substrates and direct ubiquitination assays, analysis revealed that HRD4/NPL4 is required for a poorly characterised step in ER-associated degradation after ubiquitination of target proteins but before their recognition by the 26S proteasome. This region of the protein possibly contains two zinc-binding motifs (Bateman A pers. obs.). Npl4p physically associates with Cdc48p via Ufd1p to form a Cdc48p-Ufd1p-Npl4p complex. The Cdc48-Ufd1-Npl4 complex functions in the recognition of several polyubiquitin-tagged proteins and facilitates their presentation to the 26S proteasome for processive degradation or even more specific processing.

    Interpro description:
    The HRD4 gene is identical to NPL4, a gene previously implicated in nuclear transport. Using a diverse set of substrates and direct ubiquitination assays, analysis revealed that HRD4/NPL4 is required for a poorly characterised step in ER-associated degradation after ubiquitination of target proteins but before their recognition by the 26S proteasome. This region of the protein possibly contains two zinc-binding motifs. Npl4p physically associates with Cdc48p via Ufd1p to form a Cdc48p-Ufd1p-Npl4p complex. The Cdc48-Ufd1-Npl4 complex functions in the recognition of several polyubiquitin-tagged proteins and facilitates their presentation to the 26S proteasome for processive degradation or even more specific processing.

    Proteins where this domain has been detected by our approach:
    PY05126   


    PF05021 - NPL4 (Pfam link)

    Interpro entry IPR007717 : (Interpro link)

    Pfam description:
    The HRD4 gene was identical to NPL4, a gene previously implicated in nuclear transport. Using a diverse set of substrates and direct ubiquitination assays, analysis revealed that HRD4/NPL4 is required for a poorly characterised step in ER-associated degradation after ubiquitination of target proteins but before their recognition by the 26S proteasome. Npl4p physically associates with Cdc48p via Ufd1p to form a Cdc48p-Ufd1p-Npl4p complex. The Cdc48-Ufd1-Npl4 complex functions in the recognition of several polyubiquitin-tagged proteins and facilitates their presentation to the 26S proteasome for processive degradation or even more specific processing.

    Interpro description:
    The HRD4 gene is identical to NPL4, a gene previously implicated in nuclear transport. Using a diverse set of substrates and direct ubiquitination assays, analysis revealed that HRD4/NPL4 is required for a poorly characterised step in ER-associated degradation following ubiquitination of target proteins but preceding their recognition by the 26S proteasome. Npl4p physically associates with Cdc48p via Ufd1p to form a Cdc48p-Ufd1p-Npl4p complex. The Cdc48-Ufd1-Npl4 complex functions in the recognition of several polyubiquitin-tagged proteins and facilitates their presentation to the 26S proteasome for processive degradation or even more specific processing.

    Proteins where this domain is known:
    PY05126   


    PF05024 - Gpi1 (Pfam link)

    Interpro entry IPR007720 : N-acetylglucosaminyl transferase component (Interpro link)

    Pfam description:
    Glycosylphosphatidylinositol (GPI) represents an important anchoring molecule for cell surface proteins. The first step in its synthesis is the transfer of N-acetylglucosamine (GlcNAc) from UDP-N-acetylglucosamine to phosphatidylinositol (PI). This chemically simple step is genetically complex: three genes are required in yeast (GPI1, GPI2 and GPI3) and four in mammals (GPI1, PIG A, PIG H and PIG C).

    Interpro description:
    Glycosylphosphatidylinositol (GPI) represents an important anchoring molecule for cell surface proteins. The first step in its synthesis is the transfer of N-acetylglucosamine (GlcNAc) from UDP-N-acetylglucosamine to phosphatidylinositol (PI). This chemically simple step is genetically complex: three genes are required in Saccharomyces cerevisiae (GPI1, GPI2 and GPI3) and four in mammals (GPI1, PIG A, PIG H and PIG C).

    Proteins where this domain is known:
    PY03381   


    PF05026 - DCP2 (Pfam link)

    Interpro entry IPR007722 : Dcp2, box A (Interpro link)

    Pfam description:
    This domain is always found to the amino terminal side of Pfam:PF00293. This domain is specific to mRNA decapping protein 2 and this region has been termed Box A. Removal of the cap structure is catalysed by the Dcp1-Dcp2 complex.

    Interpro description:
    This presumed domain is always found on the N-terminal side of the NUDIX hydrolase domain. This domain appears to be specific to mRNA decapping protein 2 and its close homologues. This region has been termed Box A.

    Proteins where this domain has been detected by our approach:
    PY07320   


    PF05033 - Pre-SET (Pfam link)

    Interpro entry IPR007728 : Pre-SET zinc-binding region (Interpro link)

    Pfam description:
    This protein motif is a zinc binding motif. It contains 9 conserved cysteines that coordinate three zinc ions. It is thought that this region plays a structural role in stabilising SET domains.

    Interpro description:

    This region is found in a number of histone lysine methyltransferases (HMTase), N-terminal to the SET domain; it is generally described as the pre-SET domain.

    Histone lysine methylation is part of the histone code that regulates chromatin function and epigenetic control of gene function. Histone lysine methyltransferases (HMTase) differ both in their substrate specificity for the various acceptor lysines and in their product specificity for the number of methyl groups (one, two, or three) they transfer. With just one exception, the HMTases belong to the SET family, which can be classified according to the sequences surrounding the SET domain. Structural studies on the human SET7/9, a mono-methylase, have revealed the molecular basis for the specificity of the enzyme for the histone target and the roles of the invariant residues in the SET domain in determining the methylation specificities.

    The pre-SET domain, as found in the SUV39 SET family, contains nine invariant cysteine residues that are grouped into two segments separated by a region of variable length. These nine cysteines coordinate three zinc ions to form a triangular cluster, in which each zinc ion is coordinated by four cysteines in a tetrahedral configuration. The function of this domain is structural, holding together two long segments of random coil and stabilising the SET domain.

    The C-terminal region, including the post-SET domain, is disordered when not interacting with a histone tail and in the absence of zinc. The three conserved cysteines in the post-SET domain form a zinc-binding site when coupled to a fourth conserved cysteine in the knot-like structure close to the SET domain active site. The structured post-SET region brings in the C-terminal residues that participate in S-adenosylmethionine binding and histone tail interactions. The three conserved cysteine residues are essential for HMTase activity, as replacement with serine abolishes HMTase activity.

    Proteins where this domain is known:
    PY00637   


    PF05047 - L51_S25_CI-B8 (Pfam link)

    Interpro entry IPR007741 : (Interpro link)

    Pfam description:
    The proteins in this family are located in the mitochondrion. The family includes ribosomal protein L51, and S25. This family also includes mitochondrial NADH-ubiquinone oxidoreductase B8 subunit (CI-B8) EC:1.6.5.3. It is not known whether all members of this family form part of the NADH-ubiquinone oxidoreductase and whether they are also all ribosomal proteins.

    Interpro description:
    Proteins containing this domain are located in the mitochondrion and include ribosomal proteins L51 and S25. This domain is also found in the mitochondrial NADH-ubiquinone oxidoreductase B8 subunit (CI-B8). It is not known whether all members of this family form part of the NADH-ubiquinone oxidoreductase and whether they are also all ribosomal proteins.

    Proteins where this domain is known:
    PY04009   


    PF05051 - COX17 (Pfam link)

    Interpro entry IPR007745 : Cytochrome c oxidase copper chaperone (Interpro link)

    Pfam description:
    Cox17 is essential for the assembly of functional cytochrome c oxidase (CCO) and for delivery of copper ions to the mitochondrion for insertion into the enzyme in yeast. The structure of Cox17 shows the protein to have an unstructured N-terminal region followed by two helices and several unstructured C-terminal residues. The Cu(I) binding site has been modelled as two-coordinate with ligation by conserved residues Cys23 and Cys26.

    Interpro description:
    Cox17p is essential for the assembly of functional cytochrome c oxidase (CCO) and for delivery of copper ions to the mitochondrion for insertion into the enzyme in Saccharomyces cerevisiae.

    Proteins where this domain is known:
    PY03823   


    PF05057 - DUF676 (Pfam link)

    Interpro entry IPR007751 : (Interpro link)

    Pfam description:
    This family of proteins are probably serine esterase type enzymes with an alpha/beta hydrolase fold.

    Interpro description:

    This domain is associated with eukaryotic proteins of unknown function, which are hydrolase-like.

    Proteins where this domain is known:
    PY02157   


    PF05063 - MT-A70 (Pfam link)

    Interpro entry IPR007757 : MT-A70 (Interpro link)

    Pfam description:
    MT-A70 is the S-adenosylmethionine-binding subunit of human mRNA:m6A methyl-transferase (MTase), an enzyme that sequence-specifically methylates adenines in pre-mRNAs.

    Interpro description:
    MT-A70 is the S-adenosylmethionine-binding subunit of human mRNA:m6A methyl-transferase (MTase), an enzyme that sequence-specifically methylates adenines in pre-mRNAs.

    Proteins where this domain is known:
    PY00961    PY01472   


    PF05091 - eIF-3_zeta (Pfam link)

    Interpro entry IPR007783 : Eukaryotic translation initiation factor 3, subunit 7 (Interpro link)

    Pfam description:
    This family is made up of eukaryotic translation initiation factor 3 subunit 7 (eIF-3 zeta/eIF3 p66/eIF3d). Eukaryotic initiation factor 3 is a multi-subunit complex that is required for binding of mRNA to 40S ribosomal subunits, stabilisation of ternary complex binding to 40S subunits, and dissociation of 40S and 60S subunits. These functions and the complex nature of eIF3 suggest multiple interactions with many components of the translational machinery. The gene coding for the protein has been implicated in cancer in mammals.

    Interpro description:
    This family is made up of eukaryotic translation initiation factor 3 subunit 7 (eIF-3 zeta/eIF3 p66/eIF3d). Eukaryotic initiation factor 3 is a multi-subunit complex that is required for binding of mRNA to 40S ribosomal subunits, stabilisation of ternary complex binding to 40S subunits, and dissociation of 40S and 60S subunits. These functions and the complex nature of eIF3 suggest multiple interactions with many components of the translational machinery. The gene coding for the protein has been implicated in cancer in mammals.

    Proteins where this domain is known:
    PY04269   


    PF05093 - DUF689 (Pfam link)

    Interpro entry IPR007785 : (Interpro link)

    Pfam description:
    This family contains several uncharacterised eukaryotic proteins of unknown function. The most conserved region is at the C-terminus and contains several conserved cysteines.

    Interpro description:
    This family contains several uncharacterised eukaryotic proteins of unknown function.

    Proteins where this domain is known:
    PY03124   


    PF05096 - Glu_cyclase_2 (Pfam link)

    Interpro entry IPR007788 : (Interpro link)

    Pfam description:
    This family of enzymes EC:2.3.2.5 catalyse the cyclization of free L-glutamine and N-terminal glutaminyl residues in proteins to pyroglutamate (5-oxoproline) and pyroglutamyl residues respectively. This family includes plant and bacterial enzymes and seems unrelated to the mammalian enzymes.

    Interpro description:
    This family of enzymes catalyses the cyclization of free L-glutamine and N-terminal glutaminyl residues in proteins to pyroglutamate (5-oxoproline) and pyroglutamyl residues, respectively. This family includes plant and bacterial enzymes and seems unrelated to the mammalian enzymes.

    Proteins where this domain is known:
    PY06896   


    PF05127 - DUF699 (Pfam link)

    Interpro entry IPR007807 : (Interpro link)

    Pfam description:
    This putative domain is about 350 amino acid residues long and appears to have a P-loop motif, suggesting this is an ATPase. This domain is often associated with Pfam:PF00583. This domain is found in isolation in Swiss:P44140.

    Interpro description:
    This domain is about 350 amino acid residues long and appears to have a P-loop motif, suggesting this is an ATPase. This domain is often N-terminal to a GCN5-related N-acetyltransferase domain.

    Proteins where this domain is known:
    PY04574   


    PF05129 - Elf1 (Pfam link)

    Interpro entry IPR007808 : (Interpro link)

    Pfam description:
    This family of short proteins contains a putative zinc binding domain with four conserved cysteines. Swiss:P36053 has been identified as a transcription elongation factor in Saccharomyces cerevisiae.

    Interpro description:
    This family of uncharacterised, mostly short, proteins contain a putative zinc binding domain with four conserved cysteines.

    Proteins where this domain is known:
    PY06377   


    PF05131 - Pep3_Vps18 (Pfam link)

    Interpro entry IPR007810 : (Interpro link)

    Pfam description:
    This region is found in a number of proteins identified as being involved in Golgi function and vacuolar sorting. The molecular function of this region is unknown. The members of this family contain a C-terminal ring finger domain.

    Interpro description:
    This region is found in a number of proteins identified as being involved in Golgi function and vacuolar sorting. The molecular function of this region is unknown. Proteins containing this domain also contain a C-terminal ring finger domain.

    Proteins where this domain has been detected by our approach:
    PY03107   


    PF05147 - LANC_like (Pfam link)

    Interpro entry IPR007822 : (Interpro link)

    Pfam description:
    Lanthionines are thioether bridges that are putatively generated by dehydration of Ser and Thr residues followed by addition of cysteine residues within the peptide. This family contains the lanthionine synthetase C-like proteins 1 and 2, which are related to the bacterial lanthionine synthetase components C (LanC). LANCL1 (P40 seven-transmembrane-domain protein) and LANCL2 (testes-specific adriamycin sensitivity protein) are thought to be peptide-modifying enzyme components in eukaryotic cells. Both proteins are produced in large quantities in the brain and testes and may have a role in the immune surveillance of these organs. Lanthionines are found in lantibiotics, which are peptide-derived, post-translationally modified antimicrobials produced by several bacterial strains. This region contains seven internal repeats.

    Interpro description:

    This family contains the lanthionine synthetase C-like proteins 1 and 2, which are related to the bacterial lanthionine synthetase components C (LanC). LANCL1 (P40 seven-transmembrane-domain protein) and LANCL2 (testes-specific adriamycin sensitivity protein) are thought to be peptide-modifying enzyme components in eukaryotic cells. Both proteins are produced in large quantities in the brain and testes and may have a role in the immune surveillance of these organs.

    In Arabidopsis thaliana (Mouse-ear cress), GCR2 is a plasma-membrane abscisic acid (ABA) receptor, which interacts with GPA1 to mediate all known ABA responses in A. thaliana.

    Proteins where this domain is known:
    PY02110   


    PF05148 - Methyltransf_8 (Pfam link)

    Interpro entry IPR007823 : (Interpro link)

    Pfam description:
    This family consists of several uncharacterised eukaryotic proteins which are related to methyltransferases Pfam:PF01209.

    Interpro description:
    This family consists of uncharacterised eukaryotic proteins which are related to S-adenosyl-L-methionine-dependent methyltransferases.

    Proteins where this domain is known:
    PY00928   


    PF05158 - RNA_pol_Rpc34 (Pfam link)

    Interpro entry IPR007832 : RNA polymerase Rpc34 (Interpro link)

    Pfam description:
    Subunit specific to RNA Pol III, the tRNA-specific polymerase. The C34 subunit of yeast RNA Pol III is part of a subcomplex of three subunits which have no counterpart in the other two nuclear RNA polymerases. This subunit interacts with TFIIIB70 and therefore participates in Pol III recruitment.

    Interpro description:
    The family comprises a subunit specific to RNA Pol III, the tRNA specific polymerase. The C34 subunit of Saccharomyces cerevisiae RNA Pol III is part of a subcomplex of three subunits which have no counterpart in the other two nuclear RNA polymerases. This subunit interacts with TFIIIB70 and therefore participates in Pol III recruitment.

    Proteins where this domain is known:
    PY07258   


    PF05175 - MTS (Pfam link)

    Interpro entry IPR007848 : Methyltransferase small (Interpro link)

    Pfam description:
    This domain is found in ribosomal RNA small subunit methyltransferase C (eg Swiss:P44453) as well as other methyltransferases (eg Swiss:Q53742).

    Interpro description:
    This domain is found in ribosomal RNA small subunit methyltransferase C as well as in other methyltransferases.

    Proteins where this domain is known:
    PY03902   


    PF05178 - Kri1 (Pfam link)

    Interpro entry IPR007851 : (Interpro link)

    Pfam description:
    The yeast member of this family (Kri1p) is found to be required for 40S ribosome biogenesis in the nucleolus.

    Interpro description:

    The Kri1 protein is also known as KRR1-interacting protein 1. The Saccharomyces cerevisiae member of this family is found to be required for the assembly of preribosomal 40S subunits in the nucleolus. KRR1 is highly expressed in dividing cells and its expression ceases almost completely when cells enter the stationary phase.

    This entry represents a subgroup of the KRR1 interacting protein 1.

    Proteins where this domain is known:
    PY01886   


    PF05180 - zf-DNL (Pfam link)

    Interpro entry IPR007853 : (Interpro link)

    Pfam description:
    The domain is named after a short C-terminal motif of D(N/H)L. This domain is a novel zinc finger found in proteins essential for protein import into mitochondria.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents the Zim17-type zinc finger motif, thought to bind zinc. This domain is found in a number of eukaryotic proteins and is named after a short C-terminal motif of D(N/H)L. The domain is found in proteins having a novel zinc finger essential for protein import into mitochondria.
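    The D(N/H)L motif described above is a simple sequence pattern: Asp, then Asn or His, then Leu. As a rough illustration only, the sketch below scans a protein sequence for this pattern with a regular expression and keeps matches near the C-terminus. The function name find_dnl_motifs and the 30-residue window are hypothetical choices made for this example, not parameters taken from Pfam or InterPro, and a regex hit is not evidence that a protein carries the zf-DNL domain.

```python
import re

# D(N/H)L as a regular expression: Asp (D), then Asn (N) or His (H), then Leu (L).
DNL_MOTIF = re.compile(r"D[NH]L")


def find_dnl_motifs(sequence: str, c_terminal_window: int = 30) -> list[int]:
    """Return 0-based start positions of D(N/H)L matches that begin within
    the last `c_terminal_window` residues of `sequence`.

    The window size is a hypothetical illustration parameter; the domain
    descriptions only state that the motif is C-terminal.
    """
    seq = sequence.upper()
    window_start = max(0, len(seq) - c_terminal_window)
    return [m.start() for m in DNL_MOTIF.finditer(seq) if m.start() >= window_start]


if __name__ == "__main__":
    # Toy sequence invented for this example, not a real protein.
    print(find_dnl_motifs("MKTAYIAKQRDNLSS"))
```

    A real domain assignment would rely on the Pfam profile HMM rather than on a three-residue pattern, which will match many unrelated sequences by chance.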

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY01199   


    PF05182 - Fip1 (Pfam link)

    Interpro entry IPR007854 : (Interpro link)

    Pfam description:
    This short motif is about 40 amino acids in length. It is found in the Fip1 protein, a component of a yeast pre-mRNA polyadenylation factor that directly interacts with poly(A) polymerase. This region of Fip1 is needed for the interaction with the Yth1 subunit of the complex and for specific polyadenylation of the cleaved mRNA precursor.

    Interpro description:
    This short motif is about 40 amino acids in length. It is found in the Fip1 protein, a component of a Saccharomyces cerevisiae pre-mRNA polyadenylation factor that directly interacts with poly(A) polymerase. This region of Fip1 is needed for the interaction with the Yth1 subunit of the complex and for specific polyadenylation of the cleaved mRNA precursor.

    Proteins where this domain is known:
    PY00739   


    PF05185 - PRMT5 (Pfam link)

    Interpro entry IPR007857 : Skb1 methyltransferase (Interpro link)

    Pfam description:
    The human homologue of yeast Skb1 (Shk1 kinase-binding protein 1) is PRMT5, an arginine N-methyltransferase. These proteins appear to be key mitotic regulators. They play a role in Jak signalling in higher eukaryotes.

    Interpro description:
    The human homologue of Saccharomyces cerevisiae Skb1 (Shk1 kinase-binding protein 1) is a protein methyltransferase. These proteins seem to play a role in Jak signalling.

    Proteins where this domain is known:
    PY07031   


    PF05186 - Dpy-30 (Pfam link)

    Interpro entry IPR007858 : (Interpro link)

    Pfam description:
    This motif is found in a wide variety of domain contexts. It is found in the Dpy-30 proteins, hence the motif's name. It is about 40 residues long and is probably formed of two alpha-helices. It may be a dimerisation motif analogous to Pfam:PF02197 (Bateman A pers. obs.).

    Interpro description:

    This motif is about 40 residues long and is probably formed of two alpha-helices. It is found in the Dpy-30 proteins, hence the motif's name. Dpy-30 from Caenorhabditis elegans is an essential component of the dosage compensation machinery, and loss of dpy-30 activity results in XX-specific lethality; in XO animals, Dpy-30 is required for developmental processes other than dosage compensation. In yeast, the homologue of DPY-30, Saf19p, functions as part of the Set1 complex that is necessary for the methylation of histone H3 at lysine residue 4; Set1 is a key part of epigenetic developmental control. There is also a human homologue of Dpy-30. This Dpy-30 region may be a dimerisation motif analogous to that found in the cAMP-dependent protein kinase regulator, type II PKA, R subunit.

    Proteins where this domain is known:
    PY05887   


    PF05188 - MutS_II (Pfam link)

    Interpro entry IPR007860 : DNA mismatch repair protein MutS, connector (Interpro link)

    Pfam description:
    This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with Pfam:PF00488, Pfam:PF01624, Pfam:PF05192 and Pfam:PF05190. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch binding protein. This domain corresponds to domain II of Thermus aquaticus MutS and resembles RNase H-like domains (see Pfam:PF00075).

    Interpro description:

    Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.

    MutS is a modular protein with a complex structure, composed of several distinct domains.

    Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts.

    This entry represents the connector domain (domain 2) found in proteins of the MutS family. The structure of the MutS connector domain consists of a parallel beta-sheet surrounded by four alpha helices, which is similar to the structure of the Holliday junction resolvase ruvC.

    Proteins where this domain has been detected by our approach:
    PY01096    PY02936   


    PF05190 - MutS_IV (Pfam link)

    Interpro entry IPR007861 : DNA mismatch repair protein MutS, clamp (Interpro link)

    Pfam description:
    This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with Pfam:PF01624, Pfam:PF05188, Pfam:PF05192 and Pfam:PF00488. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch binding protein. The aligned region corresponds in part to globular domain IV of Thermus aquaticus MutS, which is involved in DNA binding.

    Interpro description:

    Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.

    MutS is a modular protein with a complex structure, composed of several distinct domains.

    Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts.

    This entry represents the clamp domain (domain 4) found in proteins of the MutS family. The clamp domain is inserted within the core domain at the top of the lever helices. It has a beta-sheet structure.

    Proteins where this domain is known:
    PY02936   

    Proteins where this domain has been detected by our approach:
    PY01096    PY07191   


    PF05191 - ADK_lid (Pfam link)

    Interpro entry IPR007862 : Adenylate kinase, zinc-finger lid region (Interpro link)

    Pfam description:
    Comparisons of adenylate kinases have revealed a particular divergence in the active site lid. In some organisms, particularly the Gram-positive bacteria, residues in the lid domain have been mutated to cysteines and these cysteine residues are responsible for the binding of a zinc ion. The bound zinc ion in the lid domain is clearly structurally homologous to zinc-finger domains. However, it is unclear whether the adenylate kinase lid is a novel zinc-finger DNA/RNA binding domain, or whether the lid-bound zinc serves a purely structural function.

    Interpro description:

    Adenylate kinases (ADK) are phosphotransferases that catalyse the Mg-dependent reversible conversion of ATP and AMP to two molecules of ADP, an essential reaction for many processes in living cells. In large variants of adenylate kinase, the AMP and ATP substrates are buried in a domain that undergoes conformational changes from an open to a closed state when bound to substrate; the ligand is then contained within a highly specific environment required for catalysis. Adenylate kinase is a three-domain protein consisting of a large central CORE domain flanked by a LID domain on one side and the AMP-binding NMPbind domain on the other. The LID domain binds ATP and covers the phosphates at the active site. The substrates first bind the CORE domain, followed by closure of the active site by the LID and NMPbind domains.

    Comparisons of adenylate kinases have revealed a particular divergence in the active site lid. In some organisms, particularly the Gram-positive bacteria, residues in the lid domain have been mutated to cysteines and these cysteine residues (two CX(n)C motifs) are responsible for the binding of a zinc ion. The bound zinc ion in the lid domain is clearly structurally homologous to zinc-finger domains. However, it is unclear whether the adenylate kinase lid is a novel zinc-finger DNA/RNA binding domain, or whether the lid-bound zinc serves a purely structural function.

    Proteins where this domain is known:
    PY01562   

    Proteins where this domain has been detected by our approach:
    PY02813   


    PF05192 - MutS_III (Pfam link)

    Interpro entry IPR007696 : DNA mismatch repair protein MutS, core (Interpro link)

    Pfam description:
    This domain is found in proteins of the MutS family (DNA mismatch repair proteins) and is found associated with Pfam:PF00488, Pfam:PF05188, Pfam:PF01624 and Pfam:PF05190. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair; other members of the family include the eukaryotic MSH1, 2, 3, 4, 5 and 6 proteins. These have various roles in DNA repair and recombination. Human MSH has been implicated in hereditary non-polyposis colorectal cancer (HNPCC) and is a mismatch binding protein. The aligned region corresponds to domain III, which is central to the structure of Thermus aquaticus MutS.

    Interpro description:

    Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.

    MutS is a modular protein with a complex structure, and is composed of:

    Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts.

    This entry represents the core domain (domain 3) found in proteins of the MutS family. The core domain of MutS adopts a multi-helical structure composed of two subdomains, which are interrupted by the clamp domain. Two of the helices in the core domain form the levers that extend towards the DNA.

    Proteins where this domain is known:
    PY01096    PY02936    PY07191   


    PF05193 - Peptidase_M16_C (Pfam link)

    Interpro entry IPR007863 : Peptidase M16, C-terminal (Interpro link)

    Pfam description:
    Peptidase M16 consists of two structurally related domains. One is the active peptidase, whereas the other is inactive. The two domains hold the substrate like a clamp.

    Interpro description:

    Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.
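    The HEXXH and 'abXHEbbHbc' definitions above translate naturally into regular expressions. The Python sketch below is illustrative only. The residue sets chosen for 'uncharged' and 'hydrophobic' are assumptions, since the text does not enumerate them. Position 'a' is restricted to valine or threonine, even though the text says only that these are the most common residues there. The function name find_motifs is hypothetical. Proline is excluded from every variable position, following the observation that it never occurs in this site.

```python
import re

# Assumed residue classes: the text only says "uncharged" and "hydrophobic".
UNCHARGED = "ACFGILMNQSTVWY"      # assumption: excludes D, E, K, R, H and P
HYDROPHOBIC = "ACFILMVW"          # assumption: one common hydrophobic set
NOT_PRO = "ACDEFGHIKLMNQRSTVWY"   # any standard residue except proline

# Loose HEXXH motif, with proline excluded from the two spacer positions.
HEXXH = re.compile(f"(?=(HE[{NOT_PRO}]{{2}}H))")

# Stricter 'abXHEbbHbc' form: a = V/T, b = uncharged, X = non-Pro, c = hydrophobic.
STRICT = re.compile(
    f"(?=([VT][{UNCHARGED}][{NOT_PRO}]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)

def find_motifs(pattern: re.Pattern, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for every hit, including overlapping ones."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

seq = "MKTVAGHEAIHLLSS"
print(find_motifs(HEXXH, seq))   # [(6, 'HEAIH')]
print(find_motifs(STRICT, seq))  # [(3, 'VAGHEAIHLL')]
```

    The lookahead groups ensure that overlapping hits are all reported. A regex hit is only a candidate site and says nothing about whether the protein is catalytically active.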

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    These metallopeptidases belong to MEROPS peptidase family M16 (clan ME). The family also includes proteins classified as non-peptidase homologues, which either have been found experimentally to lack peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.

    The peptidases in this group of sequences include:

    These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine. In pitrilysin, this H-x-x-E-H motif has been shown to be involved in enzymatic activity: the two histidines bind zinc and the glutamate is necessary for catalytic activity. The mitochondrial processing peptidase consists of two structurally related domains. One is the active peptidase, whereas the other, the C-terminal region, is inactive. The two domains hold the substrate like a clamp.
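    The following minimal sketch finds the H-x-x-E-H motif described above and maps each hit onto the residue roles reported for pitrilysin: two zinc-binding histidines and a catalytic glutamate. The helper name hxxeh_sites is hypothetical. A pattern match alone does not establish that these residues play those roles in any given protein.

```python
import re

# H-x-x-E-H read N- to C-terminal; the lookahead allows overlapping hits.
HXXEH = re.compile(r"(?=(H..EH))")

def hxxeh_sites(seq: str) -> list[dict]:
    """Return 0-based positions of the putative zinc ligands and catalytic Glu per hit."""
    sites = []
    for m in HXXEH.finditer(seq.upper()):
        start = m.start()
        sites.append({
            "zinc_ligands": (start, start + 4),  # first and second His
            "catalytic_glu": start + 3,          # Glu two residues after the first His
        })
    return sites

print(hxxeh_sites("GLHWAEHTQ"))  # [{'zinc_ligands': (2, 6), 'catalytic_glu': 5}]
```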

    Proteins where this domain is known:
    PY00244    PY01832    PY04232    PY04302    PY07032   

    Proteins where this domain has been detected by our approach:
    PY06052   


    PF05207 - zf-CSL (Pfam link)

    Interpro entry IPR007872 : (Interpro link)

    Pfam description:
    This is a zinc binding motif which contains four cysteine residues which chelate zinc. This domain is often found associated with a Pfam:PF00226 domain. This domain is named after the conserved motif of the final cysteine.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    This entry represents a probable zinc-binding motif that contains four cysteines and may chelate zinc. It is known as the DPH-type motif after the diphthamide (DPH) biosynthesis proteins in which it was first characterised, including DPH3 and DPH4. This domain is also found associated with the DnaJ domain, the N-terminal domain of the heat shock protein DnaJ.

    Diphthamide is a unique post-translationally modified histidine residue found only in translation elongation factor 2 (eEF-2). It is conserved from archaea to humans and serves as the target for diphtheria toxin and Pseudomonas exotoxin A. These two toxins catalyse the transfer of ADP-ribose to diphthamide on eEF-2, thus inactivating eEF-2, halting cellular protein synthesis, and causing cell death. The biosynthesis of diphthamide depends on at least five proteins, DPH1 to DPH5, and a still unidentified amidating enzyme. DPH3 and DPH4 share a conserved region, which encodes a putative zinc finger, the DPH-type or CSL-type (after the conserved motif of the final cysteine) zinc finger. The function of this motif is unknown.

    More information about these proteins can be found at Protein of the Month: Zinc Fingers.

    Proteins where this domain is known:
    PY04730   

    Proteins where this domain has been detected by our approach:
    PY04661   


    PF05221 - AdoHcyase (Pfam link)

    Interpro entry IPR000043 : S-adenosyl-L-homocysteine hydrolase (Interpro link)

    Interpro description:
    S-adenosyl-L-homocysteine hydrolase (AdoHcyase) is an enzyme of the activated methyl cycle, responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase is a ubiquitous enzyme that binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein of about 430 to 470 amino acids. The family contains a glycine-rich region in the central part of AdoHcyase, a region thought to be involved in NAD binding.
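    For reference, the reversible reaction described above is given below as a balanced equation. NAD+ acts as a tightly bound cofactor and is not consumed in the net reaction, so it is omitted:

```latex
\text{S-adenosyl-L-homocysteine} + \mathrm{H_2O} \;\rightleftharpoons\; \text{adenosine} + \text{L-homocysteine}
```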

    Proteins where this domain is known:
    PY02893   


    PF05222 - AlaDh_PNT_N (Pfam link)

    Interpro entry IPR007886 : Alanine dehydrogenase/PNT, N-terminal (Interpro link)

    Pfam description:
    This family now also contains the lysine 2-oxoglutarate reductases.

    Interpro description:

    Alanine dehydrogenases and pyridine nucleotide transhydrogenase have been shown to share regions of similarity. Alanine dehydrogenase catalyzes the NAD-dependent reversible reductive amination of pyruvate into alanine. Pyridine nucleotide transhydrogenase catalyzes the reduction of NADP+ to NADPH with the concomitant oxidation of NADH to NAD+. This enzyme is located in the plasma membrane of prokaryotes and in the inner membrane of the mitochondria of eukaryotes. The transhydrogenation between NADH and NADP is coupled with the translocation of a proton across the membrane. In prokaryotes the enzyme is composed of two different subunits, an alpha chain (gene pntA) and a beta chain (gene pntB), while in eukaryotes it is a single chain protein. The sequences of alanine dehydrogenase from several bacterial species are related to those of the alpha subunit of bacterial pyridine nucleotide transhydrogenase and of the N-terminal half of the eukaryotic enzyme. The two most conserved regions correspond to the N-terminal extremity of these proteins, represented in this entry, and to a central glycine-rich region that is part of the NAD(H)-binding site.
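    The two reactions described above can be written as follows. Alanine dehydrogenase is shown in the reductive-amination direction used in the text. The proton translocation that accompanies transhydrogenation is left out of the second equation:

```latex
\begin{aligned}
\text{pyruvate} + \mathrm{NH_3} + \mathrm{NADH} + \mathrm{H^+} &\;\rightleftharpoons\; \text{L-alanine} + \mathrm{H_2O} + \mathrm{NAD^+} \\
\mathrm{NADH} + \mathrm{NADP^+} &\;\rightleftharpoons\; \mathrm{NAD^+} + \mathrm{NADPH}
\end{aligned}
```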

    Proteins where this domain is known:
    PY05907   


    PF05237 - MoeZ_MoeB (Pfam link)

    Interpro entry IPR007901 : (Interpro link)

    Pfam description:
    This putative domain is found in the MoeZ protein and the MoeB protein. The domain has two CXXC motifs that are only partly conserved.

    Interpro description:

    This putative domain is found in the MoeZ protein and the MoeB protein. The domain has two CXXC motifs that are only partly conserved. MoeZ is necessary for the synthesis of pyridine-2,6-bis(thiocarboxylic acid), a small secreted metabolite that has a high affinity for transition metals, increases iron uptake efficiency by 20% in Pseudomonas stutzeri, has the ability to reduce both soluble and mineral forms of iron, and has antimicrobial activity towards several species of bacteria. MoeB is the molybdopterin synthase activating enzyme in the molybdopterin cofactor biosynthesis pathway. Both these enzymes are members of a superfamily consisting of related but structurally distinct proteins that are members of pathways involved in the transfer of sulphur-containing moieties to metabolites and both also contain the UBA/THIF-type NAD/FAD binding fold.

    Proteins where this domain is known:
    PY02846   


    PF05277 - DUF726 (Pfam link)

    Interpro entry IPR007941 : (Interpro link)

    Pfam description:
    This family consists of several uncharacterised eukaryotic proteins.

    Interpro description:

    This family consists of several uncharacterised eukaryotic proteins.

    Proteins where this domain is known:
    PY07444   


    PF05282 - AAR2 (Pfam link)

    Interpro entry IPR007946 : (Interpro link)

    Pfam description:
    This family consists of several eukaryotic AAR2-like proteins. The yeast protein AAR2 is involved in splicing pre-mRNA of the a1 cistron and other genes that are important for cell growth.

    Interpro description:

    This family consists of several eukaryotic AAR2-like proteins. The Saccharomyces cerevisiae protein AAR2 is involved in splicing pre-mRNA of the a1 cistron and other genes that are important for cell growth.

    Proteins where this domain is known:
    PY00560   


    PF05290 - Baculo_IE-1 (Pfam link)

    Interpro entry IPR007954 : (Interpro link)

    Pfam description:
    The Autographa californica multinucleocapsid nuclear polyhedrosis virus (AcMNPV) ie-1 gene product (IE-1) is thought to play a central role in stimulating early viral transcription. IE-1 has been shown to activate several early viral gene promoters and to negatively regulate the promoters of two other AcMNPV regulatory genes, ie-0 and ie-2. IE-1 is thought to negatively regulate the expression of certain genes by binding directly, or as part of a complex, to promoter regions that contain a specific IE-1-binding motif (5'-ACBYGTAA-3') near their mRNA start sites.
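    The IE-1-binding motif above is written with IUPAC degenerate nucleotide codes (B = C, G or T; Y = C or T). The hedged Python sketch below scans a DNA sequence for the motif on both strands. The function names are hypothetical, and the code table covers only the codes needed here plus N. A real promoter analysis would also need to consider position relative to the mRNA start site.

```python
import re

# Subset of IUPAC nucleotide codes: enough for ACBYGTAA plus N.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "B": "[CGT]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def iupac_to_regex(motif: str) -> re.Pattern:
    """Compile a degenerate IUPAC motif into a lookahead regex (overlapping hits)."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in motif.upper()) + "))")

def scan_both_strands(dna: str, motif: str = "ACBYGTAA") -> list[tuple[int, str]]:
    """Return (0-based start on the forward strand, strand) for each motif occurrence."""
    dna = dna.upper()
    pattern = iupac_to_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(dna)]
    revcomp = dna.translate(COMPLEMENT)[::-1]
    n, k = len(dna), len(motif)
    # Map reverse-strand hit coordinates back onto the forward strand.
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(revcomp)]
    return sorted(hits)

print(scan_both_strands("GGACGTGTAAGG"))  # [(2, '+')]
print(scan_both_strands("CCTTACACGTCC"))  # [(2, '-')]
```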

    Interpro description:

    This entry contains the Baculovirus immediate-early protein IE-0.

    Proteins where this domain has been detected by our approach:
    PY00764   


    PF05291 - Bystin (Pfam link)

    Interpro entry IPR007955 : (Interpro link)

    Pfam description:
    Trophinin and tastin form a cell adhesion molecule complex that potentially mediates an initial attachment of the blastocyst to uterine epithelial cells at the time of implantation. Trophinin and tastin bind to an intermediary cytoplasmic protein called bystin. Bystin may be involved in implantation and trophoblast invasion, because bystin is found with trophinin and tastin in the cells at human implantation sites and also in the intermediate trophoblasts at the invasion front in the placenta from early pregnancy. This family also includes the yeast protein ENP1. ENP1 is an essential protein in Saccharomyces cerevisiae and is localised in the nucleus. It is thought that ENP1 plays a direct role in the early steps of rRNA processing, as enp1-defective yeast cannot synthesise 20S pre-rRNA and hence 18S rRNA, which leads to reduced formation of 40S ribosomal subunits.

    Interpro description:

    Trophinin and tastin form a cell adhesion molecule complex that potentially mediates an initial attachment of the blastocyst to uterine epithelial cells at the time of implantation. Trophinin and tastin bind to an intermediary cytoplasmic protein called bystin. Bystin may be involved in implantation and trophoblast invasion, because bystin is found with trophinin and tastin in the cells at human implantation sites and also in the intermediate trophoblasts at the invasion front in the placenta from early pregnancy. This family also includes the Saccharomyces cerevisiae protein ENP1. ENP1 is an essential protein in S. cerevisiae and is localised in the nucleus. It is thought that ENP1 plays a direct role in the early steps of rRNA processing, as enp1-defective S. cerevisiae cannot synthesise 20S pre-rRNA and hence 18S rRNA, which leads to reduced formation of 40S ribosomal subunits.

    Proteins where this domain is known:
    PY00699   


    PF05345 - He_PIG (Pfam link)

    Interpro entry IPR008009 : (Interpro link)

    Pfam description:
    This alignment represents the conserved core region of a ~90-residue repeat found in several haemagglutinins and other cell surface proteins. Sequence similarities to Pfam:PF02494 and Pfam:PF00801 suggest an Ig-like fold (personal obs: C. Yeats), so this family may be similar in function to the Pfam:PF02639 and Pfam:PF02638 domains. This domain is also found in the WisP family of proteins of Tropheryma whipplei.

    Interpro description:

    This alignment represents the conserved core region of a ~90 residue repeat found in several haemagglutinins and other cell surface proteins. Sequence similarities to Hyalin and the PKD domain suggest an Ig-like fold so this family may be similar in function to the and protein families.

    Proteins where this domain has been detected by our approach:
    PY00076   


    PF05346 - DUF747 (Pfam link)

    Interpro entry IPR008010 : (Interpro link)

    Pfam description:
    This is a family of eukaryotic membrane proteins. It was previously annotated as including a putative receptor for human cytomegalovirus gH, but this has since been disputed. Analysis of the mouse Tapt1 protein (transmembrane anterior posterior transformation 1) has shown it to be involved in patterning of the vertebrate axial skeleton.

    Interpro description:

    This is a family of eukaryotic membrane proteins. It was previously annotated as including a putative receptor for human cytomegalovirus gH, but this has since been disputed. Analysis of the mouse Tapt1 protein (transmembrane anterior posterior transformation 1) has shown it to be involved in patterning of the vertebrate axial skeleton.

    Proteins where this domain is known:
    PY03667   


    PF05362 - Lon_C (Pfam link)

    Interpro entry IPR008269 : Peptidase S16, lon C-terminal (Interpro link)

    Pfam description:
    The Lon serine proteases must hydrolyse ATP to degrade protein substrates. In Escherichia coli, these proteases are involved in turnover of intracellular proteins, including abnormal proteins following heat-shock. The active site for protease activity resides in a C-terminal domain. The Lon proteases are classified as family S16 in MEROPS.

    Interpro description:

    Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.

    Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold.

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    This signature defines the C-terminal proteolytic domain of the archaeal, bacterial and eukaryotic lon proteases, which are ATP-dependent serine peptidases belonging to MEROPS peptidase family S16 (lon protease family, clan SF). In eukaryotes, most of these proteins are located in the mitochondrial matrix. In yeast, Pim1 is located in the mitochondrial matrix and is required for mitochondrial function; it is constitutively expressed, but its expression increases after thermal stress, suggesting that Pim1 may play a role in the heat shock response.

    Proteins where this domain is known:
    PY06406   


    PF05391 - Lsm_interact (Pfam link)

    Interpro entry IPR008669 : (Interpro link)

    Pfam description:
    This short motif is found at the C-terminus of Prp24 proteins and probably interacts with the Lsm proteins to promote U4/U6 formation.

    Interpro description:
    This short motif is found at the C terminus of Prp24 proteins and probably interacts with the Lsm proteins to promote U4/U6 formation.

    Proteins where this domain is known:
    PY03602   


    PF05424 - Duffy_binding (Pfam link)

    Interpro entry IPR008602 : Plasmodium Duffy binding (Interpro link)

    Pfam description:
    This domain is found in Plasmodium Duffy binding proteins. Plasmodium vivax and Plasmodium knowlesi merozoites invade human erythrocytes that express Duffy blood group surface determinants. The Duffy receptor family is localised in micronemes, an organelle found in all organisms of the phylum Apicomplexa.

    Interpro description:
    This family contains several Plasmodium Duffy binding proteins. Plasmodium vivax and Plasmodium knowlesi merozoites invade Homo sapiens erythrocytes that express Duffy blood group surface determinants. The Duffy receptor family is localised in micronemes, an organelle found in all organisms of the phylum Apicomplexa.

    Proteins where this domain is known:
    PY04764   


    PF05470 - eIF-3c_N (Pfam link)

    Interpro entry IPR008905 : Eukaryotic translation initiation factor 3 subunit 8, N-terminal (Interpro link)

    Pfam description:
    The largest of the mammalian translation initiation factors, eIF3, consists of at least eight subunits ranging in mass from 35 to 170 kDa. eIF3 binds to the 40S ribosomal subunit at an early step of translation initiation and promotes the binding of methionyl-tRNAi and mRNA.

    Interpro description:
    The largest of the mammalian translation initiation factors, eIF3, consists of at least eight subunits ranging in mass from 35 to 170 kDa. eIF3 binds to the 40S ribosomal subunit at an early step of translation initiation and promotes the binding of methionyl-tRNAi and mRNA.

    Proteins where this domain is known:
    PY00916   


    PF05495 - zf-CHY (Pfam link)

    Interpro entry IPR008913 : Zinc finger, CHY-type (Interpro link)

    Pfam description:
    This family of domains is likely to bind zinc ions. The domains contain many conserved cysteine and histidine residues. We have named this domain after the N-terminal motif CXHY. This domain can be found in isolation in some proteins, but is also often associated with Pfam:PF00097. One of the proteins in this family (Swiss:P36078) is a mitochondrial intermembrane space protein called Hot13. This protein is involved in the assembly of small TIM complexes.

    Interpro description:

    Zinc finger (Znf) domains are relatively small protein motifs that bind one or more zinc atoms, and which usually contain multiple finger-like protrusions that make tandem contacts with their target molecule. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.

    (Note that in certain cases, some Znf domains have diverged such that they still maintain their core structure, but have lost their ability to bind zinc, using other means such as salt bridges or binding to other metals to stabilise the finger-like folds. These domains can show strong sequence identity to zinc-binding motifs, and may therefore be included in Znf entries).

    Pirh2 is a eukaryotic ubiquitin protein ligase that has been shown to promote p53 degradation in mammals. Pirh2 physically interacts with p53 and promotes ubiquitination of p53 independently of MDM2. Like MDM2, Pirh2 is thought to participate in an autoregulatory feedback loop that controls p53 function. Pirh2 proteins contain three distinct zinc fingers: the CHY-type, the CTCHY-type (C-terminal to the CHY-type zinc finger) and a RING finger. The CHY-type zinc finger has no currently known function.

    As well as Pirh2, the CHY-type zinc finger is also found in the following proteins:

    The solution structure of this zinc finger has been solved; it binds three zinc atoms, as shown in the following schematic representation:

    More information about these proteins can be found at Protein of the Month: Zinc Fingers

    Proteins where this domain is known:
    PY01868   


    PF05517 - p25-alpha (Pfam link)

    Interpro entry IPR008907 : (Interpro link)

    Pfam description:
    This family encodes a 25 kDa protein that is phosphorylated by a Ser/Thr-Pro kinase. It has been described as a brain-specific protein, but it is also found in Tetrahymena thermophila.

    Interpro description:
    This family encodes a 25 kDa protein that is phosphorylated by a Ser/Thr-Pro kinase. It has been described as a brain-specific protein, but it is also found in Tetrahymena thermophila.

    Proteins where this domain is known:
    PY05543   


    PF05602 - CLPTM1 (Pfam link)

    Interpro entry IPR008429 : (Interpro link)

    Pfam description:
    This family consists of several eukaryotic cleft lip and palate transmembrane protein 1 sequences. Cleft lip with or without cleft palate is a common birth defect that is genetically complex. The nonsyndromic forms have been studied genetically using linkage and candidate-gene association studies with only partial success in defining the loci responsible for orofacial clefting. CLPTM1 encodes a transmembrane protein and has strong homology to two Caenorhabditis elegans genes, suggesting that CLPTM1 may belong to a new gene family. This family also contains the human cisplatin resistance related protein CRR9p which is associated with CDDP-induced apoptosis.

    Interpro description:
    This family consists of several eukaryotic cleft lip and palate transmembrane protein 1 sequences. Cleft lip with or without cleft palate is a common birth defect that is genetically complex. The nonsyndromic forms have been studied genetically using linkage and candidate-gene association studies with only partial success in defining the loci responsible for orofacial clefting. CLPTM1 encodes a transmembrane protein and has strong homology to two Caenorhabditis elegans genes, suggesting that CLPTM1 may belong to a new gene family. This family also contains the Homo sapiens cisplatin resistance related protein CRR9p which is associated with CDDP-induced apoptosis.

    Proteins where this domain is known:
    PY01217   


    PF05608 - DUF778 (Pfam link)

    Interpro entry IPR008496 : (Interpro link)

    Pfam description:
    This family consists of several eukaryotic proteins of unknown function.

    Interpro description:
    This family consists of several eukaryotic proteins of unknown function.

    Proteins where this domain is known:
    PY04879   


    PF05646 - DUF786 (Pfam link)

    Interpro entry IPR008504 : (Interpro link)

    Pfam description:
    This family consists of several eukaryotic proteins of unknown function.

    Interpro description:
    This family consists of several eukaryotic proteins of unknown function.

    Proteins where this domain is known:
    PY02024   


    PF05653 - DUF803 (Pfam link)

    Interpro entry IPR008521 : (Interpro link)

    Pfam description:
    This family consists of several eukaryotic proteins of unknown function.

    Interpro description:
    This family consists of several eukaryotic proteins of unknown function.

    Proteins where this domain is known:
    PY00042   


    PF05669 - SOH1 (Pfam link)

    Interpro entry IPR008831 : SOH1 (Interpro link)

    Pfam description:
    The family consists of Saccharomyces cerevisiae SOH1 homologues. SOH1 is responsible for the repression of temperature-sensitive growth of the HPR1 mutant and has been found to be a component of the RNA polymerase II transcription complex. SOH1 interacts not only with factors involved in DNA repair but also with factors involved in transcription. Thus, the SOH1 protein may serve to couple these two processes.

    Interpro description:
    The family consists of Saccharomyces cerevisiae SOH1 homologues. SOH1 is responsible for the repression of temperature-sensitive growth of the HPR1 mutant and has been found to be a component of the RNA polymerase II transcription complex. SOH1 interacts not only with factors involved in DNA repair but also with factors involved in transcription. Thus, the SOH1 protein may serve to couple these two processes.

    Proteins where this domain is known:
    PY00705   


    PF05670 - DUF814 (Pfam link)

    Interpro entry IPR008532 : (Interpro link)

    Pfam description:
    This domain occurs in proteins that have been annotated as Fibronectin/fibrinogen binding protein by similarity. This annotation comes from Swiss:O34693 where the N-terminal region is involved in this activity. Hence the activity of this C-terminal domain is unknown. This domain contains a conserved motif D/E-X-W/Y-X-H that may be functionally important.
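    The D/E-X-W/Y-X-H notation above, in which a slash separates alternative residues and X stands for any residue, can be converted mechanically into a regular expression. The Python sketch below handles only that mini-syntax, not the full PROSITE pattern language. Its function names are hypothetical.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated pattern such as 'D/E-X-W/Y-X-H' into a regex.

    Supports only single residues, X (any residue) and slash-separated alternatives.
    """
    parts = []
    for element in pattern.upper().split("-"):
        if element == "X":
            parts.append(".")                                # any residue
        elif "/" in element:
            parts.append("[" + "".join(element.split("/")) + "]")  # alternatives
        else:
            parts.append(element)                            # fixed residue
    return "".join(parts)

DUF814_MOTIF = re.compile("(?=(" + prosite_to_regex("D/E-X-W/Y-X-H") + "))")

def find_duf814_motif(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in DUF814_MOTIF.finditer(seq.upper())]

print(prosite_to_regex("D/E-X-W/Y-X-H"))  # [DE].[WY].H
print(find_duf814_motif("KKEAWLHGG"))      # [(2, 'EAWLH')]
```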

    Interpro description:
    This domain occurs in proteins that have been annotated as Fibronectin/fibrinogen binding protein by similarity. This annotation comes fromwhere the N-terminal region is involved in this activity. Hence the activity of this C-terminal domain is unknown. This domain contains a conserved motif D/E-X-W/Y-X-H that may be functionally important.

    Proteins where this domain is known:
    PY01520    PY03214   


    PF05671 - GETHR (Pfam link)

    Interpro entry IPR008627 : (Interpro link)

    Pfam description:
    This pentapeptide repeat is found mainly in C. elegans. The most conserved amino acid at each position leads to its name GETHR (Bateman A unpublished obs.). The family also includes a divergent repeat in a microneme protein Swiss:Q26588. The function of this repeat is unknown.

    Interpro description:
    This pentapeptide repeat is found mainly in Caenorhabditis elegans. The repeat is named GETHR after the most conserved amino acid at each position. The family also includes a divergent repeat in a microneme protein. The function of this repeat is unknown.

    Proteins where this domain is known:
    PY00354    PY01748    PY02675    PY02988    PY05072    PY05789   


    PF05681 - Fumerase (Pfam link)

    Interpro entry IPR004646 : Fe-S type hydro-lyases tartrate/fumarate alpha region (Interpro link)

    Pfam description:
    This family consists of several bacterial fumarate hydratase proteins, FumA and FumB. Fumarase, or fumarate hydratase (EC 4.2.1.2), is a component of the citric acid cycle. In facultative anaerobes such as Escherichia coli, fumarase also participates in the reductive pathway from oxaloacetate to succinate during anaerobic growth. Three fumarases, FumA, FumB and FumC, have been reported in E. coli. The fumA and fumB genes are homologous and encode products of identical size, which form thermolabile dimers of Mr 120,000. FumA and FumB are class I enzymes and belong to the iron-dependent hydro-lyases, which include aconitase and malate hydratase. Active FumA contains a 4Fe-4S centre and can be inactivated upon oxidation, which converts it to a 3Fe-4S centre.

    Interpro description:

    A number of Fe-S cluster-containing hydro-lyases share a conserved motif, including argininosuccinate lyase, adenylosuccinate lyase, aspartase, class I fumarate hydratase (fumarase) and tartrate dehydratase. Proteins in this group represent a subset of closely related proteins or modules, including the Escherichia coli tartrate dehydratase alpha chain and the N-terminal region of the class I fumarase (where the C-terminal region is homologous to the tartrate dehydratase beta chain). The activity of archaeal proteins in this group is unknown.

    Proteins where this domain is known:
    PY05182   


    PF05683 - Fumerase_C (Pfam link)

    Interpro entry IPR004647 : Fe-S type hydro-lyases tartrate/fumarate beta region (Interpro link)

    Pfam description:
    This family consists of the C-terminal region of several bacterial fumarate hydratase proteins (FumA and FumB). Fumarase, or fumarate hydratase (EC 4.2.1.2), is a component of the citric acid cycle. In facultative anaerobes such as Escherichia coli, fumarase also participates in the reductive pathway from oxaloacetate to succinate during anaerobic growth.

    Interpro description:

    A number of Fe-S cluster-containing hydro-lyases share a conserved motif, including argininosuccinate lyase, adenylosuccinate lyase, aspartase, class I fumarate hydratase (fumarase) and tartrate dehydratase. Proteins in this group represent a subset of closely related proteins or modules, including the Escherichia coli tartrate dehydratase beta chain and the C-terminal region of the class I fumarase (where the N-terminal region is homologous to the tartrate dehydratase alpha chain). The activity of the archaeal proteins in this group is unknown.

    Proteins where this domain is known:
    PY05182   


    PF05684 - DUF819 (Pfam link)

    Interpro entry IPR008537 : (Interpro link)

    Pfam description:
    This family contains proteins of unknown function from archaeal, bacterial and plant species.

    Interpro description:
    This family contains proteins of unknown function from archaeal, bacterial and plant species.

    Proteins where this domain is known:
    PY02568   


    PF05700 - BCAS2 (Pfam link)

    Interpro entry IPR008409 : (Interpro link)

    Pfam description:
    This family consists of several eukaryotic sequences of unknown function. The mammalian members of this family are annotated as breast carcinoma amplified sequence 2 (BCAS2) proteins. BCAS2 is a putative spliceosome-associated protein.

    Interpro description:
    This family consists of several eukaryotic sequences of unknown function. The mammalian members of this family are annotated as breast carcinoma amplified sequence 2 (BCAS2) proteins. BCAS2 is a putative spliceosome-associated protein.

    Proteins where this domain is known:
    PY01785   


    PF05739 - SNARE (Pfam link)

    Interpro entry IPR000727 : (Interpro link)

    Pfam description:
    Most, if not all, vesicular membrane fusion events in eukaryotic cells are believed to be mediated by a conserved fusion machinery, the SNARE machinery. The SNARE domain is thought to act as a protein-protein interaction module in the assembly of a SNARE protein complex.

    Interpro description:

    The process of vesicular fusion with target membranes depends on a set of SNAREs (SNAP receptors) associated with the fusing membranes. Target SNAREs (t-SNAREs) are localised on the target membrane and belong to two different families, the syntaxin-like family and the SNAP-25-like family. Fusion requires one member of each family together with a v-SNARE localised on the vesicular membrane.

    The syntaxins are type-I transmembrane proteins whose cytosolic part contains several regions with coiled-coil propensity, the SNARE motif. SNAP-25 is a protein consisting of two coiled-coil regions and is associated with the membrane by lipid anchors. SNARE motifs assemble into parallel four-helix bundles stabilised by the burial of the hydrophobic faces of the helices in the bundle core. Monomeric SNARE motifs are disordered, so this assembly reaction is accompanied by a dramatic increase in alpha-helical secondary structure. The parallel arrangement of SNARE motifs within complexes brings the transmembrane anchors, and hence the two membranes, into close proximity. Recently, it was shown that the two coiled-coil regions of SNAP-25 and one of the coiled-coil regions of the syntaxins are related. This domain is found in both the syntaxin and SNAP-25 families as well as in other proteins.

    Proteins where this domain is known:
    PY01811    PY03484    PY03571    PY07007    PY07410   


    PF05746 - DALR_1 (Pfam link)

    Interpro entry IPR008909 : DALR anticodon binding (Interpro link)

    Pfam description:
    This all-alpha-helical domain is the anticodon binding domain in arginyl- and glycyl-tRNA synthetases. It is known as the DALR domain after its characteristic conserved amino acids.

    Interpro description:

    The aminoacyl-tRNA synthetases catalyse the attachment of an amino acid to its cogn