Protein
This article is about a class of organic compounds. For information about the dietary role of protein, see protein in human nutrition. For the rock band Protein, see Protein (band).
A protein (from the Greek proteios, meaning "of primary importance") is a complex, high-molecular-mass organic compound that consists of amino acids arranged in a linear chain linked by peptide bonds. Proteins were named by Jöns Jakob Berzelius in 1838 and are among the most actively studied molecules in biochemistry. Like other biological macromolecules such as polysaccharides, lipids, and nucleic acids, proteins are essential components of all living organisms. Many proteins are enzymes that catalyze biochemical reactions within the cell, a critical function in metabolism and biosynthesis. Other proteins play principally structural or mechanical roles, such as those that make up a cell's cytoskeleton (effectively a system of scaffolding that maintains the cell's shape and size). Proteins are also important components of cell signaling, the immune response, cell adhesion, the cell cycle, and essentially every process within a living cell.
All proteins are polymers whose specific amino acid sequence is specified by a gene encoded in DNA. The first step in synthesizing a protein from a gene is called transcription, in which a protein enzyme known as RNA polymerase reads the sequence of nucleotides that make up the gene and synthesizes a messenger RNA molecule. In the second step, known as translation, specialized cellular machinery called the ribosome reads three-base segments called codons from the messenger RNA that specify the order in which new amino acids are added to the growing protein. In eukaryotes transcription occurs in the nucleus and translation in the cytoplasm, but prokaryotes do not separate the processes into different cellular compartments. Although the genetic code specifies 20 "standard" amino acids, the residues in a newly synthesized protein are sometimes chemically altered in a process known as post-translational modification before the protein can assume its functional role in the cell. Proteins very commonly work together to achieve a particular function, and they often physically associate with one another to form a complex.
Most plants and microorganisms can biosynthesize all of the 20 standard amino acids, while animals can only synthesize a subset and must obtain the rest, called essential amino acids, from their diet. Through the process of digestion, animals break down ingested proteins into free amino acids that can be used for further protein synthesis. The number of amino acids that an animal cannot synthesize depends on the species and the age of the organism.
Biochemistry
Proteins are linear heteropolymers built from 20 different L-alpha-amino acids. All amino acids share common structural features including an alpha carbon to which an amino group, a carboxyl group, and a variable side chain are bonded. Amino acids in a protein are linked by a peptide bond formed by a dehydration reaction. Once linked in the protein chain, an individual amino acid is often called a residue. The peptide bond has two resonance forms that contribute some double bond character and inhibit rotation around its axis, forcing the alpha carbons to be roughly coplanar. The other two dihedral angles in the peptide bond are key determinants of the secondary structure assumed by the protein's backbone.
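Because each peptide bond is formed by a dehydration reaction, the mass of a peptide is the sum of the masses of its free amino acids minus one water molecule per bond. The following is a minimal sketch of that bookkeeping; the function name `peptide_mass` and the rounded average masses are illustrative assumptions, not values taken from the cited sources.

```python
# Approximate average molecular masses in daltons (rounded, illustrative values).
WATER = 18.015
FREE_AMINO_ACID_MASS = {"Gly": 75.067, "Ala": 89.094}


def peptide_mass(residues):
    """Mass of a linear peptide built from the given free amino acids.

    Each of the n - 1 peptide bonds in a chain of n residues releases
    one water molecule when it forms.
    """
    if not residues:
        return 0.0
    total = sum(FREE_AMINO_ACID_MASS[r] for r in residues)
    return total - WATER * (len(residues) - 1)


# The dipeptide Gly-Ala has one peptide bond, so one water is subtracted.
print(round(peptide_mass(["Gly", "Ala"]), 3))
```

The same subtraction explains why the average mass per residue inside a protein is lower than the average mass of the free amino acids.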
Due to the chemical structure of the individual amino acids, the overall polymer chain has directionality. The end of the protein with a free carboxyl group is known as the C-terminus or carboxy terminus, while the end with a free amino group is known as the N-terminus or amino terminus.
There is some ambiguity between the usage of the words protein, polypeptide, and peptide. Protein is generally used to refer to the complete biological molecule, while peptide is generally reserved for a short amino acid oligomer often lacking in well-defined structure. However, the boundary between the two is ill-defined and usually lies near 20-30 residues<ref name="Lodish">Lodish H, Berk A, Matsudaira P, Kaiser CA, Krieger M, Scott MP, Zipursky SL, Darnell J. (2004). Molecular Cell Biology 5th ed. WH Freeman and Company: New York, NY.</ref>. Polypeptide can refer to any single linear chain of amino acids, usually regardless of length, but often implies an absence of defined tertiary structure.
Synthesis
Proteins are assembled from amino acids using information derived from genes. The genetic code is a set of three-nucleotide sequences called codons that specify particular amino acids to be added to a new protein chain. Because DNA contains four nucleotides, the total number of possible codons is 64; hence, there is some redundancy in the genetic code and some amino acids are specified by more than one codon. Genes encoded in DNA are first transcribed into pre-messenger RNA (pre-mRNA) by proteins such as RNA polymerase. Most organisms then process the pre-mRNA (also known as a primary transcript) using various forms of post-transcriptional modification to form the mature mRNA, which is then used as a template for protein synthesis by specialized cellular machinery called the ribosome. In prokaryotes the mRNA may be moved from its region of synthesis in the nucleoid; in eukaryotes the mRNA is synthesized in the cell nucleus and then translocated across the nuclear membrane into the cytoplasm, where protein synthesis takes place. The rate of protein synthesis is higher in prokaryotes than eukaryotes and can reach up to 20 amino acids per second<ref name="Dobson">Dobson CM. (2000). The nature and significance of protein folding. In Mechanisms of Protein Folding 2nd ed. Ed. RH Pain. Frontiers in Molecular Biology series. Oxford University Press: New York, NY.</ref>.
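The codon count and the redundancy of the code can be checked by enumeration. The sketch below assumes the standard genetic code, written as a compact 64-letter string in the conventional T, C, A, G ordering; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Four possible bases at each of three positions gives 4 ** 3 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
code = dict(zip(codons, AMINO_ACIDS))

# Number of codons assigned to each amino acid (or to "stop").
degeneracy = Counter(code.values())

print(len(codons), "codons in total")
print(len(degeneracy) - 1, "amino acids plus a stop signal")
for amino_acid, count in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
    print(amino_acid, count)
```

With 64 codons available for 20 amino acids and a stop signal, several amino acids are necessarily encoded by more than one codon, while a few (methionine and tryptophan in the standard code) have only one.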
The process of synthesizing a protein from an mRNA template is known as translation. The mRNA is loaded onto the ribosome and is read three nucleotides at a time by matching each codon to its base pairing anticodon located on a transfer RNA molecule, which carries the amino acid corresponding to the codon it recognizes. The enzyme aminoacyl tRNA synthetase "charges" the tRNA molecules with the correct amino acids. The growing polypeptide is often termed the nascent chain. Proteins are always biosynthesized from N-terminus to C-terminus.
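Reading an mRNA three nucleotides at a time can be sketched in a few lines. This is a deliberately simplified model, assuming the standard genetic code and assuming that translation begins at the first AUG in the sequence; real initiation is more involved, and the function name `translate` is illustrative.

```python
from itertools import product

RNA_BASES = "UCAG"
# Standard genetic code in UCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (5' to 3') into a protein (N- to C-terminus).

    Simplifying assumption: translation starts at the first AUG and
    stops at the first in-frame stop codon or at the end of the message.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Codons read: AUG (Met), GCC (Ala), UGG (Trp), UAA (stop).
print(translate("GGAUGGCCUGGUAAGC"))
```

The first residue appended corresponds to the N-terminus and the last to the C-terminus, mirroring the direction of biosynthesis.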
The size of a synthesized protein can be measured by the number of amino acids it contains and by its total molecular mass, which is normally reported in units of daltons (synonymous with atomic mass units), or the derivative unit kilodalton (kDa). Yeast proteins are on average 466 amino acids long and 53 kDa in mass<ref name="Lodish" />. The largest known proteins are the titins, a component of the muscle sarcomere, with a molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids<ref>Fulton AB, Isaacs WB. (1991). Titin, a huge, elastic sarcomeric protein with a probable role in morphogenesis. Bioessays 13(4):157-61.</ref>.
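The figures quoted above imply an average mass per residue, which in turn gives a quick way to estimate mass from length. The sketch below uses only the numbers in this section; the default of 110 Da per residue in `estimate_mass_kda` is a common rule of thumb and is an assumption here, not a value from the cited sources.

```python
def mean_residue_mass(mass_kda, n_residues):
    """Average mass per residue, in daltons."""
    return mass_kda * 1000.0 / n_residues


def estimate_mass_kda(n_residues, da_per_residue=110.0):
    """Rough protein mass in kDa from chain length (rule-of-thumb default)."""
    return n_residues * da_per_residue / 1000.0


yeast_average = mean_residue_mass(53, 466)     # average yeast protein
titin = mean_residue_mass(3000, 27000)         # approximate titin figures

print(f"yeast average: {yeast_average:.1f} Da per residue")
print(f"titin:         {titin:.1f} Da per residue")
print(f"466 residues at 110 Da each: {estimate_mass_kda(466):.1f} kDa")
```

Both cases come out near 110 to 115 Da per residue, which is why chain length and molecular mass are often used interchangeably as measures of protein size.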
Chemical synthesis
Short proteins can also be synthesized chemically in the laboratory by a family of techniques known as chemical ligation. Chemical synthesis allows for the introduction of non-natural amino acids into polypeptide chains, such as attachment of fluorescent probes to amino acid side chains. These methods are too slow and laborious for most practical applications, but they are extremely useful in laboratory biochemistry and cell biology. In general, chemical synthesis is efficient only for polypeptides of up to about 300 amino acids.
Structure of Proteins
Most proteins fold into unique 3-dimensional structures. The shape into which a protein naturally folds is known as its native state. Although many proteins can fold unassisted simply through the structural propensities of their component amino acids, others require the aid of molecular chaperones to efficiently fold to their native states. Biochemists often refer to four distinct aspects of a protein's structure:
- Primary structure: the amino acid sequence
- Secondary structure: regularly repeating, non-cooperatively-folding, local structures stabilized by hydrogen bonds. The most common examples are the alpha helix and beta sheet, though random coil regions with no defined hydrogen bonding pattern or characteristic shape are also frequent<ref name="Branden">Branden C, Tooze J. (1999). Introduction to Protein Structure 2nd ed. Garland Publishing: New York, NY</ref>. Because secondary structures are local, many different individual secondary structures can be present in the same protein molecule.
- Tertiary structure: the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another. Tertiary structure is generally stabilized by nonlocal interactions, most commonly the formation of a hydrophobic core, but also through salt bridges, hydrogen bonds, disulfide bonds, and even post-translational modifications. The term "tertiary structure" is often used as synonymous with the term fold.
- Quaternary structure: the shape or structure that results from the interaction of more than one protein molecule, usually called protein subunits in this context, which function as part of the larger assembly or protein complex.
In addition to these levels of structure, proteins may shift between several related structures in performing their biological function. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as "conformations," and transitions between them are called conformational changes. Such changes are often induced by the binding of a substrate molecule to an enzyme's active site, or the physical region of the protein that participates in chemical catalysis.
Proteins can be informally divided into three main classes, which correlate with typical tertiary structures: globular proteins, fibrous proteins, and membrane proteins. Almost all globular proteins are soluble and many are enzymes. Fibrous proteins are often structural; membrane proteins often serve as receptors or provide channels for polar or charged molecules to pass through the cell membrane.
Structure determination
Discovering the tertiary structure of a protein, or the quaternary structure of its complexes, can provide important clues about how the protein performs its function. Common experimental methods of structure determination include X-ray crystallography and NMR spectroscopy, both of which can produce information at atomic resolution. Cryoelectron microscopy is used to produce lower-resolution structural information about very large protein complexes, including assembled viruses<ref name="Branden" />; a variant known as electron crystallography can also produce high-resolution information in some cases, especially for two-dimensional crystals of membrane proteins<ref>Gonen T, Cheng Y, Sliz P, Hiroaki Y, Fujiyoshi Y, Harrison SC, Walz T. (2005). Lipid-protein interactions in double-layered two-dimensional AQP0 crystals. Nature 438(7068):633-8.</ref>. Solved structures are usually deposited in the Protein Data Bank (PDB), a freely available resource from which structural data about thousands of proteins can be obtained in the form of Cartesian coordinates for each atom in the protein.
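Coordinate files in the PDB's legacy text format store each atom as a fixed-width ATOM record, with the x, y, and z coordinates in columns 31-38, 39-46, and 47-54. The sketch below builds a few records with made-up coordinates and reads them back; the helper names `format_atom`, `parse_atom`, and `centroid` are hypothetical, and only a subset of the record's fields is handled.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Build one fixed-width ATOM record (78 columns) with aligned fields."""
    return (
        f"ATOM  {serial:>5d} {name:<4s} {res_name:>3s} {chain}{res_seq:>4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
    )


def parse_atom(line):
    """Extract selected fields from an ATOM record by column position."""
    return {
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def centroid(atoms):
    """Unweighted mean position of the parsed atoms."""
    n = len(atoms)
    return tuple(sum(atom["xyz"][i] for atom in atoms) / n for i in range(3))


# Made-up backbone atoms of a single residue (not from a real PDB entry).
rows = [
    (1, "N", "MET", "A", 1, 0.000, 0.000, 0.000, "N"),
    (2, "CA", "MET", "A", 1, 1.458, 0.000, 0.000, "C"),
    (3, "C", "MET", "A", 1, 2.009, 1.420, 0.000, "C"),
]
records = [format_atom(*row) for row in rows]
atoms = [parse_atom(line) for line in records if line.startswith("ATOM")]

for line in records:
    print(line)
print(centroid(atoms))
```

Slicing by column rather than splitting on whitespace matters for real files, where adjacent fields can run together without a separating space.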
There are many more known gene sequences than there are solved protein structures. Further, the set of solved structures is biased toward those proteins that can be easily subjected to the experimental conditions required by one of the major structure determination methods. In particular, globular proteins are comparatively easy to crystallize in preparation for X-ray crystallography, which is the oldest structure determination technique and remains the most common. Membrane proteins, by contrast, are difficult to crystallize and are underrepresented in the PDB<ref>Walian P, Cross TA, Jap BK. (2004). Structural genomics of membrane proteins Genome Biol 5(4): 215.</ref>. Structural genomics initiatives have attempted to remedy these deficiencies by systematically solving representative structures of major fold classes. Protein structure prediction methods attempt to provide a means of generating a plausible structure for proteins whose structures have not been experimentally determined.
Protein folding
The Levinthal paradox
The process by which the higher-order structures of proteins are formed is called protein folding and is largely a consequence of the primary structure. The mechanism of protein folding is not yet well understood, although it is clear that the process of folding is not a simple search of all possible conformations for a given amino acid sequence. The challenge inherent in protein folding is illustrated by the Levinthal paradox, which demonstrates that proteins cannot possibly assume their native state by exhaustive conformational search. Conservatively assuming that each residue in a protein can assume one of three possible secondary structures, a protein of 150 residues would have 3<sup>150</sup> possible conformations. At an (extremely rapid) assumed search rate of one conformation per picosecond, it would take on the order of 10<sup>52</sup> years to fully search all possible conformations. In vivo, protein folding rates range from a few microseconds to several minutes<ref name="Branden" />, clearly illustrating that the process of assuming the native state must be a biased search. Although many proteins can fold unassisted, proteins or complexes within the cell called chaperones serve to accelerate folding and prevent misfolding or aggregation. Many of the so-called heat shock proteins, whose expression increases when the cell is exposed to elevated temperatures, are chaperones that specialize in preventing aggregation of misfolded or partially folded polypeptides<ref>Leroux MR, Hartl FU. (2000). Cellular Functions of Molecular Chaperones. In Mechanisms of Protein Folding 2nd ed. Ed. RH Pain. Frontiers in Molecular Biology series. Oxford University Press: New York, NY.</ref>.
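The Levinthal estimate is a short calculation that can be reproduced directly. The sketch below uses the assumptions stated above (three states per residue, 150 residues, one conformation sampled per picosecond) and works in base-10 logarithms to keep the numbers readable; the constant names are illustrative.

```python
import math

STATES_PER_RESIDUE = 3
N_RESIDUES = 150
CONFORMATIONS_PER_SECOND = 1e12           # one conformation per picosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Exact count of conformations; Python integers have arbitrary precision.
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES

log10_conformations = N_RESIDUES * math.log10(STATES_PER_RESIDUE)
log10_seconds = log10_conformations - math.log10(CONFORMATIONS_PER_SECOND)
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)

print(f"conformations: about 10^{log10_conformations:.1f}")
print(f"search time:   about 10^{log10_years:.1f} years")
```

Under these assumptions the exhaustive search time exceeds the observed folding times by dozens of orders of magnitude, which is the core of the paradox.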
In vivo folding
Some proteins may have multiple stable folded conformations, especially under slightly different environmental conditions - this effect is exploited by proteins such as the influenza viral coat protein hemagglutinin, which is "trapped" during the folding process in a metastable state and undergoes a dramatic conformational change on exposure to low pH<ref name="Branden" />. The folding funnel theory of protein folding suggests that a protein's native state lies at a free energy minimum on a multidimensional energy landscape. In general, only the native conformation or family of conformations has biological activity. However, an exception to this generality is found in the class of intrinsically unstructured proteins, which can fold into multiple structures with different activities.
Many proteins contain multiple structural domains that fold essentially independently. Such proteins may fold co-translationally - that is, the folding process of the N-terminal portion of the polypeptide begins while the C-terminal portion is still being synthesized - or post-translationally. Folding is not necessarily a continuous process and individual molecules of the same amino acid sequence may follow heterogeneous folding pathways, especially in proteins that contain disulfide bonds or multiple proline residues, whose unique side chain structure allows them to assume either the cis or the trans conformer around their peptide bonds. Specialized chaperone enzymes such as protein disulfide isomerase and prolyl isomerases assist in folding proteins with these features. Proteins that have misfolded after synthesis or have become denatured due to cellular stress are recognized by chaperones by their unusually high exposure of hydrophobic residues on their surfaces and are targeted for degradation by proteases<ref>Wickner S, Maurizi MR, Gottesman S. (1999). Posttranslational quality control: folding, refolding, and degrading proteins. Science 286(5446):1888-93. </ref>.
Function and regulation
Cellular functions
There are few if any cellular processes that do not require the involvement of proteins. Enzymes, or proteins that catalyze a chemical reaction, are usually highly specific to a particular substrate and effect most of the reactions involved in metabolism and catabolism as well as DNA replication, DNA repair, and RNA synthesis. Enzymes can also act on other proteins, especially those involved in signal transduction and in regulation of the cell cycle; post-translational modifications such as phosphorylation are used to modulate the activity levels of enzymes and the specificity of receptors. Receptors and other tight-binding proteins form another major functional class whose primary duties are to transport ligands through the cell and through the body, and to recognize extracellular stimuli and signaling molecules. A canonical example of a ligand-transporting protein is hemoglobin, which transports oxygen from the lungs to other organs in all vertebrates and has close analogs in every biological kingdom. A second category of binding proteins includes antibodies, components of the adaptive immune system that bind foreign antigens in the body.
Proteins such as antibodies may be anchored in the cellular membrane or secreted into the extracellular environment by the cell that synthesized them. Because secreted proteins exist in an environment that differs from that in the cytoplasm, they can contain features such as disulfide bonds that cannot exist in the reducing environment of the cytoplasm. Many organisms, especially single-celled organisms, secrete proteins that are toxins to other organisms that may compete for food or resources.
Structural proteins, which are most often fibrous, confer stiffness and rigidity to otherwise fluid biological structures. The proteins actin and tubulin are globular and soluble as monomers but polymerize to form long, stiff fibers that comprise the cytoskeleton, which allows the cell to maintain its shape and size. Collagen and elastin are critical components of connective tissue such as cartilage, and keratin is found in hard or filamentous structures such as hair, nails, feathers, hooves, and some animal shells.
Membrane proteins are anchored in the cell membrane and often increase the permeability of the membrane to polar or charged molecules or ions that cannot diffuse across the hydrophobic membrane layer unassisted. Many ion channel proteins are specialized to select for only a particular ion; for example, potassium and sodium channels often discriminate for only one of the two ions.
Other classes of protein functions include chaperones, which assist in the folding of other proteins especially under cellular stress, and motor proteins such as myosin, kinesin, and dynein, which are capable of generating mechanical forces. These proteins are crucial for cellular motility of single-celled organisms and the sperm of many sexually reproducing multicellular organisms. They also generate the forces exerted by contracting muscles.
Regulatory mechanisms
Various molecules and ions are able to bind to specific sites known as binding sites in proteins, which exhibit chemical specificity in which types of substrates they will bind to. The particle that binds is called a ligand. The strength of ligand-protein binding is a property of the binding site known as affinity. If the ligand is a molecule acted upon in a chemical reaction, it is known as a substrate and its binding site on the protein is an active site.
Since proteins are involved in nearly every function performed by a cell, the mechanisms for controlling these functions often depend on controlling protein activity. One very common means of protein regulation is through covalent modulation, a form of post-translational modification in which functional groups are attached to specific amino acid side chains on the protein and result in increasing or decreasing its activity, lifetime, or both. The single most common protein modification is acetylation of the N-terminus, which increases the lifespan of proteins in the cell<ref name="Lodish" />. A very common regulatory modification is phosphorylation, which is often used to activate or deactivate enzymes. It is catalyzed by enzymes called kinases and reversed by phosphatases. Individual proteins can be targeted for proteasomal degradation by the attachment of a polypeptide called ubiquitin.
Allosteric modulation is a means of regulation by which a ligand, usually a small organic compound, binds to a protein at a binding site located outside the active site and affects the activity of the enzyme, often by inducing a conformational change. Some enzymes are also regulated by their own reaction products in a negative feedback loop.
The regulation of a particular enzyme activity does not necessarily require interacting with the enzyme itself. Many proteins are subject to transcriptional regulation in which the rate of protein synthesis rather than the level of activity of existing proteins is regulated. The level of mRNA production may be altered, for example by the activity of the enhanceosome complex, or mRNAs may be targeted for degradation before they are used as templates. This type of regulation is especially common in prokaryotes that can synthesize certain nutrients, such as amino acids, but will preferentially obtain them from the environment if they are available. When supplies of a given nutrient are sufficient for the growth and maintenance of the cell, transcription of the enzymes needed for synthesis of that nutrient is downregulated<ref name="Lodish" />.
Methods of study
A protein's structure and activity are sensitive to its environment, especially to changes in solution conditions such as temperature, pH, ionic strength, and relative concentrations of electrolytes. Most enzymes are only active in or near their native state. A protein whose tertiary structure is not close to its native state is said to be denatured. Denatured proteins generally have no well-defined tertiary structure and little secondary structure. Common denaturants include detergents like sodium dodecyl sulfate and small molecules like urea and guanidine. Many proteins require the presence of ions, especially magnesium, in solution to remain stable and are not soluble in distilled water<ref name="Voet">Voet D, Voet JG. (2004). Biochemistry Vol 1 3rd ed. Wiley: Hoboken, NJ.</ref>. Membrane proteins can be solubilized by appropriate surfactants but are rarely soluble in water due to their largely hydrophobic exposed surfaces.
Denaturation is sometimes a reversible process. Christian Anfinsen won a Nobel Prize in Chemistry for his discovery that ribonuclease can be denatured and refolded in vitro. However, many proteins cannot be refolded after denaturation in vitro - likely because the exposure of their hydrophobic core residues leads to nonspecific inter-protein association and aggregation, or due to the requirement of a molecular chaperone for proper folding<ref name="Branden" />.
Keeping proteins in solution and in their native state can be a challenge during the process of purifying protein from a cell or tissue extract. Common methods of isolating and purifying a particular protein include precipitation of unrelated proteins by salting out; various forms of chromatography; and ultracentrifugation. The purification process is often monitored by gel electrophoresis, by spectroscopy (if the protein has a distinctive spectroscopic feature), or by activity assays (if the protein has a distinctive enzymatic activity). The secondary structure of a protein is often studied by a spectroscopic method called circular dichroism. The amino acid sequence of an unknown protein can be determined by mass spectrometry.
Through a genetic engineering application known as site-directed mutagenesis, researchers can alter the sequence and hence the structure, "targeting", susceptibility to regulation and other properties of a protein. The genetic sequences of different proteins may be spliced together to create "chimeric" proteins that possess some properties of both proteins. This technique is often used in the context of a reporter gene assay in which one easily identified protein, such as green fluorescent protein, is used as a label for the expression of another.
The emerging field of protein engineering attempts to redesign proteins with novel folds or modified functions. Many of the associated methodologies were developed for or derived from protein structure prediction techniques. Protein-protein interactions can be experimentally explored by a complex system known as two-hybrid screening and are often studied computationally in protein-protein interaction prediction.
Nutrition
Most microorganisms and plants can biosynthesize all 20 standard amino acids, while animals must obtain some of the amino acids from the diet.<ref name="Voet" /> Key enzymes in the biosynthetic pathways that synthesize certain amino acids - such as aspartokinase, which catalyzes the first step in the synthesis of lysine, methionine, and threonine from aspartate - are not present in animals. The amino acids that an organism cannot synthesize on its own are referred to as essential amino acids. (This designation is often used to specifically identify those essential to humans.) If amino acids are present in the environment, most microorganisms can conserve energy by taking up the amino acids from the environment and downregulating their own biosynthetic pathways. Bacteria are often engineered in the laboratory to lack the genes necessary for synthesizing a particular amino acid, providing a selectable marker for the success of transformation, or the introduction of foreign DNA.
In animals, amino acids are obtained through the consumption of foods containing protein. Ingested proteins are broken down through digestion, which typically involves denaturation of the protein through exposure to acid and degradation by the action of enzymes called proteases. In humans most protein digestion takes place in the duodenum rather than the stomach; almost all of the ingested protein has been absorbed when the jejunum is reached and little is left in the feces. The inactive precursor protein pepsinogen is converted to the active protease pepsin on contact with hydrochloric acid present in the stomach. Pepsin is the only proteolytic enzyme in the human digestive system that can digest collagen. Some ingested amino acids are not used directly for protein biosynthesis, but rather are converted to carbohydrates through gluconeogenesis, which is also used under starvation conditions to generate glucose from the body's own proteins, particularly those found in muscle.
History
The first mention of the word protein was in a letter sent by Jöns Jakob Berzelius to Gerhardus Johannes Mulder on 10 July 1838, in which he wrote:
The name protein that I propose for the organic oxide of fibrin and albumin, I wanted to derive from [the Greek word] πρωτειος, because it appears to be the primitive or principal substance of animal nutrition.
Investigation of proteins and their properties had been under way since about 1800, when scientists found the first signs of this then-unknown class of organic compounds. Leucine, in 1819, was the first amino acid described, and threonine, in 1936, was the last. In 1902 the peptide bond was described independently by Franz Hofmeister and Emil Fischer; Fischer coined the terms peptide and polypeptide. Arne Tiselius was the first to show that a single protein has both positively and negatively charged elements, and he performed the first separation of proteins by electrophoresis. Fred Sanger sequenced the first protein in 1949 using partial acid hydrolysis, and Linus Pauling provided the first evidence for secondary structures in proteins. James Batcheller Sumner, John Howard Northrop, and Wendell Meredith Stanley produced the first protein crystals and demonstrated that enzymes are proteins. Irving Langmuir showed that hydrophobic interactions determine 3D structure, and Max Perutz and John Kendrew determined the first 3D protein structures by X-ray crystallography.
See also
- Biological value
- Crystallography
- Denatured protein
- Protein design
- Intein
- List of recombinant proteins
- List of proteins
- Prion
- Proteinoid
- Protein structure prediction
- Protein targeting
- Proteome
- Ribosome
- Structural genomics
References
<references />
External links
- The Protein Databank
- UniProt the Universal Protein Resource
- Human Protein Atlas
- iHOP - Information Hyperlinked over Proteins
- Proteins: Biogenesis to Degradation - The Virtual Library of Biochemistry and Cell Biology
- MIT's Laboratory for Protein Molecular Self-Assembly
- Numerous publications on synthetic biomimetic protein-based biomaterials
- Amino acid metabolism
- Protein Images
- Online Protein viewer with a local PDB database
- NCBI Entrez Protein database
- NCBI Protein Structure database
- AOAC International
