Using PyMol for protein visualisation

Roshni S.
Feb 8, 2023

The MRG15-PALB2 peptide complex

Proteins are essentially long chains of amino acids, typically hundreds and sometimes thousands of them. Because their structures are so complex, proteins were difficult to analyse before computational tools became available.

Software like PyMol allows researchers to better visualise how molecules and proteins fit together, giving us a better understanding of the roles they play in our bodies (or those of any organism!).

In this article, I will be discussing a few things I learned while using PyMol to visualise the structure of the MRG15-PALB2 complex (7S4A) and the WD40 domain of human PALB2 (2W18).

Let’s dive in 🚀

What is PALB2?

PALB2 (Partner and Localizer of BRCA2) is a protein that links BRCA1 and BRCA2, the proteins encoded by the breast cancer susceptibility genes. This link is extremely important for genome stability, homologous recombination, and DNA repair.

PALB2 also interacts with MRG15 (MORF-related gene on chromosome 15); this interaction has been linked to DNA transcription and replication, chromosome maintenance, and tumourigenesis.

This means that mutations/variants of PALB2 predispose individuals to various types of cancers, including breast and ovarian cancers.

The Protein Data Bank (PDB)

Before we can open and use PyMol, we need to find protein structures from a resource such as the Protein Data Bank (PDB).
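As a minimal sketch, the download address for an entry can be built from its four-character ID. The `files.rcsb.org/download/` pattern is the RCSB file download service; the helper name `rcsb_download_url` is my own and only there for illustration.

```python
def rcsb_download_url(pdb_id: str, file_format: str = "pdb") -> str:
    """Build the RCSB download URL for a PDB entry such as 7S4A."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    if file_format not in ("pdb", "cif"):
        raise ValueError("file_format must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{pdb_id}.{file_format}"


# The two entries discussed in this article.
for entry in ("7S4A", "2W18"):
    print(rcsb_download_url(entry))
```

Inside PyMol itself, typing `fetch 7S4A` on the command line downloads and loads the entry in one step.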

7S4A

A few things we can note about the protein structure:

The MRG15-PALB2 complex, viewed by clicking C > by chain > by chain

There are 4 distinct chains, and the 2 chains on each side are slightly asymmetrical with respect to one another. Any protein made up of more than one amino acid chain (like this one) is said to have quaternary structure. The MRG15 chains are shown in green and magenta, while the PALB2 peptides are shown in cyan and yellow.
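The same colouring can be scripted through PyMol's Python API. This is only a sketch: it assumes the entry is already loaded as an object named `7S4A`, the function name and the palette order are my own choices, and which chain ends up with which colour may differ from the menu option. `cmd` is passed in as an argument so the snippet can be read without PyMol installed.

```python
from itertools import cycle

# Standard PyMol colour names; the order here is an arbitrary choice.
PALETTE = ("green", "cyan", "magenta", "yellow")


def colour_by_chain(cmd, obj="7S4A"):
    """Give every chain of `obj` its own colour and return the chain IDs.

    `cmd` is PyMol's command module (from pymol import cmd).
    """
    chains = cmd.get_chains(obj)  # e.g. four chain identifiers for 7S4A
    for chain, colour in zip(chains, cycle(PALETTE)):
        cmd.color(colour, f"{obj} and chain {chain}")
    return chains
```

Inside PyMol, after `fetch 7S4A`, this could be called as `colour_by_chain(cmd)` following `from pymol import cmd`. The length of the returned list is the number of chains.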

In the sequence window, we can see that some parts of the amino acid chains are greyed out or missing. This indicates that those residues could not be resolved in the experimental structure, typically because those regions are flexible or disordered. It does not mean anything went wrong with transcription.

Click “S” to view the sequence window

These unresolved regions are also visible as breaks in the loops of the protein structure.
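One rough way to spot such breaks outside the viewer is to look for jumps in the residue numbering of a chain. The sketch below uses made-up residue numbers, not values read from 7S4A, and the function name is hypothetical. A jump is only a hint: numbering can also skip for other reasons.

```python
def numbering_gaps(residue_numbers):
    """Return (last_seen, next_seen) pairs wherever the numbering jumps."""
    ordered = sorted(set(residue_numbers))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


# Made-up residue numbers for a single chain, for illustration only.
example_chain = [10, 11, 12, 13, 20, 21, 22, 30]
print(numbering_gaps(example_chain))  # [(13, 20), (22, 30)]
```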

We can also look at the different features found in the secondary structure of this complex.

Colour the secondary structures by clicking C > by ss > any available colour combination
  • α-helices are the structures that resemble ribbons curled in a helical shape. They are the main secondary structure found in many proteins/complexes, including this one.
  • β pleated sheets are represented here as flat arrows that follow the direction of the amino acid chain from the N-terminus to the C-terminus (beginning to end of the chain).
  • Loops are like thin wires that change the direction of a polypeptide chain, allowing it to fold in on itself and form a more compact structure.
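The three structure types above map onto PyMol's `ss` selection keyword, so the colouring can also be done from a script. This is a sketch under assumptions: the object is assumed to be loaded as `7S4A`, the colours are one arbitrary combination, and the function name is my own.

```python
# PyMol secondary-structure selections mapped to standard PyMol colour names.
SS_COLOURS = {
    "ss h": "red",        # α-helices
    "ss s": "yellow",     # β strands (the flat arrows)
    "ss l+''": "green",   # loops, plus residues with no assignment
}


def colour_by_secondary_structure(cmd, obj="7S4A"):
    """Colour helices, strands, and loops of `obj` in different colours.

    `cmd` is PyMol's command module (from pymol import cmd).
    """
    for ss_selection, colour in SS_COLOURS.items():
        cmd.color(colour, f"{obj} and {ss_selection}")
    return dict(SS_COLOURS)
```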

These structures are shaped and held together by hydrogen bonds, which can be viewed by clicking A > find > polar contacts > within selection. The contacts appear as yellow dashed lines.

Hydrogen bonds between backbone carbonyl oxygens and amide hydrogens are what form secondary structures. Depending on the spacing and positions of the residues joined by hydrogen bonds, different structures (α-helices, β pleated sheets, etc.) are created.
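To make the point about spacing concrete: in a regular α-helix, the backbone C=O of residue i accepts a hydrogen bond from the backbone N-H of residue i + 4. The sketch below lists those pairs for an idealised helix. The function name is hypothetical, and real helices are less tidy, especially at their ends.

```python
def alpha_helix_hbond_pairs(first_residue, last_residue, offset=4):
    """List (acceptor, donor) residue pairs for an idealised α-helix.

    The C=O of residue i pairs with the N-H of residue i + offset.
    """
    return [(i, i + offset)
            for i in range(first_residue, last_residue - offset + 1)]


# An eight-residue helix numbered 1 to 8.
print(alpha_helix_hbond_pairs(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

In a β sheet the pattern is different: the bonded residues sit on neighbouring strands and can be far apart in the sequence, which is why the two structures end up with such different shapes.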

Now that we’ve gone over some of the basics of PyMol and proteins, let’s use what we’ve learned to look at another structure! This time, we’ll be visualising the WD40 repeat domain of human PALB2.

2W18

Click C > by chain > chainbows

Domains are regions of a protein that fold and function largely independently of the rest of the chain. The WD40 domain is one of the two domains found in human PALB2.

Something that’s immediately noticeable about this domain is its shape: repeating units in the protein form a circular beta-propeller structure (built from β pleated sheets).

Also, there is only one chain for this structure, because there is only one macromolecule involved.
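A scripted approximation of the chainbow colouring is sketched below. It assumes the entry is loaded as an object named `2W18` and uses PyMol's `spectrum` command once per chain; the function name is my own, and the result may not match the menu option exactly.

```python
def rainbow_each_chain(cmd, obj="2W18"):
    """Colour each chain of `obj` with a rainbow running along the chain.

    `cmd` is PyMol's command module (from pymol import cmd).
    Returns the number of chains found.
    """
    chains = cmd.get_chains(obj)
    for chain in chains:
        # "count" colours by position in the selection; byres=1 keeps all
        # atoms of a residue the same colour.
        cmd.spectrum("count", "rainbow", f"{obj} and chain {chain}", byres=1)
    return len(chains)
```

For a single-chain entry like this one, the function should return 1.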

You may also notice red stick-like structures surrounding the protein chain:

The structures are highlighted in blue

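These structures are glycerol ligands. Ligands are molecules that bind to biomolecules (such as proteins), usually reversibly and without forming covalent bonds; binding can change the shape or activity of the biomolecule, which is how many cellular functions are regulated. Small molecules like glycerol often show up in experimentally determined structures because they were present in the sample solution.

Glycerol is a triol, meaning it has three hydroxyl (OH) groups: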

The chemical formula for glycerol is C3H8O3
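As a quick worked example, the formula gives the molar mass of glycerol directly. The atomic masses below are standard values rounded to three decimals; the names in the snippet are my own.

```python
# Standard atomic weights in g/mol, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Glycerol: 3 carbons, 8 hydrogens, 3 oxygens.
GLYCEROL = {"C": 3, "H": 8, "O": 3}


def molar_mass(composition):
    """Sum atomic mass times atom count over every element in the formula."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in composition.items())


print(f"{molar_mass(GLYCEROL):.2f} g/mol")  # 92.09 g/mol
```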

We can also see the glycerol ligands at the end of the sequence window, where each one appears as GOL. GOL is not a run of three amino acids; it is the three-letter identifier the PDB uses for glycerol, and it is listed after the protein residues because ligands are stored as separate (hetero) residues.

With PyMol, we can also measure the distance between these ligands (or any two atoms, for that matter)! Clicking Wizard > Measurement and selecting the two atoms you want to measure displays the distance between them in angstroms.
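A simple way to check whether a three-letter code in the sequence window is an amino acid is to compare it against the 20 standard residue codes. GOL is not among them, which fits with it being the PDB identifier for glycerol. The function name and category labels below are my own.

```python
# Three-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def classify_residue(code):
    """Roughly classify a three-letter residue code from a structure file."""
    code = code.upper()
    if code in STANDARD_AMINO_ACIDS:
        return "amino acid"
    if code == "HOH":
        return "water"
    return "other (ligand, ion, or modified residue)"


for code in ("GLY", "LEU", "GOL"):
    print(code, "->", classify_residue(code))
```

In PyMol, typing `select glycerols, resn GOL` on the command line picks out all residues with that name.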

Here, we can see that the distance between these two glycerol ligands is 18.3 angstroms.
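Under the hood, a measurement like this is just the straight-line (Euclidean) distance between two sets of x, y, z coordinates, which structure files store in angstroms. A minimal sketch with hypothetical coordinates (not read from 2W18):

```python
import math


def distance_angstrom(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) positions given in Å."""
    return math.dist(atom_a, atom_b)


# Hypothetical coordinates chosen so the answer is easy to check by hand.
atom_a = (1.0, 2.0, 3.0)
atom_b = (4.0, 6.0, 3.0)
print(distance_angstrom(atom_a, atom_b))  # 5.0, a 3-4-5 triangle
```

From a script, PyMol's `cmd.get_distance(atom1, atom2)` returns the same kind of value for two single-atom selections.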

Conclusion

Working with PyMol was a new experience for me, but its incredibly intuitive platform helped me learn a lot about protein structure and visualisation! I hope you learned something as well, and I encourage you to try designing your own experiments using PyMol!

Additional resources

Thank you for reading my article! I’m Roshni, a 15 y/o biotech nerd who is always willing to learn. Feel free to reach out to me on LinkedIn!
