Journal

Artificial Intelligence: A Boon for the Pharmaceutical Industry?

Proteins are biological macromolecules made up of one or more chains of amino acids. They are present in all living cells, and most have a complex three-dimensional structure that determines their properties. They catalyse chemical reactions, strengthen the tissues of the human body (the role of collagen), play an important part in the functioning of the immune system, store oxygen (the role of myoglobin, a protein found in muscle tissue, including the heart), and so on. They can also surround a drug molecule to help it enter a human cell. New families of proteins are therefore set to play a growing role in pharmacology, whether for therapeutic uses or to produce biomaterials. Biologists have long been convinced that determining the structure of proteins is an essential preliminary step in genetic and pharmacological research.

In 2021, a real technical breakthrough came to their aid. Artificial intelligence based on deep learning made it possible to “predict”, at great speed, the three-dimensional structure of molecules, particularly proteins, from the sequence of amino acid residues that they are made of.[1]

This method was developed by DeepMind, a subsidiary of Google known for AlphaGo, its algorithm for playing the board game Go. DeepMind designed the AlphaFold algorithm using deep learning. Its researchers first assembled an initial database of 200,000 proteins, whose composition and structure were known, for the algorithm to “learn” from. By 2022, after two years of work in collaboration with the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL), near Cambridge, DeepMind had predicted the three-dimensional structure of 200 million proteins. These proteins come from a very large number of organisms, including bacteria, and cover 98% of human proteins. Together they make up a database that all researchers can access freely.

The spatial conformation (shape) of a protein depends on the nature of its constituent amino acids and the physico-chemical interactions between them. Having learnt from the initial database, the algorithm can “predict” the structure and conformation of a protein from its amino acid sequence. These predictions are far from perfect (they are the result of a statistical calculation), and the structure of the molecules must still be verified experimentally (using X-ray crystallography or cryo-electron microscopy[2]). According to EMBL, 80% of the predictions are either highly accurate (35%) or sufficiently accurate to be used in applications (45%). Determining the shape of proteins and predicting how they may change over time are important steps in considering potential applications for these molecules.
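
To make the 35% / 45% split more concrete, here is a minimal Python sketch of how a set of predictions might be sorted into reliability bands according to a confidence score. It is an illustration only, not the method used by DeepMind or EMBL: the function names, the cut-off values (loosely modelled on the 0 to 100 confidence scale that AlphaFold reports) and the example scores are all assumptions made for the purpose of the example.

```python
from collections import Counter

# Hypothetical cut-offs on a 0-100 confidence score (an assumption,
# loosely modelled on the confidence bands reported by AlphaFold).
HIGH_CUTOFF = 90.0
USABLE_CUTOFF = 70.0

BANDS = ("highly accurate", "usable", "needs verification")


def classify(score: float) -> str:
    """Map one confidence score to a coarse reliability band."""
    if score >= HIGH_CUTOFF:
        return "highly accurate"
    if score >= USABLE_CUTOFF:
        return "usable"
    return "needs verification"


def band_shares(scores: list[float]) -> dict[str, float]:
    """Return the fraction of predictions that fall in each band."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(classify(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


# Made-up scores for illustration only, not real AlphaFold output.
example_scores = [95.2, 91.0, 88.4, 76.3, 72.9, 70.0, 64.1, 55.0, 93.7, 41.8]
shares = band_shares(example_scores)

for band in BANDS:
    print(f"{band}: {shares[band]:.0%}")
```

Whatever the exact thresholds, the point is the same as in the text: predictions in the lowest band are those for which experimental verification matters most.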

This innovation has given a boost to research into:

  • the role of proteins in certain diseases (some can stimulate the rapid reproduction of cancer cells);
  • new families of proteins for use in pharmacology;
  • the synthesis of new catalysts for the chemical industry (for example, to break down plastics).