Tuesday, November 26, 2013

Predicting Protein Functions from Primary Structure -- a 20-year-old idea of mine

Once upon a time, in the early 1990s, I was in graduate school studying molecular biology. I developed a thesis project, which I never pursued and which, to my knowledge, no one has ever pursued.

The project would have been one of the first forays into bioinformatics. The idea was to use an artificial neural net (ANN) to predict protein function from primary structure. Basically, the inputs would be the amino acid sequences of known proteins, along with the properties of each of those amino acids, including:
  1. Polar / hydrophilic
  2. Non-polar / hydrophobic
  3. H-bonding
  4. Sulfur containing
  5. Negatively charged at neutral pH / acidic
  6. Positively charged at neutral pH / basic
  7. Ionizable
  8. Aromatic (and potentially stacking)
  9. Aliphatic
  10. Forms covalent cross-link (disulfide bond)
  11. Cyclic
  12. C-Beta branching
  13. pK values
  14. pI values
  15. Ka values
One might also use secondary and tertiary structures as inputs when they are known. This should contribute to more accurate outputs.
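
To make the input side concrete, here is a minimal sketch of how a sequence might be turned into per-residue feature vectors. It covers only a handful of the yes/no properties from the list above. The names `PROPERTIES`, `encode_residue`, and `encode_sequence` are made up for illustration, and the residue groupings follow one common textbook convention (classifications vary, for example for histidine and glycine). Numeric properties such as pK or pI would be extra columns and are left out here.

```python
# Hypothetical property table: one-letter amino acid codes grouped by property.
# Groupings follow one common textbook convention and are an assumption.
PROPERTIES = {
    "hydrophobic": set("AVLIMFWPG"),
    "acidic": set("DE"),          # negatively charged at neutral pH
    "basic": set("KRH"),          # positively charged at neutral pH
    "aromatic": set("FWY"),
    "sulfur": set("CM"),
    "beta_branched": set("VIT"),  # C-beta branching
    "cyclic": set("P"),
}
FEATURE_NAMES = list(PROPERTIES)  # fixed column order for every vector


def encode_residue(aa):
    """Return one 0/1 flag per property for a single residue."""
    return [1.0 if aa in PROPERTIES[name] else 0.0 for name in FEATURE_NAMES]


def encode_sequence(seq):
    """Encode a whole sequence as a list of per-residue feature vectors."""
    return [encode_residue(aa) for aa in seq.upper()]


# Usage: each row is one residue, each column one property.
for residue, row in zip("MKD", encode_sequence("MKD")):
    print(residue, row)
```

Unknown letters simply come out as all zeros in this sketch; a real encoder would need to decide how to handle them.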

The outputs of the ANN would of course be the protein function, though it is possible that some structural predictions -- of alpha helices or beta sheets, for example -- could also be an output of such a system.
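
As a rough illustration of the output side, here is a sketch of a small feedforward net that maps a fixed-length input vector to a probability for each function class. Everything in it is an assumption made for the example: the layer sizes, the tanh hidden units, the softmax output, and the idea that there are three function classes. The weights are random and untrained, so the numbers it produces mean nothing; it only shows the shape of the computation.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(sizes, rng):
    """sizes = [inputs, hidden..., outputs]; returns a list of layers."""
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(x, weights, biases):
    """Weighted sum of the inputs plus bias, for every unit in the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """tanh on hidden layers, softmax on the final layer."""
    for weights, biases in layers[:-1]:
        x = [math.tanh(v) for v in dense(x, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))


# Usage: 4 input features, 5 hidden units, 3 hypothetical function classes.
rng = random.Random(0)
layers = build_network([4, 5, 3], rng)
print(forward([1.0, 0.0, 0.0, 1.0], layers))
```

Real proteins vary in length, so the fixed-length input is itself a simplification; one would need a sliding window, padding, or some other scheme to feed whole sequences in.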

As for discovering the best ANN architecture, perhaps genetic algorithms could be used. There is no telling in advance what the optimal architecture is, so some sort of evolution and selection process would likely be most efficient.
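
The evolution-and-selection idea might look something like the sketch below. An architecture is represented as a list of hidden-layer sizes, the fitter half of each generation survives, and the rest are replaced by mutated copies of survivors. This is a stripped-down version with selection and mutation only (no crossover), and `toy_fitness` is a pure placeholder: in a real run the fitness would be the validation accuracy of a network trained with that architecture. All names and parameter values here are assumptions for illustration.

```python
import random


def toy_fitness(arch):
    """Placeholder score, NOT a real measure: prefers about 64 hidden units
    in total and penalizes extra layers. A real fitness would train the
    network and return its validation accuracy."""
    return -abs(sum(arch) - 64) - 2 * len(arch)


def random_arch(rng, max_layers=3, max_units=128):
    """A random list of hidden-layer sizes."""
    return [rng.randint(1, max_units) for _ in range(rng.randint(1, max_layers))]


def mutate(arch, rng, max_layers=3, max_units=128):
    """Add a layer, drop a layer, or nudge the size of one layer."""
    child = list(arch)
    choice = rng.random()
    if choice < 0.2 and len(child) < max_layers:
        child.append(rng.randint(1, max_units))
    elif choice < 0.4 and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        i = rng.randrange(len(child))
        child[i] = min(max_units, max(1, child[i] + rng.randint(-8, 8)))
    return child


def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Keep the fitter half each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [random_arch(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)


# Usage: search for a good list of hidden-layer sizes.
best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the best architectures always survive into the next generation, the top score can never get worse from one generation to the next. The expensive part in practice would be the fitness evaluation, since each candidate would have to be trained before it could be scored.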

Of course, it may be that programs other than ANNs could do this better or more efficiently. I suggest ANNs because they are able to generalize from examples and therefore make pattern predictions, which is what a program like this, with the outputs desired, requires.

If anyone thinks this worth pursuing, I encourage you to do so. I just ask for a courtesy 10% of any profits. :-)
