Sunday, 12 September 2010

Automated large-scale protein modeling

A pipeline for large-scale automated comparative modeling can be built easily from the following components:

1) HHsearch, to identify the best template (filtered on criteria such as the e-value and the resolution);
2) MODELLER, for the comparative modeling itself;
3) the databases for the template search (nr90 and nr70) and for the modeling (a reformatted version of the PDB), all available from the HHsearch FTP site;
4) a set of Python and Bash utilities to manage the jobs on a computer cluster.
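The template-selection step (item 1) can be sketched as follows. This is a minimal sketch, not the pipeline's actual code: it assumes the HHsearch hits have already been parsed into a hypothetical `Hit` record, and the cutoffs (`max_evalue=1e-3`, `max_resolution=3.0` Å) are illustrative values only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    # Hypothetical record for one parsed HHsearch hit; field names are illustrative.
    template_id: str             # e.g. PDB ID plus chain, such as "1abc_A"
    evalue: float                # HHsearch e-value of the hit
    resolution: Optional[float]  # in Angstrom; None if the structure has no resolution (e.g. NMR)


def select_template(hits: List[Hit],
                    max_evalue: float = 1e-3,
                    max_resolution: float = 3.0) -> Optional[Hit]:
    """Return the best hit that passes both filters, or None if none does."""
    candidates = [
        h for h in hits
        if h.evalue <= max_evalue
        and h.resolution is not None
        and h.resolution <= max_resolution
    ]
    if not candidates:
        return None
    # Lowest e-value wins; ties are broken by the better (lower) resolution.
    return min(candidates, key=lambda h: (h.evalue, h.resolution))


if __name__ == "__main__":
    hits = [
        Hit("1abc_A", 1e-30, 2.8),
        Hit("2xyz_B", 1e-30, 1.9),
        Hit("3def_A", 1e-50, 3.5),  # rejected: resolution above the cutoff
        Hit("4ghi_C", 0.5, 1.2),    # rejected: e-value above the cutoff
    ]
    best = select_template(hits)
    print(best.template_id if best else "no suitable template")
```

Ranking by e-value and breaking ties on resolution is only one reasonable choice; a real pipeline might also weigh the HHsearch probability, alignment coverage, or sequence identity.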

All these building blocks are part of my pipeline, which I plan to release.
I'll give more information soon.

A few days ago, a test run showed that the pipeline can build 450 models in 9 hours (50 models per hour).
That is not especially fast, but the pipeline also includes some modules for model assessment, and the cluster resources are shared with many other users.

With a dedicated (and larger) cluster, I estimate that this pipeline could model the whole human proteome (ca. 78,000 peptides; source: Ensembl) in one or two weeks.
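A quick back-of-the-envelope check of this estimate, using only the numbers from the test run. It assumes throughput scales linearly with the computing resources available, which is a simplification: scheduler overhead and the assessment modules may not scale that way.

```python
HOURS_PER_DAY = 24


def models_per_hour(models: int, hours: float) -> float:
    """Observed throughput of a test run."""
    return models / hours


def days_needed(n_targets: int, rate: float) -> float:
    """Days required to model n_targets at a constant rate (models per hour)."""
    return n_targets / rate / HOURS_PER_DAY


def speedup_needed(n_targets: int, rate: float, target_days: float) -> float:
    """Factor by which the rate must grow to finish within target_days."""
    required_rate = n_targets / (target_days * HOURS_PER_DAY)
    return required_rate / rate


if __name__ == "__main__":
    rate = models_per_hour(450, 9)  # test run on the shared cluster
    print(f"observed rate: {rate:.0f} models/hour")
    print(f"whole proteome at this rate: {days_needed(78_000, rate):.0f} days")
    for days in (7, 14):
        print(f"speed-up for {days} days: {speedup_needed(78_000, rate, days):.1f}x")
```

At the observed 50 models per hour, the 78,000 peptides would take about 65 days; finishing in one to two weeks therefore needs roughly a 5x to 9x increase in throughput.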

