tag:blogger.com,1999:blog-89424420689055287132024-03-13T18:12:36.418+01:00alchemoinformaticsMatteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.comBlogger29125tag:blogger.com,1999:blog-8942442068905528713.post-30717154728382564022011-05-28T19:37:00.000+02:002011-05-28T19:37:43.728+02:00pepMMsMIMIC paper is outThe <a href="http://mms.dsfarm.unipd.it/pepMMsMIMIC/">pepMMsMIMIC</a> paper can now be accessed from the <a href="http://nar.oxfordjournals.org/content/early/2011/05/27/nar.gkr287.abstract">Nucleic Acids Research website</a>.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com1tag:blogger.com,1999:blog-8942442068905528713.post-55766260196845768982011-04-26T10:08:00.000+02:002011-04-26T10:08:52.851+02:00A novel web-oriented peptidomimetic compound virtual screening tool.<div style="margin-bottom: 0cm; text-indent: 0.5cm;"><span style="font-size: small;"><span lang="en-GB"><a href="http://mms.dsfarm.unipd.it/pepMMsMIMIC/"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">pepMMsMIMIC</span></span></a><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> is a public, web-based virtual screening platform that aims to suggest chemical </span></span></span><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">compounds whose essential elements (pharmacophore) mimic a natural peptide or protein in 3D space and that hopefully retain the ability to interact with the biological target and produce the typical biological effect.</span></span></span></div><div align="LEFT" class="body-text1" style="text-indent: 0.5cm;"><span style="font-size: small;"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">Starting from the 3D structure of any 
protein-protein/peptide complex, </span></span><a href="http://mms.dsfarm.unipd.it/pepMMsMIMIC/"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">pepMMsMIMIC</span></span></a><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> design process begins by identifying the key residues that are responsible for the protein-protein recognition process. In this process, the peptide complexity is reduced and the basic pharmacophore model is defined by its critical structural features (</span></span></span><i><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">peptide </span></span></span></i><i><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">annotation points</span></span></span></i><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">) in 3D space.</span></span></span></span></div><div align="LEFT" class="body-text1" style="text-indent: 0.5cm;"><span style="font-size: small;"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">The </span></span></span></span><a href="http://mms.dsfarm.unipd.it/pepMMsMIMIC/"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">pepMMsMIMIC</span></span></a><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> paper has been accepted for publication in the NAR Web Server Issue 2011. 
I will post the Advance Access link in the near future.</span></span></div><div align="LEFT" class="body-text1" style="text-indent: 0.5cm;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">Here is the abstract:</span></span></div><div align="LEFT" class="body-text1" style="text-indent: 0.5cm;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"><br />
</span></span></div><blockquote><div class="abstract-text-western" style="text-indent: 0cm;"><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">pepMMsMIMIC is a novel web-oriented </span></span></span></span></span><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">peptidomimetic compound</span></span></span></span></span><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> virtual screening tool based on a multi-conformers 3D- similarity search strategy. Key to the development of pepMMsMIMIC has been the creation of a library of 17 million conformers calculated from 3.9 million commercially available chemicals collected in the MMsINC</span></span></span></span></span><sup><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">®</span></span></span></span></span></sup><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> database. Using as input the three-dimensional structure of a peptide bound to a protein, pepMMsMIMIC suggests which chemical structures are able to mimic the protein-protein recognition of this natural peptide using both pharmacophore and shape similarity techniques. 
We hope that the accessibility of </span></span><a href="http://mms.dsfarm.unipd.it/pepMMsMIMIC/"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">pepMMsMIMIC</span></span></a><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> </span></span></span></span></span><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">will encourage medicinal chemists to de-peptidize protein-protein recognition processes of biological interest, thus increasing the potential of </span></span></span></span></span><span style="font-size: small;"><span lang="en-GB"><i><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;">in silico</span></span></span></i></span></span><span style="font-size: small;"><span lang="en-GB"><span style="font-weight: normal;"><span class="Apple-style-span" style="font-family: inherit;"><span class="Apple-style-span" style="color: white;"> peptidomimetic compound screening of known small molecules to expedite drug development.</span></span></span></span></span></div></blockquote>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com6tag:blogger.com,1999:blog-8942442068905528713.post-71059279993176118982011-04-19T12:12:00.002+02:002011-04-19T12:18:33.990+02:00MAISTAS: a tool for automatic structural evaluation of alternative splicing products<div style="color: #47382c; font: 14.0px Georgia; line-height: 18.0px; margin: 0.0px 0.0px 10.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><b><span class="Apple-style-span" style="color: #eeeeee;">MAISTAS: a tool for automatic structural evaluation of alternative splicing products</span></b></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;">Matteo Floris 1, Domenico Raimondo 2, Guido Leoni 2, Massimiliano Orsini 1, Paolo Marcatili 2 and Anna Tramontano 3,4</span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;">*</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">Author Affiliations</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">1 CRS4-Bioinformatics Laboratory, c/o Sardegna Ricerche Scientific Park, Pula, 09010 Cagliari, Italy</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">2 Department of Biochemical Sciences, Sapienza University of Rome, P.le A. 
Moro, 5 - 00185 Rome, Italy </span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">3 </span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">Department of Physics, Sapienza University of Rome, P.le A. Moro, 5 - 00185 Rome, Italy.</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">4 Istituto Pasteur Fondazione Cenci Bolognetti, Sapienza University of Rome, P.le A. 
Moro, 5 - 00185 Rome, Italy.</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">*To whom correspondence should be addressed. Prof. Anna Tramontano, E-mail: anna.tramontano@uniroma1.it</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">Received October 26, 2010</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">Revision received March 17, 2011</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #eeeeee;"><span class="Apple-style-span" style="font-size: x-small;"></span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">Accepted March 22, 2011</span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: 
Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: x-small;"><span class="Apple-style-span" style="color: #eeeeee;">Bioinformatics (2011) doi: 10.1093/bioinformatics/btr198 First published online: April 15, 2011 </span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><b><span class="Apple-style-span" style="color: #eeeeee;">Abstract</span></b></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;">Motivation: Analysis of the human genome revealed that the amount of transcribed sequence is an order of magnitude greater than the number of predicted and well characterized genes. A sizeable fraction of these transcripts is related to alternatively spliced forms of known protein coding genes. Inspection of the alternatively spliced transcripts identified in the pilot phase of the ENCODE project has clearly shown that often their structure might substantially differ from that of other isoforms of the same gene, and therefore that they might perform unrelated functions, or that they might even not correspond to a functional protein. Identifying these cases is obviously relevant for the functional assignment of gene products and for the interpretation of the effect of variations in the corresponding proteins. </span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;">Results: Here we describe a publicly available tool that, given a gene or a protein, retrieves and analyses all its annotated isoforms, provides users with three-dimensional models of the isoform(s) of his/her interest whenever possible and automatically assesses whether homology derived structural models correspond to plausible structures. This information is clearly relevant. When the homology model of some isoforms of a gene does not seem structurally plausible, the implications are that either they assume a structure unrelated to that of the other isoforms of the same gene with presumably significant functional differences, or do not correspond to functional products. We provide indications that the second hypothesis is likely to be true for a substantial fraction of the cases. </span></span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;"><br />
</span> </span></span></div></div><div style="font: 20.0px Georgia; line-height: 26.0px; margin: 0.0px 0.0px 0.0px 0.0px;"><div style="text-align: justify;"><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="color: #eeeeee;">Availability: </span></span></span><span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif; font-size: small;"><a href="http://maistas.bioinformatica.crs4.it/"><span class="Apple-style-span" style="color: #eeeeee;">http://maistas.bioinformatica.crs4.it/</span></a></span></div></div></div>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-63668349084666805262011-04-13T09:10:00.000+02:002011-04-13T09:10:28.918+02:00Splicing isoforms modeling, peptidomimetics and molecular dynamic made easyThe new season has started with 3 newly accepted papers. Here is a brief introduction; I will give more details on each of them very soon:<br />
<br />
<br />
<ol><li>Maìstas (Bioinformatics, <i>first author</i>), a fully automatic pipeline aimed at building and assessing three-dimensional models for alternative splicing isoforms. The server builds, when possible, comparative structural models for all the splicing isoforms of a submitted gene or set of genes. The models are then analysed in terms of their suitability to exist in the monomeric state: when a warning appears in the model assessment, the possibility that a multimeric state stabilizes the structure cannot be excluded. Moreover, the exonic coordinates of each splicing isoform are mapped onto the final models.</li>
<li>pep:MMs:MIMIC (Nucleic Acids Research, Web Server Issue, <i>first author</i>), a web-oriented tool that, given a peptide three-dimensional structure, automates a multi-conformer three-dimensional similarity search among 17 million conformers calculated from 3.9 million commercially available chemicals collected in the MMsINC database.</li>
<li>ClickMD (Future Medicinal Chemistry), a web-based explicit-solvent molecular dynamics simulator. ClickMD performs minimization, an equilibration phase and a short run of classical MD. ClickMD works with PDB files of proteins and peptides. You just need a valid PDB file to start the MD simulation! At the end of the simulation you will receive an e-mail containing a link to a web page where you can download the MD results: log files, trajectory files, and energy and RMSD representations and graphs.<br />
<br />
<br />
<br />
</li>
</ol>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com2tag:blogger.com,1999:blog-8942442068905528713.post-12339931737275502132011-01-04T16:17:00.000+01:002011-01-04T16:17:26.967+01:00Job position: modeling the interaction of genetic and environmental factors in autoimmune diseasesNot exactly drug design (not yet): a grant is available for modeling the interaction of genetic + environmental factors in autoimmune diseases. Deadline for application is Jan 11, 12AM Italy timezone.<br />
<br />
It will be coordinated by CRS4 in collaboration with two clinical units for MS and type 1 diabetes at Cagliari hospitals and the Biomed Dept for Crohn's disease in Sassari. Info in Italian at <a href="http://www.unica.it/UserFiles/File/Selezioni/SciCardio%20N.%206.doc">http://www.unica.it/UserFiles/File/Selezioni/SciCardio%20N.%206.doc</a>.<br />
<br />
If interested, please email Enrico Pieroni directly (<a href="mailto:ep@crs4.it">ep@crs4.it</a>).Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-41585075475249383412010-12-23T19:05:00.003+01:002010-12-23T19:19:05.948+01:00How to download MMsINC entries for Autodock.<a href="http://autodock.scripps.edu/">AutoDock</a> is a <blockquote>suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure</blockquote><br /><br />AutoDock 4 uses PDBQT-formatted files not only for the receptor but also for the ligand.<br />The PDBQT format is described <a href="http://autodock.scripps.edu/faqs-help/faq/what-is-the-format-of-a-pdbqt-file">here</a>.<br /><br />Yesterday I converted all the MMsINC entries (about 4M) to the PDBQT format.<br />It is therefore possible to download each entry as an input file for AutoDock experiments.<br /><br />Have a look at this <a href="http://mms.dsfarm.unipd.it/MMsINC/search/molecule.php?mmscode=MMs02218121">example page</a>. As you can see, at the top of the page there is a menu for downloading the entry in several file formats. The first of them is the AutoDock input file. 
Simply press the "go" button to download it.<br /><br />Let me close this post with this nice photo.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_eK36skX7Mfk/TROSBvWRv4I/AAAAAAAAADo/948Ir_2gQLI/s1600/foto_ricci_poetto.JPG"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 240px;" src="http://1.bp.blogspot.com/_eK36skX7Mfk/TROSBvWRv4I/AAAAAAAAADo/948Ir_2gQLI/s320/foto_ricci_poetto.JPG" border="0" alt="" id="BLOGGER_PHOTO_ID_5553943324065382274" /></a><br /><br />Where: Poetto Beach in Cagliari (Sardinia, Italy).<br />What: Stefano Moro and I eating "ricci di mare" (<a href="http://en.wikipedia.org/wiki/Sea_urchin">sea urchins</a>).Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-26784723989954979602010-12-16T10:31:00.002+01:002010-12-16T10:56:55.390+01:00The hard life of a chemoinformatician (and the chemical space in a bag)Part 1: <span style="font-style:italic;">The hard life of a chemoinformatician</span><br /><br />Thoughts at the end of 2010.<br />My PhD will be completed in the next 2 months.<br />My contract is coming to an end (February).<br />I would like to do more cheminfo than bioinfo.<br />But this is not easy, partly because there are more opportunities in bioinfo than in cheminfo.<br /><br />Part 2: <span style="font-style:italic;">the chemical space in a bag</span><br /><br />Stefano Moro will be welcome in my village next weekend.<br />In his bag, there will be the whole public chemical space (the future MMsINC 2.0).<br />I will enjoy exploring that space during the holidays.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com3tag:blogger.com,1999:blog-8942442068905528713.post-68841865301027899992010-11-08T12:53:00.004+01:002010-11-08T13:03:13.403+01:00Who is 
using MMsINC?A good number of people are using <a href="http://mms.dsfarm.unipd.it/MMsINC/search/">MMsINC</a> (summary from May to October 2010), <br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_eK36skX7Mfk/TNfmgQaR-_I/AAAAAAAAADY/bqmN4VrTf6I/s1600/totali.gif"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 70px;" src="http://4.bp.blogspot.com/_eK36skX7Mfk/TNfmgQaR-_I/AAAAAAAAADY/bqmN4VrTf6I/s320/totali.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5537147708710714354" /></a><br /><br />but what is interesting is the geographical distribution of the visitors: of the last 500 visitors, 67% were from India.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_eK36skX7Mfk/TNfmvB7l-_I/AAAAAAAAADg/sCBW7ShzQKA/s1600/visite_per_paese.gif"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 233px;" src="http://3.bp.blogspot.com/_eK36skX7Mfk/TNfmvB7l-_I/AAAAAAAAADg/sCBW7ShzQKA/s320/visite_per_paese.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5537147962521943026" /></a>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-73128633408128324762010-09-17T14:30:00.002+02:002010-09-17T14:40:22.999+02:00Pharao installation<a href="http://www.silicos.be/news.html">A few days ago</a> Silicos released the source code of the tool <a href="http://www.ncbi.nlm.nih.gov/pubmed/18485770">Pharao</a>.<br />Here are some notes about the installation of Pharao.<br />I had some problems, and with the help of the Pharao developers (Gert Thijs) I understood how to install it properly.<br /><br />1) install <a href="http://www.cmake.org/cmake/resources/software.html">cmake</a><br />2) install the <a href="http://openbabel.org/wiki/Subversion">latest openbabel release via 
svn</a><br />3) follow the openbabel installation instructions <a href="http://openbabel.org/wiki/CMake">here</a><br />4) download and install <a href="http://www.silicos.be/download.html">pharao</a><br /><br />If everything worked fine, there should not be any problem.<br /><br />See also <a href="http://blueobelisk.shapado.com/questions/pharao-installation-anyone-else-installed-it">BO exchange</a>.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-27650060979880195012010-09-12T12:14:00.008+02:002010-09-12T14:46:55.709+02:00Automated large-scale protein modelingA pipeline for automated comparative modeling of multiple proteins can be <span style="font-style:italic;">easily</span> built with the following software:<br /><br />1) the best template (based on some filters such as the e-value and the resolution) is identified using the <a href="http://toolkit.tuebingen.mpg.de/hhpred">hhsearch</a> program; <br />2) <a href="http://www.salilab.org/modeller/">Modeller</a> is used for the comparative modeling;<br />3) databases for the template search (nr90 and nr70) and for the modeling (a reformatted version of the <a href="http://www.rcsb.org/pdb/home/home.do">PDB</a>) are available from the hhsearch ftp site;<br />4) a set of python and bash utilities for the management of the jobs on a computer cluster.<br /><br />All these building blocks are part of my pipeline, which is going to be released.<br />I'll give more information soon.<br /><br />Some days ago a test experiment revealed that the pipeline can build 450 models in 9 hours (50 models per hour).<br />Not so fast, but my pipeline also contains some modules for the model assessment and the cluster resources are shared with a lot of different users.<br /><br />With a dedicated (and larger) cluster, I suppose it would be possible to model the whole human proteome (ca. 
78,000 peptides, source: <a href="http://www.ensembl.org">Ensembl</a>) in 1 or 2 weeks with this pipeline.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-52995639496696528272010-08-23T17:10:00.006+02:002010-08-23T17:36:18.732+02:00MMsINC 2.0: coming soon.We are processing (mainly <a href="http://mms.dsfarm.unipd.it/mfanton.htm">Marco Fanton at Univ. of Padova</a>) a lot of public sources for the next MMsINC release. Please feel free to contact me if you have any SDF catalog: it will be our pleasure to process and incorporate it (with the appropriate link to your company or institute).<br /><br /><br /><div align="center"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.sardegnadigitallibrary.it/mmt/480/63135.jpg"><img style="float:center;margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 336px; height: 226px;" src="http://www.sardegnadigitallibrary.it/mmt/480/63135.jpg" border="0" alt="Sardinian filigree" /></a><br><a href="http://www.sardegnadigitallibrary.it/index.php?xsl=626&id=63135">Sardinian gold jewel</a></div>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-41029115984857047832010-08-19T14:32:00.003+02:002010-08-19T14:39:31.928+02:00"Ultrafast shape recognition" method implementationI have implemented the Ultrafast Shape Recognition method (<a href="http://rspa.royalsocietypublishing.org/content/463/2081/1307.short">Ballester and Richards, Proc. R. Soc. A., 2007</a>) for the next MMsINC release: a pure Python implementation, no external libraries required, fast calculation. I'm looking for a small dataset for the validation of my script. 
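For readers who do not know the method, here is a minimal sketch of what a USR calculation in pure Python can look like. It is not the script described in this post: the function names (`usr_descriptors`, `usr_similarity`) are made up for illustration, the input is assumed to be a plain list of (x, y, z) atom coordinates, and the choice of mean, variance and third central moment as the three moments per reference point is an assumption, since implementations normalise the moments in different ways.

```python
import math


def _dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _moments(values):
    """Mean, variance and third central moment of a list of distances."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, var, third]


def usr_descriptors(coords):
    """Return 12 shape descriptors for a list of (x, y, z) atom coordinates.

    Four reference points are used: the centroid (ctd), the atom closest
    to the centroid (cst), the atom farthest from the centroid (fct) and
    the atom farthest from fct (ftf).
    """
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [_dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]
    fct = coords[d_ctd.index(max(d_ctd))]
    d_fct = [_dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]
    d_cst = [_dist(c, cst) for c in coords]
    d_ftf = [_dist(c, ftf) for c in coords]
    return _moments(d_ctd) + _moments(d_cst) + _moments(d_fct) + _moments(d_ftf)


def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1]: inverse of 1 + mean absolute descriptor difference."""
    diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + diff)


if __name__ == "__main__":
    # Toy coordinates, not a real molecule.
    mol = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
           (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
    stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in mol]
    print(usr_similarity(usr_descriptors(mol), usr_descriptors(stretched)))
```

Because only interatomic distances enter the descriptors, translating or rotating a conformer leaves them unchanged, so no alignment step is needed before comparing two shapes.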
If you are interested in the source code, please send me an email at matteo.floris@gmail.com.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com2tag:blogger.com,1999:blog-8942442068905528713.post-79339702019906155222010-07-01T17:04:00.001+02:002010-07-01T17:06:13.271+02:00InChI Version 1, Software Version 1.03 is out!<a href="http://www.iupac.org/inchi/release103.html">http://www.iupac.org/inchi/release103.html</a>: <br /><blockquote>InChI Version 1, Software Version 1.03 <br />– implemented for both Standard and <br />Non-standard (Customized) InChI/InChIKey</blockquote>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-73026709756723891592010-06-23T08:54:00.003+02:002010-06-23T09:08:07.942+02:00Removing duplicates from large SDF filesMaybe there are better solutions, but this worked very well with a random set taken from PubChem (5.000.000 structures, but I introduced random duplicates, for a total of 120.000.000 structures):<br />1) generate your preferred inchis for all the structures in your big SDF, and update the SDF with these inchis (you can use pybel for that)<br />2) extract PUBCHEM_COMPOUND_CID from the SDF:<br /><blockquote><br />grep -A 1 PUBCHEM_COMPOUND_CID big.sdf | grep -v "PUBCHEM_COMPOUND_CID" | grep -v -e "--" > CIDs.txt<br /></blockquote><br />3) then put inchis and CIDs in the same file:<br /><blockquote><br />paste inchi CIDs.txt > inchi_CID.txt<br /></blockquote><br />4) now you can sort this file:<br /><blockquote><br />sort inchi_CID.txt -o inchi_CID_sort.txt<br /></blockquote><br />so that all the duplicates end up on adjacent lines and are easy to spot...<br />5) now, you could load all the inchis as keys of a Python dictionary (which you can store with cPickle)... 
if an inchi is unique in the inchi_CID_sort.txt file, the value of the key is 0; if it is a duplicate (last visited inchi == current inchi), the value of the key is 10.<br />6) now, the python script should parse the SDF in this way:<br />for each structure:<br />if the inchi of this structure has value 0 in the dictionary, save the molecule;<br />if the value is 10, save the molecule, but change the value to 11;<br />if the value is 11, skip this structure<br /><br />I would suggest closing the output file every 100.000 structures and opening a new one at each iteration... at the end, a "cat" command will merge the pieces into one big SDF without duplicates.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-79827610701198790132010-05-13T10:31:00.012+02:002010-05-13T10:50:51.877+02:00Which is the "real" RU-486? [2]From a PubChem search (search term: RU-486, 20 results, 18 RO5):<br /><br /><br /><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44327040"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=44327040" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44372311"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=44372311" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=44327059"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" 
src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=44327059" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=18649237"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=18649237" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=55245"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=55245" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=16758830"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=16758830" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=11743390"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=11743390" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10503032"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=10503032" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} 
catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10387949"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=10387949" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=9910521"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=9910521" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=7048710"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=7048710" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=7048709"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=7048709" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=7048587"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=7048587" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6712024"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 
300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=6712024" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6426861"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=6426861" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6604870"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=6604870" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6604445"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=6604445" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6569219"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=6569219" border="0" alt="" /></a><br /></div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=1756360"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=1756360" border="0" alt="" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} 
catch(e) {}" href="http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=4196"><img style="margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 300px; height: 300px;" src="http://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?t=l&cid=4196" border="0" alt="" /></a><br /></div>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-64383997120501898422010-05-07T18:56:00.006+02:002010-05-07T19:07:54.658+02:00Which is the "real" RU-486?This<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.ebi.ac.uk/chebi/displayImage.do?defaultImage=true&imageIndex=0&chebiId=419138"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 200px;" src="http://www.ebi.ac.uk/chebi/displayImage.do?defaultImage=true&imageIndex=0&chebiId=419138" border="0" alt="" /></a><br /><br />or this<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.ebi.ac.uk/chebi/displayImage.do;jsessionid=6DEF27B723C7A71662AE2E0AED3367C4?defaultImage=true&imageIndex=0&chebiId=363012"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 200px;" src="http://www.ebi.ac.uk/chebi/displayImage.do;jsessionid=6DEF27B723C7A71662AE2E0AED3367C4?defaultImage=true&imageIndex=0&chebiId=363012" border="0" alt="" /></a><br /><br />???<br />These images are from ChEBI: <a href="http://www.ebi.ac.uk/chebi/searchId.do?chebiId=419138">http://www.ebi.ac.uk/chebi/searchId.do?chebiId=419138</a> and <a href="http://www.ebi.ac.uk/chebi/searchId.do?chebiId=363012">http://www.ebi.ac.uk/chebi/searchId.do?chebiId=363012</a>.Matteo 
Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-13012838470257433462010-04-22T14:11:00.001+02:002010-04-22T14:12:53.072+02:00ChEMBL_03 is available!<a href="http://www.ebi.ac.uk/chembldb/index.php">JPO announced the new release</a>. FTP data will be available in the next few days.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-70715763539623856612010-03-25T11:26:00.010+01:002010-03-25T11:49:06.656+01:00Any other case of "Different InChIs from the Same Molecule"?: an experiment with MMsINC 1.0Some days ago, an InChI bug was highlighted by Rich Apodaca on <a href="http://depth-first.com/articles/2010/03/11/significant-inchi-issue-two-different-inchis-from-the-same-molecule">his blog</a>.<br /><br />Marco Fanton (see <a href="http://mms.dsfarm.unipd.it/mfanton.htm">Marco Fanton's web page</a>) and I decided to perform an experiment with the 3 million 3D structures from MMsINC 1.0: we wrote a small pure-Python script that reshuffles the atom order of a given SDF file (write to me if you are interested in the code), then we generated "on-the-fly" 10 random permutations for each of the 3 million structures, automatically calculated the standard InChI, and searched for any "new" duplicated InChI.<br />Results? Interesting:<br /><br />two molecules (<a href="http://mms.dsfarm.unipd.it/MMsINC/search/molecule.php?mmscode=MMs03263666">MMs03263666</a>, <a href="http://mms.dsfarm.unipd.it/MMsINC/search/molecule.php?mmscode=MMs03263667">MMs03263667</a>) that are di-azo compounds (and this confirms the known bug)... 
and no other duplicates.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com3tag:blogger.com,1999:blog-8942442068905528713.post-24175726950816987522010-03-10T16:25:00.007+01:002010-03-10T17:12:29.936+01:00About MMsINC data and licenseIn a comment on a <a href="http://alchemoinformatics.blogspot.com/2010/02/mmsinc.html#comments">recent post</a>, Egon asked two short questions about MMsINC data and license.<br />I want to thank Egon for the two questions.<br /><br />QUESTION 1: <span style="font-weight:bold;">what's the license of the data in the database</span>?<br /><br />ANSWER: MMsINC data are the property of the University of Padova.<br />Currently, the data are not available for download, but users can query them through the web interface.<br /><br />QUESTION 2: <span style="font-weight:bold;">how does the curation compare to that of ChEBI</span>?<br /><br />ANSWER:<br />a. the MMsINC data sources are larger than those included in ChEBI: the first release contains a number of sources, the most relevant being ZINC (version 7), and the next release aims to process most of the public data (mainly PubChem).<br />b. 
structures are not checked or processed one by one, but by following a precise <a href="http://nar.oxfordjournals.org/cgi/content/abstract/gkn727v1">protocol described here</a> (open access NAR paper).<br /><br />Some additional notes about the quality of MMsINC data:<br /><br />the quality of MMsINC chemical data is higher than that of other public resources: <br /><br />- MMsINC is not just a collection of public data: it is the result of a long preprocessing step (see the Nucleic Acids Research Database Issue article for a <a href="http://nar.oxfordjournals.org/cgi/content/full/37/suppl_1/D284">full description of the pipeline</a>) and of data cleaning based on the InChIs.<br />- MMsINC is the only resource that collects the most probable ionic states and tautomers of all the structures (when possible and with the known limitations)<br />- MMsINC stores precalculated predictions of the biological enrichment of each molecule (similarity to PDB ligands, to bioactive molecules, presence of "active" fragments)<br />- MMsINC contains a selection of descriptors important from a pharmaceutical and biochemical point of view<br /><br />I also want to quote Stefano (<a href="http://mms.dsfarm.unipd.it/">Prof. Stefano Moro, University of Padova, Italy</a>):<br /><br /><blockquote>MMsINC is not only a database: it is a chemogenomics work platform that places<br />its data and tools to work with it on an even footing. Although the data <br />formally is property of the University of Padova, it is more important to note<br />that we feel that simply providing files to download would belittle our<br />mission. Instead, we aim to bring this data and the science of chemoinformatics<br />together to provide the MMsINC service to our web community.</blockquote><br /><br />Let me thank <a href="http://migweb.crs4.it/system/files/lucas_cv.pdf">Luca Pireddu</a> for helping me translate Prof. 
Moro's quote.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com2tag:blogger.com,1999:blog-8942442068905528713.post-9635626459742156732010-03-06T14:22:00.002+01:002010-03-06T14:25:17.078+01:00OOChemistryThis is an interesting tool.<br />I want to report here the message from the developers:<br /><br /><blockquote>OOChemistry is an extension for OpenOffice.org which provides cross-platform OLE-like integration of OOo with JChemPaint chemical diagram editor. With OOChemistry you can draw structure, embed into document (text or presentation) and than double click and edit whenever you want on any platform having OpenOffice.org and Java Runtime (Windows, Linux, Mac OS X, other Unix flavours). It is only first alpha version and is not recommended for production use (e.g., compatibility with futher versions is not guaranteed).<br /><br />OOChemistry needs your help! Experience in Java, in development of projects dealing with JChemPaint/CDK, or in development of OpenOffice.org extensions will be highly appreciated. 
Of course, you can help not only in coding, but also in translation of interface and writing docs.</blockquote><br /><br /><a href="http://sourceforge.net/projects/oochemistry/develop/">Project page on SF</a><br /><br />It sounds very interesting.<br />I'll try the installation asap.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-41812022956474454452010-03-05T11:26:00.003+01:002010-03-05T11:30:59.207+01:00c-d-k.orgGood news: <a href="http://www.chemistry-development-kit.org/">c-d-k.org</a> is available again! It is a good way to get introduced to the CDK functionality.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com7tag:blogger.com,1999:blog-8942442068905528713.post-6461405247752702082010-03-03T10:12:00.002+01:002010-03-03T10:15:32.462+01:00About SMARTS patterns in PubChem fingerprints<a href="http://blueobelisk.stackexchange.com/questions/280/pubchem-structural-keys-why-those-smarts-patterns">See my recent post at BlueObelisk StackExchange</a>.<br />Wolf Ihlenfeldt's reply is very interesting for people who want to know more about PubChem fingerprints. <br />As he says, the SMARTS patterns (terminal part of the FPs)...<br /><br />"<span style="font-style:italic;">are the result of capturing and analyzing user queries on the old NCI Cancer Screening database Web interface. They are intended to capture features which are used in actual queries. 
They were not designed specifically for similarities or correlation with any properties, but it turns out that there are indications that these screens work about as well as others in that sector</span>".Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-7165105992972630532010-02-27T13:23:00.008+01:002010-03-01T13:26:56.798+01:00Clustering with rcdk and PythonI received an email from a colleague: "I have a list of ChEBI IDs with the corresponding SMILES; I'd like to do some clustering, based on a similarity measure between the structures".<br /><br />I want to propose here a method based on the <a href="http://cran.r-project.org/web/packages/rcdk/index.html"><span style="font-style:italic;">rcdk</span> package</a> developed by R. Guha. This package is really nice. You should carefully read <a href="http://cran.r-project.org/web/packages/rcdk/vignettes/rcdk.pdf">this article</a>.<br /><br />Let me suggest a variation. 
If you have a huge number of structures, I would suggest computing your similarity matrix outside R.<br /><br />This can be done with a Python script:<br />1) you can create your structural keys with an external tool (or with <span style="font-style:italic;">rcdk</span>, saving the fingerprints in another file)<br />2) then, you can calculate the Tanimoto similarity by using Python's built-in <span style="font-style:italic;">set</span> type (the old <span style="font-style:italic;">sets</span> module is deprecated):<br /><br /><pre><br /># two example fingerprints as bit strings<br />fp_A = "110011"<br />fp_B = "101011"<br /><br /># collect the positions of the bits set to 1 in each fingerprint<br />set_a = set(i for i, bit in enumerate(fp_A) if bit == "1")<br />set_b = set(i for i, bit in enumerate(fp_B) if bit == "1")<br /><br /># Tanimoto = bits set in both / bits set in at least one<br /># here: 3 shared bits / 5 bits in the union = 0.6<br />tanimoto = float(len(set_a.intersection(set_b))) / float(len(set_a.union(set_b)))<br /></pre><br /><br />3) in this way, you can calculate the matrix of similarities between all your structures, save the matrix in a file, load it into the R environment, and use <span style="font-style:italic;">rcdk</span> for the clustering.<br /><br />That's all.Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-78411260905256158552010-02-25T13:39:00.002+01:002010-02-25T13:43:40.266+01:00Chemoinformatics in R:Really interesting if you want to learn more about R programming applied to chemoinformatics: a Joint EBI-Industry Workshop on <span style="font-weight:bold;">Cheminformatics in R</span>.<br />Speakers of this short course:<br />- Rajarshi Guha, NIH Chemical Genomics Center (R-CDK and R-Pubchem) <br />- Steffen Neumann, AG Massenspektrometrie & Bioinformatik (XCMS, Rdisop, CAMERA) <br />- H. Paul Benton, Imperial College London. 
<br />- David Broadhurst, Cork University Maternity Hospital.<br /><br /><a href="http://www.ebi.ac.uk/industry/Workshops/CheminformaticsR170510.html">Course page</a>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0tag:blogger.com,1999:blog-8942442068905528713.post-77144482635075548492010-02-24T09:45:00.002+01:002010-02-24T11:26:30.326+01:00New data load for kinase SARfari screening dataAs JPO reported on the ChEMBL Blog, there is a new data load for the beta version of Kinase SARfari.<br /><br /><a href="http://chembl.blogspot.com/2010/02/beta-testing-new-version-of-kinase.html">See the post at ChEMBL blog</a>Matteo Florishttp://www.blogger.com/profile/13177555699430611910noreply@blogger.com0