Wednesday, 10 March 2010

About MMsINC data and license

In a comment on a recent post, Egon asked two short questions about MMsINC data and its license.
I want to thank Egon for the two questions.

QUESTION 1: what's the license of the data in the database?

ANSWER: MMsINC data are property of the University of Padova.
Actually, the data are not available for download, but users can query them through the web interface.

QUESTION 2: how does the curation compare to that of ChEBI?

ANSWER:
a. The MMsINC data sources are larger than those included in ChEBI. The first release contains a number of sources, the most relevant being ZINC (version 7); the next release aims to process the greater part of the public data (mainly PubChem).
b. Structures are not checked or curated one by one. Instead, they are all processed following a precise protocol described here (open access NAR paper).

Some additional notes about the quality of MMsINC data:

The quality of MMsINC chemical data is higher than that of other public resources:

- MMsINC is not just a collection of public data: the data go through a long preprocessing stage (see the Nucleic Acids Research Database Issue article for a full description of the pipeline) and a data-cleaning step based on InChIs.
- MMsINC is the only resource that collects the most probable ionic states and tautomers of all the structures (when possible, and with the known limitations)
- MMsINC stores precalculated predictions of the biological enrichment of each molecule (similarity to PDB ligands, similarity to bioactive molecules, presence of "active" fragments)
- MMsINC contains a selection of descriptors important from a pharmaceutical and biochemical point of view
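
To give a rough idea of what "data cleaning based on the InChIs" can mean in practice, here is a minimal sketch in Python. It is only an illustration of the general idea (using the InChI string as a key to spot the same structure coming from different sources), not the actual MMsINC pipeline, which is described in the NAR paper. The function name, the record layout, and the source identifiers are all hypothetical.

```python
def merge_by_inchi(records):
    """Group records that share the same InChI string.

    `records` is an iterable of dicts with the (hypothetical) keys
    "source_id" and "inchi". Returns a dict mapping each distinct InChI
    to the list of source identifiers that reported it, in input order.
    """
    merged = {}
    for rec in records:
        key = rec["inchi"].strip()  # ignore stray whitespace around the key
        merged.setdefault(key, []).append(rec["source_id"])
    return merged


# Made-up example records: two sources report methane, one reports water.
records = [
    {"source_id": "SRC1-0001", "inchi": "InChI=1S/CH4/h1H4"},
    {"source_id": "SRC2-0042", "inchi": "InChI=1S/CH4/h1H4 "},
    {"source_id": "SRC1-0002", "inchi": "InChI=1S/H2O/h1H2"},
]

merged = merge_by_inchi(records)
for inchi, sources in merged.items():
    print(inchi, sources)
```

Keeping the list of source identifiers for each InChI, rather than discarding the duplicates, means every unique structure can still be traced back to the original entries.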

I also want to quote Stefano (Prof. Stefano Moro, University of Padova, Italy):

MMsINC is not only a database: it is a chemogenomics work platform that places
its data and tools to work with it on an even footing.  Although the data
formally is property of the University of Padova, it is more important to note
that we feel that simply providing files to download would belittle our
mission.  Instead, we aim to bring this data and the science of chemoinformatics
together to provide the MMsINC service to our web community.

Let me thank Luca Pireddu for helping me translate Prof. Moro's quote.


  1. "Actually, data are not available for download, but the users can query them though the web interface."

    Doesn't that mean that people can download the data? For a respected coder, it's not so difficult to extract the data even if querying is needed first.

    Do I understand correctly that if people do analysis on the data, they are not allowed to provide it as SI to journals (as required by some)?

  2. Egon, we've placed some limits on the rate and number of structures that can be downloaded, and we hope that they're difficult enough to overcome.

    About your question, I think the data used for the analysis can be provided as SI simply because all the structures in MMsINC are linked to the original data (from ZINC for this first release).