Tuesday, 9 February 2010

Database Indexing for a faster Substructure Search

In our first release of MMsINC (see recent post) we developed a strategy that can make substructure searches in huge databases faster.

Rules:
1) choose a fragmentation algorithm
2) fragment all the compounds in your database
3) store the fragments in your database
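A minimal sketch of these three steps in Python, using only the standard library's `sqlite3`. Everything here is an assumption for illustration, not the MMsINC implementation. Compounds are hypothetical toy graphs (element symbols plus `(i, j, order)` bonds, hydrogens ignored). The "fragmentation algorithm" is a deliberately simple stand-in that turns each bond into a label such as `C-O` or `C=O`. The table and function names are hypothetical too.

```python
import sqlite3

# Hypothetical toy representation (an assumption, not MMsINC's format):
# a compound is a list of element symbols plus a list of (i, j, order) bonds.
BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}


def fragment(atoms, bonds):
    """Step 1, toy fragmentation algorithm: every bond becomes a labelled
    fragment such as 'C-O' or 'C=O' (element symbols in sorted order)."""
    frags = set()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        frags.add(f"{a}{BOND_SYMBOLS[order]}{b}")
    return frags


def build_index(conn, compounds):
    """Steps 2 and 3: fragment every compound and store the fragments."""
    conn.execute("CREATE TABLE IF NOT EXISTS compound (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compound_fragment ("
        " compound_id TEXT REFERENCES compound(id),"
        " fragment TEXT,"
        " PRIMARY KEY (compound_id, fragment))"
    )
    # Index on fragment so that lookups by fragment do not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_fragment ON compound_fragment(fragment)"
    )
    for cid, (atoms, bonds) in compounds.items():
        conn.execute("INSERT INTO compound (id) VALUES (?)", (cid,))
        conn.executemany(
            "INSERT INTO compound_fragment (compound_id, fragment) VALUES (?, ?)",
            [(cid, f) for f in sorted(fragment(atoms, bonds))],
        )
    conn.commit()


compounds = {
    "ethanol": (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "acetaldehyde": (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)]),
    "methylamine": (["C", "N"], [(0, 1, 1)]),
}
conn = sqlite3.connect(":memory:")
build_index(conn, compounds)
rows = conn.execute(
    "SELECT compound_id, fragment FROM compound_fragment "
    "ORDER BY compound_id, fragment"
).fetchall()
print(rows)
```

The extra table and the index on `fragment` are the disk space being traded for speed: they are what let the query step look up compounds by fragment instead of scanning every structure.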

Then, when the user submits a query, you apply the same fragmentation tool to the query compound.
If you are lucky, you can restrict the search space to the database compounds that contain the same fragments as the query.
You can then apply the exact substructure search to this reduced set of compounds.
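The query side can be sketched the same way, again under toy assumptions: a hypothetical graph format, one label per bond as the fragmentation, and hypothetical names throughout. The SQL keeps only the compounds that contain every query fragment; a simple backtracking matcher then performs the exact substructure check on those candidates. With the query C-C=O, N-ethylformamide passes the fragment filter (it has both a C-C and a C=O bond) but fails the exact check, because its C=O carbon has no carbon neighbour.

```python
import sqlite3

# Same hypothetical toy representation and fragmentation as the indexing step.
BOND_SYMBOLS = {1: "-", 2: "=", 3: "#"}


def fragment(atoms, bonds):
    """Toy fragmentation: one label such as 'C-C' or 'C=O' per bond."""
    frags = set()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))
        frags.add(f"{a}{BOND_SYMBOLS[order]}{b}")
    return frags


def adjacency(n_atoms, bonds):
    """Per-atom dict of neighbour index -> bond order."""
    adj = [{} for _ in range(n_atoms)]
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return adj


def is_substructure(query, target):
    """Exact check by backtracking: map query atoms one by one onto distinct
    target atoms with the same element, keeping every query bond and its order."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj = adjacency(len(q_atoms), q_bonds)
    t_adj = adjacency(len(t_atoms), t_bonds)
    mapping = {}  # query atom index -> target atom index

    def extend(qi):
        if qi == len(q_atoms):
            return True
        for ti, element in enumerate(t_atoms):
            if element != q_atoms[qi] or ti in mapping.values():
                continue
            # Bonds from qi to already-mapped query atoms must exist in the target.
            if all(t_adj[ti].get(mapping[qj]) == order
                   for qj, order in q_adj[qi].items() if qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def substructure_search(conn, compounds, query):
    """Fragment filter in SQL, then exact search on the reduced set."""
    q_frags = fragment(*query)
    if q_frags:
        marks = ",".join("?" * len(q_frags))
        rows = conn.execute(
            "SELECT compound_id FROM compound_fragment "
            f"WHERE fragment IN ({marks}) "
            "GROUP BY compound_id HAVING COUNT(DISTINCT fragment) = ?",
            [*q_frags, len(q_frags)],
        )
        candidates = [cid for (cid,) in rows]
    else:
        candidates = list(compounds)  # nothing to filter on: check everything
    hits = [cid for cid in candidates if is_substructure(query, compounds[cid])]
    return candidates, hits


compounds = {
    "ethanol": (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "acetaldehyde": (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)]),
    "acetone": (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (1, 3, 2)]),
    "N-ethylformamide": (["C", "C", "N", "C", "O"],
                         [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2)]),
    "methylamine": (["C", "N"], [(0, 1, 1)]),
}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound_fragment (compound_id TEXT, fragment TEXT)")
conn.executemany(
    "INSERT INTO compound_fragment VALUES (?, ?)",
    [(cid, f) for cid, mol in compounds.items() for f in fragment(*mol)],
)
query = (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])  # C-C=O
candidates, hits = substructure_search(conn, compounds, query)
print(sorted(candidates), sorted(hits))
```

With this particular fragmentation the filter never discards a true hit, since every bond of a real substructure match appears in the target with the same label. A query with no fragments (a single atom, say) leaves nothing to filter on; that is one case where you are not lucky and every compound has to go through the exact search.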

This costs extra disk space for the stored fragments, but it saves computation time.
