While this is only a small subset of what these quantum chemistry codes can do, we plan to extend the features exposed by the image in the future, prioritizing those requested by the platform's users.
Machine Learning (Johannes and Alessandro)
From the point of view of the platform, there is no difference between running a quantum chemistry code and running a machine learning code: both take a molecule and some parameters as input and return output in the format the platform expects. The only difference is that a machine learning code may not require a 3D structure, and can still operate when only a line notation such as InChI or SMILES is available.
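To make the distinction concrete, the following is a minimal sketch of how a platform might decide whether an image can accept a given molecule. The `Molecule` and `Image` types, the format names, and the two image descriptors are hypothetical illustrations, not the platform's actual schema; the only grounded facts are that quantum chemistry codes need a 3D structure and that ChemML takes a SMILES string.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# (element symbol, x, y, z) with coordinates in Å
Atom = Tuple[str, float, float, float]


@dataclass
class Molecule:
    """Hypothetical molecule record: a 3D structure and/or line notations."""
    smiles: Optional[str] = None
    inchi: Optional[str] = None
    atoms: Optional[List[Atom]] = None  # None when no 3D structure is known

    def formats(self) -> FrozenSet[str]:
        """Return the input formats this record can provide."""
        available = set()
        if self.atoms is not None:
            available.add("xyz")
        if self.smiles is not None:
            available.add("smiles")
        if self.inchi is not None:
            available.add("inchi")
        return frozenset(available)


@dataclass(frozen=True)
class Image:
    """Hypothetical descriptor of a calculation image and the inputs it accepts."""
    name: str
    accepts: FrozenSet[str]


def can_run(image: Image, mol: Molecule) -> bool:
    """An image can run if the molecule provides at least one format it accepts."""
    return bool(image.accepts & mol.formats())


# Hypothetical descriptors: a QC code needs 3D coordinates, ChemML takes SMILES.
QC_IMAGE = Image("quantum-chemistry", frozenset({"xyz"}))
CHEMML_IMAGE = Image("chemml", frozenset({"smiles"}))
```

With this design, a molecule known only by its SMILES string can be sent to the ChemML image but not to a quantum chemistry image, without any special casing in the platform itself.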
We have developed two images for codes that use machine learning to predict the result of a calculation: ANI and ChemML.
For the ANI image we used TorchANI, the PyTorch implementation of the ANI potentials, and exposed it as an ASE calculator. We then leverage the algorithms in ASE to drive tasks such as geometry optimizations and normal-mode calculations.
TorchANI image features:
- Single point / geometry optimization / frequency calculations
- Choice of potential: ani-1x or ani-1ccx
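A minimal sketch of how an image could dispatch these three tasks, assuming TorchANI's built-in `ANI1x` and `ANI1ccx` models and their `.ase()` calculator wrapper. The `run` and `validate` functions, the task names, and the parameter-to-model mapping are hypothetical and are not the image's actual interface. Parameters are validated before TorchANI and ASE are imported, so a bad request fails fast without loading a model.

```python
from typing import Any, Dict

# Hypothetical mapping from the image's "potential" parameter to TorchANI model classes.
POTENTIALS = {"ani-1x": "ANI1x", "ani-1ccx": "ANI1ccx"}
# Hypothetical task names for the three supported calculation types.
TASKS = ("energy", "optimize", "frequencies")


def validate(task: str, potential: str) -> None:
    """Reject unsupported parameters before any model is loaded."""
    if task not in TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    if potential not in POTENTIALS:
        raise ValueError(f"unsupported potential: {potential!r}")


def run(atoms, task: str = "energy", potential: str = "ani-1x",
        fmax: float = 0.01) -> Dict[str, Any]:
    """Run one calculation on an ase.Atoms object with TorchANI as the ASE calculator."""
    validate(task, potential)

    # Heavy dependencies are imported lazily, only after validation succeeds.
    import torchani
    from ase.optimize import BFGS
    from ase.vibrations import Vibrations

    model = getattr(torchani.models, POTENTIALS[potential])()
    atoms.calc = model.ase()  # TorchANI supplies energies and forces to ASE

    result: Dict[str, Any] = {}
    if task == "optimize":
        # ASE's BFGS optimizer moves the atoms until the max force is below fmax (eV/Å).
        BFGS(atoms).run(fmax=fmax)
        result["positions"] = atoms.get_positions().tolist()
    elif task == "frequencies":
        # Finite-difference normal modes computed by ASE from TorchANI forces.
        # They are only meaningful at a stationary point, so optimize first in practice.
        vib = Vibrations(atoms)
        vib.run()
        # Frequencies in cm^-1; imaginary entries flag unstable modes.
        result["frequencies"] = vib.get_frequencies().tolist()
        vib.clean()  # remove ASE's cached displacement files
    result["energy"] = float(atoms.get_potential_energy())  # eV
    return result


# Usage (requires torchani and ase to be installed):
#   from ase.build import molecule
#   print(run(molecule("H2O"), task="optimize", potential="ani-1x"))
```

Keeping ASE in charge of the optimizer and the finite-difference normal modes means TorchANI only has to provide energies and forces, which is what lets the image reuse ASE's algorithms unchanged.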
ChemML is a more classical machine learning code: it takes the SMILES string of an organic molecule as input and returns the few quantities of interest that its model has been trained to predict.
Analytics (Bert, Johannes and Marcus)
If many small calculations (especially machine learning calculations on line-format structures) are to be performed on a large set of molecules, launching a separate task monitor and Docker container for each calculation is very inefficient. For this reason, users can submit a batch of calculations as a single task. Such a task uses only one task monitor and one Docker container, and the container is free to decide how to run the calculations (one at a time, one per core, etc.). In addition, each calculation is checked against the database: if it has already been performed, it is skipped and the stored output is reused. This strategy is illustrated in the example below, where ChemML results for a list of SMILES strings are compared in a plot.
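The batching and reuse logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the database, a SHA-256 hash of the image name, input, and parameters serves as the lookup key, and `calculate` is a placeholder for whatever the container actually runs. `run_batch`, `cache_key`, and `calculate` are hypothetical names, not part of the platform.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, MutableMapping, Optional

Params = Dict[str, Any]


def cache_key(image: str, smiles: str, params: Params) -> str:
    """Deterministic key: identical image, input and parameters give the same hash."""
    payload = json.dumps({"image": image, "input": smiles, "params": params},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_batch(image: str, inputs: List[str], params: Params,
              calculate: Callable[[str, Params], Any],
              db: MutableMapping[str, Any],
              max_workers: Optional[int] = None) -> List[Any]:
    """Run a whole batch inside one worker, reusing results already stored in db."""
    keys = [cache_key(image, s, params) for s in inputs]
    # Keep only inputs with no stored result; duplicates within the batch collapse to one entry.
    todo = {k: s for k, s in zip(keys, inputs) if k not in db}
    # The worker chooses the parallelism; here, a pool of up to max_workers threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        new_results = pool.map(lambda s: calculate(s, params), todo.values())
        for key, result in zip(todo, new_results):
            db[key] = result
    # Results come back in input order, whether computed now or reused.
    return [db[k] for k in keys]
```

A thread pool suits calculations that call out to external programs; a purely Python, CPU-bound model would more likely use a process pool to get one calculation per core.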