Reproducibility in Quantum Chemistry

Scientific teams and organizations should embrace reproducible workflows in which all data can be exported, shared, and peer reviewed. For computational chemistry, this should cover the full flow of data, from initial structures through to final coordinates and energies, together with the software versions and binary environments used. The Jupyter and JupyterLab projects provide many of the components needed, such as an extensible programming interface and visualization and analysis of data in a common format, but they lack specific workflows for quantum chemistry. This project adds those workflows, couples notebooks with a data server, and uses an extensible data format definition for static export suitable for long-term archiving of results.
The target audience ranges from quantum chemistry code developers publishing new methods, for whom the platform can show the input, execution, and output of a development snapshot, through to end users running calculations on production code. Specifying the organization, container name, and version makes it possible to use known versions of codes, and even to rerun calculations when fixes are made. It also enhances the peer-review and publication process by offering a full record of what was done computationally, along with the results obtained and a recipe to replicate them. As a community we must move towards the routine publication of all of these steps, and consider data standards along with software platforms to reduce the upfront costs of doing so.
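As a minimal sketch of how an organization, container name, and version can pin a calculation to a known build, the following Python fragment assembles a Docker image reference and a matching `docker run` command. The `ContainerRef` class, the `docker_run_command` helper, the `/data` mount point, and the `example-org/example-code:1.0.0` image are hypothetical names chosen for illustration, not part of the platform described here. How a given image consumes its input argument depends on that image's entry point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRef:
    """Hypothetical reference to a pinned container image."""
    organization: str
    name: str
    version: str

    def image(self) -> str:
        # Docker image references take the form repository/name:tag.
        return f"{self.organization}/{self.name}:{self.version}"


def docker_run_command(ref, host_dir, input_file):
    """Build (but do not execute) a docker run command for a pinned image.

    Assumes the image's entry point accepts the path of an input file
    inside the container; that convention is an assumption of this sketch.
    """
    return [
        "docker", "run", "--rm",        # remove the container when it exits
        "-v", f"{host_dir}:/data",      # mount the host directory at /data
        ref.image(),
        f"/data/{input_file}",
    ]


# Illustrative values only.
ref = ContainerRef("example-org", "example-code", "1.0.0")
command = docker_run_command(ref, "/tmp/calc", "input.json")
print(" ".join(command))
```

Because the version tag is part of the reference, rerunning after a fix only requires changing `version` and recording the new reference alongside the results.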
The Space Telescope \cite{spacetelescopenotebooks} selected Jupyter as the primary analysis platform for many of the same reasons it has been used in this project. Fields of scientific research must converge on shared platforms where reproducibility is built in, and share the cost of improving them, with customization for each field where it makes sense. The platform described can interface with public databases such as PubChem and QCArchive \cite{Smith_2020} to import existing data, and produce new data with structures suitable for wider dissemination. The data and metadata standards discussed seek to support federated storage of data, embracing the goals of FAIR data \cite{fair} to make all data produced more discoverable. The use of established open standards such as InChI, InChIKey, and SMILES links data produced in individual instances to the global data commons with minimal ambiguity.
One of the primary challenges in computational chemistry is to develop a software infrastructure capable of executing codes reproducibly, such that others can examine every aspect of what was done and build upon it. Even quite subtle details can hamper this, such as the tolerances used in convergence criteria or the versions of software libraries. The system described in this paper uses Docker, along with conversion to other software container formats, to package a code along with all of its dependencies. The driver scripts within these containers use Python to drive the execution of the code from JSON input, and to convert the output. This is then layered within an execution framework, database, and user management/file management system. Together they can be deployed from tagged container versions in order to reproducibly create the same environment.
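To illustrate why recording subtle details such as convergence tolerances matters, the sketch below builds a provenance record from JSON-style input using only the Python standard library. The field names (`inputSha256`, `convergenceThreshold`, and so on), the `provenance_record` helper, and the image name are assumptions made for this example and do not reflect the actual data format used by the platform. The `results` entry is a placeholder, not output from a real calculation.

```python
import hashlib
import json
import platform


def canonical_json(data):
    """Serialize with sorted keys so equal inputs give identical text."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def provenance_record(input_params, image, results):
    """Bundle input, environment, and results into one JSON-ready record."""
    digest = hashlib.sha256(
        canonical_json(input_params).encode("utf-8")
    ).hexdigest()
    return {
        "input": input_params,            # includes tolerances, basis, etc.
        "inputSha256": digest,            # fingerprint of the exact input
        "image": image,                   # pinned container reference
        "pythonVersion": platform.python_version(),
        "results": results,
    }


# Hypothetical input parameters; the keys are illustrative only.
params = {"task": "energy", "basis": "sto-3g", "convergenceThreshold": 1e-8}
image = "example-org/example-code:1.0.0"

record = provenance_record(params, image, {"converged": True})

# The same parameters in a different key order yield the same fingerprint.
reordered = dict(reversed(list(params.items())))
record_reordered = provenance_record(reordered, image, {"converged": True})

print(json.dumps(record, indent=2))
```

Hashing a canonical serialization means that two runs can be compared by fingerprint alone, while any change to a tolerance or other parameter produces a different digest.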
None of this infrastructure comes without cost, so when considering the publication of data and its more permanent dissemination it is important to consider simpler approaches. This led to the development of integration with Binder, and the use of static repositories with exported data. The open specification of the formats and the open source Python modules mean that anyone can access and process the data with a rather minimal Jupyter deployment. Even without any of the software, the data specification coupled with the input specification offers a starting point for manually running calculations with the Docker containers locally. Standalone web widgets enable viewing of processed molecular orbitals without any processing component, and could be extended in the future to do more within the web browser's JavaScript environment.