The layout is designed to be compact and follows the convention of using linear arrays where feasible to store sets of values. It also continues the index-based extension of atom properties with the shellToAtomMap. Chemical JSON is not meant to be used directly, but rather read into an appropriate data structure in the programming language of choice. This means that simple index-based lookups can be performed, although the format may not be as human readable as desired. The layout maps efficiently onto binary analogs such as BSON, jsonb, and MessagePack, and onto the slightly less analogous HDF5 storage format. The project has validated the approach and uses this format in C++, Python, and JavaScript for in-memory representations, translation to other tools, and storage in MongoDB on the data server.
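The index-based layout can be illustrated with a minimal Python sketch. The document below is a hypothetical, heavily simplified Chemical JSON-like fragment for water; apart from shellToAtomMap, the key names, the coordinates, and the helper functions \texttt{atom\_position} and \texttt{shells\_on\_atom} are assumptions made for illustration and are not taken from the format specification.

```python
import json

# Hypothetical, simplified Chemical JSON-like fragment (water).
# Key names other than shellToAtomMap are illustrative assumptions.
doc = json.loads("""
{
  "atoms": {
    "elements": {"number": [8, 1, 1]},
    "coords": {"3d": [0.0, 0.0, 0.117,
                      0.0, 0.757, -0.469,
                      0.0, -0.757, -0.469]}
  },
  "basisSet": {
    "shellToAtomMap": [0, 0, 0, 1, 2]
  }
}
""")


def atom_position(doc, i):
    """Return (x, y, z) of atom i from the flat coordinate array."""
    coords = doc["atoms"]["coords"]["3d"]
    return tuple(coords[3 * i:3 * i + 3])


def shells_on_atom(doc, i):
    """Return the indices of all shells whose map entry points at atom i."""
    shell_map = doc["basisSet"]["shellToAtomMap"]
    return [shell for shell, atom in enumerate(shell_map) if atom == i]


# Atom 1 is a hydrogen; its position is a slice of the flat array.
print(doc["atoms"]["elements"]["number"][1], atom_position(doc, 1))
# Shells attached to the oxygen (atom 0).
print(shells_on_atom(doc, 0))
```

Because every per-atom property shares the same atom index, a new property can be added as one more array without restructuring the existing ones.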
The data standards developed for the platform reuse community best practices, such as the IUPAC International Chemical Identifier (InChI) \cite{Heller2015}, the InChI key (a hashing standard built on InChI), and the Simplified Molecular-Input Line-Entry System (SMILES) \cite{page}. These standards provide simplified, unambiguous string representations of chemical structures (describing both elemental makeup and connectivity) that can be used as common identifiers. Generating these identifiers and augmenting other data enables simple linking of data to existing databases such as PubChem \cite{Kim_2018}, ChemSpider \cite{Williams_2011}, Wikipedia, and others.
The Chemical JSON format forms the core of the data model in the platform described. Its standardization process has been informal and would benefit from more discipline in the future. The development team has worked with MolSSI since shortly after its founding to encourage the standardization of a flexible JSON-based format, and has actively participated in discussions. Chemical JSON is a useful internal representation whose documentation should be improved, but the ultimate goal is to offer seamless export to other formats, both existing formats supported by current libraries and new ones such as QCSchema. For large data, binary formats will clearly be essential, and several of them, such as MessagePack and HDF5, can be mapped to from JSON because they offer a familiar container structure.
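As a rough illustration of why flat numeric arrays map well onto binary storage, the following Python sketch packs a coordinate array into raw 64-bit floats using only the standard library. It is a stand-in for a real binary container and not the platform's implementation; MessagePack and HDF5 require third-party libraries and are deliberately not used here, and the coordinate values are made up.

```python
import json
from array import array

# Illustrative flat coordinate array (three atoms, x/y/z interleaved).
coords = [0.0, 0.0, 0.117, 0.0, 0.757, -0.469, 0.0, -0.757, -0.469]

# Text form, as it would appear inside a JSON document.
as_text = json.dumps(coords)

# Binary form: one C double per value, with no delimiters or digit strings.
packed = array("d", coords).tobytes()

# Round trip back to a Python list.
restored = array("d")
restored.frombytes(packed)

print(len(as_text), "characters as JSON text")
print(len(packed), "bytes as packed doubles")
print(list(restored) == coords)
```

The packed size is fixed by the element count rather than by how many digits each value needs, which is one reason binary containers become attractive for large arrays.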