2.2 | Protein-ligand complexes. 
The ligand topic is not new to CASP: in CASPs 6 through 10, predicting ligand binding sites was a sub-challenge in the function prediction category 29-32. Given the recent advances in the accuracy of protein modeling methods 11,12, the CASP organizers decided to include the prediction of complexes between small-molecule ligands and protein or RNA receptors in the scope of the CASP15 experiment, with the aim of stimulating method development in this area. Participants are provided with the sequence and stoichiometry of the protein (or RNA) receptors and the Simplified Molecular Input Line Entry System (SMILES) strings of the bound ligands, and are asked to predict the structures of the protein-ligand (or RNA-ligand) complexes.
2.2.1 | Macromolecule-ligand complex prediction format (https://predictioncenter.org/casp15/index.cgi?page=format#LG).
One important requirement for the ligand prediction format was to encode atom connectivity robustly and reliably, because correct atom connectivity is required for symmetry correction (accounting for chemically equivalent atoms when matching a predicted ligand to the reference structure), a necessary step in accurate ligand assessment. Unfortunately, the PDB format, which is commonly used in CASP, cannot reliably encode connectivity for arbitrary ligands. The MDL molfile format 33 is a common ligand format and was used in earlier ligand docking challenges such as D3R 34-37. It is a text-based, fixed-column format that encodes bonds in addition to atom coordinates. Unlike in the PDB format, atoms are not named; they are identified only by their element and connectivity. The format allows additional properties such as charge, valence, or isotope to be reported, but these were neither required nor used here. Bonds are encoded explicitly, one per line, together with the bond type (single, double, triple, or aromatic) and stereochemistry. The format also includes header lines, a counts line (giving the numbers of atoms and bonds), which can help check the integrity of the file, and an M END line, which marks the end of the ligand data.
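To illustrate how the counts line and the M END terminator allow a basic integrity check, the Python sketch below writes and reads a minimal V2000 molfile block. It is a simplified illustration, not CASP tooling: it handles only coordinates, elements, and bonds, leaves all other per-atom fields at zero, and rejects bond types other than 1-4. The names `Atom`, `Bond`, `write_molfile`, and `parse_molfile` are hypothetical; for production use, an established cheminformatics toolkit such as RDKit is the safer choice.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    x: float
    y: float
    z: float


@dataclass
class Bond:
    a1: int          # 1-based index of the first atom
    a2: int          # 1-based index of the second atom
    order: int       # 1 single, 2 double, 3 triple, 4 aromatic
    stereo: int = 0  # stereo flag (0 = not specified)


def write_molfile(atoms: list[Atom], bonds: list[Bond], title: str = "") -> str:
    """Write a minimal V2000 molfile block (coordinates, elements, bonds only)."""
    lines = [title, "", ""]  # three header lines: name, program/timestamp, comment
    # Counts line: atom and bond counts, eight unused zero fields, "999", version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for a in atoms:
        # x, y, z in 10-character fields, a space, then the element in a 3-character
        # field; the remaining per-atom fields (charge, valence, ...) are left at 0.
        lines.append(f"{a.x:10.4f}{a.y:10.4f}{a.z:10.4f} {a.element:<3}" + " 0" + "  0" * 11)
    for b in bonds:
        lines.append(f"{b.a1:3d}{b.a2:3d}{b.order:3d}{b.stereo:3d}")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def parse_molfile(text: str) -> tuple[list[Atom], list[Bond]]:
    """Parse a V2000 molfile block, using the counts line to check its integrity."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("missing header or counts line")
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atom_end = 4 + n_atoms
    bond_end = atom_end + n_bonds
    if len(lines) < bond_end:
        raise ValueError("fewer atom/bond lines than the counts line declares")

    atoms = [
        Atom(line[31:34].strip(), float(line[0:10]), float(line[10:20]), float(line[20:30]))
        for line in lines[4:atom_end]
    ]
    bonds = []
    for line in lines[atom_end:bond_end]:
        # Treat a missing stereo field as 0 (some writers omit trailing fields).
        bond = Bond(int(line[0:3]), int(line[3:6]), int(line[6:9]),
                    int(line[9:12].strip() or 0))
        if not (1 <= bond.a1 <= n_atoms and 1 <= bond.a2 <= n_atoms):
            raise ValueError(f"bond refers to a missing atom: {line!r}")
        if bond.order not in (1, 2, 3, 4):
            raise ValueError(f"unexpected bond type: {line!r}")
        bonds.append(bond)

    # The block must be terminated by an "M  END" line (after any property lines).
    if not any(line.split() == ["M", "END"] for line in lines[bond_end:]):
        raise ValueError("missing M  END line")
    return atoms, bonds


if __name__ == "__main__":
    # Acetic acid heavy atoms; the coordinates are illustrative, not a real pose.
    atoms = [Atom("C", 0.0, 0.0, 0.0), Atom("C", 1.5, 0.0, 0.0),
             Atom("O", 2.1, 1.1, 0.0), Atom("O", 2.2, -1.1, 0.0)]
    bonds = [Bond(1, 2, 1), Bond(2, 3, 2), Bond(2, 4, 1)]
    parsed_atoms, parsed_bonds = parse_molfile(write_molfile(atoms, bonds, "acetic acid"))
    print([a.element for a in parsed_atoms], [b.order for b in parsed_bonds])
    # prints: ['C', 'C', 'O', 'O'] [1, 2, 1]
```

Because atoms carry no names, the bond lines refer to atoms only by their position in the atom block; this is why the parser checks every bond index against the atom count declared in the counts line.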
For CASP15, we devised a hybrid submission format in which the receptor model (protein or RNA) and the ligand model are submitted as separate files in the same spatial frame of reference. The receptor is submitted in PDB format and the ligand in MDL molfile format (see below for details). As with regular protein structure submissions, a CASP ligand submission (LG format) starts with a CASP header that includes the format specification code, target identifier, author identifier, and a description of the modeling method. Two new keywords were introduced: LIGAND, which specifies the ligand name and marks the beginning of the ligand data, and POSE, which specifies the pose number for that ligand. Participants were allowed to submit up to 5 poses of a given ligand for a selected receptor model.
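The following Python sketch shows how the LIGAND and POSE keywords could delimit ligand poses in an LG-style file and how the 5-pose limit might be enforced. The exact line layout is an assumption made for illustration (a header of PFRMAT/TARGET/AUTHOR/METHOD lines, one LIGAND line per ligand followed by POSE lines, each followed by a molfile block ending in M  END, no MODEL or PARENT records, and a final END line); Example 6 on the CASP15 format page is the authoritative reference. The function names, target ID, and author ID are hypothetical placeholders.

```python
MAX_POSES = 5  # CASP15 limit: up to 5 poses per ligand for a selected receptor model


def build_lg_submission(target: str, author: str, method: str,
                        poses: dict[str, list[str]]) -> str:
    """Assemble an LG-style file (hypothetical layout, see lead-in).

    `poses` maps a ligand name to its pose molfile blocks, each ending in M  END.
    """
    lines = ["PFRMAT LG", f"TARGET {target}", f"AUTHOR {author}", f"METHOD {method}"]
    for ligand, blocks in poses.items():
        if not 1 <= len(blocks) <= MAX_POSES:
            raise ValueError(f"{ligand}: {len(blocks)} poses (allowed 1-{MAX_POSES})")
        lines.append(f"LIGAND {ligand}")      # assumed: one LIGAND line per ligand
        for i, block in enumerate(blocks, start=1):
            lines.append(f"POSE {i}")
            lines.extend(block.rstrip("\n").splitlines())
    lines.append("END")
    return "\n".join(lines) + "\n"


def parse_lg_submission(text: str) -> dict[tuple[str, int], str]:
    """Split an LG-style file into {(ligand name, pose number): molfile block}."""
    result: dict[tuple[str, int], str] = {}
    ligand, pose, block = None, None, None
    for line in text.splitlines():
        if block is not None:
            # Inside a molfile: copy lines verbatim up to and including M  END, so a
            # title line that happens to start with a keyword is not misread.
            block.append(line)
            if line.split() == ["M", "END"]:
                result[(ligand, pose)] = "\n".join(block) + "\n"
                block = None
        elif line.startswith("LIGAND "):
            ligand = line.split(maxsplit=1)[1].strip()
        elif line.startswith("POSE "):
            if ligand is None:
                raise ValueError("POSE line before any LIGAND line")
            pose = int(line.split()[1])
            if not 1 <= pose <= MAX_POSES:
                raise ValueError(f"pose number {pose} outside 1-{MAX_POSES}")
            block = []
    if block is not None:
        raise ValueError("molfile block without M  END")
    return result


if __name__ == "__main__":
    # A one-atom molfile (zinc ion) built with V2000 fixed-width fields.
    zn = ("zinc\n\n\n" + "  1  0" + "  0" * 8 + "999 V2000\n"
          + "    0.0000    0.0000    0.0000 Zn  0" + "  0" * 11 + "\nM  END\n")
    sub = build_lg_submission("T0000", "0000-0000-0000", "hypothetical docking run",
                              {"ZN": [zn, zn]})
    print(sorted(parse_lg_submission(sub)))  # prints: [('ZN', 1), ('ZN', 2)]
```

Reading each pose up to its M END line, rather than up to the next keyword, relies on the molfile terminator described above and keeps the ligand data self-delimiting within the submission.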
An example of an LG prediction is provided in Example 6 on the CASP15 format page.