The conformational search process is a sampling over degrees of freedom within a macromolecule. In the CONGEN command, the term "degree of freedom" is used somewhat more freely than in the statistical mechanical sense. It means any operation that determines the positions of some number of atoms (possibly zero) and that can be iterated at least once over some variable. This generalization allows input, output, and energy evaluation operations to be inserted into the course of the search in a simple and powerful way.
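As an illustration of this generalized notion, the Python sketch below models each degree of freedom as a generator that yields once per sample, and the search as a depth-first nesting of those generators. It is a hypothetical model, not CONGEN's implementation; `torsion_dof`, `evaluate_dof`, and `nested_search` are illustrative names. The construction stage assigns a value for each sample, while the evaluation stage places no atoms and iterates exactly once, which is what lets I/O and evaluation sit inside the search like any other degree of freedom.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical model: a degree of freedom takes the current partial
# conformation and yields once per sample it generates.
DOF = Callable[[Dict[str, float]], Iterator[None]]


def torsion_dof(name: str, samples: List[float]) -> DOF:
    """Construction stage: assigns each sampled value in turn."""
    def run(conf: Dict[str, float]) -> Iterator[None]:
        for value in samples:
            conf[name] = value
            yield
        del conf[name]  # undo before backtracking to the enclosing stage
    return run


def evaluate_dof(results: List[Dict[str, float]]) -> DOF:
    """Evaluation stage: places zero atoms and iterates exactly once."""
    def run(conf: Dict[str, float]) -> Iterator[None]:
        results.append(dict(conf))  # record a snapshot of the current state
        yield
    return run


def nested_search(dofs: List[DOF], conf: Dict[str, float], depth: int = 0) -> None:
    """Depth-first nesting: each sample of stage `depth` drives all later stages."""
    if depth == len(dofs):
        return
    for _ in dofs[depth](conf):
        nested_search(dofs, conf, depth + 1)


seen: List[Dict[str, float]] = []
nested_search([torsion_dof("phi", [-60.0, 180.0]),
               torsion_dof("psi", [-45.0, 135.0, 180.0]),
               evaluate_dof(seen)], {})
print(len(seen))  # 6: two phi samples times three psi samples
```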
The sampling process is a series of nested iterations applied over all the degrees of freedom in the order specified by the user. All of the variables are sampled discretely, although there are provisions to solve for certain variables over a continuous range where constraints may be applied.[1] Because every sample of one degree of freedom is combined with every sample of the degrees of freedom nested inside it, the computer time required for a search grows exponentially with the number of degrees of freedom. It is easy to set up a run that would not finish within the age of the universe.
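The growth is easy to see with a little arithmetic: in an exhaustive nested search, the number of complete conformations is the product of the number of samples taken for each degree of freedom. The sketch below assumes a hypothetical 12 samples each for phi and psi per residue and ignores any pruning (for example, by van der Waals avoidance or failed chain closures), so the counts are upper bounds under those assumptions; `search_size` is an illustrative name.

```python
from math import prod
from typing import List


def search_size(samples_per_dof: List[int]) -> int:
    """Complete conformations visited by an exhaustive nested search."""
    return prod(samples_per_dof)


# Hypothetical sampling: 12 values each for phi and psi of every residue.
for n_residues in (2, 4, 6, 8):
    print(n_residues, search_size([12, 12] * n_residues))
```

Under these assumptions, every additional residue multiplies the count by 144, giving 20,736 conformations for two residues and 12^16 (about 1.8 × 10^17) for eight.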
There are several different methods for directing the search process. The simplest method is a depth-first search, where the program tries every sample in turn using an algorithm that requires a minimum of temporary storage to keep track of its progress. There are also methods for sampling based on the quality of the partial conformations, and these techniques can result in better quality conformations being generated early in the search process. It is also possible to generate random structures. See Overview of Directed Searching, for more information.
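Depth-first search only needs to remember the current sample index for each degree of freedom. A search directed by the quality of partial conformations must instead keep many open partial conformations. The sketch below shows one such strategy, a generic best-first (uniform-cost) search with a priority queue; it is not necessarily the method CONGEN uses (see Overview of Directed Searching). The additive, non-negative per-sample cost function `cost(stage, sample_index)` is a hypothetical stand-in for a partial energy.

```python
import heapq
from typing import Callable, Iterator, List, Tuple

# Hypothetical per-stage cost, assumed >= 0; the score of a partial
# conformation is the sum of the costs of the samples chosen so far.
Cost = Callable[[int, int], float]


def best_first(n_samples: List[int], cost: Cost) -> Iterator[Tuple[float, Tuple[int, ...]]]:
    """Always expand the lowest-scoring partial conformation next.

    With non-negative additive costs, complete conformations come out in
    order of increasing total score, at the price of keeping every open
    partial conformation in memory (unlike depth-first search).
    """
    heap: List[Tuple[float, Tuple[int, ...]]] = [(0.0, ())]
    while heap:
        score, partial = heapq.heappop(heap)
        stage = len(partial)
        if stage == len(n_samples):
            yield score, partial  # a complete conformation
            continue
        for i in range(n_samples[stage]):
            heapq.heappush(heap, (score + cost(stage, i), partial + (i,)))


table = [[3.0, 1.0], [0.5, 2.0, 0.0]]
results = list(best_first([2, 3], lambda s, i: table[s][i]))
print(results[0])  # (1.0, (1, 2))
```

Because costs only accumulate, the first complete conformation popped from the queue is the best one, which is why such strategies can produce good conformations early; the price is memory proportional to the number of open partial conformations.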
The program as described in the Biopolymers 1987 paper was originally designed to search the conformational space of a single polypeptide segment within a protein. The version described here provides that capability in a more general way, so that multiple segments can be searched, the local environment around a segment can be considered, or terminal segments can be sampled.
The degrees of freedom presently implemented can be divided into three categories: those that construct atoms within the system, those that do I/O, and finally, one for the evaluation of the conformations.
There are three degrees of freedom involved in construction: Backbone, Chain Closure, and Sidechain. Together, they can search over a polypeptide segment. In addition, by creating new sidechain topology files, the Sidechain degree of freedom can be adapted to any molecule. See Sidechain Topology, for more information. The backbone and chain closure degrees of freedom work together to construct the backbone of an internal polypeptide segment, and the sidechain degree of freedom constructs the sidechains. See the menu below for more detailed descriptions of these degrees of freedom.
There are two degrees of freedom for I/O: WRITE and RBEST. The WRITE degree of freedom writes to a CONGEN conformation file the positions of all atoms constructed up to that point in the search, along with the latest evaluation of the conformation; see Conformation File. This file can be read back by the RBEST degree of freedom, scanned for a particular conformation using the XCONF command, merged with other conformation files using the MERGE CG command (see CONGEN Related), and scanned with the CMPLOOP command (see Support Programs for Conformational Search). The RBEST degree of freedom reads the best conformations from a CONGEN conformation file. With this degree of freedom, real space renormalization (see H. Scheraga, Biopolymers (1983) 22, 1-14) can be implemented.
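The WRITE/RBEST pairing can be pictured as appending evaluated records to a file and later reading back only the lowest-scoring ones to seed the next stage. The sketch below uses a hypothetical one-JSON-object-per-line layout and assumes lower energy is better; it does not reproduce the real CONGEN conformation file format, and `write_conformation` and `read_best` are illustrative names.

```python
import heapq
import io
import json
from typing import Dict, List, TextIO

# Hypothetical record layout (one JSON object per line), for this sketch only;
# the real layout is described under Conformation File.
Coords = Dict[str, List[float]]


def write_conformation(out: TextIO, energy: float, coords: Coords) -> None:
    """WRITE-like step: record the atoms built so far and the latest evaluation."""
    out.write(json.dumps({"energy": energy, "coords": coords}) + "\n")


def read_best(inp: TextIO, n_best: int) -> List[dict]:
    """RBEST-like step: return the n_best lowest-energy records, best first."""
    records = (json.loads(line) for line in inp if line.strip())
    return heapq.nsmallest(n_best, records, key=lambda r: r["energy"])


buf = io.StringIO()
for energy in (4.2, -1.5, 0.3, -3.0):
    write_conformation(buf, energy, {"CA": [0.0, 0.0, energy]})
buf.seek(0)
best = read_best(buf, 2)
print([r["energy"] for r in best])  # [-3.0, -1.5]
```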
Finally, there is the EVL degree of freedom, which evaluates the conformation currently being constructed. Any type of energy manipulation is possible (see Energy Manipulations), but typically only an energy evaluation is done. The EVL degree of freedom can also be used to compare generated conformations against a known structure, so that the theoretical limits of the sampling can be assessed. In addition, EVL can invoke a user-written evaluation function or assign a random number as the evaluation of the conformation.
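For the comparison against a known structure, a common measure is the coordinate root-mean-square deviation (RMSD). The sketch below assumes the generated and known coordinates are already in the same frame (as for a loop built inside an otherwise fixed protein), so no superposition is performed; it is not CONGEN's comparison code, and `rmsd` and `random_evaluation` are illustrative names. The second function is a stand-in for the random-number evaluation.

```python
import math
import random
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def rmsd(model: Dict[str, Vec], reference: Dict[str, Vec]) -> float:
    """RMS coordinate deviation over the atoms present in both structures.

    Assumes both structures share one frame, so no superposition is done.
    """
    common = sorted(model.keys() & reference.keys())
    if not common:
        raise ValueError("no atoms in common")
    total = sum(math.dist(model[a], reference[a]) ** 2 for a in common)
    return math.sqrt(total / len(common))


def random_evaluation(rng: random.Random) -> float:
    """Stand-in for assigning a random number as the evaluation."""
    return rng.random()


model = {"N": (0.0, 0.0, 0.0), "CA": (1.0, 0.0, 0.0), "CB": (5.0, 5.0, 5.0)}
known = {"N": (0.0, 0.0, 0.0), "CA": (1.0, 2.0, 0.0)}
print(rmsd(model, known))  # sqrt((0 + 4) / 2) = sqrt(2); CB is not compared
```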
Although CONGEN was written for searching protein segments, it can be applied to arbitrary molecules. The sidechain degree of freedom reads a topology file (see Sidechain Topology) that can describe the conformational degrees of freedom of any molecule. Because the sidechain degree of freedom is capable of searching any subset of those degrees of freedom, search protocols like those used for proteins can be executed: the central part of a small molecule can be searched exhaustively, and the peripheral moieties can be searched iteratively.
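One plausible reading of such a protocol is sketched below: the core torsions are enumerated exhaustively, and for each core combination the peripheral torsions are re-optimized one at a time over their samples for a fixed number of sweeps. The torsion names, sample values, `toy_energy`, and the `sweeps` parameter are all hypothetical; this does not model the topology-file mechanism itself.

```python
import itertools
import math
from typing import Callable, Dict, Optional, Sequence

Conformation = Dict[str, float]
Energy = Callable[[Conformation], float]


def core_then_periphery(core: Dict[str, Sequence[float]],
                        periphery: Dict[str, Sequence[float]],
                        energy: Energy, sweeps: int = 2) -> Optional[Conformation]:
    """Exhaustive search over core torsions; peripheral torsions are then
    re-optimized one at a time (coordinate-wise) for a fixed number of sweeps."""
    best: Optional[Conformation] = None
    best_e = math.inf
    names = list(core)
    for values in itertools.product(*(core[n] for n in names)):
        conf = dict(zip(names, values))
        for name, samples in periphery.items():
            conf[name] = samples[0]  # start each peripheral torsion at its first sample
        for _ in range(sweeps):
            for name, samples in periphery.items():
                conf[name] = min(samples, key=lambda v: energy({**conf, name: v}))
        e = energy(conf)
        if e < best_e:
            best, best_e = dict(conf), e
    return best


def toy_energy(c: Conformation) -> float:
    # Hypothetical energy: t1 prefers 120, p1 prefers half of t1, p2 prefers 180.
    return (c["t1"] - 120) ** 2 + (c["p1"] - c["t1"] / 2) ** 2 + (c["p2"] - 180) ** 2


result = core_then_periphery({"t1": [0.0, 120.0, 240.0]},
                             {"p1": [0.0, 60.0, 120.0], "p2": [0.0, 90.0, 180.0]},
                             toy_energy)
print(result)  # {'t1': 120.0, 'p1': 60.0, 'p2': 180.0}
```

In this sketch the work spent on the periphery grows only linearly with the number of peripheral torsions instead of multiplying the search size, at the risk of missing combinations that a full enumeration would find.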
Because long searches are common, CONGEN can periodically save the state of a search in a checkpoint file and restart the run from such a point. In addition, the status of the run can be periodically written to a file that the user can display while the program is executing.
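A depth-first search is cheap to checkpoint because its state is just the current sample index at each level. The sketch below treats those indices as an odometer, visits a bounded number of conformations per run, and writes the next position to a JSON file so a later run can resume. The file layout, `run`, and the other names are hypothetical and do not describe CONGEN's checkpoint format.

```python
import itertools
import json
import os
import tempfile
from typing import List, Optional, Tuple

Indices = List[int]


def next_indices(idx: Indices, sizes: List[int]) -> Optional[Indices]:
    """Advance a depth-first 'odometer' over nested samples; None when finished."""
    idx = list(idx)
    for level in reversed(range(len(sizes))):
        idx[level] += 1
        if idx[level] < sizes[level]:
            return idx
        idx[level] = 0  # this level wrapped around; carry into the outer level
    return None


def save_checkpoint(path: str, idx: Optional[Indices]) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next": idx}, f)
    os.replace(tmp, path)  # swap in the new file only after it is fully written


def load_checkpoint(path: str) -> Optional[Indices]:
    with open(path) as f:
        return json.load(f)["next"]


def run(sizes: List[int], idx: Optional[Indices], budget: int,
        path: str) -> List[Tuple[int, ...]]:
    """Visit at most `budget` conformations, then checkpoint the next position."""
    visited: List[Tuple[int, ...]] = []
    while idx is not None and budget > 0:
        visited.append(tuple(idx))
        idx = next_indices(idx, sizes)
        budget -= 1
    save_checkpoint(path, idx)
    return visited


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "search.ckpt")
    first = run([2, 3], [0, 0], 4, ckpt)                    # stops after 4 samples
    second = run([2, 3], load_checkpoint(ckpt), 100, ckpt)  # resumes from the file
print(first + second == list(itertools.product(range(2), range(3))))  # True
```

Resuming from the checkpoint visits exactly the conformations the interrupted run had not reached, in the same order as an uninterrupted search.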
When CONGEN performs a search, it initializes the positions of all atoms involved in the degrees of freedom. This prevents collisions between newly constructed atoms and any prior positions of those atoms. If you are planning several sequential searches, you should initialize the positions of all the atoms involved (using a COOR INIT command; see Function of Coordinate Manipulations).
In some cases, it is desirable to repeat a search over a particular degree of freedom. For example, consider the problem of finding structures that satisfy a set of NMR constraints. Because many of the constraints involve sidechain atoms, it is desirable to search the sidechains after each backbone. However, since constraints bridge across multiple sidechains, it is desirable to rebuild all the sidechains after each new one is added. CONGEN supports this type of operation in general by checking whether atoms constructed by one degree of freedom are reconstructed by a later degree of freedom. If so, these “overlapping” atoms are removed just before the sampling of any degree of freedom that generates new atomic positions. Note that this approach can be quite inefficient, but it might be improved if this capability proves to be useful.
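The bookkeeping for overlapping atoms reduces to set operations on the atoms each degree of freedom constructs. The sketch below is hypothetical: degrees of freedom are plain sets of atom names, the atom names are invented, and `atoms_to_remove` assumes the atoms to clear before sampling degree of freedom `j` are those it rebuilds that earlier degrees of freedom already placed.

```python
from typing import Dict, List, Set


def overlapping_atoms(dofs: List[Set[str]]) -> Dict[int, Set[str]]:
    """Map each degree of freedom to the atoms it builds that a later one rebuilds."""
    result: Dict[int, Set[str]] = {}
    built_later: Set[str] = set()
    for i in reversed(range(len(dofs))):
        shared = dofs[i] & built_later
        if shared:
            result[i] = shared
        built_later |= dofs[i]
    return result


def atoms_to_remove(dofs: List[Set[str]], j: int) -> Set[str]:
    """Atoms placed by earlier degrees of freedom that degree of freedom j rebuilds."""
    return dofs[j] & set().union(*dofs[:j])


# Hypothetical NMR-style protocol: backbone, one sidechain, then both sidechains again.
protocol = [{"N", "CA", "C"}, {"CB1", "CG1"}, {"CB1", "CG1", "CB2", "CG2"}]
print(overlapping_atoms(protocol))  # {1: {'CB1', 'CG1'}} (set order may vary)
print(atoms_to_remove(protocol, 2))
```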
There is a limited capability to treat part of the molecule as a rigid body while other parts are being searched. The backbone and sidechain degrees of freedom both have FIX options (see Backbone Degree of Freedom, and Sidechain Degree of Freedom), which specify that atoms be constructed with the same bond lengths, bond angles, and torsion angles that they had when the CONGEN command was invoked. This can be used to explore how two domains interact with one another when the linker joining them is flexible.
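Constructing an atom from a bond length, a bond angle, and a torsion angle relative to three previously placed atoms is the basic operation behind this kind of rebuild. The sketch below uses the Natural Extension Reference Frame (NeRF) construction, a standard technique; it is not taken from CONGEN, and `place_atom` and its argument order are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(a: Vec, b: Vec, c: Vec, bond: float,
               angle_deg: float, torsion_deg: float) -> Vec:
    """Position D so that |CD| = bond, angle B-C-D = angle_deg, and the
    dihedral A-B-C-D = torsion_deg. A, B, C must not be collinear."""
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    # D in a local frame whose first axis runs along B->C.
    x = -bond * math.cos(theta)
    y = bond * math.sin(theta) * math.cos(phi)
    z = bond * math.sin(theta) * math.sin(phi)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal to the A-B-C plane
    m = _cross(n, bc)                  # completes the right-handed frame
    return (c[0] + bc[0] * x + m[0] * y + n[0] * z,
            c[1] + bc[1] * x + m[1] * y + n[1] * z,
            c[2] + bc[2] * x + m[2] * y + n[2] * z)


# Torsion 180 puts D anti (trans) to A: approximately (1.0, -1.0, 0.0).
print(place_atom((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 1.0, 90.0, 180.0))
```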
It is possible to include a cavity formation term in the energy function used by the conformational search. See Gepol Command, for more information.
[1] For example, the van der Waals avoidance in sidechain construction will adjust a chi angle selection until a close contact is avoided. Also, the chain closure procedure generates torsion angles over the complete domain of angles. In addition, the backbone and sidechain degrees of freedom can construct atoms using fixed torsion angles.