Insight II



I       Classic File Formats

The following describes fixed, unchanging formats that can be used for certain key files. Refer to Appendix B and the separate Common File Formats document for the current formats of these files, as written and used by current software.


Cartesian Coordinate File (.car, .cor)

For each atom, this file contains an atom name, three Cartesian coordinates, the residue name and number, and the absolute atom number in the full atom sequence. The potential atom type fields and partial atomic charge fields are no longer used, but are reserved for compatibility with older version coordinate files. This information is now specified in the molecular data file.

Starting with the 2.7.0 versions of Insight and Discover, the PBC record is read from the Cartesian coordinate file instead of the molecular data file. The format used for the Cartesian coordinate file and the archive file is identical. Both files use 80-character records. The new .car and .arc files start with a Biosym file header, followed by a line indicating whether PBC information is available. The coordinate section starts with the third line. The actual PBC information is provided after the title and the date lines of the coordinate section.

Note that this file consists of fixed-length (80-character) records.

Format of Cartesian coordinate file:

First line: !BIOSYM archive 3

Second line: PBC=ON or PBC=OFF

(The coordinate section begins with the third line.)
Third line: 1-64 Title for the system.

Fourth line: !date ##/##/## time: ##:##:## (if any; otherwise blank)
Fifth line:
PBC information IF present:
1-3 "PBC"
4-13 a in angstroms
14-23 b in angstroms
24-33 c in angstroms
34-43 alpha in degrees
44-53 beta in degrees
54-63 gamma in degrees
64-80 space group

The space group name is adapted from the Hermann-Mauguin notation. This notation is modified slightly to accommodate a standard computer character set by eliminating slashes, subscripts, and bars over numbers. A number with a bar over it is denoted by the number with the hyphen character (-) immediately following. Subscripted numbers/letters are represented by joining the two characters together (non-subscripted numbers/letters must have a space between them). The entire space group designation must be enclosed within parentheses.
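The PBC column layout above can be read by plain string slicing. The following Python sketch is illustrative only: the names PbcRecord and parse_pbc_line are hypothetical and are not part of Insight or Discover, and the code assumes each numeric field holds a value that Python's float() accepts.

```python
from dataclasses import dataclass


@dataclass
class PbcRecord:
    """Hypothetical container for the fields of one PBC record."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float
    space_group: str


def parse_pbc_line(line: str) -> PbcRecord:
    """Slice a PBC record using the 1-based column ranges listed above."""
    line = line.rstrip("\n").ljust(80)  # pad short lines to the 80-column record
    if line[0:3] != "PBC":
        raise ValueError("not a PBC record")
    # Columns 4-13, 14-23, ..., 54-63 hold six 10-character numeric fields.
    values = [float(line[start:start + 10]) for start in range(3, 63, 10)]
    # Columns 64-80 hold the space group, parentheses included.
    return PbcRecord(*values, space_group=line[63:80].strip())
```

The space group is returned as stored in the file, so any interpretation of the modified Hermann-Mauguin name is left to the caller.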

Sixth line - Nth line:
1-4 Atom name.
6-20 x Cartesian coordinate for the atom (angstrom).
21-35 y Cartesian coordinate for the atom (angstrom).
36-50 z Cartesian coordinate for the atom (angstrom).
52-55 Name of residue containing atom.
56-60 Residue sequence number relative to the
beginning of the current molecule.
62-65 Potential function atom type (left justified)
(ignored; see Molecular Data File).
71-72 Element type.
74-79 Partial charge on the atom.

Final line for a given molecule:
1-3 The word end.

Final line for the entire molecular system input:
1-3 The word end.
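As a minimal sketch of how the fixed columns of an atom line map onto string slices, the following Python function reads one .car atom record. The names CarAtom and parse_car_atom are hypothetical. The sketch assumes that a blank partial charge field should be reported as None and that the residue number is kept as text.

```python
from typing import NamedTuple, Optional


class CarAtom(NamedTuple):
    """Hypothetical container for one atom line of a .car file."""
    name: str
    x: float
    y: float
    z: float
    residue_name: str
    residue_number: str
    potential_type: str
    element: str
    charge: Optional[float]


def parse_car_atom(line: str) -> CarAtom:
    """Slice one atom line using the 1-based column ranges listed above."""
    line = line.rstrip("\n").ljust(80)  # pad to the 80-column record length
    charge_text = line[73:79].strip()   # columns 74-79
    return CarAtom(
        name=line[0:4].strip(),              # columns 1-4
        x=float(line[5:20]),                 # columns 6-20
        y=float(line[20:35]),                # columns 21-35
        z=float(line[35:50]),                # columns 36-50
        residue_name=line[51:55].strip(),    # columns 52-55
        residue_number=line[55:60].strip(),  # columns 56-60
        potential_type=line[61:65].strip(),  # columns 62-65 (ignored by the program)
        element=line[70:72].strip(),         # columns 71-72
        charge=float(charge_text) if charge_text else None,
    )
```

A caller would stop reading atoms of the current molecule when it meets a line containing only the word end.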


Molecular Data File (.mdf)

The molecular data file contains atom names, potential energy function types, partial atomic charges, bonding information, torsion definition, and pseudo atoms definition. In fact, all of the information needed to define a molecular system is specified in this file, except the actual Cartesian coordinates and the periodic boundary conditions parameters, which are specified in the Cartesian coordinate (.car) file. Thus, the .mdf and .car files together completely define a system.

These files are prepared automatically by Insight. It is, of course, possible to modify other model building programs to generate these files and thus interface with Discover.

Format of the molecular data file:

There are seven record types in this specification of the molecular data file:

1. Header Record

2. Comment Record

3. Atom Record

4. End Record

5. Torsion Record

6. Pseudo Atom Record

7. Pseudo Atom Set Record

All records begin with record identifiers (keywords). In general, the data is free format. Because this is a free format file, at least one blank between fields is required and each field must have a non-blank entry.

Header Record

The first record of a molecular data file must be:

!BIOSYM molecular_data

Insight interprets this as being an ASCII file containing molecular data records as outlined above.

Comment Record

Comment lines begin with an ! and may occur anywhere after the first record. By convention Insight inserts a system title and a date comment record after the version record.

Example:
! I am a Sample Molecular Data File Comment Record
! 3-FEB-1947

Atom Record

Atom records follow the version and any comment records. Atom records form the core of the description of the molecular systems. Atoms must be listed consecutively by molecule, residue, and charge group.

Table 8-1. Atom Record Definition

CONTENTS COMMENTS

ATOM Record Identifier

atom-name An atom name which must be unique within this residue. Any combination of up to 4 alphanumeric characters is valid.

potential-atom-type The forcefield atom type for this atom (maximum 3 characters). It should correspond to a valid type in some forcefield file.

group-name A group name which is unique within this residue. Any combination of up to 4 alphanumeric characters is valid. All atoms in the same group must occur consecutively in the file. Groups should have a neutral net charge and are used for nonbond cutoffs.

residue-name* A residue type name. Any combination of up to 4 alphanumeric characters is valid.

residue-number(label)* A residue instance number. Any combination of up to 4 alphanumeric characters is valid. All atoms in the same residue must be consecutive in the file.

partial-charge Partial atomic charge in electrons.

switching-atom-flag 1 if this atom is the central atom for switching of nonbond potential cutoffs, 0 otherwise. One and only one atom in each group has the switching flag turned on.

oop-flag 1 if this is the central atom in an out-of-plane interaction, 0 otherwise.

free-energy-flag (for future use)

number-of-bonds Number of atoms bonded to this atom.

bonded-atom/bond-order
If only the atom name is given, then this bond is an intra-residue bond. If the bond is between atoms in different residues, then the residue name and number (separated by an underscore (_), no spaces) must accompany the atom name. The residue name and number is separated from the atom name by a colon (:) (again, no spaces). For example, CA in ALA 3 would be ALA_3:CA.**
The bond-order is a number between 0 and 3
= 0 if bond order is not used
= 1.0 if bond order is 1 (single)
= 1.5 if bond order is 1.5 (aromatic)
= 2.0 if bond order is 2.0 (double)
= 3.0 if bond order is 3.0 (triple)
This field is repeated for each bond to this atom (number-of-bonds times).

Example:

ATOM C c' CO GLY 3 0.38 1 1 0 3 CA/1.0 O/2.0 GLY_4:N/1.0

* Note that the combination of residue-name and residue-number must be unique within the molecule.

** It is necessary to remove all spaces in residue names and numbers when specifying bonds so that the entire atom specification can be read in as a unit during a free format read.
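Because the atom record is free format, it can be split on whitespace and its fields taken in the order given in the table. The Python sketch below is one possible reading of that table, not the parser used by Insight. The names Bond, MdfAtom, parse_bond, and parse_atom_record are hypothetical. It assumes that a bond field without a slash means the bond order is not used (0), and it keeps the free-energy flag as text because that field is reserved for future use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bond:
    residue: Optional[str]  # e.g. "GLY_4" for an inter-residue bond, else None
    atom: str
    order: float


@dataclass
class MdfAtom:
    name: str
    potential_type: str
    group: str
    residue_name: str
    residue_number: str
    charge: float
    switching_flag: int
    oop_flag: int
    free_energy_flag: str
    bonds: List[Bond]


def parse_bond(token: str) -> Bond:
    """Split 'ALA_3:CA/1.0' or 'CA/1.0' into residue, atom, and bond order."""
    target, _, order = token.partition("/")
    residue, sep, atom = target.rpartition(":")
    # Assumption: a missing bond order is treated as 0 (not used).
    return Bond(residue if sep else None, atom, float(order) if order else 0.0)


def parse_atom_record(line: str) -> MdfAtom:
    fields = line.split()
    if len(fields) < 11 or fields[0] != "ATOM":
        raise ValueError("not an ATOM record")
    n_bonds = int(fields[10])
    bond_tokens = fields[11:11 + n_bonds]
    if len(bond_tokens) != n_bonds:
        raise ValueError("fewer bond fields than number-of-bonds")
    return MdfAtom(
        name=fields[1], potential_type=fields[2], group=fields[3],
        residue_name=fields[4], residue_number=fields[5],
        charge=float(fields[6]), switching_flag=int(fields[7]),
        oop_flag=int(fields[8]), free_energy_flag=fields[9],
        bonds=[parse_bond(token) for token in bond_tokens],
    )
```

Applied to the example record above, this yields three bonds: CA and O within the same residue, and N in residue GLY_4.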

End Record

Each molecule is separated from the previous molecule with an end card. The end of all molecules in the atom based data is signaled by an end system record.

Torsion Record

Names can be assigned to specific dihedral angles with the TORSION record. It is possible to globally assign a name to all occurrences of a set of four atom names or to a specific set within one residue.

Table 8-2. Torsion Record Definition

CONTENTS COMMENTS

TORSION Record identifier

residue-name The residue name of the residue where the atom names are found (any combination of up to 4 alphanumeric characters).
* means all residue types.

residue-number(label)
The residue number of the residue where the atom names are found (any combination of up to 4 alphanumeric characters).
* means all numbers.

angle-name The assigned name for this dihedral which can be any combination of up to 4 alphanumeric characters.

atom-name (x4) Four atom names. Atoms must be connected to each other in the order given. The dihedral named is that angle formed by the bonds connecting the first with second and the third with the fourth atoms while looking down the bond connecting the second to the third (the first atom being closest to the viewer). Atom names have one of four possible forms:

1. molecule:residue_number:atom

2. residue:atom

3. *:atom

4. atom

where:
molecule is an integer molecule number
residue is the residue type name
_ must occur between residue and number
number is the residue instance number
atom is the atom name

The first form specifies a single particular atom in the system. The second form identifies an atom in any instance of the specified residue. The third form specifies an atom in any residue. The final form indicates an atom in the same residue as the torsion. Note that by convention the torsion is assigned to the residue in which the second atom occurs.

Example:

TORSION * * phi *:C N CA C
TORSION VAL * chi1 N CA CB CG1
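The four atom name forms can be told apart by counting colon-separated parts. The following Python sketch shows one way to do this. The function names and the returned labels are hypothetical, and the code assumes that an atom specification contains no embedded spaces, so that each one is a single whitespace-delimited field.

```python
from typing import Dict, List, Union


def classify_atom_spec(spec: str) -> str:
    """Return a label for which of the four atom name forms is used."""
    parts = spec.split(":")
    if len(parts) == 3:
        return "specific"        # molecule:residue_number:atom
    if len(parts) == 2:
        # "*:atom" matches any residue; "residue:atom" matches one residue type.
        return "any-residue" if parts[0] == "*" else "residue-type"
    return "same-residue"        # bare atom name


def parse_torsion_record(line: str) -> Dict[str, Union[str, List[str]]]:
    fields = line.split()
    if len(fields) != 8 or fields[0] != "TORSION":
        raise ValueError("not a TORSION record with four atom names")
    return {
        "residue_name": fields[1],    # "*" means all residue types
        "residue_number": fields[2],  # "*" means all residue numbers
        "angle_name": fields[3],
        "atoms": fields[4:8],
    }
```

For the first example record above, the atom *:C is classified as matching any residue, while N, CA, and C are taken from the residue that owns the torsion.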

Pseudo Atom Record

A pseudo atom is defined as the weighted average position of a set of atoms. The method for calculating this position can be arithmetic averaging or center of mass.

One important use of pseudo atoms is when using distance information derived from NOE intensities when assignments could not be made.

Table 8-3. Pseudo Atom Definition

CONTENTS COMMENTS

PSEUDO identifier Must be after the ATOM record and before the END record. Must also precede the corresponding PSEUDOSET record.

Number Indicates the count of pseudo atoms.

name Indicates the user-specified name of the pseudo atom. Any combination of up to 4 alphanumeric characters is valid. If no name has been given, the internal name generation scheme :residuename_#:Xn is used, where:

:residuename is four alphanumeric characters if all the members of the pseudo atom belong to the same residue. Otherwise, the keyword XRES is used. The keyword XRES is also used when the pseudo atom set is empty.
_ must occur between :residuename and residue#.
residue# is the residue instance number of :residuename.
:Xn X is mandatory and n is a 3-digit integer representing the sequence number of pseudo atoms generated for that particular residue. If there is only one pseudo atom for that residue, X is appended. Consequently, the number of internally named pseudo atoms for one particular residue is limited to 999. The total number of inter-residue pseudo atoms is limited to 999.

Criterion_code One letter code that represents the method used for calculating the pseudo atom position.
A = arithmetic
C = center of mass
F = fixed point in space (the pseudo atom set is then empty)

X,Y,Z coordinates Specifies pseudo atom coordinates. These are optional since they are calculated. However, in the case of a fixed point in space pseudo atom, those values are used for the empty pseudo atom set. Real numbers.

Example:

PSEUDO 23 CYS_26:X A

or:

PSEUDO 10 XRES:X A
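A PSEUDO record can be read in the same free-format way. The Python sketch below is an illustration under stated assumptions: the function name parse_pseudo_record and the CRITERIA table are hypothetical, the record is assumed to always carry a name field, and the X, Y, Z coordinates are assumed to appear either all together or not at all.

```python
from typing import Dict, Optional, Tuple, Union

# One-letter criterion codes as listed in the table above.
CRITERIA = {
    "A": "arithmetic",
    "C": "center of mass",
    "F": "fixed point in space",
}


def parse_pseudo_record(
    line: str,
) -> Dict[str, Union[int, str, Optional[Tuple[float, ...]]]]:
    fields = line.split()
    # Expect: PSEUDO number name criterion_code [x y z]
    if not fields or fields[0] != "PSEUDO" or len(fields) not in (4, 7):
        raise ValueError("not a PSEUDO record")
    code = fields[3]
    if code not in CRITERIA:
        raise ValueError("unknown criterion code: " + code)
    coords = tuple(float(v) for v in fields[4:7]) if len(fields) == 7 else None
    return {
        "number": int(fields[1]),
        "name": fields[2],
        "criterion": CRITERIA[code],
        "coordinates": coords,  # None when the position is to be calculated
    }
```

For a fixed-point pseudo atom (code F) the coordinates would normally be present, since the corresponding pseudo atom set is empty and there is nothing to average.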

Pseudo Atom Set Record

A pseudo atom set is a group of atoms used for defining a pseudo atom. The pseudo atom set can be an empty set. The atom members of one pseudo atom set can also be members of another pseudo atom set. A pseudo atom can be composed of members from different residues (inter-residue pseudo atom). However, all the members of a pseudo atom set must belong to the same molecule.

Table 8-4. Pseudo Atom Set Definition

CONTENTS COMMENTS

PSEUDOSET identifier Must precede the END record and follow the corresponding PSEUDO record. There can be more than one pseudoset record for a given pseudo record.

residuename_residue#:atomname
residuename is the residue type name the atom belongs to. Any combination of up to four alphanumeric characters is valid.
residue# is the residue instance number of residuename. Any combination of up to four alphanumeric characters is valid.
atomname is the name of the atom. It must be unique within the specified residue and molecule and must be an atom name as labeled in a previous atom record.
: and _ Delimiters are colon (:) and underscore (_).

Only three pseudo_set_atom_name specifications can be made per record. More than one PSEUDOSET record per corresponding PSEUDO record is allowed.

Example:

PSEUDOSET ALA_38:N ALA_38:CA ALA_38:C
PSEUDOSET ALA_38:O
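Each member of a PSEUDOSET record can be split at the colon and underscore delimiters. The following Python sketch shows one way to do so. The function name parse_pseudoset_record is hypothetical, and the code assumes that residue names contain no underscore, which follows from the alphanumeric restriction above.

```python
from typing import List, Tuple


def parse_pseudoset_record(line: str) -> List[Tuple[str, str, str]]:
    """Return (residue name, residue number, atom name) for each member."""
    fields = line.split()
    if not fields or fields[0] != "PSEUDOSET":
        raise ValueError("not a PSEUDOSET record")
    members = fields[1:]
    if len(members) > 3:
        raise ValueError("at most three atom specifications per record")
    parsed = []
    for member in members:
        residue, colon, atom = member.partition(":")
        name, underscore, number = residue.partition("_")
        if not (colon and underscore and atom):
            raise ValueError("malformed atom specification: " + member)
        parsed.append((name, number, atom))
    return parsed
```

Since more than one PSEUDOSET record may follow a PSEUDO record, a caller would concatenate the lists returned for consecutive records to obtain the full set.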


Cartesian Coordinate Archive File (.arc)

This file is used internally by the program to archive and retrieve sets of Cartesian coordinates during a run. The file is formatted and can be examined by printing or editing it. The format for each set of coordinates is essentially identical to the format of the Cartesian coordinate file (.car). Note that this file consists of fixed-length (80-character) records.

First line: !BIOSYM archive 1

Second line: PBC=ON or PBC=OFF

(For the coordinate section, the lines below are repeated for each archived file.)
Third line:
1-64 title for the system
65-80 energy obtained from the most recent
minimization run (if any)
Fourth line:
!date ##/##/## time: ##:##:## (if any;
otherwise blank)

Fifth line: PBC information IF present:
1-3 "PBC"
4-13 a in angstroms
14-23 b in angstroms
24-33 c in angstroms
34-43 alpha in degrees
44-53 beta in degrees
54-63 gamma in degrees
64-80 space group

Sixth line-Nth line:
1-4 atom name
6-20 x Cartesian coordinate for the atom (angstroms)
21-35 y Cartesian coordinate for the atom (angstroms)
36-50 z Cartesian coordinate for the atom (angstroms)
52-55 name of residue containing atom
56-60 residue sequence number relative to the
beginning of the current molecule
62-65 potential function atom type (left justified)
(ignored; see Molecular Data File)
66-75 partial charge on the atom
(ignored; see Molecular Data File)
76-80 absolute atomic sequence number relative to the
beginning of the entire molecular system

Final line for a given molecule:
1-3 the word "end"

Final line for the entire molecular system input:
1-3 the word "end"
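Since the coordinate section repeats for each archived set, an archive can be split into frames by watching for the end lines. The Python sketch below is a minimal illustration, not the reader used by the program. The name iter_arc_frames and the dictionary keys are hypothetical. It assumes that a frame is closed by two consecutive end lines (one closing the last molecule, one closing the system) and that a PBC line is present only when the second line of the file is PBC=ON.

```python
from typing import Dict, Iterable, Iterator


def iter_arc_frames(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dictionary per archived coordinate set."""
    rows = [ln.rstrip("\n") for ln in lines]
    while rows and not rows[-1].strip():
        rows.pop()  # drop trailing blank lines
    if len(rows) < 2 or not rows[0].startswith("!BIOSYM archive"):
        raise ValueError("missing Biosym archive header")
    pbc_on = rows[1].strip() == "PBC=ON"
    i = 2
    while i < len(rows):
        title_line = rows[i].ljust(80)
        energy_text = title_line[64:80].strip()     # columns 65-80
        date_line = rows[i + 1] if i + 1 < len(rows) else ""
        i += 2
        pbc_line = None
        if pbc_on and i < len(rows) and rows[i].startswith("PBC"):
            pbc_line = rows[i]
            i += 1
        molecules = [[]]
        while i < len(rows):
            row = rows[i]
            i += 1
            if row.strip() == "end":
                if not molecules[-1]:
                    break                 # second consecutive "end": frame done
                molecules.append([])      # first "end": molecule done
            else:
                molecules[-1].append(row)  # raw atom line, left unparsed
        yield {
            "title": title_line[0:64].strip(),      # columns 1-64
            "energy": float(energy_text) if energy_text else None,
            "date": date_line,
            "pbc": pbc_line,
            "molecules": [m for m in molecules if m],
        }
```

The atom lines are returned unparsed, so the column layout listed above can be applied to them separately.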




Last updated December 17, 1998 at 04:29PM PST.
Copyright © 1998, Molecular Simulations Inc. All rights reserved.