Common File Formats



1       Current File Formats


Version 98.0, December 1998

This document describes the file formats used by more than one Insight II product or module in the 98.0 software release. The information is presented in alphabetical order, by filename extension, or by descriptive name where there is no mandatory filename extension.

This book lists only the file formats that are used by more than one Insight II product or module. You should always look in the File Formats appendix of each product's user guide for information about formats that are unique to that product.

The appendix of this book describes fixed, non-changing formats that were used by earlier versions of Insight II software and that can still be used. Each of these classic formats has a current version described in this main section of the book.

Table 1 lists the products or modules that use the formats described in this book.

Table 1. Release 98.0 File Formats and Their Modules

format   modules
align   Consensus, Homology
arc   Insight II, Discover, others
car   Insight II, Discover, others
cor   Insight II, Discover
elements.dat   Insight II
fhis   See his.
frm   Insight II
grf   Insight II
hessian   Discover, DGII, DMol, Turbomole, Zindo
hessianx   See hessian.
his   Analysis, Decipher, Discover, Insight II
ltpl   Insight II
ludi_pseudo_protein   Ligand_Design
mdf   Insight II, Discover, others
msf   Insight II
pdb   Insight II
pdbx   X-PLOR
pks   Felix, NMR_Refine
plb   DGII, NMR_Refine
ppm   Felix, NMR_Refine
pre   Discover
pro_angle.dat   Consensus, Homology, NMR_Refine, Xsight
pro_bond.dat   Consensus, Homology, NMR_Refine, Xsight
proj   DGII, NMR_Refine
pro_misc.dat   Consensus, Homology, NMR_Refine, Xsight
psf   X-PLOR
rlb   Insight II, Ligand_Design
rstrnt   Consensus, Discover, Discover_3, Felix, NMR_Refine
rtf   Insight II, Modeler
scs_tor   Analysis, Apex-3D, Decipher, Search_Compare
sd   Analysis, Apex-3D, Converter
seq   Consensus, Homology
sub   Insight II
tab   Insight II
tbl   Insight II, Discover, others
usr   Insight II
xdr_tor   See scs_tor.
xhessian   See hessian.


Sequence Alignment Files (.align)

An alignment of two or more amino acid sequences can be read into Consensus or Homology from a .align file with the Get Sequences Alignment command. The filename should end in the extension .align, since only files with this extension are listed in the value-aid in the Get Sequences Alignment command. The file is a text file containing lines of no more than 1000 characters each. Shorter lines (typically 80 characters or less) can be used, if desired, to make the file easier to read and edit. Each line begins with a protein name (up to six characters long) followed by a colon (":"). The remainder of the line contains the amino acid sequence of the named protein. In addition to the single-letter amino acid codes listed above, the sequence may contain gap characters ("-") and break characters ("|"); the latter indicate breaks between protein chains. The text lines are organized into blocks of lines, each block containing exactly one line for each protein in the alignment. Blocks are separated from one another by one or more blank lines; therefore no blank lines are allowed within a block. The proteins must be listed in the same order in all blocks in the file.

The protein name must conform to the rules for object names in Insight II. This means that the name must contain only letters, digits, underscores ("_"), and dollar signs ("$"), and that the first character must be either a letter or a dollar sign. The Consensus program corrects an illegal protein name read from an alignment file if only the first character is illegal. If the first character is a digit or an underscore, but the second character is a letter or dollar sign, then the illegal first character is simply deleted from the name. If both the first and second characters are digits or underscores, then a leading dollar sign is prepended to the name.
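The name-correction rule above can be sketched in a few lines of Python. This is only an illustration of the rule as worded here, not the Consensus implementation: the function names are hypothetical, a one-character name such as "1" is treated like the "prepend a dollar sign" case (an assumption, since the text does not cover it), and the six-character limit is not checked.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
NAME_CHARS = LETTERS | DIGITS | {"_", "$"}


def is_legal_name(name: str) -> bool:
    """True if name follows the object-name rules quoted above."""
    return (
        bool(name)
        and name[0] in LETTERS | {"$"}
        and all(ch in NAME_CHARS for ch in name)
    )


def correct_protein_name(name: str) -> str:
    """Correct a name whose only problem is an illegal first character."""
    if is_legal_name(name):
        return name
    if not name or name[0] not in DIGITS | {"_"}:
        raise ValueError(f"cannot correct protein name {name!r}")
    second = name[1:2]
    if second in LETTERS | {"$"}:
        # Second character is a legal first character: drop the first one.
        candidate = name[1:]
    else:
        # First two characters are digits/underscores: prepend "$".
        # (Assumption: one-character names are handled the same way.)
        candidate = "$" + name
    if not is_legal_name(candidate):
        raise ValueError(f"cannot correct protein name {name!r}")
    return candidate
```

For example, under these assumptions "1ABC" becomes "ABC" and "12AB" becomes "$12AB", while a name with an illegal character elsewhere is rejected.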

The colon at the end of each protein name can be followed by any number of space characters to separate the name from the sequence. In the first block these intervening spaces are used to establish the alignment of the N-termini of the sequences. Tab characters cannot be used for this purpose.

In an alignment between a long sequence and a short sequence fragment, the shorter sequence might not span all blocks in the file. In such cases, each block not containing the shorter sequence must still have a line for that sequence; these "place-holding" lines contain only the name of the protein, terminated by a colon.

Alignment files can also contain comments beginning with either a pound sign ("#") or exclamation mark ("!"). A comment can be placed either on a line by itself or at the end of a line containing sequence information. Comments are completely ignored when the file is read. A line containing only a comment is not recognized as a blank line for the purpose of separating blocks of sequence lines.

Sample .align File

Here is an example of a sequence alignment in the correct format for an alignment file:


# Comments like this one are completely ignored. 
! The exclamation mark also denotes a comment.
# Note how leading space characters are used in the first
# block to establish the proper alignment of the N-termini of
# the four sequences. Also notice that a "place-holding" line
# is required in the third block because the short sequence IER
# does not extend that far.
IER: mtqspsslsas-vgdrvtitcqas------qdiikylnwyqqtpgka
PCM1: VMTQSPSSLSVSA-GERVTMSCKSSQSLLNSGNQKNFLAWYQQKPGQP
JBF: EIVLTQSPAITAASL-GQKVTITCSASSS-------VSSLHWYQQKSGTS
F91: IQMTQT-TSSLSASLGDRVTISCRASQD------ISNYLNWYQQKPDGT

IER: pklliyeasnlqagvpsrfsgsgsgtdytftisslqped
PCM1: PKLLIYGASTRESGVPDRFTGSGSGTDFTLTISSVQAEDLAVYYCQNDHS
JBF: PKPWIYEISKLASGVPARFSGSGSGTSYSLTINTMEAEDAAIYYCQQWT-
F91: VKLLVYYTSRLHSGVPSRFSGSGSGTDYSLTISNLEHEDIATYFCQQGST

IER: # place holder
PCM1: YP-LTFGAGTKLEIKRADAAPTVSIFPPSSEQLTSGGAS
JBF: YPLITFGAGTKLELKRADAAPTVSIFPPSSEQ
F91: TP-RTFGGGTKLEIKRRADAAPTVSIFPPS
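
As a rough illustration of the block structure described in this section, the following Python sketch reads an alignment file into one concatenated sequence per protein. It is not the reader used by Consensus or Homology; the function names are hypothetical, and the sketch does not interpret the first-block spacing that establishes the N-terminal alignment, nor does it enforce the line-length or name-length limits.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing comment introduced by '#' or '!'."""
    positions = [p for p in (line.find("#"), line.find("!")) if p != -1]
    return line[: min(positions)] if positions else line


def parse_align(text: str) -> dict:
    """Return {protein name: concatenated sequence} for a .align file."""
    blocks, current = [], []
    for raw in text.splitlines():
        if not raw.strip():
            # A truly blank line separates blocks.
            if current:
                blocks.append(current)
                current = []
            continue
        line = strip_comment(raw)
        if not line.strip():
            # Comment-only line: ignored, and not a block separator.
            continue
        name, colon, sequence = line.partition(":")
        if not colon:
            raise ValueError(f"missing ':' after protein name: {raw!r}")
        current.append((name.strip(), sequence.strip()))
    if current:
        blocks.append(current)
    if not blocks:
        return {}

    order = [name for name, _ in blocks[0]]
    sequences = {name: "" for name in order}
    for block in blocks:
        if [name for name, _ in block] != order:
            raise ValueError("proteins must appear in the same order in every block")
        for name, sequence in block:
            # A place-holding line contributes an empty string.
            sequences[name] += sequence
    return sequences
```

A place-holding line such as `IER: # place holder` yields an empty sequence segment for that block, so the protein order check still passes.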


Cartesian Coordinate Archive File (.arc)

The .arc file has the same format as the .car file (see Cartesian Coordinate File (.car)), with two exceptions. First, the .arc file may contain more than one coordinate header section, so a single .arc file can hold multiple instances ("frames") of the same molecular system. Second, the .arc file is generally written as a single record, rather than as a series of records in which each line of data ends with a carriage return. Two utilities, dirtoseq and seqtodir, convert between the single-record .arc file and a multiple-record file; they are described in Part 1 of the Discover 2.9.x/98.0/3.0.0 online user documentation.

The format is:

File format line
HELIX header line (if present)
PBC header line
title/energy line
date line
PBC record (if present)
HELIX record for molecule A (if present)
atom record for atom 1
.
.
.
atom record for atom i
end
HELIX record for molecule B (if present)
atom record for atom i + 1
.
.
.
atom record for atom j
end
HELIX record for molecule C (if present)
atom record for atom j + 1
.
.
.
atom record for atom k
end
.
.
.
end
end
title/energy line
date line
PBC record (if present)
HELIX record for molecule A (if present)
atom record for atom 1
.
.
.
atom record for atom i
end
HELIX record for molecule B (if present)
atom record for atom i + 1
.
.
.
atom record for atom j
end
HELIX record for molecule C (if present)
atom record for atom j + 1
.
.
.
atom record for atom k
end
.
.
.
end
end
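
The layout above can be illustrated with a short Python sketch that splits an archive into its file header and its frames. The sketch assumes the file has already been converted to the multiple-record form (one record per line) and that every frame contains at least one molecule, so that each frame closes with two consecutive "end" lines; the function name is hypothetical.

```python
def split_arc_frames(text: str):
    """Split a multiple-record .arc file into (header lines, list of frames)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("!BIOSYM archive"):
        raise ValueError("not a BIOSYM archive file")

    # The file header is two lines, or three if a HELIX line is present.
    has_helix = len(lines) > 1 and lines[1].strip() == "HELIX"
    start = 3 if has_helix else 2
    header = lines[:start]

    frames, current, previous_was_end = [], [], False
    for line in lines[start:]:
        current.append(line)
        is_end = line.strip() == "end"
        if is_end and previous_was_end:
            # Second consecutive "end": the frame is complete.
            frames.append(current)
            current, previous_was_end = [], False
        else:
            previous_was_end = is_end
    if any(line.strip() for line in current):
        raise ValueError("file ends inside an unterminated frame")
    return header, frames
```

Each returned frame starts with its title/energy line and date line, so the frames can be handed one at a time to code that reads a single .car-style coordinate section.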

Sample .arc File

The example shows an .arc file for a 2D-periodic, helical system with three frames. Ordinarily, an .arc file consists of a single long line with no carriage returns. Carriage returns have been inserted into the following example to make it more understandable.


!BIOSYM archive 3
HELIX
PBC=2D
Frame 1
!DATE Thu Jul 19 18:39:47 1993
PBC 18.6200 18.6200 90.0000 (P 1)
HELIX 143.3598 7.6194 90.0000 90.0000 0.0000 0.0000
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c3 C -0.300
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
HELIX 121.2043 17.6194 90.0000 90.0000 0.5000 0.4500
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
F11 11.762401581 -1.450428009 -9.612532616 ETHE 1 f F 0.100
H11 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end
Frame 2
!DATE Thu Jul 19 18:39:47 1993
PBC 18.6200 18.6200 90.0000 (P 1)
HELIX 143.3598 7.6194 90.0000 90.0000 0.0000 0.0000
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c3 C -0.300
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
HELIX 121.2043 17.6194 90.0000 90.0000 0.5000 0.4500
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
F11 11.762401581 -1.450428009 -9.612532616 ETHE 1 f F 0.100
H11 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end
Frame 3
!DATE Thu Jul 19 18:39:47 1993
PBC 18.6200 18.6200 90.0000 (P 1)
HELIX 143.3598 7.6194 90.0000 90.0000 0.0000 0.0000
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c3 C -0.300
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
HELIX 121.2043 17.6194 90.0000 90.0000 0.5000 0.4500
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
F11 11.762401581 -1.450428009 -9.612532616 ETHE 1 f F 0.100
H11 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end


Cartesian Coordinate File (.car)

The .car file contains the Cartesian coordinates of a molecular system, as well as related positional information. This information is dynamic and can change during the course of a calculation.

Note that .car files are used by several Insight II products; therefore, some of the information present may be ignored by some programs or used only by certain programs. In particular, the .car file supports helical and 2D periodic systems through special lines that appear only if the file contains 2D periodic and/or helix information. This information is currently relevant only to the Polymer product, and in that context, only to the Discover 98.0/3.0.0 program. In addition, since the Discover program does not handle infinite helices, it does not read .car files containing helix information. If a .car file contains 2D periodicity without helix information, however, the Discover 98.0/3.0.0 program (but not the Discover 2.9.x program) can read it and can also write files for these systems.

There are several differences between the file format described in the Discover 2.9.0 and Insight 2.2.0 documentation and the format presented here, to enable atom names, potential types, and residue names to be longer than in previous versions.

The .car file consists of one file header (which includes statements indicating what kinds of information are present in the file), one coordinate section header, optional 2D or 3D periodicity records, one coordinate section for each molecule in the file, and one end-of-file statement. All lines in the file are exactly 80 characters long. The coordinate section(s) includes:

1.   Optional helix record.

2.   Atom records.

3.   End-of-section statement.

The overall structure of a .car file for nonhelical systems is shown in Table 2, and that of a file containing helix information is shown in Table 3. Descriptions and examples of the major parts follow. The Insight and Polymer documentation explains the orientation of the various axes and angles referred to in the .car file description. Finally, examples of .car files for several kinds of systems are shown.

Table 2. Structure of a Cartesian Coordinate File for Systems Not Characterized by Helical Symmetry

NOTE: Information relating to 2D periodicity is indicated by bold type; italic type indicates contents that are replaced by real data; plain type indicates actual contents. Coordinate conventions are described in the Insight and Polymer documentation.

line   columns   contents   comments
First:     !BIOSYM archive 3    
Second:     PBC=ON, PBC=OFF, PBC=2D   one of these three choices must be present  
The coordinate header begins with the third line:  
Third:   1-64   title for the system   if available--this line may be blank but must be present  
  65-80   energy  
Fourth:     !DATE day month date time year   or just !DATE  
Fifth line: PBC information if PBC=ON:  
  1-3   PBC    
  4-13   a   cell vector a in angstroms  
  14-23   b   cell vector b in angstroms  
  24-33   c   cell vector c in angstroms  
  34-43   alpha   cell angle in degrees  
  44-53   beta   cell angle in degrees  
  54-63   gamma   cell angle in degrees  
  64-80   space group name    
Fifth line: PBC information if PBC=2D:  
  1-3   PBC    
  4-13   k   plane vector k in angstroms  
  14-23   l   plane vector l in angstroms  
  24-33   gamma   plane angle in degrees  
  34-50   plane group name    
Sixth-Nth (This section is repeated for each molecule in the system.):  
  1-5   atom name    
  7-20   x Cartesian coordinate of atom   in angstroms  
  22-35   y Cartesian coordinate of atom   in angstroms  
  37-50   z Cartesian coordinate of atom   in angstroms  
  52-55   type of residue containing atom    
  57-63   residue sequence name   relative to beginning of current molecule, left justified  
  64-70   potential type of atom   left justified  
  72-73   element symbol    
  75-80   partial charge on atom    
Final line for a given molecule:  
  1-3   end    
Final line for the entire molecular system input:  
  1-3   end    

Table 3. Structure of Cartesian Coordinate File Containing Helix Information

NOTE: Information relating to helices and 2D periodicity is indicated by bold type; italic type indicates contents that are replaced by real data; plain type indicates actual contents. Coordinate conventions are described in the Insight and Polymer documentation. PBC=2D can be read by the Discover 98.0/3.0.0 program but not the Discover 2.9.x program.

line   columns   contents   comments
First:     !BIOSYM archive 3    
Second:     HELIX    
Third:     PBC=OFF, PBC=2D   one of these two choices must be present  
The coordinate header begins with the fourth line:  
Fourth:   1-64   title for the system   if available--this line may be blank but must be present  
  65-80   energy  
Fifth:   1   !DATE day month date time year   or just !DATE  
Sixth line: PBC information if PBC=2D:  
  1-3   PBC    
  4-13   k   plane vector k in angstroms  
  14-23   l   plane vector l in angstroms  
  24-33   gamma   plane angle in degrees  
  34-50   plane group name    
Seventh (This helix record is part of the section that repeats for each molecule in the system.):  
  1-5   HELIX    
  6-15   sigma   in degrees  
  16-25   d   in angstroms  
  26-35   kappa   angle between l axis and helix axis in degrees  
  36-45   lambda   angle between k axis and helix axis in degrees  
  46-55   Tk   fractional position of helix axis along k axis  
  56-65   Tl   fractional position of helix axis along l axis  
Eighth-Nth (This section repeats for each molecule in the system.):  
  1-5   atom name    
  7-20   x Cartesian coordinate of atom   in angstroms  
  22-35   y Cartesian coordinate of atom   in angstroms  
  37-50   z Cartesian coordinate of atom   in angstroms  
  52-55   type of residue containing atom    
  57-63   residue sequence name   relative to beginning of current molecule, left justified  
  64-70   potential type of atom   left justified  
  72-73   element symbol    
  75-80   partial charge on atom    
Final line for a given molecule:  
  1-3   end    
Final line for the entire molecular system input:  
  1-3   end    

File Header

The first record of a .car file must be:


!BIOSYM archive #

The ! must be the first character in the file. The Discover program interprets this line as indicating an ASCII file containing coordinate records as outlined in this section. The string archive indicates that the contents of the file are those of a .car file; the # identifies the file format. For example, 3 indicates that the file format is as specified here for the Discover program, versions 2.9.5/3.2 and later.

If helix information is not present in the .car file, then the second line indicates whether the file contains PBC information. If helix information is present, then the second line of the .car file consists of the word HELIX and the third line indicates whether the file contains PBC information. Note that helical symmetry is not currently compatible with 3D periodicity. So if the second line is HELIX and PBC=ON is found, an error message is generated.

Valid file headers:


!BIOSYM archive 3
HELIX
PBC=2D

!BIOSYM archive 3
HELIX
PBC=OFF

!BIOSYM archive 3
PBC=ON

!BIOSYM archive 3
PBC=OFF

!BIOSYM archive 3
PBC=2D
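
The header rules can be summarized in a small Python sketch. It is an illustration of the rules stated in this section, not the check performed by the Discover or Insight programs; the function name and the keys of the returned dictionary are hypothetical.

```python
def read_car_header(lines):
    """Interpret the first two or three lines of a .car file."""
    if not lines or not lines[0].startswith("!BIOSYM archive"):
        raise ValueError("first record must be '!BIOSYM archive #'")
    file_format = lines[0].split()[-1]  # e.g. "3"

    index = 1
    helix = len(lines) > index and lines[index].strip() == "HELIX"
    if helix:
        index += 1
    if len(lines) <= index:
        raise ValueError("missing PBC header line")

    pbc_line = lines[index].strip()
    if pbc_line not in ("PBC=ON", "PBC=OFF", "PBC=2D"):
        raise ValueError(f"invalid PBC header line: {pbc_line!r}")
    pbc = pbc_line.split("=", 1)[1]

    # Helical symmetry is not compatible with 3D periodicity.
    if helix and pbc == "ON":
        raise ValueError("HELIX cannot be combined with PBC=ON")

    return {
        "format": file_format,
        "helix": helix,
        "pbc": pbc,
        "header_lines": index + 1,
    }
```

The `header_lines` value tells a caller where the coordinate section header (title/energy line) begins.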

Coordinate Section Header

The coordinate section header consists of two lines. Both lines must be present in the .car file. The first line, which may be blank, usually contains a title and the energy. The second line must contain the characters !DATE, optionally followed by the full date when the file was written (see Tables 2 and 3).

Periodicity Records

A periodicity record is present in the .car file only if the entry PBC=ON or PBC=2D is present in the file header.

For 3D periodicity, indicated by the file header line PBC=ON, the PBC section is the same for the Insight and Discover programs (see Table 2). This line contains the word PBC, followed by the a, b, and c unit cell lengths, the values of the alpha, beta, and gamma angles, and the space group name. Please see the Discover 2.9.x/98.0/3.0.0 User Guide for a discussion of valid space group names.

For 2D periodicity, indicated by the file header line PBC=2D, the PBC section (Tables 2 and 3) contains the word PBC, the k and l values, the value of gamma, and the plane group name. These coordinates are explained in the Insight and Polymer documentation. Currently, only the (P1) plane group is supported, and only the Discover 98.0/3.0.0 program (not the Discover 2.9.x program) reads such files.

Example PBC record for 2D periodicity:


PBC   18.6200   18.6200   90.0000 (P 1)
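
A periodicity record can be read by slicing the columns listed in Tables 2 and 3. The Python sketch below does this for both the 3D and 2D forms; the function name, the `mode` argument (the value after `PBC=` in the file header), and the dictionary keys are hypothetical choices made for the illustration.

```python
# (name, first column, last column), with columns numbered from 1 as in the tables.
PBC_3D_FIELDS = [("a", 4, 13), ("b", 14, 23), ("c", 24, 33),
                 ("alpha", 34, 43), ("beta", 44, 53), ("gamma", 54, 63)]
PBC_2D_FIELDS = [("k", 4, 13), ("l", 14, 23), ("gamma", 24, 33)]


def parse_pbc_record(record: str, mode: str) -> dict:
    """Parse a PBC record; mode is "ON" (3D) or "2D"."""
    if record[:3] != "PBC":
        raise ValueError("periodicity record must start with 'PBC'")
    if mode == "ON":
        fields, group_key, group_columns = PBC_3D_FIELDS, "space_group", (64, 80)
    elif mode == "2D":
        fields, group_key, group_columns = PBC_2D_FIELDS, "plane_group", (34, 50)
    else:
        raise ValueError(f"no periodicity record expected for PBC={mode}")

    # Convert 1-based inclusive columns to Python slices.
    result = {name: float(record[first - 1:last]) for name, first, last in fields}
    first, last = group_columns
    result[group_key] = record[first - 1:last].strip()
    return result
```

Applied to the example record above with `mode="2D"`, this returns k = 18.62, l = 18.62, gamma = 90.0, and the plane group string "(P 1)".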

Coordinate Section

Coordinate records are present for each molecule in the system, so this section is repeated as a whole for each molecule.

Helix Records

If any helix information is present in a .car file, the second line of the file must be the word HELIX.

If a molecule has helical symmetry (i.e., it is an infinite helix), then an extra line is present in the relevant coordinate section, just before the atom records for that molecule. Note that each helical molecule has its own helix record. The new line in the coordinate section contains the word HELIX, followed by the sigma and d values, the kappa and lambda angles, and the Tk and Tl positions (see Table 3). These coordinates are explained in the Insight and Polymer documentation.

Example helix record:


HELIX  143.3598    7.6194   90.0000   90.0000    0.5000    0.4500
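
As an illustration of the column layout in Table 3, the following Python sketch writes and reads a helix record. The use of four decimal places is inferred from the example record rather than stated in the format definition, and the function names are hypothetical.

```python
HELIX_FIELDS = ("sigma", "d", "kappa", "lambda", "Tk", "Tl")


def format_helix_record(values: dict) -> str:
    """Write HELIX in columns 1-5, then six 10-column numeric fields."""
    # Four decimals is an assumption taken from the example record.
    return "HELIX" + "".join(f"{values[name]:10.4f}" for name in HELIX_FIELDS)


def parse_helix_record(record: str) -> dict:
    """Read the six numeric fields from columns 6-65 of a helix record."""
    if record[:5] != "HELIX":
        raise ValueError("helix record must start with 'HELIX'")
    return {
        name: float(record[5 + 10 * i: 15 + 10 * i])
        for i, name in enumerate(HELIX_FIELDS)
    }
```

If the record must be padded to the 80-character line length described earlier, `format_helix_record(values).ljust(80)` would do so.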

Atom Records

The atom records (see Table 2) contain information that identifies the complete Insight name of each real atom, some general properties of the atom, and its location in 3D space. "Ghost" atoms are generated from real ones according to symmetry and periodicity information (see, for example, Periodic Boundary Conditions, in the Discover 2.9.x/98.0/3.0.0 User Guide).
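
The atom record columns in Tables 2 and 3 can be illustrated with a Python sketch that writes and reads one record. The numeric precision (nine decimals for coordinates, three for the charge) is inferred from the sample files, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    name: str          # columns 1-5
    x: float           # columns 7-20, angstroms
    y: float           # columns 22-35, angstroms
    z: float           # columns 37-50, angstroms
    residue_type: str  # columns 52-55
    residue_seq: str   # columns 57-63, left justified
    potential: str     # columns 64-70, left justified
    element: str       # columns 72-73
    charge: float      # columns 75-80


def format_atom_record(atom: AtomRecord) -> str:
    """Return an 80-character atom record (precision is an assumption)."""
    return (
        f"{atom.name:<5} {atom.x:14.9f} {atom.y:14.9f} {atom.z:14.9f} "
        f"{atom.residue_type:<4} {atom.residue_seq:<7}{atom.potential:<7} "
        f"{atom.element:<2} {atom.charge:6.3f}"
    )


def parse_atom_record(record: str) -> AtomRecord:
    """Slice an atom record by the 1-based columns given in the tables."""
    return AtomRecord(
        name=record[0:5].strip(),
        x=float(record[6:20]),
        y=float(record[21:35]),
        z=float(record[36:50]),
        residue_type=record[51:55].strip(),
        residue_seq=record[56:63].strip(),
        potential=record[63:70].strip(),
        element=record[71:73].strip(),
        charge=float(record[74:80]),
    )
```

Because the fields are located by column rather than by whitespace, a reader written this way would still work when adjacent fields fill their full widths and touch each other.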

End Records

The coordinate section for each molecule in the system must end with the word "end" in the first three columns of the last line of the section. In addition, the entire file must end with the word "end", also in the first three columns of the last line of the file.

Sample .car Files

The following examples indicate the correct format for .car files (although the data are not necessarily physical).

The underlined row of numbers at the top of each file simply indicates column numbers--they are not part of a .car file.

Example 1: Nonperiodic, Nonhelical System


         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
!BIOSYM archive 3
PBC=OFF

!DATE Thu Jul 19 18:39:47 1993
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c2 C -0.200
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
H11 11.762401581 -1.450428009 -9.612532616 ETHE 1 h H 0.100
H12 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end

Example 2: 3D-Periodic, Nonhelical System


         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
!BIOSYM archive 3
PBC=ON

!DATE Thu Jul 19 18:39:47 1993
PBC 18.6200 18.6200 18.6200 90.0000 90.0000 90.0000 (P 1)
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c2 C -0.200
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
H11 11.762401581 -1.450428009 -9.612532616 ETHE 1 h H 0.100
H12 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end

Example 3: 2D-Periodic, Nonhelical System

These files are not read by the Discover 2.9.x program, but are read by the Discover 98.0/3.0.0 program.


         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
!BIOSYM archive 3
PBC=2D

!DATE Thu Jul 19 18:39:47 1993
PBC 18.6200 18.6200 90.0000 (P 1)
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c2 C -0.200
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
H11 11.762401581 -1.450428009 -9.612532616 ETHE 1 h H 0.100
H12 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end

Example 4: Nonperiodic, Helical System


         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
!BIOSYM archive 3
HELIX
PBC=OFF

!DATE Thu Jul 19 18:39:47 1993
HELIX 143.3598 7.6194 90.0000 90.0000 0.0000 0.0000
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c2 C -0.200
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
HELIX 121.2043 17.6194 90.0000 90.0000 0.0000 0.0000
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
H11 11.762401581 -1.450428009 -9.612532616 ETHE 1 h H 0.100
H12 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end

Example 5: 2D-Periodic, Helical System

These files are not read by the Discover 2.9.x program, but are read by the Discover 98.0/3.0.0 program.


         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
!BIOSYM archive 3
HELIX
PBC=2D

!DATE Thu Jul 19 18:39:47 1993
PBC 18.6200 18.6200 90.0000 (P 1)
HELIX 143.3598 7.6194 90.0000 90.0000 0.0000 0.0000
C1 3.108016491 0.653186858 -8.526236534 ETHE 1 c2 C -0.200
H11 2.814816952 -0.348720580 -8.761003494 ETHE 1 h H 0.100
H12 2.517393827 1.015321732 -7.710808277 ETHE 1 h H 0.100
C2 4.596541882 0.674196541 -8.131963730 ETHE 1 c2 C -0.200
H21 4.748424053 0.049042296 -7.276969910 ETHE 1 h H 0.100
H22 4.889741421 1.676103950 -7.897196770 ETHE 1 h H 0.100
end
HELIX 121.2043 17.6194 90.0000 90.0000 0.5000 0.4500
C1 11.610512733 -0.825278699 -10.467529297 ETHE 1 c2 C -0.200
H11 11.762401581 -1.450428009 -9.612532616 ETHE 1 h H 0.100
H12 11.903706551 0.176631495 -10.232768059 ETHE 1 h H 0.100
C2 12.459976196 -1.346121073 -11.640320778 ETHE 1 c2 C -0.200
H21 12.166784286 -2.348031998 -11.875079155 ETHE 1 h H 0.100
H22 12.308086395 -0.720973670 -12.495317459 ETHE 1 h H 0.100
end
end


Output Coordinate File (.cor)

The output coordinate file is written automatically at the end of a Discover 2.9.x minimization run and contains the minimized set of coordinates for the system. However, if you want to output a .cor file with the Discover 98.0/3.0.0 program, you must specify it with the writeFile or the Analyze/Output command (see the Discover 2.9.x/98.0/3.0.0 User Guide).

If you want to continue a minimization from the point at which a previous minimization finished, rename the output .cor file to an input .car file before performing the minimization. Alternatively, specify the .cor file name in the begin command (see the Discover 2.9.x/98.0/3.0.0 User Guide for the version of the Discover program that you are using).

The format of the .cor file is identical to that of the .car file.


Elements Data File (elements.dat)

The elements data file (elements.dat) is used by Insight II to retrieve various types of information about the periodic table elements. The data file contains the values for van der Waals and covalent radii; minimum, maximum, and common valences; and atomic weights for all the elements supported by Insight II. The data file also contains the bond lengths for bonds between different elements.

There are three record types in the elements data file:

1.   Comment Record

2.   Element Record

3.   Bond Record

Comment Record

Comment lines begin with a # and may occur anywhere in the file.

Example:


# code vdw radius cov. radius min val max val common weight 

Element Record

Element records contain the various types of information about the elements (Table 4).

Table 4. Element Record Definition

Contents   Comment
element   Record identifier  
element code   One- or two-letter element name  
vdw radius   van der Waals radius of the element  
covalent radius   Covalent radius of the element  
minimum valence   Minimum number of bonds allowed for the element  
maximum valence   Maximum number of bonds allowed for the element  
common valence   Number of bonds in the most common state  
atomic weight   Atomic weight of the element  

Example:


element H   1.10        0.32    1.0    1.0    1.0        1.008 
element C 1.55 0.77 4.0 4.5 4.0 12.011
element N 1.40 0.75 2.5 5.0 3.0 14.007

Bond Record

The bond record specifies the bond lengths between various elements. All the bond records must come after the element records. The bond lengths for all lone pairs are assumed to be 1.1 Å.

Table 5. Bond Record Definition

Contents   Comment
bond   Record identifier  
element code   One- or two-letter element name  
element code   One- or two-letter element name  
bond length   Bond length in angstroms  

Example:


bond H  N        1.03 
bond C C 1.54
bond C O 1.43
bond L S 1.1
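
The three record types can be read with a short Python sketch like the one below. It assumes that fields are separated by whitespace, as in the examples, and it stores bond lengths under the element pair in the order listed; the function name and dictionary keys are hypothetical, and this is not the reader used by Insight II.

```python
ELEMENT_KEYS = (
    "vdw_radius", "covalent_radius",
    "min_valence", "max_valence", "common_valence",
    "atomic_weight",
)


def parse_elements_dat(text: str):
    """Return (elements, bonds) parsed from elements.dat-style text."""
    elements, bonds = {}, {}
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment record
        if fields[0] == "element" and len(fields) == 8:
            if bonds:
                # Bond records must come after all element records.
                raise ValueError(f"line {number}: element record after bond records")
            elements[fields[1]] = dict(zip(ELEMENT_KEYS, map(float, fields[2:])))
        elif fields[0] == "bond" and len(fields) == 4:
            bonds[(fields[1], fields[2])] = float(fields[3])
        else:
            raise ValueError(f"line {number}: unrecognized record {raw!r}")
    return elements, bonds
```

With the example records above, `elements["C"]["max_valence"]` would be 4.5 and `bonds[("H", "N")]` would be 1.03.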


Free-Format Files (.frm)

Free-format files, with the .frm extension, should be stored in and used from the $BIOSYM/data/insight directory. If you do not have write permission in this directory, you can set the INSIGHT_DATA environment variable (for example, with setenv) to another location before you start Insight.

Format files consist of sequentially executed commands from the set described below. All but the FORMAT and BOND_TABLE commands are a single line. The commands in the group beginning FORMAT_ are followed by any number of field specifiers and terminated by an END_FORMAT command. All commands and field specifiers must appear exactly as listed here; no abbreviations or lower case letters are allowed.

IGNORE_FOR number

On input: Skip the given number of lines. Useful for skipping a fixed length header section (a line of the header can be read as the TITLE field).

On output: No function on output.

IGNORE_TO [start] string

On input: Skip lines until a line is encountered which contains the given string. The optional start parameter may be used to indicate in what column testing for the match should begin, or may be * to check for a match anywhere in the line. Note that column numbering begins at 1 for the leftmost column. If no start column is supplied, matching begins in column 1. This command might be used to get down to the ATOM section of a pdb file.

On output: No function on output.

IGNORE_WHILE [start] string

On input: Skip lines while they match the string given. The optional start parameter functions as in IGNORE_TO.

On output: No function on output.
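
The input behavior of the three IGNORE commands can be sketched in Python as functions that advance a position in a list of lines. This is an interpretation, not the Insight II implementation: it assumes that a numeric start column means the string must begin exactly at that column, and that IGNORE_TO leaves the matching line as the next line to be read. All function names are hypothetical.

```python
def line_matches(line: str, string: str, start=None) -> bool:
    """Test a line against string; start is a 1-based column, "*", or None."""
    if start == "*":
        return string in line              # match anywhere in the line
    column = 1 if start is None else int(start)
    return line.startswith(string, column - 1)


def ignore_for(lines, position, number):
    """IGNORE_FOR: skip a fixed number of lines."""
    return min(position + number, len(lines))


def ignore_to(lines, position, string, start=None):
    """IGNORE_TO: skip lines until one matches (assumed to stay unread)."""
    while position < len(lines) and not line_matches(lines[position], string, start):
        position += 1
    return position


def ignore_while(lines, position, string, start=None):
    """IGNORE_WHILE: skip lines for as long as they match."""
    while position < len(lines) and line_matches(lines[position], string, start):
        position += 1
    return position
```

For a pdb-style file, `ignore_to(lines, 0, "ATOM")` would return the index of the first line beginning with ATOM under these assumptions.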

MARKER [start] string

On input: Skip one line from the input file.

On output: Output the given string starting in the column specified or column 1 if no start is given. An asterisk (*) given for the start column is interpreted as column 1 in this command. Used for such things as END markers that separate atom and connectivity sections of the file.

FORMAT_FOR number

On input: Read the specified number of lines using the format that follows. The number field is often a symbolic variable such as $NUM_ATOMS filled by an earlier read.

On output: Write the specified number of lines using the given format. When fields corresponding to atom data are included, the data come from the list of specified atoms starting at the beginning and advancing one atom each time the format is applied. If a symbolic variable like $NUM_ATOMS is used, then it is evaluated to the number of atoms in the object specified in the put command.

FORMAT_TO [start] string

On input: Read input lines using the format until a line containing the given string is encountered. The line with the matching string is not processed. This type of read can be used in conjunction with the MARKER command for files with sections separated by markers such as END.

On output: Write out the information in the format for every atom specified in the put command.

FORMAT_WHILE [start] string

On input: Read input lines until encountering a line whose initial characters do not match the given string. Reading stops so that this non-matching line is the next line to be read. This type of read is designed for pdb-style files where sections are delimited by different keywords. The length and presence of the string do not affect the columns of the format specification.

On output: Write out the information in the format for every atom in the list. The string does not automatically appear in the line being written but may be output using a MARKER field in the format.

FORMAT_TO_EOF

On input: Read input lines using the format until the end of the input file is encountered. This type of read should be the last in a format description file, since all subsequent reads will fail.

On output: Write out the information in the format for every atom in the list.

END_FORMAT

On input: Marks end of a format specification.

On output: Marks end of a format specification.

BIDIRECTIONAL_BONDS

On input: Lets the system know that all bonds are listed twice, once in each direction. Insight .mdf files use this convention.

On output: Lets the system know that all bonds are listed twice, once in each direction. Insight .mdf files use this convention.

Fields

Fields are described by a keyword corresponding to some piece of information about the atom or molecule, a start position in the input line where this information is found, and a field length indicating how many characters are to be read/written for this field. The start position and/or field length may be an asterisk indicating space/comma-delimited fields. Floating-point fields such as ATOM_X may have an optional number of decimal places in their length specifier. The format is length.decimal_length.

The regular field types are described in Table 6.

Table 6. Field Type

Field Name Description Type Examples
ATOM_NAME   Name of atom (5 characters max)   Can be referenced later to add more information to an already defined atom. It is therefore important that atom names be unique.   (string)   CD1,N    
ATOM_NUMBER   User defined atom number, not necessarily the same as the Insight sequence number. Can be referenced later, for example when processing a connectivity section. If no atom numbers are defined they are set to the sequence numbers.   (integer)   1,12,32  
ATOM_X   X coordinate of atom.   (float)    
ATOM_Y   Y coordinate of atom.   (float)    
ATOM_Z   Z coordinate of atom.   (float)    
CELL_A,CELL_B,CELL_C   Unit cell dimensions.   The presence of any of these fields causes subsequent atom coordinates to be read or written as fractional space coordinates.   (float)    
ALPHA,BETA, GAMMA   Unit cell angles in degrees.   The presence of any of these fields causes subsequent atom coordinates to be read or written as fractional space coordinates.   (float)    
BOND_ORDER   Bond order for a corresponding bond. The codes used to indicate various types of bonds may be defined in an optional BOND_TABLE.   (integer)    
BOND_FROM_NAME   Name of the first of a pair of atoms to be connected. It has the special significance of advancing to the next bond record when encountered during output.   (string)    
BOND_FROM_NUMBER   Number of the first of a pair of atoms to be connected. It has the special significance of advancing to the next bond record when encountered during output.   (integer)    
BOND_NUMBER   Sequential number for each bond entry list. No function on input.   (integer)    
BOND_TO_NAME   Name of the second of a pair of atoms that are to be connected.   (string)   HD1  
BOND_TO_NUMBER   Number of the second of a pair of atoms to be connected.   (integer)    
CHARGE   Atom charge.   (float)    
ELEMENT_NAME   Element type of an atom. Converted to an element code by Insight.   (string)   C,H,Ca,Br  
ELEMENT_NUMBER   Periodic table index of the element type of an atom.   (integer)    
GROUP   Charge group name.   (string)    
OCCUPANCY   Occupancy factor for atom.   (float)    
POTENTIAL_TYPE   Potential atom type for an atom (7 characters max).   (string)   c=,hs  
RESIDUE_TYPE   The type of the current monomer/residue (4 characters max).   (string)   GLY,ARGn  
RESIDUE_NUMBER   The monomer/residue sequence number of the current monomer/residue, including an optional chain code and alternate sequence indicator. This is also called the monomer/residue name in Insight parlance (7 characters max).   (string)   1,A12,C172A  
SPACE_GROUP   The name of the crystallographic space group for the molecule. The presence of any of these fields causes subsequent atom coordinates to be read or written as fractional space coordinates.   (string)   P 1, C m c 2_1, R 3b  
TEMP_FACTOR   Temperature factor for atom.   (float)    
TITLE   Title for the system.   (string)    
There are several special fields:  
DEFINE_ATOM   Used to indicate that this format should cause a new atom to be created each time the format is executed. This would usually be true for the first format of a format file but then not used in subsequent sections, such as connectivity, that refer to already defined atoms.   Note: It is a common error to neglect to include DEFINE_ATOM in at least one format.  
NEXT_LINE   This means we are to skip to the next line and read it using the fields defined in the rest of the format. On output a new line is started, and output of the subsequent fields is on that line. There can be any number of NEXT_LINEs in a format.  
NEXT_ATOM   On input:   This means we are to skip to the next atom and save further information in it. This could involve creating a new atom if we are executing a DEFINE_ATOM format but it is more geared to situations where you might have 5 charges per line.   On output:   Skip to the next atom and take any further output from it. Terminate execution of format if we run off the end of the atom list.  
NUM_ATOMS   Variable to hold the number of atoms. Once read it can be used as $NUM_ATOMS in a FORMAT_FOR command. When part of a format during output, the number of selected atoms is written.  
NUM_BONDS   Variable used to hold the total number of connections defined. It is often used to control the number of reads in the connectivity section. On output it is the sum of all the connections for all selected atoms, each bond being counted only once.  
SPACES n   For doing the equivalent of the X format in FORTRAN. Advances the current character position in the input/output line by the number of spaces specified. This has no effect if the next field has an absolute start column. SPACES is useful for implementing FORTRAN formats in the following style:   (I4,1X,A4,1X,1X,3(F9.5,1X).....   is written as:   ATOM_NAME * 4
SPACES 1
ATOM_NUMBER * 4
SPACES 2
ATOM_X * 9
SPACES 1  

Format Descriptions

A format description consists of one of the format commands, followed by a variable number of field definitions and terminated by an END_FORMAT command. When processing a format there is a notion of current position which is important for delimited reading. The initial position is at the start of the input line. As the program reads each of the defined fields, it starts at either the current position, if the start of the field is *, or moves to the given column if one is supplied. If the field length is *, then reading continues until a delimiter is encountered (if a non-digit is found in an integer or float field, reading stops there as if it were a delimiter). If the field length is explicitly given then reading continues for exactly that many characters or until end of line, whichever happens first.
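The current-position rules above can be sketched in Python as follows. This is an illustration only, not the Insight II implementation. The name read_field is hypothetical, spaces and commas are assumed to be the only delimiters, leading delimiters are assumed to be skipped in delimited reads, and the special stop at a non-digit in numeric fields is omitted.

```python
from typing import Optional, Tuple

DELIMITERS = " ,"  # assumption: space and comma delimit '*' fields


def read_field(line: str, pos: int, start: Optional[int],
               length: Optional[int]) -> Tuple[str, int]:
    """Read one field and return (text, new current position).

    pos    -- current position as a 0-based index into the line
    start  -- 1-based start column, or None for '*'
    length -- field length in characters, or None for '*'
    """
    if start is not None:
        pos = start - 1          # an absolute column overrides the position
    if length is None:
        # Delimited read: skip leading delimiters, then read to the next one.
        while pos < len(line) and line[pos] in DELIMITERS:
            pos += 1
        end = pos
        while end < len(line) and line[end] not in DELIMITERS:
            end += 1
    else:
        # Fixed-length read: exactly `length` characters or to end of line.
        end = min(pos + length, len(line))
    return line[pos:end].strip(), end
```

Each call returns the new current position, which is passed to the next call so that fields with a start of * continue where the previous field ended.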

Bond Code Table

To accommodate a wide variety of bond order representations, the free format utility allows definition of a bond code table. This table allows you to associate the bond order codes of the file format being read/written with the Insight II bond orders. A table giving the code for any or all of the Insight II bond orders (SINGLE, DOUBLE, TRIPLE, PART_DOUBLE) is given in the following example:


BOND_TABLE
PART_DOUBLE 1.5
TRIPLE 4
END_TABLE
When using this example during input, a BOND_ORDER field with the value of 1.5 is interpreted as a partial double bond. On output the BOND_ORDER for a triple bond is written as 4.

If no bond table is given, the default bond order codes are:


1=SINGLE
2=DOUBLE
3=TRIPLE
4=PART_DOUBLE
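
The way a bond table overrides the default codes can be sketched in Python. The function names are hypothetical. The sketch assumes that bond orders not listed in the table keep their default codes, which this document does not state explicitly, and it compares codes as floating-point numbers.

```python
DEFAULT_BOND_CODES = {
    "SINGLE": 1.0,
    "DOUBLE": 2.0,
    "TRIPLE": 3.0,
    "PART_DOUBLE": 4.0,
}


def build_bond_codes(table_lines):
    """Return a bond-order -> code mapping for the lines of a BOND_TABLE.

    `table_lines` holds only the entries between BOND_TABLE and END_TABLE,
    for example ["PART_DOUBLE 1.5", "TRIPLE 4"].
    """
    codes = dict(DEFAULT_BOND_CODES)  # assumption: unlisted orders keep defaults
    for entry in table_lines:
        order, code = entry.split()
        if order not in codes:
            raise ValueError("unknown bond order: " + order)
        codes[order] = float(code)
    return codes


def order_from_code(codes, code):
    """Translate a code read from a BOND_ORDER field into a bond order name."""
    for order, value in codes.items():
        if value == float(code):
            return order
    raise KeyError(code)
```

With the example table above, a code of 1.5 maps to PART_DOUBLE and a triple bond is written with the code 4.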

Fractional Coordinates

The free format utility reads and writes atom coordinates as fractional space coordinates if any of the following fields are encountered in the format file: CELL_A, CELL_B, CELL_C, ALPHA, BETA, GAMMA, or SPACE_GROUP. If none of these field types is encountered, Cartesian space coordinates are assumed when reading and writing ATOM_X, ATOM_Y, and ATOM_Z fields.
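
For readers who need to convert between the two coordinate types themselves, the Python sketch below shows one common fractional-to-Cartesian conversion. The axis convention (a along x, b in the xy plane) is an assumption, since this document does not specify the convention Insight II uses, and the function name is hypothetical.

```python
import math


def fractional_to_cartesian(frac, a, b, c, alpha, beta, gamma):
    """Convert fractional coordinates (u, v, w) to Cartesian (x, y, z).

    a, b, c are cell lengths; alpha, beta, gamma are cell angles in degrees.
    Assumed convention: a lies along x and b lies in the xy plane.
    """
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    sin_g = math.sin(math.radians(gamma))

    # Components of the c lattice vector under the assumed convention.
    cx = c * cos_b
    cy = c * (cos_a - cos_b * cos_g) / sin_g
    cz = math.sqrt(c * c - cx * cx - cy * cy)

    u, v, w = frac
    x = u * a + v * b * cos_g + w * cx
    y = v * b * sin_g + w * cy
    z = w * cz
    return (x, y, z)
```

For a cubic cell with a = b = c = 10 and all angles 90 degrees, the fractional point (0.5, 0.5, 0.5) maps to the cell center at approximately (5, 5, 5).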

Atom Creation vs. Referencing Existing Atoms

When to create a new atom is based on whether a format contains a DEFINE_ATOM statement. Every execution of a format containing DEFINE_ATOM creates a new entry in the atom list. Subsequent values from the fields of the format are saved in that atom. When no DEFINE_ATOM is present in a format, then each time an ATOM_NAME or ATOM_NUMBER field is read it is taken as a reference to an existing atom. This is why uniqueness of atom names and numbers can be important.

The notion of a current atom is central to this system. When processing of a file begins, no atoms are defined, so the current atom is null. After a format in which atoms are defined has been processed, the atom list is non-empty. Before each subsequent format is processed, the current atom is set to the first atom of the atom list; after every application of the format, the current atom advances to the next atom in the list. This automatic stepping down the atom list provides an implicit correspondence between different sections of an input file. The most common example is a file that has an atom definition section followed by a connectivity section in which the lines correspond sequentially to the atoms in the atom section. When atom names or numbers are explicitly specified, an attempt is made to find that atom in the existing atom list and make it the current atom.

These rules are:


If there is a DEFINE_ATOM in the format
{
create a new atom and make it the current atom
}
else if there is an atom number/name in format
{
find the specified atom and make it the current atom
}
Add the fields read to the current atom

Advance the current atom to next in the list
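
The rules above can be expressed as a small Python sketch. The class and method names are hypothetical and the atom records are plain dictionaries; this is an illustration of the bookkeeping, not the actual Insight II code.

```python
from typing import Dict, List, Optional


class AtomList:
    """Minimal model of the atom list and the current-atom pointer."""

    def __init__(self) -> None:
        self.atoms: List[Dict[str, object]] = []
        self.current: Optional[int] = None  # null until atoms exist

    def start_format(self) -> None:
        """Before a format is processed, point at the first atom, if any."""
        self.current = 0 if self.atoms else None

    def find(self, key: str, value: object) -> int:
        """Return the index of the atom whose field `key` equals `value`."""
        for index, atom in enumerate(self.atoms):
            if atom.get(key) == value:
                return index
        raise LookupError("no atom with %s = %r" % (key, value))

    def apply(self, fields: Dict[str, object], define_atom: bool) -> None:
        """Apply one execution of a format to the atom list."""
        if define_atom:
            # DEFINE_ATOM: create a new atom and make it current.
            self.atoms.append({})
            self.current = len(self.atoms) - 1
        elif "ATOM_NUMBER" in fields:
            self.current = self.find("ATOM_NUMBER", fields["ATOM_NUMBER"])
        elif "ATOM_NAME" in fields:
            self.current = self.find("ATOM_NAME", fields["ATOM_NAME"])
        if self.current is None or self.current >= len(self.atoms):
            raise IndexError("ran off the end of the atom list")
        # Add the fields read to the current atom, then advance.
        self.atoms[self.current].update(fields)
        self.current += 1
```

A second format without DEFINE_ATOM and without atom names or numbers therefore fills in atoms in list order, which is the implicit correspondence described above.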

Sample Free-Format Files

syblike Example

Following is the sample format file syblike.frm, for doing free_format input/output of syblike files.


# SYBLIKE.FRM
# Format file for doing free_format input/output of syblike files
#
IGNORE_WHILE "*"
# interpret bond order of 5 as partial double
BOND_TABLE
PART_DOUBLE 5
END_TABLE
# number of atom records
FORMAT_FOR 1
NUM_ATOMS 1 4
MARKER 6 "MOL"
TITLE 12 100
END_FORMAT
# atom records
FORMAT_FOR $NUM_ATOMS
DEFINE_ATOM
ATOM_NUMBER 1 4
ELEMENT_NUMBER 5 4
ATOM_X 9 9
ATOM_Y 18 9
ATOM_Z 27 9
ATOM_NAME 36 4
END_FORMAT
IGNORE_WHILE "*"
# number of bond records
FORMAT_FOR 1
NUM_BONDS 1 4
MARKER 6 "MOL"
END_FORMAT
# bond records
FORMAT_FOR $NUM_BONDS
BOND_NUMBER * 4
BOND_FROM_NUMBER * 4
BOND_TO_NUMBER * 4
SPACES 9
BOND_ORDER * 4
END_FORMAT
MARKER "0 MOL"

chemdlike Example

Following is the sample format file chemdlike.frm, for free format input/output of chemdlike files:


# CHEMDLIKE.FRM
# chemdlike format file for free format input/output
#
BIDIRECTIONAL_BONDS
LINE_LENGTH 85
IGNORE_WHILE "*"
# cell parameters
FORMAT_FOR 1
CELL_A 39 8
CELL_B 47 8
CELL_C 55 8
END_FORMAT
FORMAT_FOR 1
ALPHA 22 8
BETA 30 8
GAMMA 38 8
END_FORMAT
# number of atoms
FORMAT_FOR 1
NUM_ATOMS 1 4
TITLE 10 60
END_FORMAT
MARKER " Title2 not used"
FORMAT_FOR $NUM_ATOMS
DEFINE_ATOM
# special marker string to put 0's in all bond_to fields that
# will not be filled with actual bonds
MARKER 42 " 0 0 0 0 0 0 0 0"
ATOM_NUMBER 1 4
ATOM_NAME 6 4
ATOM_X 12 9
ATOM_Y 22 9
ATOM_Z 32 9
BOND_TO_NUMBER 42 4
BOND_TO_NUMBER * 4
BOND_TO_NUMBER * 4
BOND_TO_NUMBER * 4
BOND_TO_NUMBER * 4
BOND_TO_NUMBER * 4
BOND_TO_NUMBER * 4
BOND_TO_NUMBER * 4
CHARGE 75 7.3
# marker for atom group field
MARKER 85 "1"
END_FORMAT

pdblike Example

Following is the sample format file pdblike.frm, for doing free format input/output of pdblike files:


# PDBLIKE.FRM                                  Revised 7/13/89
# Format file for doing free format input/output of pdblike files
#
# NOTE: Since the connectivity section may contain lines with
# fewer bonds than the possible four, there may be messages about
# inability to find atoms to connect to.
IGNORE_TO "ATOM"
FORMAT_WHILE "ATOM"
DEFINE_ATOM
MARKER 1 "ATOM"
ATOM_NUMBER 7 5
ATOM_NAME 14 3
RESIDUE_TYPE 18 3
RESIDUE_NUMBER 23 4
ATOM_X 31 8.3
ATOM_Y * 8.3
ATOM_Z * 8.3
END_FORMAT
MARKER "TER"
# PDB files specify bonds twice, once in each direction, so we
# need to set the bidirectional bonds flag
BIDIRECTIONAL_BONDS
FORMAT_WHILE "CONECT"
MARKER 1 "CONECT"
BOND_FROM_NUMBER 7 5
BOND_TO_NUMBER * 5
BOND_TO_NUMBER * 5
BOND_TO_NUMBER * 5
BOND_TO_NUMBER * 5
END_FORMAT

mdllike Example

Following is the sample format file mdllike.frm, for doing free format input/output of mdllike files:


# MDLLIKE.FRM
# Format file for doing free format input/output of mdllike files
FORMAT_FOR 1
TITLE 1 80
END_FORMAT
#Molecule Header
MARKER ""
#Comments
MARKER "File Written using Insight Free Format Output"
#number of atoms and bonds
FORMAT_FOR 1
NUM_ATOMS 1 3
NUM_BONDS * 3
END_FORMAT
#atom records
FORMAT_FOR $NUM_ATOMS
DEFINE_ATOM
ATOM_X * 10.4
ATOM_Y * 10.4
ATOM_Z * 10.4
SPACES 1
ELEMENT_NAME * 3
#NOTE: we cannot do the charges because they are coded in a non-
#standard way
END_FORMAT
#bond records
FORMAT_FOR $NUM_BONDS
BOND_FROM_NUMBER * 3
BOND_TO_NUMBER * 3
BOND_ORDER * 3.0
END_FORMAT


Standard Graph Definition File (.grf)

This section provides a description of the file format needed for the creation of standard graphs using the Graph/Get command.

As with graph files, you may include any number of comment lines at the top of the file. You may define as many graphs as you like, but remember that only nine graphs fit on the screen without overlapping one another.

Each graph may define multiple plots and may define the title of the graph. Each plot may define the color to be used, the point connection, and the symbol to use if points are to be displayed.

Each element of the graph definition is identified by a string. GRAPH indicates a new graph. The string TITLE is optional. If you want to give the graph a title, enter TITLE on the line following GRAPH. Follow TITLE with a space and then the actual title.

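PLOT indicates a new plot. As mentioned above, a graph definition may contain several plot definitions. For each plot you may optionally specify the color (COLOR), a bar display (BAR), the point connection (CONNECTION), and the point symbol (POINT SYMBOL), as listed in the element order below.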

If you specify BAR ON and CONNECTION OFF and also specify a point symbol, the point symbols are not displayed, because individual points are not drawn when a bar display is used.

Following the optional display definitions, the x, y, and optionally z, functions are defined using the keywords X FUNCTION, Y FUNCTION, and Z FUNCTION, followed by the name of the function for that axis.

Below is the order in which each graph element definition should occur and which elements are optional:

GRAPH <required>
TITLE <optional>
PLOT <required>
COLOR <optional>
BAR <optional>
DEPENDENT AXIS <required, but only if BAR is ON>
SCALE <optional, but only if BAR is specified>
CONNECTION <optional>
POINT SYMBOL <optionally specified if CONNECTION is OFF>
SCALE <optional, but only if POINT SYMBOL is specified>
X FUNCTION <required>
Y FUNCTION <required>
Z FUNCTION <optional>

Blank lines may occur only in the comments at the top. Graph and plot definitions may not contain or be separated by any blank lines. If any required elements are missing or in the wrong order an error declaring a bad file format is displayed.
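
The structure described above can be checked with a short Python sketch. It is a hypothetical reader, not the Graph/Get implementation. It assumes that text following an exclamation mark is a comment (as in the sample file), that lines before the first GRAPH are header comments, and it verifies only that each plot has its required X FUNCTION and Y FUNCTION, not the full element order.

```python
PLOT_KEYS = ("COLOR", "BAR", "DEPENDENT AXIS", "SCALE", "CONNECTION",
             "POINT SYMBOL", "X FUNCTION", "Y FUNCTION", "Z FUNCTION")
ALL_KEYS = ("GRAPH", "TITLE", "PLOT") + PLOT_KEYS


def split_keyword(line):
    """Return (keyword, value) for a line, or (None, line) if unrecognized."""
    for keyword in ALL_KEYS:
        if line == keyword or line.startswith(keyword + " "):
            return keyword, line[len(keyword):].strip()
    return None, line


def parse_grf(text):
    """Parse graph definitions into a list of {'title', 'plots'} dicts."""
    graphs = []
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # assumption: '!' starts a comment
        keyword, value = split_keyword(line)
        if keyword == "GRAPH":
            graphs.append({"title": None, "plots": []})
        elif not graphs:
            continue  # header comment lines before the first GRAPH
        elif keyword == "TITLE":
            graphs[-1]["title"] = value
        elif keyword == "PLOT":
            graphs[-1]["plots"].append({})
        elif keyword in PLOT_KEYS and graphs[-1]["plots"]:
            graphs[-1]["plots"][-1][keyword] = value
        else:
            raise ValueError("bad file format: " + repr(raw))
    for graph in graphs:
        if not graph["plots"]:
            raise ValueError("bad file format: GRAPH without PLOT")
        for plot in graph["plots"]:
            if "X FUNCTION" not in plot or "Y FUNCTION" not in plot:
                raise ValueError("bad file format: missing X or Y FUNCTION")
    return graphs
```

A blank line inside a graph definition is rejected by this sketch, in keeping with the rule that blank lines may occur only in the comments at the top.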

For standard graphs, all functions given in the .grf must be contained within the graph data file (.tbl). If a specific function in the standard graph definition file (.grf) cannot be located in the graph data file (.tbl), an error does not occur, but an informational message is displayed and the plot is not created.

Sample Graph Definition File

This sample of a standard graph file defines five graphs.

The first is 2D, with only one plot and a title. Notice that you may optionally specify a Z Function.

The second is 3D, with only one plot. This graph definition accepts the default color and point connection attributes.

The third graph defines two plots, and each is 2D. The first plot defines RED to be the color. The second plot defines the color to be BLUE (RED and BLUE are hues; see the Graph/Color commands description), and specifies that only the points should be displayed (CONNECTION is OFF) using the TRIANGLE symbol.

The fourth graph defines two plots, the first 3D and the second 2D. In the first, only points are displayed, using the BOX symbol. The color uses an RGB specification, in this case yellow. In the second plot, both lines and points are displayed (unless specifically turned off, CONNECTION is ON). The color is light blue, the point symbol is STAR, and the scale of the points is 4.0.

The last graph defines a single 2D plot. The color is yellow, with a bar display, and Y is specified as the dependent axis.


GRAPH ! First graph !
TITLE Sample 1
PLOT ! Only plot in first graph !
X FUNCTION function_1
Y FUNCTION function_2
GRAPH ! Second graph !
PLOT ! Only plot in second graph!
X FUNCTION time
Y FUNCTION energy
Z FUNCTION None
GRAPH ! Third graph !
PLOT ! First plot in third graph !
COLOR RED
X FUNCTION function_a
Y FUNCTION function_b
PLOT ! Second plot in third graph !
COLOR BLUE
CONNECTION OFF
POINT SYMBOL TRIANGLE
X FUNCTION function_c
Y FUNCTION function_c
GRAPH ! Fourth graph !
TITLE Sample
PLOT ! First plot in fourth graph !
COLOR 255,255,0
CONNECTION OFF
POINT SYMBOL BOX
X FUNCTION function_a
Y FUNCTION function_b
Z FUNCTION function_c
PLOT ! Second plot in fourth graph !
COLOR 0,255,255
POINT SYMBOL STAR
SCALE 4.0
X FUNCTION function_1
Y FUNCTION function_2
GRAPH ! Last graph !
PLOT ! Only plot in last graph !
COLOR YELLOW
BAR ON
DEPENDENT AXIS Y
X FUNCTION function_3
Y FUNCTION function_4


Hessian Files (.hessian, .hessianx, .xhessian)

The .hessian file contains the data for sets of gradients.

Following a successful completion, the finite-difference data are used to generate a second-derivative matrix. This is mass weighted and diagonalized to generate the harmonic vibrational spectrum. The second-derivative matrix (not mass-weighted) is appended to the .hessian file. Following the data for the last displacement, the flag matrix appears, followed by the lower triangle of elements of the second-derivative matrix. These data are in 5f12.7 format:


HESSIAN
H(1,1)
H(2,1) H(2,2)
H(3,1) H(3,2) H(3,3)
The data continue to H(3N,3N), where N is the number of atoms.
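
Because only the lower triangle is stored, a program that reads these data must mirror the elements to recover the full symmetric matrix. The Python sketch below shows the index bookkeeping. The function name is hypothetical, and the values are assumed to have been read already into a flat list in the row order shown above.

```python
def full_from_lower_triangle(values, n_atoms):
    """Rebuild the full 3N x 3N symmetric matrix from its lower triangle.

    `values` lists H(1,1), H(2,1), H(2,2), H(3,1), ... in row order.
    """
    dim = 3 * n_atoms
    expected = dim * (dim + 1) // 2
    if len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))
    matrix = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):
        for j in range(i + 1):
            # Store the element and its mirror image across the diagonal.
            matrix[i][j] = values[k]
            matrix[j][i] = values[k]
            k += 1
    return matrix
```

For a single atom, the six stored values fill a 3 x 3 matrix; for N atoms, 3N(3N+1)/2 values are expected.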

The Discover program can output .hessian files, and the quantum programs produce and/or use files having "hessian" as part or all of their suffix.

The .hessian suffix indicates an ASCII Hessian in Discover format, and .hessianx, an ASCII Hessian in Turbomole format. Zindo, DMol and Turbomole can read both .hessian and .hessianx formats as input. Files of type .xhessian (XDR format) are no longer produced by the quantum programs (however, the quantum programs can still read them).

The following Hessian files are produced by quantum runs:

product calculation type Hessian file type
DMol   optimization   .hessian  
  frequency   .hessian  
Turbomole   optimization   .hessian  
  frequency   .hessianx  
Zindo   optimization   .hessian  
  frequency   .hessian  

Sample .hessian File


$hessian
1 1 0.6780639398 0.0000000000 0.0000000000 -0.1259825011 0.0000000000
1 2 0.0000000000 -0.2760407194 0.0000000000 -0.0947402056 -0.2760407194
1 3 0.0000000000 0.0947402056
2 1 0.0000000000 0.2160004237 0.0000000000 0.0000000000 -0.0719526695
2 2 0.0000000000 0.0000000000 -0.0720238769 0.0000000000 0.0000000000
2 3 -0.0720238769 0.0000000000
3 1 0.0000000000 0.0000000000 1.2506493403 0.0000000000 0.0000000000
3 2 -1.0175861358 -0.0926737292 0.0000000000 -0.1165316022 0.0926737292
3 3 0.0000000000 -0.1165316022
4 1 -0.1259825011 0.0000000000 0.0000000000 0.0877428007 0.0000000000
4 2 0.0000000000 0.0191198500 0.0000000000 -0.0434219786 0.0191198500
4 3 0.0000000000 0.0434219786
5 1 0.0000000000 -0.0719526695 0.0000000000 0.0000000000 0.0239681673
5 2 0.0000000000 0.0000000000 0.0239922511 0.0000000000 0.0000000000
5 3 0.0239922511 0.0000000000
6 1 0.0000000000 0.0000000000 -1.0175861358 0.0000000000 0.0000000000
6 2 1.1102928577 -0.0172186375 0.0000000000 -0.0463533610 0.0172186375
6 3 0.0000000000 -0.0463533610
7 1 -0.2760407194 0.0000000000 -0.0926737292 0.0191198500 0.0000000000
7 2 -0.0172186375 0.2759933671 0.0000000000 0.1240272754 -0.0190724977
7 3 0.0000000000 -0.0141349086
8 1 0.0000000000 -0.0720238769 0.0000000000 0.0000000000 0.0239922511
8 2 0.0000000000 0.0000000000 0.0240124468 0.0000000000 0.0000000000
8 3 0.0240191791 0.0000000000
9 1 -0.0947402056 0.0000000000 -0.1165316022 -0.0434219786 0.0000000000
9 2 -0.0463533610 0.1240272754 0.0000000000 0.1525595219 0.0141349086
9 3 0.0000000000 0.0103254413
10 1 -0.2760407194 0.0000000000 0.0926737292 0.0191198500 0.0000000000
10 2 0.0172186375 -0.0190724977 0.0000000000 0.0141349086 0.2759933671
10 3 0.0000000000 -0.1240272754
11 1 0.0000000000 -0.0720238769 0.0000000000 0.0000000000 0.0239922511
11 2 0.0000000000 0.0000000000 0.0240191791 0.0000000000 0.0000000000
11 3 0.0240124468 0.0000000000
12 1 0.0947402056 0.0000000000 -0.1165316022 0.0434219786 0.0000000000
12 2 -0.0463533610 -0.0141349086 0.0000000000 0.0103254413 -0.1240272754
12 3 0.0000000000 0.1525595219
$hessian (projected)
1 1 0.6780518841 0.0000000000 0.0000000000 -0.1259711247 0.0000000000
1 2 0.0000000000 -0.2760403797 0.0000000000 -0.0947424832 -0.2760403797
1 3 0.0000000000 0.0947424832
2 1 0.0000000000 0.2159909630 0.0000000000 0.0000000000 -0.0719448964
2 2 0.0000000000 0.0000000000 -0.0720230333 0.0000000000 0.0000000000
2 3 -0.0720230333 0.0000000000
3 1 0.0000000000 0.0000000000 1.2506493402 0.0000000000 0.0000000000
3 2 -1.0175861358 -0.0926737292 0.0000000000 -0.1165316022 0.0926737292
3 3 0.0000000000 -0.1165316022
4 1 -0.1259711247 0.0000000000 0.0000000000 0.0877339193 0.0000000000
4 2 0.0000000000 0.0191186027 0.0000000000 -0.0434183083 0.0191186027
4 3 0.0000000000 0.0434183083
5 1 0.0000000000 -0.0719448964 0.0000000000 0.0000000000 0.0239642810
5 2 0.0000000000 0.0000000000 0.0239903077 0.0000000000 0.0000000000
5 3 0.0239903077 0.0000000000
6 1 0.0000000000 0.0000000000 -1.0175861358 0.0000000000 0.0000000000
6 2 1.1102928578 -0.0172186375 0.0000000000 -0.0463533610 0.0172186375
6 3 0.0000000000 -0.0463533610
7 1 -0.2760403797 0.0000000000 -0.0926737292 0.0191186027 0.0000000000
7 2 -0.0172186375 0.2759938209 0.0000000000 0.1240265791 -0.0190720439
7 3 0.0000000000 -0.0141342123
8 1 0.0000000000 -0.0720230333 0.0000000000 0.0000000000 0.0239903077
8 2 0.0000000000 0.0000000000 0.0240163628 0.0000000000 0.0000000000
8 3 0.0240163628 0.0000000000
9 1 -0.0947424832 0.0000000000 -0.1165316022 -0.0434183083 0.0000000000
9 2 -0.0463533610 0.1240265791 0.0000000000 0.1525603398 0.0141342123
9 3 0.0000000000 0.0103246234
10 1 -0.2760403797 0.0000000000 0.0926737292 0.0191186027 0.0000000000
10 2 0.0172186375 -0.0190720439 0.0000000000 0.0141342123 0.2759938209
10 3 0.0000000000 -0.1240265791
11 1 0.0000000000 -0.0720230333 0.0000000000 0.0000000000 0.0239903077
11 2 0.0000000000 0.0000000000 0.0240163628 0.0000000000 0.0000000000
11 3 0.0240163628 0.0000000000
12 1 0.0947424832 0.0000000000 -0.1165316022 0.0434183083 0.0000000000
12 2 -0.0463533610 -0.0141342123 0.0000000000 0.0103246234 -0.1240265791
12 3 0.0000000000 0.1525603398
$end


Dynamics Trajectory History Files (.his and .fhis)

Special Information for the Discover 2.9.x Program

There are two forms of the Discover 2.9.x history file, .his and .fhis.

.his is the file to which the dynamics history is periodically written during a Discover 2.9.x dynamics calculation. It is a binary file, and for a reasonable-length dynamics run it can become fairly large. It contains coordinates and other pertinent information for the system being simulated. The frequency with which this file is updated can be modified with the initialize and restart dynamics commands of the Discover 2.9.x program (see the Discover User Guide).

The .his file is written using FORTRAN unformatted I/O with the records described in Table 7. For each record, the types of the variables and the lengths of the arrays, if applicable, are given. The first frame contains extra information about the atom types, movable atoms, etc.; subsequent frames contain only the changing information--coordinates, velocities, etc.

Table 7. Format of .his File

record type array contents
1   integer     control variable: 0 for first frame, not 0 for subsequent frames  
2   character*4   20   character string giving version information  
  real*8     control variable: the Discover version (Vershn)  
3   character*4   20   title  
4   character*4   20   title  
5   integer     number of forcefield atom types (NAtTyp)  
  character*4   NAtTyp   names of forcefield atom types  
  real*8   NAtTyp   atomic masses of forcefield atom types  
6   integer     number of residue names (NNmRes)  
  character*4   NNmRes   names of residues  
7   integer     number of atoms in the system (NAtoms)  
  integer   NAtoms   index of atom's forcefield atom type  
  character*4   NAtoms   name of atom (for Vershn < 2.9.0)  
  character*5   NAtoms   name of atom (for Vershn >= 2.9.0)  
8   integer     reserved  
  integer     number of moveable atoms in the system (NAtMov)  
  integer   NAtMov   index of moveable atoms (for Vershn >= 2.6, not present prior to that)  
9   integer     number of molecules (NMol)  
  integer   NMol   number of atoms per molecule  
  integer   NMol   number of residues per molecule  
10   integer     total number of residues (NRes)  
  integer   2,NRes   first and last atoms in each residue  
  integer   NRes   index into names of residues  
11A   integer     number of bonds (NBonds)  
11B   integer   2,NBonds   I and J atoms for each bond (this record exists only if NBonds is > 0)  
12   real*8   6   unit cell parameters: a, b, c, alpha, beta, gamma  
  real*8   3,3   lattice vectors  
  real*8   3,3   transformation matrix from crystal to Cartesian coordinates  
  real*8   3,3   transformation matrix from Cartesian to crystal coordinates  
  real*8   3,3,196   matrices for space group symmetry operators  
  real*8   3,196   translation vector for each operator in the space group  
  real*8   3,3,196   rotation matrix for each operator  
  integer     number of symmetry operations in the space group  
  real*8   4   reserved  
  integer     reserved  
  real*8   6   reserved  
  integer   6   reserved  
13   integer     number of component energies (NEner)  
  real*8     time step in fs  
  integer     frequency (in steps) for writing the frames  
  integer     starting step number  
14   real*8     total energy (kcal mol-1)  
  real*8     total potential energy (kcal mol-1)  
  real*8     total kinetic energy (kcal mol-1)  
  real*8   NEner   component energies (kcal mol-1)  
  real*8   NMol   potential energy per molecule (kcal mol-1)  
  real*8   NMol,NEner   component energies per molecule (kcal mol-1)  
  real*8   NMol   van der Waals dispersion energy per molecule  
  real*8   NMol   van der Waals repulsion energy per molecule  
  real*8   NMol   van der Waals energy per molecule (kcal mol-1)  
  real*8   NMol   coulombic energy per molecule (kcal mol-1)  
  real*8     pressure in bar  
  real*8     reserved  
  real*8   3x3   pressure tensor in bar  
  real*8   3x3   reserved  
  real*8   3x3   kinetic energy contribution to pressure  
  real*8   3x3   reserved  
  real*8   3x3   virial contribution to the pressure  
  real*8   3x3   reserved  
15   real*4   3,NAtoms   Cartesian coordinates of the atoms in angstroms  
16   real*4   3,NAtoms   Cartesian velocities of the atoms in angstroms per timestep  
Subsequent frames repeat the following records:  
N   integer     control variable: 0 for first frame, not 0 for subsequent frames  
N+1   real*8     total energy (kcal mol-1)  
  real*8     total potential energy (kcal mol-1)  
  real*8     total kinetic energy (kcal mol-1)  
  real*8   NEner   component energies (kcal mol-1)  
  real*8   NMol   potential energy per molecule (kcal mol-1)  
  real*8   NMol,NEner   component energies per molecule (kcal mol-1)  
  real*8   NMol   van der Waals dispersion energy per molecule  
  real*8   NMol   van der Waals repulsion energy per molecule  
  real*8   NMol   van der Waals energy per molecule (kcal mol-1)  
  real*8   NMol   coulombic energy per molecule (kcal mol-1)  
  real*8     pressure in bar  
  real*8     reserved  
  real*8   3x3   pressure tensor in bar  
  real*8   3x3   reserved  
  real*8   3x3   kinetic energy contribution to pressure  
  real*8   3x3   reserved  
  real*8   3x3   virial contribution to the pressure  
  real*8   3x3   reserved  
N+2   real*8   6   unit cell parameters: a, b, c, alpha, beta, gamma  
  real*8   3,3   lattice vectors in angstroms  
In the following two records, prior to version 2.6 data were written for all atoms (N = NAtoms); for version 2.6 or later only the coordinates and velocities for moving atoms are present (N = NAtMov).  
N+3   real*4   3,N   Cartesian coordinates in angstroms  
N+4   real*4   3,N   Cartesian velocities in angstroms per timestep  
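
Reading the .his file from a language other than FORTRAN requires knowledge of how the compiler frames unformatted records. The Python sketch below assumes the common layout in which each record is preceded and followed by a 4-byte integer giving its length, stored in little-endian byte order. Neither assumption is stated in this document, and both depend on the compiler and machine that wrote the file. The function name is hypothetical.

```python
import io
import struct


def read_record(stream, byte_order="<"):
    """Read one unformatted record and return its payload as bytes.

    Assumed framing: 4-byte length, payload, 4-byte length (repeated).
    Returns None at end of file.
    """
    head = stream.read(4)
    if len(head) < 4:
        return None
    (length,) = struct.unpack(byte_order + "i", head)
    payload = stream.read(length)
    tail = stream.read(4)
    if len(payload) != length or len(tail) != 4:
        raise ValueError("truncated record")
    if struct.unpack(byte_order + "i", tail)[0] != length:
        raise ValueError("record length markers disagree")
    return payload


# Build a one-record stream in memory: record 1 of the first frame,
# whose single integer control variable is 0.
demo = io.BytesIO(struct.pack("<i", 4) + struct.pack("<i", 0)
                  + struct.pack("<i", 4))
control = struct.unpack("<i", read_record(demo))[0]
```

If the length markers disagree when a real file is read, the byte order or marker size is probably different from what is assumed here.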

.fhis is a formatted ASCII version of the .his file. The .fhis file is created from the .his file by the utility formhis and can be reconverted into an unformatted history file with the utility uformhis. The .fhis file is a text file that can be viewed and edited. It is also independent of a particular machine's representation of numbers and so can be transferred between dissimilar computers. The file is written using FORTRAN formatted I/O. Table 8 shows the FORTRAN format used in creating the .fhis file. A format that is enclosed in parentheses and preceded by a number indicates that the information is on more than one line, each of which has the indicated format. The number indicates the number of lines.
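
Because the .fhis file uses fixed-width FORTRAN formats, numeric fields may touch one another with no separating space, so lines should be split by column width rather than by whitespace. The Python sketch below illustrates this for lines written with a repeated E14.8 format. The function name is hypothetical.

```python
def parse_fixed_floats(line, width=14):
    """Split a line into fixed-width fields and convert each to float.

    A width of 14 corresponds to the E14.8 format; blank fields are skipped.
    """
    line = line.rstrip("\n")
    values = []
    for i in range(0, len(line), width):
        field = line[i:i + width]
        if field.strip():
            values.append(float(field))
    return values
```

A line written with 3E14.8 yields three values, even when a negative number leaves no space between adjacent fields.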

Table 8. Format of .fhis File

record format contents
1   I1   control variable: 0 for first frame, not 0 for subsequent frames  
2   20A4,F4.2   character string identifying the version; control variable: the Discover version (Vershn)  
3   20A4   title  
4   20A4   title  
5   9I5   number of forcefield atom types (NAtTyp), number of residue names (NNmRes), number of atoms (NAtoms), reserved, number of moveable atoms (NAtMov), number of molecules (NMol), number of residues (NRes), number of bonds (NBonds), number of space group symmetry operations (NSymOp)  
6   NAtTyp(A4,F10.6)   name and atomic mass for each forcefield atom type  
7   NNmRes(A4)   name of each residue  
8   NAtoms(I3,A4)   index of forcefield type and name for each atom (for Vershn >= 2.9.0 the format is (I3,A5))  
9   NMol(2I5)   number of atoms and residues for each molecule  
10   NRes(3I5)   first and last atom and index of name for each residue  
11   NBonds(2I5)   I and J atoms for each bond  
Record 12 is present only if Vershn is greater than or equal to 2.6  
12   NAtMov(I5)   indices of the moving atoms  
13   11(3E14.8)   unit cell parameters: a, b, c, alpha, beta, gamma
unit cell vectors (3x3 matrix)  
transformation matrix cell coordinates to Cartesian coordinates (3x3 matrix)
transformation matrix Cartesian coordinates to cell coordinates (3x3 matrix)  
Records 14-16 are present only if the calculation uses periodic boundary conditions (PBC), in which case NSymOp is greater than 0.  
14   NSymOp(9F5.2)   matrices for space group symmetry operators  
15   NSymOp(3F5.2)   translation vector for each operator in the space group  
16   NSymOp*3(3E14.8)   rotation matrix for each operator  
17   3E14.8   reserved (3 long real*8 vector)  
18   E14.8   reserved  
19   2(3E14.8)   reserved (6 long real*8 vector)  
20   7I3   reserved (7 long integer vector)  
21   3I10,F6.2   number of component energies (NEner)
frequency (in steps) for writing frames
initial step number
timestep in fs  
22   3E14.8   total energy, potential energy, and kinetic energy  
23   NEner(E14.8)   component energies  
24   NMol*NEner(E14.8)   component energies for each molecule (the index of the molecules runs fastest; thus, the list of the first component energies for each molecule comes first, then for the second component energy)  
25   NMol(5E14.8)   the total, dispersion, repulsion, van der Waals, and electrostatic energies for each molecule  
26   E14.8   the pressure for PBC calculations (in bar)  
27   E14.8   reserved (real*8 number)  
28   3(3E14.8)   pressure tensor (3x3 matrix) in bar  
29   3(3E14.8)   reserved (3x3 matrix)  
30   3(3E14.8)   kinetic energy contribution to the pressure (3x3 matrix)  
31   3(3E14.8)   reserved (3x3 matrix)  
32   3(3E14.8)   virial contribution to the pressure (3x3 matrix)  
33   3(3E14.8)   reserved (3x3 matrix)  
34   NAtoms(3E14.8)   x, y and z coordinates for each atom  
35   NAtoms(3E14.8)   x, y and z velocities for each atom in angstroms per timestep  
Subsequent frames repeat the following records:  
N   I1   control variable: 0 for first frame
1 for subsequent frames  
N+1   3E14.8   total energy, potential energy, and kinetic energy  
N+2   NEner(E14.8)   component energies  
N+3   NMol*NEner(E14.8)   component energies for each molecule (the index of the molecules runs fastest; thus, the list of the first component energies for each molecule comes first, then for the second component energy)  
N+4   NMol(5E14.8)   the total, dispersion, repulsion, van der Waals, and electrostatic energies for each molecule  
N+5   E14.8   the pressure for PBC calculations (in bar)  
N+6   E14.8   reserved (real*8 number)  
N+7   3(3E14.8)   pressure tensor (3x3 matrix) in bar  
N+8   3(3E14.8)   reserved (3x3 matrix)  
N+9   3(3E14.8)   kinetic energy contribution to the pressure (3x3 matrix)  
N+10   3(3E14.8)   reserved (3x3 matrix)  
N+11   3(3E14.8)   virial contribution to the pressure (3x3 matrix)  
N+12   3(3E14.8)   reserved (3x3 matrix)  
N+13   2(3E14.8)   unit cell parameters: a, b, c, alpha, beta, gamma  
N+14   3(3E14.8)   unit cell vectors  
In the following two records, prior to Discover version 2.6, data were written for all atoms (N = NAtoms); for version 2.6 and later, the coordinates and velocities are present only for moving atoms (N = NAtMov).  
N+15   N(3E14.8)   x, y and z coordinates for each atom  
N+16   N(3E14.8)   x, y and z velocities for each atom in angstroms per timestep  
For Discover version 2.6 and later there are 18 component energies:  
  1   bond  
  2   angle  
  3   torsion  
  4   out-of-plane  
  5   bond-bond  
  6   bond-angle  
  7   angle-angle  
  8   bond-torsion  
  9   angle-torsion  
  10   angle-angle-torsion  
  11   1-3 bond-bond  
  12   out-of-plane-out-of-plane  
  13   torsion-torsion  
  14   total van der Waals  
  15   van der Waals repulsion  
  16   van der Waals attraction (dispersion)  
  17   electrostatic  
  18   10-12 hydrogen bond  
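
Because the .fhis records are written with fixed-width FORTRAN formats, adjacent fields are not guaranteed to be separated by blanks, so it is safer to split each line by column position than by whitespace. The following Python sketch illustrates this for the I and E descriptors in Table 8. It assumes that each line holds exactly the fields named by its format; the helper names (read_ints, read_reals, energy_for) are illustrative only and are not part of formhis or any other Insight II utility.

```python
def read_ints(line, count, width):
    """Split `count` fixed-width integer fields (9I5 -> count=9, width=5)."""
    return [int(line[i * width:(i + 1) * width]) for i in range(count)]


def read_reals(line, count, width=14):
    """Split `count` fixed-width real fields (3E14.8 -> count=3, width=14)."""
    return [float(line[i * width:(i + 1) * width]) for i in range(count)]


def energy_for(values, n_mol, component, molecule):
    """Pick one value from the per-molecule component energies (record 24).

    The molecule index runs fastest: all molecules for the first component
    come first, then all molecules for the second component, and so on.
    Both indices are 0-based here.
    """
    return values[component * n_mol + molecule]
```

For example, record 5 (format 9I5) could be read with read_ints(line, 9, 5), and NAtoms would then be the third value returned.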

Special Information for the Discover 98.0/3.0.0 Program

The Discover 98.0/3.0.0 program typically sends information during dynamics runs to .arc, .out, .tbl, and/or user-named files. See the Insight online help and the discussion of the output command in the Discover 2.9.7/98.0/3.0.0 User Guide for information on controlling what the Discover 98.0/3.0.0 program includes in these files and how often information is output during a dynamics run.

However, the Discover 98.0/3.0.0 program can also read and write history files. These files are in the same format as those of Discover 2.9.x and can be read by the Insight program. (Some fields are set to 0.0, however, since certain data are not stored by the Discover 98.0/3.0.0 program.) The history file is written by using a print command during a minimization or dynamics simulation.

The readFile command may be used to read a particular frame of a history file into the Discover program. In this way a history file might be converted into an archive file, for instance, by using the writeFile archive command. The return value of the readFile command, when it is applied to a history or archive file, is the potential energy of that frame. This would allow you to, for instance, construct scripts that sort the frames in an archive or history file based on energy.


Layout Template File (.ltpl)

The layout template file contains descriptions of one or more layout templates. These templates describe the relative sizes and positions of windows in a window layout.

A system layout template file that contains simple default templates is read when Insight II starts up. These templates are therefore automatically available in every session. If you create some layout templates that you want to make generally available to all users at your site, you can add them to this file and they will be present each time Insight II is run. The pathname of this file is:

$BIOSYM/data/insight/insight.ltpl

The layout template file is a free-format file. Blank lines are ignored, and any line that begins with the character "!" is considered a comment and is also ignored.

The first line must contain the header !BIOSYM layout_template 1.

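The next lines contain definitions of layout templates. As in the sample file below, each definition begins with the following two lines, where type is Free_format or Stacked:

Layout_template:name
Layout_template_type:type

For Free_format templates the next lines contain template entry definitions, each with the following format (as in the sample file below):

Layout_template_entry:
Left:value
Right:value
Top:value
Bottom:value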

For Stacked templates, the next line contains the keyword Stack_offset: followed by the number of pixels the window is to be offset from the top left corner of the preceding window.

Sample Layout Template File


!BIOSYM layout_template 1

Layout_template:SIDE_BY_SIDE
Layout_template_type:Free_format

Layout_template_entry:
Left:0.000000
Right:50.000000
Top:0.000000
Bottom:100.000000

Layout_template_entry:
Left:50.000000
Right:100.000000
Top:0.000000
Bottom:100.000000

Layout_template:STACKED
Layout_template_type:Stacked
Stack_offset:30
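
The keyword:value layout of the sample file above can be read with a few lines of code. The following Python sketch is based only on the keywords that appear in the sample file and on the comment and blank-line rules stated earlier; the function name parse_layout_templates and the dictionary layout it returns are illustrative assumptions, not the way Insight II itself reads the file.

```python
def parse_layout_templates(text):
    """Return a list of template dictionaries parsed from .ltpl file text."""
    lines = [ln.strip() for ln in text.splitlines()]
    # The header is itself a "!" line, so check it before skipping comments.
    first = next((ln for ln in lines if ln), "")
    if not first.startswith("!BIOSYM layout_template"):
        raise ValueError("missing !BIOSYM layout_template header")

    templates = []
    for ln in lines:
        if not ln or ln.startswith("!"):
            continue  # blank lines and comment lines are ignored
        key, _, value = ln.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Layout_template":
            templates.append({"name": value, "type": None, "entries": []})
        elif not templates:
            raise ValueError("%s before any Layout_template" % key)
        elif key == "Layout_template_type":
            templates[-1]["type"] = value
        elif key == "Layout_template_entry":
            templates[-1]["entries"].append({})
        elif key == "Stack_offset":
            templates[-1]["stack_offset"] = int(value)
        elif key in ("Left", "Right", "Top", "Bottom"):
            if not templates[-1]["entries"]:
                raise ValueError("%s before any Layout_template_entry" % key)
            templates[-1]["entries"][-1][key] = float(value)
        else:
            raise ValueError("unrecognized keyword %s" % key)
    return templates
```

Applied to the sample file, this sketch returns two templates: SIDE_BY_SIDE with two entries, and STACKED with a stack offset of 30.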


Exclusion Shell File (.ludi_pseudo_protein, fort.12)

This file describes the exclusion shell that Ludi constructs from the active analogs. No fragment will be fit outside of this shell. The file is in PDB format and can be read into Insight II by turning the Load_Pseudo_Protein parameter on in the Ludi/Load command. The fort.12 file produced by the Ludi/Run background job is automatically renamed to <run_name>.ludi_pseudo_protein when the background job completes.


Molecular Data File (.mdf)

The molecular data (.mdf) file contains static information about a molecular system. This is information that does not change during the course of a calculation.

Note that .mdf files are used by several Insight II products; therefore, some of the information present may be ignored by some programs or used only by certain programs.

The molecular data file has been changed minimally since the previous versions of the Discover and Insight programs. The primary change is that the potential type identifier can now be longer (up to seven characters).

Note that the order of connections listed in the .mdf file is important for atoms whose out-of-plane (oop) flag is 2 or for which chirality information is given. Therefore, these connections must not be reordered.

The .mdf file consists of one header, one end statement, and three main sections:

1.   Topology section (see page 51)

2.   Symmetry section (see page 55)

3.   Atomset section (see page 60)

The sections begin and end with the character #. The order of the sections is not important, and unneeded sections can be omitted. Records within the sections begin with keyword identifiers that start with @.

In addition, comment records, beginning with !, are allowed.

The overall structure of an .mdf file is shown in Table 9. Descriptions and examples of each major part follow.

Table 9. Structure of an .mdf File

section   record   where described   contents
<first line>   !header   page 49   file identifier  
<any line>   !comment   page 51   optional comments  
#topology     page 51   general topology of the molecule  
  @column   page 51   column headers for types of data contained in atom records  
  @molecule name   page 52   molecule identifier  
  atom records   page 52   atom name, element, forcefield atom type, charge group name, isotopic number, formal charge, atomic charge, switching atom flag, out-of-plane flag, chirality flag, occupancy, X-ray temperature factor, number of connections, connectivity (order and types of data as defined by @column record)  
#symmetry     page 55   symmetry information  
  @periodicity   page 56   translational periodicity of the system  
  @group   page 58   symmetry group name associated with periodicity of system  
  @matrix   page 58   matrix representations of the symmetry operators  
  @helix   page 59   indicates that helical symmetry is present  
#atomset     page 60   named subsets of atoms for specific purposes  
  @degree n   page 60   number of atoms associated (used for general case)  
  @list   page 61   list of associated atoms (used, for example, to define and name pseudoatoms, backbone atoms, or subsets)  
  @quartet   page 63   states that four atoms are associated (used, for example, to define and name torsional or dihedral angles)  
<last line>   #end   page 63   end-of-file marker  

Header Record

The first record of a molecular data file must be:


!BIOSYM molecular_data #

The ! must be the first character in the file. The Discover program interprets this line as indicating an ASCII file containing molecular data records as outlined in this section. The string molecular_data indicates that the contents of the file are those of an .mdf file; the # is replaced by an actual number, which identifies the file format for the Discover program. The number 4, for example, indicates that the file format is as specified here for the Discover program, versions 2.9.5/3.2 and later.

Comment Record

Comment lines begin with an ! and may occur anywhere after the first record. By convention, the Insight program inserts a system title and a date as a comment record after the version record.

Topology Section

The topology section contains tabular information about atoms in a molecule or system of associated molecules. Its first line is:


#topology

Next, the column headings are defined. The molecule name and atomic data follow.

Column Record

The column headings for the table of atomic information are defined at the beginning of the topology section, in an @column record. All @column records must precede the first molecule or atom record.

Column records have the following syntax:

@column # type specifier

where @column is a keyword identifying the record, # is the number of a column containing a certain type of atomic data, and specifier (for example, the name of a forcefield) further defines the type, when necessary.

The types of atomic data are shown in Table 10. Column headings must all be listed, in the order given.

Example:


@column 1 element
@column 2 atom_type cvff
...
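
A column record can be split into its three parts with ordinary whitespace tokenizing. The following Python sketch assumes that the fields of a column record are separated by blanks and that the specifier, when present, is a single word; the function name parse_column_record is illustrative only.

```python
def parse_column_record(line):
    """Parse '@column # type [specifier]' into (number, type, specifier)."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "@column":
        raise ValueError("not a column record: %r" % line)
    number = int(fields[1])        # position of the column in the atom records
    data_type = fields[2]          # e.g. element, atom_type, charge
    specifier = fields[3] if len(fields) > 3 else None  # e.g. a forcefield name
    return number, data_type, specifier
```

For the records in the example above, this returns (1, "element", None) and (2, "atom_type", "cvff").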

Molecule Name Record

The syntax of the molecule name record is:

@molecule name type

where @molecule is an identifying keyword, name is a molecule name for identification purposes, and type is the optional type of molecule for classification purposes. If type is present, all molecules of the same type must be topologically identical.

Examples:


@molecule crambin
@molecule wat4 water
@molecule h2o5 water
@molecule benz1 c6h6

Atom Records

The atom records have no identifying keyword--they are identified by the fact that they immediately follow the molecule name record. Only unique atoms are included in the topology section. Symmetrically or translationally equivalent atoms are not included, although bonds to such atoms may be indicated.

Atom records consist of the fields shown in Table 10. The values allowed for flag settings are also shown.

Table 10. Types and Order of Atomic Data and Flag Settings

type flag setting description
<first column>   --   complete atom name in standard Insight format of residue:atom_name  
element   --   the chemical symbol of the atom  
atom_type   --   forcefield atom type (followed by forcefield name in @column definition)  
charge_group   --   name of the charge group (followed by the forcefield name in @column definition)  
isotope   --   isotopic number (0 indicates use of default)  
formal_charge   --   formal charge as a string (e.g., 1+, 2-, 1/2-)  
charge   --   floating-point value of the atomic charge (followed by the forcefield name in @column definition)  
switching_atom     flag for the switching atom in a group  
  0   indicates it is not a switching atom  
  1   switching atom for the group  
oop_flag     flag for out-of-plane atoms (followed by forcefield name in @column definition)  
  0   indicates it is not an oop atom  
  1   oop atom, use the order of atom types in the forcefield to determine the improper torsion  
  2   oop atom, use the order of atoms in connectivity record in .mdf file to determine the improper torsion  
chirality_flag     chirality of the connections  
  0   neither chiral nor prochiral  
  1   prochiral, priorities 0 0 1 2 in connectivity record  
  2   prochiral, priorities 0 1 1 2 in connectivity record  
  3   prochiral, priorities 0 1 2 2 in connectivity record  
  4   chiral, priorities 0 1 2 3 in connectivity record  
  8   not determined  
  9   unable to determine  
occupancy   --   partial occupancy factor  
xray_temp_factor   --   isotropic temperature factor from experiment (X-ray)  
n_connections   --   number of bonds to that atom  
<last columns>     connectivity records--the syntax is described below (if the chirality flag is set or the oop flag is 2, the order of these records is important)  

The syntax for atom records consists of one record for each of the first 11 data types listed in the table, followed on the same line by the connectivity records, which consist of several records.

Examples of atom records:

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890


ACE_1:CA            C  c3      meA  0  0  -0.3000 1 0 8  1.0000  0.0000 4 connectivity record
N-M_3:N             N  n       nme  0  0  -0.5000 1 1 8  1.0000  0.0000 3 connectivity record

(The numbers above the records merely indicate the column numbers; they are not part of the file.)
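
Since the order of the fields in an atom record is defined by the @column records, a reader can pair each field with its column type instead of hard-coding positions. The following Python sketch assumes blank-separated fields and treats a column type named connections or connectivity (the names used in the sample files later in this section) as the start of the connectivity records. The function name parse_atom_record is illustrative, and all values are left as strings.

```python
def parse_atom_record(line, column_types):
    """Pair the fields of one atom record with the @column data types.

    `column_types` lists the data type names from the @column records,
    in column order (for example ["element", "atom_type", ...]).
    """
    tokens = line.split()
    record = {"name": tokens[0]}   # first field: residue:atom_name
    values = tokens[1:]
    for i, ctype in enumerate(column_types):
        if ctype in ("connections", "connectivity"):
            # Everything from here to the end of the line is connectivity.
            record["connections"] = values[i:]
            break
        if i >= len(values):
            raise ValueError("too few fields in atom record: %r" % line)
        record[ctype] = values[i]
    return record
```

The order of the entries in record["connections"] is preserved, which matters when the chirality flag is set or the oop flag is 2.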

Connectivity Records

The syntax of each connectivity record is:

resname_resnumber:atom%cellxyz#symop/bondorder,wedgebond

The number of connectivity records equals the number of atoms (including ghost atoms) to which the atom is connected. The meaning and default value of each item are shown in Table 11. All portions of a record other than atom and /bondorder may be omitted if the default values are satisfactory.

Table 11. Connectivity Record Items

item default value definition
resname   current name   residue name  
_resnumber:   _current number:   alphanumeric residue "number"  
atom   --   atom name (must be present)  
%cellxyz   %000   cell offsets to be applied to the atom (3 signed numbers with no intervening spaces)  
#symop   #1   integer index of the symmetry operation to be applied to the atom  
/bondorder   /1.0   floating-point number indicating the bond order of the connection  
,wedgebond   ,0   optional number indicating the stereochemistry of bonds in a 2D molecule (i.e., a "sketch") (0 for a non-wedged bond; -1 and 1 for the narrow and wide ends, respectively of a wedge-up bond; and -2 and 2 for the narrow and wide ends of a wedge-down bond)  

The order in which the atoms are listed in the connectivity record should correspond to the chirality flag. The priority ordering used for determining chirality and prochirality is reflected by the listing order from lowest to highest priorities.

Examples of connectivity records:

full form                 equivalent short form using default values

LEU_6:N%000#1/1.0         N/1.0
LEU_6:N%000#1/2.0         N/2.0
LEU_6:N%+0-1+0#1/1.0      N%0-10/1.0
LEU_6:N%000#5/2.0         N#5/2.0
ALA_7:N%000#1/1.0         ALA_7:N/1.0

(For all these examples except the last one, the current (default) value of resname_resnumber: is LEU_6:)
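
The way the defaults in Table 11 fill in a short-form connectivity record can be sketched with a regular expression. The following Python sketch assumes that each cell offset is a single digit with an optional sign, as in the examples above, and that the current residue is supplied by the caller. The function name parse_connection and the dictionary keys are illustrative only.

```python
import re

_CONNECTION = re.compile(
    r"^(?:(?P<residue>[^:]+):)?"        # resname_resnumber: (optional)
    r"(?P<atom>[^%#/,]+)"               # atom name (must be present)
    r"(?:%(?P<cell>(?:[+-]?\d){3}))?"   # %cellxyz, three signed digits
    r"(?:#(?P<symop>\d+))?"             # #symop
    r"(?:/(?P<order>\d+(?:\.\d*)?))?"   # /bondorder
    r"(?:,(?P<wedge>-?\d+))?$"          # ,wedgebond
)


def parse_connection(text, current_residue):
    """Expand one connectivity record, applying the Table 11 defaults."""
    match = _CONNECTION.match(text)
    if match is None:
        raise ValueError("cannot parse connectivity record: %r" % text)
    cell = match.group("cell") or "000"
    return {
        "residue": match.group("residue") or current_residue,
        "atom": match.group("atom"),
        "cell": tuple(int(d) for d in re.findall(r"[+-]?\d", cell)),
        "symop": int(match.group("symop") or 1),
        "order": float(match.group("order") or 1.0),
        "wedge": int(match.group("wedge") or 0),
    }
```

With this sketch, the short form N%0-10/1.0 and the full form LEU_6:N%+0-1+0#1/1.0 expand to the same dictionary when the current residue is LEU_6.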

Example atom and connectivity records for helix:


eth_1:C1 C c a 0 0 -0.1200 1 0 1 4 C2 C3%001
eth_1:C2 C c a 0 0 -0.1200 1 0 1 4 C1 C3
eth_1:C3 C c b 0 0 -0.1200 1 0 1 4 C2 C1%00-1

This example is for an isolated 3-atom helix containing all single bonds. Atom C1 is connected to atom C2 and to a helical image of atom C3 so that C1 lies between C3 and the image of C3. Likewise, C3 is bonded to atom C2 and to the helical image of C1. The locations of the image atoms are generated by means of helix information stored in the .car or .arc files.

Symmetry Section

The symmetry section contains information about the geometry-independent periodicity and symmetry of a molecular system. Its first line is:


#symmetry

This is followed by records describing the periodicity, the symmetry group associated with the periodicity, explicit matrix representations of the operators of the group, and whether helical symmetry is present. Some of these records are optional.

Periodicity Record

The periodicity record is optional and defines the translational periodicity of the molecular system and the system of axes used to set up the periodicity.

The syntax of the periodicity record is:

@periodicity type axes

where @periodicity is the keyword identifier, type is the number of dimensions in which translational periodicity occurs, and axes is a description of how the periodic vectors relate to the Cartesian coordinate system. The type can have the values shown in Table 12.

Table 12. Types of Periodicity

value of periodicity type definition
0   no periodicity (default)  
2   2D periodicity  
3   3D periodicity  

The axes entry consists of the letters x, y, and z. The number of letters used is equal to the periodicity type, and each letter can appear no more than once in the string. If no axes entry is present, the default alignment is assumed. The order and number of letters in the axes entry have the following significance.

For 3D periodicity:

The first letter (x, y, or z) specifies which cell vector is to be aligned with its associated Cartesian axis.

The second letter specifies which cell vector is to lie in the plane formed by its associated Cartesian axis and the Cartesian axis associated with the first letter.

The third letter specifies that the remaining cell vector lies somewhere in the space formed by the Cartesian axes.

Thus the axes entry zyx means that the c cell vector lies along the z axis, the b cell vector lies in the z, y plane, and the a cell vector lies somewhere in z, y, x space. Note that currently only the default values (Table 13) are supported.

For 2D periodicity:

The two axes letters together specify the Cartesian plane associated with the basal plane.

The first axes letter specifies the Cartesian axis upon which the basal plane k vector lies. The second axes letter specifies which Cartesian axis forms the plane in which the l vector lies.

Thus the axes entry yx means that the k basal plane vector lies along the y axis, the l basal plane vector lies in the x, y plane, and the basal plane is perpendicular to the z axis. Only the default values (Table 13) are currently supported.

Table 13. Default Values of axes Entries

Currently, the Discover and Insight programs support only the default axes specifications.

value of periodicity type default axes specification meaning
0   x   (no periodicity)  
2   xy   k = x axis; l is in x, y plane  
3   xyz   a = x axis; b is in x, y plane  

Examples:


@periodicity 3 xyz


@periodicity 2 xy


@periodicity 3

The last entry implies that the default axes specification is used.
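
The rules for the periodicity record can be checked mechanically. The following Python sketch applies the defaults from Table 13 when the axes entry is missing and rejects axes entries that repeat a letter or have the wrong length. The function name parse_periodicity is illustrative; the sketch validates only the syntax and does not check whether a non-default entry is supported by a given program.

```python
DEFAULT_AXES = {0: "x", 2: "xy", 3: "xyz"}  # defaults listed in Table 13


def parse_periodicity(line):
    """Parse '@periodicity type [axes]' into (type, axes)."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "@periodicity":
        raise ValueError("not a periodicity record: %r" % line)
    ptype = int(fields[1])
    if ptype not in DEFAULT_AXES:
        raise ValueError("unsupported periodicity type: %d" % ptype)
    if len(fields) < 3:
        return ptype, DEFAULT_AXES[ptype]  # no axes entry: default alignment
    axes = fields[2]
    # Only x, y, and z are allowed, and each may appear at most once.
    letters_ok = len(set(axes)) == len(axes) and set(axes) <= set("xyz")
    # For periodic systems the number of letters equals the periodicity type.
    if not letters_ok or (ptype > 0 and len(axes) != ptype):
        raise ValueError("invalid axes entry: %r" % axes)
    return ptype, axes
```

Because only the default axes specifications are currently supported, a caller may additionally want to compare the returned axes with DEFAULT_AXES[ptype] and warn if they differ.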

Group Record

The group record contains the symmetry group associated with the periodicity of the system. Note that if the periodicity record is present and its type is nonzero, then the group record must be present.

The syntax of the group record is:

@group name

or

@group matrix #

where @group is the keyword identifier, name is the symmetry group name, matrix is a keyword indicating that matrix representations of the operators follow, and # is the number of matrices.

The symmetry group name is that associated with the periodicity type. For example, if the periodicity type is 3, then the group name is the name of a space group (see the Discover 2.9.x/98.0/3.0.0 User Guide). If the periodicity type is 2, then the group name is the name of a plane group.

Likewise, the matrix representations that follow are those for the operators associated with the given periodicity type.

Examples:


@group (P21 21 2)
@group matrix 4

Matrix Records

If a group is given the special name matrix, then the representation matrices of the complete set of operators of the space group must follow, one 4 X 4 matrix for each @matrix record.

The syntax of a matrix record is:

@matrix # name
a b c d
e f g h
i j k l
m n o p

where @matrix is the keyword identifier, # is the number of the operator and runs from 1 continuously to the number of operators in the group, and name is an optional name for the operator. The single letters on the next four lines are the individual floating-point elements of the 4 X 4 representation matrix of the operator.

All matrices, including the identity operator, must be specified.

Example:


! Space group 5 (C 2)
@matrix 1
1. 0. 0. 0.
0. 1. 0. 0.
0. 0. 1. 0.
0. 0. 0. 1.
@matrix 2
-1. 0. 0. 0.
0. 1. 0. 0.
0. 0. -1. 0.
0. 0. 0. 1.
@matrix 3
1. 0. 0. 0.
0. 1. 0. 0.
0. 0. 1. 0.
0.5 0.5 0. 1.
@matrix 4
-1. 0. 0. 0.
0. 1. 0. 0.
0. 0. -1. 0.
0.5 0.5 0. 1.
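
The matrix records above can be collected into 4 X 4 arrays and applied to a point as in the following Python sketch. Note the assumption: because the translational part (0.5 0.5 0.) appears in the fourth row of the example, the sketch treats a point as a row vector (x, y, z, 1) multiplied on the left of the matrix. This convention is inferred from the example only and is not stated in this document, and the function names parse_matrix_records and apply_operator are illustrative.

```python
def parse_matrix_records(lines):
    """Return {operator number: (name, 4x4 list of floats)} from @matrix records."""
    operators = {}
    rows = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment records
        if line.startswith("@matrix"):
            fields = line.split()
            name = fields[2] if len(fields) > 2 else None  # optional name
            rows = []
            operators[int(fields[1])] = (name, rows)
        elif line.startswith(("@", "#")):
            rows = None  # another record or section: stop collecting rows
        elif rows is not None:
            rows.append([float(v) for v in line.split()])
    return operators


def apply_operator(matrix, xyz):
    """Apply a 4x4 operator to a point treated as the row vector (x, y, z, 1)."""
    point = list(xyz) + [1.0]
    return tuple(sum(point[i] * matrix[i][j] for i in range(4)) for j in range(3))
```

Under this assumed convention, operator 4 of the example maps a point (x, y, z) to (-x + 0.5, y + 0.5, -z).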

Helix Record

The helix record indicates that helical symmetry is present in the system.

The syntax of the helix record is:

@helix

where @helix is the keyword identifier.

A helix record can be present only when the periodicity type is 0 or 2. Currently, the presence of this record means that all molecules in the system have helical symmetry.

Helix information is currently used only by the Polymer programs. The Discover program ignores the helix record, since it does not currently support infinite helices. The Insight program uses the helix record to display helical systems.

Examples:


#symmetry
@periodicity 2 xy
@helix

#symmetry
@helix

Atomset Section

The atomset section is used to define named lists of atoms so that different programs using the .mdf file can use the names to refer to sets of atoms. Its first line is:


#atomset

Each atomset record is introduced by a line having the following syntax:

@degree type name [other]

where @degree is an integer or synonymous word indicating how many subsequent atoms make up a single entry. For example, a list of bonds or distances has a degree of 2, and a list of dihedral angles has a degree of 4. A general list, with no association of atoms, has a degree of 1. The synonyms list, pair, triplet, and quartet are used for degrees 1 through 4, respectively.

The type field specifies a general type for the set of atoms and is used in determining how the set is to be used (Table 14).

Table 14. Atom Set Types

value of atom set type definition
backbone   atoms in the backbone or main chain of a polymer  
torsion   quartets of atoms bonded together to form torsion angles  
subset   general list of atoms corresponding to subsets in Insight program  
pseudoatom   list of atoms defining a pseudoatom  

The name field is the identifying name given to the set of atoms. Depending on the type of set, this can be the name of a torsion, pseudoatom, backbone, or general subset.

Depending on the type of set, other information may be either required or optional. This information is detailed below with the descriptions of the syntaxes for each type of set.

Following each set definition is a list of zero or more atoms that belong to the set. The list can continue for more than one line without any explicit continuation characters, up to the next line beginning with a @ or # symbol or the end of the file.

Atom specifications are in the standard Insight format. However, if either the molecule or molecule and residue portions of an atom specification are identical to those of the preceding atom in the list, they may be omitted. These current values will then be used as defaults.

The general syntax of the second line is:

moleculename:residuename:atomname

Because of the ability to use default values, a line such as:


poly1:eth1:c1 c2 eth2:c1 c2 poly2:eth1:c1 c2

is equivalent to the following explicit list:


poly1:eth1:c1 poly1:eth1:c2 poly1:eth2:c1 poly1:eth2:c2 
poly2:eth1:c1 poly2:eth1:c2
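
The defaulting rule for atom specifications can be expressed as a short loop that remembers the most recent molecule and residue names. The following Python sketch assumes blank-separated specifications with colon-separated parts; the function name expand_atom_list is illustrative, and wildcards or comma-separated atom names are passed through unchanged rather than expanded.

```python
def expand_atom_list(text):
    """Expand abbreviated atom specifications into molecule:residue:atom form."""
    molecule = residue = None
    expanded = []
    for token in text.split():
        parts = token.split(":")
        if len(parts) == 3:
            molecule, residue, atom = parts   # full specification
        elif len(parts) == 2:
            residue, atom = parts             # molecule defaults to the previous one
        elif len(parts) == 1:
            atom = parts[0]                   # molecule and residue both default
        else:
            raise ValueError("malformed atom specification: %r" % token)
        if molecule is None or residue is None:
            raise ValueError("the first atom must be fully specified")
        expanded.append("%s:%s:%s" % (molecule, residue, atom))
    return expanded
```

Applied to the abbreviated line above, this produces the six explicit specifications shown in the expanded list.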

Definition of Backbone Atoms

The complete syntax for defining backbone atoms is:

@list backbone name
atom1 atom2 atom3 ...

where name is used to identify the backbone.

The atom list contains zero or more atoms and can continue for more than one line without any explicit continuation characters--it is considered finished at the next @ or # symbol. The specification for the first atom is in the standard Insight format. If the molecule or residue portions of the specifications for the other atoms are missing, they default to the previous appropriate value used in the list. Wildcards are allowed.

Example of defining backbone atoms:


@list backbone 1
poly1:eth_1:C1 C2
poly1:sty_2:C1 C2

Definition of Subsets of Atoms

The complete syntax for defining subsets is:

@list subset name
atom1 atom2 atom3 ...

where name is used to identify the subset. The atom list is the same as for defining backbone atoms.

Example of defining subset atoms:


@list subset eth1
poly1:eth1:c1 c2 h1 h2 h3 h4

Definition of Pseudoatoms

The complete syntax for defining pseudoatoms is:

@list pseudoatom name A
atom1 atom2 atom3 ...

where name is the name for the pseudoatom in the form of a simple atom specification, and A (arithmetic average) indicates the method of calculating pseudoatom coordinates. The atom list is the same as for defining backbone atoms.

Examples of defining pseudoatoms:


@list pseudoatom xmol:xres:x A
*:*:*
@list pseudoatom water:xres:cm A
water:*:*
@list pseudoatom poly1:sty_2:XPHE A
poly1:sty_2:C3 C4 C5 C6 C7 C8
H3 H4 H5 H6 H7 H8
@list pseudoatom poly1:xres:x A
poly1:eth_1:C* sty_2:C1,C2
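
For method A, the pseudoatom coordinates are the arithmetic average of the coordinates of the listed atoms. The following Python sketch takes this to mean an unweighted mean over the atoms (an assumption, since no weighting is described here); the function name pseudoatom_position is illustrative, and the atom coordinates are assumed to have been read separately, for example from a .car file.

```python
def pseudoatom_position(coordinates):
    """Return the unweighted mean of a list of (x, y, z) coordinate tuples."""
    if not coordinates:
        raise ValueError("a pseudoatom needs at least one atom")
    n = len(coordinates)
    return tuple(sum(c[axis] for c in coordinates) / n for axis in range(3))
```

For two atoms at (0, 0, 0) and (2, 4, 6), the sketch places the pseudoatom midway between them.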

Definition of Torsions

The complete syntax for assigning names to torsions is:

@quartet torsion name
atom1 atom2 atom3 atom4

where name is the name of the torsion being defined and must include the molecule and residue names in the standard Insight format. Wildcards are allowed. The molecule and residue names can be omitted from the atom names, in which case they are assumed to be the same as in the torsion name. Relative residue numbers denoted by a signed integer may be used (e.g., -1:C or +1:N). Full molecule and residue names may be given, but must also be used in the torsion name.

Examples of defining torsions:


@quartet torsion *:*:phi
-1:C N CA C
@quartet torsion *:VAL_*:chi1
N CA CB CG1
@quartet torsion crn:tor:tors
crn:1:C 2:N 2:CA 2:HA

End Record

The end of any section is marked either by the next section header starting with # or by the end of the file. The special header #end can also be used to end a section without introducing another section.
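
The header, comment, and section rules described above are enough to split an .mdf file into its sections before any record is interpreted. The following Python sketch checks the header record, skips comment records, and groups the remaining lines by section name, with #end closing the current section. The function name split_mdf_sections and its return value are illustrative assumptions, not the way the Discover or Insight programs read the file.

```python
def split_mdf_sections(text):
    """Return (format number, {section name: [record lines]}) for .mdf text."""
    lines = text.splitlines()
    header = lines[0].split() if lines else []
    # The "!" must be the first character in the file.
    if (not lines or not lines[0].startswith("!")
            or header[:2] != ["!BIOSYM", "molecular_data"] or len(header) < 3):
        raise ValueError("missing !BIOSYM molecular_data header")
    format_number = int(header[2])

    sections = {}
    current = None
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue  # blank lines and comment records
        if stripped.startswith("#"):
            fields = stripped[1:].split()
            name = fields[0] if fields else ""
            # "#end" closes the current section without opening a new one.
            current = None if name in ("", "end") else sections.setdefault(name, [])
        elif current is not None:
            current.append(stripped)
    return format_number, sections
```

Since the order of the sections is not important and unneeded sections can be omitted, the sketch returns a dictionary in which a missing section is simply absent.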

Sample .mdf Files

Example 1: Nonperiodic, Nonhelical System


!BIOSYM molecular_data 4
!
!DATE: Fri Sep 27 13:50:15 1993 INSIGHT generated molecular data file
!
#topology
!
@column 1 element
@column 2 atom_type cvff
@column 3 charge_group cvff
@column 4 isotope
@column 5 formal_charge
@column 6 charge cvff
@column 7 switching_atom cvff
@column 8 oop_flag cvff
@column 9 chirality_flag
@column 10 occupancy
@column 11 xray_temp_factor
@column 12 connections
!
@molecule ACEALANM
!
ACE_1:CA C c3 meA 0 0 -0.3000 1 0 8 1.0000 0.0000 HA1 HA2 HA3 C
ACE_1:HA1 H h meA 0 0 0.1000 0 0 8 1.0000 0.0000 CA
ACE_1:HA2 H h meA 0 0 0.1000 0 0 8 1.0000 0.0000 CA
ACE_1:HA3 H h meA 0 0 0.1000 0 0 8 1.0000 0.0000 CA
ACE_1:C C c' pepC 0 0 0.3800 1 1 8 1.0000 0.0000 CA O/2.0 ALA_2:N
ACE_1:O O o' pepC 0 0 -0.3800 0 0 8 1.0000 0.0000 C/2.0
ALA_2:N N n pepN 0 0 -0.5000 1 1 8 1.0000 0.0000 ACE_1:C CA HN
ALA_2:CA C ca pepN 0 0 0.1200 0 0 8 1.0000 0.0000 N HA C CB
ALA_2:HN H hn pepN 0 0 0.2800 0 0 8 1.0000 0.0000 N
ALA_2:HA H h pepN 0 0 0.1000 0 0 8 1.0000 0.0000 CA
ALA_2:C C c' pepC 0 0 0.3800 1 1 8 1.0000 0.0000 CA O/2.0 N-M_3:N
ALA_2:O O o' pepC 0 0 -0.3800 0 0 8 1.0000 0.0000 C/2.0
ALA_2:CB C c3 meB 0 0 -0.3000 1 0 8 1.0000 0.0000 CA HB1 HB2 HB3
ALA_2:HB1 H h meB 0 0 0.1000 0 0 8 1.0000 0.0000 CB
ALA_2:HB2 H h meB 0 0 0.1000 0 0 8 1.0000 0.0000 CB
ALA_2:HB3 H h meB 0 0 0.1000 0 0 8 1.0000 0.0000 CB
N-M_3:N N n nme 0 0 -0.5000 1 1 8 1.0000 0.0000 ALA_2:C CA HN
N-M_3:CA C c3 nme 0 0 -0.0800 0 0 8 1.0000 0.0000 N HA1 HA2 HA3
N-M_3:HN H hn nme 0 0 0.2800 0 0 8 1.0000 0.0000 N
N-M_3:HA1 H h nme 0 0 0.1000 0 0 8 1.0000 0.0000 CA
N-M_3:HA2 H h nme 0 0 0.1000 0 0 8 1.0000 0.0000 CA
N-M_3:HA3 H h nme 0 0 0.1000 0 0 8 1.0000 0.0000 CA
!
#atomset
!
@quartet torsion *:ALA_2:omeg
CA C *:N *:CA
@quartet torsion *:ALA_2:phi
*:C N CA C
@quartet torsion *:ALA_2:chi1
N CA CB HB1

Example 2: 3D-Periodic, Nonhelical System

This .mdf file contains 3 molecules having 3D symmetry with explicit space group matrices.


!BIOSYM molecular_data 4
!DATE: Thu Jun 11 15:24:13 1993 INSIGHT generated molecular data file
!
#topology
!
@column 1 element
@column 2 atom_type cvff
@column 3 charge_group cvff
@column 4 isotope
@column 5 formal_charge
@column 6 charge cvff
@column 7 switching_atom cvff
@column 8 oop_flag cvff
@column 9 chirality_flag
@column 10 n_connections
@column 11 connectivity
!
@molecule WTR1 water
WTR_1:O1 O o* WTR 16 0 -0.8200 1 0 0 2 H1 H2
WTR_1:H1 H h* WTR 2 0 0.4100 0 0 0 1 O1
WTR_1:H2 H h* WTR 2 0 0.4100 0 0 0 1 O1
!
@molecule SF6 sulfur_hexafluoride
sf6_1:S S s a 0 1+ 1.5000 1 0 0 6 F1 F1#2 F1#3 F1#4 F2 F3
sf6_1:F1 F f a 0 1/6- -0.2500 0 0 0 1 S
sf6_1:F2 F f a 0 1/6- -0.2500 0 0 0 1 S
sf6_1:F3 F f a 0 1/6- -0.2500 0 0 0 1 S
!
@molecule poly1 ethylene-styrene
eth_1:C1 C c a 0 0 -0.1200 1 0 1 4 H1 H2 C2 sty_2:C2%010
eth_1:H1 H h a 0 0 0.0600 0 0 0 1 C1
eth_1:H2 H h a 0 0 0.0600 0 0 0 1 C1
eth_1:C2 C c b 0 0 -0.1200 1 0 1 4 H3 H4 C1 sty_2:C1
eth_1:H3 H h b 0 0 0.0600 0 0 0 1 C2
eth_1:H4 H h b 0 0 0.0600 0 0 0 1 C2
sty_2:C1 C c me1 0 0 -0.0600 1 0 4 4 H1 C2 eth_1:C2 sty_2:C3
sty_2:H1 H h me1 0 0 0.0600 0 0 0 1 C1
sty_2:C2 C c me2 0 0 -0.1200 1 0 1 4 H2 H3 C1 eth_1:C1%0-10
sty_2:H2 H h me2 0 0 0.0600 0 0 0 1 C2
sty_2:H3 H h me2 0 0 0.0600 0 0 0 1 C2
sty_2:C3 C cp ph1 0 0 0.0000 1 1 0 3 C1 C4/1.5 C8/1.5
sty_2:C4 C cp ph1 0 0 -0.1000 0 1 0 3 C3/1.5 C5/1.5 H4
sty_2:H4 H h ph1 0 0 0.1000 0 0 0 1 C4
sty_2:C8 C cp ph1 0 0 -0.1000 0 1 0 3 C3/1.5 C7/1.5 H8
sty_2:H8 H h ph1 0 0 0.1000 0 0 0 1 C8
sty_2:C5 C cp ph2 0 0 -0.1000 0 1 0 3 C4/1.5 C6/1.5 H5
sty_2:H5 H h ph2 0 0 0.1000 0 0 0 1 C5
sty_2:C6 C cp ph2 0 0 -0.1000 1 1 0 3 C5/1.5 C7/1.5 H6
sty_2:H6 H h ph2 0 0 0.1000 0 0 0 1 C6
sty_2:C7 C cp ph2 0 0 -0.1000 0 1 0 3 C6/1.5 C8/1.5 H7
sty_2:H7 H h ph2 0 0 0.1000 0 0 0 1 C7
!
#symmetry
@periodicity 3 xyz
@group matrix 4
@matrix 1 identity
1.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 0.000 1.000 0.000
0.000 0.000 0.000 1.000
@matrix 2 C4_1
0.000 1.000 0.000 0.000
1.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000
0.000 0.000 0.000 1.000
@matrix 3 C4_2
-1.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000
0.000 0.000 1.000 0.000
0.000 0.000 0.000 1.000
@matrix 4 C4_3
0.000 -1.000 0.000 0.000
-1.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000
0.000 0.000 0.000 1.000

Example 3: Nonperiodic, Helical System

This .mdf file contains a single helical molecule with no translational periodicity.


!BIOSYM molecular_data 4
!DATE: Thu Jun 11 17:44:53 1993 INSIGHT generated molecular data file

#topology

@column 1 element
@column 2 atom_type cvff
@column 3 charge_group cvff
@column 4 isotope
@column 5 formal_charge
@column 6 charge cvff
@column 7 switching_atom cvff
@column 8 oop_flag cvff
@column 9 chirality_flag
@column 10 occupancy
@column 11 xray_temp_factor
@column 12 connections

@molecule TEST_6_11_HLX1
ETHE_1:C1 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H11 H12 C2 C2%001
ETHE_1:H11 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:H12 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:C2 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H21 H22 C1 C1%00-1
ETHE_1:H21 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2
ETHE_1:H22 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2

#symmetry
@helix
!
#atomset
@list backbone TEST_6_11_HLX1
TEST_6_11_HLX1:ETHE_1:C1 C2

Example 4: 2D-Periodic, Helical System

This .mdf file contains 4 helical molecules exhibiting 2D translational periodicity.


!BIOSYM molecular_data 4
!DATE: Thu Jun 11 17:42:58 1993 INSIGHT generated molecular data file

#topology

@column 1 element
@column 2 atom_type cvff
@column 3 charge_group cvff
@column 4 isotope
@column 5 formal_charge
@column 6 charge cvff
@column 7 switching_atom cvff
@column 8 oop_flag cvff
@column 9 chirality_flag
@column 10 occupancy
@column 11 xray_temp_factor
@column 12 connections

@assembly NEW_CELL

@molecule TEST_6_11_HLX
ETHE_1:C1 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H11 H12 C2 C2%001
ETHE_1:H11 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:H12 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:C2 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H21 H22 C1 C1%00-1
ETHE_1:H21 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2
ETHE_1:H22 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2

@molecule TEST_6_11_HLX01
ETHE_1:C1 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H11 H12 C2 C2%001
ETHE_1:H11 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:H12 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:C2 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H21 H22 C1 C1%00-1
ETHE_1:H21 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2
ETHE_1:H22 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2

@molecule TEST_6_11_HLX0101
ETHE_1:C1 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H11 H12 C2 C2%001
ETHE_1:H11 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:H12 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:C2 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H21 H22 C1 C1%00-1
ETHE_1:H21 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2
ETHE_1:H22 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2

@molecule TEST_6_11_HLX010101
ETHE_1:C1 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H11 H12 C2 C2%001
ETHE_1:H11 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:H12 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
ETHE_1:C2 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H21 H22 C1 C1%00-1
ETHE_1:H21 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2
ETHE_1:H22 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C2

#symmetry
@periodicity 2 xy
@group (P1)
@helix
!
#atomset

@list backbone TEST_6_11_HLX
TEST_6_11_HLX:ETHE_1:C1 C2

@list backbone TEST_6_11_HLX01
TEST_6_11_HLX01:ETHE_1:C1 C2

@list backbone TEST_6_11_HLX0101
TEST_6_11_HLX0101:ETHE_1:C1 C2

@list backbone TEST_6_11_HLX010101
TEST_6_11_HLX010101:ETHE_1:C1 C2
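
The topology sections in the examples above share one layout: each @column line names a field, and each atom line carries the atom name, one value per declared column, and then a variable-length connection list in the final column. The following Python sketch reads that layout. The function name parse_mdf_topology and the returned structure are illustrative assumptions, not part of any Insight II tool, and the sketch ignores the #symmetry and #atomset sections.

```python
def parse_mdf_topology(text):
    """Return (column_names, {molecule: [(atom_name, fields), ...]})."""
    columns, molecules = [], {}
    current, in_topology = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                          # blank line or comment
        if line.startswith("#"):
            in_topology = line == "#topology"
            continue
        if not in_topology:
            continue
        if line.startswith("@column"):
            columns.append(line.split()[2])   # "@column <n> <name> [forcefield]"
        elif line.startswith("@molecule"):
            current = line.split()[1]
            molecules[current] = []
        elif line.startswith("@"):
            continue                          # other directives (e.g. @assembly) are skipped
        elif current is not None:
            name, *values = line.split()
            fixed = len(columns) - 1          # every column but the last holds one token
            fields = dict(zip(columns[:-1], values[:fixed]))
            fields[columns[-1]] = values[fixed:]   # remaining tokens: connection list
            molecules[current].append((name, fields))
    return columns, molecules


# Shortened copy of Example 3, used only to exercise the sketch.
SAMPLE = """\
!BIOSYM molecular_data 4
#topology
@column 1 element
@column 2 atom_type cvff
@column 3 charge_group cvff
@column 4 isotope
@column 5 formal_charge
@column 6 charge cvff
@column 7 switching_atom cvff
@column 8 oop_flag cvff
@column 9 chirality_flag
@column 10 occupancy
@column 11 xray_temp_factor
@column 12 connections
@molecule TEST_6_11_HLX1
ETHE_1:C1 C c2 0 0 0 1.0000 0 1 8 0.0000 0.0000 H11 H12 C2 C2%001
ETHE_1:H11 H h 0 0 0 0.0000 0 0 8 0.0000 0.0000 C1
#symmetry
@helix
"""

columns, molecules = parse_mdf_topology(SAMPLE)
for atom_name, fields in molecules["TEST_6_11_HLX1"]:
    print(atom_name, fields["atom_type"], fields["connections"])
```

Connection tokens such as C2%001 are kept as plain strings here; decoding their suffixes is left out of the sketch.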


Molecular Structure File (.msf)

Molecular structure files (MSFs) are written by QUANTA and contain data and information about a molecule. An MSF contains three levels of information:

In addition, extra information such as solvent accessibility, thermal mobility, and electrostatic potential can be included in the MSF. Extra information is incorporated as a number associated with each atom and can be retrieved through a label. This label enables the selection and coloring of a molecule based on one of these parameters. The extra information can also hold pointers to surface files, symmetry information, or vectors for each atom. In this way, virtually any information about a molecule can be held in the MSF.

For files created in QUANTA 3.3 up to QUANTA98, the version number is "QUANTAR3.3". The version number for MSF files created in QUANTA98 is "QUANTA98". Insight II release 98.0 can read both the 3.3 and the 98 versions.

The MSF is a sequential binary file.

To learn more about the MSF format, see the QUANTA98 Basic Operations guide. An online version is available on the MSI Documentation CD or from the MSI web site:

http://www.msi.com/doc


Brookhaven Protein Databank File (.pdb)

For a complete description of the .pdb file format, contact the Brookhaven Protein Data Bank. Directions for postal, gopher, and ftp contact are in the Insight Products System Guide, Preparing for the Installation. You can visit the Protein Data Bank on the worldwide web at:

http://www.pdb.bnl.gov

The format of the PDB files has been revised since the last release of Insight. For a complete description of the new format, load this URL into your web browser:

http://www.pdb.bnl.gov/Format.doc/Contents_Guide_2.html

This section describes how Insight products handle the new or changed parts of the .pdb file format.

The file reader in Insight releases earlier than Insight II 98.0 ignores the new records, and the changes to the existing record types do not affect that reader.

The Insight II 98.0 file reader handles the new parts of the PDB file format in the following ways.

Reading PDB Files

1.   Check for a REMARK line indicating this is a FORMAT 2.0 file.

2.   Look for segment identifier in columns 73-76 of the .pdb file. If present, a subset is created that collects the atoms for that segment.

3.   Read the element name from columns 77, 78.

4.   Read the formal charge from columns 79, 80.
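
The column positions in steps 2 through 4 can be illustrated with fixed-width slicing. This is a minimal Python sketch, not the Insight II reader: the function name read_format2_fields is hypothetical, the values are returned as raw text because the encoding of the charge field is not described here, and the example record is invented so that only its last eight columns carry data.

```python
def read_format2_fields(record):
    """Slice the segment, element, and formal-charge fields from one atom record."""
    record = record.rstrip("\n").ljust(80)      # pad short records to 80 columns
    return {
        "segment": record[72:76].strip(),        # columns 73-76
        "element": record[76:78].strip(),        # columns 77-78
        "formal_charge": record[78:80].strip(),  # columns 79-80, kept as text
    }


# Hypothetical record: columns 1-72 are left blank apart from the record name.
example = "ATOM".ljust(72) + "SEG1" + " N" + "1+"
print(read_format2_fields(example))
```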

Writing PDB Files

1.   Write out a REMARK line that says it is FORMAT 2.0.

2.   Write out the element name in columns 77, 78.

3.   Write out the formal charge in columns 79, 80.

4.   No segment identifiers are written out.


X-PLOR Coordinate File (.pdbx)

The .pdbx file contains the Cartesian coordinates of a molecular system, suitable for input to X-PLOR. Refer to Chapter 6 of X-PLOR, Version 3.1, A System for X-ray Crystallography and NMR (Axel Brunger, Yale Univ. Press, 1992) for a complete description and usage of this file format.

The .pdbx file is essentially similar to the Brookhaven Protein Data Bank (PDB) format, with the following differences:

1.   X-PLOR does not use chain identifier information. Instead it uses the characters in columns 73-76 for the segment name.

2.   The insertion character is treated as part of the residue number. The residue number is a string consisting of a maximum of four characters.

3.   X-PLOR ignores any reference to atom numbers and generates its own numbering scheme.

4.   The REMARK record of PDB files is treated as a title record.

5.   No other type of PDB specification, such as HETAT, SCALE, or SEQU, is interpreted at present. These additional records have to be removed before one reads PDB coordinates with X-PLOR.

6.   The PDB convention requires an END statement at the end of the coordinate file. X-PLOR uses the same convention.


NMR Peak Intensity/Integral (.pks)

Description of Sections

Table 15. Sections of .pks File

Section Example Description
Header   !BIOSYM nmr_peak_intensities 2   
Data   
mixing times   <float1> <float2> ... <floatN>   mixing times associated with peak intensities  
peak intensities   <pkID> <floatw2> <floatw1> <floatlw2> <floatlw1> <float1> <float2> ... <floatN>   measured peak intensities  

Table 16. Variables in .pks File

Variable Description
<floati>   experimental mixing time  
<pkID>   peak specification (integer > 0)  
<floatw2>   peak position in spectrum along w2 axis in ppm  
<floatw1>   peak position in spectrum along w1 axis in ppm  
<floatlw2>   line width along w2 axis  
<floatlw1>   line width along w1 axis  
<floatN>   peak intensity corresponding to N-th mixing time  

Rules:

Sample .pks File


!BIOSYM nmr_peak_intensities 2
!
#mixing_times
2.000000E-02 4.000000E-02 8.000000E-02 1.200000E-01
!
#peak_intensities
!Peak W2_Pos W1_Pos LineWdth2 LineWdth1 Intensities
!
120 5.315 2.091 8.575 17.070 4.0410E+05 2.3580E+05 6.4820E+05 4.1640E+05
121 5.256 3.133 8.575 17.790 1.9220E+06 2.6670E+06 3.3220E+06 3.8150E+06
122 5.123 1.009 8.575 17.910 8.4710E+06 1.1300E+07 2.3800E+07 3.0970E+07
123 4.324 2.344 10.020 17.600 5.8450E+05 2.0510E+06 4.7590E+06 7.6090E+06
124 4.081 4.400 9.328 16.770 3.3020E+05 -1.3270E+05 2.8710E+05 1.2310E+06
125 4.006 1.212 8.575 16.890 -6.1340E+05 7.5890E+05 1.3390E+06 2.7490E+06
126 3.336 1.003 9.005 17.560 -8.9400E+04 -7.3900E+05 2.0200E+06 2.5500E+06
127 4.081 8.570 8.575 16.890 4.7400E+06 1.2700E+07 2.3900E+07 3.1900E+07
128 5.235 1.789 9.320 25.200 4.6700E+05 7.3400E+05 9.7000E+05 2.0400E+06
129 2.323 0.763 8.575 17.070 2.4600E+06 4.4700E+06 6.6900E+06 9.0400E+06
130 3.456 1.234 8.575 17.790 -1.1000E+05 -4.1000E+04 3.1600E+05 1.3900E+06
131 5.289 2.569 8.575 17.910 3.16000E+05 6.6000E+06 1.5400E+07 2.1000E+07
132 1.312 1.132 10.020 17.600 5.3800E+06 8.3500E+06 1.1390E+07 1.6870E+07
133 1.789 1.766 9.328 16.770 3.3800E+06 8.3500E+06 1.1390E+07 1.6870E+07
134 3.232 2.737 8.575 16.890 9.4200E+05 2.6000E+06 8.6100E+06 1.2400E+07
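
The layout of the sample file can be read with a short routine. The following Python sketch assumes what the sample shows: lines starting with "!" are comments, lines starting with "#" open a section, and each peak row carries one intensity per mixing time. The function name parse_pks and the dictionary keys are illustrative choices.

```python
def parse_pks(text):
    """Return (mixing_times, {peak_id: peak_record}) from .pks file text."""
    mixing_times, peaks, section = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                        # blank line, header, or comment
        if line.startswith("#"):
            section = line[1:]              # "mixing_times" or "peak_intensities"
            continue
        tokens = line.split()
        if section == "mixing_times":
            mixing_times.extend(float(t) for t in tokens)
        elif section == "peak_intensities":
            intensities = [float(t) for t in tokens[5:]]
            if len(intensities) != len(mixing_times):
                raise ValueError(f"peak {tokens[0]}: expected {len(mixing_times)} intensities")
            peaks[int(tokens[0])] = {
                "w2_ppm": float(tokens[1]),
                "w1_ppm": float(tokens[2]),
                "line_width_w2": float(tokens[3]),
                "line_width_w1": float(tokens[4]),
                "intensities": intensities,
            }
    return mixing_times, peaks


# Two rows taken from the sample file above.
SAMPLE = """\
!BIOSYM nmr_peak_intensities 2
#mixing_times
2.000000E-02 4.000000E-02 8.000000E-02 1.200000E-01
#peak_intensities
120 5.315 2.091 8.575 17.070 4.0410E+05 2.3580E+05 6.4820E+05 4.1640E+05
124 4.081 4.400 9.328 16.770 3.3020E+05 -1.3270E+05 2.8710E+05 1.2310E+06
"""

times, peaks = parse_pks(SAMPLE)
print(times, peaks[124]["intensities"])
```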


Pseudoatom Library (.plb)

There are two pseudoatom library files, cvffa.plb and amber.plb, which contain pseudoatoms corresponding to the monomers used in the cvffa and amber forcefields, respectively. These pseudoatom library files are located in the directory referred to by the $BIOSYM_LIBRARY environment variable.

Description of Sections

Table 17. Sections in .plb File

Section Example Description
Header   !BIOSYM pseudo_atom_library 1   
Data   
plb_entry      list of pseudoatoms for each monomer  
   <pseudoatom_name>  
   <atom_name1> <atom_name2> ... <atom_nameN>  
   Type <pseudo_type>  
   <prochirality> {Center: <center_name> Reference: <ref_name>}  

Table 18. Variables in .plb File

Variable Description
<pseudoatom_name>   name assigned to pseudoatom  
<atom_name1> ... <atom_nameN>   names of atoms which make up pseudoatom  
Type <pseudo_type>   pseudoatom type (used for bound corrections). Valid types include CH2, CH3, 2CH3, ArH2, 2ArH2, NH2, 2NH2, NH3  
<prochirality>   Prochiral or Not_Prochiral. If prochirality = Prochiral, the optional prochiral fields below are required  
<center_name>   name of the atom which is the prochiral center  
<ref_name>   name of the atom bonded to the prochiral center which is on the path to the pseudoatom  

Each plb_entry consists of a list of pseudoatom entries separated by a line with the "!" character in the first column. Each pseudoatom entry contains four lines with the above information.

Sample .plb File


!BIOSYM pseudo_atom_library 1
!
! CVFF
!
#plb_entry
ALAN Alanine, positive N-terminus
HNX
HN1 HN2 HN3
Type NH3
Not_Prochiral
!
HBX
HB1 HB2 HB3
Type CH3
Not_Prochiral
!
!
#plb_entry
ALA Alanine, polypeptide residue
HBX
HB1 HB2 HB3
Type CH3
Not_Prochiral
!
!
.
.
.
#plb_entry
LEUN Leucine, positive N-terminus
HNX
HN1 HN2 HN3
Type NH3
Not_Prochiral
!
HBX
HB1 HB2
Type CH2
Not_Prochiral
!
HD1X
HD11 HD12 HD13
Type CH3
Prochiral Center: CG Reference: CD1
!
HD2X
HD21 HD22 HD23
Type CH3
Prochiral Center: CG Reference: CD2
!
HDX
HD11 HD12 HD13 HD21 HD22 HD23
Type 2CH3
Not_Prochiral
!
!
.
.
.
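
The four-line pseudoatom entries in the sample can be read as follows. This Python sketch assumes the layout shown in the sample: the line after #plb_entry names the monomer, lines starting with "!" only separate entries, and each pseudoatom occupies exactly four lines. The function name parse_plb and the dictionary keys are illustrative.

```python
def parse_plb(text):
    """Return {monomer_name: [pseudoatom_record, ...]} from .plb file text."""
    lines = [l.strip() for l in text.splitlines()]
    lines = [l for l in lines if l and not l.startswith("!")]   # drop separators
    entries, monomer, block, expect_header = {}, None, [], False
    for line in lines:
        if line == "#plb_entry":
            expect_header, block = True, []
            continue
        if expect_header:
            monomer = line.split()[0]         # e.g. "LEUN Leucine, positive N-terminus"
            entries[monomer] = []
            expect_header = False
            continue
        block.append(line)
        if len(block) == 4:                   # name, atoms, type, prochirality
            name, atoms, type_line, chirality = block
            tokens = chirality.split()
            record = {
                "name": name,
                "atoms": atoms.split(),
                "type": type_line.split()[1],
                "prochiral": tokens[0] == "Prochiral",
            }
            if record["prochiral"]:           # "Prochiral Center: CG Reference: CD1"
                record["center"] = tokens[2]
                record["reference"] = tokens[4]
            entries[monomer].append(record)
            block = []
    return entries


# Part of the LEUN entry from the sample file above.
SAMPLE = """\
!BIOSYM pseudo_atom_library 1
#plb_entry
LEUN Leucine, positive N-terminus
HNX
HN1 HN2 HN3
Type NH3
Not_Prochiral
!
HD1X
HD11 HD12 HD13
Type CH3
Prochiral Center: CG Reference: CD1
!
!
"""

for record in parse_plb(SAMPLE)["LEUN"]:
    print(record)
```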


Proton Chemical Shifts (.ppm)

Description of Sections

Table 19. Sections in .ppm File

Section Example Description
Header   !BIOSYM nmr_chemical_shifts 1    
chemical shifts   <atom_spec> <float1> <float2> <float3>   Chemical shift information  

Table 20. Variables in .ppm File

Variable Description
<atom_spec>   reference to a hydrogen or pseudoatom  
<float1>   chemical shift in ppm  
<float2>   T1 leakage rate  
<float3>   Line width in Hz  

Rule:


1:ASN_1:HBR   1.1500 0.0000 0.0000

Sample .ppm File


!BIOSYM nmr_chemical_shifts 1
#chemical_shifts
! Atom Spec PPM T1 Leak Line Width
!
1:SERN_1:HA 4.390 1.000 20.000
1:SERN_1:HB* 4.080 1.000 20.000
1:SERN_1:HG 1.100 1.000 20.000
1:ASN_2:HN 8.570 1.000 20.000
1:ASN_2:HA 4.550 1.000 20.000
1:ASN_2:HB2 2.740 1.000 20.000
1:ASN_2:HB1 3.230 1.000 20.000
1:ASN_2:HD2* 6.710 1.000 20.000
1:PHE_3:HN 9.300 1.000 20.000
1:PHE_3:HA 4.040 1.000 20.000
1:PHE_3:HB* 3.570 1.000 20.000
1:PHE_3:HD* 7.230 1.000 20.000
1:PHE_3:HE* 7.390 1.000 20.000
1:PHE_3:HZ 7.270 1.000 20.000
...
1:AR+C_7:HD* 2.405 1.000 20.000
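
A reader for the chemical shift rows can be sketched in a few lines of Python. The function name parse_ppm is illustrative; the sketch treats lines starting with "!" or "#" as non-data, skips any row that does not have exactly four fields, and keeps the atom specification as an opaque string (wildcard specifications such as HB* are not expanded).

```python
def parse_ppm(text):
    """Map each atom specification to (shift_ppm, t1_leakage, line_width_hz)."""
    shifts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "!#":
            continue                    # header, comment, or section marker
        tokens = line.split()
        if len(tokens) != 4:
            continue                    # not a data row
        shifts[tokens[0]] = tuple(float(t) for t in tokens[1:])
    return shifts


# Rows taken from the sample file above.
SAMPLE = """\
!BIOSYM nmr_chemical_shifts 1
#chemical_shifts
! Atom Spec PPM T1 Leak Line Width
1:SERN_1:HA 4.390 1.000 20.000
1:SERN_1:HB* 4.080 1.000 20.000
1:ASN_2:HN 8.570 1.000 20.000
"""

for spec, (ppm, t1, width) in parse_ppm(SAMPLE).items():
    print(spec, ppm, t1, width)
```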


Dynamics Scratch File (.pre)

The dynamics scratch file is used as a buffer file for temporary storage of the thermodynamic state table generated during constant-pressure dynamics. This table is appended to the .out file after the energy table.


Protein Bond Angle Table (pro_angle.dat)

Much of the following information is taken straight from the files themselves and serves to explain the syntax and meaning of the values.

The pro_angle.dat file contains a protein bond angle table consisting of residue-specific bond angles and standard deviations. This information was derived from
R. A. Engh and R. Huber (Acta. Cryst., A47, 292-300, 1991).

File rules:

For example:


C       N       CA      *       GLY     GLY     120.6   1.7

describes a C-N-CA bond angle (where the carbonyl atom may be in any residue while the nitrogen and alpha carbon atoms are in a glycine) as having a mean bond angle of 120.6° with a standard deviation of 1.7°.

Description of Sections

Table 21. Sections in pro_angle.dat File

Section Example Description
Comment   !File rules   ! character implies comment  
Data   <atom1> <atom2> <atom3> <residue1> <residue2> <residue3> <mean> <std_devn>   Atom and residue ID's, bond angle mean value, and standard deviation  

Table 22. Variables in pro_angle.dat File

Variable Description
<atom1>   Valid atom name  
<atom2>   Valid atom name  
<atom3>   Valid atom name  
<residue1>   Valid residue name  
<residue2>   Valid residue name  
<residue3>   Valid residue name  
<mean>   Bond angle mean value in degrees  
<std_devn>   Bond angle standard deviation in degrees  

Sample pro_angle.dat File


! Created Sept 7 1994. 
! Residue Specific bond Angles and standard deviations
! Information derived from
! R.A.Engh and R.Huber, Acta. Cryst., A47 292-300 (1991),
! File rules
! 1) Later lines take precedence over earlier ones with
! atoms of same names.
! 2) Atom name in column 1 is associated with residue name
! in column 4, 2 with 5 etc.
! 3) Only wildcarding allowed is a single * character
! this implies a match with any residue name
! 4) A zero entry implies this specific bond not present
! in the data base and will not be checked.
C N CA * * * 121.7 1.8
C N CA * GLY GLY 120.6 1.7
C N CA * PRO PRO 122.6 5.0
CA C N * * * 116.2 2.0
CA C N GLY GLY * 116.4 2.1
CA C N * * PRO 116.9 1.5
CA C N GLY GLY PRO 118.2 2.1
CA C O * * * 120.8 1.7
CA C O GLY GLY GLY 120.8 2.1
CB CA C * * * 110.1 1.9
CB CA C ALA ALA ALA 110.5 1.5
CB CA C ILE ILE ILE 109.1 2.2
CB CA C THR THR THR 109.1 2.2
CB CA C VAL VAL VAL 109.1 2.2
N CA C * * * 111.2 2.8
N CA C * GLY GLY 112.5 2.9
N CA C PRO * * 111.8 2.5
N CA C PRO GLY GLY 0.0 0.0
N CA CB * * * 110.5 1.7
N CA CB ILE ILE ILE 111.5 1.7
N CA CB THR THR THR 111.5 1.7
N CA CB VAL VAL VAL 111.5 1.7
N CA CB ALA ALA ALA 110.4 1.5
N CA CB PRO PRO PRO 103.0 1.1
O C N * * * 123.0 1.6
O C N * * PRO 122.0 1.4
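
The file rules in the sample header (later lines take precedence, "*" matches any residue name, a zero entry means the angle is not checked) can be expressed as a small lookup. This Python sketch is one reading of those rules, not the code used by the Insight II modules; the function name lookup_angle and the choice to return None for a zero entry are assumptions.

```python
def lookup_angle(table_text, atoms, residues):
    """Return (mean, std_devn) for three atom names and their three residue names."""
    result = None
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                                  # comment line
        tokens = line.split()
        if tuple(tokens[0:3]) != tuple(atoms):
            continue
        # Column 4 pairs with atom 1, column 5 with atom 2, column 6 with atom 3.
        if all(r == "*" or r == q for r, q in zip(tokens[3:6], residues)):
            result = (float(tokens[6]), float(tokens[7]))   # later lines override
    if result == (0.0, 0.0):
        return None                                   # zero entry: not checked
    return result


# Rows taken from the sample file above.
TABLE = """\
C N CA * * * 121.7 1.8
C N CA * GLY GLY 120.6 1.7
C N CA * PRO PRO 122.6 5.0
N CA C * * * 111.2 2.8
N CA C * GLY GLY 112.5 2.9
N CA C PRO * * 111.8 2.5
N CA C PRO GLY GLY 0.0 0.0
"""

print(lookup_angle(TABLE, ("C", "N", "CA"), ("ALA", "GLY", "GLY")))
print(lookup_angle(TABLE, ("N", "CA", "C"), ("PRO", "GLY", "GLY")))
```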


Protein Bond Length Table (pro_bond.dat)

Much of the following information is taken straight from the files themselves and serves to explain the syntax and meaning of the values.

The pro_bond.dat file contains a protein bond length table consisting of residue-specific bond lengths and standard deviations. This information was derived from
R. A. Engh and R. Huber (Acta. Cryst., A47, 292-300, 1991).

File rules:

For example:


N       CA      GLY     1.451   0.016

states that the bond between a Glycine nitrogen and an alpha carbon has a mean value of 1.451 Å, with a standard deviation of 0.016 Å.

Description of Sections

Table 23. Sections in pro_bond.dat File

Section Example Description
Comment   !File rules   ! character implies comment  
Data   <atom1> <atom2> <residue> <mean> <std_devn>   Atom, residue ID's, bond length mean value, and standard deviation  

Table 24. Variables in pro_bond.dat File

Variable Description
<atom1>   Valid atom name  
<atom2>   Valid atom name  
<residue>   Valid residue name  
<mean>   Bond length mean value in Å  
<std_devn>   Bond length standard deviation in Å  

Sample pro_bond.dat File


! Created Sept 7 1994. 
! Residue Specific bond lengths and standard deviations
! Information derived from
! R.A.Engh and R.Huber, Acta. Cryst., A47 292-300 (1991),
! File rules
! 1) Later lines take precedence over earlier ones with
! atoms of same names.
! 2) Atom name in first column is associated with residue name
! in third column.
! 3) Only wildcarding allowed is a single * character
! this implies a match with any residue name
CA C * 1.525 0.021
CA C GLY 1.516 0.018
C O * 1.231 0.020
CB CA * 1.530 0.020
CB CA ALA 1.521 0.033
CB CA ILE 1.540 0.027
CB CA THR 1.540 0.027
CB CA VAL 1.540 0.027
N CA * 1.458 0.019
N CA PRO 1.466 0.015
N CA GLY 1.451 0.016
N C * 1.329 0.014
N C PRO 1.341 0.016
SG SG CYS 2.000 0.100
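
One way such a table can be used is to express an observed bond length as a number of standard deviations from the tabulated mean. The Python sketch below does that under stated assumptions: the function names lookup_bond and z_score are illustrative, the observed length of 1.483 Å is invented for the example, and the document does not say that the Insight II modules compute this exact quantity.

```python
def lookup_bond(table_text, atom1, atom2, residue):
    """Return (mean, std_devn) in Å; later matching lines take precedence."""
    found = None
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                                  # comment line
        tokens = line.split()
        if tokens[0:2] == [atom1, atom2] and tokens[2] in ("*", residue):
            found = (float(tokens[3]), float(tokens[4]))
    return found


def z_score(observed, mean, std_devn):
    """Deviation of an observed value from the mean, in standard deviations."""
    return (observed - mean) / std_devn


# Rows taken from the sample file above.
TABLE = """\
N CA * 1.458 0.019
N CA PRO 1.466 0.015
N CA GLY 1.451 0.016
"""

mean, std_devn = lookup_bond(TABLE, "N", "CA", "GLY")
print(round(z_score(1.483, mean, std_devn), 2))   # hypothetical observed length
```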


NMR Project (.proj)

The .proj file contains a history of the refinement steps performed on a given molecular system along with the input files used for each step. Each new step is appended to the end of the file so it may be used as a running account of the structure refinement process. Note that since this process may take place over a long period of time, each step in the file begins with a project header line which contains the version of the NMR software used.

Description of Sections

Table 25. Sections in .proj File

Section Example Description
Data      One or more sections (separated by a blank line) containing the following information:  
   !BIOSYM project 1  
   <timestamp>  
   <run_description>  
   <comment>  
   Project files written:  
   <file1> <file2> ... <filen>  
   {<RUN> files written:  
   <run_file1> <run_file2> ... <run_filen>}  

Table 26. Variables in .proj File

Variable Description
<timestamp>   day_of_week, month, day, HH:MM:SS, year  
<run_description>   description of the type of run, molecule name, etc.  
<comment>   comment for the project update conducted at the given time stamp  
<file1>, <file2>, ... <filen>   NMR database files used in the current step (e.g., file.rstrnt, file.ppm, file.asn, file.pks, etc.)  
<RUN>   Run type (e.g., RMA, DGII)  
<run_file1>, <run_file2>, ... <run_filen>   List of specific input files created for use in the given run  

Sample .proj File


!BIOSYM project 1
Mon Nov 4 11:04:14 1991
RMA Run: test Molecule=CRAM7AVG NMR_project=test.
Test of average structure.
Project files written:
cram7avg.ppm cram7avg.pks cram7avg.asn cram7avg.rstrnt
RMA files written:
test.rmainp test.mdh test.shift test.rma_temp test.rstrnt_temp
Updated RMA files are test_01.rma and test_01.rstrnt

!BIOSYM project 1
Mon Nov 4 11:58:29 1991
RMA Run: test Molecule=CRAM7AVG NMR_project=test.
Test of average structure.
Project files written:
cram7avg.ppm cram7avg.pks cram7avg.asn cram7avg.rstrnt
RMA files written:
test.rmainp test.mdh test.shift test.rma_temp test.rstrnt_temp
Updated RMA files are test_02.rma and test_02.rstrnt


Insight Protein Miscellaneous Properties Table (pro_misc.dat)

Much of the following information is taken straight from the files themselves and serves to explain the syntax and meaning of the values.

The pro_misc.dat file contains a table of miscellaneous protein properties. This information was derived from J. Thornton and coworkers (J. Appl. Cryst., 26, 283-291, 1993), R. A. Engh and R. Huber (Acta. Cryst., A47, 292-300, 1991), and from M. Macarthur (private communication).

File rules:

For example:


OMEGA                   180.0   5.8

describes the omega torsion angle as having a mean value of 180.0 °, with a standard deviation of 5.8 °.

Description of Sections

Table 27. Sections in pro_misc.dat File

Section Example Description
Comment   !File rules   ! character implies comment  
Data   <keyword> <mean> <std_devn>   Property name, mean value, and standard deviation  

Table 28. Variables in pro_misc.dat File

Variable Description
<keyword>   Name of per-residue property  
<mean>   Property mean value  
<std_devn>   Property standard deviation  

The following keywords are recognized in this release:

Table 29. Keywords in pro_misc.dat File

Keyword Description
CHI_1_RANGE_1, CHI_1_RANGE_2, CHI_1_RANGE_3   Three ranges for chi1 torsion  
CHI_2_RANGE_1, CHI_2_RANGE_2, CHI_2_RANGE_3   Three ranges for chi2 torsion  
PROLINE_PHI   Proline specific phi torsion  
HELIX_PHI, HELIX_PSI   Phi, Psi angles in a helix found by Kabsch-Sander method  
CHI_3_SS_RANGE_1, CHI_3_SS_RANGE_2   Two allowed ranges for disulfide bond torsion  
OMEGA   Peptide bond torsion angle  
E_H_BOND_KS   H-bond donor energy found by Kabsch-Sander method  
CA_VIRTUAL_TORSION   Alpha carbon virtual torsion CA-N-C-CB  

Sample pro_misc.dat File


! Created Sept 15 1994. 
! Contains information on miscellaneous protein properties.
! Information derived from
! J. Thornton and co workers J. Appl. Cryst. vol 26, 283-291 (1993)
! R.A.Engh and R.Huber, Acta. Cryst., A47 292-300 (1991),
! M.Macarthur private communication.
! File rules
! 1) First word on line is property keyword
! second word is property mean value
! third word is property standard deviation
! 2) Values of mean and standard deviation can be modified by user
! names of keywords cannot be changed
! 3) Unrecognised keywords ignored.
!
! Kabsch+Sander H-Bond energies in Kcal
CHI_1_RANGE_1 64.1 15.7
CHI_1_RANGE_2 183.6 16.8
CHI_1_RANGE_3 -66.7 15.0
CHI_2_RANGE_1 68.7 21.3
CHI_2_RANGE_2 177.5 19.4
CHI_2_RANGE_3 -71.8 21.1
PROLINE_PHI -65.4 11.2
HELIX_PHI -65.3 11.9
HELIX_PSI -39.4 11.3
CHI_3_SS_RANGE_1 96.8 14.8
CHI_3_SS_RANGE_2 -85.8 10.7
OMEGA 180.0 5.8
E_H_BOND_KS -2.02 0.75
CA_VIRTUAL_TORSION 33.9 3.5
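
Most entries in this table are torsion angles, so comparing an observed value with a tabulated mean has to allow for the 360° periodicity: an omega of -178° is 2° from the mean of 180°, not 358°. The Python sketch below shows one way to handle this. The function name angular_z_score is illustrative, and the document does not state how the Insight II modules treat the wraparound.

```python
def angular_z_score(observed_deg, mean_deg, std_devn_deg):
    """Deviation of a torsion angle from its mean, in standard deviations."""
    # Fold the difference into the interval [-180, 180) before scaling.
    diff = (observed_deg - mean_deg + 180.0) % 360.0 - 180.0
    return diff / std_devn_deg


# OMEGA row from the sample file: mean 180.0, standard deviation 5.8.
print(round(angular_z_score(-178.0, 180.0, 5.8), 3))
# CHI_1_RANGE_2 row: mean 183.6, standard deviation 16.8.
print(round(angular_z_score(-175.0, 183.6, 16.8), 3))
```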


X-PLOR Molecular Structure File (.psf)

The .psf file contains information about the molecular structure. This file is created by the


WRITe STRUcture

statement in X-PLOR and is suitable for input using the


STRUcture

statement in X-PLOR. The contents of this file consist of atom names, types, charges and masses; residue names and segment names; and a list of bond terms, angle terms, dihedral terms, improper terms, explicit hydrogen-bonding terms, explicit nonbonded exclusions, and nonbonded group partitions. It does not contain atomic coordinates, parameters, constraints, restraints, or any other information that is specific to effective energy terms, such as diffraction data.

Refer to Chapter 3 of X-PLOR, Version 3.1, A System for X-ray Crystallography and NMR (Axel Brunger, Yale Univ. Press, 1992) for the description and usage of this file format.


Residue Library (.rlb)

The residue libraries have been created for use in assigning potential function atom types and partial charges to peptides, proteins, and nucleic acids. They contain experimental data for the twenty standard amino acid residues, and for other selected residues.

Three standard residue libraries are provided:

1.   Consistent valence amino acids, the default (cvffa.rlb)

2.   AMBER DNA and RNA nucleic acids plus amino acids (amber.rlb)

3.   Consistent valence amino acids, for use with the potential energy function consortium CFF91 forcefield.

This section is intended to help you understand the structure of these files and to aid you in using the library to prepare molecules for molecular mechanics simulations.

Note: A complete description of the residue library is given below. However, only the potential function atom type, partial charge, charge group, and named torsion fields are used by Insight II. All other numeric fields may be set to 0 for use with Insight II. Insight II uses the geometry, topology, and bond order information found in the fragment libraries for all building functions.

The residue library provides connectivity information by specifying the parent of each and every atom found in the molecule. For each atom, the parent of the atom is uniquely specified, the bond order is given, and if the atom is involved in a ring closure then the ring closure atom is also specified. If the parent of a given atom is found in the current residue then the name of that parent atom is given. If the parent is in the preceding residue then the bond order to the parent atom is followed by an asterisk, *, to indicate that the parent is located in the previous residue.

The three geometrical parameters that are provided for each atom are:

1.   The distance of the current atom from its parent atom,

2.   The angle subtended by the current atom, its parent, and its grandparent, and

3.   The torsional or dihedral angle subtended by the atom, its parent, its grandparent, and its great-grandparent.

In addition, specific torsion angles can be given names in the residue library. Each torsion name represents a torsion angle between four specific atoms.

The next two fields contain two flags:

1.   A side chain flag.

2.   An out-of-plane flag.

The side chain flag is used to indicate whether a named torsion is a side chain torsion or a backbone torsion (in peptides). If it is a side chain torsion, then the side chain flag is set to 1 or 2; otherwise, the flag is set to 0. The out-of-plane flag is used to indicate whether a given atom is a central atom of an out-of-plane group. If it is a central atom of an out-of-plane group, then the out-of-plane flag is set to 1; otherwise, the flag is set to 0. This information is used in locating and assigning internal coordinates and their associated potential function parameters.

The next two items specified for each atom in the residue library are:

1.   The potential function atom type.

2.   The partial atomic charge.

These atom types are used at run time to select parameters for the internal coordinates from the associated potential function parameter library, cvff.frc, amber.frc, or cff91.frc. The partial atomic charge is used directly in the calculations of the Coulombic contribution to the nonbond potential energy.

The final two fields are used to identify the charge group to which the atom belongs and whether or not this atom is the switching atom (i.e., the atom used to decide whether this charge group is within the cutoff distance for nonbond calculations). A portion of Insight's residue library is given as an illustration in Portion of Insight's Residue Library. (Note that the first two lines shown below are used merely to label the column numbers; they are not part of the library.) The residue libraries are found in the directory pointed to by the environment variable $BIOSYM_LIBRARY.

Portion of Insight's Residue Library


(This excerpt does not include the beginning of the file.)
1 2 3 4 5 6 7 8
12345678901234567890123456789012345678901234567890123456789012345678901234567890


SD   CG   1.0           1.740   111.000  195.000 chi2 1 0 s    0.1200 csc  1   
CE SD 1.0 1.670 101.000 194.000 chi3 1 0 c3 -0.3200 csc 0
HE1 CE 1.0 1.080 110.000 300.000 0 0 h 0.1000 csc 0
HE2 CE 1.0 1.080 110.000 180.000 chi4 1 0 h 0.1000 csc 0
HE3 CE 1.0 1.080 110.000 60.000 0 0 h 0.1000 csc 0
PHE 20 Phenylalanine, polypeptide residue
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1
CA N 1.0 1.436 123.100 180.000 omeg 0 0 ca 0.1200 pepN 0
HN N 1.0 1.080 123.000 0.000 0 0 hn 0.2800 pepN 0
HA CA 1.0 1.080 110.000 300.000 0 0 h 0.1000 pepN 0
C CA 1.0 1.509 109.600 180.000 phi 0 1 c' 0.3800 pepC 1
O C 2.0 1.263 118.100 0.000 0 0 o' -0.3800 pepC 0
CB CA 1.0 1.554 111.600 60.000 0 0 c2 -0.2000 meB 1
HB1 CB 1.0 1.080 110.000 63.000 0 0 h 0.1000 meB 0
HB2 CB 1.0 1.080 110.000 183.000 0 0 h 0.1000 meB 0
CG CB 1.0 1.472 113.700 303.000 chi1 1 1 cp 0.0000 arG 1
CD1 CG 1.5 1.376 123.100 87.400 chi2 1 1 cp -0.1000 arD1 1
HD1 CD1 1.0 1.080 120.000 0.000 0 0 h 0.1000 arD1 0
CE1 CD1 1.5 1.368 122.600 180.000 0 1 cp -0.1000 arE1 1
HE1 CE1 1.0 1.080 120.000 180.000 0 0 h 0.1000 arE1 0
CZ CE1 1.5 1.388 118.900 0.000 0 1 cp -0.1000 arZ 1
HZ CZ 1.0 1.080 120.000 180.000 0 0 h 0.1000 arZ 0
CE2 CZ 1.5 1.380 120.600 0.000 0 1 cp -0.1000 arE2 1
HE2 CE2 1.0 1.080 120.000 180.000 0 0 h 0.1000 arE2 0
CD2 CE2 1.5 CG 1.5 1.376 118.000 0.000 0 1 cp -0.1000 arD2 1
HD2 CD2 1.0 1.080 120.000 180.000 0 0 h 0.1000 arD2 0
PHEn 21 Phenylalanine, neutral N-terminus
N N 1.0 0.000 0.000 0.000 0 0 n2 -0.5000 pepN 1
CA N 1.0 1.436 0.000 0.000 0 0 ca 0.1200 pepN 0
HN1 N 1.0 1.080 123.000 0.000 0 0 hn 0.1400 pepN 0
HN2 N 1.0 1.080 123.000 180.000 0 0 hn 0.1400 pepN 0
HA CA 1.0 1.080 110.000 120.000 0 0 h 0.1000 pepN 0
C CA 1.0 1.509 109.600 0.000 0 1 c' 0.3800 pepC 1

Version Number


123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_ 
2.2
Starting with Version 2.2 of the Discover program, the first line of the residue library must specify the version number. If a version record is missing, an old format style is assumed. For residue libraries following the format described here, the correct version number is 2.2. The version number must be a floating-point number in columns 10-15.

Header Card/First Line of a Residue

Columns 1-4, Residue Name

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
PHE 20 Phenylalanine, polypeptide residue

The 1- to 4-letter abbreviation for a residue (the residue name) is always found at the very beginning of the list of atoms for that residue. The next residue name follows the list of atoms for the previous residue. For example, PHE begins the list of atoms pertaining to phenylalanine, and PHEn, which follows that list, begins the list of atoms pertaining to the neutral N-terminus version of phenylalanine. Remember, the naming conventions are completely optional.

Columns 26-30, Number of Atoms

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
PHE 20 Phenylalanine, polypeptide residue

The number listed in the first line of a residue, in columns 26-30, represents the number of atoms in that residue. For example, PHE contains 20 atoms.

Columns 36-45, pKa values

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
PHE 20 Phenylalanine, polypeptide residue

The number listed in the first line of a residue, in columns 36-45, represents the pKa value for that residue. Note that PHE has no pKa value because it has no ionizable protons. The pKa values are used by Insight II to assign hydrogens by pH.

Columns 47-127, Residue comments

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789..
PHE 20 Phenylalanine, polypeptide residue

An optional brief description of a residue may be placed in columns 47-132.
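
The header card fields described above can be sliced out by column position. The sketch below is a hedged illustration in Python: the function name and the returned dictionary keys are hypothetical, and it assumes that the pKa field holds at most one number and that a blank pKa field means "no pKa value".

```python
def parse_residue_header(line):
    """Split a residue header card into its documented column fields."""
    line = line.rstrip("\n").ljust(47)
    pka_text = line[35:45].strip()           # columns 36-45
    return {
        "name": line[0:4].strip(),           # columns 1-4
        "n_atoms": int(line[25:30]),         # columns 26-30
        "pka": float(pka_text) if pka_text else None,
        "comment": line[46:].strip(),        # column 47 onward
    }


# Build a correctly aligned demo line: name in 1-4, atom count in 26-30,
# blank pKa field, comment starting in column 47.
demo = "PHE".ljust(25) + "20".rjust(5) + " " * 16 + "Phenylalanine, polypeptide residue"
header = parse_residue_header(demo)
print(header)
```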

Atom Cards/Second and Following Lines of a Residue

Columns 1-4, Residue Atoms

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

The names listed in the second and following lines of a residue define the atoms contained in that residue. The first field (columns 1-4) contains the atom name. This name must be unique within this residue. By convention, the atom name begins with the atomic symbol. Any choice of letters or numbers may be included in the atom name, so long as the total number of characters does not exceed 4.

Columns 6-9, Parent Atoms

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Names listed in columns 6-9 are the atom names of this atom's parent (i.e., the name of the atom to which this atom is bonded). The parent atom must be defined before it can be used as a parent, unless the parent exists in a previous residue (see column 15).

Columns 11-13, Bond Order (may be set to 0.0 for Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

The bond order information is used in generating automatic parameters when no explicit parameters are available. Currently the automatic parameter procedures recognize bond orders of 1.0, 1.5, 2.0, or 3.0, which correspond to single, partial double (aromatic), double, and triple bonds, respectively.

Column 15, Atom Bonds to Previous Residue

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

An asterisk (*) indicates that the parent atom is found in the previous residue. For example, in the first line for the residue PHE, N is bonded to C. The * denotes that the parent atom is not the C in PHE, but rather the C in the preceding residue.

Columns 16-19, Ring Closure Atoms (not required by Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
CD2 CE2 1.5 CG 1.5 1.376 118.000 0.000 0 1 CP -0.1000 arD2 1

Names listed in columns 16-19 define ring closure bonds. For example, for PHE, CG closes the 6-atom ring CG-CD2-CE2-CZ-CE1-CD1-CG. A ring closure atom only designates the atom that closes a ring and does not change other parameters in the atom card.

Columns 21-23, Ring Closure Bond Order (may be set to 0.0 for Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
CD2 CE2 1.5 CG 1.5 1.376 118.000 0.000 0 1 CP -0.1000 arD2 1

A number in this field defines the bond order for ring closing bonds in the same way a number in columns 11-13 defines normal bonds.

Columns 25-29, Bond Distance Parameters (may be set to 0.0 for Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Numbers that appear in columns 25-29 represent atom-parent bond distance parameters (in angstroms). For example, in the residue PHE, N-C * has a bond distance of 1.348 Å. Other examples, in PHE, include: CA-N = 1.436 Å, C-CA = 1.509 Å, O-C = 1.263 Å, CB-CA = 1.554 Å.

Columns 33-39, Valence Angle Parameters (may be set to 0.0 for Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
CA N 1.0 1.436 123.100 180.000 omeg 0 0 ca 0.1200 pepN 0

Numbers that appear in columns 33-39 represent atom-parent-grandparent bond angles, or valence angles (in degrees). For example, in the residue PHE, CA-N-C * has a valence angle of 123.100°. Other examples, in PHE, include: O-C-CA = 118.100°, HA-CA-N = 110.000°, and HZ-CZ-CE1 = 120.000°.

Columns 42-48, Torsion Angle Parameters (may be set to 0.0 for Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
C CA 1.0 1.509 109.600 180.000 phi 0 1 c' 0.3800 pepC 1

Numbers that appear in columns 42-48 represent atom-parent-grandparent-greatgrandparent dihedral angles, or torsion angles (in degrees). For example, in PHE, C-CA-N-C * has a torsion angle of 180.000°. Other examples, in PHE, include: O-C-CA-N = 0.000°, CG-CB-CA-N = 303.000°, and HB1-CB-CA-N = 63.000°.

Columns 50-53, Torsion Angle Names

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Columns 50-53 are reserved for torsion angle names, which correspond to the torsion angle parameters in columns 42-48. Each name represents a torsion angle between four specific atoms.

Column 55, Side Chain Flag (may be set to 0 for Insight II)

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Column 55 can have only two values, 0 and 1. This flag has a value of 0 if the named torsion angle is a main chain (backbone) torsion. This flag has a value of 1 if the named torsion angle is a side chain torsion. For example, in PHE, the flag has a value of 0 for the torsion angles corresponding to psi, omeg, and phi, which are main chain torsions. The flag has a value of 1 for the torsion angles corresponding to chi1 and chi2, which are side chain torsions.

Column 57, Out-of-Plane Flag

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Column 57 can have only three values, 0, 1, and 2. A value of 1 indicates that the present atom is a central atom of an out-of-plane group; thus, any central atom that has a potential to move out of the plane of the bond is flagged with a value of 1. A value of 2 is used for AMBER atom types to indicate the use of standardized ordering for the pseudotorsion atoms.

Columns 59-60, Potential Function Atom Types

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Names listed in columns 59-61 represent potential function atom types. For example, n is the atom type name for the atom N and can be interpreted as an amide nitrogen. The potential atom type is the primary link into the forcefield parameters and so determines the chemistry of each atom (the internal force constants, nonbond interactions, etc.).

Columns 63-69, Partial Atomic Charges

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Numbers listed in columns 63-69 are the partial atomic charges in electrons for the corresponding atoms in columns 1-4. Examples of partial charge values, in PHE, are N = -0.50 e, CA = 0.12 e, O = -0.38 e, and HE1 = 0.10 e. Each atom can have a unique partial atomic charge.

Columns 71-74, Charge Group Name

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

Each atom must be associated with a charge group. This charge group is used during nonbond calculations if a cutoff distance has been specified. If the distance between the switching atoms of two charge groups is less than the cutoff distance, then the interactions between all atoms within each charge group are calculated. Charge group names are arbitrary, but must be unique within the residue. In general, charge groups should be as close to neutral as possible (unless the group is charged). This prevents cutting off only part of a dipole, which has the undesirable effect of creating transient monopoles during a calculation.
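
The neutrality guideline can be checked by summing the partial charges in each charge group. The following Python sketch does this for the first nine atoms of the PHE listing; the data structure and function name are illustrative assumptions, and the charges are copied from the PHE atom cards shown earlier.

```python
from collections import defaultdict

# (atom name, partial charge, charge group) from the PHE atom cards
PHE_ATOMS = [
    ("N", -0.50, "pepN"), ("CA", 0.12, "pepN"),
    ("HN", 0.28, "pepN"), ("HA", 0.10, "pepN"),
    ("C", 0.38, "pepC"), ("O", -0.38, "pepC"),
    ("CB", -0.20, "meB"), ("HB1", 0.10, "meB"), ("HB2", 0.10, "meB"),
]


def net_charge_by_group(atoms):
    """Sum the partial charges of the atoms in each charge group."""
    totals = defaultdict(float)
    for _name, charge, group in atoms:
        totals[group] += charge
    return dict(totals)


for group, total in net_charge_by_group(PHE_ATOMS).items():
    # Compare against a small tolerance because the sums are floating point.
    status = "neutral" if abs(total) < 1e-6 else "charged"
    print(group, status)
```

Each of the three groups sums to zero within rounding, which is what the guideline asks for in an uncharged residue.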

Column 76, Switching Atom Flag

123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_
N C 1.0 * 1.348 116.300 180.000 psi 0 1 n -0.5000 pepN 1

This flag is 1 if the atom is the switching atom (see above) of the charge group, otherwise it is 0. Although the choice of the switching atom is arbitrary, the atom closest to the geometric center of the group is recommended. Finally, atoms belonging to the same group must be contiguous in the residue library entry.
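
The atom card layout described in this section lends itself to a column-slicing parser. The Python sketch below is one possible reading of the column ranges given above; all names are hypothetical. It assumes the atom type occupies columns 59-61 and that blank numeric fields are read as zero. The helper format_atom_card exists only to build a correctly aligned demonstration line, because the sample lines in this document have lost their original spacing.

```python
# (first column, last column), 1-based and inclusive, as documented above
ATOM_CARD_COLUMNS = {
    "name": (1, 4), "parent": (6, 9), "bond_order": (11, 13),
    "prev_residue": (15, 15), "ring_closure": (16, 19),
    "ring_bond_order": (21, 23), "bond_length": (25, 29),
    "valence_angle": (33, 39), "torsion_angle": (42, 48),
    "torsion_name": (50, 53), "side_chain": (55, 55),
    "out_of_plane": (57, 57), "atom_type": (59, 61),  # 59-61 is an assumption
    "charge": (63, 69), "charge_group": (71, 74), "switching": (76, 76),
}
FLOAT_FIELDS = {"bond_order", "ring_bond_order", "bond_length",
                "valence_angle", "torsion_angle", "charge"}
INT_FIELDS = {"side_chain", "out_of_plane", "switching"}


def parse_atom_card(line):
    """Slice one atom card into a dictionary of typed fields."""
    line = line.rstrip("\n").ljust(76)
    card = {}
    for key, (first, last) in ATOM_CARD_COLUMNS.items():
        text = line[first - 1:last].strip()
        if key == "prev_residue":
            card[key] = text == "*"
        elif key in FLOAT_FIELDS:
            card[key] = float(text) if text else 0.0
        elif key in INT_FIELDS:
            card[key] = int(text) if text else 0
        else:
            card[key] = text
    return card


def format_atom_card(values):
    """Place each text value at its documented columns (demo helper only)."""
    chars = [" "] * 76
    for key, text in values.items():
        first, last = ATOM_CARD_COLUMNS[key]
        width = last - first + 1
        chars[first - 1:last] = text.ljust(width)[:width]
    return "".join(chars)


demo = format_atom_card({
    "name": "N", "parent": "C", "bond_order": "1.0", "prev_residue": "*",
    "bond_length": "1.348", "valence_angle": "116.300",
    "torsion_angle": "180.000", "torsion_name": "psi", "side_chain": "0",
    "out_of_plane": "1", "atom_type": "n", "charge": "-0.5000",
    "charge_group": "pepN", "switching": "1",
})
card = parse_atom_card(demo)
print(card["name"], card["parent"], card["prev_residue"], card["charge"])
```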

Terminal Card/EOF Last Line of Residue Library

The input for the entire residue library ends with the word elib in columns 1-4 of the final line of the file.

Specific Format for Insight's Residue Library

Insight's default residue library contains experimental data for the 20 standard amino acid residues and for other selected residues. The internal coordinate information that is contained in the residue library is meant to reflect, as much as possible, experimental structural information obtained from X-ray diffraction. Information on the original structural data is provided in Hagler et al. (1978, 1979a-c, 1985). (For complete references, refer to the Insight II references list.) For each residue, the information provided in the residue library consists of:

1.   A 1- to 4-letter abbreviation for the residue.

2.   A list of atoms.

3.   A list of parent atoms, internal coordinates (bond lengths, valence bond angles, torsion angles), selected torsion names, potential atom types, and partial atomic charges.

Specific naming conventions are given for residue names, residue atoms, torsion names, and potential function atom types.

Residue Names

The 20 standard amino acid residues are listed in alphabetical order. Other selected residues are then listed in no particular order. The standard 3-letter abbreviations are used to represent the standard amino acids with no terminal ends--the residues that have no caps and can exist internally in a protein chain (e.g., GLY for glycyl, TYR for tyrosyl). Positively charged amino acids, with no caps, have a plus (+) sign in column 4 of the residue name (e.g., ARG+ for positively charged arginyl). Negatively charged amino acids, with no caps, have a minus (-) sign in column 4 of the residue name (e.g., ASP- for negatively charged aspartyl).

Capped residues are represented by an N in column 4 for a charged amino terminal NH3+ (e.g., GLYN for NH3+-glycyl-); an n in column 4 for a neutral amino terminal NH2 (e.g., GLYn for NH2-glycyl-); or a C in column 4 for a charged carboxyl terminal COO- (e.g., GLYC for -glycine). Table 30 shows a summary of these residue naming conventions.

Table 30. General Reference for Residue Names in Insight's Residue Library

Residue Name Description
res   Internal neutral residue.  
res+   Positively charged residue.  
res-   Negatively charged residue.  
resN   Charged amino terminal NH3+.  
resn   Neutral amino terminal NH2.  
resC   Charged carboxyl terminal COO-.  
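
Table 30 can be applied mechanically by looking at column 4 of the residue name. The Python sketch below follows the table literally; the function name is hypothetical, and names that bend the convention (for example a user-defined 4-letter name) would need separate handling.

```python
SUFFIX_MEANING = {
    "+": "positively charged residue",
    "-": "negatively charged residue",
    "N": "charged amino terminal NH3+",
    "n": "neutral amino terminal NH2",
    "C": "charged carboxyl terminal COO-",
}


def describe_residue_name(name):
    """Classify a residue name by the character in column 4 (Table 30)."""
    suffix = name[3:4]  # empty string when the name has only three letters
    return SUFFIX_MEANING.get(suffix, "internal neutral residue")


for name in ("GLY", "ARG+", "ASP-", "GLYN", "GLYn", "GLYC"):
    print(name, "->", describe_residue_name(name))
```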

Residue Atom Names

Atom names are based on the Brookhaven naming convention, using the Greek alphabet. The backbone atoms are represented by N, CA, C, and O; then all the remaining atoms (the side chain atoms) are named in order from the alpha carbon CA. Table 31 includes examples of residue atom names and their corresponding Greek letters.

Table 31. Greek Letter Notation Used in Insight's Residue Library for Residue Atom Names

Greek Letter   Greek Name   Residue Library Letter   Examples of Atom Names
α   alpha   A   CA, HA  
β   beta   B   CB, HB  
γ   gamma   G   CG, HG, OG, SG  
δ   delta   D   CD, HD, OD, ND, SD  
ε   epsilon   E   CE, HE, OE, NE, SE  
ζ   zeta   Z   CZ, HZ, OZ, NZ  
η   eta   H   CH, HH, OH, NH  

If there are two or more atoms in the same position relative to CA, the atoms are given number representations. For example, in PHE, two carbons are in the delta position; they are thus designated CD1 and CD2.

Torsion Angle Names

Torsion angle names are provided for selected torsion angles. Table 32 includes the torsion angle names and their corresponding torsion angles found in Insight's residue library. Note that a torsion name is associated with the residue in which the grandparent atom is found.

Table 32. Torsion Angle Names Included in Insight's Residue Library

Torsional Angle Name   Atoms Included in Angle (1)
phi (i)   C(i)-CA(i)-N(i)-C(i-1)  
psi (i)   N(i+1)-C(i)-CA(i)-N(i)  
omeg (i)   CA(i)-C(i)-N(i+1)-CA(i+1)  
chi1 (i1)   N(i)-CA(i)-CB(i)-CG(i)  
chi2 (i2)   CA(i)-CB(i)-CG(i)-CD(i)  
chi3 (i3)   CB(i)-CG(i)-CD(i)-CE(i)  
(1) C(i-1) means this atom exists in the previous residue.

Definitions

Parents

A parent of an atom completes a bond between two atoms.

The atoms in columns 6-9 are bonded to the corresponding atoms in columns 1-4 and are thus the parents of those atoms. For the residue PHE, C * is the parent of N, N is the parent of CA, CA is the parent of C, C is the parent of O, CA is the parent of CB, etc. An atom can be a parent of one or more atoms.

Grandparents

A grandparent of an atom completes a valence angle between three atoms. For the residue PHE, CA is the parent of CB and N is the parent of CA, so N is the grandparent of CB. N completes a valence angle between the three atoms CB-CA-N. N is also the grandparent of C; thus, N also completes a valence angle between the three atoms C-CA-N. Other examples of valence angles, specified for PHE, include: O-C-CA, CG-CB-CA, HA-CA-N, HZ-CZ-CE1.

Greatgrandparents

A greatgrandparent of an atom completes a torsion angle between four atoms. For the residue PHE, CB is the parent of CG (CG-CB), CA is the grandparent of CG (CG-CB-CA), and N is the parent of CA, so N is the greatgrandparent of CG. N completes a torsion angle between the four atoms, CG-CB-CA-N. Other examples of torsion angles, specified for PHE, include: O-C-CA-N, C-CA-N-C *, CZ-CE1-CD1-CG, HB1-CB-CA-N.
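
The parent, grandparent, and greatgrandparent of an atom can be found by following the parent column repeatedly. The Python sketch below does this for part of PHE; the dictionary and function names are illustrative, and the label "C*" is simply a stand-in used here for the C atom of the previous residue.

```python
# atom -> parent, taken from the PHE atom cards ("C*" = C of the previous residue)
PHE_PARENTS = {"N": "C*", "CA": "N", "C": "CA", "O": "C", "CB": "CA", "CG": "CB"}


def ancestors(atom, parents, depth=3):
    """Return [parent, grandparent, greatgrandparent] as far as they are known."""
    chain = []
    current = atom
    for _ in range(depth):
        current = parents.get(current)
        if current is None:
            break  # the chain leaves the atoms defined in this table
        chain.append(current)
    return chain


print("CG:", ancestors("CG", PHE_PARENTS))  # atoms completing CG-CB-CA-N
print("C: ", ancestors("C", PHE_PARENTS))   # atoms completing C-CA-N-C *
```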


Distance, Torsion, Chiral, and NOE Volume Restraints (.rstrnt)

The .rstrnt file contains descriptions of the restraints to be applied during a minimization or dynamics calculation. The .rstrnt file replaced the .noe file that was used in early versions of Discover. The .rstrnt file is organized into sections, each introduced by an identifier line. The distance and dihedral sections specify the upper and lower bounds for applying the restraint, as well as the force constants for the biharmonic restraining force outside this range. The chiral section specifies the chirality to be achieved at asymmetric centers.

The .rstrnt file is an ASCII file that can be created by you or written by NMRchitect. The file can contain the following records:

Description of Sections

Header Record

The header record must be the first record in the file and contain:


!BIOSYM restraint n

where n is an integer (usually 1). Discover then interprets the file as being an ASCII file containing restraint records as outlined here.

Comment Record

Comment lines begin with an exclamation mark (!) and may occur anywhere after the first record.

Section Identifiers

Section identifiers must start in column 1 with a pound sign (#). All non-comment records that come after a section identifier and before the next section identifier (or the end of the file) are assumed to be records appropriate to that section.

The identifier lines introducing the sections are:


#remote_prochiral_centers


#chiral


#distance


#NOE_distance


#NOE_distance_overlapped


#mixing_times 


#NOE_volume


#NOE_volume_overlapped


#NMR_dihedral


#3J_dihedral

Atom Specification

The restraints records use the following syntax for selecting particular atoms and pseudoatoms, and for defining pseudoatoms.


molecule#:residuename_residue#:atomname

where the molecule number, residue name, residue number, and atom name are as defined in the .mdf file. Colons (:) and underscores (_) are used to delimit these numbers and names as shown.

The atom name can be that of an actual atom, a pseudoatom defined in the atom set section of the .mdf file or a pseudoatom defined using the define average command in Discover.

A previously undefined pseudoatom can be referenced with wildcards or a list. Wildcards can be used for pseudoatoms consisting of atoms in the same residue if all these atoms have names beginning with some common characters. For example, if atoms 1:ASN_2:HB1 and 1:ASN_2:HB2 are present, then 1:ASN_2:HB* defines a pseudoatom consisting of these two atoms. The asterisk wildcard can match strings of any length. These two atoms can also be referred to as a list, that is, 1:ASN_2:HB1,HB2. In the list syntax, atom names are separated by commas without intervening spaces.

The pseudoatom is defined when the wildcard or list appears for the first time in the .rstrnt file. Thereafter, this pseudoatom is used whenever the same pattern appears.

One of a pair of prochiral hydrogens can be selected by using its prochiral specification. For example, on encountering the atom name HBS, Discover looks in the specified residue to find two atoms with names HB1 and HB2, determines their prochirality, and selects the pro-S atom to be used in the restraint. Similarly, on encountering HGR*, Discover looks for two pseudoatoms with names HG1* and HG2*, creates the pseudoatoms if necessary from (HG11,HG12,HG13) and from (HG21,HG22,HG23), and then selects the pro-R pseudoatom to use in the restraint. In each case, the character R (or S) is replaced with 1 or 2 and pro-R (or pro-S) is selected. Wildcards are allowed in this context.

Prochirality can be determined only if the molecular data file contains the priority sequence of the substituents at each prochiral center. If the prochiral atoms are not directly bonded to the prochiral center, the remote_prochiral_centers section of the restraints file should contain an entry indicating how these atoms are connected.
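
The atom specification syntax can be taken apart with ordinary string operations. The Python sketch below is a hedged illustration: the AtomSpec type and both function names are hypothetical, the residue is split at its last underscore so that names such as ASP- survive intact, and wildcard expansion uses the standard library fnmatch module as a stand-in for Discover's own matching. Prochiral names such as HBS or HGR* are not resolved here.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class AtomSpec(NamedTuple):
    molecule: int
    residue_name: str
    residue_number: int
    atom_names: tuple  # one entry, or several for the comma-separated list syntax


def parse_atom_spec(spec):
    """Split 'molecule#:residuename_residue#:atomname' into its parts."""
    molecule, residue, atoms = spec.split(":")
    residue_name, residue_number = residue.rsplit("_", 1)
    return AtomSpec(int(molecule), residue_name, int(residue_number),
                    tuple(atoms.split(",")))


def expand_atoms(spec, atoms_in_residue):
    """Return the residue's atoms matched by the wildcard or list in spec."""
    return [name for name in atoms_in_residue
            if any(fnmatchcase(name, pattern) for pattern in spec.atom_names)]


wild = parse_atom_spec("1:ASN_2:HB*")
listed = parse_atom_spec("1:ASN_2:HB1,HB2")
residue_atoms = ["N", "CA", "HA", "CB", "HB1", "HB2"]  # hypothetical atom list
print(expand_atoms(wild, residue_atoms), expand_atoms(listed, residue_atoms))
```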

Record Format

In each section of the restraints file, data records appropriate to that section follow its identifier line. Within a record, the data is in free format, which means that at least one blank space is required between fields and that each field must contain a non-blank entry. All fields must be specified--no blank fields are allowed, except for trailing blank fields, which are read as zeroes.

The contents of each record are described in the following sections and tables.
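
The header, comment, section identifier, and free-format rules can be combined into a small reader. The following Python sketch is an illustration only: the function name is hypothetical, and stripping text after an exclamation mark inside a data record is an assumption based on the trailing comments that appear in the sample file at the end of this section.

```python
def read_rstrnt(text):
    """Group the free-format records of a .rstrnt file by section name."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("!BIOSYM restraint"):
        raise ValueError("missing '!BIOSYM restraint n' header record")
    sections = {}
    current = None
    for line in lines[1:]:
        if line.startswith("!") or not line.strip():
            continue                      # comment or empty line
        if line.startswith("#"):
            current = line[1:].strip()    # section identifier in column 1
            sections.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("data record before any section identifier")
        # Assumption: anything after '!' in a record is a trailing comment.
        fields = line.split("!", 1)[0].split()
        if fields:
            sections[current].append(fields)
    return sections


SAMPLE = """!BIOSYM restraint 1
!
#chiral
1:AR+N_1:CA S
1:PRO_2:CA S
!
#distance
1:AR+N_1:CA 1:ASP-_3:CA 4.700 7.200 32.00 32.00 1000.000
"""
sections = read_rstrnt(SAMPLE)
for name, records in sections.items():
    print(name, len(records))
```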

Remote Prochiral Centers Section

When prochiral atoms are separated from the prochiral center by more than one bond, they must be listed in this section prior to using their prochiral specification in any restraint record. The identifier line for this section is:


#remote_prochiral_centers

The identifier is followed by records as shown in Table 33. The atom specifications in this section should not use any prochiral specification, since that would lead to a cyclic definition.

Table 33. Remote Prochiral Center Definition

field# contents comments
1   atom specification   one of the prochiral atoms  
2   atom specification   the second prochiral atom  
3   atom specification   the atom bonded to the first prochiral atom that leads to the prochiral center  
4   atom specification   the atom bonded to the second prochiral atom that leads to the prochiral center  
5   atom specification   the prochiral center  

Sample:


#remote_prochiral_centers 
1:VAL_8:HG1* 1:VAL_8:HG2* 1:VAL_8:CG1 1:VAL_8:CG2 1:VAL_8:CB

Chirality Restraints Section

The section identifier is:


#chiral

The records in the chirality restraints section specify chirality around asymmetric centers, as shown in Table 34.

Table 34. Chirality Restraints Definition

field# contents comments
1   atom specification   the asymmetric center  
2   S or R   one character representing the desired chirality at the center  

Sample:


#chiral
1:THRN_1:CA S
1:ILE_35:CB S

Distance Restraints Section

The section identifier is:


#distance

The distance restraints section specifies upper and lower bounds for distances between pairs of atoms, force constants, and a limit for the force, using the format shown in Table 35.

Table 35. Distance Restraints Definition

field# contents comments
1   atom specification   one of the atoms in the pair  
2   atom specification   the other atom  
3   lower bound*   the smallest separation allowed between the pair of atoms (in angstroms)  
4   upper bound   the greatest separation allowed between the pair of atoms (in angstroms)  
5   KL   force constant applied when atoms are closer than the lower bound (kcal mol-1 Å-2)  
6   KU   force constant applied when atoms are farther apart than upper bound (kcal mol-1 Å-2)  
7   maximum force   limit on the magnitude of force (kcal mol-1 Å-1)  
* a value of -1.0 signifies that no lower bound information is available, and that the sum of the van der Waals radii will be used instead.

Example:


#distance
1:AR+N_1:CA 1:ASP-_3:CA 4.700 7.200 1.00 1.00 1000.000
1:PRO_2:CA 1:PHE_4:CA 4.700 7.200 1.00 1.00 1000.000
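
A record from this section maps naturally onto a small data structure. The Python sketch below is a hedged example: the class and function names are hypothetical, a lower bound of -1.0 is stored as None to mark "use the sum of the van der Waals radii", and the violation method only reports how far a distance lies outside the bounds. It does not reproduce the restraint energy or force expression used by Discover, which is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DistanceRestraint:
    atom1: str
    atom2: str
    lower: Optional[float]  # None: use the sum of the van der Waals radii
    upper: float
    k_lower: float
    k_upper: float
    max_force: float

    def violation(self, distance):
        """Signed amount by which distance lies outside the bounds (0.0 inside)."""
        if self.lower is not None and distance < self.lower:
            return distance - self.lower
        if distance > self.upper:
            return distance - self.upper
        return 0.0


def parse_distance_record(fields):
    """Build a DistanceRestraint from the seven fields of Table 35."""
    fields = list(fields) + ["0"] * (7 - len(fields))  # trailing blanks read as zeroes
    lower = float(fields[2])
    return DistanceRestraint(
        fields[0], fields[1],
        None if lower == -1.0 else lower,
        float(fields[3]), float(fields[4]), float(fields[5]), float(fields[6]),
    )


restraint = parse_distance_record(
    "1:PRO_2:CA 1:PHE_4:CA 4.700 7.200 1.00 1.00 1000.000".split())
print(restraint.violation(6.0), restraint.violation(8.0))
```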

NOE Distance Restraints Section

The section identifier is:


#NOE_distance

This section contains distance restraints derived from NOE data. These restraints are the same as the restraints in the distance restraints section. However, the records in this section have an expanded format (Table 36), to contain additional data relevant to NOE analysis.

Table 36. NOE Distance Restraints Definition

field# contents comments
1   atom specification   one of the atoms in the pair  
2   atom specification   the other atom  
3   lower bound   the smallest separation allowed between the pair of atoms (in angstroms)  
4   upper+correction   the greatest separation allowed between the pair of atoms (in angstroms)  
5   upper bound   not currently used in Discover  
6   KL   force constant applied when atoms are closer than the lower bound (kcal mol-1 Å-2)  
7   KU   force constant applied when atoms are farther apart than upper bound (kcal mol-1 Å-2)  
8   maximum force   limit on the magnitude of force (kcal mol-1 Å-1)  

Sample:


#NOE_distance
!ATOM #1 ATOM #2 Distance Force Constant Max
! Lower Upper Upper Lower Upper Force
! + correction
1:CYS_3:HA 1:CYS_4:HN 2.00 3.00 3.00 1.000 1.000 1000.0
1:GLY_31:HA* 1:CYS_32:HN 3.00 5.00 4.00 1.000 1.000 1000.0
1:SER_6:HBR 1:ILE_7:HN 2.00 3.00 3.00 1.000 1.000 1000.0
1:VAL_8:HGS* 1:VAL_8:HA 3.00 5.00 4.00 1.000 1.000 1000.0

NOE Overlapped Distance Restraints Section

The section identifier is:


#NOE_distance_overlapped

This section contains overlapped distance restraints derived from NOE data. The first line of each of these restraints shares almost the same format as the NOE distance restraints. The only difference is that the column corresponding to the pseudo atom correction is absent in the overlapped restraint category. To assign multiple pairs of protons to the same restraint, one can put one additional pair per line with the continuation symbol "+" in the first column of succeeding lines.

Table 37. NOE Overlapped Distance Restraints Definitions

First line of the overlapped distance restraint:
field # contents comments
1   atom specification   one of the atoms in the pair  
2   atom specification   the other atom  
3   lower bound   the smallest effective separation allowed between the pairs of atoms (in angstroms)  
4   upper bound   the greatest effective separation allowed between the pairs of atoms (in angstroms)  
5   KL   force constant applied when effective distance is smaller than the lower bound (kcal mol-1 Å-2)  
6   KU   force constant applied when effective distance is bigger than the upper bound (kcal mol-1 Å-2)  
7   maximum force   limit on the magnitude of force (kcal mol-1 Å-1)  

Table 38. NOE Overlapped Distance Restraints Definition

Succeeding line of the overlapped distance restraint:
field# contents comments
1   Continuation Symbol   a "+" sign in the first column indicates a continuation of the definition of the multiple spin pairs in the same restraint.  
2   atom specification   the 1st atom  
3   atom specification   the 2nd atom  

Sample:


#NOE_distance_overlapped
!ATOM #1 ATOM #2 Effective Distance Force Constant Max
! Lower Upper Lower Upper Force
1:CYS_3:HA 1:CYS_4:HN 2.00 5.00 1.000 1.000 1000.0
+ 1:GLY_31:HA* 1:CYS_32:HN
+ 1:SER_6:HBR 1:ILE_7:HN
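
Continuation lines can be folded into the restraint they extend. The Python sketch below assumes each record has already been split into whitespace-separated fields and that the "+" symbol is followed by a space, as in the sample above; the function name and the dictionary layout are illustrative.

```python
def group_overlapped(records):
    """Attach '+' continuation records to the restraint that precedes them."""
    restraints = []
    for fields in records:
        if fields[0] == "+":
            if not restraints:
                raise ValueError("continuation line without a preceding restraint")
            restraints[-1]["pairs"].append((fields[1], fields[2]))
        else:
            restraints.append({
                "pairs": [(fields[0], fields[1])],
                "values": [float(x) for x in fields[2:]],  # bounds, KL, KU, max force
            })
    return restraints


sample_lines = [
    "1:CYS_3:HA 1:CYS_4:HN 2.00 5.00 1.000 1.000 1000.0",
    "+ 1:GLY_31:HA* 1:CYS_32:HN",
    "+ 1:SER_6:HBR 1:ILE_7:HN",
]
grouped = group_overlapped([line.split() for line in sample_lines])
print(len(grouped), "restraint with", len(grouped[0]["pairs"]), "spin pairs")
```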

Mixing Times Restraints Section

The section identifier is:


#mixing_times 

Each field contains the value of one mixing time (in seconds) at which the subsequent NOE_volume restraints were determined.

The format of each entry is as shown in Table 39.

Table 39. Mixing Times Restraints Definition

field# contents comments
1   tmix1   mixing time 1  
2   tmix2   mixing time 2  
m   tmixm   mixing time m  

Sample:


#mixing_times 
0.05 0.1 0.15 0.2

The sample specifies that the subsequent volume entries are associated with mixing times of 50, 100, 150, and 200 ms.

NOE Volume Restraints Section

The section identifier is:


#NOE_volume

This section contains NOE peak volume restraints derived from experimentally measured NOE peak volumes or integrals. In the direct NOE refinement scheme, the volume restraints are compared to theoretical NOE volumes calculated for the current model structure. The number of fields will be 2m + 4, where m is the number of mixing times.

The format of each entry is as shown in Table 40.

Table 40. NOE Volume Restraints Definition

field# contents comments
1   atom specification   one of the atoms in the pair  
2   atom specification   the other atom  
3   NOE volume in LB 1   lower bound on the NOE volume for mixing time 1  
4   NOE volume in UB 1   upper bound on the NOE volume for mixing time 1  
5   NOE volume in LB 2   lower bound on the NOE volume for mixing time 2  
6   NOE volume in UB 2   upper bound on the NOE volume for mixing time 2  
2m+1   NOE volume in LB m   lower bound on the NOE volume for mixing time m  
2m+2   NOE volume in UB m   upper bound on the NOE volume for mixing time m  
2m+3   KL   lower bound force constant  
2m+4   KU   upper bound force constant  

Sample:


#NOE_volume


1:GLY_2:HAR 1:PHE_HD* 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 40 80 
1:ALA_4:HB* 1:CYS_10:HAR -999.0 -999.0 0.175 0.225 0.275 0.325 0.375 0.425 40 80..
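
The 2m + 4 field count gives a simple consistency check between the mixing times section and each volume record. The Python sketch below applies it to the first sample record with m = 4; the function name and returned keys are hypothetical, and the check is deliberately strict rather than padding missing trailing fields.

```python
def parse_volume_record(fields, n_mixing_times):
    """Split a #NOE_volume record into per-mixing-time bounds and force constants."""
    expected = 2 * n_mixing_times + 4
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")
    bounds = [(float(fields[2 + 2 * i]), float(fields[3 + 2 * i]))
              for i in range(n_mixing_times)]          # (lower, upper) per mixing time
    return {
        "atoms": (fields[0], fields[1]),
        "bounds": bounds,
        "k_lower": float(fields[-2]),
        "k_upper": float(fields[-1]),
    }


mixing_times = [0.05, 0.1, 0.15, 0.2]  # from the #mixing_times sample
line = "1:GLY_2:HAR 1:PHE_HD* 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 40 80"
volume = parse_volume_record(line.split(), len(mixing_times))
for tmix, (low, high) in zip(mixing_times, volume["bounds"]):
    print(tmix, low, high)
```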

NOE Overlapped Volume Restraints Section

The section identifier is:


#NOE_volume_overlapped

This section contains NOE peak volume restraints derived from experimentally measured overlapped NOE peak volumes or integrals. In the direct NOE refinement scheme, the volume restraints are compared to theoretical NOE volumes calculated for the current model structure. The first line of each overlapped restraint shares the same format as that of the non-overlapped case. The number of fields will be 2m + 4, where m is the number of mixing times. Each succeeding line then adds a spin pair to the definition of the overlapped peaks.

The format of each entry is as shown in Tables 41 and 42.

Table 41. NOE Volume Restraints Definition

First line of the restraint:
field # contents comments
1   atom specification   one of the atoms in the pair  
2   atom specification   the other atom  
3   NOE volume in LB 1   lower bound on the NOE volume for mixing time 1  
4   NOE volume in UB 1   upper bound on the NOE volume for mixing time 1  
5   NOE volume in LB 2   lower bound on the NOE volume for mixing time 2  
6   NOE volume in UB 2   upper bound on the NOE volume for mixing time 2  
2m+1   NOE volume in LB m   lower bound on the NOE volume for mixing time m  
2m+2   NOE volume in UB m   upper bound on the NOE volume for mixing time m  
2m+3   KL   lower bound force constant  
2m+4   KU   upper bound force constant  

Table 42. NOE Overlapped Volume Restraints Definition

Succeeding line of the overlapped volume restraint:
field# contents comments
1   Continuation Symbol   a "+" sign in the first column indicates a continuation of the definition of the multiple spin pairs in the same restraint.  
2   atom specification   the 1st atom  
3   atom specification   the 2nd atom  

Sample:


#NOE_volume_overlapped


1:GLY_2:HAR 1:PHE_HD* 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 40 80 
+ 1:ALA_4:HB* 1:CYS_10:HAR

NMR Dihedral Restraints Section

The section identifier is:


#NMR_dihedral

Each record specifies a range for a dihedral angle and the force constants for the biharmonic restraint force (Table 43).

Table 43. NMR Dihedral Restraints Definition

field# contents comments
1   atom specification   four atoms defining the dihedral angle, listed in bonding sequence  
2   atom specification  
3   atom specification    
4   atom specification    
5   lower bound   the smallest dihedral angle allowed (degrees)  
6   upper bound   the greatest dihedral angle allowed (degrees)  
7   KL   force constant applied when angle is too small (kcal mol-1 rad-2)  
8   KU   force constant applied when angle is too large (kcal mol-1 rad-2)  
9   maximum force   limit on the magnitude of force (kcal mol-1 rad-1)  

Sample:


#NMR_dihedral
1:CYS_4:C 1:PRO_5:N 1:PRO_5:CA 1:PRO_5:C -120.0 -60 50.0 50.0 500.0

3J Coupling Dihedral Section

The section identifier is:


#3J_dihedral

Each record (see Table 44) specifies up to four ranges of dihedral angles and two force constants for the multiple-interval biharmonic restraining force. Lower and upper bounds of angles are specified for each interval. However, the same force constant is applied for all deviations from any lower bound and another force constant for all deviations from any upper bound.

Table 44. 3J Dihedral Restraints Definition

field# contents comments
1   atom specification   four atoms defining the dihedral angle, listed in order of bonding  
2   atom specification  
3   atom specification  
4   atom specification  
5   3J   not used by Discover--for NMR reference  
6   3J   not used by Discover--for NMR reference  
7   KL   force constant associated with all lower bounds  
8   KU   force constant associated with all upper bounds  
9   maximum force   limit on magnitude of force (kcal mol-1 radian-1)  
10   lower bound for first range  
11   upper bound for first range  
12   lower bound for second range  
13   upper bound for second range  
14   lower bound for third range  
15   upper bound for third range  
16   lower bound for fourth range  
17   upper bound for fourth range  
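
The optional range fields (10 through 17) can be read as lower/upper pairs. The Python sketch below does so and tests whether an angle falls in any range; the function names are hypothetical. Treating a pair whose lower bound exceeds its upper bound (such as 153.5 -153.5) as an interval that wraps through ±180 degrees is an assumption made for this illustration, not a documented rule.

```python
def parse_3j_ranges(fields):
    """Return the (lower, upper) angle ranges from fields 10-17 of a record."""
    values = [float(x) for x in fields[9:17]]
    if len(values) % 2:
        raise ValueError("range bounds must come in lower/upper pairs")
    return list(zip(values[0::2], values[1::2]))


def in_any_range(angle, ranges):
    """True if angle (degrees, -180 to 180) lies inside at least one range."""
    for low, high in ranges:
        if low <= high:
            if low <= angle <= high:
                return True
        elif angle >= low or angle <= high:  # assumed wrap through +/-180
            return True
    return False


record = ("1:AR+N_1:HA 1:AR+N_1:CA 1:AR+N_1:CB 1:AR+N_1:HB1 6.73 1.00 30.00 30.00 "
          "1000.000 23.5 47.9 119.4 140.3 -140.3 -119.4 -47.9 -23.5").split()
ranges = parse_3j_ranges(record)
print(ranges)
print(in_any_range(30.0, ranges), in_any_range(90.0, ranges))
```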

Sample .rstrnt File


!BIOSYM restraint 1
!
#remote_prochiral_centers
1:LEU_6:HD2* 1:LEU_6:HD1* 1:LEU_6:CD2 1:LEU_6:CD1 1:LEU_6:CG
!
#chiral
1:AR+N_1:CA S
1:PRO_2:CA S
!
#distance
1:AR+N_1:CA 1:ASP-_3:CA 4.700 7.200 32.00 32.00 1000.000
1:PRO_2:CA 1:PHE_4:CA 4.700 7.200 32.00 32.00 1000.000
!
#NOE_distance
!ATOM #1 ATOM #2 Distance Force Constant Max
! Lower Upper Upper Lower Upper Force
! + Correction
1:AR+N_1:HA 1:AR+N_1:HG1 -1.000 4.000 4.000 32.00 32.00 1000.000
1:MET_52:HG1 1:MET_52:HE* -1.000 5.000 4.000 32.00 32.00 1000.000
!
#NMR_dihedral
1:ASP-_3:N 1:ASP-_3:CA 1:ASP-_3:CB 1:ASP-_3:CG -120.000 0.000 30.00 30.00 1000.000
1:CYS_55:N 1:CYS_55:CA 1:CYS_55:CB 1:CYS_55:SG -120.000 0.000 30.00 30.00 1000.000
!
#3J_dihedral
1:ASP-_3:HN 1:ASP-_3:N 1:ASP-_3:CA 1:ASP-_3:HA 3.98 1.00 30.00 30.00 1000.000
1:AR+N_1:HA 1:AR+N_1:CA 1:AR+N_1:CB 1:AR+N_1:HB1 6.73 1.00 30.00 30.00 1000.000 23.5 47.9 119.4 140.3 -140.3 -119.4 -47.9 -23.5 !A=9.500, B=-1.600, C=1.800
1:PRO_2:HA 1:PRO_2:CA 1:PRO_2:CB 1:PRO_2:HB1 9.02 1.00 30.00 30.00 1000.000 131.8 153.2 -153.2 -131.8 -31.1 31.1 !A=9.500, B=-1.600, C=1.800
1:ASP-_3:HA 1:ASP-_3:CA 1:ASP-_3:CB 1:ASP-_3:HB1 12.51 1.00 30.00 30.00 1000.000 153.5 -153.5 !A=9.500, B=-1.600, C=1.800
1:PRO_8:HA 1:PRO_8:CA 1:PRO_8:CB 1:PRO_8:HB1 2.61 1.00 30.00 30.00 1000.000
1:PRO_9:HA 1:PRO_9:CA 1:PRO_9:CB 1:PRO_9:HB1 2.23 1.00 30.00 30.00 1000.000


Residue Topology File (.rtf)

The Residue Topology File (RTF) is used by CHARMm to generate a Principal Structure File. An RTF describes the molecular topology of each residue in a structure's sequence. It contains the information needed to compute the energy of the molecule and to perform other calculations based on the empirical energy function in CHARMm. An RTF includes atom names, types, masses, and partial atomic charges, as well as definitions of bonds, angles, dihedrals, and improper torsion angles. It does not include coordinates or sequence data.

For detailed descriptions of the RTF file, see the CHARMm stand-alone documentation, which is available on the MSI Documentation CD-ROM or at the MSI web site:

http://www.msi.com/doc


Torsion File (.scs_tor)

The Search_Compare utility scs_xdrtor (see Appendix D, Utilities, in the Search_Compare User Guide) produces a file called run_name.scs_tor.

The format of the .scs_tor files is described and illustrated with examples in the following section.

The .xdr_tor file contains the same information as the .scs_tor file, in a compact binary format that can reduce disk usage by up to 95% relative to an ASCII file. XDR files can be shared by programs running on various platforms (IRIS, IBM, ESV, etc.). The scs_xdrtor utility converts between these two file types (see Appendix D, Utilities, in the Search_Compare User Guide).

The direct output of a search is an .xdr_tor file. It is not readily user-readable, but can be converted to an ASCII file by the utility scs_xdrtor. The .scs_tor file then contains the search information, including the defined rotatable bonds, the search order, implicit torsions and ring closure bonds (if rings are involved in the search), the number of atoms, the number of conformers, the anchor atom (if requested), the energy values (if generated), and the values of the torsion angles. The general format of the .scs_tor output file is similar to that of the .scs_prm input file. The .scs_tor file is described below, then illustrated by example.

Format of the Torsion File

The .scs_tor file consists of one header and a section for each type of information that it contains. Each section begins with the # character and a section keyword, and ends at the next instance of the # character or at the end of the file. The following are valid section keywords, each described below:


#rotatable_bonds
#search_order
#implicit_torsions
#closure_bond
#nb_atoms
#nb_conformers
#anchor_atom
#energy
#torsion_values

In addition, comment lines, beginning with !, may occur anywhere after the first record.

Description of Sections

Header Record

The first record is a header that defines the file as an .scs_tor file. It must be:


!BIOSYM scs_torsion 1.2

The ! must be the first character in the file. The 1.2 is the file version number. Version 1 and 1.1 files can still be read.

Rotatable Bond Section

The rotatable bond section begins with:


#rotatable_bonds

All rotatable bonds are defined by the four atoms constituting the torsion angle. This information is contained in the file in the following format:


RB# atom1 atom2 atom3 atom4

RB# signifies the name of each rotatable bond, where the # indicates an integer number, and each of the four atoms defining the rotatable bonds is specified in the conventional Insight II manner:


molecule_name:residue_name:atom_name
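
A rotatable bond record in this format can be decomposed as in the following Python sketch. The names AtomSpec, parse_atom_spec, and parse_rotatable_bond are hypothetical, and the sketch assumes that none of the three name components contains a colon.

```python
from typing import NamedTuple, Tuple


class AtomSpec(NamedTuple):
    molecule: str
    residue: str
    atom: str


def parse_atom_spec(spec: str) -> AtomSpec:
    # Assumption: exactly two colons separate the three name components.
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected molecule:residue:atom, got {spec!r}")
    return AtomSpec(*parts)


def parse_rotatable_bond(line: str) -> Tuple[int, Tuple[AtomSpec, ...]]:
    """Return the integer from RB# and the four atoms of the torsion."""
    name, *atoms = line.split()
    if not name.startswith("RB") or not name[2:].isdigit() or len(atoms) != 4:
        raise ValueError(f"not a rotatable bond record: {line!r}")
    return int(name[2:]), tuple(parse_atom_spec(a) for a in atoms)
```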


 

Search Order Section

The search order section begins with:


#search_order

This section consists of one line that lists the rotatable bonds (as RB#) in the order in which the search was performed.

Implicit Torsion Section

The implicit torsion section begins with:


#implicit_torsions

This section exists only if the search includes ring bonds. The section consists of one line that lists the rotatable bonds (as RB#) that are implicit torsions.

Closure Bond Section

The closure bond section begins with:


#closure_bond

This section exists only if the search includes ring bonds. All closure bonds are defined by two atoms forming the ring bonds. The format of this information in the file is:


atom1 atom2

Each atom is specified as:


molecule_name:residue_name:atom_name

Number of Atoms Section

This section begins with:


#nb_atoms

This section consists of a single number indicating the total number of atoms in the molecule.

Number of Conformers Section

This section begins with:


#nb_conformers

This section consists of a single number indicating the total number of conformers found by the search. It comes before the torsion values section.

Anchor Atom Section

The anchor atom section, if present, begins with:


#anchor_atom

This section exists only if you specified an anchor atom; it gives the name of that atom. It comes before the torsion values section. The anchor atom name is specified in the conventional Insight II manner:


molecule_name:residue_name:atom_name


 

Energy Section

The energy section, if present, consists of:


#energy

This section consists only of the section keyword, which functions as a flag indicating that there is an energy value after the value of the torsion for each conformer in the torsion values section. This keyword must come before the torsion values section. The complete format is:


#energy
!
#torsion_values
angle_1 angle_2 ... angle_(n-1) number_of_values
angle_n(1) energy_n(1)
angle_n(2) energy_n(2)
...

Torsion Values Section

The torsion values section begins with:


#torsion_values

This section lists the values for the (n - 1) rotatable bonds and the number of values (number of lines that follow) for the nth rotatable bond. The format is:


angle_1 angle_2 ... angle_(n-1) number_of_values
angle_n(1)
angle_n(2)
...
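
The compact layout above can be expanded into one full set of torsion angles per conformer. The following Python sketch shows one way to do so, not the behavior of any Search_Compare utility. The name expand_conformers and the has_energy argument are hypothetical. The caller is assumed to set has_energy when the #energy keyword is present, and blank lines between groups are assumed to be insignificant.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def expand_conformers(
    lines: Iterable[str], has_energy: bool = False
) -> Iterator[Tuple[List[float], Optional[float]]]:
    """Yield (all n torsion angles, energy or None) for each conformer."""
    rows = (ln.split() for ln in lines if ln.strip())   # skip blank lines
    for head in rows:
        *prefix, count = head           # n-1 shared angles, then the count
        shared = [float(x) for x in prefix]
        for _ in range(int(count)):
            row = next(rows, None)
            if row is None:
                raise ValueError("fewer value lines than announced")
            energy = float(row[1]) if has_energy else None
            yield shared + [float(row[0])], energy
```

Counting the conformers that such an expansion yields gives a simple consistency check against the number stored in the #nb_conformers section. In the Sample .scs_tor File, for example, the value counts on the group header lines add up to 24, which matches its #nb_conformers entry.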

Sample .scs_tor File


!BIOSYM scs_torsion 1.2
!
#rotatable_bonds
RB0 CAPTO_ANALOG:1:N1 CAPTO_ANALOG:1:C8 CAPTO_ANALOG:1:C9 CAPTO_ANALOG:1:O4
RB1 CAPTO_ANALOG:1:C CAPTO_ANALOG:1:N1 CAPTO_ANALOG:1:C8 CAPTO_ANALOG:1:C9
RB2 CAPTO_ANALOG:1:C2 CAPTO_ANALOG:1:C CAPTO_ANALOG:1:N1 CAPTO_ANALOG:1:C8
RB3 CAPTO_ANALOG:1:N CAPTO_ANALOG:1:C2 CAPTO_ANALOG:1:C CAPTO_ANALOG:1:N1
!
#search_order
RB3 RB2 RB1 RB0
!
#nb_atoms
32
!
#nb_conformers
24
!
#torsion_values
-30.000000 -90.000000 -60.000000 1
60.000

150.000000 -30.000000 -60.000000 2
150.000
0.000

150.000000 -60.000000 -30.000000 1
120.000

150.000000 -60.000000 -60.000000 2
90.000
60.000

150.000000 -60.000000 90.000000 1
-150.000

150.000000 -90.000000 120.000000 1
-90.000

150.000000 -90.000000 60.000000 3
-90.000
-120.000
90.000

150.000000 -90.000000 30.000000 1
120.000

150.000000 -120.000000 90.000000 1
-60.000

150.000000 -120.000000 60.000000 1
-60.000

90.000000 -90.000000 90.000000 1
30.000

90.000000 90.000000 -60.000000 1
-90.000

90.000000 60.000000 -90.000000 2
-30.000
150.000

90.000000 60.000000 -120.000000 4
120.000
90.000
60.000
30.000

90.000000 30.000000 -90.000000 1
150.000

60.000000 30.000000 60.000000 1
-150.000

Binary Torsion File (.xdr_tor)

Search_Compare uses several kinds of input files:

The run_name.xdr_tor or run_name.scs_tor file is used when a prior search is post-processed (as an energy or distance map). This is indicated by the presence of the keyword use_prior_search in the #scs_commands section of the .scs_prm file.

Search_Compare generates several kinds of output files:

In addition, the utility scs_xdrtor (see Appendix D in the Search_Compare User Guide) produces a file called run_name.scs_tor.


Structure Data File (.sd)

The input files recognized by Converter have the extension .sd, for MDL structure data file. Apex-3D also recognizes this file format. The output data files produced by Converter also have the extension .sd.

For an explanation of the .sd format from MDL, see your MDL product documentation.


Amino Acid Sequence Files (.seq)

A single amino acid sequence can be read into the Homology or Consensus module from a file with the Get Sequences Single command. The file is a text file containing the sequence characters in lines of no more than 80 characters each. The filename should end in the extension .seq, since only files with this extension are listed in the value-aid in the Get Sequences Single command. The sequence characters must be the standard single-letter amino acid codes, in uppercase or lowercase. No other characters of any kind are allowed. The meanings of the single-letter codes are:

Table 45. Amino Acid Single-Letter Codes

A = ALA   G = GLY   M = MET   S = SER  
C = CYS   H = HIS   N = ASN   T = THR  
D = ASP   I = ILE   P = PRO   V = VAL  
E = GLU   K = LYS   Q = GLN   W = TRP  
F = PHE   L = LEU   R = ARG   Y = TYR  

Sample .seq File

Here is an example of an amino acid sequence in the correct format for a sequence file:


VMTQSPSSLSVSAGERVTMSCKSSQSLLNSGNQKNFLAWYQQKPGQPPKLIYGASTRESGVPDRFTGSGSGTDFTLTISS
VQAEDLAVYYC
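
The rules for a .seq file (lines of at most 80 characters, and only the twenty single-letter codes of Table 45 in either case) can be checked before the file is read into a module. The following Python sketch is one way to do this. The name read_seq is hypothetical, and tolerating empty lines is an assumption.

```python
# Single-letter codes and their three-letter names, from Table 45.
ONE_LETTER = {
    "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
}


def read_seq(text: str) -> str:
    """Validate .seq file contents and return the sequence in uppercase."""
    residues = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > 80:
            raise ValueError(f"line {number} exceeds 80 characters")
        for ch in line:
            if ch.upper() not in ONE_LETTER:
                raise ValueError(f"line {number}: invalid character {ch!r}")
            residues.append(ch.upper())
    return "".join(residues)
```

Because no other characters are allowed, the sketch rejects spaces and digits as well as letters that are not in the table.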


Subset Definition File (.sub)

Subset definition files are created by the Put Subset command in Insight II and can be read back in using the Get Subset command. These files have the extension .sub.

The format of the file consists of the "atomset" section of the molecular data file (.mdf). Refer to that section of this book for a detailed description of the file format, beginning on page 49.

Sample Subset Definition File


!BIOSYM subset_data 4

#atomset

@degree 6 subset CRN_RINGS
CRN:PHE_13:CG CD1 CD2 CE1 CE2 CZ

CRN:TYR_29:CG CD1 CD2 CE1 CE2 CZ

CRN:TYR_44:CG CD1 CD2 CE1 CE2 CZ

#end


Table Files (.tab)

The Table/Put command writes the table in the file format shown below. The values are written to the file in column-major order, separated by tabs.

Sample .tab File


1.100000	2.200000	3.400000	3.500000	4.400000
1.100000	2.200000	3.400000	3.500000	4.400000
1.100000	2.200000	3.400000	3.500000	4.400000
1.100000	2.200000	3.400000	3.500000	4.400000


Graph Data Files (.tbl)

This section provides a description of the file format needed for the creation of graphs using the Graph/Get command.

The top of the file is where informational text and comments may appear. There is no limit to the number of comment lines; however, when a title delimiter (#) appears as the first character in the line, function specifications must follow on the next line.

A function specification record follows each delimiter. Each field of information is preceded by its identifier in capital letters. The order of the fields must not vary from the order specified here.

Thus, a function specification might look like the following:


# 
TITLE: Time in ps
MEASUREMENT TYPE: Time
UNITS OF MEASUREMENT: ps
FUNCTION: Time

Each column of values in the file must be identified by a function specification record. The first function specification record corresponds to the first column of values, the second specification corresponds to the second column, and so on.

There must be two delimiters following the last function specification record, each on a separate line. These two delimiters signify that the axis function values follow. There is no limit to the number of axis functions that may be written to a file.

The number of observables for each axis function may vary; however, the number of rows for each column must be equal. To accomplish this, columns may be padded with asterisks following the last value. The asterisks serve as placeholders, so that when the file is read, the values are associated with the correct axis function. Numeric values may not follow an asterisk in any column (i.e., holes in the data are not allowed).

When two axis functions with unequal numbers of observables are plotted together, the plot contains only as many points as the shorter function provides. For example, if the x axis is time and has 30 observables, and the y axis is distance and has only 25 observables (followed by 5 asterisks), the resulting plot has only 25 points.

Sample Graph Files

Several annotated sample files are included here. Samples 1 and 2 are valid. Each of the other samples would cause graph creation to fail with the error message:


> Bad file format

Sample 1

INSIGHT V3.0 DATE: Thu Jan 25 15:50:38 1990 Sample 1 {line 1: header info}
This is a valid sample of a graph file. Note that atom
specifications are optional, and in this example not all
of the functions have an equal number of observables.
# {delimiter}
TITLE: Time in ps {title field, first function}
MEASUREMENT TYPE: Time {measurement type field}
UNITS OF MEASUREMENT: ps {units of measurement field}
FUNCTION: time {function name}
# {delimiter}
TITLE: Ang in Deg {second function specification}
MEASUREMENT TYPE: Ang
UNITS OF MEASUREMENT: Deg
FUNCTION: dihedral
ATOM: PENT_MIN:1:C1 {atom identification field}
ATOM: PENT_MIN:1:C2 "
ATOM: PENT_MIN:1:C3 "
ATOM: PENT_MIN:1:C4 "
# {delimiter}
TITLE: Ang in Deg {third function specification}
MEASUREMENT TYPE: Ang
UNITS OF MEASUREMENT: Deg
FUNCTION: dihedral
ATOM: PENT_MIN:1:C2
ATOM: PENT_MIN:1:C3
ATOM: PENT_MIN:1:C4
ATOM: PENT_MIN:1B:C1
# {final delimiters signifying}
# {start of function values}
0.0000 180.0000 180.0000
1.0000 -179.9894 -150.2375
2.0000 -179.9917 -120.0072
3.0000 -179.9930 -89.8617
4.0000 -179.9586 -60.1564
5.0000 -179.9715 -30.3034
6.0000 179.9988 0.0004
7.0000 179.9716 30.3039
8.0000 179.9563 60.1583
9.0000 179.9949 89.8630
10.0000 179.9963 120.0064
11.0000 179.9843 150.2371
12.0000 -179.9997 179.9995
13.0000 -150.2371 -179.9869
14.0000 -150.2422 -150.2430
15.0000 -150.2552 -120.0132
16.0000 -150.2399 -89.8343
17.0000 -150.2239 -60.1589
18.0000 -150.2804 -30.3564
19.0000 -150.2925 -0.0084
20.0000 -150.2877 30.2928
21.0000 -150.2367 60.2017
22.0000 -150.2356 89.8528
23.0000 -150.2625 119.9881
24.0000 -150.2609 150.2550
25.0000 -150.2381 -179.9838
26.0000 -120.0059 * {These asterisks are valid. Notice they}
27.0000 -120.0095 * {serve two functions: the first signifies that}
28.0000 -120.0061 * {the last value for this function has been}
29.0000 -120.0109 * {read, and the rest act as placeholders so}
30.0000 -120.0625 * {that the remaining values are associated}
31.0000 -120.0175 * {with the correct function}
32.0000 -119.9944 *
33.0000 -119.9866 *
34.0000 -119.9809 *
35.0000 -120.0167 *
36.0000 -119.9919 *

Sample 2

INSIGHT V3.0 DATE: Mon Jan 29 11:12:46 1990 Sample 2 { line 1: header info}
This is also a valid example of a graph file. In this case both
functions have an equal number of values.
# {delimiter}
TITLE: Time in ps {title field}
MEASUREMENT TYPE: Time {measurement type field}
UNITS OF MEASUREMENT: ps {units of measurement field}
FUNCTION: time {function name field}
# {delimiter}
TITLE: E in Kcal {second function specification}
MEASUREMENT TYPE: E
UNITS OF MEASUREMENT: Kcal
FUNCTION: total energy
# {final delimiters signifying}
# {start of function values}
0.0000 13.2844
1.0000 15.3901
2.0000 17.7502
3.0000 16.0267
4.0000 16.1193
5.0000 19.8414 {Function values. Note that the}
6.0000 22.7331 {columns are of equal length and}
7.0000 19.8414 {that there are no "holes" in the data}
8.0000 16.1191
9.0000 16.0268
10.0000 17.7503
11.0000 15.3900
12.0000 13.2844
13.0000 15.3900
14.0000 17.4033
15.0000 19.9211
16.0000 18.0537
17.0000 17.7307
18.0000 21.9917
19.0000 25.2957
20.0000 22.6115
21.0000 18.6231
22.0000 18.1623
23.0000 20.1433
24.0000 17.7743
25.0000 15.3899
26.0000 17.7501
27.0000 19.9207
28.0000 22.5123
29.0000 20.4406
30.0000 20.4251
31.0000 25.0334
32.0000 28.1825
33.0000 25.2620
34.0000 20.5374
35.0000 20.5611
36.0000 22.7063
37.0000 20.1430
38.0000 17.7502
39.0000 16.0266
40.0000 18.0543
41.0000 20.4409
42.0000 18.8049
43.0000 19.0988
44.0000 22.8343
45.0000 26.0181
46.0000 23.7476
47.0000 20.1923
48.0000 19.7691
49.0000 20.5609
50.0000 18.1621

Sample 3

INSIGHT V3.0 DATE: Thu Jan 25 15:50:38 1990 Sample 3
#
MEASUREMENT TYPE: Time {Invalid file format: Missing title field}
UNITS OF MEASUREMENT: ps
FUNCTION: time
<rest of file not shown>

Sample 4

INSIGHT V3.0 DATE: Mon Jan 29 11:12:46 1990 Sample 4
#
MEASUREMENT TYPE: Time
UNITS OF MEASUREMENT: ps
TITLE: Time in ps {Invalid file format: function}
FUNCTION: time {specification fields in wrong order}
<rest of file not shown>

Sample 5

INSIGHT V3.0 DATE: Thu Jan 25 15:50:38 1990 Sample 5
#
TITLE: Time in ps
MEASUREMENT TYPE: Time
UNITS OF MEASUREMENT: ps
FUNCTION: time
#
TITLE: Ang in Deg
MEASUREMENT TYPE: Ang
UNITS OF MEASUREMENT: Deg
FUNCTION: dihedral
ATOM: PENT_MIN:1:C1
ATOM: PENT_MIN:1:C2
ATOM: PENT_MIN:1:C3
ATOM: PENT_MIN:1:C4
#
TITLE: Ang in Deg
MEASUREMENT TYPE: Ang
UNITS OF MEASUREMENT: Deg
FUNCTION: dihedral
ATOM: PENT_MIN:1:C2
ATOM: PENT_MIN:1:C3
ATOM: PENT_MIN:1:C4
ATOM: PENT_MIN:1B:C1
# {Invalid file format: missing}
0.0000 180.0000 180.0000 {second delimiter preceding values}
1.0000 -179.9894 -150.2375
<rest of file not shown>

Sample 6

# {Note that a file may contain as many or as few}
TITLE: Time in ps {lines of informational text or comments as}
MEASUREMENT TYPE: Time {desired. This one has none.}
UNITS OF MEASUREMENT: ps
FUNCTION: time
#
TITLE: E in Kcal
MEASUREMENT TYPE: E
UNITS OF MEASUREMENT: Kcal
FUNCTION: total energy
#
#
0.0000 13.2844
1.0000 15.3901
2.0000 17.7502
3.0000 16.0267
4.0000 16.1193
5.0000 19.8414
6.0000 22.7331
7.0000 19.8414
8.0000 16.1191
9.0000 16.0268
10.0000 17.7503
15.3900 {This is invalid; "holes" are not allowed in the data}
13.2844
15.3900
14.0000 17.4033
15.0000 19.9211
16.0000 18.0537
17.0000 17.7307
18.0000 21.9917
19.0000 25.2957
20.0000 22.6115
21.0000 18.6231
22.0000 18.1623
23.0000 * {This is also considered a "hole" and is}
24.0000 * {invalid. Since an asterisk denotes the}
25.0000 * {end of values, it is an error to follow}
26.0000 17.7501 {an asterisk with function values.}
27.0000 19.9207
28.0000 22.5123
29.0000 20.4406
<rest of file not shown>

Sample 7

INSIGHT V3.0 DATE: Thu Jan 25 15:50:38 1990 Sample 7
#
TITLE: Time in ps
MEASUREMENT TYPE: Time
UNITS OF MEASUREMENT: ps
FUNCTION: time
#
TITLE: Ang in Deg
MEASUREMENT TYPE: Ang
UNITS OF MEASUREMENT: Deg
FUNCTION: dihedral
ATOM: PENT_MIN:1:C1
ATOM: PENT_MIN:1:C2
ATOM: PENT_MIN:1:C3
ATOM: PENT_MIN:1:C4
#
TITLE: Ang in Deg
MEASUREMENT TYPE: Ang
UNITS OF MEASUREMENT: Deg
FUNCTION: dihedral
ATOM: PENT_MIN:1:C2
ATOM: PENT_MIN:1:C3
ATOM: PENT_MIN:1:C4
ATOM: PENT_MIN:1B:C1
#
#
0.0000 180.0000 {Invalid file format: number of columns does}
1.0000 -179.9894 {not match number of function specifications}
2.0000 -179.9917
<rest of file not shown>

Sample 8

INSIGHT V3.0 DATE: Mon Jan 29 11:12:46 1990 Sample 8
#
TITLE: Time in ps
MEASUREMENT TYPE: Time
UNITS OF MEASUREMENT: ps
FUNCTION: time
TITLE: E in Kcal {Invalid file format: missing function}
MEASUREMENT TYPE: E {specification delimiter}
UNITS OF MEASUREMENT: Kcal
<rest of file not shown>


User File (.usr)

The user file can have one of three formats. The first line of the file determines the format.

1.   If the first line of the file is DOTS, then the file is assumed to contain lines composed of a coordinate triplet (three numbers separated by spaces) and a color number, which is on the same line as the coordinates and is separated from them by a space. Thus each line is composed of four numbers separated by spaces. The color values are converted to integers and taken modulo 360. These colors cannot be changed with the Color command.

A dot surface file is a special type of user DOTS file. It follows the DOTS file format, with the addition of a new line type that begins with ATOM_REF. The line contains an atom specification (object:monomer:atom) and three atomic coordinates (x,y,z) in screen space. An example of an atom reference line is:

ATOM_REF FELV:1:CA 0.470000 0.000000 0.000001

These ATOM_REF lines are ignored when the file is retrieved as a user object, but the data they contain is used when the file is retrieved as a molecular surface. (The ATOM_REF coordinates are subtracted from the dot coordinates to put the dots into atom space).

The lines that follow each ATOM_REF line contain the surface dot coordinates for that atom, one dot per line. Each line contains three coordinates (x,y,z) in screen space, and a color number.

2.   If the first line of the file is LINE, then the remainder of the file is composed of lines with a coordinate triplet (three numbers separated by spaces) and the letter P or L, separated from the coordinates by a space and immediately followed by the end-of-line character. The P and L are interpreted as "move to point P and draw a line from the current position to the next position L". The color of this type of object is selected in the same manner as for any other Get command and can be changed with the Color command.

3.   If the first line of the file begins with the word TEXT, then the remainder of the line should contain the size of the text (defaults to 0.03) and the spacing between the lines (defaults to 0.25). The second line of the file contains the x, y, and z coordinates where the text should start and the color of the text (a value from 0 to 360). The remainder of the file consists of the lines of text to be displayed. Each line of the file is displayed as a separate line on the screen.

Lines beginning with a semicolon are ignored by the program while reading in DOTS and LINE mode. This is allowed so that you can put comments into the file. Comments can appear anywhere in the file except as the first line. Comments are not allowed in TEXT type files--they appear on the screen as text.
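
As an illustration of the DOTS format, the following Python sketch reads such a file as a plain user object: it skips comment lines and ATOM_REF lines and reduces each color modulo 360. The name read_dots is hypothetical. Truncating a non-integer color toward zero before the modulo step is an assumption, since the text states only that colors are converted to integers.

```python
from typing import List, Tuple

Dot = Tuple[float, float, float, int]


def read_dots(text: str) -> List[Dot]:
    """Return (x, y, z, color) for each dot in a DOTS user file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "DOTS":
        raise ValueError("not a DOTS user file")
    dots: List[Dot] = []
    for line in lines[1:]:
        if not line.strip() or line.startswith(";"):
            continue                    # blank line or comment
        if line.startswith("ATOM_REF"):
            continue                    # ignored when read as a user object
        x, y, z, color = line.split()
        # Assumption: truncate toward zero, then reduce modulo 360.
        dots.append((float(x), float(y), float(z), int(float(color)) % 360))
    return dots
```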




Last updated December 09, 1998 at 08:56PM Pacific Standard Time.
Copyright © 1998, Molecular Simulations, Inc. All rights reserved.