Insight II



I       Using Compressed Files

Most of Biosym's software, including Insight II, the various application modules, and many related utility programs, can now read compressed files. This feature affects only the reading of existing files, not the creation or writing of files. Almost any kind of file can be used directly in compressed form, including Brookhaven PDB files, sequence database files, log files, and Biosym .arc files and .psv folder files, to name a few.


Recognized Compression Formats

Files compressed in either of two formats can be read. These are the formats of the two most commonly used file compression commands on UNIX systems:

1.   the standard UNIX compress command, which marks compressed files by appending a .Z extension onto their filenames;

2.   the gzip command from the Free Software Foundation, which marks compressed files by appending a .gz extension onto their filenames.

Biosym's software recognizes the compression format by the .Z and .gz extensions, so you must not change the extension of a compressed file. Also, you should not use .Z or .gz extensions for files that are not compressed.
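The extension rule can be pictured with a short Python sketch. The function name compression_format is hypothetical and is not part of Biosym's software; it only mirrors the rule stated above, namely that the format is judged from the filename extension alone. Treating the match as case-sensitive is an assumption.

```python
def compression_format(filename):
    """Return the compression format implied by a filename's extension.

    Illustrative only: this mirrors the documented rule, not Biosym's code.
    """
    if filename.endswith(".gz"):
        return "gzip"
    if filename.endswith(".Z"):   # assumption: the match is case-sensitive
        return "compress"
    return None                   # no recognized extension: treated as uncompressed
```

Under this rule, a compressed file renamed to drop its extension would be read as if it were uncompressed, which is why the extensions must be left alone.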


Compressed Files Now Appear In Value-aids

Any filename that would appear in a file list value-aid if it were not compressed also appears if it has an extension that indicates compression. For example, in the File/Restore_Folder command, the Files value-aid not only lists filenames ending in .psv, but also those ending in .psv.Z and .psv.gz.
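As a rough illustration of the listing rule, the following Python sketch filters a list of names the way the Files value-aid is described as doing. The function value_aid_entries and its parameters are hypothetical names chosen for this example.

```python
def value_aid_entries(filenames, extension=".psv"):
    """Keep names ending in the extension, with or without .Z or .gz appended."""
    suffixes = (extension, extension + ".Z", extension + ".gz")
    # str.endswith accepts a tuple and matches any of its members.
    return [name for name in filenames if name.endswith(suffixes)]
```

For a directory containing a.psv, b.psv.Z, c.psv.gz, and d.pdb, the sketch keeps the first three names and drops the last.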


A Decompression Program Must Be On Your System

Biosym's software executes the uncompress program to decompress files ending in .Z, and the gzip -d (or gunzip) program to decompress files ending in .gz. The decompression, and hence the reading of the file, fails if the appropriate decompression program is not installed on your system. The uncompress program is standard on all UNIX systems that Biosym supports. The gzip program can be obtained from the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA, or downloaded via ftp from the directory /pub/gnu at prep.ai.mit.edu.
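If you are unsure whether the decompression programs are installed, a check along the following lines may help. This is a minimal Python sketch, not a Biosym utility. The function name unreadable_extensions is hypothetical, and the sketch assumes that finding a program on your PATH is enough for the software to run it.

```python
import shutil

def unreadable_extensions(find=shutil.which):
    """Return the extensions whose decompression program cannot be found.

    find is any function that maps a program name to its path, or to None
    when the program is absent. It defaults to a search of PATH.
    """
    # .Z needs uncompress; .gz needs gzip (run as "gzip -d") or gunzip.
    required = {".Z": ("uncompress",), ".gz": ("gzip", "gunzip")}
    return [ext for ext, programs in required.items()
            if not any(find(program) for program in programs)]
```

Calling unreadable_extensions() with no arguments returns an empty list when both kinds of compressed file can be decompressed on your system.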


Decompression Creates a Temporary File

When Biosym's software reads a compressed file, it decompresses the file into a temporary file and reads that instead. If your computer does not have enough free disk space to hold the decompressed temporary file, the reading of the file fails. The temporary file is always removed (strictly speaking, "unlinked") immediately after it is opened for reading, so it disappears as soon as the reading is finished, even if the program terminates unexpectedly.
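The unlink-after-open technique is a general UNIX idiom, and it can be demonstrated with a few lines of Python. The sketch below is not Biosym's code, and the function name read_via_unlinked_temp is hypothetical. It assumes a POSIX system, where a file that has been unlinked stays readable through any handle that is already open.

```python
import os
import tempfile

def read_via_unlinked_temp(data):
    """Write data to a temporary file, unlink it once opened, then read it."""
    fd, path = tempfile.mkstemp(prefix="temp")
    with os.fdopen(fd, "wb") as out:
        out.write(data)           # stands in for the decompressed output
    reader = open(path, "rb")     # open the file for reading ...
    os.unlink(path)               # ... and remove its directory entry at once
    try:
        still_listed = os.path.exists(path)   # the name is already gone
        return reader.read(), still_listed    # the contents remain readable
    finally:
        reader.close()            # the disk space is released at this point
```

Because the name is removed before any reading starts, nothing is left behind on disk if the reading process exits abnormally partway through.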

The name of the temporary file begins with the four letters temp and ends with a string of alphanumeric characters generated by the computer to make the filename unique. The temporary file is automatically created either in your home directory or in the standard system directory for temporary files, depending on which of the associated filesystems has more free disk space. On most UNIX systems, the standard system directory for temporary files is /tmp, /usr/tmp, or /var/tmp. You can determine which of these applies to your system by reading the definition of "P_tmpdir" in the file /usr/include/stdio.h.

You can override the behavior just described, and force the software to create the temporary file in a directory of your choice, by setting the environment variable TMPDIR to the name of that directory. For example, to force the creation of temporary files in the directory /usr/local/tmp, you would type the command:


>	setenv TMPDIR /usr/local/tmp

at the UNIX prompt, or add it to your .cshrc file.
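The rules for choosing the directory can be summarized in a short Python sketch. It is illustrative only. The function choose_temp_dir and its parameters are hypothetical, and sending ties in free space to the system directory is an assumption that the text above does not settle.

```python
import os
import shutil

def choose_temp_dir(home, system_tmp, environ=os.environ, free_bytes=None):
    """Pick the directory for the temporary file, following the rules above."""
    if free_bytes is None:
        # Default: ask the operating system how much space each filesystem has.
        free_bytes = lambda directory: shutil.disk_usage(directory).free
    override = environ.get("TMPDIR")
    if override:
        return override           # TMPDIR, when set, always takes precedence
    # Otherwise use whichever filesystem has more free space.
    # Assumption: a tie goes to the system temporary directory.
    if free_bytes(home) > free_bytes(system_tmp):
        return home
    return system_tmp
```

For example, choose_temp_dir(os.path.expanduser("~"), "/tmp") reports where a temporary file would go under these rules with your current environment.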

Because the temporary file is automatically deleted, you normally never see it. There is, however, one rare circumstance in which a temporary file can persist: if the software is killed (for example, via the kill command) during the brief period when the gunzip or uncompress command is in the process of decompressing, then an incomplete temporary file may be left on your disk. If this happens, you should manually delete the temporary file.


Fail-Safe Decompression Strategy

The software always tries first to open the file with the exact name and extension that you specify. If the decompression fails for any reason, the software tries to find and open an uncompressed version of the same file before it gives up; it does this simply by removing the .Z or .gz extension from the filename and then trying to open the corresponding file. Similarly, if it cannot find or open an uncompressed file, it tries to find and open a compressed version of the same file before it gives up, looking first for a version ending in .gz and then, if that fails, for one ending in .Z. As a result, if either a compressed or an uncompressed version exists, the software reads it, whether or not you specify the correct extension in the filename.

This behavior frees you from the burden of trying to remember whether or not you compressed a particular file that you want to read. It also means that any log files or BCL macros that contain explicit data file names continue to work even if you compress (or decompress) those data files after the script is written.
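The search order described in this section can be written out as a short Python sketch. The function names_to_try is a hypothetical name, and the sketch covers only the cases the text spells out. Whether the software also tries the other compressed extension after removing one is not stated, so the sketch does not.

```python
def names_to_try(filename):
    """List the filenames to try, in order, for a requested filename."""
    for ext in (".gz", ".Z"):
        if filename.endswith(ext):
            # A compressed name was requested: fall back to the bare name.
            return [filename, filename[:-len(ext)]]
    # An uncompressed name was requested: fall back to .gz first, then .Z.
    return [filename, filename + ".gz", filename + ".Z"]
```

A request for pdb1crn.ent therefore leads to pdb1crn.ent, pdb1crn.ent.gz, and pdb1crn.ent.Z being tried in that order, while a request for pdb1crn.ent.gz leads to pdb1crn.ent.gz and then pdb1crn.ent.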


Caveats

Using Soft Links to Compressed Files

A compressed file can be accessed via a soft link, but only if the name of the link has the same extension (.gz or .Z) as the file itself. For example, if a soft link named my_file.gz points to the file pdb1crn.ent.gz, then the Insight II command:

Get Molecule PDB User my_file.gz CRN -Heteroatom -Reference_Object

works correctly. If, however, the soft link named my_file points to pdb1crn.ent.gz, then the command:

Get Molecule PDB User my_file CRN -Heteroatom -Reference_Object

fails with a "Bad file format" error message.
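If you keep many soft links, a small check such as the following Python sketch can flag the ones that would fail. It is not part of Insight II, and the function name link_name_matches_target is hypothetical. It simply compares the compression extension of the link name with that of the file it points to.

```python
def link_name_matches_target(link_name, target_name):
    """True if the link and its target agree on a .gz or .Z extension (or on none)."""
    def compression_ext(name):
        for suffix in (".gz", ".Z"):
            if name.endswith(suffix):
                return suffix
        return ""                 # no compression extension
    return compression_ext(link_name) == compression_ext(target_name)
```

For an existing link, the target name can be obtained with os.readlink(link_name) before calling the function.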

VMS .Z Files Cannot Be Read

Some files compressed on VMS computer systems have .Z extensions but are not in the same format as UNIX .Z files. Unfortunately, these VMS .Z files cannot be read directly by Biosym's software. Instead, they must first be decompressed on a VMS system and transferred to a UNIX system, where they can then be recompressed if desired. Some protein sequence database files are distributed on the Internet in this VMS .Z format, so be sure to download UNIX-compatible files instead.
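One way to confirm that a downloaded .Z file is in the UNIX format is to inspect its first two bytes. Files written by the UNIX compress command begin with the bytes 1F 9D (hexadecimal), and gzip files begin with 1F 8B. The Python sketch below is a user-side check, not something Biosym's software is documented to do, and the function names are hypothetical. It can confirm a UNIX format but cannot positively identify a VMS file.

```python
def format_from_magic(header):
    """Identify a compression format from the first bytes of a file."""
    if header[:2] == b"\x1f\x9d":
        return "compress"         # signature written by UNIX compress
    if header[:2] == b"\x1f\x8b":
        return "gzip"             # signature written by gzip
    return None                   # neither signature is present

def format_of_file(path):
    """Read the first two bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return format_from_magic(f.read(2))
```

A file that ends in .Z but for which format_of_file returns None is unlikely to be readable as a UNIX compressed file.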




Last updated December 17, 1998 at 04:29PM PST.
Copyright © 1998, Molecular Simulations Inc. All rights reserved.