Exploratory Data Analysis System
The GELLAB-II programs are documented in the GELLAB-II reference manual [31]. The reference manual also has tutorial examples of running GELLAB-II using a sample set of gels supplied with the system. A poster is available that illustrates some aspects of GELLAB-II.
The following describes GELLAB-II as it existed in 1993. It has not been maintained since then, and parts of it are being re-released as open source as we refactor them into the Open2Dprot project on SourceForge.Net at http://open2dprot.sourceforge.net/. We are also encouraging the integration of other 2D gel analysis and related proteomics software (LC-MS, etc.) into Open2Dprot. The following is taken from the 1993 description of GELLAB-II.
Top-level description of GELLAB-II
Figure 1 and Figure 2 illustrate
the data reduction processing performed in the original
GELLAB-II analysis.
Figure 1. Block diagram of the original 2D-gel analysis GELLAB-II
system. Programs associated with major steps of GELLAB-II are
indicated in "[...]". Gel images are acquired by scanning with a
camera interfaced to the UNIX system and saved on the computer disk in
step 1. Accession information about the set of gels is also
used to update an accession file. In step 2, landmark spots are
selected manually: these are well-defined spots spaced fairly evenly
throughout the gel, with more landmarks in regions of higher
distortion. Using gel image flicker alignment, the landmark spots
are aligned for all of the gels with a Representative gel (Rgel). The
gel images are then segmented and measurements made of the spots that
are found in step 3. This information and the raw segmentation
data are then used to pair corresponding spots in the remaining gels
with the Rgel in step 4. The set of gel pairings with the same
Rgel may be merged together to form a list of sets of equivalent
Rspots called the composite gel database (CGL) in step 5. Thus
a Rspot set (most likely) contains corresponding spots from all the
gels in which it occurs. Finally, in step 6 data mining is
performed on the PCG database, reports are created, and Rmap and
mosaic images of statistically significant spots can be generated and
displayed. These results can be annotated and analyzed further using
additional software.
Running the programs
Programs are invoked in GELLAB-II through the Unix command-line by
naming the program and specifying optional arguments. Commands may be
combined in Unix script files to implement a batch mode. All programs
have several common Unix-style command-line switches to facilitate
learning the consistent user interface. This subset of switches is
useful in learning how to run particular programs. Descriptions of
the algorithms used in these programs are given in many of the
GELLAB-I papers listed in the References
[4,7-13,17,20,28-30]. [The original GELLAB system (GELLAB-I) was
written in the SAIL (Stanford Artificial Intelligence) Language - an
Algol dialect. It was then translated using the PSAIL compiler [23,26]
to C and then integrated with X-windows and Unix to create GELLAB-II.]
Good introductory papers that describe the basic GELLAB analysis are
[4] and [10] with [12] being a more general and detailed summary.
Extensions to the early system are discussed in
[11,13,17-18,20,28-30]. A comparison of aspects of 2D gel database
analysis systems is given in [28]. This document lists the GELLAB-II
programs and describes the basic steps in performing a 2D gel
analysis. Full documentation of GELLAB-II is given in the Reference
Manual [31]. A few selected examples of
running some of these programs (Section 5) are included at the end
of this document. Each GELLAB-II program can provide help on how it
should be run using the following command line switch options.
-info
Print detailed information about the program including what it
does, how to run it, and literature references
specific to that program.
E.g., cgelp2 -info.
You can use this with the Unix more(1) program to page
through the documentation slowly. E.g., cgelp2 -info | more
or use it to create a printable file. E.g., cgelp2 -info >
cgelp2.info.
-version
Print the version number of the program. E.g., cgelp2 -version
-usage
Print a quick list of command line switch usage for the
program. Note that you need only type the initial part of a
switch that makes it unique. That part of the switch is shown in
upper case in the usage printout.
E.g., cgelp2 -usage or cgelp2 -usage | more
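The unique-prefix rule for switches can be illustrated with a short Python sketch. This is not the GELLAB-II parser; the function names are hypothetical and the switch list contains only the three common switches described above.

```python
def resolve_switch(typed, switches):
    """Return the full switch name that `typed` uniquely abbreviates."""
    key = typed.lstrip("-").lower()
    # An exact match wins even if it is also a prefix of a longer switch.
    for name in switches:
        if name.lower() == key:
            return name
    matches = [name for name in switches if name.lower().startswith(key)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown switch: -{key}")
    raise ValueError(f"ambiguous switch -{key}: could be {sorted(matches)}")


def usage_form(name, switches):
    """Upper-case the shortest prefix of `name` that no other switch shares."""
    others = [s.lower() for s in switches if s.lower() != name.lower()]
    for n in range(1, len(name) + 1):
        prefix = name[:n].lower()
        if not any(o.startswith(prefix) for o in others):
            return name[:n].upper() + name[n:].lower()
    return name.upper()  # the whole name is a prefix of another switch


# The three common switches named in the text (illustrative list only).
SWITCHES = ["info", "version", "usage"]

if __name__ == "__main__":
    for s in SWITCHES:
        print("-" + usage_form(s, SWITCHES))
    print(resolve_switch("-ver", SWITCHES))
```

With more switches sharing leading letters, the upper-case part grows until it is unambiguous, which is the behavior the usage printout is described as showing.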
The X-windows System window manager program twm(1) (under X11R4), with startup file .twmrc, creates three menus: TWM-WINDOWS (left button), WINDOW-OPS (middle button), and APPLICATIONS (right button). Dragging the mouse to the GELLAB-II selection in the APPLICATIONS menu causes a GELLAB-II programs menu to pop out to the right when the mouse is in the right side of the menu. These main GELLAB-II selections are listed below. Continue dragging the mouse to the right part of the desired selection in the menu, and a function-specific menu pops out to the right. At this point, select the specific sub-function you want and release the mouse button. When a program is invoked from the GELLAB Tasks menu, it prompts with a popup form-dialogue window for any additional information required for the command line. Press HELP in the dialogue window to list the (Emacs-like) text editing commands. After you have answered the questions, press OK to perform the function or CANCEL to abort it. When the function starts, it pops up an X11 xterm window that disappears when the function is finished.
Accession gels into the gel accession database file for subsequent
processing.
Autopair gels (GSF data) to Gel Comparison File (GCF) paired-spot
lists.
Convert image files to PPX images (PPX is the GELLAB standard image format).
Convert PPX images to PostScript for subsequent display or plotting.
Debug PPX image file (view numeric data at the pixel level).
Display gel images or derive image map images.
Draw Rmap (Representative gel with spots overlaid) as a GSF plot.
Edit Gel Comparison File (GCF) paired-spot lists.
Edit Gel Segmentation File (GSF) spot-lists.
Gel Database Manager for a Composite Gel Database (CGL).
Landmark gels by interactively defining a small number of common
landmark spots.
Make GELLAB batch scripts to accession, landmark, segment spots, pair spots,
and build the initial composite gel database.
Mosaic to create derived PPX images (a montage) of local gel regions
surrounding a selected spot across a set of gels.
Pair gels to a Gel Comparison File (GCF) between a gel and the
Representative gel and generate derived image files showing pairing.
Print GELLAB-II gel.rc database project resource file in user-friendly
form.
Rmap image generation of an overlay map of a subset of spots of interest
on a gel PPX image file.
Segment a gel image to a Gel Segmentation File (GSF) and derived PPX image
files.
Xpix image display to display one or more PPX image files.
The GELLAB-II programs are listed below. Corresponding literature
references are specified by [...] listed in the section Reference
Manual [31].
accppx - display gel image(s) given their accession number(s)
autopair - pair 2 gels (GSF spot lists) by automatic pairing to
create GCF file [this program was not released as of the last
GELLAB-II release].
cgelp2 - interactive Paged Composite Gel database analysis
system with either a command line (also used for batch) or a graphical
user interface.
cmpgl2 - pair 2 gels (GSF spot lists) using landmark database
to create GCF file.
dendrogram - hierarchically cluster protein spots or gel
samples and draw a plot.
dwrmap - draw Rmap numbered plot of GSF spot list.
getacc - multiple gel accession data to gel accession database:
images, information, calibration.
landmark - interactive graphics acquisition to enter paired gel
landmarks into the gel LandMark Set database.
makjob - create GELLAB-II scripts for batch processing gels to
accession gels, landmark gels, segment gels, pair gels with respect to
the reference gel and build the initial database.
markgel - generate Rmap image of a set of spots from a
cgelp2 generated (.sps) data file.
mosaic - generate mosaic (montage of gel panels of a local spot
region from multiple gels) image from cgelp2 generated (.sps)
file.
pgelrc - "pretty print" the gel.rc GELLAB-II database project
resource state file.
plotn - plot GELLAB-II Universal Graphics Files (.ugf) to
Tektronix or PostScript graphics.
ppxcvt - convert foreign image formats to Portable PiXture file
used by GELLAB-II.
ppxodt - Portable PiXture (.ppx) file image debugger.
ppx2ps - convert PPX image file to PostScript.
sg2gii - segment gel image to Gel Segmentation File (.gsf).
tek2psG - convert Tektronix 4010 graphics input to PostScript
(derived from E. Moy's tek2ps program).
Xpix2 - with X11R4 display and manipulate PPX images.
Xpix11 - with X11R4 display and manipulate PPX images (later
version).
Briefly, this data reduction is achieved as follows:
(1) To create a new GELLAB database, one enters a set of gel data into
the database. This process is called accessioning. Accession a
set of scanned gels by entering the information about them with the
getacc program, which: (a) assigns an accession number to each
gel; (b) converts, if necessary, scanned gels into Portable PiXture
(PPX with .ppx file extension) files; (c) requires the experimenter to
enter associated experimental study accession information; and (d) if
needed, calibrates the optional ND step wedge scanned with the gel and
defines an active region in the gel image called the computing
window. The information from (c) and (d) is entered into an accession
file. The PPX image header contains information on OD calibration,
image size, computing window, etc. and is described in the Reference
Manual. An accession file typically has a gel prefix, a 3
character project name, and a .id file extension. E.g.,
gelts3.id. During a data acquisition session, one would enter a
number of gels, and at the end of the session getacc would prompt
you for a few pieces of information necessary for further
processing. These include (a) the name of the Reference gel or
Rgel, (b) a three character project prefix used for all files
associated with the project, and (c) the names of the different
experimental classes to which the different gels belong. It then
invokes the makjob program to generate Unix batch scripts to
interactively landmark these gels, segment the gels (extract lists of
quantified spots from them), pair N-1 of the N gels with the
selected Reference gel, construct the composite gel database, and
perform some initial statistical tests. The makjob program also
lets you directly generate these batch scripts for different sets or
subsets of gels that have been previously accessioned.
(2) Spot-list extraction and quantification is performed by the
sg2gii program, which produces a Gel Segmentation File (.gsf)
and an optional extracted spot image file. The GSF file contains
position and quantitation information for all spots in a single gel
and must be further processed to compare it with other gels.
(3) Pairing of GSF spot lists from two gels (one of which is the Rgel)
is performed by the cmpgl2 program. The output is called a Gel
Comparison File (GCF) and is a file with a .gcf file
extension. It consists of the pairing of spot data from the two GSF
input files. The pairing program also requires a list of a small
number of corresponding landmark spots for the two gels being
paired. This is stored in the LandMark Set (LMS) data base file, which
typically has an lms prefix and a .lm file extension. LMS
data can be acquired in several ways: (a) using the landmark
interactive graphics program running under X-windows; (b) using
program dwrmap to draw Rmaps from GSF data that can be plotted
with plotn, reading the landmark numbers manually from the
plots, and then entering them via a terminal session using the
landmark program; or (c) using the Xpix
program in its "compare" mode to interactively generate the
landmark coordinate pairs, which can then be text edited into the
proper LMS DB format. Use of the landmark
program is encouraged, as it is by far the easiest and most accurate
method.
(4) Construction of the Paged Composite Gel DB (PCG DB) is performed
by cgelp2. Cgelp2 requires a set of N-1 GCF files for N
gels since the Rgel is included in each GCF. By "paged" we mean that,
because the data base is too large to fit in memory, pieces of it are
paged in and out of memory from the actual PCG DB disk file (which has
a .pcg file extension). [A database can easily exceed 100 Mbytes
when it consists of more than 100 gels.] The accession file is
also accessed to extract the "study" information for each gel in the
PCG DB. This information is used for automatic classification of gels
into the current experimental classes of gels and other analyses.
Exploratory data analysis really starts once the PCG DB is
constructed. The particular strategy to follow is outside the scope of
this document but is discussed in many of the papers listed in the references on cgelp2 and papers on
particular biological problems.
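The merge of N-1 pairings that share one Rgel into Rspot sets can be pictured with a small Python sketch. It assumes a simplified in-memory model in which each GCF is reduced to a list of (Rgel spot number, other-gel spot number) pairs. The real .gcf and .pcg file formats, the handling of unpaired spots, and the paging mechanism are not modeled, and the accession numbers and spot numbers are invented for illustration.

```python
from collections import defaultdict


def build_rspot_sets(rgel, gcfs):
    """Merge pairings that share one Rgel into Rspot sets.

    `gcfs` maps a gel accession number to a list of
    (rgel_spot_number, gel_spot_number) pairs for that gel versus the Rgel.
    Returns {rgel_spot_number: {gel: spot_number}}.
    """
    rspots = defaultdict(dict)
    for gel, pairs in gcfs.items():
        for rgel_spot, gel_spot in pairs:
            rspots[rgel_spot][rgel] = rgel_spot  # the Rgel's own spot
            rspots[rgel_spot][gel] = gel_spot    # the corresponding spot
    return dict(rspots)


# Hypothetical data: N = 3 gels, so N-1 = 2 pairings against the Rgel.
RGEL = "324.1"
GCFS = {
    "325.1": [(1, 4), (2, 7)],
    "326.1": [(1, 3)],
}

if __name__ == "__main__":
    for number, members in sorted(build_rspot_sets(RGEL, GCFS).items()):
        print(f"Rspot {number}: {members}")
```

Here Rspot 1 contains a spot from all three gels, while Rspot 2 contains spots only from the gels in which it was paired.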
The initial batch script, which may be generated by getacc or
makjob, (a) constructs the PCG DB file, (b) constructs an
initial experimental gel classification based on accession file study
information, (c) normalizes the protein concentration values between
gels using the Ratio-List method and reorders spots in all Rspot sets
in the PCG DB based on this normalization, and (d) performs an
initial F-test and t-test at p-values of 0.90, 0.95, and 0.99 for all
of the experimental classes. It performs a Wilcoxon rank-sum test of
classes 1 and 2 at 0.90, 0.95, and 0.99, as well as
a missing-class test. It also computes and displays histograms of
various Rspot set spot features for the entire PCG DB to aid in
setting the initial prefilter parameters. Subsequent analyses consist
of changing the view of the PCG DB, performing searches in the
new view, and displaying this transformed data as images, plots,
tables, lists, etc. See [12-13,17,28-30] for more discussion on using
GELLAB for exploratory data analysis.
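As an illustration of the kind of per-Rspot class comparison described above, the following Python sketch computes a two-sample t statistic for one Rspot set. The text does not say which t-test variant cgelp2 uses; this sketch assumes the pooled-variance Student's t-test, and the density values are made up.

```python
import math
from statistics import mean, variance


def student_t(class1, class2):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(class1), len(class2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(class1) + (n2 - 1) * variance(class2)) / df
    t = (mean(class1) - mean(class2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


# Made-up normalized densities of one Rspot in the gels of two classes.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]

if __name__ == "__main__":
    t, df = student_t(control, treated)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

The statistic would then be compared with the critical value for the chosen level and degrees of freedom, once per Rspot set.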
annotate
autopair [EXPERIMENTAL]
cgelp2
cmpgl2
dendrogram
dwrmap
getacc
landmark
makjob
markgel
mosaic
pgelrc
plotn
ppxcvt
ppxodt
ppx2ps
sg2gii
tek2psG
Xpix
set path = ($path ~gelmgr/gellab/bin)
Each user could have a subdirectory called ~/gellab for storing
GELLAB-II project(s) data. Any other local user directory could
alternatively be used. The master state file gel.rc contains
the names of a number of other files as well as paths. These are
indicated by a 'keyword=value' syntax in the gel.rc file. Both
gellab and gel.rc can be put in your home
directory. Alternatively, if you have several gel projects, create a
separate directory for each one and then run pgelrc to create
the required gel.rc, gel.id, lms.lm and gellab directory
tree. The gel.rc entries include the following, shown with their
default files:
gelFile= ~/gellab/id/gel.id the Accession Database file.
The default paths are:
ppnP1X= ~/gellab/ppx/ the original gel picture disk PATH.
Note that .gsf and .gcf files as well as derived Rmap,
mosaic, and segmented gel images are saved in the ppnP2X or
aux directory. The composite gel database is kept in
ppnP4X and generated files from an analysis of the composite
database are in ppnP5X.
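A minimal Python sketch of reading 'keyword=value' entries such as those shown above follows. The exact gel.rc grammar (comments, quoting, trailing descriptions) is not given here, so the sample text and the parsing rules are assumptions; only the two keywords that appear in this section are used.

```python
import os


def parse_gel_rc(text):
    """Parse 'keyword=value' lines into a dict, expanding a leading ~."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and anything that is not keyword=value
        key, value = line.split("=", 1)
        settings[key.strip()] = os.path.expanduser(value.strip())
    return settings


# Hypothetical file contents using the two entries shown in the text.
SAMPLE = """\
gelFile= ~/gellab/id/gel.id
ppnP1X= ~/gellab/ppx/
"""

if __name__ == "__main__":
    for key, value in parse_gel_rc(SAMPLE).items():
        print(f"{key} -> {value}")
```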
To install a new GELLAB-II file system in a new account,
cd ~/
will create the required directory trees in your login account.
To access GELLAB-II, add
set path = ($path ~gelmgr/gellab/bin/`arch`)
To create a new GELLAB-II sub-project 'prj' consisting of a
separate set of gels:
Then answer the questions (typing the RETURN key for the default is
sufficient for most questions).
When using GELLAB-II with the X-Windows System, the GELLAB programs can
be invoked from interactive graphics menus. We currently use
version X11R4 with the twm window manager with the
.twmrc startup file. When a GELLAB program is selected, it may
prompt you for a text response to supply the command line required.
Currently several programs use X-Windows: Xpix,
landmark, getacc, plotn and cgelp2.
Example 1: print current state file if it exists. If not, it will ask
you if the default values are acceptable and then create a
gel.rc file and gellab directory tree for the project in
the current directory.
Example 2: generate batch scripts given a list of gel accession
numbers in file tst3.ccl, a project prefix ts3, four experimental
classes and the rgel.
Example 3: segment a gel into a GSF file and a 'z' segmented image
spot file and then display it.
Example 4: given an accession number, display the gel in a Xpix
window.
Example 5: given two accession numbers, display the corresponding gel
image files in two Xpix windows.
Example 6: segment a gel as above but also generate the 'c' central
core image and the original less the segmented spots image 'y' files.
Then display these two derived images.
Example 7: pair GSF spot list files for two gels into a GCF file.
Example 8: pair GSF spot list files for two gels into a GCF file and
also generate 'u' and 'v' labeled paired-spot image files. Then
display these two derived images.
Example 9: generate the two 'l' landmark set images for the two gels,
but do not pair the gels.
Example 10: run cgelp2 on the command file batch script to
create a PCG DB.
Example 11: start the CGL database program on an existing database
file.
Example 12: generate an Rmap of gel 324.1 for SPSS file 'ts3s02.sps'
produced from the cgelp2 PCG DB program.
Example 13: same as above but display the Rmap in an Xpix
window after it is generated.
Example 14: generate a mosaic image of Rspot 63 for SPSS file
'ts3s02.sps' produced from the cgelp2 PCG DB program.
Example 15: same as above but display the mosaic in an Xpix
window after it is generated.
Example 16: display a previously computed map and the mosaic image
for a Rspot on the Rmap in an Xpix window.
1. Lemkin, P., Merril, C., Lipkin, L., Van Keuren, M., Oertel, W.,
Shapiro, B., Wade, M., Schultz, M., Smith, E. (1979) Software aids for
the analysis of 2D gel electrophoresis images, Computers and
Biomedical Research 12:517-544.
2. Lemkin, P., Lipkin, L. (1980) BMON2 - A distributed monitor
system for biological image processing. Computer Programs in
Biomedicine 11:21-42.
3. Lemkin, P., Lipkin, L., Merril, C., Shiffrin, S. (1979) Protein
abnormalities in macrophages bearing asbestos. NIEHS Conf. Medical
Aspects of Mineral Fibers. Environmental Health Perspectives
34:75-89, 1980.
4. Lipkin, L.E., Lemkin, P.F. (1980) Database techniques for
multiple PAGE (2D gel) analysis. Clinical Chemistry
26:1403-1413.
5. Lester, E.P., Lemkin, P., Cooper, H.L., Lipkin, L.E. (1980)
Computer-Assisted Analysis of Two-Dimensional Electrophoresis of Human
Peripheral Blood Lymphocytes, Clinical Chemistry 26:1392-1402.
6. Lester, E.P., Lemkin, P., Lipkin, L.E., Cooper, H.L. (1981)
Two-Dimensional Electrophoretic Analysis of Protein Synthesis in
Resting and Growing Lymphocytes in Vitro, J. Immunology
126:1428-1434.
7. Lemkin, P., Lipkin, L. (1981) GELLAB: A computer system for 2D
gel electrophoresis analysis. I. Segmentation and preliminaries.
Computers in Biomedical Research 14:272-297.
8. Lemkin, P., Lipkin, L. (1981) GELLAB: A computer system for 2D gel
electrophoresis analysis. II. Spot pairing, Computers in Biomedical
Research 14:355-380.
9. Lemkin, P., Lipkin, L. (1981) GELLAB: A computer system for 2D gel
electrophoresis analysis. III. Multiple gel analysis. Computers in
Biomedical Research 14:407-446.
10. Lester, E.P., Lemkin, P.F., Lipkin, L.E. (1981) New Dimensions in
Protein Analysis - 2D Gels Coming of Age Through Image Processing,
Invited paper, Analytical Chemistry 53:390A-397A.
11. Lemkin, P.F., Lipkin, L.E. (1981) GELLAB: Multiple 2D
Electrophoretic Gel Analysis, in Electrophoresis '81, R. Allen,
Arnaud (eds), W. De Gruyter, New York. pp 401-411.
12. Lemkin, P.F., Lipkin, L.E. (1983) Database Techniques for 2D
Electrophoretic Gel Analysis, in Computing in Biological
Science, Elsevier/North-Holland, M. Geisow, A. Barrett (eds),
pp 181-226.
13. Lemkin, P.F., Lipkin, L.E., Lester, E.P. (1982) Extensions to
the GELLAB 2D Electrophoresis Gel Analysis System. Paper given at
"Clinical Applications of 2D Electrophoresis", Mayo Clinic, Nov.
15-18, 1981. Clinical Chemistry 28:840-849.
14. Lester, E.P., Lemkin, P.F., Lipkin, L.E. (1982) A
two-dimensional Gel Analysis of Autologous T and B lymphoblastoid Cell
lines, Clinical Chemistry 28:828-839.
15. Lester, E.P., Lemkin, P.F., Lowery, J.F., Lipkin, L.E. (1982)
Human leukemias: A preliminary 2D electrophoretic analysis,
Electrophoresis 3:364-375.
16. Lester, E.P., Lemkin, P.F., Lipkin, L.E. (1983) States of
differentiation in leukemias: A 2D gel analysis. In Chromosomes and
Cancer: From Molecules to Man. Proceedings of 5th Annual Bristol
Myers Symposium on Cancer Research. Academic Press, pp 226-245.
17. Lemkin, P.F., Lipkin, L.E. (1983) 2D Electrophoresis gel database
analysis: Aspects of data structures and search strategies in GELLAB,
Electrophoresis 4:71-81. Presented at Argonne Workshop on
"Technical advances in 2D electrophoresis and clinical applications of
the technique", Aug. 29-Sep. 1, 1982.
18. Howard, R.J., Aley, S.B., Lemkin, P.F. (1983) High resolution
comparison of Plasmodium Knowlesi clones of different variant antigen
phenotypes by 2D gel electrophoresis and computer analysis.
Electrophoresis 4:420-427.
19. Lester, E.P., Lemkin, P.F., Lipkin, L.E. (1984) Protein indexing
in leukemias and lymphomas, NY Acad. Science 428:158-172.
20. Lemkin, P., Sonderegger, P., Lipkin, L. (1984) Identification
of coordinate pairs of polypeptides: A technique for screening of
putative precursor product pairs in 2D gels. Clinical Chemistry
30:1965-1971.
21. Sonderegger, P., Lemkin, P., Lipkin, L., Nelson, P. (1985)
Differential modulation of the expression of axonal proteins by
non-neuronal cells and the peripheral and central nervous system,
EMBO J. 4:1395-1401.
22. Lester, E.P., Lemkin, P.F. (1984) A 'GELLAB' computer assisted 2D gel
analysis of states of differentiation in hematopoietic cells, In
Neuhoff, V. (Ed.): Electrophoresis '84, 1984. Basel,
Switzerland, Springer-Verlag Chemie, pp 309-311.
23. Lemkin, P. (1985) PSAIL - A portable SAIL compiled translator for
C environments, Computer Language 2:39-45. [Used in converting
GELLAB-I to GELLAB-II]
24. Sonderegger, P., Lemkin, P.F., Lipkin, L.E., Nelson, P.G. (1986)
Coordinate regulation of the expression of axonal proteins by the
micro-environment, Developmental Biology 118:222-232.
25. Stoeckli, E.T., Lemkin, P.F., Kuhn, T.B., Ruegg, M.A., Heller, M.,
Sonderegger, P. (1989) Axonally Secreted Proteins: I. Identification of
Proteins Secreted from Axons of Embryonic Dorsal Root Ganglia Neurons,
EMBO J. 180:249-258.
26. Lemkin, P.F. (1988), PSAIL: A Portable SAIL to C Compiler -
Description and Tutorial, SIGPLAN Notices Oct 23(10):149-171.
[Used in converting GELLAB-I to GELLAB-II]
27. Lemkin, P.F. (1988) Xpix - An image processing system for X
windows, Computers Biomedical Research 26:1-16.
28. Lemkin, P.F., Lester, E.P. (1989) Database and Search Techniques
for 2D Gel Protein Data: A Comparison of Paradigms For Exploratory
Data Analysis and Prospects for Biological Modeling,
Electrophoresis, 10(2):122-140.
29. Lemkin, P.F. (1989) GELLAB-II, A workstation based 2D
electrophoresis gel analysis system, in proceedings of
Two-Dimensional Electrophoresis, T. Endler, S.Hanash (Eds),
Vienna Austria, Nov 8-11, 1988, VCH Press, W.Germany. pp 53-57.
30. Lemkin, P. F. (1992) The GELLAB Papers - A Collection of Papers
Describing the GELLAB-II System. NCI/FCRF, July 26, 1992.
31. Lemkin, P.F. (1993) The GELLAB-II 2D Gel Exploratory Analysis
System. Reference manual, pp 677, August 1993.
32. Lemkin, P.F., Rogan, P. (1991) Automatic Detection of noisy spots in
two-dimensional Southern Blots, Applied and Theoretical
Electrophoresis 2:141-149.
33. Amberger, A., Lemkin, P.F., Sonderegger, P., and Bauer,
H.C. (1993): ECGF and heparin determine differentiation of cloned
cerebral endothelial cells in vitro. Molecular and Chemical
Neuropathology 20:33-43.
34. Myrick, J.E., Lemkin, P.F., Robinson, M.K., Upton, K.M. (1993):
Comparison of the Bio Image VisageTM 2,000 and the GELLAB-II
two-dimensional electrophoretic analysis systems.
Applied & Theoretical Electrophoresis 3:335-346.
35. Wu, Y., Lemkin, P.F., Upton, K. (1993) A fast spot segmentation
algorithm for 2D electrophoresis analysis.
Electrophoresis 14:1350-1356.
36. Robinson, M.K., Myrick, J.E., Henderson, L.O., Coles, C.D.,
Powell, M.K., Orr, G.A., Lemkin, P.F. (1995) Two-dimensional protein
electrophoresis and multiple hypothesis testing to detect potential
serum protein biomarkers in children with fetal alcohol syndrome.
Electrophoresis 16(7):1176-1183.
37. Lemkin, P.F. (1995) Representations of protein patterns from 2D
gel electrophoresis databases. In: Pickover, C., (Ed) The Visual
Display of Biological Information. World Scientific Publishers,
River Edge, New Jersey, pp 43-59.
GELLAB Tasks
1. Brief Descriptions of GELLAB-II Programs
Most GELLAB-II programs require one or more arguments, so you
should read the individual programs' documentation (using -info
as suggested above) before attempting to run them. GELLAB-II
programs use a database project resource file called gel.rc in
the user's current path to provide state information. This includes
various directories for image files, gel database files, and other
intermediate files (see 4. Unix Directories
and Support Files). If you do not have this file in your path,
running any GELLAB program (such as pgelrc, which prints a
user-friendly form of the project resource file gel.rc) can be used to
prompt you to define the initial gel.rc file. A batch script
generation facility that is part of GELLAB (makjob) can
greatly automate running these programs. Minimal investigator
intervention is then required for major parts of its operation in
composite gel database preparation. The GELLAB-II programs include:
2. Performing An Analysis On A Set Of 2D Gels From An Experiment
A typical GELLAB-II analysis of a set of 2D gels is a data reduction
process. It analyzes a set of gels of the same material but with
different experimental conditions to produce lists of spots with
similar specific attributes. These subsets of spots are clustered
using statistical techniques. The major steps of an analysis are: (1)
accessioning gel experiments and scanned image files, (2) spot
quantification, (3) gel pairing, and (4) composite gel database
construction, and searching and display of search results of different
views of the database. The data reduction is shown in the sets of
files in the next figure.
a) {Initial image files G_i.ppx} and {Accession file .id}

         G_1                   G_2           ...          G_n
          |                     |                          |      spot segmentation/quantification
          v                     v                          v
   GSF_1 = {spot list}   GSF_2 = {spot list}  ...  GSF_n = {spot list}

   {Gel Segmentation Files (GSF) .gsf}

b)   GSF_r     GSF_i     LMS_ri        (n-1) landmarking with Rgel GSF_r
       |         |          |          {LMS are in landmark DB file .lm}
       v         v          v
     ----------------------------      (n-1) gel comparisons with GSF_r
                 |
                 v
     GCF_i = {{spot pairs}_A, {spot pairs}_B, ..., {spot pairs}_K}

   {Gel Comparison Files (GCF) .gcf}

c)   {GCF_1, GCF_2, ..., GCF_(n-1)}
                 |
                 |    cgelp2 composite gel DB construction
                 v    [CGL is stored in (PCG) .pcg file]
     CGL = ({Rspot_1}, {Rspot_2}, ..., {Rspot_z})
                 |
                 v
     ------------------------------------------------------------
        |           |             |              |              |        Derived export files
        v           v             v              v              v
    {DB .cgl}  {SPSS .sps}  {Table .tbl}   {Plot .ugf}   {Inquire .inq}  ... etc.
                    |                            |
                    v                            v
     {Derived image files .ppx}        {Line drawing plots}
Figure 2. Some of the files used in a GELLAB-II gel
analysis. Data file structures and corresponding file extensions
used in the gel analysis. A GELLAB-II file extension is a 2 or 3
character name preceded by a ".", (e.g. ts3pcg.cgl). a) Gel
Segmentation Files (GSFs) are produced by segmentation of the gel
images by sg2gii. b) Gel Comparison Files (GCFs) are
produced by comparing GSFs using landmark spots with cmpgl2 or
autopair. c) The Paged Composite Gel (PCG) database is
constructed by merging GCFs with cgelp2. The PCG DB is a 3D
data reduction of the original set of gel images and accession
information. The PCG DB is the realization of the Composite GeL
database (CGL) model. The SPSS (.sps) file is an exportable file
suitable for input to SPSS or other statistics packages. The
markgel and mosaic programs use the exported .sps
files. The derived export files are other types of data derived from
the PCG DB (see Figure 1).
3. Details on Programs
accppx
displays one or two gel image files using Xpix given the accession
number(s) and optional picture prefix type. Optional picture types
include: "l" for landmark Rmap, "m" for Rmap images, "y" and "z" for
segmented spot images produced by sg2gii, "c" for segmented connected
component images, "w" for the set of montage images generated by the
mosaic program. (See the PPX file definition in the glossary as well
as sg2gii and markgel for more information). It also searches the
accession file and disk for the status of gels and PPX files on the
system. If a picture name is used instead of a picture number, it tries
to find that file, adding a .ppx file extension if needed. The
picture paths specified in the current gel.rc resource file are
searched before the user's PATH environment variable.
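The lookup order just described (add a .ppx extension if needed, search the gel.rc picture paths, then PATH) can be sketched in Python as follows. This is an illustration only, not accppx's code; the function name and its parameters are hypothetical.

```python
import os


def find_ppx(name, rc_paths, env_path=None):
    """Return the first matching image file path, or None if none is found.

    `rc_paths` stands in for the picture paths from gel.rc; they are
    searched before the directories listed in PATH.
    """
    if not name.endswith(".ppx"):
        name += ".ppx"  # add the extension when the caller left it off
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    search = list(rc_paths) + [d for d in env_path.split(os.pathsep) if d]
    for directory in search:
        candidate = os.path.join(os.path.expanduser(directory), name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # "gel0001" is a made-up picture name; the path is the default shown earlier.
    print(find_ppx("gel0001", ["~/gellab/ppx/"]))
```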
annotate
is an interactive spot annotation system. It reads the spot lists for
the specified gel that is displayed in a window. Using the mouse,
spots of interest may be queried for their annotation or be specified
to add or edit their annotation. Various other annotation specific
functions are available including generating derived Rmap images with
investigator specified annotation for selected spots.
autopair
is an automatic gel pairing program that can be used to replace the
cmpgl2 and landmark programs for doing spot pairing. It
requires a minimum of one landmark point in each gel to get started
and should be more accurate as well. Pairing results are better with
similar gels. It does, however, take somewhat longer to compute the
spot pairings.
cgelp2
runs the Paged Composite Gel database analysis system. This builds a
PCG DB file from a set of Gel Comparison Files (GCFs) produced
by the cmpgl2 program. When running cgelp2, additional
information is available on top level commands by typing HELP
to list all of the top level commands or HELP specific-command.
For example, type HELP HELP to get more information on the
HELP command. There is also an ?APROPOS facility for finding
relevant commands. [4,9-10,12-13,17-18,20-22,28-30]
cmpgl2
runs the gel pairing program, which generates a Gel Comparison File
(GCF) (.gcf) from two Gel Segmentation Files (GSF) (.gsf)
produced by the sg2gii segmentation program. It also requires a
landmark set DB (LMS) entry for the two gels being paired. [4,8,10,12]
dendrogram
generates a dendrogram cluster analysis plot. It uses cgelp2
produced SPSS-compatible (.sps) or INQUIRE (.inq) text files. It can
cluster a set of Rspots as a function of density of a set of gels or
cluster a set of gels as a function of the density profile of a set of
Rspot sets. It also plots the results after they are generated
and/or makes an optional .ugf plot file. A data file (.dgm) is also
produced that contains numeric cluster analysis information. [24]
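To make the idea of hierarchically clustering gels by their Rspot density profiles concrete, here is a small Python sketch. It assumes single-linkage agglomeration with Euclidean distance, which may differ from the linkage and metric the dendrogram program actually uses, and the gel names and density profiles are invented.

```python
import math


def single_linkage(profiles):
    """Agglomerate items by repeatedly merging the two closest clusters.

    `profiles` maps an item name to its list of densities. Returns the
    merge steps as (members_a, members_b, distance) tuples in merge order.
    """
    clusters = {name: (name,) for name in profiles}  # cluster id -> members

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(profiles[p], profiles[q])
                   for p in clusters[a] for q in clusters[b])

    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        a, b = min(((x, y) for i, x in enumerate(ids) for y in ids[i + 1:]),
                   key=lambda pair: dist(*pair))
        merges.append((clusters[a], clusters[b], dist(a, b)))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges


# Made-up density profiles: each gel has densities for two Rspot sets.
PROFILES = {"g1": [1.0, 1.0], "g2": [1.0, 2.0], "g3": [5.0, 5.0]}

if __name__ == "__main__":
    for left, right, d in single_linkage(PROFILES):
        print(f"merge {left} + {right} at distance {d:.2f}")
```

The sequence of merges and their distances is the information a dendrogram plot draws as a tree. Swapping the roles of gels and Rspots clusters Rspots as a function of their densities across gels.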
dwrmap
plots a Rmap from a Gel Segmentation File (.gsf), with spots labeled
by their GSF spot number. This can be used with the landmark program
to manually generate the landmark set data entry. It also plots the
results after they are generated and/or makes an optional .ugf plot
file. [13]
getacc
in a data acquisition session, acquires 2D gel images and their related
accession information, which is appended to the accession file. At the
end of the session, it asks a few questions regarding the type of
experiment to be performed. It then generates Unix batch scripts to a)
interactively landmark the set of gels, b) segment the gel images into
GSF spot lists, c) pair GSF spot lists into GCF paired spot lists, d)
merge the GCF files by constructing a PCG DB file and e) perform an
initial statistical analysis of the PCG DB. Using the -edit
-ask: switches you can accession gel images scanned elsewhere.
You may also edit entries in the accession file, altering
different fields for accession entries.
Image size defaults to 512x512 but any size (e.g., 1024x1024) can be
specified. [4,7,12]
is an interactive X-windows graphics program to landmark two gels.
This process defines a small set of corresponding spots (10 to 20) in
each of the two gels. These spot positions are used to update an entry
in the LandMark Set (LMS) data base file that is used by other
programs including the spot pairing program cmpgl2. It can also
use a previously defined LMS entry to indicate where the landmarks are
in the Rgel image when landmarking another gel. This makes finding the
same landmarks much easier and more reproducible. In addition, the option
is available to have landmark read the GSF segmented spot list files
and use this for finding spots when you indicate a position near a
segmented spot. [4,8,10,12]
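Using a segmented spot list to find the spot nearest an indicated position amounts to a nearest-neighbor lookup. The Python sketch below illustrates that idea only; the function name, the spot coordinates and the 10-pixel search radius are assumptions, not values taken from the landmark program.

```python
import math


def snap_to_spot(click, spots, max_dist=10.0):
    """Return (spot_number, (x, y)) for the segmented spot nearest to
    the clicked position, or None if no spot lies within max_dist pixels.

    spots maps a GSF spot number to its (x, y) position.
    max_dist is an assumed search radius.
    """
    best = None
    for number, (x, y) in spots.items():
        d = math.hypot(x - click[0], y - click[1])
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, number, (x, y))
    return None if best is None else (best[1], best[2])


# Hypothetical spot list: spot number -> (x, y) centroid.
spots = {1: (100.0, 120.0), 2: (105.0, 118.0), 3: (300.0, 40.0)}

print(snap_to_spot((104, 119), spots))
print(snap_to_spot((200, 200), spots))
```

The first call lands close to spot 2 and snaps to it; the second is far from every spot, so the position would be kept as entered.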
generates GELLAB-II Unix scripts. It requests a list of gel accession
numbers for a subset of gels previously accessioned. It then asks a
few questions regarding the type of experiment to be performed and
generates Unix batch scripts to a) interactively landmark the set of
gels, b) segment the gel images into GSF spot lists, c) pair GSF spot
lists into GCF paired spot lists, d) merge the GCF files by
constructing a PCG DB file and e) perform an initial statistical
analysis of the PCG DB. A makjob run can be customized to perform
some and not other specific analyses (see -info switch for makjob).
See the example of running makjob in the Examples. [12,13]
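To make steps b) and c) concrete, the Python sketch below builds the kind of command list such a batch script might contain. It is a hedged illustration: the helper name batch_commands is hypothetical, only the command forms "sg2gii ACC" and "cmpgl2 RGEL ACC" shown in the Examples are used, and the scripts that makjob really writes are not reproduced here and may differ.

```python
def batch_commands(rgel, accessions):
    """Build command lines for segmentation (step b) and pairing (step c).

    rgel is the accession number of the Representative gel and
    accessions is the list of gel accession numbers in the subset.
    Landmarking, PCG DB construction and statistics are omitted.
    """
    # Step b: segment every gel into a GSF spot list.
    cmds = [f"sg2gii {acc}" for acc in accessions]
    # Step c: pair each remaining gel with the Rgel to produce a GCF.
    cmds += [f"cmpgl2 {rgel} {acc}" for acc in accessions if acc != rgel]
    return cmds


commands = batch_commands("324.1", ["324.1", "369.1"])
print("\n".join(commands))
```

Every gel is segmented once, but the Rgel is never paired with itself, so n gels yield n segmentation commands and n-1 pairing commands.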
generates a Rmap image, given a gel accession number and a
SPSS-compatible (.sps) file generated by the cgelp2
program. The Rmap is the synthetic image generated by the projection
of the set of spots specified by the SPSS file onto a copy of the gel
image associated with the gel accession number. [4,9,12]
generates one or more mosaic images, given a particular
Rspot number and a SPSS-compatible (.sps) file generated by the
cgelp2 program. The image or plot of a Rspot for a set of
gels is a composite image or graphic formed from panels from each gel
arranged in a regular checkerboard pattern ordered by minimum spot
density (protein concentration). The panels are taken from a
subregion of each gel surrounding a particular Rspot. [4,9,12]
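The checkerboard arrangement can be sketched as a sort followed by a row and column assignment. In the Python sketch below, the density values, the extra accession numbers and the near-square grid are assumptions chosen for illustration; the mosaic program's own layout rules are not given here.

```python
import math


def mosaic_layout(densities):
    """Assign each gel's panel a (row, column) cell in a regular grid.

    densities maps a gel accession number to the density of the chosen
    Rspot in that gel. Panels are ordered by increasing density and
    placed row by row in a near-square grid (an assumed grid shape).
    """
    ordered = sorted(densities, key=densities.get)
    ncols = math.ceil(math.sqrt(len(ordered)))
    return {gel: divmod(i, ncols) for i, gel in enumerate(ordered)}


# Hypothetical densities of one Rspot in five gels.
layout = mosaic_layout({
    "324.1": 0.82,
    "369.1": 0.15,
    "370.1": 0.40,
    "371.1": 0.95,
    "372.1": 0.27,
})
for gel, cell in layout.items():
    print(gel, cell)
```

Five gels give a 3-column grid, with the lowest-density panel in the top-left cell and the highest-density panel last.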
prints a user friendly form of the gel.rc GELLAB state file. This
file contains the default names of various database files,
directories, segmentation parameters, and information on the last data
processed. It is used by all GELLAB-II programs upon startup. If the
file does not exist, then running pgelrc will run an
interactive Question and Answer session to generate the initial gel.rc
file and gellab directory tree. You can also find the status of the
next free accession number and PPX file, what accession numbers are
in the system, and what PPX files from the database are out on the
disk. [12]
reads Universal Graphics Files (.ugf) produced by various GELLAB-II
programs (cgelp2, dwrmap, dendrogram). It is able to replot a
.ugf on the same or different type of display as well as to plot the
file on other devices (such as a PostScript laser printer). Used with
tek2psG, it can print plot files on a laser printer.
converts different input picture file formats into the GELLAB Portable
PiXture file format (.ppx). It can convert ASCII hex, decimal,
or octal numbers as well as other binary formats with variable header,
leading and trailing numbers of pixels/lines, and variable number of
lines. It also has input header optical density calibration conversion
to PPX header format for selected input types (such as BioImage).
Input images may be larger than the PPX output image; in that case you
have the option of sampling or averaging the data. Output images are
normally 512x512 but may be specified to any size. Data may also be
transformed by complementing, linear scaling, or performing a log
transform to produce the 8-bit data currently required for the PPX
file.
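The data transformations mentioned above (complementing, linear scaling, log transform) and the choice between sampling and averaging can be illustrated with a short Python sketch. Everything specific in it is an assumption made for the example: the function names, the 12-bit input range, and the fixed 2x2 reduction factor are not taken from the converter itself.

```python
import math


def to_8bit(values, mode="linear", max_in=4095):
    """Map input pixel values in 0..max_in to 8-bit values in 0..255.

    mode is "linear", "log" or "complement" (scaled, then inverted).
    max_in=4095 assumes 12-bit input data.
    """
    out = []
    for v in values:
        if mode == "linear":
            g = v * 255 / max_in
        elif mode == "log":
            g = 255 * math.log1p(v) / math.log1p(max_in)
        elif mode == "complement":
            g = 255 - v * 255 / max_in
        else:
            raise ValueError(f"unknown mode: {mode}")
        out.append(max(0, min(255, round(g))))
    return out


def sample_2x2(image):
    """Halve each dimension by keeping every second pixel and line."""
    return [row[::2] for row in image[::2]]


def average_2x2(image):
    """Halve each dimension by averaging each 2x2 block.

    Assumes the image is a list of equal-length rows with even
    width and height.
    """
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) // 4
             for c in range(0, len(image[0]), 2)]
            for r in range(0, len(image), 2)]


print(to_8bit([0, 2048, 4095]))
print(to_8bit([0, 2048, 4095], mode="log"))
print(sample_2x2([[1, 3], [5, 7]]), average_2x2([[1, 3], [5, 7]]))
```

Sampling is cheaper but discards three of every four pixels, while averaging keeps some contribution from each input pixel at the cost of slight smoothing.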
is a picture debugger for opening, reading pixels, 3x3 neighborhoods,
and 18x18 windows of a .ppx file. Data may be viewed in hex, octal, or
decimal. Individual pixels may be changed and the edited picture file
saved.
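Reading a 3x3 neighborhood and showing pixel values in hex, octal, or decimal can be sketched as follows. This Python sketch works on a small in-memory list of rows rather than a real .ppx file, whose layout is not described here; the function names and the "--" marker for positions outside the image are assumptions.

```python
def neighborhood_3x3(image, row, col):
    """Return the 3x3 block of pixels centered on (row, col).

    Positions that fall outside the image are returned as None.
    """
    height, width = len(image), len(image[0])
    return [[image[r][c] if 0 <= r < height and 0 <= c < width else None
             for c in range(col - 1, col + 2)]
            for r in range(row - 1, row + 2)]


def fmt(value, base="hex"):
    """Format one 8-bit pixel value in hex, octal or decimal."""
    if value is None:
        return "--"
    if base == "hex":
        return f"{value:02x}"
    if base == "octal":
        return f"{value:03o}"
    if base == "decimal":
        return f"{value:3d}"
    raise ValueError(f"unknown base: {base}")


# Hypothetical 3x3 image of 8-bit pixel values.
image = [[0, 1, 2], [10, 255, 12], [20, 21, 22]]
for line in neighborhood_3x3(image, 0, 0):
    print(" ".join(fmt(v, "hex") for v in line))
```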
converts a .ppx image file to PostScript that can then be
printed on a laser printer.
is a gel spot list segmentation program that generates a Gel
Segmentation File (GSF) from the image file associated with the
gel accession number. The accession number and gel image are produced
by the getacc gel image acquisition program. Image size
defaults to 512x512 but any image size specified in the PPX file
header (e.g., 1024x1024, 2048x2048, or 4096x4096) can be
used. [4,7,10,12,13]
converts Tektronix 4010/4015 input (such as is produced by
plotn) to PostScript, which is suitable for printing on a
PostScript laser printer. Tek2psG was derived from E. Moy's tek2ps
program.
is a general purpose X-windows X11R4 or later .ppx file interactive
display program. It is controlled by the user moving and clicking a
mouse to get menu selections and interact with the image(s). It can
manipulate one or two images on the screen at a time with each image
having its own real-time small zoom window (additional images can be
kept in memory and be alternately displayed). It can perform general
image processing types of operations where you can save transformed
image disk files for later recall [27]. Xpix is
symbolic-linked to either Xpix2 or Xpix11 that are later
versions of the program.
4. Unix Directories and Support Files
Several Unix directories and support files are required for the
operation of GELLAB-II. You need to add a path to the GELLAB-II
executable and runtime documentation files that are kept in the system
installation ~gelmgr/gellab/bin directory. Add the following
line to your .cshrc Unix csh shell startup file:
lmsFile= ~/gellab/lms/lms.lm the Landmark Set DB file.
spotListFile= ~/gellab/ann/spt.ann the Annotation DB file.
ppnP2X= ~/gellab/aux/ the auxiliary picture disk PATH.
ppnP3X= ~/gellab/tmp/ the temporary picture disk PATH.
ppnP4X= ~/gellab/pcg/ the PCG composite gel Database PATH.
ppnP5X= ~/gellab/gen/ the generated file Database PATH.
~gelmgr/gellab/demo/make-user.do
to your .cshrc file. We assume you will run the C-shell
csh or another compatible shell.
cd (wherever you want to put the data)
mkdir prj
pgelrc
5. Examples of GELLAB-II Commands to Run Programs
The following is a short set of examples illustrating some of the
types of operations possible with GELLAB-II. No examples of
cgelp2 are given because that is beyond the scope of this brief
document and is covered in the Reference Manual [31]. See the detailed
discussions in the papers listed in the References.
pgelrc
makjob -rgel:0324.1 -class:AML:ALL:CLL:HCL:HL-60 -accs:ts3.ccl -prj:ts3
sg2gii 324.1
accppx 324.1 324.1 -P2:z
accppx 324.1
accppx 324.1 369.1
sg2gii 324.1 -ctlcoreimage -restofimage
accppx 324.1 324.1 -P1:c -P2:y
cmpgl2 324.1 369.1
cmpgl2 324.1 369.1 -MarkLabels
accppx 369.1 369.1 -P1:u -P2:v
or add the display switch to the same command:
cmpgl2 324.1 369.1 -MarkLabels -Xpix
cmpgl2 324.1 369.1 -onlyMarkLMSimages
accppx 324.1 369.1 -prefix:l
or add the display switch to the same command:
cmpgl2 324.1 369.1 -onlyMarkLMSimages -Xpix
cgelp2 -f ts3cgl.gdo
cgelp2 -d ts3cgl.pcg
markgel 324.1 ts3s02.sps
accppx 324.1 -prefix:m
markgel 324.1 ts3s02.sps -Xpix
mosaic 63 ts3s02.sps
accppx w00063.ppx
mosaic 63 ts3s02.sps -Xpix
accppx 324.1 -P1:m w00064.ppx
6. References
7. Notes on Last GELLAB-II Release
Revised: 12/08/2005