2D Gel Analysis - Using GELLAB-II
On Large Databases++

Peter Lemkin

++This web poster was derived from a poster presented at Electrophoresis '93 Symposium by P.F. Lemkin(1), J.E. Myrick(2), M.K. Robinson(2), Y. Wu(3), G.A. Orr(4), November 7-10, 1993, Wild Dunes, SC, USA.

(1) LMMB/NCI/DBS/FCRDC, Frederick, MD 21702; (2) Div. Env. Health Lab. Sci., NCEH, CDC, Chamblee, GA; (3) Scanalytics/CSPI, Billerica, MA; (4) PRI/FCRDC, Frederick, MD, USA.

Abstract | Introduction | User interface | Sample analysis | Summary | Acknowledgments | References

Abstract

GELLAB-II is a computer system for the analysis of 2D electrophoretic protein gels. Analysis consists of a series of data reduction steps that transform a set of 2D gel images to a single composite database that consists of corresponding spots from all gels. There are three main stages in the data analysis: (a) data acquisition, (b) data reduction, and (c) exploratory data analysis. Stages (a) and (c) are performed interactively while (b) is performed in the background as a batch process. Recent improvements have been made to the interactive parts of GELLAB-II composite gel database subsystem including a dynamic reference-spot map showing particular spots of interest. This poster describes the X-Windows based Graphical User Interface of GELLAB-II and how the interactive phase of the gel analysis is performed. By providing a graphical interface, user productivity improves and the entire system is easier to learn and use. We illustrate this by examples from some databases of studies by Myrick et al. done at the Centers for Disease Control and Prevention. The studies include: an occupational cadmium exposure study with 29 gel images ([MyrJ93a], [MyrJ93b]), two case-control studies with 38 gels and 77 gels of serum proteins.

Introduction

GELLAB-II is a computer system for the analysis of 2D electrophoretic protein gels ([LipL80], [LemP89a], [LemP89b]). [The [LemP93]
GELLAB-II Reference Manual, 1993, 677 pages is available as a PDF file.] Given a set of stained or autoradiographed 2D gels, a major goal is to find significant spot differences. Another important goal is keeping track of "all those spots" in a database. GELLAB-II helps this process by performing a series of data reduction steps on the data. It effectively maps a set of 2D gel images to a single composite database (CGL DB) consisting of corresponding spots from all gels. Figure 1 illustrates this composite gel database. GELLAB-II is able to build CGL DBs with over 100 gels and over 8000 spots per gel (roughly 50 Mbytes). Several large databases were constructed and we present some data from two of these, a 38 gel and a 77 gel study of case and control of serum proteins. Some of this data is used here to illustrate aspects of handling large amounts of data with GELLAB-II.

Compsite Gel Database


Figure 1. Three-dimensional composite gel database model. a) Illustrates a composite gel database consisting of data from a set of 2D-gels. Corresponding paired spots (circles) are denoted by diagonal lines drawn through them. Such sets of corresponding spots are called Rspot sets. One of the gels is selected to be a reference or Rgel, denoted Gr. The circle means the spot is present and the X means that it is missing in that gel. Spot R1 occurs in all n gels. Spot R2 occurs in the Rgel and in one other gel. Spot R3 is only in the Rgel. Spot R4 is not present in the Rgel but is in most of the other gels. Spots R1, R2, and R3 are in the unextended Rspot database (since they occur in the Rgel) while spot R4 is in the eRspot part of the database (since it does not occur in the Rgel). b) Shows the basis of using mean spot positions for estimating canonical spots for a subset of k gels from the n gel database. The canonical spot can then be used to estimate the position of spots missing from some of the other gels. The mean and variance (varX,VarY) of Rspot positions across a set of gels is mapped to the coordinate system of the Rgel. When that has been done, the set of gels of the same experimental class can be replaced by a single averaged gel called the Cgel' (the estimate of the canonical gel). The mean displacement vector of a canonical spot from its associated landmark spot (in any gel under discussion) is used to extrapolate the position in gels where the expected canonical spot is missing.

Another way of thinking about the CGL DB is as consisting of Rspot sets in a "3-dimensional spread sheet". Each Rspot set (defined in Figure 1) then is a distribution of protein concentrations for that protein as seen across the set of gels - that is the lines connecting spots across gels is illustrated in Figure 1. This then allows us to test each Rspot for statistically significant differences when the gels in the CGL DB are taken from different disease states or experimental conditions. Exploratory data analysis consists of partitioning and searching the data in different ways to find the "view" showing the meaningful differences between the subsets of data.

Tukey in [GalW86] and others have suggested that the use of visual feedback is essential in exploratory data analysis. We have available a number of tools, both visual and numeric, with which to dissect a set of data and we will be illustrating some of them here. Fisher et al. in [GalW86] suggest a definition of exploratory data analysis that is appropriate for 2D gel databases: "Exploratory data analysis can be characterized as a search for regularity or structure among objects in an environment, and the subsequent interpretation of discovered regularity."

Large numbers of gels may be analyzed by GELLAB-II, improving statistical power, making spot differences more apparent, and more easily identifying outlier spots or gels using clustering techniques.

The data reduction and analysis consists of three main stages: (1) data acquisition, (2) data reduction, and (3) exploratory data analysis shown in the flow chart in Figure 2 . Stages (1) and (3) are performed interactively while (2) is performed automatically in the background as a batch process. We discuss (1) and (3) here. Figure 3 shows a flow chart of the iterative process used in the exploratory data analysis.


                                                                User or Batch
                                                                Processing
   ---------------------------------------------------
   | DATA ACQUISITION: gel scanning, enter gel       | Interactive
1. | experiment information, region of interest,     |
   | and  OD calibration.                            |
   ---------------------------------------------------
                              |
                              V
   ---------------------------------------------------
   | BATCH JOBS CREATION: given project name, list   | Interactive
   | of N gels, experimental classes, reference      |
2. | gel.                                            |
   ---------------------------------------------------
                              |
                              V
   ---------------------------------------------------
   | LANDMARK N-1 gels: to define landmark spots     | Interactive batch
3. | common to the Reference gel and the other gels. |
   ---------------------------------------------------
                              |
                              V
   ---------------------------------------------------
   | SEGMENT N gels: to find and quantitate spots    | Background batch
4. | in gel images generating N Gel Segmentation     |
   | Files (GSF).                                    |
   ---------------------------------------------------
                              |
                              V
   ---------------------------------------------------
   | PAIR SPOTS in N-1 gels with Reference gel: this | Background batch
5. | Generates N-1 Gel Comparison Files (GCF).       |
   ---------------------------------------------------
                              |
                              V
   ---------------------------------------------------
   | BUILD COMPOSITE data base: merge the N-1 GCFs   | Background batch
6. | into a single searchable Composite GeL (CGL) DB.|
   ---------------------------------------------------
                              |
                              V
   ---------------------------------------------------
   | EXPLORATORY DATA ANALYSIS: of composite         |  Interactive
7. | database Search, compute and display `views' of | or
   | 3D (CGL) gel data. Generates images, plots and  |  Background batch
   | tables: Rmaps, mosaics, cluster dendrograms,    |
   | protein expression profiles, (t-, F-, WMW-,     |
   | KW-, missing spots) statistical tests etc.      |
   ---------------------------------------------------

Figure 2. Flow chart showing data reduction and analysis process. The type of processing is indicated by: Interactive, background batch, and Interactive batch. The data reduction is divided into 3 parts: (a) data acquisition (steps 1-3) which acquires gel images, gel information and landmarks needed for processing in the following stages; (b) batch data reduction (steps 4-6) reducing multiple gel image data to a single composite database by spot segmentation, pairing and composite database construction; and finally (c) exploratory data analysis. This last stage is open ended and reports can include derived images, graphic plots and tables. Additional exploratory data analysis may also be done using background batch.



     +------->---- Composite Gel (CGL) DB
     |                |
     ^                | Normalize DB                    (setup CGL DB)
     | Yes            V
  Renormalize? -->--- + Set statistics limits,          (change PREFILTER parameters)
     |         No     | Assign gels to experimental classes,
     |                | Set pairing labels, pIe-MW region,
     |                | Set working set of gels,
     |                | Select density normalization method.
     |                | Select experimental classes to compare.
     |                V
     |             Composite Gel DB
     ^                |
     |                | Search DB using one (of many
     |                | possible) tests
     |                V
     |             SEARCH RESULTS LIST (SRL) of Rspots
     |                | 
     |  Yes           V
     +------<----- Redo search?                          (analyze results)
     |                | 
     |                V
     |             Generate SPSS or SAS files   --->--+
     |             of Rspot set data                  |
     |                |                               |
     |                + <--[original gel image files] |
     |                |                               |
     |                |                        Further analysis
     |                V                               |
     |             RMAP or MOSAIC derived images      |
     ^                |                               |
     | Further        | Display images                |
     | refinement     |                               |
     |                V                               V
     +-----<----- Evaluate visual and numeric results?   (evaluate results)
            No        |
                      | Yes (found putative results)
                      V
                   

Figure 3. A flow chart showing the iterative database search process used in the exploratory data analysis of the composite gel (CGL) database (DB). The search results desired might be to find spots which are significantly different or to find spots whose expression profiles cluster similarly, etc. These constitute a putative result that must be further investigated. Iterative search and evaluation allows testing new hypotheses which arise during the process.

GELLAB-II Graphical User Interface

In order to make GELLAB-II easier to use, we have made improvements to the interactive components of GELLAB-II. Some of this interaction deals with manipulating images while some deals with manipulating textual information. A graphical user interface (GUI) is a mouse-based method for users to interact with a computer without having to remember large numbers of commands. Push buttons, pull-down menus, forms, dialog and popup messages help reduce the amount of information a user is required to learn in order to use the system. The Graphical User Interface of GELLAB-II is implemented using the network based X-Windows System [SchR88]. The overall GUI design allows the user to interact directly with gel databases through images by clicking on spots as well as by use of pull-down menus - at all stages of data reduction. Examples from different parts of the system are presented.

2D gel data acquisition

Figure 4 illustrates the initial part of the data acquisition process. Gel specific information is entered in an experiment data entry form. Then two regions of interest are defined for the gel: (a) the region in the gel where the spots are found, and optionally (b) the region where a neutral density (ND) calibration step-wedge standard scanned with the gel is found. The ND wedge is used if the scan data is not already calibrated in optical density (OD). This wedge data is then analyzed to obtain the calibration and the resultant grayscale-to-OD calibration curve. All of this information is then associated with the name of the gel image and is called its accession information. Experiment information is accessioned though data entry forms - the fields of which may be user defined or modified.

NOTE: in the following figures (4-5, 7-19)showing screen shots, clicking on the figure will bring up the high resolution version in your Web browser so you can see the details.

Accessioning a gel

Figure 4. Accessioning a gel associates gel image file name, experimental information, optical density calibration, and region of interest with a unique accession number for that gel. This information is required in subsequent analysis. The experiment information is entered in the data entry form on the left. Except for the accession number and image file name fields which are fixed, all of the other fields may be defined by the user.

Accesioning a gel with computing window

Figure 4.1 Accessioning a gel also lets the user define a the computing window or region of interest is defined by a rubber band box in the gel image on the right. Since the BioImage scanner data is already calibrated in OD, we did not need to calibrate it. Gel images need to be calibrated in optical density (OD). This is required for gel scanners not calibrated in OD by analyzing a calibrated neutral density step wedge scanned with the gel [not shown].

Batch jobs creation

The data acquisition session finishes by creating three UNIX batch scripts for doing the data reduction. The first script does interactive landmarking. When completed, it automatically starts the second script to segment the spots in the gels into spot lists followed by pairing the spot lists. Finally, it in turn starts the third script which runs the composite gel database program on a special database command language script constructing the initial database from the paired spot files. At this point the database is ready for the investigator to begin interactive exploratory data analysis. The initial data reduction process is illustrated in Figure 2 and the later exploratory phase is illustrated in Figure 3 .

Landmark gels to define landmark spots

Spot pairing requires corresponding landmark spots to be defined between the Reference gel and each of the other N-1 gels. Figure 5 shows two gels being interactively landmarked. Landmarks can be added, deleted, moved or reviewed. Upon finishing landmarking of the two gels, the set of (xRi,yRi,xGi,yGi) coordinates for in landmarks (i going from 1 to K landmarks) data for the N-1 gels G is used to update the landmark database.

Landmarking a gel to the reference gel

Figure 5. Landmarking is an interactive process which defines a set of corresponding spots between the Reference gel and each of the other gels in the database. One gel is landmarked at a time. Landmark spots may be defined, redefined, and deleted. The two small windows can be flickered to locally align them about the two putative spots being considered as a corresponding landmark spot. Flickering images in real time means alternately displaying the two images in the same space on the screen (switching every 1/10 to 3 seconds at user controlled rates) while moving one relative to the other. In these gels from the 77 gel study, the Reference gel or Rgel on the left shows the landmarks which were defined. Red indicates it was landmarked in both gels, green that it is to be landmarked, and blue that it is currently being landmarked. [You can try this flicker alignment method using our Flicker system ( http://www.lecb.ncifcrf.gov/flicker on your gel images or images from Internet gel databases. Flicker is described in an extended paper of Lemkin et al. (1997) Electrophoresis 18(3-4):461-470.

Composite gel database system - CGELP2

After we have reduced the gel data to paired spot lists, we use the the composite gel database system (CGELP2) to merge these together into a single composite gel database file illustrated in Figure 1. The initial database script did a preliminary normalization of the database and some initial statistical searches. In addition to constructing the composite gel database, CGELP2 is used to help perform the exploratory data analysis. This system has an extensive command line language which may be invoked by either: a) a dumb terminal, b) batch script files, c) graphical user interface. The database exploration session log is displayed in an output window and may be selectively saved in a disk file at the user's option. The CGELP2 command history is saved for review or modification and re-execution. This is illustrated in Figure 6.


a) Command Structure of CGL DB program

               Typed or batch script file commands <---+
                                          |            |
                                          |            |
                                          V            |
 +--< Direct manipulation <---- Command Evaluation     ^
 |    Graphical User Interface            |            |
 |                                        |            |
 |                                        V            |
 +-----------<--------- Command History file ----->----+


b) Example from History File generated by CGL DB program
      . . .
   301:gels
   302:limits
   303:set density mode//least-squares//
   304:set srl subsets//directory////
   305:inquire//tp-test/ratio//1,2//0.005//2.0
   306:inquire//print-spot//689////
   307:set srl subsets//assign//tp-test/RatioLimit:2.0 min OD <=1.8, p-value<0.005////
   308:inquire//sort srl//1,2
   309:inquire//sort srl//3,4
   310:inquire//tp-test//1,2//0.005
   311:set srl subsetse//assign//tp-test OD<=1.8, p-value<=0.005////
   312:set srl subset//directory////
   313:set srl subset//list//11////
   314:backup
      . . .


Figure 6. Command evaluation in the composite gel database program (cgelp2). a) The command structure of shows the relation of the Graphical User and typed command interfaces to the Command evaluation. Commands may be invoked from either the direct manipulation Graphical User Interface (GUI) or from the typed or file scripted commands. b) Shows examples of command history entries saved between data analysis sessions. One or a series of commands from the history may be recalled, modified and re-executed as part of the exploratory data analysis process. Items selected from the GUI menus appear first in the history command string and are followed by any additional parameters separated by '//'s.

Examples Of Exploratory Data Analysis

We now show some examples from a GELLAB-II exploratory user session. Because of space, we present just a few of the types of displays and analyses available.

GUI interface: Dynamic Rmap of spots of interest

Figure 7 shows the composite database GUI user interface. Notice the pull down menus at the top (consisting of over 60 top level commands). The gel image on the left is the "dynamic Rmap". This shows changes in the current spots of interest during searches of the database. For example, viewing the results of a search or of a previous search. In addition, the user can click on spots to inquire about them in the CGL DB.

Composite gel database when started

Figure 7. The composite gel database system display is shown when first started. The top row of buttons invoke pull-down menus with over 60 commands with the menu for the FILE options shown. A dynamic Rmap of the Reference gel for the 77 gel database is shown on the left (any other gel in the database may be substituted during the data analysis). A variable resolution and size zoom popup window reflects the cursor position in the dynamic Rmap. Various mouse and accelerator key actions are listed in the right scrollable output window. Data in the output window may also be saved in a log file. The dynamic Rmap is used for both direct query by the user of any spot in the database. It is also used for showing which spots result from searches or from retrieval of significant spots saved from previous database queries. The dynamic Rmap shows the status of which spots are in the Search Results List (SRL). The dynamic Rmap shows the landmark spots with Rspot numbers for the 77 gel database.

Printing protein concentrations by clicking on a spot

Figure 8 shows the dynamic Rmap after editing the results of a t-test search for spot differences between two experimental classes (to be discussed). Figure 9 shows the results of clicking on a spot in the dynamic Rmap spot which causes its CGL DB data to be displayed in the scrollable window on the right.

Dynamic Rmap showing spots in the database

Figure 8. The dynamic Rmap image in the composite gel database system for the 77 gel database is shown after "editing" Rspots from a previous database search. Rspots visible in the dynamic Rmap are those currently in the Search Results List. The user can delete or add spots in the Search Results List by clicking on spots in the dynamic Rmap. The RSPOT SUBSETS menu is shown in the middle of the display. The SRL subsets directory is in the middle right below the resized popu zoom window. It shows subsets of Rspots found to be significant and saved in previous analyses as Search Results List Subsets (denoted SSRL[i] for subset i). Subset SSRL[3] holds 77 gel spots significant for CASE/CONTROL differences and has been restored into the SRL and the dynamic Rmap. Some of the SET SRL SUBSET menu operations shown in the menu include: Assign-SRL-to-SSRL-subset, Delete-SSRL-subset, Explicit-definition, Directory-of-SSRLs, List-SSRL[i], Restore-SSRL[i], Union, Intersection, Subtract, SPSS-file-from-SSRL-subset, etc. Note that spot searches can optionally be restricted to spot in a particular SSRL[i] subset as part of the PREFILTER (shown in figure 10).


An Rspot's numeric values

Figure 9. Any Rspot set's numeric values can be displayed by clicking on the spot in the dynamic Rmap. This causes Rspot set data from the composite gel database to be reported in the scrollable window on the right. Data may be presented summarized by a) as a table sorted by normalized density in the (middle "Rspot data" popup window) where the spot from each gel is represented, or b) experimental class in the (lower "Rspot data" popup window). Two popup windows illustrate this for Rspot[933] in the 77 gel database. The Print button in the popup windows may be pressed to print a report of the corresponding window. Alternatively, if you know the numeric Rspot name of a spot, you can print it by this number instead of clicking on the spot.

Setting the Rspot PREFILTER prior to a search

Before a search is done, one must design a partition of the gel and spot feature data and set search parameters. These parameters are applied to each Rspot set during operations on the database to determine whether it is "visible" to the database operation. This pre-testing process is called the PREFILTER. Figure 10 shows the popup data entry form to review and selectively change the PREFILTER preferences limits.

The PREFILTER

Figure 10. The PREFILTER is a set of spot and gel feature preferences which control database access to gels and Rspot sets in subsequent searches, analyses and views of the data. The data entry form is invoked by the SET PREFILTER menu option. This allows selectively changing these parameters. The PREFILTER is used to prescreen data before passing it to the particular test to be performed and is shown for the 77 gel database.

Setting the Gel Experimental Classes prior to a search

When doing a multiple experimental class search, it is necessary to divide the gels into the corresponding subsets of gels to be compared. Figure 11shows the results of the SET CLASSES command which uses the associated gel experiment information to automatically partition the gels into their associated classes. When comparing classes, an Rspot (consisting of corresponding spots from different gels) is automatically subdivided by class and the PREFILTER applied to each subset of spots from gels belonging to each class.

Partitioning a set of gels by experimental conditions

Figure 11. The investigator may partition gels by experimental condition in the composite gel database. Those subsets of gels with similar experiment condition attributes are called members of the same experimental class. This display is invoked by the SET CLASSES menu option for partitioning gels by class name. The gel experiment accession information is automatically used to partition the gels into their associated classes. The classes are then used to specify search conditions later when performing searches. Note that a gel may belong in more than one class. For example, in the 38 gel database shown here it may be a CASE or CONTROL gel or a MALE or FEMALE.

Performing a t-test to find spots differences between classes

Figure 12 shows some of the parametric and non-parametric INQUIRE search options under the Query CGL DB pull down menu. Figure 13 shows a dialog box where the user specifies which experimental classes are to be compared. Figure 14 shows the results of performing a t-test at a p-value of less than 0.005 between the case and control gels for the 77 gel study. Spots found were automatically displayed in the dynamic Rmaps and Search Results Lists (SRL) while the names of these spots that appear to be significant appear in the popup window showing them and their t-test statistics. Figure 15 shows the results for a similar analysis done on the 38 gel study. These table reports were generated by GELLAB-II. Similarly, any of the other INQUIRE tests can be performed.

Some possible GELLAB-II database queries

Figure 12. Some possible GELLAB-II database queries are shown in the pulldown menu for the Query CGL DB button shown for the 77 gel database. These include parametric, non-parametric, and other types of searches. These searches are performed on all Rspots in the entire composite gel database which pass the PREFILTER test.


The class dialog window

Figure 13. The class dialog window pops up for the user to specify which experimental gel classes are to be used in the TP-TEST t-test search (or any other multi-class search). The dynamic Rmap shows the 77 gel Search Results List found data from a previous search with the same classes and a p-value < 0.005.


Results from 77 gel database analysis

Figure 14. The 77 gel database results are shown in the dynamic Rmap. These were obtained by performing a t-test at a p-value of < 0.005 between the CASE and CONTROL gels. The Rspots found were put into the in the Search Results List (SRL) and automatically displayed in the dynamic Rmap on the left while the t-statistics features are displayed in the output window on the right. These spots were verified using mosaic images and additional numeric analysis. The top part of the right output window shows the tail end of the t-test search while the bottom shows these Rspots displayed in a table generated by the SORT SRL BY P-VALUE operation.


Results from 38 gel database analysis

Figure 15. The 38 gel database results are shown in the dynamic Rmap. These were obtained by performing a t-test at a p-value of < 0.02 between the CASE and CONTROL gels and verified using mosaic images and numeric analysis similar to that used for the 77 gel database. Two SORT SRL BY P-VALUE operations were performed to generate tables for the same Rspots: a) CASE(1) vs CONTROL(2) shown at the top right, and b) MALE(3) vs FEMALE(4) shown at the middle right of the output window. The popup SRL SUBSETS window at the bottom right shows the SRL subsets which were currently saved in the system with the significant spots we are working on here saved in SSRL[13].

Reviewing results of previous spot searches

The above Figure 8 shows a SRL subset directory which lists the names of sets of Rspots found to be significant (for one reason or another) and were saved previously during the analysis. Derived sets of Rspots can be created by performing Boolean operations (union, intersection and subtraction) on other sets of Rspots. These derived sets can in turn be saved and used in further operations.

Generating a mosaic image of one spot in N gels

Figure 16 shows one of several Rspot mosaic images generated Rspot[933]. The mosaic image(s) are constructed by extracting a small region about each corresponding gel and assembling them top to bottom and left to right -- sorted by increasing (normalized) protein concentration. Each panel is labeled with the gel accession number (gel identifier), protein concentration and experimental class of the gel.

Mosaic image for Rspot[933] in 77 gel database

Figure 16. Shows a mosaic image for an Rspot, Rspot[933], in the 77 gel database. The mosaic is spread over a number of sub-images since the panels for a large database will not fit on the screen at the same time. The mosaic is constructed by extracting a small region about each corresponding gel and assembling them from top to bottom and left to right sorted by increasing (normalized) protein concentration. Each panel is labeled with the gel accession number (gel identifier), protein concentration and experimental class of the gel. If there are more than 16 gels in the mosaic, it is spread over as many additional subimages are required since up to 16 gel panels appear in each subimage.

Generating protein expression profiles for a set of Rspots

Figure 17 shows protein expression profiles for the set of Rspots in the SRL. This allows the investigator to graphically place spots adjacent to one-another to find similar patterns.

Protein expression profiles

Figure 17. Protein expression profiles are shown for the set of significant Rspots in the 38 gel database. This option allows the investigator to graphically place spot profiles adjacent to one-another to compare patterns.

Generating a dendrogram cluster analysis from selected spots

Figure 18 shows a dendrogram clustering of the gels based on a subset of the Rspots found to be significant with p-values from the t-test search less than 0.005. These spots were saved in a SRL subset. We have removed some of the spots which were probably noise spots from around the edge of the gel before the clustering by editing them from the dynamic Rmap.

Dendrogram plot showing clustering

Figure 18. Shows a dendrogram plot showing clustering 77 gel study gels base on three significant Rspots [933, 1555, 1558] as a function of gel density for CASE and CONTROL experimental conditions. These three spots are shown in the popup zoom window. The number of gels used in the dendrogram was reduced in each class in order to make the plot more visible. The average-cluster method is used. Clustering indicates the predictive power of these spots, taken together, in separating gels from the experimental classes. Here gels were clustered as a function of the pattern of Rspot density. Similarly, Rspots in the SRL may be clusted as a function of the gel expression profile. Another clustering ( Figure 18a) is done for three significant Rspots [666, 780,759] as a function of gel density for MALE and FEMALE experimental conditions.

Generating an ordered protein expression profile table for a set of Rspots

Figure 19 shows an ordered expression profile table for the set of Rspots in the SRL. It is another way of graphically clustering spots with similar expression profiles indicating possibly similar control mechanisms.

Ordered expression profile table

Figure 19. A ordered expression profile table is shown in the scrollable output window for the significant Rspots in the 38 gel database. The top part of the table shows ratios of means of classes (m1,m2,m3,m4): m1/m1, m2/m1, m3/m1, and m4/m1. The lower part of the table shows clustering of most-similar spots with a least square error distance between expression profiles of < 0.20. Spots which are most-similar to the spots are on the left part of the table.

Summary

GELLAB-II is a data-reduction system for 2D protein gel scientific data. Having the proper data handling and analysis tools makes exploratory data analysis of moderate to large gel databases much easier. Having sufficient data helps the analysis by making the statistical patterns more apparent. GELLAB-II hides much of the complexity of the data reduction from the user by performing it invisibly in the background under batch. By providing a graphical interface, user productivity improves and the entire system is easier to learn and use.

Acknowledgments

Thanks are due Rob Ashmore and Kyle Upton for their help on the UNIX and X-Windowing aspects of GELLAB-II while they were at PRI/FCRDC. Also thanks to our other collaborators Lewis Lipkin, Eric Lester, Carl Merril, Steve Aley, Peter Wirth, Peter Sonderegger, Heinz Busse, Peter Rogan, and many others for useful suggestions. Some their ideas were incorporated in GELLAB over the years improving its capabilities.

References

There are additional
GELLAB-II references on the GELLAB Web site.

[LemP89a] P. F. Lemkin, E. P. Lester (1989) Database and Search Techniques for 2-D Gel Protein Data: A Comparison of Paradigms For Exploratory Data Analysis and Prospects for Biological Modeling, Electrophoresis 10(2) 122-140.

[LemP89b] P. F. Lemkin, GELLAB-II, A workstation based 2-D electrophoresis gel analysis system, 53-57, in T. Endler, S. Hanash (Eds), Two-Dimensional Electrophoresis, VCH Press, W. Germany, 1989.

[LemP93] P. F. Lemkin, GELLAB-II Reference Manual (PDF), Draft, 1993, 677 pages.

[LipL80] L. E. Lipkin, P. F. Lemkin (1980) Database techniques for multiple PAGE (2D gel) analysis, Clin. Chem. 26, 1403-1413.

[SchR88] R. W. Scheifler, J. Gettys, R. Newman, The X Window System, Digital Press, Bedford, MA, 1988.

[GalW86] W. A. Gale, Artificial Intelligence and Statistics, Addison-Wesley, Reading, MA, 1986.

[MyrJ93a] J. E. Myrick, S. P. Caudill, M. K. Robinson, I. L. Hubert (1993) Quantitative two-dimensional electrophoretic detection of possible urinary protein biomarkers of occupational exposure to cadmium, Appl. Theoret. Electrophoresis 3, 137-146.

[MyrJ93b] J. E. Myrick, P. F. Lemkin, M. K. Robinson, K. M. Upton (1993) Comparison of the Bio Image Visage 2,000(tm) and the GELLAB-II two-dimensional electrophoresis image analysis systems, Appl. Theoret. Electrophoresis 3, 335-346.


Go to GELLAB-II | LECB | NCI Home Page ]

Revised $Date$ / P. Lemkin