r4 - 02 Mar 2006 - 23:06:07 - KevinDrew

Illinois Bio-Grid


DePaul Bioinformatics Group


Research Areas

Genomics

[Figure: GenBank growth chart]

Genomics is the field of investigating proteins based on their primary structure (the DNA nucleic acid sequence or the amino acid sequence). Biologists can often determine, inexpensively, the sequence of amino acids in a protein (or the nucleic acid sequence of the DNA that encodes it). They then frequently want to look for homologous proteins, that is, proteins that share an evolutionary origin. Software tools search for homologous proteins in a national database of sequenced proteins: the NCBI's GenBank. If such a protein is found and the protein in GenBank has a known function, the biologists have a good idea of what the function of the new protein probably is.

The growth of data at NCBI (GenBank) has been exponential, and computation time grows by at least the square of the size of the data. This is quickly outgrowing the capacity of ordinary computers. Additionally, biologists would like to search for homologous proteins against a batch of input protein sequences (derived from mass spectrometry equipment), finding target proteins that are homologous to all of the input sequences. It is unlikely that the NCBI will ever expand its software to include such functionality because it is so computationally intensive. We are developing a toolkit of such software. This toolkit, called the IBG Workbench, includes the FASTA, BLAST, and Smith-Waterman algorithms, all converted to run with batches of input sequences and to run in a distributed environment on the Grid. Parts of this workbench were demonstrated at the SuperComputing 2002 conference and won two of the three Grand Challenge competitions.
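As an illustration of the batch search described above, here is a minimal Python sketch of Smith-Waterman local alignment scoring and a filter that keeps only database entries similar to every input sequence. It is not the IBG Workbench code: the function names, the scoring values (match 2, mismatch -1, linear gap -2), the score threshold, and the toy sequences are all assumptions chosen for the example. Real searches use substitution matrices and statistical significance rather than a raw score cutoff.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not Workbench defaults.
    Time is proportional to len(a) * len(b), which is why large
    databases and batches of queries become expensive.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def homologous_to_all(queries, database, threshold):
    """Names of database entries scoring >= threshold against every query."""
    return [
        name
        for name, seq in database.items()
        if all(smith_waterman_score(q, seq) >= threshold for q in queries)
    ]


# Hypothetical toy data: two peptide fragments and a two-entry "database".
database = {"p1": "MKTAYIAKQR", "p2": "GGGGGGGG"}
print(homologous_to_all(["KTAYI", "IAKQR"], database, threshold=10))  # ['p1']
```

Each query/target pair is independent of the others, so the pairs can be scored on separate machines, which is what makes this workload a natural fit for a distributed environment.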

[Figure: MassSpec.jpg]

In a second Genomics project, we are working on a Grid-enabled version of software algorithms that take raw data from a mass spectrometer and calculate the amino acid sequence of the input protein. For example, a biologist might start with a whole-cell digest of some organism. They would inject a sample of the extract into a series of columns, where peptides released from one column are separated on a second column and then detected and fragmented by the mass spectrometer. The mass spectrometer acquires data at a rate of about 3000 spectra per hour. Massive calculations must be done on each spectrum for de novo sequencing. To handle this huge compute load, we are working on an algorithm that does this in parallel on the Grid. This tool will be part of the IBG Workbench.
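To give a feel for the core idea behind de novo sequencing, the Python sketch below reads a peptide sequence from an idealized ladder of fragment masses: the difference between consecutive peaks is matched against known amino acid residue masses. This is a simplified illustration, not the algorithm under development here. The function name, the 0.02 Da tolerance, and the reduced residue table are assumptions, and real spectra contain noise, missing peaks, and several overlapping ion series, which is where the heavy computation comes from.

```python
# Monoisotopic residue masses in daltons (a subset, for illustration).
# Leucine and isoleucine have the same mass, so only "L" is listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def read_ladder(peaks, tol=0.02):
    """Translate consecutive mass differences into residues.

    Assumes an idealized, complete ladder from a single ion series.
    A gap that matches no residue within `tol` is reported as "?".
    """
    peaks = sorted(peaks)
    sequence = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        # Pick the residue whose mass is closest to the observed gap.
        residue, error = min(
            ((r, abs(delta - m)) for r, m in RESIDUE_MASS.items()),
            key=lambda pair: pair[1],
        )
        sequence.append(residue if error <= tol else "?")
    return "".join(sequence)


# Build a hypothetical ladder for the peptide "GAS" and read it back.
ladder = [1.00728]  # starting offset (one proton mass)
for residue in "GAS":
    ladder.append(ladder[-1] + RESIDUE_MASS[residue])
print(read_ladder(ladder))  # GAS
```

Because each spectrum can be interpreted independently, spectra can be handed out to separate workers, which suits the parallel approach described above.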

Proteomics

[Figure: MassSpectrogram.jpg]

We are working on additional modules for the IBG Workbench (mentioned above) that will be useful to proteomics researchers trying to predict the tertiary structures of proteins from their amino acid sequences. The intention is to produce reusable modules that can be loaded together, allowing researchers to concentrate on their particular areas of research interest. This framework will include modules to read DNA and amino acid sequences from the various GenBank databases, as well as the primary, secondary, and tertiary structures of proteins. These IBG Workbench modules will also include chemical libraries to calculate the energy levels of molecules, as well as modules that use these chemical libraries to perform ab initio calculations of protein folding. Other methodologies for predicting protein structure, including rule-based and "lego" algorithms, will also be supported with their own modules. Having a suite of modules to choose from will allow researchers to minimize their development time, because they will only need to concentrate on the portion of the problem that their research addresses.

Phylogenetics

[Figures: PhyloTree.jpg, Phylogeny.jpg]

Phylogenetics is the study of evolutionary relationships (phylogeny). We are working with phylogenetics collaborators at the Field Museum of Natural History in Chicago to determine feasible evolutionary relationships among given taxa. The approach looks at differences in DNA sequences, determines the evolutionary tree starting at some hypothetical evolutionary ancestor of all of the taxa, and determines the minimum number of mutations required to reproduce the differences in the taxa studied.
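Counting the minimum number of mutations for a candidate tree can be sketched with Fitch's small-parsimony algorithm. The Python example below is an illustration under assumptions: the page does not say which method the project uses, and the function names, the nested-tuple tree encoding, and the four-taxon alignment are hypothetical. It scores a fixed rooted binary tree; searching over all possible trees for the best one is the far more expensive part of the problem.

```python
def fitch_site(tree, states):
    """Fitch pass for one alignment column on a rooted binary tree.

    `tree` is a taxon name (leaf) or a pair (left, right).
    `states` maps each taxon name to its character at this site.
    Returns (possible ancestral states, minimum number of mutations).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra mutation needed.
        return common, left_cost + right_cost
    # No shared state: one mutation is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site minimum mutation counts over all columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical aligned sequences for four taxa, and two candidate trees.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}
print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```

Under this criterion the first tree is preferred, because it explains the observed differences with fewer mutations.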



IBG Workbench

All of the above BioMedicalInformatics applications share quite a bit of functionality. Certainly, all of the interactions with the Grid are common functionality; beyond that, connections to the NCBI databases (GenBank), sequence comparisons, etc. are common to many of these applications. This common functionality is useful to a wide array of other BioMedical applications as well. Recognizing the usefulness of a workbench of such tools, and of a platform that allows other tools to be developed on the common infrastructure, we are developing the IBG Workbench of these modular tools. All software developed will be open source and available to all computer scientists and BioMedicalInformatics researchers world-wide. See our IBGSoft page for more information.