Research Practicum in Computational Biology (CSC 542) topics
The following are potential projects that students can join for this course (CSC 542). See the bottom of
the page for information on how to register for the course.
Also, please see the Course Description for CSC 542.
A special note to external biological researchers who are potential sponsors of an informatics student: we have
a page that describes what the students already know to help you decide whether you want to sponsor a student or not.
Project: Automating workflows for large-scale phylogenetic tree reconstruction
Site: Field Museum
Supervising Researcher: Dr. Rick Ree
Date Posted: Fall, 2006
Long-term systematics projects continually generate new sequences and pull in others from online databases such as GenBank. The manual workflow of updating databases, alignments, and phylogenetic analyses is very labor-intensive. Tools that help automate the integration of new data into existing phylogenetic trees as the data become available would greatly facilitate tree-of-life research.
Specifically, students interested in this project could focus on extending the existing codebase of mor (http://mor.clarku.edu), an open-source framework for automatically building large phylogenies from DNA sequence data.
Project: Macroevolutionary Models of Organism Traits and Biogeography
Site: Field Museum
Supervising Researcher: Dr. Rick Ree
Date Posted: Fall, 2006
Stochastic (probabilistic) models are useful for historical inference in evolutionary biology because they are flexible and allow for uncertainty about process details. Such models can be used to predict outcomes that can then be compared to observed data. I have developed a crude model for the evolution of geographic range on phylogenies that can be used to infer ancestral distributions of species. The method is currently very slow because it relies heavily on simulation and is implemented in Python. It needs to be scaled up with more efficient algorithms and parallel processing.
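To give prospective students a feel for simulation-based range models, here is a minimal sketch in Python. It is not Dr. Ree's model. It assumes, purely for illustration, that a range is a set of named areas, that each unoccupied area is gained at a constant rate (dispersal), that each occupied area is lost at a constant rate (local extinction), and that a range never becomes empty. The function name, parameters, and area labels are all hypothetical.

```python
import random

def simulate_range(start, areas, dispersal, extinction, branch_length, rng):
    """Simulate range change along one branch with a Gillespie-style loop.

    Assumed toy model: each unoccupied area is gained at rate `dispersal`,
    each occupied area is lost at rate `extinction`.
    """
    current = set(start)
    elapsed = 0.0
    while True:
        # Areas that could be gained (dispersal) or lost (local extinction).
        gains = [a for a in areas if a not in current]
        # Keep at least one area so the lineage never has an empty range.
        losses = sorted(current) if len(current) > 1 else []
        total_rate = dispersal * len(gains) + extinction * len(losses)
        if total_rate == 0:
            return frozenset(current)
        # Waiting time to the next event is exponentially distributed.
        elapsed += rng.expovariate(total_rate)
        if elapsed > branch_length:
            return frozenset(current)
        # Choose the event type in proportion to its share of the total rate.
        if rng.random() < dispersal * len(gains) / total_rate:
            current.add(rng.choice(gains))
        else:
            current.remove(rng.choice(losses))

# Hypothetical usage: tally descendant ranges over many replicates.
rng = random.Random(42)
counts = {}
for _ in range(1000):
    end = simulate_range({"A"}, ["A", "B", "C"], 0.3, 0.1, 2.0, rng)
    counts[end] = counts.get(end, 0) + 1
```

Repeating such simulations thousands of times per branch is what makes a simulation-based approach slow, and the replicates are independent of one another, which is why parallel processing is a natural fit.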
Project: Comparative Studies of Organism Traits and Biogeography
Site: Field Museum
Supervising Researcher: Dr. Rick Ree
Date Posted: Fall, 2006
Phylogenetic relationships are becoming increasingly important to biological studies outside of systematics. Ecologists, for example, commonly want to know how the species in a particular community are related. These sorts of questions fall outside the scope of traditional systematics studies and require synthesizing disparate sources of phylogenetic information. Phylogenetic "supertree" methods are one way to approach the problem, but others are needed. For example, no current methods leverage prior knowledge about hierarchical relationships when assembling data matrices from GenBank. Ideally, we need tools that integrate diverse sources of information when constructing trees for ad hoc taxon lists.
Project: Web-Based Biodiversity Informatics
Site: Field Museum
Supervising Researcher: Dr. Rick Ree
Date Posted: Fall, 2006
Web-based mapping is currently a hot topic, as exemplified by Google's new map pages. I maintain an online database of plant and fungal specimens from south-central China <http://hengduan.huh.harvard.edu/fieldnotes> and would like to explore how new mapping technologies could be applied to this site. I would also like to implement emerging specimen data-sharing standards (Darwin Core) on this site to improve its dissemination of biodiversity data.
Project: Computer Models of Skull Motion
Site: Field Museum
Supervising Researcher: Dr. Mark Westneat
Date Posted: Fall, 2006
Dr. Westneat has computer models that perform a long set of bioengineering calculations of the forces, angles, and motions of bones in various vertebrate skulls (fishes, reptiles, birds). The fish model is the most complete and detailed; the bird model needs the most work. These models accept large sets of morphometric data, have a user interface for simulating muscle inputs, and output large sets of data from those simulations.
Goals for further development of these models include:
1. Translating from CodeWarrior Pascal on the Mac platform to a more recent Java or C environment (CodeWarrior if possible).
2. Incorporating new calculations into the models, for example random number generators and/or Brownian motion evolutionary algorithms for simulations.
3. Incorporating muscle force calculations into the models.
4. Improving general application friendliness: drawing and output windows, etc.
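As a rough illustration of the Brownian motion simulations mentioned in goal 2, the sketch below evolves a single continuous trait through time in small Gaussian steps. It is written in Python for brevity, although the models themselves are in Pascal and the target is Java or C. The function name, the parameters, and the idea of applying it to a skull measurement are assumptions made for the example, not part of Dr. Westneat's code.

```python
import random

def brownian_path(x0, sigma2, total_time, steps, rng):
    """Return trait values at evenly spaced times under Brownian motion.

    Each step adds a normal deviate with mean 0 and variance sigma2 * dt,
    so the variance of the final value is sigma2 * total_time.
    """
    dt = total_time / steps
    step_sd = (sigma2 * dt) ** 0.5
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path

# Hypothetical usage: evolve a skull measurement (arbitrary units)
# starting at 1.5 over 10 time units.
rng = random.Random(7)
path = brownian_path(1.5, 0.01, 10.0, 100, rng)
```

Passing in a seeded generator keeps runs reproducible, which helps when comparing simulated traits against the existing model output.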
Project: Archaeological Spanish Documents and Chronicles
Site: Field Museum
Supervising Researcher: Dr. Antonio Cureb
Date Posted: Fall, 2006
Dr. Cureb uses documents and chronicles written in Spanish for their information about prehistoric peoples. The difficulty is that there are many documents and the chronicles are extremely long, so it can sometimes take days to find a piece of information if you don't know where to look. Dr. Cureb has begun a database project to store these documents. The database has only a few fields, but it includes transcriptions from the chronicles and key words. He has been thinking of publishing the database as a reference source for other researchers. However, although the database can fit on a CD, it needs a search capability to be useful.
The project will include cleaning the database and formatting the fields correctly, resolving search-related issues (e.g., standardizing the key word field and others), and creating the search capability.
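One way to approach standardizing the key word field can be sketched in Python as follows. This is only an illustration: the record layout (a record ID mapped to a list of key words) and all function names are hypothetical and do not reflect the structure of Dr. Cureb's database. The sketch folds case, strips accents, and collapses whitespace so that variant spellings of a key word match the same entry.

```python
import unicodedata

def normalize_keyword(text):
    """Fold case, strip accents, and collapse whitespace in a key word."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())

def build_index(records):
    """Map each normalized key word to the set of record IDs that use it."""
    index = {}
    for record_id, keywords in records.items():
        for keyword in keywords:
            index.setdefault(normalize_keyword(keyword), set()).add(record_id)
    return index

def search(index, query):
    """Return the sorted record IDs whose key words match the query."""
    return sorted(index.get(normalize_keyword(query), set()))

# Hypothetical records: ID -> key words as originally typed.
records = {1: ["Crónica", "Taíno"], 2: ["cronica "], 3: ["Cacique"]}
index = build_index(records)
matches = search(index, "CRONICA")
```

Stripping accents also merges "ñ" with "n", which can conflate distinct Spanish words. Whether that trade-off is acceptable is a decision for the researcher.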
Project: Python scripting for the Mesquite evolutionary analysis package
Site: Field Museum
Supervising Researcher: Dr. Rick Ree
Date Posted: Fall, 2006
Mesquite (www.mesquiteproject.org) is a modular, open-source Java program for evolutionary analysis emphasizing phylogenies. It is heavily oriented toward graphical user interaction, with a rich set of components for visualizing and interacting with trees and associated phylogenetic data. It has a custom built-in scripting framework, but its power and usability could be substantially improved with Jython (formerly JPython), a Java implementation of the Python language. Jython was designed to be easily embedded into Java software so that applications can be scripted in this powerful and easy-to-learn language. Dr. Ree would like to work with both an interested student and Mesquite's primary author, Wayne Maddison, to enhance Mesquite in this way.
Signing up
Please contact Dave Angulo at dangulo@cti.depaul.edu for details on registering for this course or for more information.
Also see the Bioinformatics Courses at DePaul CTI page (http://facweb.cti.depaul.edu/bioinformatics/BioinformaticsCourses.htm).
--
DaveAngulo - 27 Mar 2007