Illinois Bio-Grid
DePaul Bioinformatics Group
IBG Reading List
This page lists papers, books, and other material useful for computer scientists working in several different areas of bioinformatics and grids. This list was compiled mainly for the student researchers in the Illinois Bio-Grid (http://www.illinoigbiogrid.org), but may be useful to others. For more information, contact Dave Angulo at dangulo@cti.depaul.edu.
Materials in bold are "Must Read" materials. "Must Read" does not mean front-to-back reading; rather, these readings are important and you will want to be aware of what information they contain. Also, if you are involved with a project under a specific topic, it is beneficial to read most of the "Must Read" material under that topic.
Out of all the books, the Krane book should be read first. The Lesk book also has very good introductory material. The Lesk book is available in DePaul's 24/7 e-library for free for DePaul students. The first chapter of the Krane book is available for members of the IBG lab for free (ask Dave). Also, you can get the international edition of the Krane book for about $18.
The DePaul library provides students access to Scientific American in PDF format. Use this resource to access topical reviews such as articles on grid computing (April 2003), genomics (April 2005, August 2006), the human genome project (July 2000) and personal genome projects/the $1000 genome (January 2006), proteomics (April 2002), protein structure-function (August 1996), data mining for protein pathways (May 2005), alternative splicing (April 2005), genetic engineering (June 2006) and other relevant bioinformatics topics.
Here are some tips on how to search for research articles (using "PubMed" and Medline instead of just Google) and how to stay current in your field.
Glossaries
Programming (MUST READ)
General BioInformatics Readings:
- Introduction to the computational challenges in biology http://www.statoo.com/en/publications/2003_Bioinf_SSS_46.pdf
- Krane, Dan and Raymer, Michael. Fundamental Concepts of Bioinformatics. Benjamin/Cummings, 2000.
- This is a very elementary, but also very readable text. It is up to date, and has good references to the literature, and a useful glossary.
- Ask Dave for a free copy if you wish
- Koski, Timo. Hidden Markov Models for BioInformatics. Dordrecht: Kluwer Academic Publishers, 2001.
- A rigorous treatment of the subject
- Fogel and Corne. Evolutionary Computation in BioInformatics. San Francisco: Morgan Kaufmann, 2003.
- Well written, suitable for both computer scientists and biologists. Covers a number of topics from multiple sequence alignment to protein folding to metabolic pathways to in silico drug design. Uses AI techniques from genetic algorithms to neural networks. Also listed under "Artificial Intelligence books"
- Lesk, Arthur M. Introduction to BioInformatics. Oxford: Oxford University Press, 2002.
- (http://www.oup.co.uk/best.textbooks/biochemistry/bioinf/)
- A good introductory text. Easy to read. Also see his Introduction to Protein Architecture book.
- Also available at http://library.books24x7.com/book/id_4302/toc.asp but only for DePaul students, who must do the following:
- Go to the library's page at www.lib.depaul.edu .
- Click "Books, videos & music" (under "RESEARCH") -> "books24x7" (under "ONLINE")
- Log into DePaul's EZ proxy using the CampusConnect ID and password when prompted.
- You'll reach the books 24x7 page after the proxy login.
- Do a search under "Look Up Books You Know" using "Introduction to Bioinformatics" or "Bioinformatics"
- Click on the link for the book when it pops up.
- Mount, David W. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2001. ISBN: 0879695978.
- Campbell, Malcolm and Heyer, Laurie. Discovering Genomics, Proteomics, and Bioinformatics. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2003. ISBN 0-8053-4722-4
- Tramontano, Anna. Introduction to Bioinformatics. Chapman and Hall/CRC, 2007.
- A basic overview of bioinformatics. There is a handy glossary at the beginning of each chapter, but no all-encompassing glossary at the end of the book. Data storage, sequence analysis, protein evolution (phylogenetic trees), database search methods, prediction of 3-D protein structure, and fold recognition methods are discussed.
Free Journals:
Free Books:
Phylogenetic books:
- Brooks, Daniel and McLennan, Deborah. Phylogeny, Ecology, and Behavior. Chicago: University of Chicago Press, 1991.
- Talks more generally about how to define a phylogeny given various types of phenotype information. It is a difficult read intended for biologists.
- Nei, Masatoshi and Kumar, Sudhir. Molecular Evolution and Phylogenetics. Oxford: Oxford University Press, 2000.
- A mathematically focused book on phylogeny
- Difficult to read but very detailed.
- Felsenstein, Joseph. Inferring Phylogenies. City?: Sinauer Associates, 2003.
- Easy to read introduction. Sometimes too superficial to understand the content.
- Semple, Charles; Steel, Mike. Phylogenetics. Oxford: Oxford University Press, 2003.
- Very theoretical, written by mathematicians. Has a lot of theory about NP-completeness.
Phylogenetic Papers:
Genomics books:
- Durbin, Eddy, Krogh, and Mitchison. Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. Cambridge: Cambridge University Press, 1998.
- This is an outstanding reference to statistical models for biological sequence analysis. A classic. If you're going to buy only one book about bioinformatics, this should be it. It is tough reading, but is the reference book in the field.
- Attwood and Parry-Smith. Introduction to BioInformatics. Edinburgh: Prentice Hall, 1999.
- Gives a very good introduction to what biologists do with the genomic databases, and what those databases contain.
- Navarro and Raffinot. Flexible Pattern Matching in Strings. Cambridge: Cambridge University Press, 2002.
- A reference book for algorithms for string pattern matching, especially geared to biological sequences.
- A review of DNA at the advanced high-school biology level. http://www.dnaftb.org/dnaftb/26/concept/index.html
- The Microarray Standards issue of Nature Biotechnology (Sept. 2006) includes statements from the Food & Drug Administration (FDA) addressing the reliability and consistency of microarrays to analyze gene expression data (see discussion in Bio-IT World Sept. 8, 2006).
- Gibson and Muse. A Primer of Genome Science, 2nd ed. Sinauer Press, 2004. An excellent, well-illustrated book on genomics, based on the authors' experience teaching at North Carolina State University.
Homology Search
Molecular Biology books:
- Clote, Peter and Backofen, Rolf. Computational Molecular Biology. Indianapolis: Wiley, 1999.
- A nice graduate level text book for a one-semester course. More mathematically sophisticated than most
- Setubal, João and Meidanis, João. Introduction to Computational Molecular Biology. Brooks Cole, 1997.
- A clear undergraduate textbook, surprisingly up-to-date for a six-year-old text in a fast-moving field
- Kinter, Michael ; Sherman, Nicholas. Protein Sequencing and Identification Using Tandem Mass Spectrometry. Indianapolis: Wiley, 2000. ISBN: 0-471-32249-0
Artificial Intelligence approaches to BioInformatics books:
- Fogel and Corne. Evolutionary Computation in Bioinformatics. San Francisco: Morgan Kaufmann, 2003.
- Well written, suitable for both computer scientists and biologists. Covers a number of topics from multiple sequence alignment to protein folding to metabolic pathways to in silico drug design. Uses AI techniques from genetic algorithms to neural networks. Also listed under "General BioInformatics books"
Artificial Intelligence approaches to BioInformatics papers:
Molecular Dynamics Simulations of Lipid Layers papers:
- Feller, Scott. Molecular Dynamics Simulations of Lipid Layers in Current Opinion in Colloid & Interface Science. 2000. http://persweb.wabash.edu/facstaff/fellers/publications/curr_opinion.pdf
- Lipid Studies A list of abstracts of papers on lipid simulations
- Chiu, S. W., Jakobsson, Eric, Mashl, R. Jay, Scott, H. Larry. Cholesterol-Induced Modifications in Lipid Bilayers: A Simulation Study in Biophysical Journal. 2002 83: 1842-1853
- Mashl, R. Jay, Scott, H. Larry, Subramaniam, Shankar, Jakobsson, Eric. Molecular Simulation of Dioleoylphosphatidylcholine Lipid Bilayers at Differing Levels of Hydration in Biophysical Journal. 2001 81: 3005-3015
- Chiu, S. W., Jakobsson, Eric, Scott, H. Larry. Combined Monte Carlo and Molecular Dynamics Simulation of Hydrated Lipid-Cholesterol Lipid Bilayers at Low Cholesterol Concentration in Biophysical Journal. 2001 80: 1104-1114
- Chiu, S. W., Jakobsson, Eric, Subramaniam, Shankar, Scott, H. Larry. Combined Monte Carlo and Molecular Dynamics Simulation of Fully Hydrated Dioleyl and Palmitoyl-oleyl Phosphatidylcholine Lipid Bilayers in Biophysical Journal. 1999 77: 2462-2469
- Chiu, SW, Clark, M, Balaji, V, Subramaniam, S, Scott, HL, Jakobsson, E. Incorporation of surface tension into molecular dynamics simulation of an interface: a fluid phase lipid bilayer membrane in Biophysical Journal. 1995 69: 1230-1245
- Scott, HL, McCullough, WS. Lipid-cholesterol interactions in the P beta' phase. Application of a statistical mechanical model in Biophysical Journal. 1993 64: 1398-1404
- Scott, HL. Lipid-cholesterol interactions. Monte Carlo simulations and theory in Biophysical Journal. 1991 59: 445-455
- Scott, HL, Pearce, PA. Calculation of intermolecular interaction strengths in the P beta' phase in lipid bilayers. Implications for theoretical models [published erratum appears in Biophys J 1989 Jul;56(1):following 224] in Biophysical Journal. 1989 55: 339-345
- Scott, HL, Jr, Coe, TJ. A theoretical study of lipid-protein interactions in bilayers in Biophysical Journal. 1983 42: 219-224
- Scott, HL, Cheng, WH. A theoretical model for lipid mixtures, phase transitions, and phase diagrams in Biophysical Journal. 1979 28: 117-132
- Programming with GROMACS (from Groningen University)
- Protein Dynamics http://staff.science.uva.nl/~vreede/bioinfo/gromacs.html adapted at the FNWI (Faculty of Science) in the Netherlands
- Programming With CHARMM
- Nota Bene: We don't use CHARMM, we use GROMACS, but I found these instructive. (CHARMM is $500)
- A Brief CHARMM Tutorial using water molecules to get started using CHARMM.
- Another CHARMM Tutorial using butane molecules. This introduces Gnuplot and VMD.
- CHARMM Documentation
- A CHARMM tutorial from Krzysztof Kuczera at the University of Kansas.
- An introduction to CHARMM data structures from Krzysztof Kuczera at the University of Kansas.
- A CHARMM tutorial from Roland Stote
- http://groups.yahoo.com/group/cheminformatics/files/Papers/
- A cheminformatics site - I don't know if the papers are worth reading or not. If you read these, please let me know!
Proteomics
Textbooks
- Timothy D. Veenstra and John R. Yates III, "Proteomics for Biological Discovery", Wiley, June 2006 (ISBN: 0-471-16005-9). Yates developed the SEQUEST search engine and other proteomics/MS tools. View the table of contents and download the first chapter (with a great overview of mass spectrometry) at Wiley's web site.
- Twyman, R.M. Principles of Proteomics. New York: Garland Science, 2004 (used in CSC 541)
Tobin Sosnick: protein folding.
Protein Folding Readings:
- Lesk, Arthur. Introduction to protein architecture: the structural biology of proteins. Oxford University Press, 2001. 0198504748
- Leach, Andrew. Molecular Modelling: Principles and Applications. Harlow, England: Prentice Hall, 2001. 0-582-38210-6
- Sternberg, Michael, Ed. Protein structure prediction: a practical approach. Oxford University Press, 1996. 0-19-963496-3
- Fasman, Gerald, Ed. Prediction of protein structure and the principles of protein conformation. 1989. 0-306-43131-9
- P. Ferrara, J. Apostolakis, and A. Caflisch. "Computer Simulations of Protein Folding by Targeted Molecular Dynamics." Proteins. May 15, 2000; Vol. 39, No. 3:252-260. http://biocroma.unizh.ch/Caflisch/protgen2000.pdf
- This could be interesting for those who have heard about our current protein folding project but wonder about other possible strategies. The paper is somewhat dense with information about the authors' specific simulations, but it does offer some general information for those willing to dig through it. Anyone reading it should already know about the protein folding problem and understand its inherent complexity.
- T. Lazaridis and M. Karplus. "Effective Energy Functions for Protein Structure Prediction." Current Opinion in Structural Biology. April 2000; Vol. 10, No. 2:139-145. http://www.sci.ccny.cuny.edu/~themis/curropin.pdf
- This is an excellent article for anyone who plans to work on the Protein Folding library. For those not familiar with Monte Carlo simulation, the descriptions of energy functions might not be interesting, but since OOPS is MC/SA-based, this paper provides a good introduction to how to score theoretical protein folds. The reader should have an understanding of basic protein architecture before reading it.
- Anfinsen, Christian B., "Studies on the Principles that Govern the Folding of Protein Chains", Nobel Lecture, December 11 (1972), from Nobel Lectures, Chemistry 1971-1980, Editor-in-Charge Tore Frängsmyr, Editor Sture Forsén, World Scientific Publishing Co., Singapore, 1993. http://nobelprize.org/chemistry/laureates/1972/anfinsen-lecture.pdf
- Anfinsen, Christian B. "Principles that Govern the Folding of Protein Chains", Science, Vol 181(96), 223-30, July 20 (1973).
- Fersht, Alan. Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding. New York: W. H. Freeman, 2003.
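The Lazaridis and Karplus annotation above mentions scoring candidate folds inside an MC/SA search. Reading MC/SA as Monte Carlo with simulated annealing (an assumption on our part), the general idea can be sketched as below. This is not the OOPS code: `toy_energy`, `anneal`, the cooling schedule, and the step size are all hypothetical stand-ins, and a real effective energy function would replace the toy quadratic.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for an effective energy function.

    It is zero when every angle equals -60 degrees and grows quadratically
    otherwise. A real scoring function would go here instead.
    """
    return sum((a + 60.0) ** 2 for a in angles) / len(angles)


def anneal(angles, energy, steps=20000, t_start=100.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a small random change to one angle.
        i = rng.randrange(len(current))
        old = current[i]
        current[i] = old + rng.uniform(-10.0, 10.0)
        e_new = energy(current)
        delta = e_new - e_current
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e_current = e_new
            if e_current < e_best:
                best, e_best = list(current), e_current
        else:
            current[i] = old  # reject the move and restore the old angle
    return best, e_best


if __name__ == "__main__":
    start = [180.0] * 5
    best, e_best = anneal(start, toy_energy)
    print("start energy:", toy_energy(start))
    print("best energy: ", e_best)
```

The energy function is passed in as a parameter so that different scoring schemes can be compared under the same search loop, which is the kind of comparison the article is useful for.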
Inverse Protein Folding Reading:
- J. L. Klepeis and C. A. Floudas, "In Silico Protein Design: A Combinatorial and Global Optimization Approach", SIAM News, Vol 31, num. 1, January/February (2004)
- N. A. Pierce and E. Winfree, "Protein Design is NP-Hard", Protein Engineering, vol. 15, no. 10, 779-782 (2002)
- D. Benjamin Gordon, Geoffrey K. Hom, Stephen L. Mayo, Niles A. Pierce, "Exact Rotamer Optimization for Protein Design", Wiley Periodicals, Inc (2002)
- J. W. Ponder and F. W. Richards, "Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes", Journal of Molecular Biology, vol 193, 775-791 (1987)
- L. Regan and S. E. Jackson, "Protein Design: Theory and Practice", Current Opinion in Structural Biology, vol 13, 479-481 (2003)
- D. B. Gordon and S. L. Mayo, "Radical Performance Enhancements for Combinatorial Optimization Algorithms Based on the Dead End Elimination Theorem", Journal of Computational Chemistry, vol. 19, 1505-14 (1998)
- F. Offredi, F. Dubail, P. Kischel, K. Sarinski, A. S. Stern, C. Van de Weerdt, J. C. Hoch, C. Prosperi, J. M. Francois, S. L. Mayo, and J. A. Martial, "De Novo Backbone and Sequence Design of an Idealized alpha/beta-barrel Protein: Evidence of Stable Tertiary Structure", Journal of Molecular Biology, vol 325, 163-174 (2003)
- J. Desmet, M. D. Maeyer, B. Hazes, I. Lasters, "The Dead End Elimination theorem and its Uses in Protein Side-Chain Positioning", Nature, vol. 356, 539-542 (1992)
- B. Dahiyat and S. L. Mayo, "De-Novo Protein Design: Fully Automated Sequence Selection", Science, vol. 278, 82-87, (1997)
- J. M. Shifman and S. L. Mayo, "Exploring the Origins of Binding Specificity through the Computational Redesign of Calmodulin", PNAS, vol. 100, no. 23, 13274-13279 (2003)
- H. M. Blaine, D. Deepshikha, W. A. Baase, E S. Zollars, S. L. Mayo, B. W. Matthews, "Repacking the Core of T4 Lysozyme by Automated Design" Journal of Molecular Biology, vol. 332, 741-756 (2003)
- S. A. Marshall and S. L. Mayo, "Achieving Stability and Conformational Specificity in Designed Proteins via Binary Patterning", Academic Press, vol. 305, 619-631 (2001)
- D. N. Bolon and S. L. Mayo, "Enzyme-like proteins by Computational Design", PNAS, vol. 98, no. 25, 14274-14279 (2001)
- P. Berman, B. Das Gupta, D. Mubayi, R. Sloan, G. Turan, Y. Zhang, "The Protein Design Sequence Problem in Canonical Model on 2D and 3D Lattices"
- D. T. Jones, "De Novo Protein Design Using Pairwise Potentials and a Genetic Algorithm", Protein Science, vol. 3, 567-574 (1994)
- J. R. Desjarlais and T. M. Handel, "De Novo design of the hydrophobic cores of proteins", Protein Science, vol. 4, 2006-2018 (1995)
- G. A. Lazar, J. R. Desjarlais and T. M. Handel, "De Novo design of the hydrophobic core of Ubiquitin", Protein Science, vol. 6, 1167-1178 (1997)
- D. T. Jones, "De Novo Protein Design of the Hydrophobic Cores of Proteins", Protein Science, vol. 3, 567-574 (1994)
- T. Hiroyasu, M. Miki, T. Iwahashi, Y. Okamoto, "Dual Individual Distributed Genetic Algorithm for Minimizing the Energy of Protein Tertiary Structure"
- J. R. Desjarlais, N. D. Clarke, "Computer Search Algorithms in Protein Modification and Design", Current Opinion in Structural Biology, vol. 8, 471-475 (1998)
- C. A. Voigt, D. B. Gordon, S. L. Mayo, "Trading accuracy for Speed: A Quantitative Comparison of Search Algorithms in Protein Sequence Design", Journal of Molecular Biology, vol 299, 789-803 (2000)
- L. Wernisch, S. Hery, S. Wodak, "Automatic Protein Design with all Atom Force-Fields by Exact and Heuristic Optimization", Journal of Molecular Biology, vol 301, 713-736 (2000)
- A. Jaramillo, L. Wernisch, S. Hery, S. Wodak, "Automatic Procedures for Protein Design", Combinatorial Chemistry and High Throughput Screening, vol 4, 643-659 (2001)
- H. W. Hellinga, F. M. Richards, "Optimal Sequence Selection in Proteins of known structure by Simulated Evolution", Biochemistry, Vol 91, 5803-5807 (1994)
What is Mass Spectrometry
- Proteomics Informatics Course lectures on-line The Institute for Systems Biology teaches a 5 day course on interpreting Mass Spectrometry data and other aspects of Proteomics. All 10 lectures are on-line (as PDFs) at the ISB Seattle Proteome Center web site.
- Ask for Dave's lecture notes.
Mass Spectrometry Reading:
- Hernandez P, Muller M, Appel RD. Automated protein identification by tandem mass spectrometry: issues and strategies. Mass Spectrom Rev 2006; 25: 235-254.
- Kinter, Michael and Sherman, Nicholas. Protein Sequencing and Identification Using Tandem Mass Spectrometry. New York: John Wiley and Sons, 2000
- Excellent discourse on sequencing proteins of tryptic digests with Mass Spectrometers. Discusses techniques for approaching the task of sequencing raw mass spectra and the chemistry behind ionization in Chapter 4. See Dave for a free copy.
- Pevzner, Dancik, Tang. "Mutation- and modification-tolerant protein identification via tandem mass-spectrometry." Fourth International Conference on Computational Molecular Biology (RECOMB 2000). Tokyo, Japan, April 2000, 231-236.
- http://portal.acm.org/citation.cfm?id=332560
- Also Journal of Computational Biology. 2000;7(6):777-87
- http://www-cse.ucsd.edu/groups/bioinformatics/papers/mutation-tolerant-protein-id.pdf
- Pevzner is the leader in the field of MS algorithms. This is an important work.
- Database search in tandem mass spectrometry is a powerful tool for protein identification. High-throughput spectral acquisition raises the problem of dealing with genetic variation and peptide modifications within a population of related proteins. A method that cross-correlates and clusters related spectra in large collections of uncharacterized spectra (i.e., from normal and diseased individuals) would be very valuable in functional proteomics. This problem is far from being simple since very similar peptides may have very different spectra. Pevzner introduces a new notion of spectral similarity that allows one to identify related spectra even if the corresponding peptides have multiple modifications/mutations. Based on this notion, they developed a new algorithm for mutation-tolerant database search as well as a method for cross-correlating related uncharacterized spectra.
- Pevzner, Mulyukov, Dancik, Tang. "Efficiency of database search for identification of mutated and modified proteins via mass spectrometry." Genome Research 2001 Feb;11(2):290-9.
- http://www-cse.ucsd.edu/groups/bioinformatics/papers/database-search-mutated-proteins-ms.pdf
- Although protein identification by matching tandem mass spectra (MS/MS) against protein databases is a widespread tool in mass spectrometry, the question about reliability of such searches remains open. Absence of rigorous significance scores in MS/MS database search makes it difficult to discard random database hits and may lead to erroneous protein identification, particularly in the case of mutated or post-translationally modified peptides. This problem is especially important for high-throughput MS/MS projects when the possibility of expert analysis is limited. Thus, algorithms that sort out reliable database hits from unreliable ones and identify mutated and modified peptides are sought. Most MS/MS database search algorithms rely on variations of the Shared Peaks Count approach that scores pairs of spectra by the peaks (masses) they have in common. Although this approach proved to be useful, it has a high error rate in identification of mutated and modified peptides. Pevzner describes new MS/MS database search tools, MS-CONVOLUTION and MS-ALIGNMENT, which implement the spectral convolution and spectral alignment approaches to peptide identification. They further analyze these approaches to identification of modified peptides and demonstrate their advantages over the Shared Peaks Count. They also use the spectral alignment approach as a filter in a new database search algorithm that reliably identifies peptides differing by up to two mutations/modifications from a peptide in a database.
- http://www.swissproteomicsociety.org/digest/2005/issue16.html
- Swiss Proteomics Society maintains a categorized proteomics reading list
- http://www.oardc.ohio-state.edu/tomato/Proteomics%20Presentation.pdf
- LutefiskHowTo.pdf
- http://www.hairyfatguy.com/Lutefisk/
- Not a very useful site, but it was the original site for Lutefisk. Also has a Haiku Corner
- Mass Spectrometry File Formats
- MatrixScience's description of the differences between common MS/MS data file formats
- Bristow, Anthony; Webb, Kenneth; Lubben, Anneke; and Halket, John. Reproducible product-ion tandem mass spectra on various liquid chromatography/mass spectrometry instruments for the development of spectral libraries Pub?
- Description of a study on spectral libraries across Mass Spectrometry platforms. This paper gives convincing results that show a spectral library can be useful when searching spectra of different mass spectrometers.
- Hoofnagle, Andrew; Resing, Katheryn; and Ahn, Natalie. Protein Analysis by Hydrogen Exchange Mass Spectrometry. Pub?
- This paper gives background on techniques used for hydrogen exchange experiments. It also gives theory on hydrogen exchange, in particular exchange rates of peptides.
- Geer, Lewis, et al. Open Mass Spectrometry Search Algorithm. Pub?
- This paper discusses an open source algorithm for searching a database of proteins using mass spectrometry data. Also includes a results based comparison to another search algorithm, Mascot.
- Horn, David, et al. "Automated de Novo Sequencing of Proteins by Tandem High-Resolution Mass Spectrometry." - Pub?
- Busch, Kenneth. "Reduction of noise through Informatics." - Pub?
- It mainly talks about Mass Spectra and Chemical noise and seems like a good paper.
- Frank, Ari and Pevzner, Pavel. "PepNovo: De Novo Peptide Sequencing via Probabilistic Network Modeling." Anal. Chem. 2005, 77, 964-973.
- Audens - Automatic De Novo Sequencing
- http://www.ti.inf.ethz.ch/pw/st_proj/sa_mathis.pdf
- This is an interesting paper, particularly in its explanation of the mowers. However, the mass spec details are not very in depth. In fact, the simplicity of the mass spec is somewhat misleading, and so shallow that this material is not recommended for anyone who would be working on our project, as they would have to "unlearn" material to be useful on a project. Nevertheless, the details on the mowers are very interesting and highly recommended for anyone doing research on mass spec, as the paper discusses the unsolved problem of reducing noise and thus reinforcing the meaningful peaks.
- E.J. Finehout and K.H. Lee. "An Introduction to Mass Spectrometry Applications in Biological Research." Biochemistry and Molecular Biology Education. Vol. 32, No. 2: 93-100, 2004.
- http://www.leelab.org/research/papers/BAMBED32-93.PDF
- A good low-level intro to the field of Mass Spec, but it duplicates too much information for anyone in our group, who should read the Kinter/Sherman book. However, it has some good overviews of the abilities of Mass Spec to do things other than merely protein analysis, such as studying micro/macro biological molecules, and protein quantification. Interestingly enough, it includes a small portion on detecting post-translational modifications.
- Steen, H., and Mann, M. (2004). The abc's (and xyz's) of peptide sequencing. Nat Rev Mol Cell Biol 5, 699-711.
- http://www.abrf.org/ResearchGroups/MassSpectrometry/EPosters/ms97quiz/abrfQuiz.html
- This is a quiz on peptide sequencing using MS, it has answers so you can check to see how you did
- http://genomebiology.com/2004/6/1/R9
- An extremely important paper for us. This describes in detail what the PeptideAtlas people are doing. They have a database of empirical Mass Spectra that they're using for genome annotation (see next entry)
- http://www.peptideatlas.org/
- An extremely important site for us. This is a database of empirical Mass Spectra that they're using for genome annotation (see previous entry)
- Lennon, JJ; and Walsh, KA. Locating and identifying posttranslational modifications by in-source decay during MALDI-TOF mass spectrometry. Protein Sci. 1999 Nov;8(11):2487-93.
- Peptide Mass Fingerprinting with optimized Peak Detection paper: PeakDetection_Gras.pdf
- The June 2006 issue of BioTechniques had a special section: Mass Spectrometry for Proteomics Analysis
- The power of mass spectrometry in biological discovery John R. Yates BioTechniques Vol. 40, No. 6: pp 779 (June 2006) 000112199.pdf Full text (PDF - 82K)
- Education or vocational training? (personal essay) John B. Fenn BioTechniques Vol. 40, No. 6: pp 780-781 (June 2006) Full text (HTML) | Full text (PDF - 35K)
- Advancing proteomics with ion/ion chemistry David M. Good and Joshua J. Coon BioTechniques Vol. 40, No. 6: pp 783-789 (June 2006) 000112194.pdf Full text (PDF - 742K)
- Analysis of posttranslational modifications of proteins by tandem mass spectrometry Martin R. Larsen, Morten B. Trelle, Tine E. Thingholm, and Ole N. Jensen BioTechniques Vol. 40, No. 6: pp 790-798 (June 2006) 000112201.pdf Full text (PDF - 379K)
- Sampling and analytical strategies for biomarker discovery using mass spectrometry Thomas P. Conrads, Brian L. Hood, and Timothy D. Veenstra BioTechniques Vol. 40, No. 6: pp 799-805 (June 2006) 000112196.pdf Full text (PDF - 326K)
- The University of New South Wales (Australia) has a cool interactive Flash tool to help understand how CID (Collisionally Induced Dissociation) occurs and how to assign peaks to resulting mass spectra.
- ProteolyticO18LabelingStrategies_ms_rev-2007_o18.pdf: 2007 review on proteolytic 18O-labeling strategies for quantitative proteomics by Miyagi and Rao of Case Western's Center for Proteomics.
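The Pevzner et al. annotations earlier in this list contrast the Shared Peaks Count with spectral convolution. As a rough illustration only (this is not the MS-CONVOLUTION or MS-ALIGNMENT code, and the function names, tolerance, bin width, and toy peak lists are our own assumptions), the two scores can be sketched on plain lists of peak masses:

```python
from collections import Counter


def shared_peaks_count(spec_a, spec_b, tol=0.5):
    """Count peaks of spec_a that have a partner in spec_b within tol."""
    return sum(1 for a in spec_a if any(abs(a - b) <= tol for b in spec_b))


def spectral_convolution(spec_a, spec_b, bin_width=1.0):
    """Histogram of all pairwise mass differences b - a.

    The count at offset 0 plays the role of the shared peaks count.
    A large count at a nonzero offset suggests that many peaks are
    shifted by the same mass, for example by a single modification.
    """
    return Counter(round((b - a) / bin_width) for a in spec_a for b in spec_b)


if __name__ == "__main__":
    # Hypothetical peak masses; the last two peaks of the second
    # spectrum are shifted by 16 mass units.
    unmodified = [114.0, 227.1, 356.2, 503.3]
    modified = [114.0, 227.1, 372.2, 519.3]
    print(shared_peaks_count(unmodified, modified))
    print(spectral_convolution(unmodified, modified).most_common(3))
```

With these toy lists only two peaks are shared directly, while the convolution collects two pairs at offset 0 and two more at offset 16, which is how a shifted family of peaks becomes visible even though the raw shared count is low.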
Spectra comparison papers
- Wan, Katty X.; Vidavsky, Ilan; Gross, Michael L. 2001. Comparing Similar Spectra: From Similarity Index to Spectral Contrast Angle. 2002 Am. Soc. Mass Spectrometry 1044-0305/02
- This is an article comparing the similarity index and spectral contrast angle and will be an important paper for anyone interested in database comparisons. This paper is not specific to proteomics Mass Spectrometry but rather Mass Spectrometry in general.
- Alfassi, Zeev B. 2002. On the Comparison of Different Tests for Identification of a Compound from its Mass Spectrum. 2003 Am. Soc. Mass Spectrometry 1044-0305/03
- This article is a review of the previous article and its pitfalls. A reply by Wan should be attached at the end.
- Alfassi, Zeev B. 2003. On the Normalization of a Mass Spectrum for Comparison of Two Spectra. 2004 Am Soc. Mass Spectrometry 1044-0305/04
- This article discusses the utility of normalizing spectra for comparison.
- Fu, Yan; Yang, Qiang; Sun, Ruixiang; Li, Dequan; Zeng, Rong; Ling, Charles X.; Gao, Wen. 2003. Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics Vol. 20 no. 12 2004 1948-1954
- This is an article that builds on the previous articles to create a method for comparing peptide mass spectra. Again, this article is useful for anyone interested in database spectrum comparisons as well as other analysis programs.
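The spectral contrast angle discussed in the Wan et al. entry treats two spectra as intensity vectors and measures the angle between them. A minimal sketch follows, assuming each spectrum has already been binned into a dictionary mapping an m/z bin to an intensity. The function names are hypothetical, and base-peak scaling is only one of the normalization choices that the Alfassi papers discuss.

```python
import math


def normalize_to_base_peak(spectrum, scale=100.0):
    """Rescale intensities so that the largest peak equals `scale`."""
    top = max(spectrum.values())
    return {mz: scale * inten / top for mz, inten in spectrum.items()}


def spectral_contrast_angle(spec_a, spec_b):
    """Angle in degrees between two spectra viewed as intensity vectors.

    0 degrees means identical relative intensities; 90 degrees means
    the spectra share no peaks at all.
    """
    keys = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(k, 0.0) * spec_b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectrum has no intensity")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    a = {100: 50.0, 200: 100.0, 300: 25.0}
    b = {100: 45.0, 200: 100.0, 300: 40.0}
    print(spectral_contrast_angle(a, b))
```

Because the angle depends only on the direction of the intensity vector, multiplying a whole spectrum by a constant does not change it. Normalization therefore matters mainly for measures based on intensity differences.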
MS Standards (XML)
- mzXML & mzData
- HUPO-PSI Standard Format Merger Announcement
- An announcement describing the deliverables and time-lines for the HUPO-PSI project to merge mzData and mzXML to create a single standard interchange format for proteomics. mzXML 4.0 will merge with mzData 2.0 in December 2006/January 2007, and mzXML will cease to exist as a separate standard. We already support mzData in our current IBG Desktop application, but developers need to be aware of this pending change.
- pepXML
- protXML
Gene Designing
- Jayaraj, Reid, and Santi. "GeMS: an advanced software package for designing synthetic genes" in Nucleic Acids Res. 2005; 33(9): 3011-3016.
- Moreira, Andres. "Genetic Algorithms for the Imitation of Genomic Styles in Protein Backtranslation." Theoretical Computer Science. 2004 322(2): 297 - 312.
- Ravi Vijaya Satya, Amar Mukherjee, Udaykumar Ranga. "A Pattern Matching Algorithm for Codon Optimization and CpG Motif-Engineering in DNA Expression Vectors," csb, p. 294, IEEE Computer Society Bioinformatics Conference (CSB'03), 2003.
Protein Docking
General Proteomics
- Golemis, Erica and Adams, Peter. Protein-Protein Interactions: A Molecular Cloning Manual, Second Edition. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2005. ISBN 0-87969-722-9
- Witkowski, Jan. The Inside Story: DNA to RNA to Protein. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2005. ISBN 0-87969-750-4
- Prusiner, Stanley. Prion Biology and Diseases, Second Edition. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2004. ISBN 0-87969-693-1
- Campbell, Malcolm and Heyer, Laurie. Discovering Genomics, Proteomics, and Bioinformatics. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, 2003. ISBN 0-8053-4722-4
- Protein Kinesis: The Dynamics of Protein Trafficking and Stability (Cold Spring Harbor Symposia on Quantitative Biology LX). 1995. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, ISBN 0-87969-069-0
General Biology Reading:
- A review of DNA at the advanced high-school biology level. http://www.dnaftb.org/dnaftb/26/concept/index.html
- Alberts, Bray, Hopkin, Johnson, Lewis, Raff, Roberts, and Walter. Essential Cell Biology (Second Edition). New York: Garland Science, 2003. 0-8153-3480-X
- "The best" molecular biology book for Computer Scientists
Popular Science Biology Selections:
- Ridley, Matt. Genome - Pub?
- Matt Ridley is a journalist/professor who has popularized much of biology's recent movements. In Genome, Ridley takes each of our 23 chromosomes and describes an interesting gene or feature that has been "discovered" in recent years. From B&N:
- "By selecting one newly discovered gene from each of the 23 human chromosomes & telling its story, the author recounts the history of our species, from the dawn of life to the future of medicine."
- Ridley, Matt. The Red Queen - Sex and the Evolution of Human Nature - Pub?
- Ridley, Matt. Nature Via Nurture: Genes, Experience, and What Makes us Human - Pub?
- Dennett, Dan. Darwin's Dangerous Idea - Pub?
- Dennett is among the all-time favorite authors of one of the students in the IBG lab. This student thinks Dennett's writings should be required reading for anyone who has an affinity for Computer Science and has thought about some of the bigger questions, e.g. Free Will, Consciousness, etc. According to this student, this particular book is probably one of the most thorough and accessible books on evolution that has been written.
- Dennett, Dan. Freedom Evolves, Consciousness Explained - Pub?
- Here he uses the analogy that consciousness is a program running on the virtual machine software of the brain!
- Dennett, Dan (co-edited with Doug Hofstadter). The Mind's I - Pub?
- Dawkins, Richard. The Selfish Gene - Pub?
- This book is a reformulation of classical evolutionary theory which places the gene at the center of our scientific analysis. It has influenced many contemporary biologists over the last 20 years or so and justice to this book can hardly be done in a couple of lines. One of the students in the IBG lab says, "It is by far, the best pop science book I've read."
- Dawkins, Richard. The Blind Watchmaker, Climbing Mount Improbable. - Pub?
- Dawkins, Richard. The Extended Phenotype. - Pub?
- Diamond, Jared. The Third Chimpanzee - Pub?
- Pinker, Steven. The Blank Slate - Pub?
- Wright, Robert. The Moral Animal - Pub?
Grid and Parallel Programming:
Globus
Cactus
MPI
- Karniadakis, George, and Kirby, Robert. Parallel Scientific Computing in C++ and MPI. Cambridge: Cambridge University Press, 2003.
- Recommended by a student. Covers algorithms and has lots of examples with CD-ROM
- Foster, Ian. Designing and Building Parallel Programs. Addison-Wesley: Reading, MA, 1994. ISBN 0-201-57594-9
Other
- Lea, Doug. Concurrent Programming in Java: Design Principles and Patterns (Second Edition). Addison-Wesley, 1999.
- Berstis, Viktors. Fundamentals of Grid Computing. IBM Redbooks Paper. 2002.
- This is a good intro to the concepts related to grid computing. The paper focuses mainly on high level concepts and design of systems.
Data Analysis Reading:
R and PARAFAC
For details and tutorials on using R and PARAFAC, you might want to see our R and PARAFAC tutorial.
Packages for the emulation of MATLAB within R can be found at
http://cran.r-project.org/. The "matlab" package emulates MATLAB functions in R. The "R.matlab" package provides methods to read and write MAT files, and also makes it possible to communicate (evaluate code, send and retrieve objects, etc.) with Matlab v6 or higher running locally or on a remote host.
Matlab tool box
A book with 10 chapters about Matlab:
http://www.models.kvl.dk/courses/parafac/chap0contents.htm
A paper that explains the multi-way decomposition method PARAFAC and its use in chemometrics:
http://www.models.kvl.dk/users/rasmus/presentations/parafac_tutorial/paraf.htm
http://www.ms.uky.edu/~rayens/TRICAP_2003/Presentations/Applying%20PARAFAC%20in%20Complex%20Problems.pdf
http://www.i3s.unice.fr/~khouaja/khouaja_ISCCSP-03-1106.pdf
http://www.ece.drexel.edu/telecomm/Talks/yuan.pdf
The analysis of three-way arrays by constrained Parafac methods.
http://three-mode.leidenuniv.nl/bibliogr/krijnenwp_thesis/index.html
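For readers coming from computer science, the idea behind the PARAFAC readings above can be sketched in a few lines. The example below is only an illustration under strong assumptions: it fits a single component (rank 1) to a small three-way array by alternating least squares, using plain Python lists. The function names `rank1_parafac` and `reconstruct` are hypothetical and do not come from any of the packages or papers listed here; real analyses fit several components and should use an established toolbox.

```python
def rank1_parafac(X, sweeps=5):
    """Fit X[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    Assumes X is a non-empty, rectangular three-way array (nested lists)
    that is not orthogonal to the all-ones starting vectors.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    # Arbitrary starting loadings for this sketch.
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K

    def sq(v):
        # Squared Euclidean norm of a loading vector.
        return sum(x * x for x in v)

    for _ in range(sweeps):
        # Each update is the least-squares solution for one mode
        # while the other two modes are held fixed.
        a = [sum(X[i][j][k] * b[j] * c[k]
                 for j in range(J) for k in range(K)) / (sq(b) * sq(c))
             for i in range(I)]
        b = [sum(X[i][j][k] * a[i] * c[k]
                 for i in range(I) for k in range(K)) / (sq(a) * sq(c))
             for j in range(J)]
        c = [sum(X[i][j][k] * a[i] * b[j]
                 for i in range(I) for j in range(J)) / (sq(a) * sq(b))
             for k in range(K)]
    return a, b, c


def reconstruct(a, b, c):
    """Build the rank-1 array with entries a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


# Small demonstration: a 2 x 3 x 2 array that is exactly rank 1.
X = reconstruct([1.0, 2.0], [3.0, 1.0, 2.0], [2.0, 5.0])
a, b, c = rank1_parafac(X)
```

The individual loadings are only determined up to scaling (a factor can move between `a`, `b`, and `c`), so it is the reconstructed array, not the loadings themselves, that should be compared with the data.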
The PARAFAC analysis software suite "N-Way Toolbox for MATLAB", employed by S. Rutan, S. Porter et al. in the paper entitled "Analysis of Four-Way Two-Dimensional Liquid Chromatography - Diode Array Data: Application to Metabolomics", can be downloaded for free from
http://www.models.kvl.dk/source/
To emulate Matlab using R please see packages "matlab" and "R.matlab" located at
http://cran.r-project.org/.
Short descriptions of these packages can be found in section "R and PARAFAC" above.
Other reading on Data Analysis
Please find the attachment at the bottom of the page titled Data_Analysis_reading.zip
Metabolomics
The following readings have relevance to R, metabolomics and mass spectrometry. They are placed here since these disciplines intersect in our metabolomic efforts.
- Smith_ASMS05.pdf: (Poster) Metlin XCMS: Global metabolite profiling incorporating LC/MS filtering, peak detection, and non-linear retention time alignment using open-source software
- bi0480335.pdf: Assignment of Endogenous Substrates to Enzymes by Global Metabolite Profiling
- ac051437y_xcms.pdf: XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification
The XCMS package can be found at bioconductor.org. The XCMS package is not attached here, since it is subject to modification by its authors; however, the XCMS package flow chart FlowChart.pdf is included here.
Paper describing the Human Metabolome Database (HMDB) - a curated collection of human metabolite and human metabolism data:
HumanMetabolomeDatabase_D521.pdf: HMDB: the Human Metabolome Database
Other Science Readings:
Wavelets
Wavelets can be thought of as an extension of the Fourier transform, which gives the frequencies contained in a signal; a wavelet transform also indicates where in the signal those frequencies occur. There are some efforts to use them in mass spectrometry and other bioinformatics applications, mainly for noise reduction.
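To make the noise-reduction use concrete, here is a minimal sketch of wavelet denoising in plain Python. It is an illustration only, not the method of any paper listed below: it assumes the simple Haar wavelet, soft thresholding of the detail coefficients, a hand-picked threshold, and a signal whose length is divisible by 2**levels. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Undo one level of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def denoise(signal, threshold, levels=2):
    """Transform, shrink small detail coefficients, transform back."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([soft_threshold(d, threshold) for d in detail])
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


# A step signal with small alternating noise added to it.
clean = [0.0] * 4 + [10.0] * 4
noisy = [v + (0.1 if i % 2 == 0 else -0.1) for i, v in enumerate(clean)]
denoised = denoise(noisy, threshold=0.5)
```

Small detail coefficients are treated as noise and removed, while large ones (sharp features such as peaks or steps) are kept. With a threshold of 0 the signal is reconstructed unchanged, which is a handy sanity check.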
The Math and Science of Wavelets
- Introductory Wavelet Tutorial http://users.rowan.edu/~polikar/WAVELETS/WTtutorial.html
- I (Dave) found this to be an extremely easy-to-read introduction that should help anyone.
- Here is a zip of the original pages (since they reside on a student web site and might disappear at any time) wavelets.zip (876 Kb)
- Aboufadel, Edward and Schlicker, Steven. Discovering Wavelets. Wiley, 1999. ISBN 0-471-33193-7
- Recommended by Thomas M. Yackish, Professor Emeritus of EE at Grand Valley State University (Michigan).
Wavelets in Mass Spectrometry
- Rejtar, T, et al. Increased identification of peptides by enhanced data processing of high-resolution MALDI TOF/TOF mass spectra prior to database searching. Anal Chem. 2004 Oct 15;76(20):6017-28.
- Morris, JS, et al. Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics. 2005 May 1;21(9):1764-75. Epub 2005 Jan 26.
- Coombes, KR, et al. Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics. 2005 Nov;5(16):4107-17.
Noise Removal
A page discussing Signal Conditioning and filtering.
http://www.incogen.com/public_documents/research/interf04_finproc.pdf
Kast, Jurgen. Noise Filtering Techniques for Electrospray Quadrupole Time of Flight Mass Spectra. Elsevier Inc. 2003.
http://www.narrador.embl-heidelberg.de/GroupPages/Literature/NoiseFilteringLayout.pdf
A paper with a good section (Section 3) on noise reduction and normalization using various techniques.
http://www.nettab.org/2005/docs/NETTAB2005_CannataroOral.pdf
Grosshans, P. B.; Shields, J. P.; Marshall, A. G. Comprehensive Theory of the Fourier Transform Ion Cyclotron Resonance Signal For All Ion Trap Geometries. J. Chem. Phys. 1992, 94, 5341-5352.
Seminars Presented to IBG
- Co-evolution between amino acid residues, presented 8/31/06 by Dr. Zhengyuan (Hugh) Wang, Research Assistant, Center for Computation and Technology (CCT) at Louisiana State University
- Integrating multiple length scales in protein folding, presented in March 2006 by Dr. Tobin R. Sosnick, Dept. of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, University of Chicago
Software Engineering
- AGILE DEVELOPMENT - EXTREME PROGRAMMING (XP): This presentation describes the importance of using a software development process model, and it specifically talks about Extreme Programming (XP) and the Agile development model.
- SOFTWARE ENGINEERING CHALLENGES IN BIOINFORMATICS: Bioinformatics grew out of molecular biology as it became clear that specialist skills were needed to organise and analyse the data being generated. Now molecular biology has reached a stage of development where continued progress depends on combining knowledge of the very small – molecular mechanisms; with knowledge of the small, medium and large; cell biology, developmental biology, systems biology and medicine for example. With this change, bioinformatics will face its biggest challenge to date: integrating very different data on very different scales with existing molecular data. The technical challenges will be enormous but surmountable; SUCCESS will depend on good communication and SOFTWARE ENGINEERING management SKILLS.
Reference Tools:
- CBioC: Collaborative Bio Curation
- A web-based platform that allows researchers around the world to participate in the first-stage curation of information extracted automatically from biomedical abstracts.
-- AbderRahmanAli - 5 Aug 2007
-- AbderRahmanAli - 18 Jul 2007
-- DaveAngulo - 11 Jul 2007
-- DaveAngulo - 23 May 2007
-- MariellenDwyer - 25 Mar 2007
-- LarryHelseth - 04 Mar 2007
-- LarryHelseth - 01 Mar 2007
-- LarryHelseth - 19 Feb 2007
-- LarryHelseth - 12 Dec 2006
-- LarryHelseth - 09 Oct 2006
-- LarryHelseth - 21 Sep 2006
-- LarryHelseth - 13 Sep 2006
-- LarryHelseth - 12 Sep 2006
-- LarryHelseth - 06 Sep 2006
-- DaveAngulo - 28 Aug 2006
-- GilKwak - 03 Apr 2006
-- GregZivich - 22 Mar 2006
-- GilKwak - 15 Mar 2006
-- DaveAngulo - 04 Feb 2006
-- DaveAngulo - 02 Feb 2006
-- DaveAngulo - 28 Dec 2005
-- DaveAngulo - 6 Oct 2005
-- KevinDrew - 19 Jan 2005
-- DaveAngulo - 13 Feb 2003
Following are mistakes made when uploading files by various people. Please fix these: