r1 - 01 Feb 2005 - 07:14:41 - DominicBattre

Parser strategies for the Mass Spec Projects

Bold items are those which have found a consensus.

Parsers

  • RAP:
    • allows fast indexing,
    • works for Java and C++
    • does not have an mzXML writer (needs to be written by us)
  • Eric's SAX:
    • a lot of development still needs to be done
    • pursues a similar approach to RAP
  • JAXB
    • can be used already!
    • works only for Java
    • all data must be kept in memory before an mzXML file can be written. This scales very badly if we want to return a large subset of the database as a single mzXML file.
    • might be substituted by a RAP solution later. The substitution should be rather easy.

How to store data

  • We can keep everything in the database except for the peak lists, which are stored as DTA files on disk (one spectrum per file)
    • Advantage: Easy to handle, fast access to single spectrum on web interface
  • We can keep everything in the database and all mass spectra of an accession in an mzXML file
    • Advantage: Easy and fast to generate ad-hoc downloads (see below)
  • A mixed approach: Spectra are stored in DTA files. Once an accession gets accepted by the curator, an mzXML file gets generated that contains all information from the database and the DTA files
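
To make the "one spectrum per file" option concrete, here is a minimal Java sketch of reading a single DTA file. It assumes the common DTA layout (a header line with precursor mass and charge, followed by one "m/z intensity" pair per line); the class name DtaSpectrum and its fields are hypothetical and not part of the IBG data model.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one spectrum read from a DTA file. */
final class DtaSpectrum {
    final double precursorMass;                       // first value on the header line
    final int charge;                                 // second value on the header line
    final List<double[]> peaks = new ArrayList<>();   // each entry: {m/z, intensity}

    DtaSpectrum(double precursorMass, int charge) {
        this.precursorMass = precursorMass;
        this.charge = charge;
    }

    /** Reads one spectrum; assumes whitespace-separated numeric columns. */
    static DtaSpectrum read(BufferedReader in) throws IOException {
        String header = in.readLine();
        if (header == null) {
            throw new IOException("empty DTA input");
        }
        String[] h = header.trim().split("\\s+");
        if (h.length < 2) {
            throw new IOException("malformed DTA header: " + header);
        }
        DtaSpectrum s = new DtaSpectrum(Double.parseDouble(h[0]), Integer.parseInt(h[1]));

        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] p = line.split("\\s+");
            if (p.length < 2) {
                throw new IOException("malformed peak line: " + line);
            }
            s.peaks.add(new double[] {Double.parseDouble(p[0]), Double.parseDouble(p[1])});
        }
        return s;
    }
}
```

Because each file holds exactly one spectrum, a web request for a single spectrum only has to open and parse one small file, which is the fast-access advantage noted above.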

Explanation of ad-hoc downloads: A user might search for all QTOF analyses and want to download every accession found. In this case we want to generate a single large mzXML file on the fly that contains all the available information. Such a file might be several TB in size. Because JAXB must hold all data in memory before it can write the file, the JAXB approach will not work in the long term. By that time, however, we can write an event-based writer for RAP that accepts a sequence of accessions, one after another.
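
The event-based idea can be sketched in Java as follows. This is not the planned RAP writer (which does not exist yet); it uses the JDK's StAX XMLStreamWriter as a stand-in to show the principle that each accession is written and flushed before the next one is loaded. The Accession class and the element names export, accession and spectrum are placeholders, not the mzXML schema.

```java
import java.io.Writer;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StreamingExportSketch {

    /** Hypothetical stand-in for one accession and its spectra. */
    static final class Accession {
        final String id;
        final List<String> spectra;

        Accession(String id, List<String> spectra) {
            this.id = id;
            this.spectra = spectra;
        }
    }

    /**
     * Writes accessions one after another. Only the accession currently
     * being written has to be in memory, so the output size is not
     * bounded by the heap.
     */
    static void write(Iterator<Accession> accessions, Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("export");            // placeholder root element
        while (accessions.hasNext()) {
            Accession a = accessions.next();
            xml.writeStartElement("accession");     // placeholder element
            xml.writeAttribute("id", a.id);
            for (String spectrum : a.spectra) {
                xml.writeStartElement("spectrum");  // placeholder element
                xml.writeCharacters(spectrum);
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();                            // push this accession to the output
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
    }
}
```

If the iterator fetches accessions lazily from the database, memory use stays roughly constant regardless of how many accessions the search returned.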

Decision for the languages

  • C/C++: We want to use RAP for reading and convert its objects via a bridge to the existing classes. The next step is to write a writer for the RAP classes and use the bridge to copy values between the RAP and IBG data models.
  • Java: We want to go with JAXB for now. We might need to change that, but use of the parser will be confined to a few classes, so it should be easy to substitute.
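
One way to keep the parser confined to a few classes is to hide it behind a small interface, sketched below in Java. All names here (SpectrumReader, InMemoryReader, ImportService) are hypothetical; the in-memory implementation only stands in for a JAXB-backed or, later, RAP-backed reader so that the sketch runs on its own.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical facade: the only parser-related type the application sees. */
interface SpectrumReader {
    List<String> readScanIds(String source);
}

/**
 * Stand-in implementation returning canned data. A JAXB-backed or
 * RAP-backed class would implement the same interface.
 */
final class InMemoryReader implements SpectrumReader {
    private final List<String> ids;

    InMemoryReader(String... ids) {
        this.ids = Arrays.asList(ids);
    }

    @Override
    public List<String> readScanIds(String source) {
        return ids;
    }
}

/** Application code depends on the interface, never on a concrete parser. */
final class ImportService {
    private final SpectrumReader reader;

    ImportService(SpectrumReader reader) {
        this.reader = reader;
    }

    int countScans(String source) {
        return reader.readScanIds(source).size();
    }
}
```

With this arrangement, swapping parsers means writing one new SpectrumReader implementation and changing the place where it is constructed; ImportService and its callers stay untouched.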

-- DominicBattre - 01 Feb 2005
