PREreview of Seeking community input for: mzPeak - a modern, scalable, and interoperable mass spectrometry data format for the future

by Sapphire Pelican

Published: February 27, 2025
DOI: 10.5281/zenodo.14934154
License: CC BY 4.0

I appreciate the intent and effort involved in developing this document and proposed approaches. However, it seems that new file formats are being created often, and some of the proposed features of this format could be counterproductive for certain aims. To note my bias, I am a big fan of the mzMLb format.

The biggest difference between this and mzMLb appears to be the desire to collect as much metadata as possible. With a Thermo LC and MS, a variety of LC parameters are collected and encoded in Thermo raw files (pump block pressures, flow meter pressure, column oven temperature, and more), as well as MS metadata (at least 257 chromatogram traces). But if I use a non-Thermo LC with a Thermo MS, those LC metadata are likely missing from the MS raw files, which complicates extraction. I think extracting all the desired metadata from the files generated across all MS vendors is almost impossible, which weakens the argument for moving away from the “future-proof” mzMLb, since some metadata are already encoded in mzMLb.

The proposed storage of data in multiple files reminds me a lot of the Sciex instruments that generate .wiff, .wiff2, and .wiff.scan files for a single run. Often, user error leaves data repositories missing one or more of these file types, so the metadata or scan data are lost. I consider the existing single-file approach of mzML, mzXML, mzMLb, and other formats vastly superior to splitting data across separate files for the sake of faster processing.

Just my two cents. I would be happy to be proven wrong by this file format becoming very popular, but those two issues are sizeable.

Cheers,

Chris

Competing interests

The author declares that they have no competing interests.

Comments

Comment by Tim Van Den Bossche

Published: April 30, 2025
DOI: 10.5281/zenodo.15309291
License: CC BY 4.0

Hi Chris,

Thanks for your comment! If you would like to contribute further to this manuscript, which is now in its final state, and be listed as a co-author, please let us know by providing your contact details (there’s no ORCID linked to Sapphire Pelican ;) ). We will submit the manuscript very soon, so don’t wait too long :)

All the best,

Tim

Competing interests

The author of this comment declares that they have no competing interests.

Comment by Samuel Wein

Published: March 3, 2025
DOI: 10.5281/zenodo.14959490
License: CC BY 4.0

Hi Chris,

Thanks for the review. I am also a big fan of mzMLb, and one of the technical implementations we are looking at simply extends that format to encompass the additional metadata we want to collect, specifically by giving instrument vendors an explicit section to store vendor-specific data that doesn’t map cleanly onto our controlled vocabulary.

I’d also prefer to find a solution that works in a single file (or at the very least transparently packs multiple files into an archive on close). Your comments on user error mirror what I have seen when trying to grab raw data out of PRIDE depositions: more files mean more chances for user error. We will keep that in mind when trying to balance speed against complexity.

Competing interests

I am an author of this preprint.