
PREreview of Seeking community input for: mzPeak - a modern, scalable, and interoperable mass spectrometry data format for the future

by Yasin El Abiead

Published: February 26, 2025
DOI: 10.5281/zenodo.14933345
License: CC BY 4.0

I am excited to see this development.

I have just a few questions:

I don’t see any discussion of whether or how the diversity of vendor formats is addressed. For example, while Thermo numbers scans sequentially, Sciex applies a different scan numbering system that uses 3 different numbers.
- Will there be the same flexibility as in mzML files to account for this?
- Will there be a superimposed identifier that allows scans to be retrieved by scan order, irrespective of the numbering system native to the original vendor?
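As an illustration of the second question, here is a minimal Python sketch of such a superimposed identifier. It assumes native scan IDs are available as space-separated `key=value` strings, in the style of mzML's nativeID convention. The example ID strings and the function names `parse_native_id` and `build_ordinal_index` are hypothetical and not part of any mzPeak draft.

```python
def parse_native_id(native_id: str) -> dict[str, int]:
    """Split a 'key=value key=value' native ID into integer fields."""
    fields = {}
    for token in native_id.split():
        key, _, value = token.partition("=")
        fields[key] = int(value)
    return fields


def build_ordinal_index(native_ids: list[str]) -> dict[int, str]:
    """Map a 0-based acquisition-order index to each vendor-native ID.

    Assumes the input list is already in acquisition order.
    """
    return {ordinal: native_id for ordinal, native_id in enumerate(native_ids)}


# Illustrative ID strings only (hypothetical examples, not real files).
single_number = [
    "controllerType=0 controllerNumber=1 scan=1",
    "controllerType=0 controllerNumber=1 scan=2",
]
multi_number = [
    "sample=1 period=1 cycle=1 experiment=1",
    "sample=1 period=1 cycle=1 experiment=2",
]

for ids in (single_number, multi_number):
    index = build_ordinal_index(ids)
    # The same ordinal lookup works for both numbering schemes.
    print(index[1], parse_native_id(index[1]))
```

The point of the sketch is that the ordinal index is independent of how many numbers the vendor uses, while the native ID is kept alongside it so nothing is lost.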
It is great that this format aims to support metadata relating to hyphenated instruments, such as LC pump pressures, as well as experimental metadata. One concern I have here is that it sounds like this information will be stored as free text. Will there be dedicated keys to communicate the instruments used and/or the experimental setup via controlled ontologies?
One of the main challenges when working with mass spectral raw data as a data scientist is that such raw files have to be viewed in relation to each other (for example, samples are run within a sequence and can then be aligned using RT and m/z). However, such information is often lost once raw data have been stored in a repository. Will there be a key that serves as a sequence identifier?
Will there be a dedicated key to store a hashed unique identifier that can be used to deduplicate different open-format conversions of the same raw vendor file?
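A minimal Python sketch of what such a key could enable is given below. It assumes the vendor raw data is a single file (some vendor formats are directories, which would need a defined traversal order) and that SHA-256 is an acceptable hash. The function names `file_digest` and `group_by_source` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_source(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group converted files by the stored digest of their vendor source.

    `records` holds (converted_file_name, source_digest) pairs.
    """
    groups: dict[str, list[str]] = {}
    for name, source_digest in records:
        groups.setdefault(source_digest, []).append(name)
    return groups
```

If every converter wrote the digest of the original vendor file into the same key, a repository could call something like `group_by_source` to find all open-format copies that derive from one acquisition.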
While all mass spectral raw data are originally acquired in profile mode, the first thing that usually happens during conversion is centroiding of the mass peaks to simplify processing and reduce storage space. This choice is practical, and I don’t see it changing in the near future, but I think it would be great if the average mass resolution (which can easily be derived from profile raw data) could be stored, perhaps per scan in the scan header under a dedicated key.
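For reference, a minimal Python sketch of how a resolution value could be derived from a profile peak before centroiding is shown below. It uses the common definition of resolution as m/z divided by the full width at half maximum (FWHM) and assumes a single, isolated peak with linear interpolation between sampled points. The function name `fwhm_resolution` and the synthetic peak are hypothetical.

```python
def fwhm_resolution(mz: list[float], intensity: list[float]) -> float:
    """Estimate resolution m / FWHM for one isolated profile peak."""
    apex = max(range(len(intensity)), key=intensity.__getitem__)
    if intensity[apex] <= 0:
        raise ValueError("peak has no positive intensity")
    half = intensity[apex] / 2.0

    def crossing(step: int) -> float:
        """Walk from the apex in direction `step` to the half-maximum crossing."""
        i = apex
        while 0 <= i + step < len(mz) and intensity[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(mz):
            raise ValueError("peak does not fall below half maximum")
        # Linear interpolation between the points bracketing the half maximum.
        frac = (intensity[i] - half) / (intensity[i] - intensity[j])
        return mz[i] + frac * (mz[j] - mz[i])

    width = crossing(+1) - crossing(-1)
    return mz[apex] / width


# Synthetic triangular peak centred at m/z 500 with FWHM of about 0.02.
peak_mz = [499.98, 499.99, 500.00, 500.01, 500.02]
peak_intensity = [0.0, 50.0, 100.0, 50.0, 0.0]
print(fwhm_resolution(peak_mz, peak_intensity))  # roughly 25000
```

A per-scan value could then be an average of this estimate over the peaks in the scan, computed once at conversion time and written next to the centroided data.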

I realize that not all of this can be implemented at this point, but I think reserving keys for such entries early would go a long way.

In any case, amazing development!

Best, Yasin

Competing interests

The author declares that they have no competing interests.
