PREreview of Seeking community input for: mzPeak - a modern, scalable, and interoperable mass spectrometry data format for the future
- Published
- DOI: 10.5281/zenodo.14934154
- License: CC BY 4.0
I appreciate the intent and effort involved in developing this document and proposed approaches. However, it seems that new file formats are being created often, and some of the proposed features of this format could be counterproductive for certain aims. To note my bias, I am a big fan of the mzMLb format.
The biggest difference between this and mzMLb appears to be the desire to collect as much metadata as possible. For a Thermo LC and MS, a variety of LC parameters are collected and encoded in Thermo raw files (pump block pressures, flow meter pressure, column oven temperature, and more), as well as MS metadata (at least 257 chromatogram traces). But if I use a non-Thermo LC with a Thermo MS, those LC metadata are likely missing from the MS raw files, which complicates extraction. I think extracting all the desired metadata from the files generated across all MS vendors is almost impossible, which weakens the argument for moving away from “future-proof” mzMLb, since some metadata are already encoded in mzMLb.
The proposed storage of data in multiple files reminds me a lot of the Sciex instruments generating .wiff, .wiff2, and .wiff.scan files for a single run. Often, user error results in data repositories missing one or more of these file types, so the metadata or scan data are lost. I think the existing single-file approach of mzML, mzXML, mzMLb, and other formats is vastly superior to splitting a run across separate files for the sake of faster processing.
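To make the multi-file concern concrete, here is a minimal Python sketch of the kind of completeness check a repository would need to run on every upload. It assumes the three companion suffixes named above belong to each run; the exact set depends on the instrument and acquisition software, and `REQUIRED_SUFFIXES`, `split_run_name`, and `find_incomplete_runs` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Assumption: every run should ship with these companion files.
# Adjust the tuple to match the instrument and software in use.
REQUIRED_SUFFIXES = (".wiff", ".wiff2", ".wiff.scan")


def split_run_name(filename, suffixes=REQUIRED_SUFFIXES):
    """Return (run_name, suffix) if filename ends with a known suffix, else None."""
    lowered = filename.lower()
    for suffix in suffixes:
        if lowered.endswith(suffix):
            return filename[: -len(suffix)], suffix
    return None


def find_incomplete_runs(filenames, suffixes=REQUIRED_SUFFIXES):
    """Map each run name to the sorted list of companion suffixes it is missing."""
    seen = defaultdict(set)
    for name in filenames:
        parts = split_run_name(name, suffixes)
        if parts is not None:  # ignore unrelated files
            run, suffix = parts
            seen[run].add(suffix)
    required = set(suffixes)
    return {
        run: sorted(required - found)
        for run, found in seen.items()
        if required - found
    }


if __name__ == "__main__":
    uploaded = ["a.wiff", "a.wiff2", "a.wiff.scan", "b.wiff", "b.wiff2", "notes.txt"]
    print(find_incomplete_runs(uploaded))  # {'b': ['.wiff.scan']}
```

A single-file format needs no such check, which is the point of the comparison above.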
Just my two cents. I would be happy to be proven wrong and see this file format become very popular, but those two issues are sizeable.
Cheers,
Chris
Competing interests
The author declares that they have no competing interests.