Fail to parse Tide pepXML #42
Comments
The log file generated by PDV when loading the files shows a problem in mapping spectra between the pepXML and mgf files.
For mgf/pepXML input from Crux, we use start_scan from the pepXML file as the spectrum ID to extract MS/MS spectrum data from the mgf file. In the pepXML we generated with a previous version of Crux, start_scan is the index of the spectrum in the MGF file, not the scan number in the MGF file. But it looks like in your pepXML file it is the scan number in the mgf file. Were there any changes to start_scan in the latest Crux?
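To make the two interpretations of start_scan concrete, here is a minimal Python sketch. It is not PDV or Crux code; the function names are hypothetical, and treating the ordinal as 1-based is an assumption, since the thread does not state the base.

```python
from typing import Dict, List, Optional


def read_mgf_headers(text: str) -> List[Dict[str, str]]:
    """Return one dict of KEY=VALUE header fields per BEGIN IONS block."""
    spectra: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and "=" in line:
            # Header line such as TITLE=..., SCANS=..., PEPMASS=...
            # Peak lines contain no "=" and are skipped here.
            key, value = line.split("=", 1)
            current[key.strip().upper()] = value.strip()
    return spectra


def find_by_ordinal(spectra: List[Dict[str, str]], start_scan: int) -> Optional[Dict[str, str]]:
    """Treat start_scan as the position in the file (ASSUMED 1-based)."""
    if 1 <= start_scan <= len(spectra):
        return spectra[start_scan - 1]
    return None


def find_by_scans(spectra: List[Dict[str, str]], start_scan: int) -> Optional[Dict[str, str]]:
    """Treat start_scan as the value of the SCANS= header; first match wins."""
    for spectrum in spectra:
        if spectrum.get("SCANS") == str(start_scan):
            return spectrum
    return None
```

With a two-spectrum file whose SCANS values are 1910 and 1911, an ordinal lookup for start_scan=1910 finds nothing, which is the kind of mismatch described above.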
Unfortunately, I don't know the answer to this. Looking back at the release notes, it could be that these changes were in this update: May 28, 2020: Added fixes for pepXML schema validation failures.
If there is no scan number (SCANS) in the MGF file, what will be used as start_scan in the latest Crux pepXML output? It is common for mgf files not to have scan numbers.
It uses ordinal numbers instead in that case. Here is the line that gets printed to the log file:
A sample MGF and pepXML file are attached. plasmo-neighbors.trypsin-p.narrow.tide-search.pep.xml.txt
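The fallback described here could look roughly like the following sketch. This is not Crux's actual code: the function name is hypothetical, and both the per-spectrum fallback and the 1-based ordinal are assumptions for illustration.

```python
from typing import List, Optional


def assign_start_scans(scans_fields: List[Optional[str]]) -> List[int]:
    """scans_fields[i] is the SCANS= value of spectrum i, or None if absent.

    ASSUMPTION: fall back to the 1-based ordinal for each spectrum that
    lacks a usable numeric SCANS value.
    """
    ids: List[int] = []
    for ordinal, scans in enumerate(scans_fields, start=1):
        if scans is not None and scans.isdigit():
            ids.append(int(scans))
        else:
            ids.append(ordinal)
    return ids
```

A reader that always expects ordinals would only agree with this output for files that have no SCANS field at all.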
I am really unsure if this is helpful, but for what it's worth, I was able to load the data in PDV using the "database searching" feature with a pepXML containing a single PSM and the complete mgf file. However, no ions are annotated. As soon as I reduce the mgf file to the single scan of interest, it does not parse. What does work is using PDV's "one PSM" feature. In that case it will accept the mgf with the single scan. I have attached the complete mgf, the single mgf, and the pepXML containing the single PSM. MSB19717Trypsin021915_1910.mgf.txt
I am not familiar with the data parsing functions in Crux. But it was my understanding so far that Crux uses ProteoWizard to parse the input files (mgf etc.). I think there has been a ProteoWizard update. I hope my comment helps.
For Crux output, the current version of PDV can correctly match PSMs in pepXML to spectra in the mgf file only when the start_scan in pepXML is the ordinal number of the spectrum in the mgf file.
I tested Crux v4.1; the current version of PDV works well with mzML/pepXML, mzML/mzid, mzXML/pepXML, and mzXML/mzid files. I added a few examples generated using Crux v4.1 to the README. For MGF input, it looks like the spectrum ID mapping for both pepXML and mzid outputs was changed in v4.1, so PDV cannot parse the result successfully in some cases. So far, I found that start_scan was assigned differently for different MGF header formats:
Considering that different spectra may have the same scan number when an MGF file is combined from multiple MGF files, I would suggest always assigning start_scan as the ordinal number of the spectrum in the mgf file. Using a consistent spectrum mapping for the same MS/MS data format will make it easier for users to parse the results.
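As a small illustration of the ambiguity with merged files, the sketch below (hypothetical helper name, not part of PDV or Crux) lists scan numbers that occur more than once. Ordinal positions are unique by construction, so they never show this problem.

```python
from collections import Counter
from typing import List


def ambiguous_scans(scan_numbers: List[int]) -> List[int]:
    """Return the scan numbers that appear more than once, in sorted order."""
    counts = Counter(scan_numbers)
    return sorted(scan for scan, n in counts.items() if n > 1)
```

For example, concatenating two files that each number their scans from 1 gives scan numbers like 1, 2, 3, 1, 2, where scans 1 and 2 each point to two different spectra.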
I tried to parse a Tide pepXML file, but failed. The error is "Failed to parse the PepXML file, please check your file." I suspect that this is because our format has changed since you first evaluated Tide's PepXML back at Crux v3.2. Can you take a look at the attached file and see if it's possible to support it, or if we need to make changes on our end?
plasmo-neighbors.trypsin-p.narrow.tide-search.pep.xml.txt.gz
MSB17171Trypsin030814.mgf.txt.gz