Poster Presentation 24th Annual Lorne Proteomics Symposium 2019

From raw data to biological insights: A computational pipeline for SWATH-MS-based proteomics (#117)

Srikanth S. Manda 1 , Max Wittmann 1 , Michael Dausmann 1 , Michael Hecker 1 , Brett Tully 1 , Peter Hains 1 , Phil Robinson 1 , Roger Reddel 1 , Qing Zhong 1
  1. Children's Medical Research Institute, University of Sydney, Westmead, New South Wales, Australia

SWATH-MS is a data-independent acquisition (DIA) method that combines the advantages of shotgun and targeted proteomics, attaining deep proteome coverage, reproducibility, and accurate protein quantification. The analysis of SWATH-MS data involves generating a spectral library that records prior knowledge about the mass spectrometric and chromatographic behaviour of peptides, which is then used to deconvolute highly complex SWATH-MS data in a targeted manner. Tools such as PeakView®, Spectronaut X™, OpenSWATH or DIA-Umpire can be used to analyse the SWATH-MS data against the spectral library of interest. After controlling the false discovery rate (FDR), protein inference and quantification can be achieved with tools such as aLFQ, Diffacto and mapDIA. To our knowledge, there is currently no pipeline consolidating these steps.

In this study, we propose a computational pipeline that integrates various open-source tools to generate a final protein matrix from the raw input. The pipeline starts by converting wiff files to MGF for data-dependent acquisition runs and to mzML for SWATH-MS runs, followed by the application of three complementary search engines (MS-GF+, Mascot and X!Tandem) against the UniProt database with decoys. The search results are combined using the PeptideShaker algorithm. Results are subsequently filtered at 1% FDR at the peptide and protein levels to generate the final spectral library. The OpenSWATH pipeline is then used along with the PyProphet and TRIC algorithms, with FDR controlled for both peptides and proteins in run-specific, experiment-wide or global contexts. Finally, the Diffacto method is used for protein inference and quantification. Exploratory data visualisation using various plots is also provided for the resulting protein matrix.
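As an illustration of the filtering step, the following minimal Python sketch shows a generic target-decoy FDR estimate applied at a 1% threshold. It is not the PeptideShaker or PyProphet implementation. The `Match` record, its score scale (higher is better) and the function name `filter_at_fdr` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one scored peptide identification."""
    peptide: str
    score: float      # assumed scale: higher means a more confident match
    is_decoy: bool    # True if the match comes from the decoy database


def filter_at_fdr(matches, threshold=0.01):
    """Keep target matches above the lowest score cutoff with estimated FDR <= threshold.

    FDR at a cutoff is estimated as (#decoys) / (#targets) among all
    matches scoring at or above that cutoff.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    targets = 0
    decoys = 0
    keep = 0  # number of top-ranked matches retained
    for i, match in enumerate(ranked, start=1):
        if match.is_decoy:
            decoys += 1
        else:
            targets += 1
        estimated_fdr = decoys / max(targets, 1)
        if estimated_fdr <= threshold:
            keep = i  # deepest rank that still satisfies the threshold
    # Decoys are only used for estimation and are dropped from the output.
    return [m for m in ranked[:keep] if not m.is_decoy]


if __name__ == "__main__":
    # Made-up example scores, for illustration only.
    example = [
        Match("AAK", 9.0, False),
        Match("CCK", 8.0, False),
        Match("DDR", 7.0, False),
        Match("XXK", 6.0, True),
        Match("EEK", 5.0, False),
    ]
    for m in filter_at_fdr(example):
        print(m.peptide, m.score)
```

The same estimate can be applied at the protein level by scoring proteins instead of peptides. Real tools typically refine it, for example with q-values or context-specific error models.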

We applied this pipeline to a benchmark cancer dataset and compared it with a commercial pipeline, obtaining comparable results at both the peptide and protein levels. We further demonstrated that the pipeline is robust and readily scales to large SWATH-MS datasets.