
Technical Feature

MCP: Using a Bayesian Model to Analyze Shotgun Proteomics Identification Data

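Dr. Jiyang Zhang and colleagues in Professor Yunping Zhu's group at the Beijing Proteome Research Center / State Key Laboratory of Proteomics have built a Bayesian model for analyzing shotgun proteomics identification data, substantially raising the utilization of proteomic mass spectrometry data. The paper appears in the latest issue of Molecular & Cellular Proteomics (MCP), a leading international proteomics journal. The same issue also carries two research papers from the groups of Associate Professor Ying Jiang and Professor Xiaohong Qian at the same institute, a record for the number of papers published by a single institution in one issue of the journal.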

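Large-scale, high-throughput proteomics studies generate massive amounts of data that contain substantial noise, and highly reliable data are the foundation for any downstream biological analysis. Current analysis methods therefore apply overly stringent criteria, which lower the false-positive rate but also artificially raise the false-negative rate and reduce data utilization. Making maximal use of experimental data while maintaining high confidence has thus long been a goal of the proteomics community. The shotgun approach is currently the most important and most widely used technical strategy for proteome identification. Building on the randomized-database (target-decoy) strategy, a nonparametric probability density model, and Bayes' formula, the team established a multivariate Bayesian nonparametric model for filtering tandem mass spectrometry data. Rigorous evaluation on standard proteins and complex samples showed that the model has good sensitivity and generalizability and can raise the utilization of mass spectrometry data by 10-40%, which the report describes as the best performance in the field.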
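To make the role of the three ingredients concrete, the generic two-component form of Bayes' formula for a peptide match with feature vector $\mathbf{x}$ can be written as below. This is a textbook sketch, not the exact multivariate formulation of the paper; the symbols $\pi_0$, $\pi_1$, $f_0$, $f_1$, and $f$ are notation assumed here for illustration.

```latex
\[
P(\text{correct} \mid \mathbf{x})
  = \frac{\pi_1 f_1(\mathbf{x})}{\pi_1 f_1(\mathbf{x}) + \pi_0 f_0(\mathbf{x})}
  = 1 - \frac{\pi_0 f_0(\mathbf{x})}{f(\mathbf{x})},
\qquad
f(\mathbf{x}) = \pi_1 f_1(\mathbf{x}) + \pi_0 f_0(\mathbf{x}),
\quad \pi_0 + \pi_1 = 1 .
\]
```

Under these assumptions, $f_0$ (the density of incorrect matches) would be estimated nonparametrically from matches to the randomized database, $f$ from all matches to the real database, and $\pi_0$ from the ratio of the two match counts, so the density of correct matches $f_1$ never needs to be modeled directly.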

Molecular & Cellular Proteomics 8:547-557, 2009. doi:10.1074/mcp.M700558-MCP200

Bayesian Nonparametric Model for the Validation of Peptide Identification in Shotgun Proteomics

Jiyang Zhang, Jie Ma, Lei Dou, Songfeng Wu, Xiaohong Qian, Hongwei Xie, Yunping Zhu, and Fuchu He

From the State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China; the School of Mechanical Engineering and Automatization, National University of Defense Technology, Changsha 410073, China; and the Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China

Tandem mass spectrometry combined with database searching allows high throughput identification of peptides in shotgun proteomics. However, validating database search results, a problem with a lot of solutions proposed, is still advancing in some aspects, such as the sensitivity, specificity, and generalizability of the validation algorithms. Here a Bayesian nonparametric (BNP) model for the validation of database search results was developed that incorporates several popular techniques in statistical learning, including the compression of feature space with a linear discriminant function, the flexible nonparametric probability density function estimation for the variable probability structure in complex problem, and the Bayesian method to calculate the posterior probability. Importantly the BNP model is compatible with the popular target-decoy database search strategy naturally. We tested the BNP model on standard proteins and real, complex sample data sets from multiple MS platforms and compared it with PeptideProphet, the cutoff-based method, and a simple nonparametric method (proposed by us previously). The performance of the BNP model was shown to be superior for all data sets searched on sensitivity and generalizability. Some high quality matches that had been filtered out by other methods were detected and assigned with high probability by the BNP model. Thus, the BNP model could be able to validate the database search results effectively and extract more information from MS/MS data.
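The last two steps named in the abstract, nonparametric density estimation and the Bayesian posterior, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each match has already been reduced to a single discriminant score, uses a plain Gaussian kernel with a hand-picked bandwidth, and runs on synthetic scores invented for the example. The names `gaussian_kde`, `posterior_correct`, `BANDWIDTH`, and `pi0` are hypothetical.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def posterior_correct(score, f_target, f_decoy, pi0):
    """Bayes posterior that a target match with this score is correct.

    f_target estimates the mixture density of all target matches and
    f_decoy the density of incorrect matches, so
    P(correct | score) = 1 - pi0 * f_decoy(score) / f_target(score),
    clipped to [0, 1] because both densities are noisy estimates.
    """
    mixture = f_target(score)
    if mixture <= 0.0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - pi0 * f_decoy(score) / mixture))


# Synthetic discriminant scores, invented purely for illustration.
rng = random.Random(0)
decoy_scores = [rng.gauss(0.0, 1.0) for _ in range(500)]
target_scores = (
    [rng.gauss(0.0, 1.0) for _ in range(500)]    # incorrect target matches
    + [rng.gauss(4.0, 1.0) for _ in range(500)]  # correct target matches
)

# Assumption: target and decoy databases are the same size, so the number
# of decoy matches estimates the number of incorrect target matches.
pi0 = len(decoy_scores) / len(target_scores)

BANDWIDTH = 0.4  # hand-picked for this toy data, not a recommended value
f_decoy = gaussian_kde(decoy_scores, BANDWIDTH)
f_target = gaussian_kde(target_scores, BANDWIDTH)

for s in (0.0, 2.0, 5.0):
    p = posterior_correct(s, f_target, f_decoy, pi0)
    print(f"score={s:.1f}  P(correct)={p:.3f}")
```

In this toy setup, matches scoring near the decoy distribution receive a posterior close to zero and matches well above it a posterior close to one. A fixed score cutoff would instead treat every match on the same side of the threshold identically.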