The user can select specific datasets to obtain a global overview of their statistics. The selection options are hospital, project, sample preparation technique, sample type, mass spectrometry technique, and patient disease type. The numbers of hospitals, patients, samples, mass spectrometry files, matrices, and identified proteins in the chosen datasets are displayed interactively at the top right of the page.
For the matrices selected by the user, the distributions of disease type, sample type, sample preparation technique, and mass spectrometry method are each shown as a pie chart. Clicking an entry in the legend at the upper left corner of a pie chart fades that legend block, removes the corresponding slice, and recalculates the percentages of the remaining slices. Clicking the faded legend entry again restores the slice.
The bar chart shows the number of proteins that are jointly identified in the user-selected matrices. The horizontal axis represents a matrix combination, and the vertical axis represents the number of proteins identified together in that combination. The dot plot indicates whether a dataset (rows) belongs to a dataset combination (columns). For example, as shown in the above figure, the first column has black dots in the Bi_2 and Bi_3 rows, indicating that the proteins counted in that column were identified in both the Bi_2 and Bi_3 datasets. The bars are sorted by the number of proteins in each combination, and a cumulative line shows the cumulative percentage.
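The counting behind such a plot can be sketched as follows. This is a minimal illustration, not the application's own code: it assumes that each protein is counted once, in the exact combination of matrices that identify it (the usual UpSet convention). The function name `upset_counts` and the protein identifiers P1 to P4 are hypothetical.

```python
from collections import Counter


def upset_counts(matrices):
    """Return (combination, protein count, cumulative %) rows, largest first.

    `matrices` maps a matrix name to the set of proteins identified in it.
    Assumption: each protein is assigned to the exact set of matrices that
    contain it, so the combinations do not overlap.
    """
    membership = {}
    for name, proteins in matrices.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(name)

    counts = Counter(frozenset(names) for names in membership.values())
    # Sort by descending count; break ties by matrix names for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))

    total = sum(counts.values())
    rows, running = [], 0
    for combo, n in ordered:
        running += n
        rows.append((tuple(sorted(combo)), n, 100.0 * running / total))
    return rows


# Hypothetical protein sets for two of the matrices named in the text.
example = {
    "Bi_2": {"P1", "P2", "P3"},
    "Bi_3": {"P2", "P3", "P4"},
}
rows = upset_counts(example)
for combo, n, cumulative in rows:
    print(combo, n, f"{cumulative:.0f}%")
```

Each returned row corresponds to one bar (the count), one dot-plot column (the combination), and one point on the cumulative line.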
The missing rate is the proportion of missing identifications across the sample conditions of a dataset, defined as the number of conditions in which a protein is not identified divided by the total number of conditions. The missing rates of the union of proteins identified in the chosen matrices are displayed in a table.
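As a small worked example of this definition, the sketch below computes a per-protein missing rate for one matrix. It assumes that a missing identification is stored as `None` or NaN, which the text does not specify. The function name `missing_rate` and all intensity values are hypothetical.

```python
import math


def missing_rate(values):
    """Fraction of conditions in which the protein is missing.

    Assumption: a missing identification is encoded as None or NaN.
    """
    if not values:
        raise ValueError("values must not be empty")
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


# Hypothetical matrix: rows are proteins, columns are sample conditions.
matrix = {
    "SAA1": [12.1, None, 11.8, float("nan")],
    "ALB": [20.0, 19.7, 20.3, 19.9],
}
rates = {protein: missing_rate(row) for protein, row in matrix.items()}
print(rates)
```

In this example SAA1 is missing in two of four conditions, giving a missing rate of 0.5, while ALB is identified in every condition and has a missing rate of 0.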
A protein can be specified by either its gene name or its UniProt ID. The dropdown menu lists the searchable proteins, which are the union of the proteins identified in all matrices. For example, the protein SAA1 can be searched by its gene name 'SAA1' or its UniProt ID 'P0DJI8'.
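A lookup that accepts either identifier can be sketched with a single index keyed by both names. This is only an illustration of the idea. The helper names `build_index` and `find_protein` are hypothetical, and the case-insensitive matching is an assumption rather than documented behavior.

```python
def build_index(proteins):
    """Map both the gene name and the UniProt ID to the same record."""
    index = {}
    for gene, uniprot in proteins:
        record = (gene, uniprot)
        index[gene.upper()] = record
        index[uniprot.upper()] = record
    return index


def find_protein(index, query):
    """Return (gene, uniprot) for a query, or None if it is unknown.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return index.get(query.strip().upper())


# The SAA1 / P0DJI8 pair is the example given in the text.
index = build_index([("SAA1", "P0DJI8")])
print(find_protein(index, "SAA1"))
print(find_protein(index, "P0DJI8"))
```

Both queries resolve to the same record, which mirrors the behavior described for the dropdown menu.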
The hospital, sample type, mass spectrometry technique, and sample preparation technique of each matrix in which this protein is identified are displayed as a table in the sidebar, which helps users locate the relevant sample information.
The grouped boxplot chart shows the expression of the selected protein in the matrices in which it is identified. Each column of the boxplot represents a patient disease type (see the legend at the top of the chart). The significance of the pairwise difference in protein expression between two disease types is indicated above the horizontal line: corrected P values < 0.5, 0.1, 0.01, and 0.001 correspond to *, **, ***, and ****, respectively, and insignificant changes carry no mark. Two datasets, Ning_1 and Tang_1, are plotted independently. Ning_1 is further grouped by tissue type because this dataset contains samples from different organs. Tang_1 is plotted by week because this dataset includes temporal information.
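The mapping from a corrected P value to a significance mark can be sketched as a simple threshold lookup. The cutoffs below are copied from the description above and are passed as a parameter so that they can be changed. The function name `significance_mark` is hypothetical.

```python
# Cutoffs as stated in the text, ordered from strictest to loosest.
TEXT_CUTOFFS = ((0.001, "****"), (0.01, "***"), (0.1, "**"), (0.5, "*"))


def significance_mark(p_adjusted, cutoffs=TEXT_CUTOFFS):
    """Return the mark for the strictest cutoff that p_adjusted falls below.

    `cutoffs` must be sorted by ascending threshold. An empty string means
    that the change is not marked as significant.
    """
    for threshold, mark in cutoffs:
        if p_adjusted < threshold:
            return mark
    return ""


for p in (0.0005, 0.005, 0.05, 0.3, 0.7):
    print(p, repr(significance_mark(p)))
```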
Four options can be selected in the sidebar: the matrix of interest, the pathway of interest, and the two disease conditions to be compared. The last three options depend on the first.
The hospital, sample type, and disease subgroups corresponding to each matrix are displayed in the sidebar, so that users can choose the matrix according to the disease subgroups of interest.
The pathway network shows the expression of the proteins shared by the user-selected matrix and pathway under different disease types. Gray nodes are proteins that belong to the pathway but are not in the selected matrix. Red or blue nodes are proteins that are present in both the matrix and the pathway. The color of a node indicates the direction of change between the two disease states, with red for up-regulation and blue for down-regulation, and the size of the node indicates the magnitude of the log2 fold-change: the larger the node, the greater the difference in expression between the two selected disease states. To determine whether a change is significant, the user can consult the table below the pathway diagram, which lists the log2 fold-change, p-value, and adjusted p-value of each protein.
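The node encoding described above can be illustrated with a short sketch. It assumes that the two inputs are mean expression values on a linear scale, so that the log2 fold-change is the log2 of their ratio. The function name `node_style`, the parameters `base_size` and `scale`, and the handling of a fold-change of exactly zero are all assumptions made for illustration.

```python
import math


def node_style(mean_a, mean_b, in_matrix=True, base_size=10.0, scale=5.0):
    """Return the color, size, and log2 fold-change for one pathway node.

    Assumption: mean_a and mean_b are positive linear-scale expression means
    for the two disease states, compared as state A versus state B.
    """
    if not in_matrix:
        # Pathway protein that is absent from the selected matrix.
        return {"color": "gray", "size": base_size, "log2fc": None}

    log2fc = math.log2(mean_a / mean_b)
    if log2fc > 0:
        color = "red"    # up-regulated
    elif log2fc < 0:
        color = "blue"   # down-regulated
    else:
        color = "gray"   # no change: not covered by the text, assumed gray
    size = base_size + scale * abs(log2fc)
    return {"color": color, "size": size, "log2fc": log2fc}


up = node_style(8.0, 2.0)
down = node_style(1.0, 4.0)
absent = node_style(1.0, 1.0, in_matrix=False)
print(up, down, absent)
```

Because the size depends only on the absolute log2 fold-change, a four-fold increase and a four-fold decrease produce nodes of the same size but different colors.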
The expression changes of all proteins in the selected matrix between the two disease states are shown as a volcano plot. The user can set thresholds for the absolute log2FC and the p-value, and the differential proteins that pass these thresholds are displayed in the volcano plot. Up-regulated proteins are marked in red and down-regulated proteins in blue. The numbers of up-regulated and down-regulated proteins are shown at the top of the volcano plot.
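The thresholding step can be sketched as follows. The default thresholds, the function name `classify`, and all numeric values are hypothetical and only illustrate how a protein ends up in the up-regulated or down-regulated group.

```python
def classify(proteins, min_abs_log2fc=1.0, max_p=0.05):
    """Split proteins into up- and down-regulated lists.

    `proteins` is an iterable of (name, log2fc, p_value) tuples. A protein is
    kept only if |log2fc| >= min_abs_log2fc and p_value < max_p. The default
    thresholds are illustrative assumptions, not values from the application.
    """
    up, down = [], []
    for name, log2fc, p_value in proteins:
        if p_value >= max_p or abs(log2fc) < min_abs_log2fc:
            continue  # fails at least one threshold
        (up if log2fc > 0 else down).append(name)
    return up, down


# Hypothetical statistics for four proteins.
stats = [
    ("SAA1", 2.3, 0.001),    # large increase, significant
    ("ALB", -1.4, 0.01),     # large decrease, significant
    ("CRP", 0.4, 0.0001),    # significant, but fold-change too small
    ("APOA1", -2.0, 0.2),    # large decrease, not significant
]
up, down = classify(stats)
print(f"up-regulated: {len(up)}, down-regulated: {len(down)}")
```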
Because enrichment analysis takes some time, it is run only when the user clicks one of the enrichment buttons, using the differential proteins selected in the volcano plot above. GO and KEGG enrichment analyses are provided. The result is plotted as a bubble chart of the pathways enriched for the up-regulated and down-regulated proteins, respectively.
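The text does not state which statistical test the enrichment step uses. A common approach for GO and KEGG over-representation analysis is the one-sided hypergeometric test, sketched below under that assumption. The function name `enrichment_p_value` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Returns the probability of observing at least `n_overlap` pathway
    proteins when drawing `n_selected` proteins at random from a background
    of `n_background` proteins, `n_in_pathway` of which are in the pathway.
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background proteins, 4 in the pathway,
# 3 differential proteins selected, 2 of which are in the pathway.
p = enrichment_p_value(10, 4, 3, 2)
print(round(p, 4))
```

In practice the resulting p-values would also be corrected for multiple testing across all tested pathways.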
A search hyperlink at the top right corner jumps to the “Analysis by protein” page, which allows a convenient search for a protein of interest observed in the pathway analysis result.
The user can select two disease conditions to compare differentially changed proteins across datasets. The fold-change and p-value thresholds can be customized.
The data matrices available for the selected disease conditions are displayed as a table in the sidebar.
Each data matrix for which differential proteins can be calculated under the specified conditions is plotted as an independent volcano plot.
A set of UpSet plots shows the cross-dataset overlap of the differentially changed proteins.
This part focuses on proteins that are regulated in different directions in different types of samples (tissue, blood, urine). The user first chooses a pair of sample types and then two disease subgroups (healthy, non-COVID-19, non-severe, severe, fatal).
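Finding such proteins amounts to comparing the sign of the log2 fold-change between the two chosen sample types. The sketch below assumes that a log2 fold-change for the same pair of disease subgroups is already available per sample type. The function name `opposite_direction`, the optional `min_abs_log2fc` filter, and the protein identifiers and values are hypothetical.

```python
def opposite_direction(log2fc_a, log2fc_b, min_abs_log2fc=0.0):
    """Return proteins whose log2 fold-changes have opposite signs.

    `log2fc_a` and `log2fc_b` map protein names to log2 fold-changes in two
    sample types. Only proteins present in both are compared. A protein is
    kept if the signs differ and both magnitudes reach `min_abs_log2fc`.
    """
    hits = {}
    for protein in sorted(log2fc_a.keys() & log2fc_b.keys()):
        a, b = log2fc_a[protein], log2fc_b[protein]
        if a * b < 0 and min(abs(a), abs(b)) >= min_abs_log2fc:
            hits[protein] = (a, b)
    return hits


# Hypothetical log2 fold-changes for one disease comparison.
blood = {"P1": 1.5, "P2": -0.8, "P3": 2.0}
urine = {"P1": -1.2, "P2": -0.5, "P4": 1.0}
hits = opposite_direction(blood, urine)
print(hits)
```

Here only P1 is reported: it increases in blood and decreases in urine, whereas P2 decreases in both and P3 and P4 are each measured in only one sample type.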