Proteomics with OMSSA | Sivome

2023-11-11 07:58:02

Proteomics is the large-scale study of proteins (from Wiki :-)). Any -omics is basically the large-scale study of a particular biomolecule. More examples: the large-scale study of metabolome(s) is metabolomics, the large-scale study of lipids is lipidomics, and so on. Some of these molecules (or biomolecules), i.e., lipids, metabolites, and proteins, can be quantified using a mass spectrometer. As the name says, the mass spectrometer measures mass (strictly, the mass-to-charge ratio) using properties of ionized molecules!

To analyze the output of mass spectrometers, we need a program. OMSSA is one such tool, developed by Lewis Geer at NCBI, National Institutes of Health. This program analyzes the data using a sophisticated algorithm to ultimately identify the protein. There are other software packages to analyze other molecules, i.e., metabolites and lipids. There are also several other software packages to analyze proteins, such as Byonic, MODa, PEAKS, pFind3, X!TANDEM, SearchGUI, and the list goes on. Here, let’s focus on OMSSA, as I have used this tool in the past.

Review on mass-spectrometry based proteomics: Here is an article by Ruedi Aebersold and Matthias Mann.

OMSSA can be downloaded here
Link to publication on OMSSA

The OMSSA download folder has a sample data file (*.dta) for testing. The data to be analyzed comes in many formats, and the many instrument vendors complicate this. However, there are some standard data formats, such as mzML and mzXML, that most software can read, and there are tools out there to convert raw data to these formats!

Let’s look at what is available in the OMSSA download:

 Directory of C:\OMSSA\omssa-2.1.9.win32

02/21/2019  12:14 PM    <DIR>          .
02/21/2019  12:14 PM    <DIR>          ..
02/21/2019  12:14 PM    <DIR>          contrib
12/06/2010  01:31 PM             2,444 disclaimer.txt
12/06/2010  01:31 PM           113,946 mods.xml
12/06/2010  01:31 PM               159 MSHHWGYGK.dta
12/06/2010  01:31 PM           554,832 msvcp80.dll
12/06/2010  01:31 PM           632,656 msvcr80.dll
12/06/2010  01:31 PM            73,134 OMSSA.xsd
12/06/2010  01:31 PM         2,551,808 omssa2pepXML.exe
12/06/2010  01:31 PM         2,953,216 omssacl.exe
12/06/2010  01:31 PM         2,457,600 omssamerge.exe
12/06/2010  01:31 PM            15,584 usermods.xml
              10 File(s)      9,355,379 bytes
               3 Dir(s)  817,232,166,912 bytes free

C:\OMSSA\omssa-2.1.9.win32>

Here omssacl.exe is the core executable that processes raw data to identify the protein. MSHHWGYGK.dta is a sample data file. Simply put, this .dta mass-spectra file has peaks. If you plot this, the x-axis is mass (specifically mass/charge, a measure of charged ions) and the y-axis is counts (i.e., how many times the instrument sees this mass). Let’s look at the .dta file.

C:\OMSSA\omssa-2.1.9.win32>cat MSHHWGYGK.dta
1102.5 1
147.11 10
204.13 10
219.08 10
356.14 10
367.20 10
424.22 10
493.20 10
610.30 10
679.28 10
736.30 10
747.36 10
884.42 10
899.36 10
956.38 10
971.45 10

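The 1st column holds the masses and the 2nd column (except on the first line) holds the counts. The first line is different: it gives the mass of the whole (singly protonated) peptide and its charge. For simplicity, all peaks have the same count, i.e., 10.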
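To make the file layout concrete, here is a minimal Python sketch that reads the same text shown above. The function name `parse_dta` is my own and not part of OMSSA; it assumes the usual .dta convention that the first line is the precursor (M+H) mass and charge, and every later line is one peak.

```python
# Contents of MSHHWGYGK.dta, copied from the listing above.
DTA_TEXT = """\
1102.5 1
147.11 10
204.13 10
219.08 10
356.14 10
367.20 10
424.22 10
493.20 10
610.30 10
679.28 10
736.30 10
747.36 10
884.42 10
899.36 10
956.38 10
971.45 10
"""


def parse_dta(text):
    """Split a .dta file into (precursor mass, charge, list of (m/z, count))."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # First row: mass of the intact, singly protonated peptide and its charge.
    precursor_mh = float(rows[0][0])
    charge = int(rows[0][1])
    # Remaining rows: one fragment peak each.
    peaks = [(float(mz), float(count)) for mz, count in rows[1:]]
    return precursor_mh, charge, peaks


precursor_mh, charge, peaks = parse_dta(DTA_TEXT)
print(f"precursor (M+H): {precursor_mh}, charge: {charge}, peaks: {len(peaks)}")
```

To read the real file instead, replace `DTA_TEXT` with the result of `open("MSHHWGYGK.dta").read()`.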

Let’s plot this with R. We’ll go into the details of ggplot later.

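library(ggplot2) # Tool to plot in R
# Read the space-separated file; columns are named V1 (m/z) and V2 (counts)
peptide_peaks = read.csv("./MSHHWGYGK.dta", header = FALSE, sep = " ")
# 1st row elements: precursor mass times charge
peptide_mass = peptide_peaks[1,1]*peptide_peaks[1,2]
# Drop the 1st row so that only the fragment peaks remain
peptide_peaks = peptide_peaks[-1,]
ggplot(data = peptide_peaks, aes(x=V1, y=V2)) + geom_bar(stat="identity") + labs(x="m/z", y = "Intensity")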

[Figure: bar plot of the MSHHWGYGK.dta peaks, with m/z on the x-axis and intensity on the y-axis]

A mass spectrometer generates tons of such .dta files, and the goal of the program is to identify all the proteins it sees in the raw data. Since proteins are large (on average 400 amino acids), they are cut into small pieces called peptides, which are then sent into the mass spec. Small pieces allow for better ionization and hence better identification.
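As a rough illustration of the cutting step, here is a small Python sketch of an in-silico digest. It assumes trypsin (cut after K or R, but not when the next residue is P), which is a common choice but is my assumption, since the enzyme isn't named here. The sequence `TOY_PROTEIN` is made up: it starts with the sample peptide and the rest is invented for illustration.

```python
# Hypothetical sequence: the first nine residues are the sample peptide,
# the remainder is invented purely to show the cleavage rule.
TOY_PROTEIN = "MSHHWGYGKAAGRPLLEKTTR"


def tryptic_digest(sequence):
    """Cut after K or R unless the next residue is P (assumed trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleaves = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleaves or is_last:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


for peptide in tryptic_digest(TOY_PROTEIN):
    print(peptide)
```

Note how the R in AAGRPLLEK is not cut because a P follows it. Real digests also have missed cleavages, which this sketch ignores.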

The mass spec cleaves the peptide further. Let’s say the peptide is MSHHWGYGK. The mass spec uses a fragmentation technique to break it into even smaller sub-units, i.e., M, MS, MSH, … and also from the opposite end, K, KG, KGY, and so on. (The figure above shows the masses of these pieces M, MS, MSH, … KGY, KG, K from the opposite end.)
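The fragment masses can be checked with a short Python sketch. It assumes singly charged b-ions (pieces growing from the M end) and y-ions (pieces growing from the K end) and uses standard monoisotopic residue masses. The names `b_ions`, `y_ions`, and `OBSERVED` are mine, and this is only arithmetic on the sample peptide, not how OMSSA itself scores spectra.

```python
# Monoisotopic residue masses (Da) for the amino acids in MSHHWGYGK.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "K": 128.09496, "M": 131.04049,
    "H": 137.05891, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added to y-ions (they keep the C-terminal OH and an extra H)
PROTON = 1.007276   # one proton for a +1 charge

# Fragment peaks copied from MSHHWGYGK.dta (first line excluded).
OBSERVED = [147.11, 204.13, 219.08, 356.14, 367.20, 424.22, 493.20, 610.30,
            679.28, 736.30, 747.36, 884.42, 899.36, 956.38, 971.45]


def b_ions(peptide):
    """m/z of b1..b(n-1): prefixes M, MS, MSH, ... plus one proton."""
    total, ions = PROTON, []
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions


def y_ions(peptide):
    """m/z of y1..y(n-1): suffixes K, GK, YGK, ... plus water and one proton."""
    total, ions = WATER + PROTON, []
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions


theoretical = b_ions("MSHHWGYGK") + y_ions("MSHHWGYGK")
for mz in OBSERVED:
    # Closest theoretical fragment to each observed peak.
    best = min(theoretical, key=lambda t: abs(t - mz))
    print(f"{mz:8.2f}  closest theoretical {best:9.4f}")
```

Every peak in the file lands next to one of these theoretical values. The only theoretical fragment without a peak is b1 (M alone), which is absent from the sample file.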

So, the goal of the software is to find these small pieces M, MS, and so on, stitch them into MSHHWGYGK, and use background information to see which protein the peptide (MSHHWGYGK) matches. This background information is usually given to the program in the form of a FASTA file, or, in the case of OMSSA, a FASTA file formatted with makeblastdb.
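The last step, going from a peptide to a protein, can be pictured as a plain text lookup in the FASTA file. The Python sketch below shows only that lookup, not OMSSA's actual search and scoring. The two FASTA entries are hypothetical toy records, and `parse_fasta` and `proteins_containing` are names I made up.

```python
# Two hypothetical FASTA records; headers and sequences are invented.
TOY_FASTA = """\
>toy_protein_A hypothetical entry
MSHHWGYGKAAGRPLLEK
TTR
>toy_protein_B hypothetical entry
MAAGGLLPEKWWTR
"""


def parse_fasta(text):
    """Return {header: sequence}, joining sequences wrapped over several lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def proteins_containing(peptide, records):
    """Headers of all records whose sequence contains the peptide."""
    return [header for header, seq in records.items() if peptide in seq]


records = parse_fasta(TOY_FASTA)
print(proteins_containing("MSHHWGYGK", records))
```

A peptide can occur in more than one protein, which is why the function returns a list rather than a single hit.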

All this can be found in my git repo
The different variants of the FASTA file, such as CA2.fasta.p*, are the files created by makeblastdb. These go as input to OMSSA as well.

Let’s run the program (finally!)

omssacl -i 1,4 -mf 3 -mv 1 -f MSHHWGYGK.dta -d CA2.fasta -oc omssa_sample.csv

More information on the arguments used can be found from the above links or in the OMSSA help file. More info here. Some arguments give specific information about how the sample is prepared. Other arguments describe the mass-spectrometer characteristics, e.g., which fragmentation technique was used.
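If you want to launch the same run from a script, one option is to build the command as a list of arguments. The Python sketch below uses exactly the flags and values from the command above. The short notes on each flag are my reading of the OMSSA help text and should be checked against it, and the actual launch is left commented out because it needs omssacl on the PATH.

```python
import shlex

# (flag, value, note) triples. The notes are my understanding, not official wording.
OMSSA_OPTIONS = [
    ("-i", "1,4", "ion series to search (assumed: 1 = b, 4 = y)"),
    ("-mf", "3", "fixed modification id (assumed: carbamidomethyl C)"),
    ("-mv", "1", "variable modification id (assumed: oxidation of M)"),
    ("-f", "MSHHWGYGK.dta", "input spectrum in .dta format"),
    ("-d", "CA2.fasta", "sequence database prepared with makeblastdb"),
    ("-oc", "omssa_sample.csv", "CSV output file"),
]

# Flatten into the argument list a process launcher expects.
command = ["omssacl"]
for flag, value, _note in OMSSA_OPTIONS:
    command += [flag, value]

print(shlex.join(command))

# To actually run it (requires the OMSSA binaries to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Keeping the notes next to the values makes it easier to see which settings describe the sample and which describe the instrument.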

Since this is a very simple input, the output, if everything goes right, looks something like this.
Truncated output from the above run:
MSHHWGYGK | 2.61923815969567e-008 | 1101.493 | sp|P00918.2|CAH2_HUMAN RecName:

This means that the peak list matched “MSHHWGYGK” (which matches the file name as well, so it is highly likely to be correct!). The very low E-value of 2.6e-8 also supports the answer. The peptide has a mass of about 1101.5 Da, which matches the first line of the .dta file well (1102.5, which includes one extra proton of about 1 Da for the +1 charge).
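Here is a small Python sketch that pulls the fields out of the truncated line shown above and checks the mass against the .dta file. It parses the line exactly as printed in this post (pipe-separated), not the real CSV written by OMSSA. The field names are my own labels, and the assumption that the first .dta value is the singly protonated mass is mine as well.

```python
PROTON = 1.007276  # mass of one proton in Da

# The truncated result line, copied as shown in the post.
line = "MSHHWGYGK | 2.61923815969567e-008 | 1101.493 | sp|P00918.2|CAH2_HUMAN RecName:"

# Split on " | " (with spaces) so the bare "|" inside the accession is kept.
peptide, e_value_text, mass_text, protein = [f.strip() for f in line.split(" | ", 3)]
e_value = float(e_value_text)
mass = float(mass_text)

# First line of MSHHWGYGK.dta: protonated precursor mass and charge.
precursor_mh = 1102.5
neutral_mass = precursor_mh - PROTON

print(f"peptide:  {peptide}")
print(f"E-value:  {e_value:.2e}")
print(f"protein:  {protein}")
print(f"mass difference: {abs(mass - neutral_mass):.4f} Da")
```

Removing one proton from 1102.5 gives a value within a thousandth of a dalton of the reported 1101.493.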

From the FASTA file, the program matches the peptide to CAH2_HUMAN (human carbonic anhydrase 2).
Probably soon, I’ll dive into some large-scale data and also start looking at other software.
