Calculation of pepitope using MATLAB or Microsoft Excel
pepitope Calculator is a MATLAB file or an Excel file that can be used to calculate a
specific measure of antigenic distance between two strains of influenza. It is
freely distributed under the GNU General Public License.
pepitope is a variable that measures the antigenic
distance between two influenza strains. pepitope shows a better
correlation with known vaccine efficacy data than do the variables that are
currently being used by the CDC and the WHO. These include ferret assays
(pferret) and whole sequence comparisons of the hemagglutinin
proteins of the two strains (psequence). These two methods
show only modest correlation with known efficacy data. We, the Deem group,
conceived of pepitope (described below) and showed that it
correlates better with the efficacy data. The
program is designed to facilitate the use of
pepitope (for example, in designing the annual flu vaccine).
Description
The conceptual transition from psequence to
pepitope is simple. The idea is that there are some regions of the
hemagglutinin protein that are more important than others. These regions are the
epitopes, which are the binding sites of human antibodies. Thus it is assumed
that point mutations in the amino acids of these epitopes would have a great
effect on the ability of the antibodies (made in response to the vaccine strain)
to attach to the surface proteins of the circulating strain. It is also assumed
that point mutations elsewhere in the strain would have little to no effect on
the binding of the antibodies. An additional concept is that of epitope
dominance. This is the idea that some epitopes are more important than others
and that which epitope is most important can change from year to year.
For H3N2/human influenza, it is
assumed that whichever epitope has the greatest percentage of mutations is
dominant, because the dominant epitope is under the most pressure from the
immune system. For H5N1/avian influenza, the user must know which epitope is dominant.
With those concepts in mind, we define pepitope to be the
fractional change between the dominant epitopes of the vaccine and the
circulating strains, or as an equation:
pepitope = (number of mutations in the dominant epitope)/(number
of amino acids in the dominant epitope)
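As a minimal Python sketch of the formula above, pepitope is simply a ratio of two counts. The function name and the example numbers (2 mutations in a 20-residue epitope) are hypothetical and chosen only for illustration; they are not taken from the calculator itself.

```python
def p_epitope(mutations_in_dominant: int, residues_in_dominant: int) -> float:
    """Return pepitope: the mutated fraction of the dominant epitope's residues."""
    if residues_in_dominant <= 0:
        raise ValueError("the dominant epitope must contain at least one residue")
    if not 0 <= mutations_in_dominant <= residues_in_dominant:
        raise ValueError("mutation count must be between 0 and the epitope size")
    return mutations_in_dominant / residues_in_dominant


# Hypothetical example: 2 mutations in a 20-residue dominant epitope
print(p_epitope(2, 20))  # 0.1
```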
The Excel file that can be downloaded below is set up to calculate
pepitope from discrepancies in the hemagglutinin proteins of two
Influenza Type A viruses of either
the H3N2 or the H5N1 subtype. Both of these hemagglutinin
subtypes have five epitopes (A, B, C, D, E), with A and B usually being dominant in H3N2.
The algorithm used by this program to compute the value of
pepitope is as follows. First, the user inputs the two amino acid
sequences of the hemagglutinin protein for comparison using one-letter IUPAC
abbreviations. The user also inputs the type of flu being compared:
H3N2/human or H5N1/avian. The program accepts input either as
pasted sequences or as a text file.
These raw inputs are saved on the Worksheet "Sequences". Characters such as
spaces and dashes are then deleted from the strains.
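The text names only spaces and dashes as deleted characters. The sketch below assumes that all whitespace (including tabs and line breaks) and every "-" are removed and that the remaining letters are upper-cased; the function name `clean_sequence` is hypothetical.

```python
import re


def clean_sequence(raw: str) -> str:
    """Remove whitespace (spaces, tabs, line breaks) and dashes; upper-case the rest."""
    # Assumption: the character class [\s-] approximates the macro's deletion list.
    return re.sub(r"[\s-]", "", raw).upper()


print(clean_sequence("qki pgn-dns\nTATL"))  # QKIPGNDNSTATL
```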
Then, the strains are aligned using a reference strain: A/California/7/2004
(ISDN110647) for H3N2/human and A/Duck/Singapore/3/97 (ISDN49024) for H5N1/avian
influenza. If the program encounters major problems aligning the sequences (i.e.,
if the percentage of matching amino acids between an input strain and the reference is
less than 50%), then an error message will be displayed requesting that the user
try again. The most likely reason for this problem is that the hemagglutinin type of
the input strain does not match that of the reference. If the
alignment of one of the user's strains leaves extra characters before or
after the reference strain, these characters are discarded. If the alignment has
the user's strain starting after or ending before the reference strain, then
these offsets are noted, as they could affect pepitope. The reason
for this is that the program is set up to assume that any missing portions of
the sequences are perfect matches. (The same applies to "?" characters.) An error
message is displayed, which differs depending on how much of the sequence is
missing from the beginning or end. If the portions are large enough to possibly
affect pepitope, the error message states that the sequences are
"incomplete". If the portions are not large enough to affect
pepitope, the message states that they are "slightly incomplete".
These aligned and truncated strains, along with the reference strain, are
displayed on the Worksheet "Aligned Sequences".
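The alignment algorithm used by the program is not described here. The following Python sketch swaps in a plain Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -2). It then expresses the strain in reference coordinates, discarding residues that align to reference gaps (such as extra characters before or after the reference) and marking uncovered reference positions with "-". Finally it applies the 50% identity rule, measuring identity as matched reference positions divided by reference length (also an assumption). All function names are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, using "-" for gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def align_to_reference(strain: str, reference: str, min_identity: float = 0.5):
    """Align a strain to the reference and express it in reference coordinates.

    Returns (projected_strain, identity). Residues aligned to reference gaps are
    discarded; reference positions the strain does not cover are filled with "-".
    """
    aligned_s, aligned_r = global_align(strain, reference)
    projected = "".join(s for s, r in zip(aligned_s, aligned_r) if r != "-")
    matches = sum(1 for s, r in zip(projected, reference) if s == r)
    identity = matches / len(reference)
    if identity < min_identity:
        raise ValueError(
            f"only {identity:.0%} of reference residues match; "
            "check that the hemagglutinin type matches the reference"
        )
    return projected, identity


projected, identity = align_to_reference("QQMKTIIAL", "MKTIIAL")
print(projected, identity)  # MKTIIAL 1.0
```

In this toy example the two leading "Q" characters fall before the reference and are discarded. A real run would use the full reference hemagglutinin sequence rather than the short hypothetical string shown.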
After the alignment procedure, the two strains are compared and the positions
of any discrepancies are recorded. Then, those positions are cross-referenced
with the positions of the residues in each epitope, which are stored in the
hidden Worksheet "Residues". (The H3 and H5 epitope residues also differ.)
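A minimal Python sketch of the comparison and cross-referencing step, assuming both strains are already expressed in reference coordinates (equal length) and that "?" or "-" positions count as matches, as described above. The epitope residue positions used here are hypothetical placeholders, not the real H3 or H5 epitope sites.

```python
def find_discrepancies(vaccine: str, circulating: str):
    """Return 1-based positions where two reference-aligned sequences differ.

    Positions marked "?" or "-" (unknown or not covered) in either sequence
    are treated as matches.
    """
    if len(vaccine) != len(circulating):
        raise ValueError("sequences must be aligned to the same length")
    unknown = {"?", "-"}
    return [
        pos
        for pos, (v, c) in enumerate(zip(vaccine, circulating), start=1)
        if v != c and v not in unknown and c not in unknown
    ]


def mutations_by_epitope(positions, epitopes):
    """Map each epitope name to the sorted mutated positions that fall inside it."""
    mutated = set(positions)
    return {name: sorted(mutated & set(residues)) for name, residues in epitopes.items()}


# Hypothetical epitope residue positions, for illustration only (NOT the real H3 sites).
EPITOPES = {"A": [1, 2, 3, 4], "B": [6, 7, 8], "C": [10]}

diffs = find_discrepancies("MKTIIALSYI", "MRTIVAPSY?")
print(diffs)  # [2, 5, 7]
print(mutations_by_epitope(diffs, EPITOPES))  # {'A': [2], 'B': [7], 'C': []}
```

Position 5 differs but lies outside every epitope in this toy table, so it does not contribute to any epitope's count.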
On the Worksheet "Discrepancies", the number of
mutations in each epitope is output along with the positions of each of those
mutations. Then, a p value is calculated for each epitope by dividing the number
of mutations in that epitope by the number of amino acids in that epitope.
For H3N2/human, following the dominance assumption described above,
pepitope is the largest p value that is attained, and the
corresponding epitope is deemed dominant. For H5N1/avian,
pepitope is the p value for the dominant epitope. You must
know which epitope is dominant in the vaccine and challenge strains that you
are comparing for H5N1/avian. Because H5N1/avian influenza evolves primarily in birds,
in the absence of a vaccine, there is as yet no theory for determining
which epitope is dominant.
All of this information is output
in the Worksheet "Results".
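The per-epitope p values and the two dominance rules can be sketched in Python as follows. The epitope sizes and mutation counts are hypothetical, and breaking ties by taking the first epitope listed is an arbitrary choice made for this sketch; the text does not say how the program handles ties.

```python
def epitope_p_values(mutation_counts, epitope_sizes):
    """p value per epitope = mutations in the epitope / residues in the epitope."""
    return {name: mutation_counts.get(name, 0) / size for name, size in epitope_sizes.items()}


def p_epitope_h3n2(p_values):
    """H3N2/human: the epitope with the largest p value is taken as dominant.

    Ties go to the first epitope in iteration order (an arbitrary choice here).
    """
    dominant = max(p_values, key=p_values.get)
    return dominant, p_values[dominant]


def p_epitope_h5n1(p_values, dominant):
    """H5N1/avian: the user must name the dominant epitope."""
    if dominant not in p_values:
        raise KeyError(f"unknown epitope {dominant!r}")
    return dominant, p_values[dominant]


# Hypothetical epitope sizes and mutation counts, for illustration only.
sizes = {"A": 10, "B": 20, "C": 5}
counts = {"A": 1, "B": 4}
p = epitope_p_values(counts, sizes)
print(p)  # {'A': 0.1, 'B': 0.2, 'C': 0.0}
print(p_epitope_h3n2(p))  # ('B', 0.2)
print(p_epitope_h5n1(p, "A"))  # ('A', 0.1)
```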
The MATLAB file that can be downloaded below will install an app in MATLAB
to calculate pepitope from discrepancies in the hemagglutinin
proteins between the vaccine and dominant circulating strains of Influenza
Type A H3N2 viruses and to estimate vaccine efficacy. Additional
functionality for Influenza Type A H1N1 and Type B will be added to a
subsequent version of this app.
On the main "pEpitope Calculator" tab of the app, the user inputs two amino
acid sequences of the hemagglutinin protein for comparison using one-letter
IUPAC abbreviations. The program is designed to take input in the form of
pasted sequences containing valid amino acid characters.
The strains are then aligned using a reference strain: A/Hong Kong/4801/2014
(EPI614406) for A/H3N2 human. If the program encounters major problems
aligning the sequences, then an error message will be displayed requesting
that the user try again. The most likely reason for this problem is that the
hemagglutinin type of the input strain does not match that of the reference.
If the alignment of one of the user's strains leaves extra characters before
or after the reference strain, these characters are discarded. If the
alignment has the user's strain starting after or ending before the
reference strain, then these offsets are noted, as they could affect
pepitope. If the missing portions contain epitope regions, an error message
is displayed stating that the sequences are "incomplete".
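The following Python sketch shows one way to classify missing ends. It assumes the strain is already written in reference coordinates with "-" at uncovered positions. It also assumes that "large enough to possibly affect pepitope" (the Excel wording) means the missing leading or trailing positions overlap an epitope residue, which matches the MATLAB criterion above. The function name and epitope positions are hypothetical.

```python
def coverage_status(projected: str, epitope_positions) -> str:
    """Classify a strain in reference coordinates ("-" = not covered).

    Assumption: missing leading/trailing positions matter only if they
    overlap an epitope residue (1-based positions).
    """
    n = len(projected)
    lead = n - len(projected.lstrip("-"))   # uncovered positions at the start
    trail = n - len(projected.rstrip("-"))  # uncovered positions at the end
    missing = set(range(1, lead + 1)) | set(range(n - trail + 1, n + 1))
    if not missing:
        return "complete"
    if missing & set(epitope_positions):
        return "incomplete"
    return "slightly incomplete"


print(coverage_status("--TIIAL", [1, 2, 3]))  # incomplete
print(coverage_status("--TIIAL", [5, 6]))     # slightly incomplete
```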
After the alignment procedure, the two strains are compared and the
positions of any discrepancies are recorded. Then, those positions are
cross-referenced with the positions of the residues in each epitope. A p
value is then calculated for each epitope by dividing the number of
mutations in that epitope by the number of amino acids in that epitope.
pepitope is the largest p value that is attained, and the
corresponding epitope is deemed dominant. Vaccine efficacy (VE) is
predicted based on the model in .
The GNU General Public License is found on the "GNL" tab.
Download
The older program is written as a VBA script in Microsoft Excel.
The newer program is written in MATLAB.
For usage instructions, see below.
We ask that you cite our papers [1,2] in any publications that result from the
use of the pepitope Calculator.
You can download a text file of each sequence of interest
from GISAID or the
NIH Influenza Virus Resource.
You must enable macros for this program to run! Once you've
enabled macros, a welcome screen will appear. From this screen, you will be able
to choose how you want to input your data (copy/paste or text file), access a
help file and get the contact information of the creators. After the program has
run, or you have exited from it, you can restart it by clicking the "Start
Pepitope Calculator" button on the toolbar. (Don't worry, this toolbar removes
itself when the Excel file closes.) Note that the program will not allow you to save, so
that any manipulations of the code or of important data in
hidden Worksheets will not be saved. Therefore, all pepitope data
should be saved in a completely different Workbook/Application, and, if you wish
to make changes to the program itself, you must choose the 'Save As...' command.
Let's go back and discuss input. If you would like to paste the sequences,
click on that button and copy them from another location. In order to paste
them, you must use the shortcut key (Ctrl + V). (Userforms in Excel do not allow
you to access the Edit menu or right-click.) Next, make sure that you have selected the
influenza type (H3N2/human or H5N1/avian) that matches the hemagglutinin type of the
sequences you entered. Then, just click the Find Pepitope button to see the results.
If you would like to use a text file, do the following:
- Go to the
NIH Influenza Virus Resource.
- Search for sequences by accession numbers or by sequence type.
- Select the strains you want to compare and click "Do multiple alignment."
- Click "Download alignment" to download the sequences as a FASTA (.fasta) file.
- Next, in Excel, browse and find that text file.
- Then, select the correct flu strain/hemagglutinin type (H3N2/human or H5N1/avian).
- Click Find Pepitope and await results.
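If you want to inspect a downloaded .fasta alignment outside Excel, a minimal Python reader is sketched below. It joins wrapped sequence lines and keeps alignment gaps ("-"). The function names and the example records are hypothetical, and records with duplicate headers overwrite one another in this simple version.

```python
from pathlib import Path


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        records[header] = "".join(chunks)
    return records


def read_fasta_file(path: str) -> dict:
    """Read a downloaded .fasta file from wherever you saved it."""
    return parse_fasta(Path(path).read_text())


example = ">strain_1\nMKT-II\nAL\n>strain_2\nMKTVIIAL\n"
print(parse_fasta(example))  # {'strain_1': 'MKT-IIAL', 'strain_2': 'MKTVIIAL'}
```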
Instructions for installing the pEpitope Calculator MATLAB app (requires
MATLAB R2012b or later):
1. Download the
pEpitopeCalculator.mlappinstall file to your
2. In Matlab, run this command:
In MATLAB, under the "Apps" tab, select "Install App." Select the
pEpitopeCalculator.mlappinstall file for automatic installation.
After installing the app, you can paste strain names and sequences
containing valid amino acid characters for Influenza A H3N2 viruses and
click "Calculate." The value of pepitope and the dominant epitope
are output, along with the positions and amino acids of mutations in each
epitope. The predicted vaccine efficacy and standard error are also output.
The "Reset" button will clear the form for subsequent calculations.
To copy/paste sequences into the program, do the following:
- Go to the
NIH Influenza Virus Resource.
- Search for the sequence of the first strain you want to compare.
- Click its Accession number to view the strain report.
- Select "FASTA" to view and copy the amino acid sequence.
- Next, in MATLAB, paste this sequence into the appropriate text box.
- Remove line breaks as necessary.
- Repeat these steps for the second strain.
- Click "Calculate" and await results.
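Before pasting, a copied FASTA record can be reduced to a single line of valid letters. The Python sketch below drops a header line, removes line breaks and spaces, and rejects unexpected characters. The accepted character set (the 20 standard amino acids plus the IUPAC codes B, Z, X, U, O) is an assumption; the app's own notion of "valid amino acid characters" may differ. The function name and example header are hypothetical.

```python
# Assumed set of accepted one-letter codes; the app's actual list may differ.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWY") | set("BZXUO")


def prepare_pasted_sequence(pasted: str) -> str:
    """Drop a FASTA header line, remove line breaks and spaces, and validate letters."""
    lines = [ln.strip() for ln in pasted.splitlines()]
    body = "".join(ln for ln in lines if ln and not ln.startswith(">"))
    seq = body.replace(" ", "").upper()
    invalid = sorted(set(seq) - VALID_CODES)
    if invalid:
        raise ValueError("invalid amino acid characters: " + " ".join(invalid))
    return seq


print(prepare_pasted_sequence(">A/example/1/2014\nmkti\nIAL\n"))  # MKTIIAL
```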
These programs are unsupported software.
Please send questions and comments to
Professor Michael W. Deem
Departments of Bioengineering and Physics & Astronomy
6100 Main Street - MS61
Houston, TX 77005-1892 USA
1) V. Gupta, D. J. Earl, and M. W. Deem,
"Quantifying Influenza Vaccine Efficacy and Antigenic Distance,"
Vaccine 24 (2006) 3881-3888.
2) M. E. Bonomo and M. W. Deem, "Predicting Influenza H3N2 Vaccine Efficacy from Evolution of the Dominant Epitope," Clinical Infectious Diseases (2018), doi:10.1093/cid/ciy323.