Calculation of Pepitope using Microsoft Excel

The Pepitope Calculator is an Excel file that can be used to calculate a specific measure of antigenic distance between two strains of influenza. It is freely distributed under the GNU General Public License.


Pepitope is a variable that measures the antigenic distance between two influenza strains. It correlates better with known vaccine efficacy data than do the variables currently used by the CDC and the WHO: ferret assays (Pferret) and whole-sequence comparisons of the hemagglutinin proteins of the two strains (Psequence). Both of those methods show only modest correlation with known efficacy data. We, the Deem group, conceived of Pepitope (described below) and showed that it correlates better with the efficacy data. The PepitopeCalculator.xls program is designed to facilitate the use of Pepitope (say, in the design of the annual flu vaccine).


The conceptual transition from Psequence to Pepitope is simple. The idea is that some regions of the hemagglutinin protein are more important than others. These regions are the epitopes, which are the binding sites of human antibodies. It is therefore assumed that point mutations in the amino acids of these epitopes have a great effect on the ability of the antibodies (made in response to the vaccine strain) to attach to the surface proteins of the circulating strain. It is also assumed that point mutations elsewhere in the protein have little to no effect on the binding of the antibodies. An additional concept is that of epitope dominance: some epitopes are more important than others, and which epitope is most important can change from year to year. For H3/human influenza, it is assumed that whichever epitope has the greatest percentage of mutations is dominant, because the dominant epitope is under the most pressure from the immune system. For H5/avian, the user must know which epitope is dominant.

With those concepts in mind, we define Pepitope to be the fractional change between the dominant epitopes of the vaccine and the circulating strains, or as an equation:

Pepitope = (number of mutations in the dominant epitope)/(number of amino acids in the dominant epitope)
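As a purely illustrative calculation (the counts here are hypothetical and are not taken from any real pair of strains), suppose the dominant epitope contains 19 amino acids and 2 of them differ between the vaccine strain and the circulating strain. The definition above then gives:

```latex
p_{\text{epitope}}
  = \frac{\text{number of mutations in the dominant epitope}}
         {\text{number of amino acids in the dominant epitope}}
  = \frac{2}{19} \approx 0.105
```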

The Excel file that can be downloaded below is set up to calculate Pepitope from differences between the hemagglutinin proteins of two influenza type A strains, for either the H3 or the H5 subtype. Both hemagglutinin subtypes have five epitopes (A, B, C, D, E), with A and B usually being dominant in H3.

The algorithm used by this program to compute the value of Pepitope is as follows. First, the user inputs the two amino acid sequences of the hemagglutinin protein for comparison using one-letter IUPAC abbreviations. The user also inputs the type of flu that is being compared: H3/human or H5/avian. The program is designed to take input in the form of actual, pasted sequences or in the form of a text file downloaded from the Influenza Sequence Database. These raw inputs are saved on the Worksheet "Sequences". Some characters are then deleted from the strains, such as spaces and dashes.
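The clean-up step can be sketched in a few lines of Python. This is only an illustration, not the VBA code in PepitopeCalculator.xls: the function name clean_sequence is hypothetical, and it assumes that only whitespace and dashes are removed and that letters are upper-cased, whereas the actual program may delete other characters as well.

```python
def clean_sequence(raw: str) -> str:
    """Return the pasted sequence with whitespace and dashes removed.

    Assumption: upper-casing is added here for convenience; the source only
    states that characters such as spaces and dashes are deleted.
    """
    kept = [ch for ch in raw if not ch.isspace() and ch != "-"]
    return "".join(kept).upper()


if __name__ == "__main__":
    # A made-up fragment with a space, a dash and a line break in it.
    print(clean_sequence("qkl pg-nd\nNST"))
```

Keeping this step separate from the alignment means the raw input can still be stored unchanged, as the program does on the Worksheet "Sequences".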

Then, the strains are aligned using a reference strain: A/California/7/2004 (ISDN110647) for H3/human and A/Duck/Singapore/3/97 (ISDN49024) for H5/avian influenza. If the program encounters major problems aligning the sequences (that is, if fewer than 50% of the amino acids of an input strain match the reference), an error message is displayed requesting that the user try again. The most likely reason for this problem is that the hemagglutinin type of the input strain does not match that of the reference. If the alignment of one of the user's strains leaves extra characters before or after the reference strain, these characters are discarded. If the alignment has the user's strain starting after or ending before the reference strain, these offsets are noted, as they could affect Pepitope: the program assumes that any missing portions of the sequences are perfect matches. (The same applies to "?" characters.) In that case a message is displayed, which differs depending on how much of the sequence is missing from the beginning or end. If the missing portions are large enough to possibly affect Pepitope, the message states that the sequences are "incomplete". If they are not large enough to affect Pepitope, the message states that they are "slightly incomplete". The aligned and truncated strains, along with the reference strain, are displayed on the Worksheet "Aligned Sequences".
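The text above does not say which alignment method the program uses, so the following Python sketch substitutes a simple ungapped sliding alignment to illustrate the 50% check, the trimming of extra characters and the recording of offsets. The function name align_to_reference, the choice of the overlapping region as the denominator of the identity, and the use of "?" to fill uncovered positions are all assumptions made for this example.

```python
def align_to_reference(seq, ref, min_identity=0.5):
    """Slide seq along ref (no gaps) and keep the offset with most matches.

    Returns (aligned, missing_start, missing_end). `aligned` has the same
    length as ref; positions not covered by seq are filled with '?'.
    Raises ValueError when the identity is below min_identity.
    """
    if not seq or not ref:
        raise ValueError("empty sequence")

    best_offset, best_matches = 0, -1
    for offset in range(-len(seq) + 1, len(ref)):
        matches = sum(
            1
            for i, ch in enumerate(seq)
            if 0 <= i + offset < len(ref) and ref[i + offset] == ch
        )
        if matches > best_matches:
            best_offset, best_matches = offset, matches

    # Characters that fall before or after the reference are discarded.
    aligned = ["?"] * len(ref)
    overlap = 0
    for i, ch in enumerate(seq):
        if 0 <= i + best_offset < len(ref):
            aligned[i + best_offset] = ch
            overlap += 1

    # Assumption: identity is measured over the overlapping region only.
    if best_matches / overlap < min_identity:
        raise ValueError("less than 50% identity; wrong hemagglutinin type?")

    # How many reference positions the input fails to cover at each end.
    missing_start = max(best_offset, 0)
    missing_end = max(len(ref) - (best_offset + len(seq)), 0)
    return "".join(aligned), missing_start, missing_end


if __name__ == "__main__":
    reference = "MKTIIALSYI"   # toy reference, not a real strain
    print(align_to_reference("TIVALSYI", reference))
```

With the toy reference shown, the input "TIVALSYI" aligns two positions into the reference, so the function returns ("??TIVALSYI", 2, 0). A caller could use the two offsets to decide between the "incomplete" and "slightly incomplete" messages; the thresholds for that decision are not given in the text and are left out here.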

After the alignment procedure, the two strains are compared and the positions of any discrepancies are recorded. Those positions are then cross-referenced with the positions of the residues in each epitope, which are contained in the hidden Worksheet "Residues". (The epitope residues differ between H3 and H5.) On the Worksheet "Discrepancies", the number of mutations in each epitope is output along with the positions of each of those mutations. Then, a P value is calculated for each epitope by dividing the number of mutations in that epitope by the number of amino acids in that epitope. For H3/human, by the logic of [1], Pepitope is the largest P value that is attained, and the corresponding epitope is deemed dominant. For H5/avian, Pepitope is the P value for the dominant epitope. For H5/avian, you must know which epitope is dominant in the vaccine and challenge strains that you are comparing: since H5/avian evolves primarily in birds, in the absence of a vaccine, there is no theory yet to calculate which epitope is dominant. All of this information is output in the Worksheet "Results".
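The comparison step can be sketched as follows. This is a minimal Python illustration, not the program's VBA code: the function names are hypothetical, the epitope positions in the usage example are invented (the real residue lists live in the hidden Worksheet "Residues"), and positions are 0-based indices into the aligned sequences. Positions holding "?" are counted as matches, in line with the assumption described above.

```python
def epitope_p_values(vaccine, circulating, epitope_sites):
    """Return {epitope name: fraction of its residues that differ}.

    epitope_sites maps each epitope name to a list of 0-based positions
    in the two aligned, equal-length sequences.
    """
    p_values = {}
    for name, sites in epitope_sites.items():
        mutations = 0
        for pos in sites:
            a, b = vaccine[pos], circulating[pos]
            # Missing or unknown residues ('?') are treated as matches.
            if a != "?" and b != "?" and a != b:
                mutations += 1
        p_values[name] = mutations / len(sites)
    return p_values


def p_epitope(p_values, dominant=None):
    """Return (dominant epitope, its P value).

    With dominant=None the H3/human rule is applied: the epitope with the
    largest P value is deemed dominant. For H5/avian the caller must pass
    the dominant epitope explicitly.
    """
    if dominant is None:
        dominant = max(p_values, key=p_values.get)
    return dominant, p_values[dominant]


if __name__ == "__main__":
    # Toy data: neither the sequences nor the positions are real.
    toy_sites = {"A": [1, 3, 5, 7], "B": [0, 2, 4, 6, 8]}
    p = epitope_p_values("MKTIIALSYI", "MRTIVALSYI", toy_sites)
    print(p, p_epitope(p))
```

In this toy case the two sequences differ at positions 1 and 4, so epitope "A" has one mutation out of four residues (P = 0.25) and epitope "B" has one out of five (P = 0.2); under the H3/human rule "A" is dominant and Pepitope is 0.25.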


The program is written as a VBA script in Microsoft Excel (download PepitopeCalculator.xls). For usage instructions, see below.

We ask that you cite our paper [1] in any publications that result from the use of the PepitopeCalculator.xls program.


You must enable macros for this program to run! Once you've enabled macros, a welcome screen will appear. From this screen, you can choose how you want to input your data (copy/paste or text file), access a help file, and get the contact information of the creators. After the program has run, or you have exited from it, you can restart it by clicking the "Start Pepitope Calculator" button on the toolbar. (Don't worry, this toolbar removes itself when the Excel file is closed.) Note that the program will not allow you to save, so any manipulations of the code or of important data in hidden Worksheets will not be kept. Therefore, all Pepitope data should be saved in a completely different Workbook/Application, and, if you wish to make changes to the program itself, you must use the 'Save As...' command.

Let's go back and discuss input. If you would like to paste the sequences, click on that button and copy them from another location. To paste them, you must use the keyboard shortcut (Ctrl + V), because Userforms in Excel do not give you access to the Edit menu or the right-click menu. Next, make sure that you have selected the influenza type (H3/human or H5/avian) that matches the hemagglutinin type of your sequences. Then, just click the Find Pepitope button and see the results.

If you would like to use a text file, do the following:


PepitopeCalculator.xls is unsupported software. Please send questions and comments to

Professor Michael W. Deem
Rice University
Departments of Bioengineering and Physics & Astronomy
6100 Main Street - MS61
Houston, TX 77005-1892 USA


[1] V. Gupta, D. J. Earl, and M. W. Deem, "Quantifying Influenza Vaccine Efficacy and Antigenic Distance," (2005) submitted. A pdf reprint is available.