Strathclyde The Anti-phishing Scam Web Service
Strathclyde University Associates anti-phishing web service by Christopher Cranston, Department of Computer and Information Sciences, University of Strathclyde, Glasgow.
Although there are existing anti-spam and anti-phishing solutions for end-users, none of them are widely
deployed or fully effective. Rising financial losses and a growing number of phishing attacks have led to
anti-phishing extensions to existing Web browsers, but there is little product attention on helping end-users
determine whether a received email is a phishing attempt. This often leaves users relying on their own
judgment when assessing the authenticity of an email.
In this context, we have prototyped an Anti-Phishing Web Service (APWS). This facility analyses users’
emails and advises whether they are likely phishing attempts. The APWS operates in a three-step process: (1) Users
forward any suspect email to the APWS for analysis; (2) The APWS performs a series of tests on the email,
each resulting in a score. An overall score is derived which indicates the likelihood that the email is a phishing
attempt; (3) The APWS generates an online report for the user.
The APWS has several advantages over existing end-user anti-spam and anti-phishing solutions. Firstly,
the APWS helps the end-user decide if an email is a phishing attempt by applying sophisticated analysis
techniques. Without assistance, users would otherwise have to judge whether an email is genuine using
whatever limited knowledge they may have. Secondly, the APWS may be combined with a spam filter. The
spam filter can attempt to catch all spam and phishing emails. Any emails which pass through can still be
sent to the APWS for analysis. Thirdly, the APWS has no reliance on a database of phishing attempts. This
means that new, un-encountered phishing attempts may be caught. Fourthly, the APWS operates as a network
service and requires no software installation on the user’s machine.
The goal of the APWS is to determine whether or not an email is a phishing attempt. To achieve this, it
applies tests whose design was based on an analysis of a collection of real phishing emails. Once the tests have
been applied, a report is generated on the results. The system’s report function writes the following email
headers to the HTML report file: From, To, Date Sent and Subject. It then adds the total score and the corresponding
phishing risk rating for the email in question. The total score of an email begins at 0. Every test that returns
true adds 1 to the total score (this could be altered to weight some tests more than others). A phishing risk
rating is assigned according to the total score for the email.
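The scoring rule described above can be sketched as follows. This is a minimal illustration, not the APWS implementation: the function names, the optional weights argument and, in particular, the score bands in RATING_BANDS are assumptions, since the source does not state which total scores map to which risk rating.

```python
from typing import Callable, Optional, Sequence

# Hypothetical score bands: the source does not give the thresholds
# that map a total score to a phishing risk rating.
RATING_BANDS = ((0, "low"), (1, "medium"), (4, "high"))


def total_score(email: str,
                tests: Sequence[Callable[[str], bool]],
                weights: Optional[Sequence[int]] = None) -> int:
    """Start at 0 and add each test's weight (1 by default) when it returns true."""
    if weights is None:
        weights = [1] * len(tests)
    score = 0
    for test, weight in zip(tests, weights):
        if test(email):
            score += weight
    return score


def risk_rating(score: int) -> str:
    """Return the label of the highest band whose threshold the score reaches."""
    rating = RATING_BANDS[0][1]
    for threshold, label in RATING_BANDS:
        if score >= threshold:
            rating = label
    return rating
```

Passing a weights sequence corresponds to the variation mentioned in the text, in which some tests count for more than others.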
The content of test emails is parsed by the APWS in order to check all links, anchor tags and form tags.
Evaluating the credibility of a submitted email is largely heuristic, with a series of seventeen tests applied to
the email message in order to derive its final score. An outline of these tests is given below.
Phishing emails often contain URLs with encoded characters in an attempt to disguise the true link target.
We apply a test on every embedded Web link which returns true if the authority part of the URI contains
encoded characters. Similarly, a test checks each Web link and returns true if the user-info part contains
encoded characters. If the path part, the query part or the fragment part of a Web link contains encoded
characters, each of these contributes a positive score to the message result.
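One way to express the encoded-character tests is shown below. It is a sketch that assumes "encoded characters" means percent-encoded octets (such as %2E) and that uses Python's standard urllib.parse to split the link; the function name encoded_parts is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: "encoded characters" means percent-encoded octets such as %2E.
PERCENT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def encoded_parts(url: str) -> dict:
    """Report, for each part of the link, whether it contains encoded characters."""
    parts = urlsplit(url)
    # Everything before the last "@" in the authority is the user-info.
    userinfo = parts.netloc.rpartition("@")[0]
    components = {
        "authority": parts.netloc,
        "userinfo": userinfo,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
    return {name: bool(PERCENT_ENCODED.search(value))
            for name, value in components.items()}
```

Each True value in the returned dictionary would correspond to one test returning true for that link.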
A further common ploy in phishing emails is the use of URLs in which the host part is a dotted quad IP
address as an attempt to disguise the true URL. We check each URL for this feature and increment the
positive score if the result is true. Similarly, a positive value is added for any URLs in which the host part is
an IP address expressed as a single decimal number, and for URLs in which the host part is a dotted quad IP
address, with each quad expressed either in octal or hexadecimal.
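The three IP-address host tests might be approximated as below. This is an illustrative sketch: the function name ip_host_form and the returned labels are assumptions, and octal and hexadecimal quads are recognised by their conventional prefixes (a leading 0 and a leading 0x respectively).

```python
import re
from typing import Optional
from urllib.parse import urlsplit

DECIMAL_PART = re.compile(r"0|[1-9]\d{0,2}")
# Assumption: octal quads start with 0, hexadecimal quads start with 0x.
OCTAL_OR_HEX_PART = re.compile(r"0[0-7]+|0x[0-9a-f]+", re.IGNORECASE)


def ip_host_form(url: str) -> Optional[str]:
    """Classify the host part of a URL as one of three IP forms, or None."""
    host = urlsplit(url).hostname or ""
    if re.fullmatch(r"\d+", host):
        return "single-decimal"          # e.g. http://3232235777/
    parts = host.split(".")
    if len(parts) != 4:
        return None
    if all(DECIMAL_PART.fullmatch(p) and int(p) <= 255 for p in parts):
        return "dotted-decimal"          # e.g. http://192.168.1.1/
    if all(OCTAL_OR_HEX_PART.fullmatch(p) for p in parts):
        return "dotted-octal-or-hex"     # e.g. http://0300.0250.01.01/
    return None
```

Any non-None result would add one to the message score under the scheme described above.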
Emails containing URLs with user-information in the authority part of the URL are often attempting to
obscure the true target, and make it appear as if the link points elsewhere. We test every embedded URL and
return true if the authority part contains user-information. Another tactic used to disguise the true destination
of a Web link is to use URLs in which the user-information in the authority part itself
resembles a URL. We test every URL for this feature and return true if the authority
part has user-information that resembles a URL. Embedded URLs that specify non-standard Web ports are a
further hint of irregularity. For any URL in which the port is not 80, we return an additional positive score.
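The user-information and port tests could be sketched as follows. The URL_LIKE pattern is only one possible reading of "resembles a URL", and treating a URL with no explicit port as using the default port is likewise an assumption; neither detail is specified in the source.

```python
import re
from urllib.parse import urlsplit

# Assumption: "resembles a URL" means an optional scheme followed by a
# dotted host name, e.g. "www.bank.example".
URL_LIKE = re.compile(r"^(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)


def authority_tests(url: str) -> dict:
    """Run the user-information and port tests on a single URL."""
    parts = urlsplit(url)
    userinfo = parts.netloc.rpartition("@")[0]
    try:
        port = parts.port            # None when no port is given
    except ValueError:
        port = -1                    # unparsable port: treat as non-standard
    return {
        "has_userinfo": bool(userinfo),
        "userinfo_resembles_url": bool(URL_LIKE.match(userinfo)),
        # Assumption: a missing port counts as the default and is not flagged.
        "non_standard_port": port is not None and port != 80,
    }
```

For example, a link such as http://www.bank.example@198.51.100.7:8080/login would trigger all three tests, because the browser connects to 198.51.100.7 while the visible prefix looks like a bank's host name.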
The presence of a URL in which the organization domain contains the purported sender’s organization
domain as a substring adds a further positive score, since this is considered an attempt to disguise the link’s true
target. Similarly, a URL in which a subdomain matches the purported sender’s organization domain returns a
positive increment. If a URL has an organization domain that closely matches the purported sender’s
organization domain, we also increment the positive score. This test is performed on every URL and returns
true if the Levenshtein Distance (LD) between the organization domain and the purported sender’s
organization domain is less than half the length of the purported sender’s organization domain. We do not
return true if the LD in this calculation is zero (i.e. the domains being compared are equal).
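The close-match rule can be illustrated with a standard dynamic-programming Levenshtein distance. The sketch assumes that both organization domains have already been extracted (how the APWS extracts them is not described), and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def closely_matches(url_domain: str, sender_domain: str) -> bool:
    """True if 0 < LD < half the length of the sender's organization domain."""
    distance = levenshtein(url_domain.lower(), sender_domain.lower())
    return 0 < distance < len(sender_domain) / 2
```

With a sender domain of paypal.com (10 characters), a link to paypa1.com has a distance of 1, which is greater than zero and less than 5, so the test returns true; an identical domain has a distance of 0 and is not flagged.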
Phishing emails often contain anchor tags in which the anchor text resembles a URL, but that
URL points to a different location than the tag’s href attribute. We return a positive increment for URLs
with such a feature. Finally, we check for attachments with malicious content. This test is performed on every
attachment object and returns a positive increment if the attached file name extension matches one of the
following: ade, adp, bas, bat, chm, cmd, com, cpl, crt, exe, hlp, hta, inf, ins, isp, js, jse, lnk, mdb, mde, msc,
msi, msp, mst, pcd, pif, reg, scr, sct, shs, url, vb, vbe, vbs, wsc, wsf and wsh.
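The last two tests might look like the sketch below, which uses Python's standard html.parser to collect anchors. Comparing only the host names of the anchor text and the href is a simplification of "points to a different location", and the class and function names are assumptions; the extension list is the one given above.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

RISKY_EXTENSIONS = frozenset(
    "ade adp bas bat chm cmd com cpl crt exe hlp hta inf ins isp js jse lnk "
    "mdb mde msc msi msp mst pcd pif reg scr sct shs url vb vbe vbs wsc wsf wsh".split())
# Assumption: anchor text "resembles a URL" if it is a dotted host name,
# optionally with a scheme and a trailing path, port, query or fragment.
URL_LIKE = re.compile(r"^(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:[/:?#].*)?$",
                      re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML message body."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url_text: str) -> str:
    if "://" not in url_text:
        url_text = "http://" + url_text
    return urlsplit(url_text).hostname or ""


def misleading_anchors(html: str) -> list:
    """Anchors whose URL-like text names a different host than the href."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.anchors
            if URL_LIKE.match(text) and host_of(text) != host_of(href)]


def risky_attachment(filename: str) -> bool:
    """True if the file name's final extension is in the list above."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension in RISKY_EXTENSIONS
```

Only the final extension is examined, so a name such as invoice.pdf.exe is flagged while report.pdf is not.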