
Python API for Zero Day Phishing Detection Based on Computer Vision - yevpats
https://github.com/phishai/phish-ai-api
======
fredley
This is really great. I think if this gains traction, it could be fascinating
for Computer Vision in general, as inevitably scammers would try to introduce
adversarial elements to fool the CV, potentially by training a CV AI of their
own to try and replicate (mis)classification. The end result of such an arms
race is CV that better mimics human perception.

~~~
yevpats
Thanks!

~~~
btown
Even before it has an impact on CV, this is really important. If widely
adopted, it mitigates against the "script kiddie-ization" of phishing sites,
where it becomes easier and easier to closely mimic a legitimate site by
reusing (perhaps in real time via proxy) its original assets. Such attackers
won't have the ability to create novel adversarial elements, though, and you
could train yourselves on any that made it into common exploit scripts.

That said, your browser plugin is extremely aggressive at uploading
screenshots of potentially sensitive information to your servers (for
instance, someone's stealth mode startup's unindexed Heroku staging URL),
doing so by default if it hasn't been whitelisted [0]. And your privacy policy
[1] permits you to resell "non-personally identifiable visitor information...
to other parties for marketing, advertising, or other uses."

I don't blame you for needing to make money or for launching an MVP, but there
must be better zero-knowledge ways of accomplishing this. It's an important
goal and I wish you all the best... but don't be the source of the data leaks
you intend to prevent.

[0] [https://github.com/phishai/phish-protect/blob/master/js/back...](https://github.com/phishai/phish-protect/blob/master/js/background.js#L74)

[1] [https://www.phish.ai/phish-ai-privacy-policy/](https://www.phish.ai/phish-ai-privacy-policy/)

~~~
yevpats
Hey, thanks for the extensive reply.

Regarding the browser extension: the free version doesn't upload every
screenshot. It uploads one only when you choose to scan a particular webpage,
since we can't afford to scan every webpage for free. We don't sell any
information, as that is not our business model. I take your point about the
privacy policy, and we will make it more accurate.

The business model is mainly the API product, where you will see more
integrations coming in the next couple of months. As I said, the use case
for the API is more for incident response teams and hosting providers that
need to go manually through a lot of URLs and tag them.

------
encoderer
The best I can gather from reading the website is that it ‘proactively
crawls’ websites of “top brands”.

And then, with the extension, it takes a screenshot of every page you visit
and uploads it to their server, which does something with CV fast enough that
it can block you from submitting data to a phishing attempt.

Anybody have any details of how this actually works?

Seems a little magic.

~~~
yevpats
Hey, so it looks like there is quite a lot of confusion, and our content
marketing needs to be improved. Anyway, I'll try to explain here:

First, we crawl websites of known brands and take screenshots of them. Then we
extract AI & computer vision features from them and create signatures for
every website.
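
To make the signature idea concrete, here is a minimal sketch. The thread does
not say which features Phish.AI actually extracts, so this swaps in a plain
average hash as a stand-in technique, and it assumes the screenshot has already
been downscaled to a tiny grayscale grid. The names `average_hash`,
`hamming_distance`, and the pixel grids are all hypothetical.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Build a bit signature from a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the mean.
    This is a stand-in for whatever features the real service uses.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two signatures."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 "screenshots": a brand page, a near copy, an unrelated page.
brand = [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
lookalike = [[255, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
unrelated = [[255, 255, 0, 0], [255, 255, 0, 0], [0, 0, 255, 255], [0, 0, 255, 255]]

near = hamming_distance(average_hash(brand), average_hash(lookalike))
far = hamming_distance(average_hash(brand), average_hash(unrelated))
```

A small distance suggests the page looks like the brand page; where to put the
threshold is a tuning decision this sketch does not try to answer.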

Now there are two products:

1) API: you can use this for any type of use case: incident response,
automated phishing classification for hosting providers, or any other use
case that you can think of.

2) Chrome extension: the free version does not upload screenshots in real
time, as processing a lot of screenshots is very expensive and
resource-consuming, so that is available only in the enterprise version. The
free Chrome extension has two features: unicode detection and a link to
actively scan the current website with Phish.AI, so really nothing to call
home about.
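
For readers wondering what "unicode detection" might look like, here is a
rough sketch. How the extension really does it is not described in this
thread, so the rule below (flag any hostname label that is non-ASCII or
punycode-encoded) is an assumption, and `has_unicode_hostname` is a
hypothetical name.

```python
from urllib.parse import urlsplit


def has_unicode_hostname(url: str) -> bool:
    """Return True if any hostname label is non-ASCII or punycode ("xn--").

    Assumed heuristic for lookalike domains; not the extension's actual logic.
    """
    host = urlsplit(url).hostname or ""
    return any(
        not label.isascii() or label.startswith("xn--")
        for label in host.split(".")
    )


# "\u0430" is the Cyrillic letter that renders like a Latin "a".
suspicious = has_unicode_hostname("https://\u0430pple.com/login")
plain = has_unicode_hostname("https://apple.com/login")
```

A real check would also need an allowance for legitimate internationalized
domains, which this sketch flags as well.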

Hope this makes things a bit clearer.

------
_pdp_
I am not sure this works. I just tested it on a dummy phishing site I made and
it came back as clean.

~~~
yevpats
Hey, thanks for the feedback! We are currently in beta, and some false
negatives/positives can occur. Also, I'll publish an up-to-date list of the
brands that we detect.

------
ris
Pipe... all of my email... through some startup's random opaque web
service...?

~~~
yevpats
Why email? It's not an email service at all... I guess we need to improve our
content marketing, but we don't have the word "email" anywhere on our site.
Having said that, the main service is an API that you can use for a lot of use
cases. For example, a good use case is an incident response team that goes
manually through a lot of phishing reports to tag them, or hosting providers
that go through their websites to take down phishing or hacked websites.

~~~
ris
So... the idea is (potentially) for a provider's mail filters to extract links
from emails and submit them to your API, raising an alert if it's a close
match for a known site but not on a whitelisted domain?

~~~
yevpats
No, it won't work for email at all. You can't integrate it with email links,
because the service needs to browse to the URL to take a screenshot and then
analyze the screenshot. If we clicked on every link in an email, it would
cause chaos: unsubscribe everyone from everything and decline/accept/maybe
all invitations :)

The use case is more appropriate for incident response teams that go manually
through tons of URLs to classify them, or hosting providers that go through
tons of websites to check whether phishing websites are hosted or some sites
were hacked. Essentially, any use case where you have a feed of URLs that are
accessible from the web and you want to tag/find phishing websites.
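
A minimal sketch of that kind of URL-feed tagging, borrowing the rule ris
suggested earlier (a page that looks like a known brand but is not on that
brand's own domain). Everything here is assumed for illustration: the
`tag_url` helper, the `BRAND_DOMAINS` table, and the "phishing" label. Only
the "clean" verdict string appears in this thread.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit

# Hypothetical allowlist: brand name -> domains that brand really owns.
BRAND_DOMAINS: Dict[str, Set[str]] = {"examplebank": {"examplebank.com"}}


def tag_url(url: str, matched_brand: Optional[str],
            brand_domains: Dict[str, Set[str]]) -> str:
    """Tag one URL from a feed.

    matched_brand is the brand the screenshot resembles (from some visual
    matcher, not shown here), or None if it resembles no known brand.
    """
    if matched_brand is None:
        return "clean"
    host = (urlsplit(url).hostname or "").lower()
    allowed = brand_domains.get(matched_brand, set())
    # Accept the brand's domain itself and any of its subdomains.
    on_brand_domain = any(host == d or host.endswith("." + d) for d in allowed)
    return "clean" if on_brand_domain else "phishing"


real = tag_url("https://login.examplebank.com/", "examplebank", BRAND_DOMAINS)
fake = tag_url("https://examplebank.com.evil.test/login", "examplebank", BRAND_DOMAINS)
```

Matching on the parsed hostname rather than on the raw URL string is what
keeps `examplebank.com.evil.test` from passing as the real domain.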

------
endymi0n
Old and busted: Phishing attempt leaks all my company's emails.

New and hot: Breach at phish.ai's screenshot database leaks all my company's
emails.

~~~
yevpats
Hey there, I already answered this in the thread: we are not an email
solution; please see my other reply to ris. I hope people will at least read
the thread or the website before posting negative feedback.

------
kijeda
This entire project is less than 20 lines of code:

[https://github.com/phishai/phish-ai-api/blob/master/phish_ai...](https://github.com/phishai/phish-ai-api/blob/master/phish_ai_api/__init__.py)

~~~
krapp
That's not impressive when those lines are just API calls to a hosted service
that does all of the actual work.

~~~
wyldfire
I suppose that comment is to indicate exactly what's been linked to and not to
indicate an impressive feat.

So, yes, you are likely both in agreement.

Unlike the content that often shows up on HN with lots of technical detail,
this amounts to a service announcement.

This service sounds interesting but a much better HN submission would be one
that talks about how it works, how it was improved, etc. Browsing the phish.ai
website uncovers several (very brief) articles that are more interesting than
this repo.

BTW, a footnote on the API's design: {"verdict": "clean"} is not ideal for a
non-interactive service. It should return either a number (like a confidence
score) or a boolean value.
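
In the meantime, a caller could wrap the string verdict itself. This is only a
sketch: "clean" is the one verdict value seen in this thread, so the
hypothetical `is_clean` helper treats anything else, including a missing
field, as not clean.

```python
import json


def is_clean(response_body: str) -> bool:
    """Map the JSON verdict string to a boolean, failing closed.

    Only "clean" is known from the thread; any other value is assumed
    to mean the page should not be trusted.
    """
    payload = json.loads(response_body)
    return payload.get("verdict") == "clean"


ok = is_clean('{"verdict": "clean"}')
```

Failing closed means an unexpected or new verdict string never gets silently
treated as safe by an automated pipeline.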

~~~
yevpats
Hey, thanks for the feedback. It's a great idea, and I got the same feedback
from other people, so we will publish a blog post on how it works. I replied
with a short explanation to encoderer. The footnote on API design is noted and
will be addressed :) thx

------
jeffnappi
This is a Python API _client_ for a zero day phishing detection service. The
title of this post is misleading and makes it sound like it is an open-source
project.

------
SmooL
This is very cool; as computer vision gets better and the database grows, I
can easily see this being the default way to do phishing detection.

~~~
yevpats
True, the database is the key here. We are still in beta, and our database
grows every day!

