
SpiderFoot: OSINT collection and reconnaissance tool - axiomdata316
https://github.com/smicallef/spiderfoot
======
Accujack
I'm going to offer an opinion not matching the groupthink here.

I think this is a great project... many of us gather intelligence on various
targets and threats by hand, because we're used to doing that, and it's the
way we've always done it, and it works.

An automated system is more thorough and more complete, as well as faster,
allowing for more searches in less time.

Sure, the software isn't perfect, but it's better than almost anything else in
terms of being more of a product and less of a toolkit. It's open source, so
if you think a data source like Facebook is missing, feel free to add it in.

Well done, I say.

~~~
ArtWomb
Yes, I still mine data by hand, usually inserting it directly into SQLite
using DataGrip. And if I use a script, it's bespoke, crafted specifically for
a single site (or single page) ;)

I applaud this effort as well. The web is a chaotic soup of unstructured data.
And this is a first step towards agents that can surf the web for us and
answer complex queries posed in natural human language!

~~~
smicallef
Appreciate it, thanks!

~~~
tonyarkles
Yeah, this looks pretty awesome! I ran it against one of my own domains and
there were a fair number of false positives and a flood of information, but
the data is really interesting!

------
josephmosby
I am excited to get into this.

Anecdotal disclaimer: I have been both an OSINT analyst and a developer
supporting OSINT work in the past. I understand completely why OSINT analysts
like to do things by hand - they can understand where the data comes from;
they can cite sources as they go; they can tailor things specifically to their
use case. Understanding the process is indeed important to knowing how we got
to the conclusion.

And yet, we're frustrated when there's a panic situation and we need a report
of any kind in two hours. We're frustrated when we're handed sixteen targets
all at once and have to suss them out over the weekend. We're frustrated that
visualizations suck and we have to custom-make them ourselves. Better tooling,
an opinionated process, and automation help solve all of that, or at least
advance things a bit.

Can't have anything perfect (entity resolution as #1 on the un-perfect list)
but getting a tool together as well thought-out as this one is an excellent
step forward. Kudos, @smicallef.

~~~
smicallef
So happy to hear it helps, thanks a lot!

------
smicallef
Author of the project (not the OP) here. Must say it was quite a surprise to
see this land in my HN feed today! I’ll do my best to answer the points raised
below but if you have any questions or further feedback, I’m glad to hear it!

------
Ansil849
It is nice to see OSINT receive more tooling, but I shudder at the 'most
complete' hyperbole. For starters, this appears to be almost exclusively
centered on web domain OSINT. But what about reverse image searches?
Government records searches? Video analysis? All of these domains appear to
not be represented at all, to name just a few. Even in the domain of web
analysis the tool is missing critical queries such as historic whois and
leaked database lookups.

~~~
smicallef
Actually, you can target a bunch of things beyond domains, including IPs,
usernames, phone numbers and more. And historic Whois and leak database
modules are indeed there. In some cases you need API keys, though most
services offer free tiers for low volumes.

But yes, it’s not covering some of the other sources you mentioned... yet.

------
KarlKemp
For something that claims comprehensiveness, I am somewhat surprised by the
list of sources. It’s almost entirely restricted to the web domain, i.e.
domain/IP/email data.

That’s understandable, considering how any data you glean can immediately be
fed into another run of all these tools. But I was under the impression that
"OSINT" has a few more data sources connecting it to the "real world"? Company
data, published books, or the court system come to mind.

(Sorry if I just missed some)

~~~
smicallef
Company data is partially there, using OpenCorporates’ API. The tool
originated with a smaller scope and has grown over 8 years of development. I
can imagine how different it will look in another few years.

------
no-dr-onboard
The syntax similarity that Recon-NG shares with MSF is really what is keeping
me from trying this tool.

Swiss-knife projects like this are great, but if you don't provide a
relatively intuitive set of commands for flipping through the various modules
and interfaces, you're going to lose a lot of your userbase.

Simply put, no one wants to learn a new tool when they already know the input
and expected output but can't leverage existing navigational habits.

~~~
smicallef
I hadn’t even considered that people would need MSF compatibility, so I’ll
take that into account for a future release. The sfcli.py CLI was a starting
point for that kind of functionality, but it isn’t MSF compatible. Thanks for
the feedback!

------
nannal
Jack of all trades..?

I've not tested spiderfoot to see how it compares to other more specialised
OSINT tools in their specific areas but there is a line of thinking that would
suggest an amalgamation of tools might fare better.

~~~
gatewaynode
It is kind of an amalgamation of tools. Install it, sign up for the dozens and
dozens of external services it queries, configure those API keys in
SpiderFoot, then fire away. It is really good, but some of the better
visualizations, sorting options, reporting, and diffing come with the premium
SaaS version, SpiderFoot HX.

------
londons_explore
It scans wikipedia for edits from a specific IP, but doesn't scan facebook for
accounts matching names?

This tool seems to do some very niche things, and miss out some very big
things...

~~~
smicallef
It actually does search for social media accounts linked to a name or
username, but does so using Google/Bing APIs and not Facebook’s.

------
ggm
RDAP. Whois is so 1990s.

~~~
smicallef
Indeed, Whois sucks for parsing and is losing value as an OSINT source since
GDPR. I’ll take a look at RDAP.
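For readers unfamiliar with RDAP: it replaces Whois's free-text output with structured JSON served over HTTPS, which is what makes it easier to parse. A minimal sketch, assuming the public rdap.org bootstrap service (which redirects queries to the authoritative registry); the response below is an abridged example, not live data:

```python
import json

# Sketch only: build an RDAP query URL. rdap.org is a public bootstrap
# service that redirects to the registry responsible for the resource.
def rdap_url(domain):
    return "https://rdap.org/domain/" + domain

# Abridged example of the structured JSON an RDAP server returns,
# in contrast to Whois's free-text records:
sample = json.loads('''{
    "objectClassName": "domain",
    "ldhName": "example.com",
    "events": [{"eventAction": "registration",
                "eventDate": "1995-08-14T04:00:00Z"}]
}''')

print(rdap_url("example.com"))  # https://rdap.org/domain/example.com
print(sample["events"][0]["eventAction"])  # registration
```

Because the response is plain JSON, fields like registration events can be extracted with ordinary dictionary lookups instead of the regex gymnastics Whois parsing usually requires.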

------
egorfine
Does it compare to Palantir?

------
duckqlz
“ There's something fascinating about hairy little spider feet. They look like
they belong on dogs. Or maybe even cats.

Recently, images of hairy spider "paws" have circulated on social media with
people oohing and aahing about how cute they are and how much they resemble
furry pet appendages”

If you haven’t googled spider feet, do it.

~~~
bookofjoe
>The black "hairs" on the Araniella villanii spider are innervated, meaning
they are sensory organs, much like a cat's whisker.

Source: [https://www.livescience.com/newly-discovered-math-spider.htm...](https://www.livescience.com/newly-discovered-math-spider.html)

------
dr_zoidberg
Inside one of the plugins I found this code:

    def setup(self, sfc, userOpts=dict()):
        self.sf = sfc
        self.results = self.tempStorage()

        for opt in list(userOpts.keys()):
            self.opts[opt] = userOpts[opt]
    

And I'm left wondering... since the class that implements this method has an
opts dictionary as a class attribute, why didn't the author write a simple
self.opts.update(userOpts) that takes care of it? It does exactly the same
thing and is faster and clearer.
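A standalone sketch (plain dictionaries here, not SpiderFoot's actual class) showing that the plugin's key-by-key loop and `dict.update()` produce identical results:

```python
# Minimal sketch, not SpiderFoot's code: default options overridden
# by user-supplied options, via both approaches.
defaults = {"timeout": 30, "verify": True}
user_opts = {"timeout": 5, "retries": 2}

# The plugin's approach: copy keys one at a time.
looped = dict(defaults)
for opt in list(user_opts.keys()):
    looped[opt] = user_opts[opt]

# The suggested one-liner.
updated = dict(defaults)
updated.update(user_opts)

assert looped == updated == {"timeout": 5, "verify": True, "retries": 2}
```

Both leave keys absent from `user_opts` untouched and overwrite or add the rest, so for a plain dict the two are interchangeable; `update()` just says so in one line.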

That in itself made me lose a lot of the interest I had in this project. I'll
still pass it around to a few colleagues who work in infosec to see what they
think of the tool from a practical point of view.

I'm also a bit confused that the author says it's Python 3, but there are a
lot of idioms around that point to 2/3 compatibility. I can understand that
the project might have started with a 2/3 mentality and then progressed.
However, some of these idioms not only affect readability but can introduce
side effects (anywhere from annoyances to bugs).

