
Analyzing Analytics (Featuring: The FBI) - benryon
https://exploits.run/analytics-analysis-fbi/
======
jcrawfordor
Looking at the Siberian husky site... stdLauncher.js is part of Verint
ForeSee, one of those "would you like to take a survey about our website"
solutions. The AAM analytics code right above the survey and urchin code lists
as its domain an IP address associated with Sungard AS, an outfit that holds a
number of
federal contracts for IT services. This IP, 209.235.0.153, hosted the FBI
website at some point in time. It's oddly easy to figure this out, even
without something like a DomainTools subscription, because there are a lot of
people scraping and archiving the FBI most wanted pages due to their cultural
significance.

Some searching on code samples shows that the AAM section of analytics code is
an exact match for analytics code served up by an older version of the FBI's
most wanted website. It was likely also used on older versions of other FBI
websites.

In the end I find it unlikely that this website has anything to do with the
FBI, and more likely that the website owner copy-pasted a large section of
source code and accidentally ended up with this result.

One bit of commonality I've noticed is that a lot of websites with the FBI
tracking code were all built with FrontPage. I'm not sure if this is causal or
coincidental, but perhaps it contributes to this that FrontPage allows you to
open a webpage that you saved from IE and edit it... which might lead to some
websites being complete duplicates of FBI websites, except for visible
content, simply because websites like the FBI most wanted were relatively
prominent parts of the early internet.

Edit: I spent a little time riding the WayBackMachine to some of the other
webpages when they were apparently using FBI analytics code. The results are
odd but they're so inconsistent that it's hard to think it was at all
intentional. One interesting finding is that both ohthx.com and ppc-guy.com,
at the time they supposedly had the FBI analytics ID, were apparently hosting
an analytics package called Prosper202 that redirected the WayBackMachine
crawler from the login page to fbi.gov. I have a suspicion that this was a
partially-joking way to deter crawling of the admin interface of the software.
The record that they used the FBI analytics code is presumably just an
artifact of the crawler following the redirect. It seems that this exact
Prosper202 behavior results in the majority of the old hits.

------
three_seagrass
This technique was recently done by some redditors to uncover that the multi-
state COVID reopen protest is being pushed by some guy who uses an antique
shop in FL as a front for his shell LLCs.

They are the websites that are being used on the Facebook pages that are
primarily pushing 'reopen' content, and the GA accounts on those sites link
them to a bunch of pro-firearm shell corps as well.

Here's the thread. It got deleted since it was deemed as doxxing (a reddit no-
no) even though Whois data is public:

[http://removeddit.com/r/maryland/comments/g3niq3/i_simply_ca...](http://removeddit.com/r/maryland/comments/g3niq3/i_simply_cannot_believe_that_people_are/fnstpyl)

~~~
maxchehab
Krebs also mentioned this in his recent post
[https://krebsonsecurity.com/2020/04/whos-behind-the-reopen-d...](https://krebsonsecurity.com/2020/04/whos-behind-the-reopen-domain-surge/)

A very interesting way to associate the same site owners!

~~~
LegitShady
Look at the updates on that post, nothing is so clear cut. The problem with
internet sleuthing is that everyone gets very excited and innocent people can
be injured in the unnecessary witch-hunt.

>Update, April 21, 6:40 a.m. ET: Mother Jones has published a compelling
interview with Mr. Murphy, who says he registered thousands of dollars worth
of “reopen” and “liberate” domains to keep them out of the hands of people
trying to organize protests. KrebsOnSecurity has not been able to validate this
report, but it’s a fascinating twist to this tale: How an ‘Old Hippie’ Got
Accused of Astroturfing the Right-Wing Campaign to Reopen the Economy

Update, April 22, 1:52 p.m. ET: Mr. Murphy told Jacksonville.com he did not
register reopenmn.com or reopenpa.com, contrary to data in the spreadsheet
linked above. I looked up each of the records in that spreadsheet manually,
but did have some help from another source in compiling and sorting the
information. It is possible the registration data for those domains got
transposed with reopenmd.com and reopenva.com, which included Mr. Murphy’s
information prior to being redacted by the domain registrar.

~~~
panda-giddiness
Right, and this is exactly why reddit bans doxxing. The original reddit poster
was correct that there was a single individual buying most of these domains;
however, other than purchasing the domains, there was no evidence that the
individual was using those domains to promote protests. Let's not forget
reddit's Boston bombing debacle[1].

[1]
[https://en.wikipedia.org/wiki/Sunil_Tripathi#Misidentificati...](https://en.wikipedia.org/wiki/Sunil_Tripathi#Misidentification)

------
gk1
Worth noting that, since the Analytics ID is publicly visible, anyone can
load Google Analytics on their own site using that ID. No FBI connection
required.

This is called Analytics hijacking and it was once (still is) a common spam
technique: Create site buy-my-stuff.net, load a bunch of hijacked analytics
scripts there, and then the owners of those accounts will see
“buy-my-stuff.net” in their analytics reports.

Edit: As commenter lmkg reminded me, you don’t even need to make a site, just
use the API to make pageview calls.

~~~
enlyth
Is it not possible to whitelist your own domains in Google Analytics? Forgive
my ignorance, I don't use it at all.

~~~
lmkg
You don't need to host a site. The data format to send data into Google
Analytics is an open API (called the Measurement Protocol). You can just ping
Google's servers directly with the appropriate payload, which includes crafted
URL parameters.
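
As a rough illustration of what such a payload looks like, here is a minimal sketch assuming the Universal Analytics-era Measurement Protocol (v1) and its `v`, `tid`, `cid`, `t`, `dh` and `dp` parameters. The tracking ID and hostname are placeholders, the helper name `build_pageview_payload` is made up for this example, and the code only builds the request body without sending anything.

```python
import uuid
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol (v1) collection endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_payload(tracking_id, hostname, path, client_id=None):
    """Return the URL-encoded body of a single pageview hit (hypothetical helper)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID, readable in any page's source
        "cid": client_id or str(uuid.uuid4()),  # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname: whatever the sender claims
        "dp": path,          # document path
    }
    return urlencode(params)


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    body = build_pageview_payload("UA-XXXXX-Y", "buy-my-stuff.example", "/")
    print("POST", COLLECT_URL)
    print(body)
```

Nothing in the payload proves that the sender controls the hostname in `dh`, which is why a tracking ID showing up alongside a domain says little about who operates that domain.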

------
maerF0x0
In his book "Permanent Record" Edward Snowden[1] describes fake websites used
by government agencies to disguise internet traffic that is actually used for
spy craft stuff.

eg: maybe a website about siberian huskies actually has a hidden login or
hosts another service when contacted on port 80/443 in just the right way?

Now, that would make more sense for the CIA than the FBI, but I think it
illustrates another avenue of interpretation

[1]: [https://www.goodreads.com/book/show/46223297-permanent-recor...](https://www.goodreads.com/book/show/46223297-permanent-record)

~~~
poyu
That doesn't make sense, why would they even let people know that there's a
connection? The hidden login part may be true, but just not on sites that
are so obviously related. It could be a smokescreen of some kind though.

~~~
maerF0x0
I agree that having a fbi google analytics would be a gaffe

------
Ozzie_osman
Google analytics used to be called Urchin (they bought Urchin and made it
Analytics). So all the urchin.js code is probably just really old Google
analytics tracking code.

~~~
elbac
The original Urchin was used for log analysis
[https://en.wikipedia.org/wiki/Urchin_(software)](https://en.wikipedia.org/wiki/Urchin_\(software\)).
Which might explain a 'self-hosted' version of the software as well.

~~~
conductr
It had a hybrid log/js approach around the time google acquired it. I believe
one of the first. Was the best product around. As a shared web hosting
provider in the early/mid 2000s, it was becoming more than a competitive
advantage
to offer it.

------
bhartzer
The Google Analytics 'trick' (to identify all the sites someone owns) has been
around for quite a while. All you have to do is use a code search engine like
publicwww to search for the snippet of code or the analytics ID.

It's not just the Google Analytics ID or GTM ID, you can also use the AdSense
pub-id or just about anything else that you might think sites have in common.
When you start to also look at backlinks and IP neighborhoods, things can get
interesting, as well.
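
A minimal sketch of the matching step, assuming you already have the HTML source of the candidate sites (for example from a code search engine or your own crawl). The regular expressions are approximations of the common `UA-…`, `GTM-…` and `pub-…` identifier shapes, and the function names are made up for this example.

```python
import re
from collections import defaultdict

# Approximate identifier shapes; real IDs may vary in length.
ID_PATTERNS = {
    "ga": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "gtm": re.compile(r"\bGTM-[A-Z0-9]{4,8}\b"),
    "adsense": re.compile(r"\bpub-\d{16}\b"),
}


def extract_ids(html):
    """Return the set of (kind, identifier) pairs found in a page's source."""
    found = set()
    for kind, pattern in ID_PATTERNS.items():
        for match in pattern.findall(html):
            found.add((kind, match))
    return found


def group_sites_by_id(pages):
    """Map each identifier to the sites that use it, keeping only shared ones.

    `pages` maps a site name to its HTML source.
    """
    sites_by_id = defaultdict(set)
    for site, html in pages.items():
        for ident in extract_ids(html):
            sites_by_id[ident].add(site)
    return {ident: sites for ident, sites in sites_by_id.items() if len(sites) > 1}
```

A shared identifier is only a lead: as noted elsewhere in the thread, the IDs are public and can be copied along with the rest of a page.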

------
danso
On a related note, I wonder if there are/were common patterns in the sting
sites set up by Dept. of Homeland Security, such as U of Northern New Jersey
[0] and U of Farmington [1]. Both of those were initiated during the Obama
administration and featured fairly nice modern designs, similar in aesthetic
to much of the Obama-era digital overhauls (though a quick skim shows that
they don't share similar CSS naming semantics).

[0] [https://www.nytimes.com/2016/05/06/nyregion/students-at-fake...](https://www.nytimes.com/2016/05/06/nyregion/students-at-fake-university-say-they-were-collateral-damage-in-sting-operation.html)

[https://web.archive.org/web/20160327093120/http://unnj.edu/](https://web.archive.org/web/20160327093120/http://unnj.edu/)

[1]
[https://www.freep.com/story/news/local/michigan/2019/11/27/i...](https://www.freep.com/story/news/local/michigan/2019/11/27/ice-arrested-250-foreign-students-fake-university-metro-detroit/4277686002/)

[https://web.archive.org/web/20180414235355/http://university...](https://web.archive.org/web/20180414235355/http://universityoffarmington.edu/)

[http://archive.is/qLrUi](http://archive.is/qLrUi)

~~~
jcrawfordor
On brief review, one (UNNJ) is running WordPress while the other (Farmington)
doesn't show any evidence of a dynamic CMS. That suggests to me totally
separate provenance. My guess would be that two different contracts were
awarded to two different companies to build the websites, which would both be
consistent with common federal contracting behavior and a good idea from an
OPSEC perspective since it would minimize any similarity in these "sting"
websites.
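
For readers who want to repeat this kind of check, a minimal sketch of one way to look for WordPress in a page's source. It relies on the `/wp-content/` and `/wp-includes/` paths and the `generator` meta tag that stock WordPress installs usually emit; the function name is made up, and this is not necessarily how the review above was done.

```python
import re

# Markers that stock WordPress installs usually leave in page source.
WORDPRESS_MARKERS = [
    re.compile(r"/wp-content/", re.I),
    re.compile(r"/wp-includes/", re.I),
    re.compile(r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress', re.I),
]


def wordpress_markers(html):
    """Return the marker patterns that match the given HTML source."""
    return [p.pattern for p in WORDPRESS_MARKERS if p.search(html)]
```

An empty result only means none of these markers were found; a customized or differently built site can hide them.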

------
vmception
Now this is the Hacker News I want to see. Just a mere observation using known
meta-analytics with entertaining implications.

------
mimi89999
Or maybe they just stole code from the FBI website to have a feature and pulled
way more code than required without even knowing what it does.

~~~
mimi89999
A coworker sysadmin once told me that when he was inspecting the web server
access logs (for an unrelated reason) he noticed that many requests to a
resource on our website had a strange referer URL that was never present in
requests to pages. He inspected that site and found that they were using our
resource. We didn't really care about it, but that was really interesting.

Maybe it's the same with these sites?
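
A minimal sketch of that kind of log check, assuming access logs in the Apache/nginx Combined Log Format. The function name, the resource path and the host names in the usage note are placeholders, not details from the story above.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined Log Format: ... "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def external_referers(log_lines, resource_path, own_hosts):
    """Count referer hosts, other than our own, for requests to one resource."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") != resource_path:
            continue
        # A referer of "-" (none sent) has no hostname and is skipped.
        host = urlparse(match.group("referer")).hostname
        if host and host not in own_hosts:
            counts[host] += 1
    return counts
```

Calling `external_referers(lines, "/js/widget.js", {"www.ours.example"})` would list the outside hosts whose pages embed that file.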

------
vmception
The article says all three fbi.js files were on waybackmachine. I was only
able to download urchin and the other ones are not there. Anyone have a
mirror? Besides the author? pastebin or mega

~~~
jcrawfordor
All three are from commodity commercial software; finding other websites of
the same period that used Urchin/GA and ForeSee should get you more or less
the same files.

------
Pick-A-Hill2019
In the Wayback Machine archived version of triggerParams.js there is an OMB
parameter of “1505-0186” if the client is section 508 compliant (US
accessibility guidelines). A search of that OMB number turns up a Customer
Satisfaction Measure of Government Websites survey from 2008/2009 (which makes
sense if the archived js is from the FBI site). What isn’t clear is if the
same version was used on all of the sites (some of the parameters are hard-
coded) and how it got copied across to a mixture of hobbyist sites, plumbers,
Most-Wanted pages etc. A quick peek at the page source of a random sampling of
the sites in the Wayback Machine shows very little similarity with each other
(e.g. style of code, page layout etc.) which strongly suggests that it wasn’t
people just ripping off the FBI page and wrangling it with a text editor. It
is curious.

------
LyndsySimon
The big takeaway from this article for me is that I should probably look for
or write a browser extension that tracks changes to analytics tools and IDs on
sites. If a site is silently taken over, the state actor would either need to
separately gain access to the analytics tool accounts, or would need to modify
the IDs to connect to a new account. I'd love to see how often tracking IDs
change on high-profile sites.
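
The comparison at the heart of such a tool could look roughly like the sketch below, which diffs the Universal Analytics-style IDs found in two snapshots of a page. The regular expression is an approximation and the function names are made up; a real extension would also need storage, scheduling and handling of other tracker types.

```python
import re

# Approximate shape of a Universal Analytics tracking ID.
GA_ID = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def tracking_ids(html):
    """Return the set of GA-style tracking IDs present in a page's source."""
    return set(GA_ID.findall(html))


def diff_tracking_ids(old_html, new_html):
    """Report which tracking IDs appeared or disappeared between two snapshots."""
    old_ids, new_ids = tracking_ids(old_html), tracking_ids(new_html)
    return {"added": new_ids - old_ids, "removed": old_ids - new_ids}
```

Feeding it two Wayback Machine captures of the same page would be one way to look at how often IDs changed historically.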

~~~
ryanlol
>If a site is silently taken over, the state actor would either need to ...

Why would they need to do that?

------
londons_explore
Google Analytics IDs are tied to the account that created them.

Presumably the FBI doesn't all share just one massive "fbi@gmail.com" email
address.

Even if a bunch of FBI employees decided foolishly to use google analytics on
their honeypot sites, one would expect them to all separately sign up using
different google accounts - either using their real email addresses, or
hopefully throwaway ones.

~~~
lmkg
I think you're confusing _Google_ accounts (email addresses) with _Google
Analytics_ accounts (tracking ID prefixes). A single user can create dozens of
GA accounts.

~~~
londons_explore
But by default, it's a 1:1 mapping...

------
galacticaactual
Pointing back to a government domain is not how nation state monitoring
infrastructure is set up.

~~~
save_ferris
Sure, this isn't a comprehensive strategy, but you'd be amazed at how far
behind some of those agencies are in terms of day-to-day operations for
investigations.

A relative of mine works at the FBI and several years back he told me a story
about how an investigation into an organized crime syndicate was blown up
because an agent on the case was dumb enough to check out the target's
LinkedIn profile while he was logged into his own real account. So the target
got a notification that Joe Blow from the FBI had just viewed his profile.
Over a year of work down the drain with a single GET request, crazy.

~~~
galacticaactual
My issue is the confidence with which the author presupposes that the
existence of this code on sites indicates seizure or utilization in an
investigation. It is a lazy position that leaves others (i.e. HN readers in
this thread) with a little more intellectual horsepower to evaluate the other
- and frankly more realistic - alternatives.

~~~
not_a_moth
What are the more realistic alternatives?

~~~
mimi89999
Please see my comment:
[https://news.ycombinator.com/item?id=22970996](https://news.ycombinator.com/item?id=22970996)

------
A4ET8a8uTh0
That is a fascinating read. It sounds like it is also prudent to use separate
analytics IDs on your websites if you choose to go that route.

------
ryanlol
This is stupid, people working on investigations obviously aren’t going to
have access to fbi.gov or the analytics accounts for that.

------
soobrosa
I mean. You cannot track with Google Analytics. Why would anyone use it for
that.

