

Meanpath - Search the page source, server headers and visible text on 150m sites - adamseabrook
https://meanpath.com

======
carterschonwald
I've heard from their engineers that Haskell has been their secret to being
web scale while having a sane ops and engineering load. :-)

Considering I'm doing my own Haskell-based tech in my business (and I'm
somewhat active in the core community), it pleases me greatly to see
Haskell/GHC used in a rich array of interesting businesses!

~~~
etherael
My favourite bits of the project:

Elasticsearch

ZMQ

Golang & Haskell

Unbound DNS

And a bit of old Python glue to hold everything together.

It's funny how long things can last when you think you would have outgrown
them pretty quickly. A tiny round robin domain server that maxes out at around
60k domains per second on just a single core, feeding distributed spider bots,
is just about easy with Python + ZMQ. That's plenty to keep a swarm of far more
complex and well engineered Haskell crawlers humming away and throwing the
results to Golang indexers to push into the Elasticsearch core for ...

Well, I'll leave the rest of it to mwotton's blog post. Suffice to say, it's
been one of those life-changingly awesome projects of doom. I'm sold on the
benefits of embracing polyglot.
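
For readers unfamiliar with the round robin idea mentioned above, here is a minimal sketch of it. It is not the project's code: it swaps the ZMQ sockets for in-process queues so it runs with the standard library alone, and the name `round_robin_dispatch` and the example domains are hypothetical.

```python
from collections import deque
from itertools import cycle


def round_robin_dispatch(domains, worker_count):
    """Hand each domain to the next worker queue in strict rotation.

    Assumption: in-process deques stand in for the ZMQ sockets that the
    real domain server would use to reach remote spider bots.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    queues = [deque() for _ in range(worker_count)]
    targets = cycle(queues)  # endless rotation over the worker queues
    for domain in domains:
        next(targets).append(domain)
    return queues


if __name__ == "__main__":
    # Hypothetical input; the real server would stream a far larger domain list.
    work = round_robin_dispatch(["a.com", "b.com", "c.com", "d.com", "e.com"], 2)
    for index, queue in enumerate(work):
        print(index, list(queue))
```

The rotation keeps every worker's share within one domain of the others, which is why such a simple distributor can keep many crawlers busy.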

~~~
mwotton
yes, I probably should have mentioned the other 90% of the project, hey :)

------
cwings
This is an exact copy of my site NerdyData.com. Are you serious right now, I
was using that layout just a week ago for my homepage!

You stole our idea and our theme verbatim? How can you live with yourself?

~~~
jon_r
The layouts don't really look anything alike and I think they may be using an
off-the-shelf theme for their landing page. Also their site works... have you
launched yet?

~~~
cwings
I apologize for my previous comment, I was reacting out of anger.

This is the page they copied
[http://nerdydata.com/home2.php](http://nerdydata.com/home2.php). It was our
wait list for the last 2 months while we developed our site, which now has a
new homepage we put up last week. [http://nerdydata.com](http://nerdydata.com)

But no hard feelings, maybe you had the idea independently and happened to use
the same exact theme and messaging.

~~~
jon_r
Well I guess that's the risk you take by using an off-the-shelf theme... your
copy differs significantly if that's any consolation...

edit: (this is the theme
[https://wrapbootstrap.com/theme/beaker-responsive-working-la...](https://wrapbootstrap.com/theme/beaker-responsive-working-landing-page-WB0L34T68))

------
michaelneale
Wow, very impressive - I can see so many diverse uses of this (the least
commercial but fun one would be up-to-the-minute JavaScript framework
popularity contests!).

All the best with the continuing launch.

------
BWStearns
repost from deleted: Wicked cool concept, I can definitely see this being
useful on several fronts. Just out of curiosity, how are you determining the
"links to your competitors but not yours"? The example query is """
(www.nike.com OR nike.com) NOT (www.adidas.com OR adidas.com OR redirect:nike
OR domain:nike) """ I assume this means that "you" are Adidas and "your
competitor" is Nike?

~~~
adamseabrook
Correct. Adidas would be able to see sites talking about their competitor Nike
that are not linking to Adidas and also do not contain Nike as part of the
domain. This makes sure you screen out all the Nike sub-sites like
runwithnike.com which obviously don't link to Adidas.
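
The boolean logic of that example query can be sketched roughly as follows. This is only an illustration of the semantics described above, not how Meanpath evaluates queries: the `Page` record, its field names, and the plain substring matching are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical record; the field names are assumptions for illustration.
    domain: str
    source: str
    redirect: str = ""


def matches(page):
    """(www.nike.com OR nike.com) NOT (www.adidas.com OR adidas.com
    OR redirect:nike OR domain:nike), using naive substring checks."""
    # "nike.com" is a substring of "www.nike.com", so one check covers both.
    mentions_nike = "nike.com" in page.source
    links_adidas = "adidas.com" in page.source
    nike_owned = "nike" in page.domain or "nike" in page.redirect
    return mentions_nike and not (links_adidas or nike_owned)


if __name__ == "__main__":
    pages = [
        Page("runblog.example", "<a href='http://www.nike.com'>shoes</a>"),
        Page("runwithnike.com", "<a href='http://nike.com'>home</a>"),
        Page("shoes.example", "nike.com vs adidas.com"),
    ]
    for page in pages:
        print(page.domain, matches(page))
```

Only the first page passes: the second is screened out by the domain check and the third because it already links to Adidas.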

------
jon_r
Cool idea, I still wonder how you'll cope with keeping that entire index in
memory :)

