
Analyzing HN readers' personal blogs - dsalzman
https://www.dannysalzman.com/2020/04/08/analyzing-hn-readers-personal-blogs
======
jacquesm
Of those who responded to the 'what is your blog and why should I read it'
thread. HN is large. Likely larger than many of its visitors realize because
the participants are a relatively small fraction of the readership.

So HN _readers_ are not necessarily contributors. And not all contributors
would plug their blogs in a thread asking them to do so.

If you want to get an idea of the HN readership rather than of the HN
contributors you may want to start off by scraping all the profile pages
instead; it will give you a much larger set of sample data to work with.

~~~
dsalzman
That’s a fair point! I would have to read HN’s terms of use though. Not sure
if that’s allowed or not. I felt good scraping the comments section since
everyone there “opted in” to share their website with the broader community.

~~~
jacquesm
You can take it as read that people who write blog posts do that to share them
with the broader community.

Could one of you downvoters please explain why you think people would post
blog posts if they do not want to share them? I fail to see how that would be
possible.

~~~
Jugurtha
And you never foresee the ripples. I wanted to buy books online and trade
stocks when I was a student. MasterCard/Visa in Algiers, Algeria was a rarity,
and banks didn't communicate.

I called every bank in the country that was listed by the central bank, I
knocked on their doors until I found one that offered students a card.

I even had to write a document because they didn't have a form for that case.
I finally got my card after a week of trips to the bank, city hall,
administration, providing documents that were unlisted, etc.

I figured that this wasted time, multiplied by the number of people who would
eventually go through the process, plus the fact that almost nobody in the
country had the card, warranted detailing the steps. It would save time and
contribute to reducing "underbanking". I wrote a blog post explaining how to do
it step by step: provide the following documents, bring this amount in these
currencies. I attached the document/form I wrote so people could fill it in and
take it directly to the bank and save a trip. That information asymmetry
bothered me.

That 2013 post is read hundreds of times per day. It had more than a
thousand comments, although it only lists 700+. People asking me questions,
then getting their own cards, then themselves answering other people's
questions, then updating me on what has changed.

People I would meet in real life would tell me I looked familiar, and then
they'd put it together and tell me they followed the post to get their card.
Sometimes people referred me to my own post in case I wanted to get a card.

I received emails from people who freelanced online and wanted to bring money back
here. One person even sent me credentials to their online account with
thousands of dollars in and asked me if I can find a way to transfer it here
(I told that person not to do that again and she said she felt she could trust
me).

Many times I'd receive a phone call from a friend who'd say they wanted to get
a card, looked online, found a great post, then saw my name and laughed out
loud because they knew me.

And of course, I met interesting people.

~~~
jacquesm
Yes, indeed. I have a few like that and I always wonder why those were the
ones that got legs. This one for instance generates a couple of emails per
week still years after:

[https://jacquesmattheij.com/if-you-have-nothing-to-hide/](https://jacquesmattheij.com/if-you-have-nothing-to-hide/)

This one too; it even inspired a play!

[https://jacquesmattheij.com/trackers/](https://jacquesmattheij.com/trackers/)

~~~
Jugurtha
I read the two pieces sequentially: from tragic to tragicomic.

------
Anwarasseef
Link to the raw file is broken. Try this:
[http://dannysalzman.com/files/hn-blogs.csv](http://dannysalzman.com/files/hn-blogs.csv)

------
kristianc
Interesting to see the high numbers for GA. A bit of ‘do as I say, not as I
do’ going on?

~~~
pcr910303
Probably because there isn’t a good free analytics service that is easy to use
(no need for self-hosting) and is able to collect the info that one wants. GA
is free, and easy.

~~~
arbol
Piwik is free and does everything GA does. You can self host the analytics
dashboard so no 3rd party privacy issues.

~~~
seanwilson
Self hosting is about as far from easy as you can get though.

------
ereyes01
What does the "Programming Languages" section mean? Blogs that discuss the
languages? Use the languages in snippets within posts?

~~~
input_sh
I'm assuming it means the programming languages that the CMSs people use are
written in, hence the dominance of PHP and Node.js.

------
stared
I like the analysis. However, I am curious why entries are not sorted by
counts. As a rule of thumb, sorting alphabetically makes no sense!

Also, 382 sounds like a very small number, given the size of HN. I did try to
find many blogs I read, but couldn't find one. So, crucially - was it a random
sample? Or sample from top-liked, or from a particular month?

Some findings (e.g. the prevalence of WordPress) may depend on this procedure.
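A minimal sketch of the sorting suggested above: order categories by count, most common first, and fall back to alphabetical order only to break ties. The `sort_by_count` helper and the tallies are hypothetical, not the article's data.

```python
from collections import Counter


def sort_by_count(counts: Counter) -> list[tuple[str, int]]:
    """Return (label, count) pairs, most common first; ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical tallies, not the numbers from the article.
generators = Counter({"Jekyll": 40, "Hugo": 40, "Gatsby": 12, "Eleventy": 3})
for name, n in sort_by_count(generators):
    print(f"{name}: {n}")
```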

~~~
jacquesm
At the top he writes how he got the data, from the 'what is your blog and why
should I read it' post.

~~~
stared
I missed that, since I didn't click the link. (I understood it as an
inspiration rather than "Analysis of self-posted blogs in thread X".)

------
kickscondor
If you’re interested in more about what these blogs contain, I review the
links in all these threads at
[https://kickscondor.com/hrefhunt/](https://kickscondor.com/hrefhunt/).

For example, here’s one from last year’s thread at this time:
[https://www.kickscondor.com/hrefhunt/6/](https://www.kickscondor.com/hrefhunt/6/)

~~~
dmvaldman
[off topic] FYI, I found this comment by subscribing to the RSS feed of your
HN comments on Fraidycat (by linking to
[https://edavis.github.io/hnrss/](https://edavis.github.io/hnrss/) for your
username)

Very cool!

~~~
kickscondor
Oh hey - just saw this. Thank you for this link - I’ve been looking for just
such a thing!

------
dewey
I think I can spot my blog in the Analytics section, I guess I'm the only one
using Gauges as analytics.

~~~
spalas
Ha -- I'm one of the 4 hosted with Caddy web server!

All of my other attributes were in the majority (Hugo, Google Analytics,
etc...)

------
tuananh
> Static Site Generators

How can that be detected? I'm curious.

~~~
Lammy
“generator” meta tag?
[https://html.spec.whatwg.org/multipage/semantics.html#meta-g...](https://html.spec.whatwg.org/multipage/semantics.html#meta-generator)

~~~
saagarjha
Jekyll only adds that if you use their “SEO” plugin:
[https://github.com/jekyll/jekyll-seo-tag/blob/0943563d0aac60...](https://github.com/jekyll/jekyll-seo-tag/blob/0943563d0aac6013187c742f38b90fc576075fda/lib/template.html#L6).
(I don’t use that plugin, so I add it manually to mine.)

~~~
Etheryte
If I'm not mistaken, Jekyll also adds it in the RSS feed metadata, if you use
that plugin, but that's already a bit far-fetched.
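The article doesn't say how generators were detected, so this is only a sketch of the meta-tag approach suggested above, using Python's standard-library `html.parser` to collect the `content` of any `<meta name="generator">` tag. As noted in this subthread, many Jekyll sites omit the tag, so an empty result means "unknown", not "no generator". The class, function, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class GeneratorMetaParser(HTMLParser):
    """Collect the content of every <meta name="generator"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator" and attr_map.get("content"):
            self.generators.append(attr_map["content"])


def find_generators(html: str) -> list[str]:
    """Return generator strings found in the HTML (possibly an empty list)."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.generators


# Hypothetical page; the version string is made up for illustration.
page = '<html><head><meta name="generator" content="Hugo 0.68.3"></head></html>'
print(find_generators(page))  # ['Hugo 0.68.3']
```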

------
mshafer
That’s a helpful list of Google Analytics alternatives to check out, thanks!

There are 13 websites using Parse.ly, which starts at $500 per month! For a
personal blog??

~~~
jacquesm
The vast bulk of those are Medium sites; possibly a volume deal with Medium?

[http://robotoverlordmanual.com/](http://robotoverlordmanual.com/)
[https://medium.com/@andzwa](https://medium.com/@andzwa)
[https://medium.com/build-ideas](https://medium.com/build-ideas)
[https://medium.com/chrismarshallny](https://medium.com/chrismarshallny)
[https://medium.com/ing-blog/tech/home](https://medium.com/ing-blog/tech/home)
[https://medium.com/@matthagy](https://medium.com/@matthagy)
[https://medium.com/modern-nlp](https://medium.com/modern-nlp)
[https://medium.com/open-factory](https://medium.com/open-factory)
[https://medium.com/@romanorac](https://medium.com/@romanorac)
[https://medium.com/@soatok](https://medium.com/@soatok)
[https://medium.com/@zakjan](https://medium.com/@zakjan)
[https://sidstechcafe.com/](https://sidstechcafe.com/)
[https://vladaionescu.com/](https://vladaionescu.com/)
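One quick way to check that observation is to tally the 13 Parse.ly URLs above by hostname. This sketch uses `urllib.parse.urlsplit`; `count_hosts` is a hypothetical helper, and stripping a leading `www.` is an arbitrary normalization choice.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(urls: list[str]) -> Counter:
    """Tally URLs by lowercase hostname, ignoring a leading 'www.'."""
    hosts = Counter()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        hosts[host] += 1
    return hosts


# The Parse.ly sites listed in the comment above.
parsely_sites = [
    "http://robotoverlordmanual.com/",
    "https://medium.com/@andzwa",
    "https://medium.com/build-ideas",
    "https://medium.com/chrismarshallny",
    "https://medium.com/ing-blog/tech/home",
    "https://medium.com/@matthagy",
    "https://medium.com/modern-nlp",
    "https://medium.com/open-factory",
    "https://medium.com/@romanorac",
    "https://medium.com/@soatok",
    "https://medium.com/@zakjan",
    "https://sidstechcafe.com/",
    "https://vladaionescu.com/",
]
print(count_hosts(parsely_sites).most_common(1))  # [('medium.com', 10)]
```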

------
forrestthewoods
Neat. Here’s my blog if you do another run.
[https://www.forrestthewoods.com/blog/](https://www.forrestthewoods.com/blog/)

It’s artisanally hand crafted HTML with a little VanillaJS on a few pages. No
static generator used. Also hosted on Netlify. Although I use BunnyCDN for
large media. I post very infrequently.

------
mnemnc
I’m surprised and a bit disappointed that only 4 of them use Matomo. I assumed
this to be the quasi-standard alternative to Google...

~~~
tiborsaas
It's not an alternative if it's not free.

~~~
jotm
The self-hosted version is free.

------
vector_spaces
I suspect that the larger-than-one-might-expect representation here for Erlang
and Cowboy (an Erlang web server) is caused by sites hosted on Heroku, which
would return Cowboy/Vegur in the Server header, rather than because many HN
users are actually maintaining Erlang application servers to serve their
blogs.

~~~
40four
Interesting, I didn’t know Heroku used that stuff. It could also be from
Elixir/Phoenix websites, right?
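If the Server header is the signal, a cautious classifier would treat `Cowboy` as ambiguous rather than as proof of an Erlang blog. The helper below is a hypothetical sketch: it keeps only the product token before the first `/` and, following the two comments above, labels Cowboy as either Heroku's router or an Erlang/Elixir app.

```python
from typing import Optional


def interpret_server_header(server: Optional[str]) -> str:
    """Give a cautious reading of an HTTP Server header value.

    'Cowboy' is ambiguous: it may come from Heroku's router (Cowboy/Vegur)
    or from an Erlang or Elixir/Phoenix application.
    """
    if not server:
        return "unknown"
    # "nginx/1.18.0" -> "nginx"; "Apache/2.4.41 (Ubuntu)" -> "apache"
    product = server.split("/", 1)[0].strip().lower()
    if product == "cowboy":
        return "cowboy (Heroku router, or an Erlang/Elixir app)"
    return product


for header in ["Cowboy", "nginx/1.18.0", None]:
    print(header, "->", interpret_server_header(header))
```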

------
xwdv
I would like to see a word frequency count gathered from all the articles
written on all these blogs and see if there’s any interesting patterns.

Bonus if you can segment the data by date as well so we can see trends over
time.

A person that builds such a system would have access to some pretty useful
data.
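A rough sketch of the word-frequency idea, bucketed by month. It assumes the posts have already been fetched and dated (the `(date, text)` pairs below are made up), and `word_counts_by_month` is a hypothetical helper. The tokenizer is a naive lowercase regex with no stopword removal, so common words like "is" will dominate without further filtering.

```python
import re
from collections import Counter, defaultdict
from datetime import date

WORD_RE = re.compile(r"[a-z']+")


def word_counts_by_month(posts):
    """posts: iterable of (published_date, text). Returns {(year, month): Counter}."""
    by_month = defaultdict(Counter)
    for published, text in posts:
        words = WORD_RE.findall(text.lower())
        by_month[(published.year, published.month)].update(words)
    return dict(by_month)


# Hypothetical posts for illustration only.
posts = [
    (date(2020, 3, 2), "Rust is fast. Rust is fun."),
    (date(2020, 4, 8), "Hugo or Jekyll? Hugo."),
]
counts = word_counts_by_month(posts)
print(counts[(2020, 3)].most_common(2))  # [('rust', 2), ('is', 2)]
```

Comparing the same word's count across month buckets gives the "trends over time" view.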

------
thanksforfish
> ended up copy and pasting all of the text from the entire post and then
> using regex on the command line to spit out a list of URLS.

I had a chuckle. This is how so much data analysis happens in practice.
Nothing like the command line for quickly cleaning some data.

Great work!
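The article only says it used regex on the command line; the exact pattern isn't given. Here is a hypothetical Python equivalent: it matches `http(s)://` followed by non-space characters, trims trailing punctuation, and deduplicates, which also collapses the `[url](url)` pairs that HN exports contain. It is a loose heuristic, not an RFC 3986 parser (for example, it stops at parentheses, so some Wikipedia URLs get truncated).

```python
import re

# Stop at whitespace, angle brackets, brackets, parentheses, and quotes.
URL_RE = re.compile(r"https?://[^\s<>()\[\]\"']+")
TRAILING = ".,;:!?"


def extract_urls(text: str) -> list[str]:
    """Pull http(s) URLs out of pasted text, deduplicated in first-seen order."""
    seen = {}
    for match in URL_RE.findall(text):
        url = match.rstrip(TRAILING)  # drop sentence punctuation after a URL
        seen.setdefault(url, None)    # dict keeps insertion order
    return list(seen)


sample = "See https://example.com/blog. Also [https://a.io/x](https://a.io/x) and http://b.org!"
print(extract_urls(sample))
```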

------
foxhop
I missed the original HN post, added my comment to the thread. Great work on
the analysis.

------
BigBalli
Thank you for putting this together!

Surprising insights: many actually use Google AdSense, and nginx over Apache.

I'd definitely be curious about: PageSpeed Insights (score, load time and size),
post frequency/length over the past 12 months, external link density.

~~~
BigBalli
I answered some of my own questions:

    {
      "time": {
        "minTime": 0.7,
        "maxTime": 53.5,
        "meanTime": 5.5,
        "medianTime": 3.1
      },
      "size": {
        "minSize": 1,
        "maxSize": 33484,
        "meanSize": 1536,
        "medianSize": 1565
      }
    }
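Summary figures like these can be computed with Python's `statistics` module. The sample values below are hypothetical, the units aren't stated in the comment, and rounding to one decimal is an arbitrary choice; `summarize` is a hypothetical helper that only mirrors the key naming of the JSON.

```python
import statistics


def summarize(values, label):
    """Return min/max/mean/median keyed like the JSON above, e.g. 'minTime'."""
    return {
        f"min{label}": min(values),
        f"max{label}": max(values),
        f"mean{label}": round(statistics.mean(values), 1),
        f"median{label}": round(statistics.median(values), 1),
    }


load_times = [0.7, 2.4, 3.1, 4.0, 53.5]  # hypothetical samples, units unknown
print(summarize(load_times, "Time"))
```

A single slow outlier pulls the mean well above the median, which is why reporting both is useful.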

------
jamieweb
It would also be interesting to analyse the domain name registrars and DNS
hosting providers used.

Based on my experience with the HN crowd, I'd predict that Gandi + Cloudflare
would be a common one, with Namecheap closely behind.

------
stevekemp
Looks like the link to the CSV file is broken, which is a shame.

~~~
progre
It's here:
[https://www.dannysalzman.com/files/hn-blogs.csv](https://www.dannysalzman.com/files/hn-blogs.csv)

------
shortformblog
One interesting thing that this didn’t catch was my CMS, Craft—which I think
is a signifier that Craft didn’t leave a lot of fingerprints.

------
jacobpake
It would be interesting, in a future part of this series, to cross-reference
the performance data with the technology used.

------
djhworld
Not sure if I added my blog late to that thread but it's not in the dataset.

Granted there were a lot of URLs in that post!

------
stblack
It's interesting, and telling, that Windows/IIS doesn't appear at all.

Or did I miss something?

~~~
ComputerGuru
We run all our websites on ASP.NET Core, but we have a forum and blog which
more or less mandate MySQL. We played around with PHP on IIS [0]
for many years before giving up. We also tried the opposite and hosted our app
on Mono before giving up on _that_ due to bugs that would randomly cause
compilation errors [1].

Long story short: two completely separate backends, each running on the most
reliable platform for its stack. And nginx is in front of it all. Ping the
root domain and you’ll think you are on a bog-standard Linux/nginx config,
even though it definitely is not.

So don’t assume no IIS!

[0]:
[https://neosmart.net/blog/s=php+iis](https://neosmart.net/blog/s=php+iis)

[1]:
[https://neosmart.net/blog/tag/mono/](https://neosmart.net/blog/tag/mono/)

------
k__
I would have thought that more blogs were on dev.to than these technology
analytics suggest.

Their stack is Rails+Preact.

------
bobbydreamer
Nice work.

