Hacker News | thyrox's comments

I've been eagerly awaiting access to the tool for quite a while.

I would definitely be willing to pay to try it out and provide feedback in addition. I'm genuinely surprised by this news.


I often think that the reason PHP is still so popular compared to Perl is that PHP never stopped evolving.

If PHP had paused development, or even slowed it down like this, it would also have been far less popular, as frameworks like Symfony and Laravel probably wouldn't even be possible with the older versions of PHP.

So thank you to the maintainers but I wish Perl evolved more quickly like PHP.


I actually prefer Perl's stability. And, anyway, it's a dynamic language: you can create stuff without requiring changes to the compiler.

For a long time I worked with Java and it became very tiresome to have to learn a new framework for every new job. Java was my main tool between versions 1.02 and 1.6.


PHP had a huge range of intrinsics specifically targeted at processing web input with web-like patterns and configuration. PERL is about one thing: munging text data with minimal syntax. Granted, you can do all sorts of things with it, but its main strength has been as a practical extraction and reporting language.


I feel that for music. During my teenage years I used to listen to the same songs over and over on my Walkman. I still remember 99% of the lyrics to whole albums. The music used to have such a strong connection.

Nowadays there are 1000+ songs in my playlist and I can't even recollect 20% of the lyrics, nor is there a chance in hell that I would listen to every single song on an entire album, let alone every single song by the same artist.

It's like, if I don't like the first 10 seconds, it's "hide song", and Spotify makes sure I never have to listen to it again. Even though some of my all-time favorites are songs I hated at first, but back then there was no hide-song button.

Sorry, I digress, but yeah, the connections you make in childhood are really something. I just hope it's my age and not the technology that's responsible for this, and that the youth of today feel the same connection too.


I still try to find time to sit down and listen to an album from start to finish - it's kind of like the musical equivalent of spending an hour in a gallery exhibition of a single artist's work.


Surprised the headline didn't mention it was a Boeing 777. Points to the BBC for not going for the obvious clickbait.


Why would the 777 be clickbait? 737 maybe.


The 787 has been in the news very recently for two possibly related incidents of sudden nose-down resulting in similar injuries. They aren't turbulence related, but writing "Boeing" and "injury" in the same headline is going to draw clicks.


"Boeing" would be the clickbait.


I clicked just to check if it was a Boeing, so is not mentioning Boeing also clickbait?


The bar is truly low if we're congratulating the media for that.


Very nice. Since HN data spawns so many fun projects like this, there should be a monthly or weekly zip file or torrent with this data, which hackers can just download instead of writing a scraper and starting from scratch every time.


It is very easy to get this dataset directly from the HN API. Let me just post it here:

Table definition:

    CREATE TABLE hackernews_history
    (
        update_time DateTime DEFAULT now(),
        id UInt32,
        deleted UInt8,
        type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
        by LowCardinality(String),
        time DateTime,
        text String,
        dead UInt8,
        parent UInt32,
        poll UInt32,
        kids Array(UInt32),
        url String,
        score Int32,
        title String,
        parts Array(UInt32),
        descendants Int32
    )
    ENGINE = ReplacingMergeTree(update_time) ORDER BY id;
    
A shell script:

    BATCH_SIZE=1000

    TWEAKS="--optimize_trivial_insert_select 0 --http_skip_not_found_url_for_globs 1 --http_make_head_request 0 --engine_url_skip_empty_files 1 --http_max_tries 10 --max_download_threads 1 --max_threads $BATCH_SIZE"

    rm -f maxitem.json
    wget --no-verbose https://hacker-news.firebaseio.com/v0/maxitem.json

    clickhouse-local --query "
        SELECT arrayStringConcat(groupArray(number), ',') FROM numbers(1, $(cat maxitem.json))
        GROUP BY number DIV ${BATCH_SIZE} ORDER BY any(number) DESC" |
    while read -r ITEMS
    do
        echo "$ITEMS"
        clickhouse-client $TWEAKS --query "
            INSERT INTO hackernews_history SELECT * FROM url('https://hacker-news.firebaseio.com/v0/item/{$ITEMS}.json')"
    done
It takes a few hours to download the data and fill the table.


May I hijack this thread for a related question? I love the public, up-to-date HN dataset.

I saw the recursive CTE blog post, but this doesn't seem to work on your HN dataset:

https://play.clickhouse.com/play?user=play#V0lUSCBSRUNVUlNJV...

Are recursive CTEs disabled on this instance, or am I doing something wrong?


Done, and now it works perfectly.


What was broken?


This is unclear to me, I will ask the author.


The reason is trivial: I disabled the new feature flag on the playground service long ago (when it was in development). I will re-enable it and send an example.


While trying the script, I am getting the following error:

<Trace> ReadWriteBufferFromHTTP: Failed to make request to 'https://hacker-news.firebaseio.com/v0/item/40298680.json'. Error: Timeout: connect timed out: 216.239.32.107:443. Failed at try 3/10. Will retry with current backoff wait is 200/10000 ms.

I googled with no luck. I was wondering if you have a solution for it.


It makes many requests in parallel, and that's why some of them could be retried. It logs every retry, e.g., "Failed at try 3/10". It will throw an error only if it fails all ten tries. The number of retries is defined in the script.

Example of how it should work:

    $ ch -q "SELECT * FROM url('https://hacker-news.firebaseio.com/v0/item/40298680.json')" --format Vertical
    Row 1:
    ──────
    by:     octopoc
    id:     40298680
    parent: 40297716
    text:   Oops, thanks. I guess Marx was being referenced? I had thought Marx was English but apparently he was German-Jewish[1]<p>[1] <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Karl_Marx" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Karl_Marx</a>
    time:   1715179584
    type:   comment
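
The retry behavior described above (log each failed try, raise an error only when every try has failed) can be sketched in plain shell. This is an illustration, not ClickHouse's implementation: the function name `retry_with_backoff` is hypothetical, delays are whole seconds rather than milliseconds, and doubling the wait up to a cap is an assumption of this sketch.

```shell
# Hypothetical helper: run a command up to max_tries times, waiting
# between attempts. The wait doubles after each failure, up to max_delay.
# Usage: retry_with_backoff MAX_TRIES DELAY MAX_DELAY command [args...]
retry_with_backoff() {
    max_tries=$1
    delay=$2
    max_delay=$3
    shift 3
    try=1
    while :; do
        "$@" && return 0                  # success: stop retrying
        if [ "$try" -ge "$max_tries" ]; then
            echo "Failed after $try tries: $*" >&2
            return 1                      # error only after the last try
        fi
        echo "Failed at try $try/$max_tries. Will retry in ${delay}s." >&2
        sleep "$delay"
        try=$((try + 1))
        delay=$((delay * 2))
        if [ "$delay" -gt "$max_delay" ]; then
            delay=$max_delay
        fi
    done
}
```

A call such as `retry_with_backoff 10 1 10 wget --no-verbose https://hacker-news.firebaseio.com/v0/maxitem.json` would log the intermediate failures to stderr and return non-zero only if all ten tries fail.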


Also, a proof that it is updated in real-time: https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...


There is a public dataset of Hacker News posts on BigQuery, but it unfortunately has only been updated up to November 2022: https://news.ycombinator.com/item?id=19304326


I have a daily updated dataset that has the HN data split out by months. I've published it on my web page, but it’s served from my home server so I don’t want to link to it directly. Each month is about 30 MB of compressed CSV. I’ve wanted to torrent it, but don’t know how to get enough seeders since each month will produce a new torrent file (unless I’m mistaken). If you’re interested, send me a message. My email is mrpatfarrell. Use gmail for the domain.


As a starting point, that project has Apache Arrow files. I don't know if they'll update them though.

https://github.com/wilsonzlin/hackerverse/releases/tag/datas...

The comments text table is 13 GB, to give you an idea. Can definitely be processed on a laptop.


I very much support this idea. Put them on IPFS and/or torrents. Put them on HuggingFace.


I’ve had this same thought but was unsure what the licensing for the data would be.


That's a nice idea.


The company I work for absolutely must have everyone using it, as it's supposed to be much safer than Windows. So I had to spend over $2k on a stupid laptop that won't even run half the apps without hacks like Rosetta. I also hate that there are 3 special keys and that command and ctrl are separated.

I am myself a Linux user and using a Mac makes me appreciate Linux so much more. It could be the #1 reason I want to start my own business so I don't have to use a Mac anymore.


> I also hate that there are 3 special keys and command and ctrl are separated.

Having cmd and ctrl distinct keys is great in a terminal though, no more conflict between "copy" and "kill".

Also, you should not "hate" stuff like this; just accept that there are differences between all OSes. They’re tools. There are tons of things that annoy me in tools I use (including macOS, which I still use voluntarily), but it’s just easier to embrace a tool and use it as intended than to try to mold it into something it’s not.


Rosetta’s not a hack.


On one hand, I agree that this should be flagged and taken off the homepage (as it is right now). On the other hand, more people need to be aware of such scams and simply not trust Reddit, HN, etc. as good sources of information anymore.


I have been seeing this guy David Sinclair a lot in my YouTube feed lately, with rave reviews. But after reading your comment I did a bit of research, and my god, what a scam(1).

There is so much misinformation and gaming going on in every industry, and with AI-generated content it's going to get so much easier.

(1) https://www.youtube.com/watch?v=Xn0EJQPyxkA


For every person like you who is on HN and willing to do followup research and change their mind, there are at least a hundred who will accept the first thing they're told at face value.

I mean, how could a Harvard Professor be a scammer ... right?!


Guess it's time to jump to macOS finally.

I've always wanted to try that but due to prohibitive costs never did.

But with more and more Windows shenanigans, I guess it's time to bite the bullet and make the switch.


I don't think the right solution for feeling hostage to a proprietary OS is to switch to another proprietary OS.


As someone with about 15 years of exclusive (and then on-and-off) Windows experience and then 10 years of Mac experience, I'm finally contemplating moving to Linux. Before I moved to Mac I had already tried to use open source, multi-platform apps (LibreOffice and Thunderbird, for example), so the switch was relatively painless. I also stopped playing that many games (age and Mac limitations). I also use a lot of Linux (mostly servers and the Mac shell), and I got a Steam Deck recently, and oh gosh, it's so nice: everything just works, you can run virtually any game, and it's not as imposing as Windows or Mac.


Thanks for sharing this link. My god there is something about this writing style that's just incredibly entertaining to read.

Most of the time I get so bored reading such long articles, like the New Yorker, ugh. But this is the first time I've enjoyed such a lengthy read! I hope I can decipher what the author is doing to make reading so much fun.


Literally all of his posts are like this. It’s crazy.

These two are more life-advice-y, but he also does some fairly deep dives on various problems he sees with scientific research (and psychology in particular), and his writing is no less entertaining.

In fact, one of his posts is actually about how he thinks scientists should make more effort to make their papers fun to read!

It’s fantastic, and I also hope to emulate it.


Probably because the author is not trying to appeal to a corporate organization but just writing in his own tone of voice.


> Most of the time I get so bored reading such long articles, like the New Yorker ugh.

I must be boring AF; I usually like the New Yorker's articles.

