I think my favorite part is just how short some of these articles really are once you remove all the nonsense and extra crap in the web pages.
Some articles are actually... 8 sentences. That is it. How on earth does it then take 10 seconds to scroll and parse all the fake inserts to finally realize that this is a poorly researched snippet masquerading as news...
> I think my favorite part is just how short some of these articles are...
In an attempt to enjoy this effect more broadly, I have Reader Mode set to enabled by default on Safari mobile.
On Firefox desktop I often use the Reader View button on news stories. There is an extension to enable this by default, but I have a hard time trusting browser add-ons.
This is also a great way to work around many paywalls, as when reader mode is on by default, it kicks in before the JavaScript that throws up the paywall.
It’s reasonable to have short articles in a newspaper, where you want to fit in a bunch of short factual stories, or on a newswire where you just want to quickly send out some facts before competitors. In the former case you just put lots of stories on one page. In the latter case, it’s often expected to be short.
I guess a lot of the BS filler text is added just so they can fit more ads around the text, huh? The goal for online publications isn't to inform you and save you time... It's to make you click on ads. They want to have you see as many ads as possible, and make sure that you stay on the page as long as possible. If the news article was just a one-paragraph snippet, you couldn't fit 8 ads on the page, it would look absolutely laughable.
A newspaper or news website would source most of their journalism from Reuters or Associated Press, and then fluff it up to fit their editorial stance.
The same articles would seem longer in print because they're formatted in such narrow columns, wrapping around images. There's some thought that goes into the layout.
Of course, the internet breaks that particular illusion. And I'm sure that if a marketing department could do to their printed paper what they do to their website, design and readability be damned, they would jump at the chance.
Nostalgia really kicked in here - seeing things like this for the first time, and feeling the unfurling of the future and a thousand new ideas in front of you, something so new it went beyond all of your sci-fi expectations, and yet was so real.
I feel so incredibly fortunate to have been old enough to see and understand the start of all of this, and later, to be a part of all of it.
I don’t think I had any feelings of nostalgia related to this at all, even though it must have been about when I started using the web. I guess I was still too young to get really attached to it.
When I looked at it before it had a Netscape-gray background; now it's white (updated? different browser? experiment?) and so doesn't capture that same feeling at all
Great! For even better results, please set the background-color to `#C0C0C0` (the Netscape default). However, I'm not sure whether this was also the default on Windows.
Agreed. Blue text on white background is jarring. And I was wondering what the original Netscape Grey was!
Argh. Hoping "Godzilla vs Kong" reviews were going to be better. When will Hollywood learn the secret to a good kaiju film = less humans, more monsters ;)
It's so refreshing to have pages load instantly. Websites get so bogged down with loading resources from 12 different places. It'd be nice if a static webpage was the default and every change that slowed down loading was explicitly laid out to stakeholders in terms of marginal load time and resources required.
What's sad about that page and the articles it links to is that it could be soooo much better designed and still be clean and fast.
The main list could have dates and times and categories, so it's not just a dump of text links.
Each article could have a reasonably sized image or two without compromising the load speed.
Finally, a single "sponsored by" link could be included in the page to provide revenue via advertising.
It's insane that media companies feel that their sites need to be a bloated mess or barely there, and nothing in between.
Regardless, the fact that the URL isn't served via HTTPS is an indicator to me that this is a forgotten service that will eventually disappear the next time CNN does an overhaul of its web servers.
I guess the moment you start adding those "features" the whole process starts again and in six months you might end up where we are today if you let your feet (fingers) run (type) away with you.
Chrome has a "reader mode" semi-hidden feature which does this to any article on any site, and in my experience works perfectly 99% of the time. On mobile it's a huge saver, especially since Chrome by default doesn't have ad-block, and articles on mobile nowadays have become utterly unreadable with all the crap they throw at you.
This is another reason I like AMP in general, despite all of its issues, it generally is much cleaner [1] than the non-amp alternatives [2]
Cool project! My only complaint is that it defaults the font to Courier (monospaced), which is harder to read and takes more space. You might use Georgia, which is web safe and pretty legible. Or, of course, Times New Roman (just like the site in this post).
You're right, but I think this depends on the browser too. I actually pass a stylesheet in the HTTP response header to make text appear white on a black background. Firefox respects this, but Chrome doesn't (at least for plain text files):
Will have to test again when I get some time to see what the options are. Might be a bit of a hack as there really aren't any HTML elements to target, so it might be that Firefox applies the CSS after inserting the text into an HTML template.
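The mechanism being described is the HTTP `Link` response header with `rel="stylesheet"`, which Firefox honors even for `text/plain` responses while Chrome does not. Here's a minimal sketch of the idea in Python (hypothetical code, not the site's actual implementation; the `/dark.css` path is made up):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def text_response_headers(css_path="/dark.css"):
    """Headers for a plain-text response that also suggest a stylesheet
    via the HTTP "Link" header (RFC 8288)."""
    return [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Link", f'<{css_path}>; rel="stylesheet"'),
    ]

class TextHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        for name, value in text_response_headers():
            self.send_header(name, value)
        self.end_headers()
        # Firefox will apply /dark.css when rendering this text;
        # Chrome shows its default plain-text styling instead.
        self.wfile.write(b"hello, plain text\n")

# To try it locally: HTTPServer(("", 8000), TextHandler).serve_forever()
```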
I'm curious where the data gets fetched from. The author mentions that Mozilla Readability and SimplePie are used.
Readability to parse the content, SimplePie to fetch the data (I assume). Data from RSS feeds?
In case you want to make something similar, I recently wrote a blog post on where you can get news data for free [1].
(self-promo) I'd recommend taking a look at my Python package to mine news data from Google News [2]. Also, in 3 days we're releasing a completely free News API [3] that will support ~50-100k top stories per day.
Thanks for the link. Technical part from that interview:
> On a technical level, the site obtains stories through the existing Google News RSS feed, which are then processed with some PHP trickery. "Google News has a very nice RSS feed, for each topic, language and country. So I thought I could connect to that feed, and write some code to simplify the result way down to extremely basic HTML, targeting only tags available in the HTML 2.0 specification from 1995," said Malseed.
> "So I used a PHP library called SimplePie to import the feed, and wrote some PHP code to simplify the results into a nice front page, using Netscape 2.0.2 on my 1989 Mac SE/30 to make sure it loaded fast and looked nice. The articles were a little more difficult, because they are on all sorts of different news sites with different formatting.
> "So I found that Mozilla has an open-source library called Readability, which is what powers Firefox's reader mode. I used the PHP port of this, and then wrote a proxy that renders articles through Readability, and then I added some code to strip the results down even further to extremely basic HTML."
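That last stripping step can be sketched roughly as follows. This is a from-scratch Python illustration, not the author's actual PHP code, and the tag allowlist is an assumption loosely based on the "HTML 2.0" framing:

```python
from html.parser import HTMLParser

# Assumed allowlist: a representative subset of HTML 2.0-era tags.
BASIC_TAGS = {"html", "head", "title", "body", "h1", "h2", "h3", "p",
              "a", "ul", "ol", "li", "pre", "blockquote", "b", "i"}

class BasicHTMLFilter(HTMLParser):
    """Drop every tag not in the allowlist (keeping its text content),
    and drop script/style content entirely."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
        elif tag in BASIC_TAGS:
            href = dict(attrs).get("href")  # keep only href, so links survive
            self.out.append(f'<{tag} href="{href}">' if href else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
        elif tag in BASIC_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(data)

def strip_to_basic_html(html):
    """Reduce arbitrary HTML to the basic-tag subset above."""
    f = BasicHTMLFilter()
    f.feed(html)
    return "".join(f.out)
```

A real version would also need to rebalance tags and resolve relative links, but the core idea is just an allowlist pass over the parse events.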
Looks fine? It's a big blob of undistinguished text; if you squint your eyes, everything looks the same. The lack of column widths means your eyes have to do a lot of scanning, especially on a bigger/wider screen, and the lack of color/font/size differentiation means there's almost no information hierarchy to it. While we all might complain about advertisements and loading times, I for one am very glad we have moved on so significantly.
I assembled a similar decruftifier for the Washington Post specifically, using html-xml-utils (https://www.w3.org/Tools/HTML-XML-utils -- and some sed/awk) to strip only core article content & metadata (head, byline, dateline). Result was typically <5% of original HTML.
I've come to realise that most online commercial publishing does not even use bold within body text, giving another filter trigger for stripping cruft.
Viewing the source of that webpage really takes you back. Plain old HTML. It's nostalgic beauty.
I recently fixed up an old 486 I purchased off eBay, but it was bittersweet when I managed to get it connected to the 'net. Most websites were inaccessible due to the lack of support for today's encryption protocols, and those that were accessible had numerous JavaScript issues.
Were you just using the browser/OS it came with, I guess like Windows 3.1 or 95?
Given their well-publicized insistence on building for a ton of obscure arches, I'd expect you could run modern Debian on such a machine no problem, with a modern web browser. Might be a little slow, especially if you stick with the original disk, but should be perfectly usable.
> I'd expect you could run modern Debian on such a machine no problem...
Nope. Current builds of i386 Debian require a Pentium Pro or later -- I believe it's because they're compiled with the CMOV instruction, which wasn't present in the Pentium or earlier.
Ah, good point. Looks like Debian Jessie might have worked for it? In any case, distrowatch suggests a few others like Alpine and TinyCore that might have the proper support.
Nitpick 3: The <meta> tag is only a band-aid for shitty webhosting where you cannot access the webserver config to make it send the correct Content-Type in the actual HTTP response headers. The modern <!DOCTYPE html> instead implies a default of UTF-8 which works well for most.
Nitpick nitpick: the html doctype doesn't imply UTF-8. Valid modern HTML documents must be encoded using UTF-8, but the standard also requires that the encoding be specified somehow.
> The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it... If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state.
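The ordering the spec describes (BOM first, then Content-Type metadata, then a meta declaration) can be sketched as a small function. This is a simplified illustration, not a browser's actual sniffing algorithm:

```python
import re

def sniff_encoding(body, content_type=""):
    """Simplified HTML encoding detection: a BOM wins, then Content-Type
    metadata, then a <meta charset=...> in the first 1024 bytes. Returns
    None if nothing declares an encoding (a real browser would then fall
    back to heuristics and a locale-dependent default)."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"                                  # UTF-8 BOM
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"                                 # UTF-16 BOMs
    m = re.search(r"charset=([\w-]+)", content_type, re.I)
    if m:
        return m.group(1).lower()                       # HTTP header wins next
    m = re.search(rb'<meta\s+charset=["\']?([\w-]+)', body[:1024], re.I)
    if m:
        return m.group(1).decode("ascii").lower()       # finally, in-document meta
    return None
```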
Oh I actually quoted the website's source. They have that DTD meta crap in there.
But I think you can just do <html> nowadays and it empirically just works. Seriously, screw the anti-DRY people that want me to put some !DOCTYPE or xmlns tags with some W3C links or some DTD nonsense inside ... I should only have to specify "html" exactly once, no more.
If I had designed the spec, I would have just made it a plain `<html>` tag.
Incredibly more readable, and memorizable. A markup language (literally), by virtue of being a markup language, should not be impossible to memorize. Making scary strings like "-//W3C///DTD" part of the spec is counterproductive.
That's SGML tag inference at work (theoretically at least, since browsers have HTML parsing rules hardcoded). SGML knows, by the DOCTYPE declaration, that the document must start with an "html" element, so it infers it if it isn't there. Next, by the content model declared for the "html" element (normally obtained via the ugly public identifier that sibling comments complain about), a "head" element is expected, so SGML infers it as well if it's declared omissible, and so on.
In the "old days", web pages were often just the bare content (no html, head, body containers, no DOCTYPE declaration). A few sites also featured just the body tag (and respective content) for setting the background attribute for the page background color.
E.g., this is the entire code of Netscape's first home page:
<TITLE>Welcome to Mosaic Communications Corporation!</TITLE>
<CENTER>
<A HREF="MCOM/index2.html"><IMG SRC="MCOM/images/mcomwelcome1.gif" BORDER=1></A>
<H3>
<A HREF="MCOM/index2.html">Click on the Image or here to advance</A>
</H3>
</CENTER>
At the same time, it's interesting to see those tags used for what otherwise looks like a pretty un-styled page.
Like, part of the premise of CSS was progressive enhancement, where just the semantic structure alone would provide an adequate experience with however the browser might choose to render those elements by default. Basically my question is, if the font size tags were taken out and just bare h3/h4/p used instead, would that still render a usable page on Netscape 1.1? Could you then supply font overrides via a <style> tag in the header which could be applied by later browsers?
Obviously it would be a different kind of experiment as the result would no longer be identical across all the "supported" browsers, but might be an interesting comparison point.
"Don't use tables, use CSS!" was a big message. But CSS's tools for tabular layout were extremely poor and difficult to use, leading to much frustration. It was a joke how hard it was to create a simple responsive three column layout in CSS, a thing easily accomplished with tables and very common on the web. Getting that three column layout right seemed like black magic in CSS1.
It was, but in hindsight maybe the message needed to be elaborated more. Perhaps it should've been "Don't use tables for layout, use CSS!"
Besides that, CSS IMO accelerated more complex and visually pleasing websites, and arguably spurred on the Web 2.0 look. Unfortunately, because the message didn't specify the "for layout" bit, it took a while for many devs to unlearn the idea that tables are bad – they aren't; they should just be used for tabular data.
But this was exactly the problem. They said "don't use tables", but then didn't offer anything that could do what you could easily do with tables. Instead you got many hours of arcane and confusing combinations of float: and clear: rules, especially if you were trying to avoid absolute positioning and wanted the page to be responsive.
I feel quite the opposite now, a table feels like so much typing compared to a quick flexbox layout. Even when I legitimately need a table, I dread it.
It's surprisingly tractable to plug 90s machines into the internet via Ethernet adapters or little serial gadgets that can do SLIP or pretend to be Hayes modems, but the modern web, full of crypto and execution environments that can bring even modern computers to their knees, is not kind to them.
This is great! I so don't miss the err, personalization and targeted content. If you can track down RSS news sources I'd recommend having a look at Newsbeuter https://github.com/akrennmair/newsbeuter
Neat project. I use NetNewsWire for rss. It’s interesting that the format people prefer to consume news in versus the format that news is delivered in are now so different.
Newspapers went through the same thing. The older papers are all stories and were funded through the price of the paper, then ads invaded the margins.
Based on the network requests, it's a single 24.88 KB HTML file hosted on nginx, served with a cache HIT status and a current age of 725 seconds. There's no CSS, JS, etc. The file is pure HTML from what I can tell, with no inline CSS/JS.
I achieved something similar for my site by putting the HTML file, regenerated every few minutes, in a single S3 bucket and putting Cloudflare in front with a caching page rule on the HTML too (by default, HTML is not cached by CF).
This is perfect! I have news.google.com blocked on my PC because I'm trying to block a bad habit of idly typing it in and getting sucked into the void. Came to HN and still found a way to get the news drip without as much distraction. :)
To play devil's advocate, I feel like those serif fonts were easier to read on a low-resolution monitor because they were sharper, with the pixels being very apparent.
Here, even on a 4K display, I find it difficult to read the headlines.
It would be great to get this working on each of our personalized Google News feeds. But I suspect it would require either a userscript or custom stylesheet, or Google sign-in (if that even gives you a personalized feed).
Subjective, but I wonder: if browsers had defaulted to a sans-serif font instead of serif, would people complain about "how bad this webpage looks"? I get the document origins, of course.
This is wonderful! As pages get more bloated and newer crypto is required for HTTPS, my old computers lose access to more and more of the web. Bookmarking this to browse from Mac OS 9 later today.
This is great! And retro, your channel seems cool. I think I have a 3Com Audrey Ergo in the closet, if my wife didn't toss it. If you're interested, I'll send it to you.
I'm going to hazard a guess that, because ancient browsers sometimes displayed HTML tags they didn't understand, the author has deployed a hack to ensure the correct character encoding is used on newer browsers without spoiling the rendering on older systems.
Also, I was hoping that 68k.news supported HTTP/1.0, but it doesn't: it's a virtual host on the IP, so it needs the Host: header set, which is HTTP/1.1. That's a bit of a shame, as it means truly original browsers such as Mosaic can't access it.
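The dependence on the Host header is easy to see by comparing the two request formats (a sketch; the host name is just the one from this thread, and real HTTP/1.0 clients could optionally send Host too):

```python
def build_request(host, path="/", version="1.1"):
    """Build a minimal GET request. HTTP/1.1 requires the Host header,
    which is what lets one IP address serve many virtual hosts; a bare
    HTTP/1.0 request omits it, so the server can't tell which of its
    sites the client wants."""
    lines = [f"GET {path} HTTP/{version}"]
    if version == "1.1":
        lines.append(f"Host: {host}")
        lines.append("Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```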