Heck, with the cost of storage so low, recording every webpage you ever visit in searchable format is also very realistic. Imagine having the last 30 years of web browsing history saved on your local machine. This would especially be useful when in research mode and deep diving a topic.
EDIT: I forgot to mention https://github.com/webrecorder/webrecorder (the best general purpose web recorder application I have used during my previous research into archiving personal web usage)
Since starting with replacing bookmarks, I've moved other forms of reference info in there, and now have a whole GTD setup there as well, which is extremely handy since I can search in one place for reference info and personal tasks (past and future). Only downside is I'm dependent on Evernote, but hopefully it manages to stick around in some form for a good while, and if it ever doesn't, I expect I'll be able to migrate to something similar.
I was an Evernote user when I was on macOS. When I switched to Linux, a proper web clipper was something I really missed. I'm now on Joplin and it does everything I used to use Evernote for and then some.
It even has vim bindings now!
As far as longevity goes, I think they got their archive / backup format right - it's just a tarball with markdown in it.
Also people need to move away from those esoteric reactjs, angular, vuejs and plethora of CMS as API or static site generators relying on some js framework which won't last even 2-3 years. Use a static site generator which can generate a plain html, like static site generators built on pandoc, python docutils or similar.
Personally I like reStructuredText as the preferred format for content, as it's a complete specification and plain text. So the only thing in this article I would change is that content can also be in rst format, with html then generated from it. Markdown is not a specification, as each site implements its own markdown directives, unlike the reStructuredText specification, and most of the parsers and tooling are a little different from each other.
Not by that name... https://commonmark.org/
If you are in the reStructuredText world, there is one specification and all implementations adhere to it, be it pandoc, Sphinx, Pelican, or Nikola. The beauty of it is that it has extension mechanisms which give each tool enough room to build on it, yet the markup can still be parsed by any tool.
It's better than "designed by a committee" standards, but it lacks elegance or maybe craftsmanship.
You don't really get bit by its lack of a standard and extensibility until after you've bought in.
It's essentially designed by the opposite of a committee -- rather than including everything but the kitchen-sink, it contains support for almost no usecases except the one. Which is very appealing, when you only have the one usecase.
So markdown needs to thank the popularity of Wikipedia for its success, as rst did not have any application like Wikipedia. But rst is still used widely enough thanks to its killer apps, Sphinx and readthedocs, and it's now the de facto documentation markup in Python and much of the open source software world.
I have used rst intensively on a project. A few years later, I would be hard pressed to write anything in it and would need to start with a Quick Start tutorial. With all its faults, Markdown is simple enough that it can be (and is) used anywhere, so there is no danger of me forgetting its syntax (even if it wasn't much simpler to start with).
So personally I would prefer md over rst anytime.
Joplin is free and open source.
Joplin is open source, which is a big part of the sell to me. It definitely isn't the best of all possible note taking systems that could ever exist, but it's the best open source one I've found so far, and I don't have time to write a better one at the moment.
> why not build it into browsers. I have seen Firefox and Chrome can download web pages. So it will be nicer if they can download the bookmarked pages and store in a local html, css, image folder. I think it's pretty easy to achieve
This is solving a different problem though. WARC/MHT and other solutions can do this. Joplin is more of a note taking system that allows ingesting content from the web into one's own local notebook, which is relevant to what the GP post was talking about - Evernote.
However, it would seem that "the modern web" is the now popular standard. 10 years ago it might have been Flash or Java web applets or whatever. Now it's JS. I'm not convinced that JS is any better than what it has replaced. However, people keep paying developers to write them, so presumably someone likes them.
> Also people need to move away from those esoteric reactjs, angular, vuejs and plethora of CMS as API or static site generators relying on some js framework which won't last even 2-3 years. Use a static site generator which can generate a plain html, like static site generators built on pandoc, python docutils or similar.
Agreed, but that's also not a problem that Joplin, Evernote, or any other such tool is going to be able to solve. Unless you are complaining that Joplin is an Electron app? That's my biggest issue with it personally. It runs well enough, but is definitely the heaviest application I use regularly, which is a little sad for a note taking program. On the other hand, I haven't found a better open source replacement for _Evernote_. There are lots of other open source note-taking programs though.
> Personally I like reStructuredText as the preferred format for content, as it's a complete specification and plain text. So the only thing in this article I would change is that content can also be in rst format, with html then generated from it. Markdown is not a specification, as each site implements its own markdown directives, unlike the reStructuredText specification, and most of the parsers and tooling are a little different from each other.
reST is indeed very nice. At one point, I kept my personal notes as a Sphinx wiki with everything stored in reST. I found this to be less ergonomic than Evernote/Joplin, although in principle it could do all the same things that Joplin can do, and then some.
Safari does this. Pages added to the reading list have their content archived for offline reading.
I have used Evernote and OneNote, but have finally, after a long interim period, resorted to using only markdown.
I have a "Notes" root folder and organize section groups and sections in subfolders. VSCode (or Emacs), with some tweaks, shortcuts, and extensions, provides a good-enough markdown editing experience. Like an extension that allows you to paste in images, storing it in a resources folder in the note's current location (yes, I see small problems with this down the road when re-organizing, but nothing that can't be handled).
For Firefox, I use the markdown-clipper extension the few times I would like to copy a whole article, it works well enough. Or I copy/paste what I need; mostly, I take my own summarized notes.
For syncing, I use Nextcloud, which also makes the notes available both for reading and editing on Android and iOS (I use both).
Up until very recently, I used Joplin, which also uses markdown, but there were two things I could not live with: it does not store the markdown files with a readable filename, e.g., its title, and being tied to a specific editor.
If you are mostly clipping and not writing your own notes, I can imagine my setup won't work well, or be very efficient.
I want to use a format that has longevity, and storing in a format that I cannot grep is out of the question.
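The "can I grep it" test is also easy to lean on from a script. A minimal sketch, assuming the notes are plain .md files somewhere under one root folder; the function name `search_notes` is made up for illustration and is not part of any tool mentioned here:

```python
from pathlib import Path


def search_notes(root, needle):
    """Return (path, line number, line) for each case-insensitive match in *.md files under root."""
    needle = needle.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        # errors="replace" keeps the search going if one file has an odd encoding
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Something like `search_notes("Notes", "nextcloud")` would then list every matching line with its file and line number, which is roughly what `grep -rin` gives you on the command line.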
You bookmark in Pinboard or Delicious and ArchiveBox saves the page. Handy.
I had recently participated in a discussion on the problem of forgetting bookmarks.
Copying my workflow from there,
1. If the entire content should be easily viewed, then store via the Pocket extension.
2. If a partial content should be easily viewed i.e. some snippet with link to entire source, then store in notes (Apple).
3. If the content seems useful in the future, but it is okay to forget it, then I store it in the browser bookmarks.
But my workflow doesn't address the problem raised by Mr. Jeff Huang; if the Pocket app or notes disappear, so do my archives. I think a self-hosted archive as mentioned by the parent is the way to go, but I don't think it's a seamless solution for a common web browser user.
I frequently see something and want to try it out the next time I want to do something else. So I emulate User Agent strings and append lots of "like [common thing I search for a lot]" to the bookmark. When I start typing into the search bar for those other things I'll be reminded of the bookmark.
For example, since file.io is semi-deprecated I decided to try out 0x0.st . But I kept forgetting when I actually needed to transfer a file, so I made a bookmark titled "0x0.st Like file io".
As a side note, I have a similar bash function called mean2use that I use to define aliases that wrap a command and ask me if I'd like to do it another way instead or if I'm sure I want to use the command. I've found this is a nice way to retrain my habits.
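For anyone who wants the same idea outside bash, here is a rough Python analogue. It is only a sketch of the behaviour described above, not the actual mean2use function; the `ask` and `run` parameters are assumptions added so the prompt and the command execution can be swapped out:

```python
import subprocess


def mean2use(command, alternative, ask=input, run=subprocess.call):
    """Wrap `command` so the user is reminded of `alternative` before it runs."""
    def wrapper(*args):
        answer = ask(f"Did you mean to use {alternative} instead of {command}? [y/N] ")
        if answer.strip().lower().startswith("y"):
            # the user changed their mind, so do not run the old command
            print(f"OK, not running {command}; try {alternative}.")
            return None
        # anything else means "yes, I really want the original command"
        return run([command, *args])
    return wrapper
```

Usage would look like `scp = mean2use("scp", "rsync")` followed by `scp("file", "host:")`: answering "y" skips the command, anything else runs it as usual.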
Disclaimer: needgap is a problem validation platform I built.
It is true that it is proprietary software, but it is worth mentioning that all the content can be exported as an .enex file, which is xml.
So, the data can be easily exported.
Have you actually looked at such an xml: https://gist.github.com/evernotegists/6116886
Exported sure, it's all there. But importing that into your new favorite notes application is not going to be trivial, especially not for regular users.
That's why I've decided to stick mostly to regular files in a filesystem.
Joplin, for example, can import notes exported from Evernote. It's just a menu option that even regular users should have no trouble employing.
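To give a feel for what a do-it-yourself import involves, here is a minimal sketch that pulls titles, tags, and the raw note body out of an .enex export with only the standard library. It assumes the usual layout of `<note>` elements containing `<title>`, `<tag>`, and a CDATA-wrapped `<content>`; the function name `read_enex` is hypothetical, and converting the ENML body into Markdown (the genuinely fiddly part) is left out:

```python
import xml.etree.ElementTree as ET


def read_enex(xml_text):
    """Yield one dict per <note> found in an Evernote .enex export."""
    root = ET.fromstring(xml_text)
    for note in root.iter("note"):
        yield {
            "title": note.findtext("title", default=""),
            "tags": [t.text or "" for t in note.findall("tag")],
            # <content> holds ENML (an XHTML dialect) wrapped in CDATA
            "enml": note.findtext("content", default=""),
        }
```

Attachments, timestamps, and note attributes are ignored here, which is a fair illustration of why "it's all there" and "it's easy to import" are not the same claim.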
I know about DevonThink, and I've read good recommendations, but it's iOS/Mac only.
Any Evernote alternative for Win/Android with great search?
> Only downside is I'm dependent on Evernote
No special software nor database required.
2. it works with any browser
3. I can move it to any machine and use it
4. It is not transmitted to the browser vendor
5. Being a text file, it is under my control
6. I can back it up
7. I don't need some database manager to access it
8. I can add notes and anything else to the file
9. It's stupid simple
So simple! Thank you!
Edit: doesn't appear to be down, they're just using a self-signed cert.
And I still remember the modem days where I would download entire websites because the ISP charged by the hour, and I'd read them offline to save money.
I think this says something kind of profound about information and capitalism and whatnot.
Or in some random script: "from iridb import iri_index" "data = iri_index.get('https://some/url')" I'm skipping lots, you can ref by hash, url, url+timestamp. It hands back a fh, you don't know if the data you are reffing even fits in memory. Extensive caching, all the iri/url quirks, punycode, PSL etc.
Some random pdf in ~/Downloads, "import doc.pdf" and dmenu pops up, you type a tag, hit enter and the pdf disappears into the hash tree, tagged, and you never need to remember where you put it. Later on you only need to remember part of the tag, and a tag is just a sequence of unicode words.
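The "file disappears into a hash tree, find it again by tag" idea can be sketched in a few lines. This is not the uhashfs code, just an illustration under assumptions: SHA-256 as the hash, a two-level directory fan-out, and a single `tags.json` index, all of which are choices made up here. It also reads each file fully into memory, which the real thing evidently avoids.

```python
import hashlib
import json
import shutil
from pathlib import Path


def import_file(store, src, tag):
    """Move `src` into `store` under its SHA-256 digest and record `tag` for it."""
    store, src = Path(store), Path(src)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    # fan out by the first hex digits so no single directory gets huge
    dest = store / "objects" / digest[:2] / digest[2:4] / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        src.unlink()  # identical content is already stored
    else:
        shutil.move(str(src), str(dest))
    index_path = store / "tags.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    tags = index.setdefault(digest, [])
    if tag not in tags:
        tags.append(tag)
    index_path.write_text(json.dumps(index, indent=2))
    return digest


def find_by_tag(store, fragment):
    """Return stored paths whose tag contains `fragment` (case-insensitive)."""
    store = Path(store)
    index_path = store / "tags.json"
    if not index_path.exists():
        return []
    index = json.loads(index_path.read_text())
    fragment = fragment.lower()
    return [
        store / "objects" / d[:2] / d[2:4] / d
        for d, tags in index.items()
        if any(fragment in t.lower() for t in tags)
    ]
```

Because the path is derived from the content, importing the same PDF twice stores it once, and remembering part of a tag is enough to get the file back.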
Chunks are on my github (jakeogh/uhashfs, it's heka-outdated, don't use it yet), I'll be posting the full thing sometime soonish.
> SingleFileZ is a fork of SingleFile that allows you to save a webpage as a self-extracting HTML file. This HTML file is also a valid ZIP file which contains the resources (images, fonts, stylesheets and frames) of the saved page.
I need to do a handful of experiments to see exactly how this interacts with Sync, even though I've (foolishly) already synced the important profiles. I'd hope that the cloud copies of the places database just keeps growing, but in any case, I'd rather combine them all offline anyway.
Anyway, thanks for the guide.
This could be done in better and more specialized ways, one problem is browser extension APIs don't provide very good access to the browser's webpage-saving features.
That was in 1989 and today I mostly search my computer using find and grep commands, since that's what just keeps working.
It requires Python 3.8 (just the stdlib) and wget.
I have 3633 bookmarks, for a total of 1.5 GB unzipped, 1.0 GB zipped (and we know we could get more from a better algorithm and from using links for files with the same checksum, like JS and CSS deps).
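A rough idea of what such a stdlib-plus-wget script can look like, including the "link files with the same checksum" trick. This is a sketch under assumptions, not the script described above: the function names are made up, the wget options are one plausible selection, and the hard-link step assumes everything sits on one filesystem that supports hard links.

```python
import hashlib
import os
import subprocess
from pathlib import Path


def wget_command(url, dest_dir):
    """Build a wget invocation that saves one page plus the assets it needs."""
    return [
        "wget",
        "--page-requisites",   # also fetch the images, CSS and JS the page uses
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # add .html/.css where the URL lacks it
        "--span-hosts",        # assets often live on another host (CDN)
        "--timeout=30",
        "--tries=2",
        f"--directory-prefix={dest_dir}",
        url,
    ]


def archive(urls, dest_dir):
    """Run wget for every URL; return the URLs where wget reported a problem."""
    failed = []
    for url in urls:
        # wget also exits non-zero when only a page asset failed, so treat
        # this list as "needs a look", not "definitely lost"
        if subprocess.run(wget_command(url, dest_dir)).returncode != 0:
            failed.append(url)
    return failed


def dedupe(root):
    """Replace byte-identical files under `root` with hard links; return bytes saved."""
    first_seen = {}
    saved = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.is_symlink():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        original = first_seen.setdefault(digest, path)
        if original is path or os.path.samefile(original, path):
            continue  # first copy, or already linked on an earlier run
        saved += path.stat().st_size
        path.unlink()
        os.link(original, path)
    return saved
```

Running `dedupe` over the archive directory after a batch of downloads is where copies of the same library file across many saved pages collapse into one.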
This seems acceptable IMO, especially since I used to consider myself a heavy bookmarker, and I was stunned by how few I actually had and how little disk space they occupied. Here are the types of the files:
But it would be fantastic: by doing this experiment I noticed that many bookmarks were 404 now, and I will never get their content back. Besides, searching bookmarks and referencing them is a huge pain.
So definitely something I wish Mozilla would consider.
There used to be this neat little extension called Read It Later that let you do just that. Bookmark and save it so you could read it when you were offline or the page disappeared. Later they changed their name and much later Mozilla bought it and added it to Firefox without a way to opt out. It was renamed to Pocket.
Bookmark integration would mean one software, with the same UI, on every platform, and only one listing for your whole archive system.
Just pay the yearly subscription so pinboard can cache your bookmarks.
The bus factor is high, but I suspect that Maciej has a plan that'll let us download our archive even if he does get grabbed by the mainland Chinese government let alone a forecasted going out of business action.
The point is to make a webpage that lasts. So people can link to it and get the page. That means making a maintainable webpage and a url that does not change.
It is great that you can archive every page you visit for yourself, but that is not the same as making a lasting web.
Let's make something that others can use too.
It's better than nothing but it's also increasingly frustrating to deal with.
I am currently dealing with the problem of parsing large mht files (several megabytes and up). A regular web browser would hang and crash upon opening these files and most ready made tools I could find struggle with the number of embedded images. It's very much a neglected format with very little support in 2019.
Edit: it can also auto-save pages, like SingleFile.
I still keep an old ESR with this plugin for archiving, and accessing MHTs.
Not useful to me personally, but useful to someone!
The main disadvantage was disk space. This is particularly true when some pages are 10 MB or larger. I would periodically prune the archive directory for this reason.
I stopped using Shelve when I started running out of disk space, and now I can't use Shelve because the addon is no longer supported. The author of Shelve has some suggestions for software with similar functionality:
(†): short of direct plist manipulation
Most websites I used to visit during demoscene high days, are now gone.
WARC as a format seems promising, but at least last I checked, open-source tooling to make it a pleasant and/or transparent experience is not really there, and worse, at least as of several months ago, doesn't really seem actively worked on. Definitely an area you'd expect to be further along.
Everyone is free to fork a browser and apply any changes they want. Allowing extensions to change anything at all essentially is the same as forking and merging your changes with upstream every update.
I guess “with a reasonably stable hook api” was supposed to be implicit in my statement.
(I believe that for historical purposes, enough complaining about ads and tracking will survive that future historians can easily deduce the existence of this practice)
I believe the name for that experience was/is "Microsoft Windows".
I tend to do that, I also save a lot of scientific papers, ebooks and personal notes. I've found that doing so does not help me at all. The main problem I have is that when I need to look something up (an article, a book, a bit of info) I reach for google first, usually end up finding the answer and go to save it, only to find that I had already found the answer beforehand (and perhaps already made clarifying notes to go along with it) and then forgot about it.
This, and not dead links, is the fundamental problem with bookmarks for me. Not only bookmarks, it extends to my physical notes and pretty much everything I do. If I haven't actively worked on something for a couple of months, I forget all about it and when I come back to it I usually have to start from scratch until I (hopefully) refresh my memory. Some of it is also usually outdated information.
I think this is a big, unsolved problem and I'm not even sure how to go about starting to solve it. I can envision some form of AI-powered research assistant, but only in abstract terms. I can't envision how it would actually work to make my life better or easier. It would need to be something that would help blur the line between things I know and things that are on my computer somehow. If I think of my brain like it has RAM and cache, things I'm working on right now are in the cache and things I've worked on recently or work on a lot are in RAM, but what's for me lacking is a way to easily move knowledge from my brain-RAM to long term storage and then move that knowledge back into working memory faster than I can do so now. I'm not even talking about brain uploading or mind-machine interfaces, but just something that can remind me of things I already know but forgot about faster than I can do so by myself.
I am convinced that figuring out how to do this will lead to the next leap in technological development speed and efficiency. Not quite the singularity that transhumanists like to talk about, but a substantial advancement.
What I've found is that I need to spend more time deciding what is important, and less time consuming frivolous information. That's hardly a technology problem.
For things I really don't want to forget, I'm using Anki, a Spaced Repetition System (SRS). Anki is supremely good at knowing when you're about to forget an item and prompting you to review it.
Spaced practice and retrieval practice, both of which are used in SRS, are two learning techniques for which there is ample evidence that they actually work.
You still need to decide what is worth remembering, but that's something technology can't help with, I think.
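For the curious, the scheduling side of an SRS is surprisingly small. Below is a simplified sketch in the spirit of the published SM-2 algorithm, which Anki's scheduler is based on. It is not Anki's actual code, and the `Card` and `review` names are made up here: each successful review stretches the interval by an "ease" factor, and a failed review resets the card to a short interval.

```python
from dataclasses import dataclass


@dataclass
class Card:
    interval_days: int = 0  # days until the next review
    repetitions: int = 0    # successful reviews in a row
    ease: float = 2.5       # multiplier applied to the interval


def review(card, quality):
    """Return the updated card after a review graded 0 (blackout) to 5 (perfect recall)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # forgotten: start over with a short interval, keep the ease factor
        return Card(interval_days=1, repetitions=0, ease=card.ease)
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval_days * card.ease)
    # easy answers raise the ease a little, hesitant ones lower it
    ease = card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(interval, card.repetitions + 1, max(1.3, ease))
```

With perfect answers the intervals in this sketch go 1 day, 6 days, 16 days, and keep growing from there, which is why a large deck stays manageable over years.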
There are a few issues to consider:
- Any comprehensive archive of your activity is itself going to be a tremendously "interesting" resource for others -- advertisers, law enforcement, business adversaries, and the like. Baking in strong crypto and privacy protections from the start would be exceedingly strongly advised.
- That's also an excellent reason to have this outside the browser, by default, or otherwise sandboxed.
- Back when I was foolish enough to think that making suggestions to Browser Monopoly #1 was remotely useful, I pointed out that the ability to search within the set of pages I currently have open or have visited would be immensely useful. It's (generally) a smaller set than the entire Web, and comprises a set of at least putatively known, familiar, and/or vetted references. I may as well have been writing in Linear A.
- Context of references matters a lot to me. A reason I have a huge number of tabs open, in Firefox, using Tree-Style Tabs, is that the arrangement and relationships between tabs (and windows) is itself significant information. This is of course entirely lost in traditional bookmarks.
- A classification language for categorising documents would be useful. I've been looking at various of these, including the Library of Congress Subject Headings. A way of automatically mapping 1-6 high-probability subjects to a given reference would be good, as well as, of course, tools for mapping between these.
- I've an increasing difference of opinion with the Internet Archive over both the utility and ultimately advisability of saving Web content in precisely the format originally published. Often this is fragile and idiosyncratic. Upconverting to a standardised representation -- say, a strictly semantic, minimal-complexity HTML5, Markdown, or LaTeX, is often superior. Both have their place.
On that last, I've been continuing to play with the suggestion a few days ago for a simplified Washington Post article scrubber, and now have a suite of simple scripts which read both WashPo articles and the homepage, fetching links from the homepage for local viewing. These tend to reduce the total page size to about 3-5% of the original, are easier to read than the source, and are much more robust.
I'm reading HN at the moment from w3m (which means I've got vim as my comment editor, yay!), and have found that passing the source to pandoc and regenerating HTML from that (scrubbing some elements) is actually much preferable, for the homepage. Discussion pages are ... more difficult to process, and the default view in w3m is unpleasant, though vaguely usable.
Upshot: saving a WARC strictly for archival purposes is probably useful, but generating useful formats as noted above would be generally preferable in addition.
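To make the "upconvert to minimal HTML" idea concrete, here is a small sketch using only the standard library. It is not the scrubber scripts mentioned above (those go through pandoc); the tag lists and the names `Scrubber` and `scrub` are assumptions chosen for illustration. It keeps a short whitelist of structural tags, drops every attribute except `href` on links, and discards scripts, styles, and similar page furniture.

```python
from html import escape
from html.parser import HTMLParser

KEEP = {"h1", "h2", "h3", "p", "ul", "ol", "li", "blockquote", "pre", "a"}
SKIP = {"script", "style", "noscript", "iframe", "svg", "form", "nav", "aside"}


class Scrubber(HTMLParser):
    """Re-emit only whitelisted tags, dropping SKIP elements and all other attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP:
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                self.out.append(f'<a href="{escape(href, quote=True)}">')
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # text outside SKIP elements is kept, even if its wrapper tag was dropped
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def scrub(html_text):
    parser = Scrubber()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Real article pages need site-specific rules on top of this (picking the article container, dropping related-story blocks), but even a blunt filter like this removes most of the weight.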
With the increasing untenability of mainstream Web design and practices, a Rococo catastrophe of mainstream browsers, the emergence of lightweight and alternative browsers and user-agents (though many based on mainstream rendering engines), the tyranny of the minimum viable user attacking any level of online informational access beyond simple push-stream based consumption, and more, it seems that at the very least there's a strongly favourable environment to rethinking what the Web is and what access methods it should support. Peaks in technological complexity tend to lead to a recapitulation phase in which former, simpler, ideas are resurrected, returned to, and become the basis of further development.
"The more libraries incorporated into the website, the more fragile it becomes" is just fundamentally untrue in a world where you're self-hosting all of your scripts.
"Prefer one page over several" is diametrically opposed to the hypertext model. Please don't do this.
"Stick with the 13 web safe fonts" assumes that operating systems won't change. There used to be 3 web safe fonts. Use whatever typography you want, so long as you self host the woff files.
"Eliminate the broken URL risk" by... signing up for two monitoring services? Why?
I think this list of suggestions does a great disservice to people who just want to be able to post their thoughts somewhere. There's an assumption here that you'll need to be technically capable in order to create a page "designed to last" and frankly that is not what the internet is about. Yes, Geocities went away. Yes, Twitter and Facebook and even HN will go away. But the answer sure as hell isn't "I teach my students to push websites to Heroku, and publish portfolios on Wix" because that is setting up technical gatekeeping that is completely unnecessary.
There are more problems though. Older library versions might be vulnerable to XSS attacks, or use features removed by browsers in the future for security reasons (eval?). Or you might want to change something involving how you use the API but the docs are long gone. Generally, libraries imply complexity and when it comes to reliability, complexity will always be your enemy.
I have run into this problem trying to migrate very old web pages or blog posts off of SaaS sites that are shutting down or just decaying. It's not just that complicated sites make it difficult to extract the content in the first place; it's difficult to publish that content on another site in a high-fidelity, and sometimes even readable, way.
The hard part isn't keeping the old site (page) running (although that's not always easy either). The hard part is when you want to do something _else_ with that content -- more complicated means less (easily) flexible.
I agree a couple of the points seem out of place (the monitoring service one made me laugh. visiting my website is the first thing I do after uploading a new page), but the intent of this article I wholeheartedly agree with:
Reduce dependencies, use 'dumb' solutions, and do a little ritualistic upkeep of your website to keep it around for a decade or more. The things you propose are the norm and the reason nothing sticks around, IMO.
I think what you want is not just monitoring your internal links, but also external ones - if a page you linked to in your article starts 404-ing or otherwise changes significantly, it's something you'd likely want to know about. That said, just like preferring GoAccess over Google Analytics, it's something I'd like to have running locally somewhere (on my server, or even on my desktop), instead of having to sign up to some third-party service.
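A locally run checker along those lines does not need much. The following is a minimal sketch with only the standard library; the names `extract_links` and `check` and the User-Agent string are made up here. It gathers the absolute http(s) links from a page and asks each target for its status.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) targets of <a href> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # resolve relative links and drop the #fragment part
                absolute = urldefrag(urljoin(self.base_url, href)).url
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def extract_links(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return sorted(collector.links)


def check(url, timeout=10):
    """Return the HTTP status for `url`, or a short error string."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "linkcheck-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError) as err:
        return f"error: {err}"
```

Run from cron against your own pages, this catches links that start returning 404. Some servers reject HEAD requests, so a real version would probably retry with GET, and noticing that a page "changed significantly" would need content comparison on top.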
> "Prefer one page over several" is diametrically opposed to
> the hypertext model.
Indeed. 10 years ago, “font-family: Georgia, Serif” was guaranteed to work and look the same on pretty much all computers out there. Windows had all of the “web core” fonts (Georgia, Verdana, Trebuchet, Arial, even Comic Sans). Macintosh computers had all of the “web core” fonts. Even most Linux computers had them because it was legal to mirror, download, and install the files Microsoft distributed to make the fonts widely available.
In the last decade, Android has become a big player, and the above font stack with Georgia will look more like Bitstream Vera than it looks like Georgia on Android.
The only way to have a website have the same typography across computers and phones here in the soon-to-be 2020s is to supply the .woff files. Locally (because Google Webfonts might be offline some day). Either via base 64 in CSS or via multiple files; I prefer base 64 in CSS because sites are more responsive loading a single big text file than 4 or 5 webfont files. Not .woff2: Internet Explorer never got .woff2 support, and we can’t do try-woff2-then-woff CSS if using inline base64.
Even with very aggressive subsetting, and using the Zopfli TTF-to-WOFF converter to make the woff files as small as possible, this requires a single 116 kilobyte file to be loaded with my pages. But, it allows my entire website to look the same everywhere, and it allows my content to be viewed using 100% open source fonts.
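For anyone wanting to try the base64-in-CSS approach, generating the rule is a one-function job. A minimal sketch, assuming you already have a subsetted .woff file; the function name and the font family in the usage line are placeholders, and `font/woff` is used as the media type:

```python
import base64


def font_face_css(family, woff_bytes, weight="normal", style="normal"):
    """Return an @font-face rule with the WOFF data inlined as a base64 data URI."""
    payload = base64.b64encode(woff_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  font-weight: {weight};\n"
        f"  font-style: {style};\n"
        f"  src: url(data:font/woff;base64,{payload}) format('woff');\n"
        "}\n"
    )
```

Usage would be something like `font_face_css("MyFont", open("myfont.woff", "rb").read())`, appended to the site stylesheet. Keep in mind that base64 makes the data roughly a third larger than the binary file, so the subsetting step matters.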
Then again, for CJK (Asian scripts), webfonts become a good deal bigger; it takes about 10 megabytes for a good Chinese font. In that case, I don’t think it’s practical to include a .woff file; better to accept some variance in how the font will look from system to system.
Edit: In terms of having a 10-year website, my website has been online for over 22 years. The trick is to treat webpages as <header with pointers to CSS><main content with reasonably simple HTML><footer closing all of the tags opened in the header> and to use scripts which convert text into the fairly simple HTML my website uses for body content (the scripts can change, as long as the resulting HTML is reasonably constant). CSS makes it easy for me to tweak the look and fonts without having to change the HTML of every single page on my site, but as the site gets older, I am slowly decreasing how much I change how it looks.
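The header/content/footer split described here is easy to picture as a script. This is only a sketch of the pattern, not the scripts actually used on that site; the template text, the `/style.css` path, and the function names are all assumptions:

```python
from html import escape

HEADER = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{title}</title>
<link rel="stylesheet" href="/style.css">
</head>
<body>
<main>
"""

FOOTER = """</main>
</body>
</html>
"""


def text_to_body(text):
    """Turn blank-line-separated plain text into simple <p> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs) + "\n"


def render_page(title, text):
    """Wrap the converted body in the shared header and footer."""
    return HEADER.format(title=escape(title)) + text_to_body(text) + FOOTER
```

The point of the split is that the converter can be rewritten in any language later; as long as it keeps emitting the same plain `<p>`-level HTML, old and new pages stay consistent and only the stylesheet carries the look.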
If you're worried about fonts changing out from under a site you should surely also be worried about bitrot in, say, jQuery.
- CSS Stylesheets: 3
- Animated gifs: 1
- Individual JS files: 11 (around 2MB of JS decompressed (but not un-minimized))
- Asynchronous Requests: 14 (and counting)
And that's with uBlock Origin blocking 12 different ad requests.
That's not simple in any form. So, the possibility of something on this page breaking? High. There's a lot of surface area for things to break over time. And that's not counting what happens when the NPR's internal APIs change for those asynchronous requests.
In my case it also has the added benefit of being able to cache JS for a long(er) period of time, with users only having to download maybe 0-30kb of JS when only 1 component is updated instead of invalidating the entire JS served (Way under 1MB however)
I think I agree with you here, in that much of the power of hypertext lies in the hierarchical "tree" model.
And yet, I think it has not been used properly up to this point ...
I hesitate to post this as this is not quite finished, but here goes - this is something called an "Iceberg Article":
... wherein the main article content is, as the article suggests, a single, self-contained page.
And yet ... that is just the "tip" - underneath is:
" ... at least one, but possibly many, supporting documents, resources and services. The minimum requirement is simply an expanded form of the tip (the "bummock"), complete with references, notes and links. Other resources and services that might lie under the surface are a wiki, a changelog, a software repository, additional supporting articles and reference pages and even a discussion forum."
Neither wiki nor forum exist yet, but the bummock does...
(1) a single, self-contained page, (2) that is just the "tip", and (3) linked within it is all the stuff mentioned
That site has various opinions about the "tip" being uncluttered of links, etc, but that's just an opinion (and one I disagree with).
Wikipedia articles can be quite long (and justifiably so) - perhaps scrolling many pages.
The "tip" of an "Iceberg Article" is ".. a single page of writing ..."
Perhaps confusing because I don't mean a "single (web)page" I mean, an actual single page.
Or use Google Web fonts, and set the last option in your font-family to be "serif" or "sans-serif" to let an appropriate typeface be used if your third-party font is unreachable. That's the beauty of text: the content should still be readable even if your desired font is unavailable.
For typical "the words matter more than how the words look" content...can someone explain to me why we care about including the font?
See here: https://news.ycombinator.com/item?id=21841011
There are also layout issues caused when replacing a font with another font, unless the metrics are precisely duplicated. There’s a reason RedHat paid a lot of money to have Liberation Sans with the exact same metrics as Arial, Liberation Serif have the same metrics as Times New Roman, and Liberation Mono have the same metrics as Courier New.
Google Fonts also isn’t blocked but I recall it being hit-and-miss in terms of responsiveness when I was working on a website that targeted a Chinese audience a few years ago. However, I just tried resolving fonts.googleapis.com and fonts.gstatic.com on a Chinese server of mine, and they both resolve to a couple of Beijing IP addresses belonging to AS24424 Beijing Gu Xiang Information Technology Co. Ltd., so it’s probably very much usable now.
I do occasional web dev from within China and had to eliminate external references to get manageable page load times. At least from where I work pulling in practically anything from outside the Great Firewall will have a high probability of killing page load time. Anything hosted by Google in particular will often have you staring at your screen for 30 seconds.
The users who actually know how to change the default font also know how to use stylish.
Telling him to back off and let you cook because he can't know better than you (his user) would be absurd.
Same thing with design and typography. It requires skill and taste, and hopefully people will be delighted or simply consume the content for what it is, because the design/cooking just reveals that content in a convenient/useful shape.
Personally, I've been setting my browser to use only DejaVu fonts with a 16pt minimum for years (maybe a decade now) and every time I briefly use a default browser profile I notice the fonts and think not just "this is bad" but "how can people live like this?". Even with the usually minor issues that often appear, setting my own fonts is a way better experience than not doing so. My default experience is much closer to Firefox reader mode than it is to what the page specifies in most cases.
IMO, font specification should be limited to serif, sans-serif, or monospace and let the user or browser set the actual font. Designers should not rely on exact sizes of fonts or use custom icon fonts.
I think most fonts that get your attention suck, the best ones are invisible and get you directly to the meaning of text, without getting in the way. So maybe there's a kind of bias (selection or sampling bias?) operating here?
These days, people either use their social network’s unchangeable CSS, or they use a Wordpress theme with an attractive and perfectly readable font. Even Merriweather, which I personally don’t care for, is easy enough to read.
The only time I have seen a page use obnoxious fonts in the 2010s is when the LibreSSL webpage used Comic Sans as a joke to highlight that the project could use more money:
Edit: It may be a case that the parent poster likes using a delta hinted font, either Verdana or Georgia, on a low resolution monitor, and doesn’t like the blurry look of an anti-aliased font on a 75dpi screen.
Indeed, typography is a skill. Most designers should have it though, which is why I asked OP for more information.
> The only time I have seen a page use obnoxious fonts in the 2010s is when the LibreSSL webpage used Comic Sans as a joke to highlight that the project could use more money
Ah, the infamous Comic Sans. It's a shame because as a typeface on its own, in its category, it is pretty good. Sadly, it's misused all the time in contexts where it's not appropriate at all.
> It may be a case that the parent poster likes using a delta hinted font, either Verdana or Georgia, on a low resolution monitor, and doesn’t like the blurry look of an anti-aliased font on a 75dpi screen.
Without more details we cannot guess. You're right: a lot of things can go wrong and ruin a typeface, regardless of how the characters are designed. Anti-protip: a reliable way to make any font look like shit is to keep the character drawings as they are and mess up the tracking (letter-spacing) and kerning.
> fonts that get your attention suck
If I can tell the difference between your font and the system default font, your font sucks; if I can't tell the difference, what's the damned point?
The web standards allow a website to use any WOFF (or WOFF2) font they wish to use. Please see https://www.w3.org/TR/css-fonts-3/
There are some rendering issues with Dillo, which made the mistake of trying to support CSS without going all the way and making sure that http://acid2.acidtests.org renders a smiley face, but even here I made sure the site can still be read.
It doesn't; I only use lynx when someone tricks apt-get into updating part of my graphics stack (xorg, video drivers, window manager, etc) and research is needed to figure out how to forcibly downgrade it, and then only because I can't use a proper browser without a working graphics stack.
This is subtly but critically wrong; I am saying that it is necessary that web browsers do not render websites 'correctly'. The correct behaviour is to actively refuse to let websites specify hideous fonts, snoop on user viewing activity, or execute arbitrary malware on the local machine.
> Browsers with modern CSS support see [...] the serif font for body text.
My point exactly.
For myself, I wish that people would leave Arial, Verdana, Helvetica Neue, Helvetica, &c. out of their sans-serif stack, having only their one preferred font and sans-serif, or better still sans-serif alone; but as a developer I understand exactly why they do it all.
font-family: system-ui, Helvetica, sans-serif;
font-family: ui-monospace, Menlo, monospace;
Anything self-hosted is already fragile: it will go away when you don't continue to actively maintain it (paying for a domain, keeping a computer connected to the Internet etc.) or when you die.
You can design and mark-up content that will still be useful and readable in 100 years. You might be able to preserve the presentation logic (CSS-style) for 100 years.
You probably won't be able to preserve the interaction design for 100 years (without a dedicated effort -- that's why they bury computers along with the software in time capsules).
But I think it is optimistic to think that _most_ SaaS hosts are going to archive content for 100 years. Preserving digital content is an _active_ process. It takes resources and requires deliberate effort.
Postscript: I'm trying to think of modern companies that would preserve content for 100 years, assuming they make it that far.
Facebook is the only significant current platform that I can even imagine preserving content for 100 years, but even that seems like a stretch. Historians might step in to archive it, but is there real value to Facebook in maintaining and publishing 50-year-old comments on 2.5 billion unremarkable walls?
Twitter won't. Certainly Insta, SnapChat, WhatsApp etc. won't. Flickr probably could do it relatively easily but won't. YouTube maybe, but there's more to store. Something like GitHub maybe?
I can see a deletion heuristic that considers both account activity and repo activity being deployed within the next 10 years.
However I expect another evolution in SCM in the same timeframe.
Also maybe cvs/svn/git repo generally don't contain content worth preserving for 100 years. There are some historically significant or interesting repos, but for the most part you'll have a bunch of unremarkable (and duplicated) code that may not have run then and certainly won't run now.
100 years is a long time, but I do run 20+ years old Common Lisp libraries and expect them to work without modification; I'd be really pissed if they disappeared from the Internet because someone thought that 5 years of inactivity means something doesn't work anymore.
As a digital native I never gave it a thought, but she told me that there is a collective memory gap in films that have been shot or stored digitally. With stuff that has been stored on film, there was always some copy in some cellar and they could make a new working copy from whatever they found. With digital technology this became much harder and costlier for them, because it often means cobbling together the last working tape players and maintaining both the machines and the knowledge of how to maintain them. With stuff on hard drives, in a hundred different codecs that won’t run on just any machine, etc., this combined into something she called the digital gap.
I had never thought about technology in that way. Nowadays this kind of robustness, archivability and futureproofing has become a factor that drives many of my decisions when it comes to file formats, software etc. This is one of the main reasons why I dislike relying solely on cloud based solutions for many applications. What if that fancy startup goes south? What happens to my data? Even if they allow me to get it in a readable format, couldn’t I just have avoided that by using something reliable from the start?
I grew to both understand and like the unix mantra of small independent units of organization, trying as hard as possible not to make software and other things into an interlinked ball of mud that falls apart once one of the parts stops working for one reason or another. Thinking about how your notes, texts, videos, pictures, software, tools etc. will look in a quasi post apocalyptic scenario can be a healthy exercise.
Some master tapes were infamously reused to store other content. Besides, the whole archive problem comes from the reusable nature and scarcity of the chosen storage. I think I've read something about paper being reused as well in medieval times.
This mostly happened with parchment, not paper, but otherwise you are right. It is called a palimpsest. Sometimes the writing under the writing can be reconstructed as happened with the oldest copy of Cicero's Republic.
Newer methods of storing information tend to be progressively easier to write, and progressively less durable.
(The following is not really in chronological order)
You'll never look at stone tablets the same way again. As primitive as they are, their longevity can be amazing. Ancient emperors and tyrants knew what they were doing. Trajan's column from 113 AD is our main source on the Roman legionaries' iconic equipment.
Cuneiform tablets were heavy and awkward, but they were 3D so there was no paint to worry about.
Parchment tends to be more durable than papyrus, and paper. Perhaps the best known among the Dead Sea Scrolls was made out of copper.
Iron Age culture artifacts are harder to find than Bronze Age ones, because bronze is more resistant to corrosion.
CDs, especially(?) from home burners, are reported to oxidize after several years. That may still be better than tapes, hard drives and other magnetic media (SSD?) which can be wiped by an EMP. The internet era information storage appears to come with an upkeep cost! Slack practically doesn't archive messages by default. Until Gmail, it was typical for email servers to delete old messages.
People get used to novelty and things being ephemeral. Capitalism supposedly requires low durability goods so people keep buying them, including tools and clothes. Houses are poorly built and break down pretty quickly.
I find it amazing people used to decorate their homes, tools, clothes with ornaments, engravings etc. You'd be a fool to do that today, you don't even know how long that thing is going to last.
Post internet, most content is globally replicated. Someone somewhere will find time and energy to make an Amiga simulator with exactly correct bugs, to run the program you want. The amount of content lost in proportion to the amount of content created must have gone down dramatically.
I think something might be getting lost in translation. Could she have meant “electronic” rather than “digital” (which to me suggests digital media such as DVD etc)
This whole anecdote makes more sense to me with this substitution.
"that formerly beloved browser feature that seems to have lost the battle to 'address bar autocomplete'."
But at least in Firefox, if you type "*" then your search terms in the URL bar, it actually queries your bookmarks!
There are many such operators: you can search in your history ("^"), your tags ("+"), your tabs ("%"): https://support.mozilla.org/en-US/kb/address-bar-keyboard-sh...
My favorite is "?", which is not documented in this link. It forces search instead of resolving a domain name.
E.G: if I type "path.py", looking for the python lib with this name, Firefox will try to go to http://path.py, and will show me an error. I can just add " ?" at the end (with the space) and it will happily search.
It's a fantastic feature I wish more people knew about.
It's very well done as well, as you can use it without moving your hands from the keyboard: Ctrl + l gets you to the URL bar, but Ctrl + k gets you to the URL bar, clears it, inserts "? ", then lets you type :)
It's my latest FF illumination; the previous one was discovering that Ctrl + Shift + t reopens the last closed tab.
Mind you, MRU switching is still useful behaviour; Vim has Ctrl+^ to switch to the alternate file which is much the same concept, and Vimperator et al. used to do the same (on platforms where Alt+number switched to the numbered tab, rather than Windows’ Ctrl+number), no idea whether equivalent extensions can do that any more. I have a Tree Style Tab extension that makes Shift+F2 do that, and it suits me.
Additionally, you don't need an extension to jump to a tab anymore. Command-[1-8] goes to that number tab in the current window, where 1 is the leftmost tab. Command-9 goes to the rightmost tab.
Invite them to crawl it, verify the crawl was successful, and even talk about that link on your page.
It removes the risk of domain hijacking, hosting platforms shuttering, and the author losing interest. P.s. The internet archive is doing excellent work. Support them.
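For anyone who wants to script the "invite them to crawl it, then verify" part, here is a rough Python sketch. It assumes the Wayback Machine's public "Save Page Now" URL prefix and its availability endpoint behave as I remember them (both are rate-limited and could change), and all the function names are made up for illustration. Nothing touches the network until you call verify_archived.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints: the Save Page Now prefix and the availability API.
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_request_url(page_url):
    """URL to request in order to ask the archive to crawl page_url."""
    return SAVE_PREFIX + page_url


def availability_query_url(page_url):
    """URL of the availability lookup for page_url."""
    return AVAILABILITY_API + "?" + urlencode({"url": page_url})


def closest_snapshot(payload):
    """Extract the snapshot URL from a decoded availability response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def verify_archived(page_url, timeout=30):
    """Network call: return the closest snapshot URL for page_url, or None."""
    with urlopen(availability_query_url(page_url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Requesting save_request_url(...) in a browser (or with urlopen) triggers the crawl; calling verify_archived(...) a bit later tells you whether a snapshot actually exists before you mention it on your page.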
If the link still works when it gets clicked on that's a bonus, but it shouldn't need to be available for the content you're reading to be understandable.
Will you still be allowed to do that in ten years? Or will aggressive takedown policies have forced a shift?
I guess paraphrasing or summarising hasn’t been prohibited.
> will aggressive takedown policies have forced a shift?
Shifting to paraphrasing or to summarising doesn’t sound too bad.
- take an 'archive copy' of anything you link to so you can host it yourself if it goes away (copyright issues to consider, of course)
- automate a link-checking process so you at least know as soon as a target disappears
- only link to 'good' content (you can feed back results from previous step to approximate this on a domain-basis over time)
(although these things require a build process, which the article's author is against)
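The link-checking step does not need much machinery. Below is a minimal Python sketch using only the standard library; the names (extract_links, link_status, dead_links) are hypothetical, it only looks at absolute http(s) links in <a> tags, and it uses HEAD requests, which some servers refuse, so treat a failure as "look at this by hand" rather than proof the page is gone.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects absolute http(s) href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(page_html):
    """Return the external links found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(page_html)
    return collector.links


def link_status(url, timeout=10):
    """Network call: return the HTTP status code for url, or None if unreachable."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code
    except (URLError, OSError, ValueError):
        return None


def dead_links(page_html, status_of=link_status):
    """Return the links whose status is missing or is an HTTP error (>= 400)."""
    dead = []
    for url in extract_links(page_html):
        status = status_of(url)
        if status is None or status >= 400:
            dead.append(url)
    return dead
```

Run dead_links over each page from cron and mail yourself the result; that stays a scheduled check rather than a build step. The status_of parameter is only there so the logic can be exercised without hitting the network.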
And on that note - consider donating to them.
I think using a static site generator might be OK. Common headers and footers help, and RSS might definitely be a good thing, but that seems to be dying.
One idea from this article I liked was "one page, over many". I don't think he meant have one single page on your website, but rather one per directory, and like he has with this article have one directory for a thought or essay or piece of something you want documenting, and just have an index.html in it.
I like this because I think the one thing that has killed off most personal websites is not the tech tool chain, but that "blogging" created an expectation of everybody becoming a constant content creator. The pressure to create content and for it to potentially "go viral" is one of several reasons I just tore down several sites over the years.
Around this time of year I take a break from work and think about my various side projects, and sometimes think about "starting a blog again". I often spend a few hours fiddling with Jekyll or Hugo, both good tools. Then I sit and think about the relentless pressure to add content to this "thing".
I like this idea instead though. No blogs. No constant "hot takes" or pressure to produce content all the time. Just build a slowly growing, curated, hand-rolled website.
I still think there might be a utility in having a static site build flow with a template function, but a simple enough design could be updated with nothing more than CSS.
Bit to think about, here... interesting.
In comparison, the websites that I've built and hosted or deployed myself, have constantly required periodic work just to "keep the lights on". I went out of my way to make this as minimal and cheap as possible, but even then, it hasn't been nearly as simple as the content I've published on wordpress.
At some point, people's priorities change. Perhaps due to new additions to the family, medical circumstances, or even prolonged unemployment. And when that happens, even the smallest amount of upkeep, whether it is financial, technical or simply logistical, becomes something they have no interest in engaging with.
If we really want our content to last, not just for 10 years but for a generation, our best bet is to publish it on a platform like wordpress.com. One which requires literally zero maintenance, and where all tech infrastructure is completely abstracted away from you. I know this isn't going to be a popular idea with the HN crowd, and I do not blame anyone at all for wanting to keep control over their content. People are free to optimize along whatever dimensions they wish. But if I had to bet on longevity, I would bet every time on the wordpress article over the self-hosted one.
Your view of the timeline is too short. We're not talking about keeping something online for 7 years, but for 70. If I had followed your advice a few years ago, I would have deployed on Geocities. Do you know what happened to those websites?
The question is, is wordpress going to be around in 70 years? No one knows. But that static HTML page will still render fine, even if it is running in a backward compatibility mode on your neurolink interface.
The question isn't whether wordpress will be around in 70 years, but whether it will outlast your self-hosted website. Anything that is self-hosted requires significantly more financial/logistical maintenance, and what is the likelihood of someone continuing to do that for 70 years?
Also, in most cases even you would have a very hard time accessing them, unless you somehow "pinned" them not to be far far down the scroller.
It'll live forever...
Yes, from a technical POV, but what about deplatforming? I think it is a bigger risk of data loss than any framework/technology deprecation. I would definitely not rely on any platform keeping my data.
I don't have a workflow for scraping and archiving snapshots of external links, but if someone hasn't already developed one for org I would be very surprised.
In another context I suggested to the hypothes.is team that they should automatically submit any annotated web page to the internet archive, so that there would always be a snapshot of the content that was annotated, not sure whether that came to fruition.
In yet another context I help maintain a persistent identifier system, and let me tell you, my hatred for the URI spec for its fundamental failure to function as a time invariant identifier system is hard to describe in a brief amount of time. The problem is particularly acute for scholarly work, where under absolutely no circumstances should people be using URIs or URLs to reference anything on the web at all. There must be some institution that maintains something like the URN layer. We aren't there yet, but maybe we are moving quickly enough that only one generation worth of work will vanish into the mists.
That works for some, even most people. Unfortunately, the content I create will inevitably cite material in languages other than the main document language. That means that I have to heavily use HTML span lang="XX" tags to set the right language for those passages, so that (among other things) users with screenreaders will get the right output. As far as I know, org-mode lacks the ability to semantically mark up text in this way.
I use the tree command on BSD to do just that. It has the option of creating html output with a number of additional options.
An example (the pattern is quoted so the shell does not expand it before tree sees it): tree -P '*.txt' -FC -H http://baseHREF -T 'Your Title' -o index.html
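If tree is not installed, something roughly similar can be sketched in a few lines of Python. This is not a replica of tree's HTML output, just a nested list of matching files; build_index and its parameters are names I made up, and unlike the command above it leaves out directories that contain no matching files.

```python
import html
from fnmatch import fnmatch
from pathlib import Path
from urllib.parse import quote


def build_index(root, pattern="*.txt", title="Your Title"):
    """Return an HTML page listing the files under root that match pattern."""
    root = Path(root)

    def render(directory):
        items = []
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            if entry.is_dir():
                inner = render(entry)
                if inner:  # skip directories with no matching files
                    items.append(f"<li>{html.escape(entry.name)}/\n{inner}</li>")
            elif fnmatch(entry.name, pattern):
                # Relative, URL-quoted link so the page works wherever it is hosted.
                href = html.escape(quote(entry.relative_to(root).as_posix()))
                label = html.escape(entry.name)
                items.append(f'<li><a href="{href}">{label}</a></li>')
        return "<ul>\n" + "\n".join(items) + "\n</ul>" if items else ""

    body = render(root) or "<p>No matching files.</p>"
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><h1>{html.escape(title)}</h1>\n{body}\n</body></html>\n"
    )
```

Writing the result out is one line, e.g. Path("index.html").write_text(build_index("."), encoding="utf-8"), and the output is plain HTML with no scripts, which fits the spirit of the article.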
Don’t most web servers do this already?
Deploying a web site with (S)FTP works as well as it ever did... and is just as obscure to non-technical people as it ever was. Ease of use means loss of control.
It'd be a cool challenge to build something so simple that even a non-tech person could use it, while still letting them maintain control and ownership. Any good examples of tech in general that is highly approachable like this? Even things like WordPress are too complicated for most. Maybe it's not so difficult when it isn't self-hosted, but it still falls short, being complex rather than just simple text or html (at the most).