For a command-line web browser that doesn't compromise on modern web features, I use browsh[0], especially in low-bandwidth situations. In conjunction with tmux and mosh it's just fantastic, because you get the full colors and features of the web with minimal bandwidth: the rendering happens server-side, and all that is sent to your computer is terminal half-width characters.
You missed one important point: edbrowse does not depend on those absurdly and grotesquely massive and complex C++ web engines (blink/geeko, financed by gogol, and webkit, financed by apple).
Funny seeing this here as I've been thinking a lot about text-based browsers lately. Just a couple days ago I tried to build this one from source, but I put it aside due to the dependencies on PCRE and a JavaScript engine. (I am running a hand-rolled Linux "distro" so I can't just install ready-made binary packages.)
I do really appreciate that this one uses libcurl on the backend. Surprisingly few browsers do this--Lynx, Links, and w3m all have their own networking code. They have bespoke HTML parsing and rendering as well. I'm lately thinking I want to see a text-mode browser that just glues together libcurl, curses, simple HTML rendering, and maybe an existing HTML parsing library. No text-based HTML rendering library exists that I'm aware of.
Also these classic text browsers have their own implementations of FTP, NNTP, and some other legacy cruft. I'm thinking most of this could easily be provided by libcurl (if at all).
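For what it's worth, the libcurl half of such a glue-together browser can be tiny. Below is a minimal sketch (my own illustration, not anyone's actual project) using only libcurl's easy interface; it fetches a URL and writes the body to stdout, which is where a parser/renderer would pick it up:

/* Minimal sketch: fetch one URL with libcurl's easy interface and write the
   response body to stdout (libcurl's default write behaviour).  This is only
   the networking half of the imagined glue-together browser; an HTML parser
   and text renderer would consume the output. */
#include <stdio.h>
#include <curl/curl.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s URL\n", argv[0]);
        return 1;
    }
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *h = curl_easy_init();
    if (h == NULL)
        return 1;
    curl_easy_setopt(h, CURLOPT_URL, argv[1]);
    curl_easy_setopt(h, CURLOPT_FOLLOWLOCATION, 1L);  /* follow redirects */
    CURLcode rc = curl_easy_perform(h);
    if (rc != CURLE_OK)
        fprintf(stderr, "fetch failed: %s\n", curl_easy_strerror(rc));
    curl_easy_cleanup(h);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}

It builds with something like cc fetch.c $(curl-config --cflags --libs). Everything past that point (parsing, layout, the curses UI) is the part nobody seems to have packaged as a reusable text-mode library.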
> I'm lately thinking I want to see a text-mode browser that just glues together libcurl, curses, simple HTML rendering, and maybe an existing HTML parsing library.
I had a similar idea a while ago, except mine was to glue together components from the nim stdlib.
So I wrote something like that, then I thought "hey, why not implement some CSS too?" and that sent me down the rabbit hole of writing an actual CSS-based layout engine... I eventually also realized that the stdlib html parser is woefully inadequate for my purposes.
In the end, I wrote my own mini browser engine with an HTML5 parser and whatnot. Right now I'm trying to bring it to a presentable state (i.e. integrate libcurl instead of using the curl binary, etc.) so I can publish it.
Anyways, if there's a moral to this story it's that writing a browser engine is surprisingly fun, so go for it :)
It depends on quickjs for the JavaScript implementation, which should be fairly simple to compile on a hand-rolled Linux. I'm not so sure about PCRE, though.
Oh I'm sure the actual work to compile those packages is not much. It's more to do with keeping the number of packages on my system to a minimum.
Actually I would not be surprised if the JavaScript engine can be omitted with just a little bit of patching work... assuming there's not actually a build configuration that leaves it out. I've found that with some software projects and their dependencies, "required" does not always mean required.
Call it Unixy or something - the Unix philosophy of having each program do one separate thing.
Makes more sense; that's what this one already does with the JS engine, right?
> Surprisingly few browsers do this--Lynx, Links, and w3m all have their own networking code
I think people are suspicious of curl because it is a common utility, and they think it can't possibly have got it right - plus there's something mildly fun about figuring out how to monitor a socket and send/receive IP packets for the first time.
I have played around with the curl code a bit. In part I also suspect other programs do it to get "closer", i.e. to be able to manage/dispatch events from a thread directly instead of via some signal from a curl thread; probably something about security and thread safety too...
The main reason for the aforementioned browsers not using libcurl is mostly historical, as it simply didn't exist back when they were created. (The newest of them is links, first released in 1999 - and according to the curl website, the first libcurl release with a proper interface was in 2000.)
w3m even uses its own regex engine for search, because there was no free regex engine with Japanese support the author could've used back then.
Instead of only "thinking a lot about text-based browsers", I have been actively using them on a daily basis for the past 26 years.
Links already uses ncurses. I am glad that it does not use libcurl and that it has its own "bespoke" HTML rendering. In over 25 years, I have yet to see any other program produce better rendering of HTML tables as text. I have had few if any problems with Links versions over the years. I am quite good at "breaking" software, and for me Links has been quite robust. The source code is readable for me and I have been able to change or "fix" things I do not like, then quickly recompile. I can remove features. Recently I fixed a version of the program so that a certain semantic link would not be shown in Wikipedia pages. No "browser extension" required.
Links' rendering has managed to keep up with the evolution of HTML and web design sufficiently for me. Despite the enormous variation in HTML across the www, there are very few cases where the rendering is unsatisfactory.^1 I cannot say the same for other attempts at text-only clients. W3C's libwww-based line-mode browser still compiles and works,^2 although I would not be satisfied with its rendering. Nor would I be satisfied with edbrowse, or something simpler such as mynx.^3
I use Links primarily for reading and printing HTML. I use a variety of TCP clients for making HTTP requests, including djb's tcpclient, which I am quite sure beats libcurl any day of the week in terms of quality, e.g., the programming skill level of the author and the care with which it was written. This non-libcurl networking code is relatively small and does not need oss-fuzz. I do not intentionally use libcurl. It is too large and complex for my tastes. For TLS, I mainly use stunnel and haproxy.
Hey thanks for your perspective and a couple of mentions of software I'd not heard of (like tcpclient).
I agree that curl is pretty big and bloated. I would not call it a deficiency that Links et al. don't depend on it.
I mostly just was thinking that since I already have curl on my system, it'd be nice to have a browser that reuses that code. Especially since curl has upstream support for the much smaller BearSSL rather than depending on OpenSSL/LibreSSL.
I like the idea of BearSSL but it has no support for TLS1.3.
I am not a fan of TLS but alas it is unavoidable on today's www. Keeping up with TLS seems like a PITA for anyone maintaining an OpenSSL alternative or even a TLS-supported application.
This is why I pick stunnel and haproxy. These are applications that seem to place a high priority on staying current. Knock on wood. I am open to suggestions for better choices if they exist.
There are many TCP clients to choose from. Before TLS took over the www, it was more popular to write one's own netcat.
I have focused on writing helper applications to handle the generation of HTTP. Thus I can use any TCP client, including old ones that do not support TLS.
The "web browser" is really the antithesis of the idea underlying UNIX of small programs that do more or less only one thing. Browsers try to do _everything_.
This is not appealing to me. I try to split information retrieval from the www into individual tasks. For example,
1. Extracting URLs from text/html
2. Generating HTTP requests
3. Sending HTTP requests via TCP
4. Forwarding requests over TLS
5. (a) Reading/printing HTML or (b) extracting MIME filetypes such as PDF, GZIP or JPG
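To make the split concrete, here is a deliberately naive sketch of step 1 (my own illustration, not one of the tools described in this thread): it scans stdin for double-quoted href attributes and prints one value per line, with no entity decoding and no resolution of relative URLs.

/* Naive sketch of step 1: pull double-quoted href attribute values out of
   HTML read on stdin, one per line on stdout.  Deliberately simple:
   lowercase href only, double quotes only, no entity decoding, no
   resolution of relative URLs. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[8192];

    while (fgets(line, sizeof line, stdin) != NULL) {
        char *p = line;
        while ((p = strstr(p, "href=\"")) != NULL) {
            p += 6;                      /* skip past href=" */
            char *end = strchr(p, '"');  /* closing quote */
            if (end == NULL)
                break;                   /* value continues on the next line; give up */
            fwrite(p, 1, (size_t)(end - p), stdout);
            putchar('\n');
            p = end + 1;
        }
    }
    return 0;
}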
The cURL project's curl binary combines all these steps. It has a ridiculous number of options that just keeps growing.
For me, step 5 really does not need to be combined with steps 1-4 into the same binary. I am able to do more when the steps are separated because it allows me more flexibility. To me, one of the benefits of the "UNIX philosophy" is such flexibility. No individual program needs to have too many options, like curl does. Programs can be used together in creative ways. I see the presence of a large number of options in a program like curl as _limiting_, and as creating liabilities. If the author has not considered it something a user "should" want to do, then the program cannot do it. Adding large numbers of options is also a way of catering to a certain type of user with whom I generally do not agree. It is a form of marketing.
For step 4, curl is overkill. It has always surprised me that UNIX has not included a small utility to generate HTTP. Thus, I wrote one.
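I have not seen that utility, but a sketch of what a minimal HTTP generator might look like is below (the defaults and header choices are my own guesses); it writes a single HTTP/1.1 GET request to stdout so that steps 3 and 4 can be handled by whatever TCP client and TLS forwarder you prefer.

/* Sketch of a tiny HTTP request generator (step 2): writes one HTTP/1.1 GET
   request to stdout.  The output is meant to be piped into a TCP client
   (step 3) and, if needed, a TLS forwarder (step 4). */
#include <stdio.h>

int main(int argc, char **argv)
{
    /* argument handling is illustrative only */
    const char *host = argc > 1 ? argv[1] : "example.com";
    const char *path = argc > 2 ? argv[2] : "/";

    printf("GET %s HTTP/1.1\r\n", path);
    printf("Host: %s\r\n", host);
    printf("Accept-Encoding: identity\r\n");   /* keep the response uncompressed */
    printf("Connection: close\r\n");
    printf("\r\n");
    return 0;
}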
For step 5(a), Links has served me well. I am open to suggestions for a better choice but there are few people online who are _actual_ daily text-only www users that comment about the experience.^1 An HTML reader/printer, without any networking code, is another small program that should be part of UNIX.
For step 5(b) I have written and continue to write small programs to do this, sort of like file carvers such as foremost but better, IMO. However I will often use tnftp for convenience.
I used tnftp for many years as the default ftp client on NetBSD and prefer it over (bloated) curl or wget. It is small enough that I can edit and re-compile it if I want to change something. Because it comes from the NetBSD project, the source code is very easy on the eyes.
1. IMO, no sane _daily_ text-only www user today would use Lynx. Whenever anyone mentions it as a text-only browser option then I believe that person is not likely to be a _daily_ text-only www user. Lynx is bloated and slow compared to Links and the rendering is inferior, IMHO.
"h1b" is a HOSTS file entry for a localhost TLS-enabled forward proxy
"yy025" is a small program that generates HTTP.
Interestingly I think curl was modified in recent years to detect binary data on stdin. I just tested the following and it extracted the PDF automatically.
However, one thing that curl does _not_ do is HTTP/1.1 pipelining. I use pipelining on a daily basis. That is where these programs become useful for me.
cat > 052.l
/* PDF file carver */
/* PDFs can contain newlines */
/* yy045 removes them so don't use yy045 */
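/* build (assuming flex and a C compiler): flex 052.l && cc lex.yy.c -o 052 */
/* run with the raw HTTP response on stdin: ./052 < response > out.pdf */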
%{
#define echo ECHO
#define jmp BEGIN
int fileno(FILE *);
%}
xa "%PDF-"
xb "%%EOF"
%s xa
%option noyywrap nounput noinput
%%
{xa} echo;jmp xa;
<xa>{xb} echo;jmp 0;
<xa>.|\n|\r echo;
.|\n
%%
int main(){ yylex(); exit(0); }
^D
I have used it, although it was many years ago. I am not sure about the availability of prior versions of w3m but this is one thing I like about Links. I often compile-edit-recompile early versions and this helps me experiment and understand the development of the program over time. I like that w3m is also called a pager. Ideally I want an HTML reader/printer, a pager, with no networking code. Unless w3m has changed, Links does a better job with HTML tables.
Thanks. What I like about w3m is 1) opening images via an external viewer if I want to, and 2) the UI, where everything is done on a command line at the bottom of the screen, vi-style. No input boxes like in Links. That aside, I remember thinking that Links did render some HTML elements better than w3m, though.
EDIT: mynx looks interesting; I wasn't aware of it. Really close to my dream browsing experience: a browser that renders HTML as text and has only a few control keys (w3m has quite a lot, which can cause confusion at times). Customizing would only be possible via config.h, including handlers for viewing images, PDF files, etc. I wonder why mynx lacks a "back" key, though.
Opening files in an appropriate "external viewer" is how I remember browsers used to work. The assumption was that computer users had different dedicated programs to handle different MIME types. Links still purports to allow for using external viewers, though I do not use it that way. I do most www retrieval _outside the browser_. Today, so-called "modern" browsers are 150MB audio and video players, among countless other things. The concept of the external viewer seems to have been lost.
There are things I dislike about Links. Certainly the NCurses menus and dialog boxes are less than ideal. But as an HTML renderer/printer it is the best program I have found. I recall that Elinks experimented with the vi-style command line. Elinks also created Lua bindings to allow for scripting. As an experiment, I started using Tmux to script Links. It surprised me how well this works. But overall, I have no need to script a browser because I prefer to work _outside the browser_.
Yes, I also (vaguely) remember an ELinks branch with some kind of command line. I think I even tried to build it, but it felt too experimental for comfortable usage. Still a good effort, though.
I started to look into Links and ELinks again after reading (and upvoting :) many of your previous comments. I also got really curious about netcat. HTTPS won't work directly, but has anybody ever written a rudimentary, less/more-like front-end to actually browse the web while relying on netcat?
The way you separate browsing into different steps is really inspiring to me, thanks for sharing. Like, you're actually using the web in such a modular way. I'm afraid I won't be capable enough to replicate any of this for my needs (I'm more of a hobbyist with a soft spot for lean, terminal- and text-based workflows, abusing an old Dell Mini 9 in framebuffer mode as my main machine). But it does get me thinking, heavily, again. Watching a screencast of you "browsing" the web with your helper tools would be interesting.
I suppose with all these hand-tailored helpers, using the internet is a much more "focused" experience: looking for specific things vs the aimless browsing that contemporary tabbed browsers encourage. Easier to leave the internet alone when you rely on those narrowly focused tools, I guess.
As for lean browsers, Dillo with FLTK was also an extremely enjoyable experience under X. Really easy to switch off CSS, a nice config file for hand-tailoring search agents, etc. Using Dillo was when I first realized that I don't need to know how the author intended the website to look. I'm fine with just rendering the body text with a tolerable, consistent font face.
It almost feels like, in 2022, the main reason regular people need to update their systems is that the web browser "doesn't work". But, end of rant.
The second two are included only as examples of how some people write "interactive" shell scripts. I prefer non-interactive scripts myself. I write programs to help me use the web _non-interactively_. "Tech" companies and graphical browser authors are always advocating for "interactive" web use (eyeballs) because that is what is most suitable for selling advertising services. As _Hobbit wrote in 1995, "The web sucks." Graphical browsers are to aid those seeking to make money from the "dismal kludge".
From one text-only web user to another, what do you think about Links' single-key shortcuts, e.g., backslash to view source or asterisk to show image links? While I think Links' menus are somewhat cumbersome and slow, not to mention that they can change from version to version, these single-key shortcuts are very fast.
By staying on the command line, the web (and computer use in general) IMO can indeed be a more focused experience and it is easy to avoid aimless browsing. However I think that this can involve slowing down in a sense. If I were to share a typescript of me using the web _interactively_ through the command line, without a mouse, without using a graphics layer (no X11, etc.) or even a framebuffer, without a terminal emulator (e.g., no cut/paste), let alone a graphical web browser, IMO it could not compare in speed to someone using all those conveniences. Like you, I am using underpowered computers with limited resources. People doing "screencasts" always seem to have very fast computers, for lack of a better term. Graphical browsers and the web look very snappy in those videos. Alas, this has not been my experience with graphical browsers over the last 25 years, at least not the popular ones we are forced to use.
Thankfully I am not trying to do the same things as one does with a graphical web browser. I do not have to compete with those videos. I am not working on a different way to "browse", I am working on an alternative to "web browsing". I am using the web in ways that do not require a web browser, e.g., using a sitemap to HTTP/1.1 pipeline all of a website's pages over a single TCP connection into a single text file that can be split into chunks and read/searched with less/more (or Links). AFAIK, this cannot be done with a graphical web browser, no matter how "modern". And it is unlikely a "modern" web browser will ever facilitate it, because it allows the www user to read www content offline, safely out of reach of "programmatic advertising".
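For anyone who has not seen HTTP/1.1 pipelining in practice, the request-generation side can be as small as the rough sketch below (my own illustration, not the actual tools described in this thread). It reads one path per line on stdin, e.g. pulled from a sitemap, and emits the GET requests back to back so that a separate TCP or TLS client can send them over one connection and the concatenated responses can be saved to a single file.

/* Rough sketch of the generator side of HTTP/1.1 pipelining: read one URL
   path per line on stdin (for example, extracted from a sitemap) and emit
   the GET requests back to back on stdout.  HTTP/1.1 connections are
   persistent by default, so no Connection header is sent here; the TCP or
   TLS client downstream decides when to close the connection. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "example.com";  /* illustrative default */
    char path[4096];

    while (fgets(path, sizeof path, stdin) != NULL) {
        path[strcspn(path, "\r\n")] = '\0';   /* strip the trailing newline */
        if (path[0] == '\0')
            continue;                         /* skip blank lines */
        printf("GET %s HTTP/1.1\r\n", path);
        printf("Host: %s\r\n", host);
        printf("Accept-Encoding: identity\r\n");
        printf("\r\n");
    }
    return 0;
}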
"Interactive" vs non-interactive browsing is spot on,
thanks very much for this.
The way you outline these things (also considering the
code presented in many
of your previous comments) is a pristine example
of Unix-y design principles as laid out by E. S. Raymond.
I would probably be really happy with an internet that
only consists of FTP and email, so I'm definitely in the
"non-interactive" boat, too. But, due to shallow knowledge
(curiosity, but no CS education), the "outside the browser"
experience has mostly been ssh, ftp, and wget'ing things
with some simple
scripts. The code you've posted in your
comments is a huge inspiration to me, really.
It is also interesting how we (or, I) tend to consider a
web browser the beginning point of internet use.
Downloading PDFs, etc -- everything starts from a HTML page
that is rendered to us by the browser.
In your examples, it is the exact opposite: the text
browser or pager appears to be the ending point of
an internet session. Because of your tiny, modular tools,
you can decide "on the run" what to do with the data
during the next step. This is simply following old,
time tested Unix principles, but it is fascinating to
really see something like this in action, in such a
streamlined way, when it comes to web browsing.
Perhaps it is even more precise to say that, by definition,
the Unix pipes
have neither an end point nor a starting point?
It's all just stream of data (text), directed to where you want
in each turn with your helper scripts. Utilizing this with day-to-day
web usage is something people rarely do in 2022, I guess.
So, yeah, I confess being somewhat blown away by your stuff.
Re: (E)Links' shortcut keys: funny you bring this up.
I took a fresh look at ELinks over some years, and
I did think that * and \ are really nice just yesterday.
ELinks does render some things better than w3m,
and my distro (Tiny Core Linux) has a build with zero
dependencies (0.3 MB, just bare minimum of features, no TLS/SSL
etc). With this setup, I'm really tempted to try out the "outside the browser"
internet experience you've described. And that I've been
thinking about for years.
As for the UI, I think I'm really only annoyed by the way
URLs are entered in ELinks, into the curses dialog box in the middle
of the screen. In this regard, w3m's command line feels
more natural. I might try the :ex mode command line again,
and the lua scripting, even though I don't want to have
a big build with nonessential options.
Then again, configuring ELinks via the menus is
actually a fairly pleasant experience IMO. In w3m, I'm
always afraid of pressing some key I didn't remember.
In Elinks, I can always bring up the menu and fix things
when I messed something up by accident. So it's actually fine.
Also, the minimal ELinks build in my distro keeps the menus
clean and simple, there's no "feature creep".
And, obviously, a browsing experience that doesn't include
the (though only barely irritating) lag caused by
"accepting cookies" is extremely nice.
This is very cool, but dedicated newsreaders (like trn) were easier to use for browsing USENET than edbrowse is for browsing Hacker News.
If you were forced to surf the web using a teletype, this is your tool.
Edit: actually it's growing on me. My one complaint is that I wish it would retain the printing mode when you follow a link. For example, type "g2" to follow the 2nd link on the line. At that point I should be able to hit Enter or z to display lines from the linked page, but annoyingly you have to enter something like "1" to get it going.
I think the main "selling point" of edbrowse for sighted people might be its scripting capabilities. You can automate a lot of browsing with it if you take the time. See the user guide: http://edbrowse.org/usersguide.html#script
I've been curious about edbrowse for a long time, finding it a fascinating and inspiring project. It's a somewhat hidden gem for sure. Then again, I failed to build it on my (tiny) distro, and I occasionally think that maybe the internal scripting language, mail client, etc create "too many" temptations for a casual user.
A simple line-mode website pager that encourages Unix pipes for any user-defined task (maybe even the javascript engine could be built as an external application?) would be enough for a modest hobbyist coder like me, I guess. Something like the nmh mail handler, but for web browsing: https://www.unix.com/man-page/linux/7/nmh/
This is by no means meant as a harsh critique or whining, though. As indicated, the project, Karl Dahlke, Chris Bannon and the other devs deserve tremendous respect. It is quite possible that heavily pipes-oriented web browsing would be too fragmented for actual, day-to-day usage.
The line-editing paradigm nonetheless has its beauty and a compelling simplicity, even in 2022. Keep it going!
That said, I do compile edbrowse with quickjs and pcre on my hand-rolled glibc/linux distro. I have even removed the build systems from edbrowse, quickjs and pcre in order to use my own. Quickjs aims for the one-compilation-unit model (good!).
Almost no javascript-ed sites run with it (I am still testing, for instance on the gogol Noto font download page), but I like to keep loose track of its progress anyway.
The root of the problem is the big tech javascript-ed web engines: blink/geeko (financed by gogol) and webkit (apple).
We all know the only reasonable fix for this issue is to regulate HARD towards the availability of a noscript/basic (x)html portal for all pertinent web sites (which is close to 100% of them anyway).
[0] https://www.brow.sh/