I was happy to find that this article ended up focusing on the need for better web automation tools specifically. That's a very big problem right now. HTML is becoming more and more impenetrable to the end user. It's essentially just a delivery vehicle to get low-level presentation instructions to the browser. That gives a lot of control to the owners of the web page at the expense of the user. "With software, either the users control the program or the program controls the users", I think is the relevant quote.
HTML's building blocks are too low-level for our contemporary needs. We should gradually move to higher-order representations of the data we're exchanging online, and put pressure on companies/governments to expose their data using these representations. My browser (or the extensions I add to it) should be able to natively determine whether it's looking at a product listing, or a person's bio, or a social media post, because those blocks of information would adhere to a standard schema ("hooks", to use the article's term).
Naturally, companies that are in the business of putting things in front of your eyeballs don't want us to move in this direction, because it gives users much more control over what they see than they have today. If you don't want to see ads, tell your browser to skip rendering the <Advertisement /> element. (And if a company places ads outside of an <Advertisement />... I dunno, fine them?)
(Those of you who've been around long enough are probably wincing and rubbing your RDF scars, or your XML scars, or what have you. Yes, it's the same old fight. We have to just keep pushing until we make some headway.)
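To make the "tell your browser to skip rendering the <Advertisement /> element" idea concrete, here is a minimal sketch in Python using the standard library's `html.parser`. The `advertisement` tag is the hypothetical semantic element from the comment above, not a real HTML element, and `AdSkipper` / `strip_ads` are names invented for this illustration. A real browser extension would work on the DOM instead.

```python
from html import escape
from html.parser import HTMLParser


class AdSkipper(HTMLParser):
    """Re-emit markup, dropping every <advertisement> element and its contents."""

    BLOCKED = "advertisement"  # hypothetical semantic tag (parser lowercases tag names)

    def __init__(self):
        super().__init__()
        self.out = []
        self.depth = 0  # > 0 while we are inside a blocked element

    def handle_starttag(self, tag, attrs):
        if tag == self.BLOCKED:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.BLOCKED:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <Advertisement /> or <br />
        if tag != self.BLOCKED and self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if self.depth == 0:
            # The parser unescapes character references, so escape again on output
            self.out.append(escape(data, quote=False))


def strip_ads(markup: str) -> str:
    parser = AdSkipper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

The point of the sketch is that the filter needs no heuristics: once the ad is declared with a known element name, removing it is a few lines of bookkeeping.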
A few years ago, I implemented automated UI tests on a bunch of iOS apps. Apple's go-to framework was UIAutomation. It was a pretty good tool for the job, albeit limited.
What I liked a lot was that the framework relied on accessibility. To automate the testing of an app, you first had to make sure that the app was really accessible, and then you could build upon accessibility attributes to add automated testing.
I found it very smart because developers were incentivized to make their apps accessible. Accessibility was, in a way, a tool to semantically describe a UI, so that a screen reader, a test framework, or the OS could understand how the UI was structured and act upon it.
> It's essentially just a delivery vehicle to get low-level presentation instructions to the browser.
Nothing about HTML is low-level.
HTML (and most of web tech) suffers from being both not low-level enough and not being high-level enough.
You can't override the browser and provide your own rendering pipeline if you wanted to actually do your UI [1]. You can't tell it to batch rendering/update instructions for a subset of elements on the page. You can't compose elements. You can't use existing building blocks to combine them and build a proper new element [2]. You look at a web-page funny, and it does a layout re-calculation [3]. And that layout recalc? It's abysmally slow. That's why you can't even animate anything useful. And so on and so forth.
[1] It's now possible with Canvas and WebGL, but this is not web tech, not really. It's desktop tech that has been around for ages, and is now being bolted on to browsers.
[2] No, not really. Try and make a full dropdown/combo box that is properly customizable, keyboard-accessible, etc. Many have tried, all are abominations.
[3] Even things like border widths and sometimes even border colors will cause a layout recalc: https://csstriggers.com
I meant low level in terms of the semantic meaning that HTML can convey.
The examples you gave of places where browsers are high-level in terms of what pixels get put where? I see those as features, not bugs. I'm actually happy that web browsers historically haven't given pages a lot of control over low-level rendering, and I am distressed by the ongoing push by large corporations to increase how much of that control they have access to.
I agree with your point about not being able to compose higher-order elements, though.
Google said a couple years ago that they would penalize sites with dark patterns (like forcing users to sign-in to view public pages). Doesn't look like Google enforced anything since all the major sites are using the same dark patterns now.
Google can’t really enforce things as much due to the anti-monopoly/competitive political threat. They can adjust search rankings, but anything major can be used as an argument to break Google up.
Yes but Google already penalizes websites that are slow to load, not secure, using black hat SEO, etc. Google could easily penalize dark patterns under the same approach, but they don’t. Google themselves said they would implement this years ago[1] and it has yet to happen.
Ideally we'd institute government regulation requiring website operators to offer the user-friendly formats, in the same way that GDPR places certain requirements on sites today.
Has anybody seen a successful version of this with web components? It seems like we should be able to test the theory of higher level web elements and see how far it goes.
There is a middle ground between automation and full-blown programming languages.
In the past, this was represented by Visual Basic 6. For the use case presented (large organizations with considerable time spent on repetitive tasks), there has probably always been a non-programmer in the org who could hack together a VB6 program for the benefit of their colleagues. I even remember a person describing himself as a programmer because he could use Excel (which can be considered close to the middle-ground position on the automation<>programming languages spectrum).
There were/are a lot of small tools around programmed in VB6, which are probably very low quality from an engineering perspective, but do the job.
As a matter of fact, I wonder why the VB6 spirit hasn't been carried to the present. As far as I read, Delphi's the closest, but I've never used it.
I agree. One of the great things with VB was you had an instant "programming environment" when you opened it from Excel. But I think learning VB feels like a waste since you can only use it in a limited domain.
Learning Python is just as easy as VB and it can be used across many domains, but it's hard to set up an environment (installing packages, managing envs, etc.). I thought it'd be cool if you could make Python work like VB from Google Sheets, which led to Wax[0]. It lets people run Python from Sheets with zero infrastructure. If anyone has internal apps that are heavily dependent on Sheets, I'd really appreciate your feedback.
I see the VB spirit coming back with projects like streamlit. Super easy to prototype simple things, logical order of processing. Really reminds me of the good old BASIC. No function definitions, no classes, no callbacks.
There was a time when VB came in two flavors. I believe VB5 was this way: The regular version that most of us bought allowed us to use classes, but not to create them. I know for myself, this enabled me to ease my way into OOP, because I could experience the benefits without any of the major pitfalls. If you wanted to create classes, you had to buy or download an extra package, that I never bothered with. I was basically a procedural programmer who could use classes supplied to me by somebody else.
Turning VB into a full blown professional development language akin to C# was, in my view, a mistake. It kind of erased the reason for "the rest of us" to use Basic. I switched to Python instead.
My theory on this is that GUI automation is way harder because most solutions require the developer to explicitly spend more time implementing something that is not usually useful to sales (non-automatable applications are the norm, and people don't decide to buy apps based on whether they support automation or not).
CLIs are much easier because the primary interface (standard out) to communicate with the user is basically the automation API (whether it is stable or not). If you're a command line program, (unless you're doing something super wary), you're automatable.
My personal opinion is that the best way should support automating based on GUI interfaces… although I don't have any great ideas how to support various interfaces that are e.g. modal or contextual. The clipboard is basically poor man's pipes in the GUI world so we could take some ideas on how the clipboard and the source/destination application negotiate data types, and there probably are better ideas.
I've always thought that the Unix philosophy could do well here; if you start by building the rawest version of your app to operate on the command line, and then make the UI an interface to that, you can trivially automate anything the UI can do.
so in this scheme, the cli's "do one thing and do it well" is driving the program, while the UI's "one thing" is providing an interface to the cli.
the other nice thing about this is that it provides a far more debuggable build. also enforces good habits vis a vis separating abstractions.
not always possible in all contexts, though. I would have trouble implementing this for an app like Photoshop (although their command palette implementation is super close)
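A minimal Python sketch of the "raw command-line core, UI as an interface to it" layering described above. Everything here is hypothetical and chosen only for illustration: the `resize` operation, the `main` entry point, and `gui_resize_clicked`, which stands in for whatever callback a real GUI toolkit would invoke.

```python
import argparse
import contextlib
import io


def resize(width: int, height: int, scale: float) -> tuple:
    """Core operation: pure logic, no I/O, trivially testable."""
    return round(width * scale), round(height * scale)


def main(argv=None) -> int:
    """CLI layer: parse arguments, call the core, print a plain-text result."""
    parser = argparse.ArgumentParser(prog="resize")
    parser.add_argument("width", type=int)
    parser.add_argument("height", type=int)
    parser.add_argument("--scale", type=float, default=0.5)
    args = parser.parse_args(argv)
    w, h = resize(args.width, args.height, args.scale)
    print(f"{w}x{h}")
    return 0


def gui_resize_clicked(width_text: str, height_text: str, scale_text: str) -> str:
    """GUI layer: build the same argv a script would use and show the output.

    Because the button does nothing the CLI cannot do, anything a user can
    click is automatable from a shell script as well.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        main([width_text, height_text, "--scale", scale_text])
    return buf.getvalue().strip()
```

In a real script, `main()` would be called under the usual `if __name__ == "__main__":` guard. The design choice is that the GUI only translates widget state into arguments, so the two front ends cannot drift apart.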
> I've always thought that the Unix philosophy could do well here; if you start by building the rawest version of your app to operate on the command line, and then make the UI an interface to that, you can trivially automate anything the UI can do.
That's why open source software has the best UIs. /s
I thought I would disagree, but I very much agree that we need something like zapier for custom internal processes. I’ve automated thousands of hours worth of things with zapier and gsheets.
But if there isn’t a zap for it, and you can’t hack your way around it, nontechnical users are generally out of luck.
I would love for gsheets to be able to extend further into the browser and my desktop, because there is so much value and potential there. For example, a problem I worked on yesterday: I receive an email every day that has an attachment with the .eml file type. In that attachment is a link to a csv. I need the data from the csv in excel or sheets for daily use. There are no zaps to convert a .eml to any filetype - so I had to automate the attachment downloading to gdrive on my desktop, have a local automation run to convert .eml to .txt, then back in zapier grab the url from the .txt and deliver it to gsheets. Then the gsheet can just grab the data from the csv url.
Problem is - I couldn’t figure out how to automate the .eml conversion, so I’m stuck!
I think what you are doing is overkill tbh. Google provides a nice little wrapper around their APIs via Google Apps Script, which allows you to use JS to interact with them directly without much overhead. You can use a JS eml parser [1] for this, then convert it into a single-file dependency using esbuild [2], which will allow you to easily work inside the Apps Script environment.
Offtopic: the design of Google Apps Script puzzles me.
It’s done in an extremely verbose OO style (everything must be treated as an active object, even things that are by their nature dumb data), tries to compensate for that by providing shortcuts for the resulting multiple-level accessor chains, but with no rhyme or reason wrt which shortcuts exist and which don’t. There are also some things that ought, by the nature of the problem, to exist in the implementation, but are not exposed, and the official docs literally have you reimplementing them (IIRC an official way to see which Sheets rows are selected by the currently active Sheets filter was only provided relatively recently, and you still have to redo date formatting yourself). And maybe the verbosity is no big deal for the original consumers of the API, but in the GAS environment my impression is that every method call is an RPC that takes tens of milliseconds. The bottom line for me was that fetching a couple thousand cells from Sheets as a JS array and then just processing them in JS without touching any of the Apps interfaces turned out to be a couple of orders of magnitude faster than trying to figure out exactly which cells I needed and fetching them individually, even if I only really needed a couple dozen in the end.
I guess the real question is: why would anyone do it this way? I recognize that what the API is doing is in fact much more complex than it appears, because it’s a complicated distributed system (then again, when it was a VBA macro running on my desktop it didn’t need to be...), but why does it give the impression that noöne actually cares about the ergonomics? Even its original Javaish ergonomics, let alone given the fact that JavaScript is not Java.
> The bottom line for me was that fetching a couple thousand cells from Sheets as a JS array and then just processing them in JS without touching any of the Apps interfaces turned out to be a couple of orders of magnitude faster than trying to figure out exactly which cells I needed and fetching them individually, even if I only really needed a couple dozen in the end.
Weird, when I worked with Google Apps Script, it was faster than working with the APIs by an order of magnitude or so (although the feedback loop is still painfully slow).
Not what I meant: in GAS, batch-fetching everything (and processing it in JS) was faster than individually fetching the things I needed (and doing much less processing on them). Does Google want me to run more JS on their machines?..
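The batch-versus-individual effect described above is easy to see in a toy model. This is a Python sketch, not Apps Script, and `FakeSheet` is a made-up stand-in that only counts round trips; it assumes, as the comment does, that each method call on the remote object costs one RPC.

```python
class FakeSheet:
    """Stand-in for a remote spreadsheet; every method call is one round trip."""

    def __init__(self, rows):
        self._rows = rows
        self.round_trips = 0

    def get_cell(self, row, col):
        self.round_trips += 1
        return self._rows[row][col]

    def get_all_values(self):
        self.round_trips += 1
        return [list(r) for r in self._rows]


def total_per_cell(sheet, wanted):
    """Fetch only the needed cells, one call (one round trip) per cell."""
    return sum(sheet.get_cell(r, c) for r, c in wanted)


def total_batched(sheet, wanted):
    """Fetch the whole grid once, then pick the needed cells locally."""
    grid = sheet.get_all_values()
    return sum(grid[r][c] for r, c in wanted)
```

With a couple dozen wanted cells, the per-cell version pays a couple dozen round trips while the batched one pays a single round trip. If each trip costs tens of milliseconds, local processing of the extra cells is cheap by comparison.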
Agreed, GAS is a great idea, horribly executed. It's part of the reason I made Wax[0]. It lets people run Python + SQL from Sheets with zero infrastructure. It's like GAS built for less technical people that already know some Python or SQL. It's also already integrated with other tools (e.g. databases, Slack, etc.).
If anyone has internal apps that are heavily dependent on Sheets, I'd really appreciate your feedback.
My coding knowledge is limited to barely editing existing code :). Everything I was doing is with no code, similar to what the article is about.
But yeah if there were a zap for .eml->.txt or to grab text from inside an email’s attachment the whole thing would take 1 step.
Oftentimes without coding you have to take roundabout ways to accomplish something, but as long as it’s not slow and won’t break it doesn’t really matter.
IIRC an eml is basically raw MIME data, so you need a MIME parser of some sort. The Python stdlib has one, for example. (Why can’t you then just push the CSV into Sheets using the Sheets API? Admittedly it’s supremely unpleasant, slow, and the client libraries are massive, but it does work.)
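A rough sketch of what that could look like with the Python standard library's `email` package. The function names, the sample message, and the regular expression for CSV links are all assumptions made for illustration; real mails may need different matching. It handles both a nested `message/rfc822` part (which `walk()` descends into on its own) and an `.eml` file attached as opaque bytes.

```python
import re
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assumption: the link we want is an http(s) URL ending in ".csv"
CSV_URL = re.compile(r"https?://\S+?\.csv\b")


def find_csv_links(raw: bytes) -> list:
    """Return CSV URLs found in any text part, including nested .eml attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    links = []
    for part in msg.walk():
        filename = (part.get_filename() or "").lower()
        if part.get_content_maintype() == "text":
            links += CSV_URL.findall(part.get_content())
        elif filename.endswith(".eml") and not part.is_multipart():
            # .eml attached as plain bytes: decode it and parse it as a message
            links += find_csv_links(part.get_payload(decode=True))
    return links


def make_sample_email() -> bytes:
    """Build a throwaway mail with an .eml attachment, for trying the parser."""
    inner = EmailMessage()
    inner["Subject"] = "Daily report"
    inner.set_content("Download: https://example.com/reports/daily.csv\n")

    outer = EmailMessage()
    outer["Subject"] = "Fwd: Daily report"
    outer.set_content("See attached.")
    outer.add_attachment(
        inner.as_bytes(),
        maintype="application",
        subtype="octet-stream",
        filename="report.eml",
    )
    return outer.as_bytes()
```

Calling `find_csv_links(make_sample_email())` yields the one link from the inner message. Fetching the CSV and pushing it into Sheets are separate steps that this sketch leaves out.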
Hmm. OK. I’m not sure how to say this without sounding pretentious, and that’s especially frustrating when my point is that the whole thing is much simpler than it appears, but I’ll try and hope you give my communication skills some slack.
I think you should just take the plunge.
By that I don’t mean that you must or that you ought to. For example, if the problem you described is one you hate so much that you’d rather not spend another second thinking about it, by all means, disregard everything I say here, go and do whatever it is you’d rather be doing. If you have a quarterly report to prepare by the end of the week, just get it done in whatever way you know you can. I don’t mean that you should stop.
I mean that your skill has surpassed your tools.
Imagine someone who needs to have a short, mostly predictable conversation in a foreign language. Understandably, they turn to a phrasebook and maybe a machine translation engine and cobble together enough phrases for their task. Maybe those phrases weren’t perfect, maybe they weren’t even grammatical, but that didn’t really matter: they needed to communicate, they succeeded, end of story. Moreover, their approach was genuinely better than going to a language school: it would’ve taken months, at least, to get at the necessary phrases that way, whereas phrasebook plus MT probably got them there in hours.
If that person then needed to communicate some more, maybe on a slightly different topic, repeating the same approach would be completely reasonable. It might even get faster and easier the next time around, now that they have some idea of what they are looking for and how things work generally. The third time, it’s probably still the same, even if they need to go to the library and go through a couple of phrasebooks to find the necessary sentence templates. At some point, investing in a better, spiffier phrasebook becomes reasonable.
Eventually, though, the whole thing becomes silly. The tasks are probably not getting any simpler, but the phrasebook is just as limited as before. Sentence structures are simplistic, necessitating chopped speech and contortions to get the causal or temporal structure across. Idioms are lacking, leading to verbatim translations and circumlocutions that need several rounds of back-and-forth before understanding is reached. Bizarre pronunciation means listeners need to strain their ears and get annoyed. At the same time, a substantial amount of knowledge has probably been absorbed, just through repeated exposure, it’s just that it doesn’t neatly fit into the first chapter of a textbook but is more smeared over several years’ worth of conventional teaching aids.
This is not the point to conclude that the language is just weird and occasionally insane and requires painful workarounds to express some things. Even if it is so, a bit, no amount of phrasebooks is enough to give an accurate idea of where and in which ways.
That is the point to go find an introductory course. Moreover, this is the point to shine in an introductory course, because things will just click here and there and former mysteries will just dissolve. (Alternatively, this is the point to hire a private tutor who can find the pain points and hit them, given enough stubbornness or arrogance to not get discouraged.)
It seems to me that this is the point you are at in terms of automating things with computers.
The thing is, coding is easy. Programming is hard, it’s a craft that you can study for a lifetime and still have a lifetime’s worth of improvement left. If there’s an authority who can tell you how to go about learning to program, I’m not it. But coding satisfactorily is a matter of a couple of months, and getting out the first lines of code that visibly do stuff is literally a matter of typing them in.
In other words, using text to tell a computer to do something is completely straightforward; that’s literally what it’s for, after all. The difficult part is thinking of a way it should go about accomplishing what you want it to, because it’s really, really dumb and literal, so it needs really detailed and formalistic instructions.
Fortunately, that is the part you already have practice in! You might not have used letters and punctuation to phrase your programs, and the environment you’re used to fitting your objectives into might have simple and complicated things entirely the wrong way around compared to how they really are at the fundamental level, but you do have the mindset for describing tasks in a formal framework, and that’s really the only thing that’s painful to learn.
Sincerely,
a guy who got started by failing to understand a (really trashy) VBA (Office macros) doorstopper, then learned by looking at the results of the macro recorder and attempting to make thingies in PowerPoint go whee.
[This text was originally meant as a preface to a short Python script that would extract CSV attachments from emails and push them to Google Sheets, but it kind of got out of hand and I need to sleep. Watch this space, hopefully I’ll post a prototype here tomorrow.]
Having spent many years automating enterprise web apps professionally and random websites for fun and profit, I came to realize a lot of automation is just about the data itself, or rather the movements of it (a new service order, etc.).
This realization led me to create Monitoro[0], a no code tool to abstract websites as a reactive data structure, allowing you to create events based on specific data changes from any website/web app.
What happens after these events is flexible, from no-code alerts and integrations to your own custom code triggered with the event’s data via webhooks.
I believe narrowing down the focus to data and events only (instead of open-ended automation) lends itself better to a no-code offering, and the resulting solutions are more robust (web automations are famously fragile due to state changes).
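As a generic illustration of the "data and events" framing (not a description of how Monitoro itself works), a page can be reduced to a dictionary of extracted fields, and events derived by diffing two snapshots. The `Change`, `diff_snapshots`, and `dispatch` names are invented for this Python sketch; the handler stands in for an alert or a webhook call.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Change:
    key: str
    old: Any  # None if the field did not exist before
    new: Any  # None if the field disappeared


def diff_snapshots(old: dict, new: dict) -> list:
    """Compare two extractions of the same page and list the fields that changed."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append(Change(key, before, after))
    return changes


def dispatch(changes: list, handler: Callable[[Change], None]) -> None:
    """Hand each change to a handler, e.g. one that posts to a webhook."""
    for change in changes:
        handler(change)
```

Because the events are computed from data rather than from clicks, a layout change on the page only affects the extraction step, not the logic that reacts to a changed field.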
This is great! Are you working on Monitoro (great name and logo by the way) full time? Seems like there'd be plenty of demand for it. A couple questions:
1. How do you handle auth (e.g. if I need to be logged into LinkedIn to see some data)?
2. Why did you add supabase as an integration? Seems like there wouldn't be much demand for that considering their size.
I've been hacking on Wax[0] which lets people run Python + SQL from Sheets with zero infrastructure. It's more in the low-code (vs. no-code) space and built for less technical people that already know some Python or SQL. It's also already integrated with other tools (e.g. databases, Slack, etc.).
> Are you working on Monitoro (great name and logo by the way) full time?
Thank you and yes!
1. We allow cookies to pass through in Local mode, i.e. on your computer (we don't store them, literally we try to make Monitoro's layer transparent to the browser). We don't support auth in Cloud extraction.
2. We built Monitoro on Supabase for quite some time and think there's great synergy between an easy platform to store and serve data (Supabase) and a platform to source that data in the first place (Monitoro). It's also a great alternative to Airtable when your database starts growing beyond the 50K record limit.
Wax looks really cool, I love it! Sheets is really the holy grail of personal computing and begs to be used for all its power.
Translating between all these layers was probably not easy, kudos. I'm a bit curious though regarding Google API limits. Did you hit any speedbump with that due to the volume of data you have to sync?
PS: Apologies for the slow reply here! Let's connect on twitter: twitter.com[slash]Omar[nothing]Kamali
Shortcuts is a pretty solid option for users with Apple Devices. Its available actions range from simply changing settings to running SSH commands, all packed in a user interface that is much more approachable than AppleScript or Automator.
I’ve started digging into Alfred (alfredapp.com) automations recently and have had similar experiences. It’s pretty easy to create simple automations using their workflow interface, and I was pleased to discover they made it really simple to publish the resulting workflow using git. In a few minutes I was able to create a workflow to pull a highlighted value and push the user to a variety of different internal websites depending on a regex match.
I’ve poked at Automator and Shortcuts a bit and found them similarly useful. The only issue is that not every app exposes functions to them yet, so there are a number of workflows I’d like to create that won’t work until the app dev adds functions to support “tell this app to do this thing”. KeyboardMaestro works well for filling that gap, but it’s a little awkward to figure out the “click at this spot on the screen” bits.
Too bad it feels like 30 seconds for a shortcut to run, including rather useless notifications. And if the camera is involved, it's unusable until the shortcut has completed its song and dance.
I do hope they make this faster and less obtrusive.
Edit: also, no idea how to run most of them. I've installed a shortcut. Now what :)
There's a handful of ways to trigger a shortcut to run:
- Manually via a home screen app icon
- A Share Sheet action (good for manipulating links or images, etc.)
- Automation event (time/schedule based, NFC tag scanning, or a HomeKit accessory changed)
Just looked through the Shortcuts app, it's impossible to deduce context and purpose :D
I do hope they improve this. Even as a tech person, I looked at this and went: nope, no idea what it is, its intended purposes, or any useful use cases.
BTW. I did finally figure out that the shortcuts my Audi app added can only be used by saying their name exactly, and they will trigger via Siri :D Searching for them doesn't work. "Details" menu item (menu itself is hidden behind long tap) is unavailable etc.
I also like that it focused on GUI automation rather than other kinds. But instead of trying to automate GUI elements that will change anyway or be very hard to maintain, it's better to offer GraphQL endpoints. That way everyone can still use the website for everything, but if you need to do repetitive tasks or rerun quality checks, you use the endpoint behind it. I didn't say REST because GraphQL comes with built-in documentation and types, and only accepts valid input. It can be used as a query language with a single endpoint, which helps change things in the backend without interfering with the user, and it has many more valuable features. I think it's still fairly low-level and, to some degree, would need programming, but it could be abstracted further, and in the end it's just a kind of JSON format, so I guess it's less complex than the old VB coding that some are mentioning here.
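For a sense of how little code the "use the endpoint behind it" route needs, here is a Python sketch that builds (but does not send) a GraphQL request with the standard library. The endpoint URL, the `orders` field, and its arguments are hypothetical; a real site would publish its own schema. GraphQL over HTTP conventionally takes a JSON body with `query` and `variables` keys, which is what this assumes.

```python
import json
import urllib.request

# Hypothetical query against a hypothetical schema
ORDERS_QUERY = """
query Orders($status: String!) {
  orders(status: $status) { id total }
}
"""


def build_graphql_request(endpoint, query, variables=None, token=None):
    """Return a POST request carrying the query and variables as a JSON body."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` and decoding the JSON response. The repetitive task then depends on the schema rather than on where a button sits on the page.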
Every single "lowcode/nocode" tool suffers from 2 problems that are imo unsolvable:
1) Sooner ~~or~~ rather than later, there is a use case that goes beyond what the tool can do "easily"
2) Just because tools make coding easier doesn't mean it's not coding any more.
“I hate this ideology and I always have. Human civilization progresses in part due to specialization, that we as a society don't have to learn to do everything in order to keep things functional.”
Well, I hate your ideology :p programming is like language studies, math, sciences. It’s a fundamental way of living, interacting and, yes, functioning in the world. You don’t have to ask everyone to do it. There are plenty of people who communicate, but would struggle to write in their native language a few pages on a topic. Plenty of people say they’re “bad at math” and avoid the subject, and rely on the cash register to tell them how much change they should receive. You’re holding everyone to a ridiculous standard forming this opinion.
/rant
I love the topic. I’ve made scripts at work and trained non-programmers to use CLI tools to execute them.
We should be talking about protocols for user-friendly APIs and as-yet-uninvented data pipeline endpoints which require less technical skill.
For example, how can we let users create, try, and fail with a database just as easily as they can with files on their PCs?