💾 Archived View for dioskouroi.xyz › thread › 29440077 captured on 2021-12-04 at 18:04:22. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Coding for non-programmers: Why we need better GUI automation tools

Author: dgudkov

Score: 63

Comments: 35

Date: 2021-12-04 10:57:01

________________________________________________________________________________

notpachet wrote at 2021-12-04 15:27:55:

I was happy to find that this article ended up focusing on the need for better _web_ automation tools specifically. That's a very big problem right now. HTML is becoming more and more impenetrable to the end user. It's essentially just a delivery vehicle to get low-level presentation instructions to the browser. That gives a lot of control to the owners of the web page at the expense of the user. "With software, either the users control the program or the program controls the users", I think is the relevant quote.

HTML's building blocks are too low-level for our contemporary needs. We should gradually move to higher-order representations of the data we're exchanging online, and put pressure on companies/governments to expose their data using these representations. My browser (or the extensions I add to it) should be able to natively determine whether it's looking at a product listing, or a person's bio, or a social media post, because those blocks of information would adhere to a standard schema ("hooks", to use the article's term).

Naturally, companies that are in the business of putting things in front of your eyeballs don't want us to move in this direction, because it gives users much more control over what they see than they have today. If you don't want to see ads, tell your browser to skip rendering the <Advertisement /> element. (And if a company places ads outside of an <Advertisement />... I dunno, fine them?)

(Those of you who've been around long enough are probably wincing and rubbing your RDF scars, or your XML scars, or what have you. Yes, it's the same old fight. We have to just keep pushing until we make some headway.)
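A rough sketch of what "tell your browser to skip the <Advertisement /> element" could look like if pages carried typed data blocks. It assumes the page embeds JSON-LD in `<script type="application/ld+json">` tags, which is an existing convention. The `Advertisement` type is the hypothetical from the comment above and is treated here as invented, and `JsonLdCollector`, `visible_items`, and `PAGE` are made-up names for illustration only.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects every JSON-LD block found in a page (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buf)))
            self._in_jsonld = False


# Sample page; "Advertisement" is a hypothetical type, not a real standard.
PAGE = """
<html><body>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Kettle"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Advertisement", "name": "Buy now"}
</script>
</body></html>
"""


def visible_items(html, blocked_types=("Advertisement",)):
    """Return the typed blocks the user has not asked to hide."""
    collector = JsonLdCollector()
    collector.feed(html)
    return [item for item in collector.items if item.get("@type") not in blocked_types]


print([item["name"] for item in visible_items(PAGE)])
```

The point is only that filtering becomes a one-line decision on the user's side once the page declares what each block is.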

jicea wrote at 2021-12-04 16:02:30:

A few years ago, I implemented automated UI tests on a bunch of iOS apps. The Apple go-to framework was _UIAutomation_. It was a pretty good tool for this job, albeit limited.

What I liked a lot was that the framework relied upon accessibility. To automate the testing of an app, you first had to make sure that the app was really accessible, and then you could build upon accessibility attributes to add automatic testing.

I found it very smart because developers were incentivized to make their app accessible. Accessibility was, in a way, a tool to describe a UI semantically, so a screen reader, a test framework, or the OS could understand how the UI was structured and act upon it.

One of my (really) old posts about UIAutomation:

http://blog.manbolo.com/2012/04/08/ios-automated-tests-with-...

agustif wrote at 2021-12-04 16:40:28:

react-testing-library also emphasises this, albeit focused less on accessibility than on writing good semantic code, with stuff like dom-testing-library

https://testing-library.com/docs/dom-testing-library/intro/

theturtletalks wrote at 2021-12-04 17:27:00:

But who will enforce these guidelines?

Google said a couple years ago that they would penalize sites with dark patterns (like forcing users to sign-in to view public pages). Doesn't look like Google enforced anything since all the major sites are using the same dark patterns now.

soared wrote at 2021-12-04 22:03:29:

Google can’t really enforce things as much due to the anti-monopoly/competitive political threat. They can adjust search rankings but anything major can be used as an argument to break google up.

notpachet wrote at 2021-12-04 18:11:54:

Ideally we'd institute government regulation requiring website operators to offer the user-friendly formats, in the same way that GDPR places certain requirements on sites today.

pfraze wrote at 2021-12-04 18:42:31:

Has anybody seen a successful version of this with web components? It seems like we should be able to test the theory of higher level web elements and see how far it goes.

dmitriid wrote at 2021-12-04 20:43:13:

> It's essentially just a delivery vehicle to get low-level presentation instructions to the browser.

Nothing about HTML is low-level.

HTML (and most of web tech) suffers from being both not low-level enough _and_ not being high-level enough.

You can't override the browser and provide your own rendering pipeline if you wanted to actually do your UI [1]. You can't tell it to batch rendering/update instructions for a subset of elements on the page. You can't compose elements. You can't use existing building blocks to combine them and build a proper new element [2]. You look at a web-page funny, and it does a layout re-calculation [3]. And that layout recalc? It's abysmally slow. That's why you can't even animate anything useful. And so on and so forth.

[1] It's now possible with Canvas and WebGL, but this is not web tech, not really. It's desktop tech that has been around for ages, and is now being bolted on to browsers.

[2] No, not really. Try and make a full dropdown/combo box that is properly customizable, keyboard-accessible, etc. Many have tried, all are abominations.

[3] Even things like border widths and sometimes even border _colors_ will cause a layout recalc:

https://csstriggers.com

notpachet wrote at 2021-12-04 21:10:11:

I meant low level in terms of the semantic meaning that HTML can convey.

The examples you gave of places where browsers are high-level in terms of what pixels get put where? I see those as features, not bugs. I'm actually happy that web browsers historically haven't given pages a lot of control over low-level rendering, and I am distressed by the ongoing push by large corporations to increase how much of that control they have access to.

I agree with your point about not being able to compose higher-order elements, though.

pizza234 wrote at 2021-12-04 15:05:52:

There is a middle ground between automation and full-blown programming languages.

In the past, this was represented by Visual Basic 6. For the use case presented - large organizations with considerable time spent in repetitive tasks - probably there has always been in the org a non-programmer who could hack together a VB6 program for the benefit of the colleagues. I remember even a person describing himself as a programmer because he could use Excel (which can be considered close to the middle ground position in the automation<>programming languages spectrum).

There were/are a lot of small tools around programmed in VB6, which are probably very low quality from an engineering perspective, but do the job.

As a matter of fact, I wonder why the VB6 spirit hasn't been carried to the present. As far as I read, Delphi's the closest, but I've never used it.

ta988 wrote at 2021-12-04 15:45:08:

I see the VB spirit coming back with projects like streamlit. Super easy to prototype simple things, logical order of processing. Really reminds me of the good old BASIC. No function definitions, no classes, no callbacks.

Narishma wrote at 2021-12-04 15:54:26:

> No function definitions, no classes, no callbacks.

But VB had all those things.

analog31 wrote at 2021-12-04 16:06:18:

There was a time when VB came in two flavors. I believe VB5 was this way: The regular version that most of us bought allowed us to _use_ classes, but not to create them. I know for myself, this enabled me to ease my way into OOP, because I could experience the benefits without any of the major pitfalls. If you wanted to _create_ classes, you had to buy or download an extra package, that I never bothered with. I was basically a procedural programmer who could use classes supplied to me by somebody else.

Turning VB into a full blown professional development language akin to C# was, in my view, a mistake. It kind of erased the reason for "the rest of us" to use Basic. I switched to Python instead.

pcr910303 wrote at 2021-12-04 15:43:38:

My theory on this is that GUI automation is way harder because most solutions _require_ the developer to explicitly spend more time on implementing something that is not usually useful to sales (non-automatable applications are the norm, and people don't decide to buy apps based on whether they support automation or not).

CLIs are much easier because the primary interface (standard out) to communicate with the user is basically the automation API (whether it is stable or not). If you're a command line program, (unless you're doing something super wary), you're automatable.

My personal opinion is that the best way should support automating based on GUI interfaces… although I don't have any great ideas how to support various interfaces that are e.g. modal or contextual. The clipboard is basically poor man's pipes in the GUI world so we could take some ideas on how the clipboard and the source/destination application negotiate data types, and there probably are better ideas.
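To make the clipboard idea concrete, here is a minimal sketch of type negotiation, assuming the source application offers the same content in several formats and the destination lists the formats it accepts in order of preference. The `negotiate` function and the sample data are invented for illustration and do not model any particular OS clipboard API.

```python
def negotiate(offered, accepted):
    """Return the destination's most preferred format that the source offers, else None."""
    for fmt in accepted:
        if fmt in offered:
            return fmt
    return None


# The source app puts one value on the "clipboard" in two representations.
clipboard = {
    "text/html": "<b>42</b>",
    "text/plain": "42",
}

# A hypothetical spreadsheet prefers CSV but can fall back to plain text.
spreadsheet_accepts = ["text/csv", "text/plain"]

fmt = negotiate(clipboard, spreadsheet_accepts)
print(fmt, clipboard[fmt])
```

A GUI "pipe" could work the same way: each application declares what it can emit and consume, and the glue picks the richest common format.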

pksebben wrote at 2021-12-04 21:19:18:

I've always thought that the Unix philosophy could do well here; if you start by building the rawest version of your app to operate on the command line, and then make the UI an interface to that, you can trivially automate anything the UI can do.

so in this scheme, the cli's "do one thing and do it well" is driving the program, while the UI's "one thing" is providing an interface to the cli.

the other nice thing about this is that it provides a far more debuggable build. also enforces good habits vis a vis separating abstractions.

not always possible in all contexts, though. I would have trouble implementing this for an app like Photoshop (although their command palette implementation is super close)
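A small sketch of that layering, under the assumption that the app's one operation is resizing an image size by a scale factor. `resize`, `cli`, and `on_resize_button_clicked` are hypothetical names; the GUI handler is only simulated as a function that receives form values.

```python
import argparse


def resize(width, height, scale):
    """Core operation: pure logic, no UI and no I/O."""
    return round(width * scale), round(height * scale)


def cli(argv):
    """Command-line front end; this is also the automation surface."""
    parser = argparse.ArgumentParser(prog="resize")
    parser.add_argument("width", type=int)
    parser.add_argument("height", type=int)
    parser.add_argument("--scale", type=float, default=1.0)
    args = parser.parse_args(argv)
    w, h = resize(args.width, args.height, args.scale)
    return f"{w}x{h}"


def on_resize_button_clicked(form):
    """Hypothetical GUI handler: it only builds the arguments a script would pass."""
    return cli([str(form["width"]), str(form["height"]), "--scale", str(form["scale"])])


print(cli(["800", "600", "--scale", "0.5"]))
print(on_resize_button_clicked({"width": 800, "height": 600, "scale": 0.5}))
```

Because the button goes through the same entry point as the command line, anything a user can click can also be scripted.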

soared wrote at 2021-12-04 16:24:09:

I thought I would disagree, but I very much agree that we need something like zapier for custom internal processes. I’ve automated thousands of hours worth of things with zapier and gsheets.

But if there isn’t a zap for it, and you can’t hack your way around it, nontechnical users are generally out of luck.

I would love for gsheets to be able to extend further into the browser and my desktop, because there is so much value and potential there. For example, a problem I worked on yesterday: I receive an email every day that has an attachment with the .eml file type. In that attachment is a link to a csv. I need the data from the csv in excel or sheets for daily use. There are no zaps to convert a .eml to any filetype - so I had to automate the attachment downloading to gdrive on my desktop, have a local automation run to convert .eml to .txt, then back in zapier grab the url from the .txt and deliver it to gsheets. Then the gsheet can just grab the data from the csv url.

Problem is - I couldn’t figure out how to automate the .eml conversion, so I’m stuck!

mynameismon wrote at 2021-12-04 17:16:05:

I think what you are doing is overkill tbh. Google provides a nice little wrapper around their APIs via Google Apps Script, which allows you to use JS to interact directly without much overhead. You can use a JS eml parser [1] for this and convert it into a single file dependency using esbuild [2], which will allow you to easily work inside the Apps Script environment

[1]:

https://www.npmjs.com/package/eml-parser

[2]:

https://stackoverflow.com/a/68379707/9496502

soared wrote at 2021-12-04 21:56:22:

My coding knowledge is limited to barely editing existing code :). Everything I was doing is with no code, similar to what the article is about.

But yeah if there were a zap for .eml->.txt or to grab text from inside an email’s attachment the whole thing would take 1 step.

Often times without coding you have to take roundabout ways to accomplish something, but as long as it’s not slow and won’t break it doesn’t really matter.

mananaysiempre wrote at 2021-12-04 18:19:52:

Offtopic: the design of Google Apps Script puzzles me.

It’s done in an extremely verbose OO style (everything must be treated as an active object, even things that are by their nature dumb data), tries to compensate for that by providing shortcuts for the resulting multiple-level accessor chains, but with no rhyme or reason wrt which shortcuts exist and which don’t. There are also some things that ought by the nature of the problem to exist in the implementation, but are not exposed, and the official docs literally have you reimplementing them (IIRC an official way to see which Sheets rows are selected by the currently active Sheets filter was only provided relatively recently, and you still have to redo date formatting yourself). And maybe the verbosity is no big deal for the original consumers of the API, but in the GAS environment my impression is that every method call is an RPC that takes tens of milliseconds. The bottom line for me was that fetching a couple thousand cells from Sheets as a JS array and then just processing them in JS without touching any of the Apps interfaces turned out to be a couple of orders of magnitude faster than trying to figure out exactly which cells I needed and fetching them individually, even if I only really needed a couple dozen in the end.

I guess the real question is: why would anyone _do_ it this way? I recognize that what the API is doing is in fact much more complex than it appears, because it’s a complicated distributed system (then again, when it was a VBA macro running on my desktop it didn’t need to be...), but why does it give the impression that noöne actually cares about the ergonomics? Even its original Javaish ergonomics, let alone given the fact that JavaScript is not Java.

mynameismon wrote at 2021-12-04 18:43:29:

> The bottom line for me was that fetching a couple thousand cells from Sheets as a JS array and then just processing them in JS without touching any of the Apps interfaces turned out to be a couple of orders of magnitude faster than trying to figure out exactly which cells I needed and fetching then individually, even if I only really needed a couple dozen in the end.

Weird, when I had worked with Google Apps Script, it had been significantly faster than working with the APIs by an order of magnitude or so. (although the feedback loop is still _painfully_ slow)

mananaysiempre wrote at 2021-12-04 18:47:34:

Not what I meant: _in GAS_, batch-fetching everything (and processing it in JS) was faster than individually fetching the things I needed (and doing much less processing on them). Does Google _want_ me to run _more_ JS on their machines?..
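The batch-versus-individual effect can be illustrated with a toy model that just counts round trips. `FakeSheet` below is a hypothetical stand-in, not the Apps Script API; it assumes, as the comment above suspects, that every method call costs one remote round trip.

```python
class FakeSheet:
    """Toy stand-in for a remote spreadsheet; counts simulated round trips."""

    def __init__(self, rows):
        self._rows = rows
        self.calls = 0

    def get_cell(self, row, col):
        self.calls += 1  # one round trip per cell
        return self._rows[row][col]

    def get_all(self):
        self.calls += 1  # one round trip for the whole range
        return [row[:] for row in self._rows]


rows = [[r * 10 + c for c in range(10)] for r in range(100)]  # 1000 cells
wanted = [(r, 3) for r in range(0, 100, 5)]                   # only 20 are needed

# Strategy 1: fetch exactly the cells we need, one call each.
one_by_one = FakeSheet(rows)
values_individual = [one_by_one.get_cell(r, c) for r, c in wanted]

# Strategy 2: fetch everything once, then pick the cells locally.
batched = FakeSheet(rows)
data = batched.get_all()
values_batched = [data[r][c] for r, c in wanted]

print(one_by_one.calls, batched.calls)
```

If each round trip takes tens of milliseconds, the per-call overhead dominates, which would be consistent with the batch fetch winning even though it transfers far more data.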

mananaysiempre wrote at 2021-12-04 17:59:51:

IIRC an eml is basically raw MIME data, so you need a MIME parser of some sort. The Python stdlib has one, for example. (Why can’t you then just push the CSV into Sheets using the Sheets API? Admittedly it’s supremely unpleasant, slow, and the client libraries are massive, but it does work.)
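For what it's worth, a minimal sketch of that with the Python standard library's `email` package, assuming the .eml is a simple text message whose body contains the CSV link. The sample message, the addresses, the URL, and the `extract_urls` helper are all made up for illustration; a real attachment may be multipart or encoded differently.

```python
import re
from email import message_from_string, policy

# Hypothetical .eml content; real files would be read from disk instead.
SAMPLE_EML = """\
From: reports@example.com
To: me@example.com
Subject: Daily report
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Your report is ready: https://example.com/reports/daily.csv
Thanks.
"""

URL_RE = re.compile(r"https?://\S+")


def extract_urls(eml_text):
    """Parse raw MIME text and return every URL found in its text parts."""
    msg = message_from_string(eml_text, policy=policy.default)
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            urls.extend(URL_RE.findall(part.get_content()))
    return urls


csv_urls = [u for u in extract_urls(SAMPLE_EML) if u.endswith(".csv")]
print(csv_urls)
```

The resulting URL could then be handed to whatever step loads the CSV into Sheets.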

soared wrote at 2021-12-04 21:58:35:

Haha well like in the article, I don’t know how to code and am just using automation tools with guis.

omneity wrote at 2021-12-04 20:04:04:

Having spent many years automating enterprise web apps professionally and random websites for fun and profit, I came to realize a lot of automation is just about the data itself, or rather the movements of it (new service order etc.).

This realization led me to create Monitoro[0], a no code tool to abstract websites as a reactive data structure, allowing you to create events based on specific data changes from any website/web app.

What happens after these events is flexible, from no-code alerts and integrations to your own custom code triggered with the event’s data via webhooks.

I believe narrowing down the focus to data and events only (instead of open ended automation) lends itself better to a no-code offering, and the resulting solutions are more robust (web automations are famously fragile due to state changes).

[0]:

https://www.monitoro.co
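As a generic illustration of the "data changes become events" idea (not Monitoro's actual implementation, which is not described here), one could diff two snapshots of the data scraped from a page. `diff_events` and the order data are hypothetical.

```python
def diff_events(previous, current):
    """Compare two snapshots (dicts) and return a list of change events."""
    events = []
    for key in sorted(current.keys() | previous.keys()):
        if key not in previous:
            events.append(("added", key, current[key]))
        elif key not in current:
            events.append(("removed", key, previous[key]))
        elif previous[key] != current[key]:
            events.append(("changed", key, current[key]))
    return events


# Two hypothetical snapshots of an orders page, taken a few minutes apart.
before = {"order-1": "open", "order-2": "open"}
after = {"order-1": "shipped", "order-3": "open"}

for event in diff_events(before, after):
    print(event)
```

Each event tuple could then be forwarded to an alert or a webhook, which keeps the fragile page-interaction part out of the user's hands.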

adrum wrote at 2021-12-04 14:12:24:

Shortcuts is a pretty solid option for users with Apple Devices. Its available actions range from simply changing settings to running SSH commands, all packed in a user interface that is much more approachable than AppleScript or Automator.

castillar76 wrote at 2021-12-04 15:57:40:

I’ve started digging into Alfred (alfredapp.com) automations recently and have had similar experiences. It’s pretty easy to create simple automations using their workflow interface, and I was pleased to discover they made it really simple to publish the resulting workflow using git. In a few minutes I was able to create a workflow to pull a highlighted value and push the user to a variety of different internal websites depending on a regex match.

I’ve poked at Automator and Shortcuts a bit and found them similarly useful. The only issue is that not every app exposes functions to them yet, so there are a number of workflows I’d like to create that won’t work until the app dev adds functions to support “tell this app to do this thing”. Keyboard Maestro works well for filling that gap, but it’s a little awkward to figure out the “click at this spot on the screen” bits.
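The regex-routing workflow mentioned above is simple enough to sketch in a few lines. This is not Alfred's workflow format, only the underlying logic; the patterns and the `example.internal` URLs are invented placeholders.

```python
import re

# (pattern, URL template) pairs; both columns are hypothetical examples.
ROUTES = [
    (re.compile(r"^TICKET-\d+$"), "https://tickets.example.internal/browse/{value}"),
    (re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"), "https://inventory.example.internal/hosts?ip={value}"),
]


def route(selection):
    """Return the URL for the first pattern the highlighted text matches, else None."""
    value = selection.strip()
    for pattern, template in ROUTES:
        if pattern.match(value):
            return template.format(value=value)
    return None


print(route(" TICKET-123 "))
print(route("10.0.0.5"))
print(route("hello"))
```

Adding another destination is then just one more row in the table.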

mhb wrote at 2021-12-04 14:47:13:

Can I have one that turns on or off Location Services? Not the last time I looked.

adrum wrote at 2021-12-04 14:56:16:

I do not believe so. I'm longing for turning on specific VPNs and toggling auto-lock for the screen.

On-Demand VPN via WireGuard isn't quite what I'm looking for.

mynameismon wrote at 2021-12-04 17:18:00:

The Windows equivalent of that would be Microsoft Power Automate, which is also rather decent

dmitriid wrote at 2021-12-04 14:39:30:

Too bad it feels like 30 seconds for a shortcut to run, including rather useless notifications. And if the camera is involved, it's unusable until the shortcut has completed its song and dance.

I do hope they make this faster and less obtrusive.

Edit: also, no idea how to run most of them. I've installed a shortcut. Now what :)

adrum wrote at 2021-12-04 15:00:26:

There's a handful of ways to trigger a shortcut to run:

- Manually via a home screen app icon

- A Share Sheet action (good for manipulating links or images, etc.)

- Automation event (time/schedule based, NFC tag scanning, or a HomeKit accessory changed)

It just depends on the context and the purpose.

dmitriid wrote at 2021-12-04 20:31:23:

> It just depends on the context and the purpose.

Just looked through the Shortcuts app, it's impossible to deduce context and purpose :D

I do hope they improve this. Even as a tech person, I looked at this and went: nope, no idea what it is, its intended purposes, or any useful use cases.

BTW. I did finally figure out that the shortcuts my Audi app added can only be used by saying their name exactly, and they will trigger via Siri :D Searching for them doesn't work. "Details" menu item (menu itself is hidden behind long tap) is unavailable etc.

xtiansimon wrote at 2021-12-04 20:42:25:

“Why not just teach everyone to program?

I hate this ideology and I always have. Human civilization progresses in part due to specialization, that we as a society don't have to learn to do everything in order to keep things functional.”

Well, I hate your ideology :p programming is like language studies, math, sciences. It’s a fundamental way of living, interacting and, yes, functioning in the world. You don’t have to ask everyone to do it. There are plenty of people who communicate, but would struggle to write in their native language a few pages on a topic. Plenty of people say they’re “bad at math” and avoid the subject, and rely on the cash register to tell them how much change they should receive. You’re holding everyone to a ridiculous standard forming this opinion.

/rant

I love the topic. I’ve made scripts at work and trained non-programmers to use CLI tools to execute them.

We should be talking about protocols for user friendly APIs and as-yet-uninvented data pipeline endpoints which require less technical skill.

For example, how can we let users create, try, fail with a database just as easily as they can with files on their PCs?

usrbinbash wrote at 2021-12-04 16:45:19:

Every single "lowcode/nocode" tool suffers from 2 problems that are imo unsolvable:

1) Sooner ~~or~~ rather than later, there is a use case that goes beyond what the tool can do "easily"

2) Just because tools make coding easier doesn't mean it's not coding any more.

pictur wrote at 2021-12-04 18:52:38:

automation is beautiful when minimal. but we want to resolve so many issues in one tool, and this is not a suitable solution most of the time

unixbane wrote at 2021-12-04 16:21:59:

automation DSLs are moot. just use a general purpose language and the "non-programmer" will still be able to copy and paste crap just fine. skimming through this article, it seems whatever thing he's talking about is no different than autohotkey on windows. autohotkey sucks because it's not a real language, except it is, but just very poorly designed due to trying to avoid being a real language which it inevitably still needs to be.

same story for bash. get rid of shell scripting too

come to think of it, the hardest part of programming is using whatever hare brained dependency management system any language has. automation tools probably win literally just because the user only has to use a single installer, instead of installing PL #5 and then automation lib #3 (and debugging some subdependency conflict)

nikkinana wrote at 2021-12-04 17:14:13:

I'll bet Chris and Andy Cuomo are learning to code now.