Show HN: Kasaya – A scripting language and runtime for browser automation (github.com/syscolabs)
386 points by hliyan on Feb 20, 2020 | 106 comments

All these kinds of projects share the same flaw: they use a DSL instead of an existing language, so tooling, documentation, testing, support, and modules are going to be very weak, and 100% dependent on the creator for the first years, in a domain that is already a niche.

It could be a library with a good API instead. Or even a special environment for an existing language, with injected built-ins and automatic imports.
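As a rough sketch of the "special environment" idea, Node's built-in `vm` module can run a user script in a context where automation helpers are already in scope, so the script needs no imports. The helper names (`open`, `click`, `type`) are hypothetical stand-ins, not Kasaya's or any library's API, and they only record actions instead of driving a browser. Note that `vm` is not a security sandbox.

```javascript
const vm = require('node:vm');

// Hypothetical helpers; a real tool would forward these to a browser driver.
function makeEnvironment() {
  const log = [];
  const builtins = {
    open: (url) => log.push(`open ${url}`),
    click: (label) => log.push(`click ${label}`),
    type: (text) => log.push(`type ${text}`),
  };
  return { log, context: vm.createContext(builtins) };
}

// Run a user script: plain JavaScript, with the helpers injected as globals.
function runScript(source) {
  const { log, context } = makeEnvironment();
  vm.runInContext(source, context);
  return log;
}

// The script gets loops and branches for free because it is ordinary JavaScript.
const actions = runScript(`
  open("https://example.com");
  for (const word of ["foo", "bar"]) {
    type(word);
    click("Search");
  }
`);
console.log(actions);
```

The user-facing script stays short, yet editors, linters, and debuggers that understand JavaScript keep working on it.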

Now you could argue that the goal is to have this "simple DSL that looks like english" so that normal people can use it.

This argument has existed for as long as DSLs have, and the result is always the same:

- the simplicity of the language will, in the end, limit its usefulness. Not every domain is like SQL/HTML/CSS, where being descriptive is enough. Complex domains need branches, loops, namespaces, etc. Eventually you will be forced to add features in a twisted way to something that was not designed for them, replacing its simplicity with ugliness, or restrain yourself and be limited forever. See the Ansible DSL for a good, or rather horrible, example of that.

- end users who can't use a normal programming language won't suddenly become tech-savvy because your DSL is simple, but their use cases will never stop at simple. So they will start building complex systems as soon as they use it for real, with a limited DSL and their limited ability. The system will become a monstrosity, and you'll have no tooling to help.

- successful DSLs are usually not simple. CSS is not. SQL is not. HTML is an exception because it has no logic. This has logic.

- it looks so cool. It's like candy for tech lovers. Even to me: I find it so sexy. And so people will adopt it, ignoring the arguments above. Ignoring the decades of such attempts that ended up in pain behind so many corporate firewalls. There will be many tweets at adoption to talk about how nice it is, but no blog post 2 years later to admit it was a bad idea. And then it will happen again with $new_shiny_dsl_based_tool.

All these kinds of HN comments share the same flaw: they unnecessarily detract from the work the OP has done by attacking use cases that the project never claimed to solve.

DSLs, for all their problems, do simplify complex programming logic, and people use them every day to do great things. DSLs like Ansible and Chef save people an order of magnitude of time on server provisioning, which is why they are wildly popular.

It seems reasonable to me that someone would build a DSL with the same goals for web browser automation. The screen recording example gif they have looks so intuitive even a non-programmer could do it.

I don't look at this project and expect it to have solved every problem with web scraping, and, reading the README, it doesn't look like they are claiming it does either.

All these kinds of HN comments share the same flaw, in failing to recognize that the primary obsession of hacker news is in discussing the flaws of things posted on hacker news.

I would argue that that's the benefit.

I want people more knowledgeable than me to comment and point out the flaws in products/reports/theories. That's what I love about HN.

How critical are those flaws? It's up to you. If you take the approach of "there is no right or wrong, only opinions" when you read comments, you're much better off.

I would argue that trying to find flaws is flawed (at least in this context). Much better to use a neutral approach where you are equally open to strengths and flaws.

Using Ansible is what made me lose faith in DSLs in all the ways mentioned (I eventually wanted loops, conditions, variables and namespaces...). Ansible is an API over a domain, and if it had originally been exposed as just a Python library I don't see why it would be less accessible or productive. Even something as simple as YAML can be screwed up and turned complicated.

Even though I know how to program, I really like Ansible and probably wouldn’t have gotten into it if it had the same functionality but used Python instead of YAML.

Building infrastructure is complicated. Have you found an easier way than Ansible to accomplish building infrastructure?

Ansible does make YAML more complicated; however, you don't need to use much of that complexity for simple projects. Compared to Terraform, Chef, Google Deployment Manager, and Windows Desired State Configuration, Ansible is by far the simplest to get up and running and do real work with.

AWS CDK would be an example of using a programming language rather than a DSL to build infra.

There's also Pulumi that covers a wider range of infrastructure than AWS only.

Have you tried SaltStack?

It uses YAML too, but always felt much more cleanly composed, and smoothly integrates with scripting languages.

I've never truly broken into orchestration and am still salty about that so please, take my post with a grain of salt.

(Full disclosure: I have literally no professional affiliations whatsoever.)

> Even something as simple as YAML can be screwed up and turned complicated.

Your problem is that you think YAML is simple.

Some of us legitimately believe DSLs are rarely the right choice. We don’t say this to shit on people’s work, we say it to kick off a conversation about API design.

I’ve written DSLs in the past. I was a Rubyist! I have come to some perspective on those choices in the intervening years. I’m a big fan of functions that take arguments, and I am skeptical of hidden shared state.

You can be skeptical of DSLs but still recognize that they provide a time/complexity trade-off. I agree that it's probably not the right decision to write your own DSL for something, but to dogmatically crusade against all DSLs feels silly to me when there are so many popular projects using them.

Ansible is popular because it is accessible to DevOps-type roles (previously known as sysadmin or IT services) who may know some programming but would fail to produce results nearly as fast if they were simply left with Python libraries. It would not be as popular as it is today if it were simply a Python library.

Criticizing this project for using a DSL as its interface over some programmer purity test BS is completely missing the forest for the trees.

> they provide a time/complexity trade-off

I think that’s too specific of a claim for such a broad category as DSLs.

I think in specific cases there is a time-complexity tradeoff you can exploit, those are the (IMO extremely rare) cases where a DSL is worth considering. But in a lot of cases it’s just added complexity. Functions are pretty powerful.

And they are also easier to reason about. Often the DSL adds “time” because you’re like “why the heck isn’t this working” and it’s because of some weird state you can’t inspect that’s deep inside your parse tree. “Guess I better start reading the source code of my DSL runtime. Oh look, someone hacked a corner case into the parser.”

My rule of thumb would be something like:

1) if you have time to make your DSL superbly executed, and 2) you are very certain it can make an entire class of problems go away, and 3) you don’t expect weird corner cases to ever show up in a future project that break the model ...then it’s worth considering.

As a professional coder, though, I don’t see a lot of situations like that. In particular, “build this module superbly” is generally not an option on the table. Nor is “you can model the domain of future needs thoroughly”. But I work at startups; I bet at a BigCo, if you are a very senior person, DSLs are on the table more often.

Your whole argument boils down to "use a DSL only when appropriate" - no argument here on my end.

Still, I don't think that makes a DSL a bad fit for this specific project, even by your own standards.

> 1) if you have time to make your DSL superbly executed, and 2) you are very certain it can make an entire class of problems go away, and 3) you don’t expect weird corner cases to ever show up in a future project that break the model ...then it’s worth considering.

#1 yes, just like any code you write, right?

#2 yep, this is the same thing as complexity/time trade off

#3 maybe? you can still have edge cases that you can't solve with your DSL but that doesn't invalidate its usage if the benefits from #2 are great enough.

And #4, which is the biggest reason and the reason to use it on a project like this: it provides a more accessible interface to your project for a broader audience of people.

I completely agree (not related to this project, but a rant about Ruby). I used to think metaprogramming and DSLs were the best part of Ruby. Now I think they are the worst part. They make code almost impossible to inspect, put hidden shared state everywhere, and make things like stack traces end up saying `at line 66 in method_missing`. Or, if you can locate the method, it is run via `instance_eval`, making it really hard to know what the state is when it runs. Ruby is actually a very functional language, and now when I write Ruby code it is as functional as I can make it.

I tried using Ansible once and could never escape the fact that anything complex like installing files or raising docker containers requires the use of external modules with actual code and templated YAML. Every time something "templated" shows up in a supposedly "declarative" language it feels like the designers had to add on features from an actual programming language to do what they wanted, because the DSL was too declarative and static to accomplish anything useful.

For example, look at this Azure deployment template:


Notice all the inline function calls. They basically modified JSON with their own domain-specific "functions" like "concat" and have an entire section dedicated to "variables" because in standard JSON this is not possible to express. Already this sounds like a scripting language, except terribly watered down and specific only to Azure services.

As a personal anecdote, I once made the dumb mistake of choosing a static config file instead of a dynamic one when I was developing a mod system for a game. The problem became that there were things like function callbacks that I wanted to add in, so that meant I ended up writing inline "pointers" to a place where a Lua callback existed as an identifier, instead of inlining it as Lua code, then it went through an unnecessary transpilation step to Lua data structures. My mistake was assuming that the DSL was expressive enough for my needs, which included loops, inline function callbacks and code generation, when clearly it was not.

Sometimes the effort people expend to make declarative configs work is significant. The OpenAge project (an engine rewrite of Age of Empires II) uses a declarative language called nyan[1] which describes the changes of things like unit health or cost. It's entirely custom-made for the engine.

The moment that I realized that declarative configs were not for me was when I realized that this declarative language, which the author wrote an entire PhD thesis on and which was specifically designed for the purpose of game modding, would still not be expressive enough for the things I was envisioning. In the end Lua, a general-purpose language that had existed for decades, was the better choice. There was no need to muck around with writing new languages or munging the declarative data into the shape I wanted - the data could just be output from a script.

On the other hand I understand if having a Turing-complete language for configuration is a bad thing because of security or too much expressiveness that hinders static analysis because of unseen edge cases. It depends on if the "scripting" features are hacks intended to get around the fundamental limitations of DSLs or a deliberately constructed featureset.

Also the reason I became so attached to declarative configs in the first place was the prospect of writing an editor frontend to interactively create new data entries. That's probably the reason Azure went with JSON instead of a programming language - they have a template editor in the web portal for filling in the parameterized variables the config declares. This is probably an ease-of-use tradeoff so people don't need to program to deploy things. As someone who uses scripts as configs, how to properly write an editor for them escapes me. I was thinking of having to parse the scripts into an AST and use heuristics to discover where data is inserted, but of course you can program anything so this won't always work.

Still, in my opinion I would choose something like Lua any day over the nth domain-specific proprietary reimplementation of 2% of JavaScript on top of JSON.

[1] https://github.com/SFTtech/nyan

The whole point of this is the scripting language; there are already lots of other browser automation tools out there.

A scripting language is not a goal in itself (although in the case of a DSL, it often is unfortunately because of the coolness factor), solving a problem is.

Here the problems it tries to solve are:

1 - to have a nicer API to do automation

2 - for non-tech-savvy users to be able to create tests

My take on this is that a DSL is the wrong answer to both.

1 can be solved with a better lib, with a better API, or a dedicated env.

2 is probably not solvable: automated testing is hard, and you need to be tech-savvy to do it.

But let's say you still want to solve 2: a better approach would be to solve 1, then solve 2 by creating a GUI that lets users record their actions and generates code against the API from 1.
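The "record, then generate code" approach can be sketched in a few lines. Everything here is an assumption for illustration: the event shapes are what a hypothetical recorder might capture, and the emitted `goto`/`write`/`click` calls target a hypothetical flat automation API.

```javascript
// Hypothetical recorded events, as a recorder GUI might capture them.
const recording = [
  { type: 'navigate', url: 'https://example.com' },
  { type: 'input', text: 'hello' },
  { type: 'click', label: 'Search' },
];

// Map each event type to one line of generated source code.
// JSON.stringify doubles as a safe string-literal quoter.
const emitters = {
  navigate: (e) => `await goto(${JSON.stringify(e.url)});`,
  input: (e) => `await write(${JSON.stringify(e.text)});`,
  click: (e) => `await click(${JSON.stringify(e.label)});`,
};

// Turn a recording into script text, failing loudly on unknown events.
function generateScript(events) {
  return events
    .map((e) => {
      const emit = emitters[e.type];
      if (!emit) throw new Error(`Unsupported event: ${e.type}`);
      return emit(e);
    })
    .join('\n');
}

const script = generateScript(recording);
console.log(script);
```

The generated text is ordinary code, so a programmer can later refactor it with normal tooling once the recorded flow outgrows what the GUI can express.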

Such APIs and GUIs exist, as you said, but they could be improved tenfold.

It's boring work though, and working on a DSL is much more cool.

> A scripting language is not a goal in itself

Agreed, I think this is a case where people forget “what is the problem I’m trying to solve” and instead get lost in the means rather than the end.

Depends if the goal was to make a scripting language and at the same time tackle a problem (or not). Some people like building stuff for the hell of it. A language by itself is an interesting non-trivial project.

I actually kind of really like this DSL. I think it is going to be easier for my QA team to use this than asking them to use JavaScript.

Seriously, how is this DSL better? With JavaScript you get method completion and debugging. Are parentheses really so terrible?

You can set up a DSL in Scala.js and you can get the best of both worlds (though parentheses are fine for me, so I would just use one of the many fine existing JS libraries).

> With JavaScript you get method completion and debugging.

a) now your QA team needs to know JavaScript

b) method completion is crap when you don't know the language.

c) "You can set up a DSL in Scala.js and you can get the best of both worlds" OMG. You want to add Scala to the mix too? AND write your own DSL from scratch, and take on its maintenance to support your QA team's needs, instead of leveraging an existing one?

your solution:

1. teach the QA team JavaScript interaction with a custom Scala-created DSL

2. have the dev team spend time creating a DSL

3. have the dev team devote some of its resources to maintaining the new DSL

their solution:

1. use an English-like DSL that is much easier to learn than JS

2. that is already created

3. that someone else maintains

both solutions are going to have limitations in that what you want the DSL to do will be more than the DSL can _currently_ do.

The other choice is that your QA team knows this DSL, though. Which is the point: eventually most DSLs converge on the complexity of programming languages and you’re left asking: why not just use a well-known programming language in the first place?

We’ve beaten the Ansible example to death, but it’s a really good one. It started as something simple and then with every edge case came the need for increasingly complex logic until it resembled a programming language, just with super funky syntax.

a-c) I didn't mean that QA or whoever should learn JavaScript or Scala.js (though I'm not as scared by the idea as others).

The library authors can use those to create a better DSL (or as I call it: "just a good API"). The cognitive load on the users is the same, only it's easier to set up proper tooling.

In JavaScript you can make the exact sample API he defined with only `await` and some parentheses.

In Scala you can set up an API that looks like `browser show page`; users don't really learn Scala, they just write the same commands they otherwise would. (I'm not saying you should do this. You could, if you really hate parentheses for some reason, but there is a way to go about it without inventing your own ecosystem.)

1) if you are on a QA team writing automation, there is a high chance you already know JavaScript.

source: was a QA automation engineer for a few years. I realize it is purely anecdotal, but nearly all of the QA people I worked with knew JavaScript, even the manual testers.

My experience here is different. The majority of manual QA I've worked with do not have experience with automation, nor do they know JavaScript.

I know right. It looks so nice. I want to pet it.

But it's a trap.

These are actually valid points for any DSL creation. I have been working on DSLs for a few years. More precisely, I'm working on a DSL tool that allows people to create a "simple DSL that looks like english" so that normal people can use it. Here are my thoughts on those arguments.

Indeed, complexity is there for most real domains. But a DSL is meant to be another abstraction layer on top of the complexity. My lesson is that constraint matters most, as it lays the foundation of the DSL's usefulness. I don't mean you should not have things like branches or loops. You certainly could add them, just in a domain-friendly way that doesn't hurt the readability and writability of your DSL.

In order to tackle arbitrary complexity, the DSL itself should be designed with extensibility in mind so that technical people can easily add language constructs or vocabularies using a script or general-purpose language. Further, it should be possible to turn those features off for some scenarios. Inherently complex stuff should be left to a general-purpose language anyway.

Tooling support is another big problem. In my opinion, a DSL is never just the tiny language; it is the language plus its tooling. However, the tooling requirements are often not as standard as for a general-purpose language. Domain users normally don't need an IDE (though an editor is generally required) or a debugger. Rather, they want something that can be seamlessly integrated with the host application.

Moreover, different people (even in the same domain) may have different requirements for DSL maintenance, e.g. governance and versioning. This often leads to case-by-case customization. I do not have a good answer for this other than the general advice: treat the DSL and its tooling as a whole and make the API as easy as possible to integrate.

Edit: For anyone interested in my DSL (still a work in progress), here is a glimpse:


Absolutely the current version seems very limiting, but as they start adding more and more features towards Turing completeness, it will become just as complex as a normal Turing complete programming language. Which in many ways undermines the original promise.

This looks great.

I also want to shamelessly plug something similar I am working on, Taiko. It uses JavaScript and comes with a REPL that generates scripts like this:

  await openBrowser();
  await goto("http://todomvc.com/examples/react/#/");
  await write("automate with taiko");
  await press("Enter");
  await click(checkBox(near("automate with taiko")));
The reason we use JavaScript is familiarity, IDE support, and the ability to use existing Node.js libraries for testing.

For anyone who's interested


FYI to avoid all those awaits and make it chainable you can wrap it in a Proxy.
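A minimal sketch of the Proxy idea, written from scratch rather than taken from any linked project: each call is queued behind the previous one, and the chain itself is awaitable. The `fakeApi` object is a hypothetical stand-in that records calls instead of driving a browser; error handling (for example a `catch` method) is left out.

```javascript
// Wrap an object of async functions so calls can be chained.
// `queue` is the promise for everything scheduled so far.
function chainable(api, queue = Promise.resolve()) {
  return new Proxy({}, {
    get(_target, name) {
      // Expose `then` so that `await chain` waits for all queued calls.
      if (name === 'then') return queue.then.bind(queue);
      if (typeof api[name] !== 'function') return undefined;
      // Each call returns a new chain with the call appended to the queue.
      return (...args) => chainable(api, queue.then(() => api[name](...args)));
    },
  });
}

// Hypothetical stand-in API that only records what was called.
const calls = [];
const fakeApi = {
  goto: async (url) => { calls.push(`goto ${url}`); },
  write: async (text) => { calls.push(`write ${text}`); },
  press: async (key) => { calls.push(`press ${key}`); },
};

const done = chainable(fakeApi)
  .goto('http://todomvc.com/examples/react/#/')
  .write('automate with taiko')
  .press('Enter');

done.then(() => console.log(calls));
```

Inside an async function the whole chain can be awaited once (`await chainable(fakeApi).goto(...).press(...)`) instead of once per call.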


Or ppipe, if you want more features (I'm the author):


Completely irrelevant. Something I tried for front end frameworks


I was very impressed by this! Great minds think alike. Would love more feedback if you have any.

For sure!

Here are a few things we learned after getting feedback from users.

  * Minimise or, better, remove all prerequisites; npm install should bring in everything.
  * Talk directly to the browser using the Chrome DevTools Protocol; it's not that hard and is now supported by Firefox.
  * Play well with others in the language ecosystem. For JavaScript, for example, that means test runners like Gauge, Jest, and Mocha and assertion libraries like Chai; do not build these capabilities yourself.
  * Run well in headless environments.
All the best!

Much obliged! Thanks for taking the time to respond.

Helium author here. Thank you for mentioning us as an inspiration!

Big fans. Thanks for creating Helium!

Your sample code looks like a puppeteer script.

Close, but Taiko has a flat API. For reference, here's a script in Puppeteer:

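  const puppeteer = require('puppeteer');
  (async () => {
    // Puppeteer hands back explicit browser and page handles.
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');
    await page.screenshot({path: 'example.png'});
    await browser.close();
  })();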
and the same thing in Taiko

  const { openBrowser, goto, screenshot, closeBrowser } = require('taiko');
  (async () => {
    await openBrowser();
    await goto('https://example.com');
    await screenshot({path: 'example.png'});
    await closeBrowser();
  })();

so I can't work on multiple pages in parallel?

You can open tabs, incognito windows and switch back and forth

Is the tab switching synchronous? If not then I couldn't do it in parallel without inadvertently hitting the wrong tab.

If it is, then it's no better than using a page handle (as a handle would be required).
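The concern about parallel work can be illustrated with a toy model. This is not Taiko's implementation or API; `switchTo`, `write`, and `writeTo` are hypothetical, and `setImmediate` stands in for an asynchronous round trip to the browser. With a shared "current tab", two concurrent tasks interleave and text lands in the wrong tab; with explicit handles they cannot.

```javascript
// Toy model only: two "tabs" that collect typed text.
const typed = { a: [], b: [] };
let currentTab = null;

// Simulate an asynchronous round trip to the browser.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Flat style: actions apply to whichever tab is current (shared state).
async function switchTo(tab) { await tick(); currentTab = tab; }
async function write(text) { await tick(); typed[currentTab].push(text); }
async function flatTask(tab, text) {
  await switchTo(tab);
  await write(text);
}

// Handle style: every action names its target explicitly.
async function writeTo(tab, text) { await tick(); typed[tab].push(text); }

const demo = (async () => {
  // Both switches complete before either write runs, so tab "b" wins.
  await Promise.all([flatTask('a', 'for a'), flatTask('b', 'for b')]);
  const flat = JSON.parse(JSON.stringify(typed));

  // Reset and repeat with explicit targets.
  typed.a.length = 0;
  typed.b.length = 0;
  await Promise.all([writeTo('a', 'for a'), writeTo('b', 'for b')]);
  const handles = JSON.parse(JSON.stringify(typed));
  return { flat, handles };
})();

demo.then((result) => console.log(JSON.stringify(result)));
```

In the flat run, the text meant for tab `a` ends up in tab `b`; in the handle run, each tab receives its own text.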

It seems that way.

We can do this with Canopy in F# (repl/interactive style too!):

    //go to url
    url "http://lefthandedgoat.github.io/canopy/testpages/"

    //assert that the element with an id of 'welcome' has
    //the text 'Welcome'
    "#welcome" == "Welcome"

    //assert that the element with an id of 'firstName' has the value 'John'
    "#firstName" == "John"

    //change the value of element with
    //an id of 'firstName' to 'Something Else'
    "#firstName" << "Something Else"

    //verify another element's value, click a button,
    //verify the element is updated
    "#button_clicked" == "button not clicked"
    click "#button"
    "#button_clicked" == "button clicked"

The idea of this library is exactly to not use HTML ids (or CSS paths etc.), but to use instructions you could give to a human browsing the web (enter the page, press tab, type this...).

The problem with that is that anything that can 'detect' an HTML element based on human language will eventually fail to find something described in human language, due to crappy HTML on legacy systems.

Test automation guy here. Why reinvent the wheel? Selenium/WebDriver is already a standard (https://www.w3.org/TR/webdriver/). It has years of maturity. Maturity means that through use and development iterations it can now cater for a lot of corner cases. How to do domain authentication in IE. How to handle all the different types of modal dialogues. And so on. It can be used from several different REAL programming languages, so you can interact with a database, drop a message in a queue, or call a web service during browser interaction. I have done all of those. But sure, if you want a tool for a specific small use like a business analyst doing one linear test case, go ahead. If you don't believe all the corner cases, do yourself a favour and look under the Selenium tag on SO.

I'm currently developing a test automation DSL as part of a full automation service.

My partner worked for some time running on-site test automation courses. This was for organisations where the devs were average 9-5 workers without any passion for software development. Not the sort of places that would ever feature on HN.

Manual testers transitioning to automation testers within such organisations are, in most cases, fully incapable of doing so effectively.

Such testers cannot learn to code. Many could barely type with much proficiency. All were great people and great at manual testing, but coding was generally not what they were wired for.

There is a market for something easier.

It took me some time developing a plain-English DSL to realise myself that the majority of browser automation coding isn't coding.

You can abstract away the hard parts. What you're left with is not coding but configuration.

Given the right automation system you don't need to write code to define your tests, you instead need to configure the system to test as needed.

A DSL to replace current automation coding as-is is indeed an odd task.

A DSL for a minimal-grammar configuration language within an automation system can definitely work.

Will it work for everyone? No, absolutely not. Not you and not many who read HN. We're the outliers.

Will it work for boring dusty companies that we've never heard of and which can't afford to employ people who read HN? Yes, definitely.

I take your point, and kudos.

Where I'm coming from is having seen some tool vendors sell "scriptless" test automation tools. UFT has it, and what used to be Rational Functional Tester has it (I peddled RFT in a previous life). The vendors sold it very successfully to non-technical managers, and it looks cool, the dusty companies and large companies all fell for it. "Your Business Analysts can automate tests". But a few months down the line, you realise that it is a rock muffin. No modular code, but linear end-to-end scripts. The login page changed? Update hundreds of test scripts. Who looks bad? Test automation as a profession.

I certainly feel your pain when it comes to non-modular linear end-to-end scripts.

The DSL I'm working on is already quite modular so as to reduce repetition, to hide complex-looking things like CSS selectors behind user-defined names and to support the definition of data sets independent of the tests that use them.

A test for a given page can import test steps, adding further actions and assertions if required and injecting one or more sets of data over which to iterate.

Sets of data can be defined inline (to support quick learning) or defined in separate files and imported and referenced (more ideal).

Properties of a page being tested can be defined separately from the test itself, including aspects such as the URL and named locators expressed as either CSS selectors or XPath expressions, referenced later as needed by the user-defined name. This reduces to one the number of places where many page-specific details need changing, as well as allowing the tests (which reference pre-defined locators by name) to flow more naturally.
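The separation described above (page properties, reusable steps, and data sets kept apart) can be sketched as plain data. This is not the commenter's DSL; the object shapes, names, and the `expand` helper are all hypothetical, chosen only to show how named locators keep selectors in one place.

```javascript
// Hypothetical page definition: URL plus named locators, kept apart from tests.
const loginPage = {
  url: 'https://example.com/login',
  locators: {
    'username field': { css: '#username' },
    'password field': { css: '#password' },
    'submit button': { xpath: '//button[@type="submit"]' },
  },
};

// Data set defined independently of the steps that use it.
const credentials = [
  { user: 'alice', password: 'secret1' },
  { user: 'bob', password: 'secret2' },
];

// Reusable steps refer to locators by user-defined name only.
const loginSteps = [
  { action: 'type', target: 'username field', value: (row) => row.user },
  { action: 'type', target: 'password field', value: (row) => row.password },
  { action: 'click', target: 'submit button' },
];

// Expand steps over every data row, resolving names to concrete selectors.
function expand(page, steps, rows) {
  return rows.flatMap((row) =>
    steps.map((step) => {
      const locator = page.locators[step.target];
      if (!locator) throw new Error(`Unknown locator: ${step.target}`);
      return {
        action: step.action,
        locator,
        ...(step.value ? { value: step.value(row) } : {}),
      };
    })
  );
}

const commands = expand(loginPage, loginSteps, credentials);
console.log(commands);
```

If the login page changes, only `loginPage.locators` needs editing; the steps and data sets stay untouched.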

I'd greatly appreciate your feedback in a few months when we have something workable to demonstrate. My email is in my profile if you're happy to help.

So is mine. Send me a link when you have something.

Your email doesn't appear to be present in your public profile that I can see.

Interesting but I'm not sure about the syntax. If I'm writing code, why do I want to write code in a language that is overly verbose and not that precise? Just make it code.

And if it is not meant for programmers, then make it clickable+drag-and-drop. Having a compromise in this case, makes it not a solution for anyone.

> Having a compromise in this case, makes it not a solution for anyone.

Drag n drop is surprisingly slow and limiting to use, it's also hard to implement etc.

I think this DSL is approachable for non-programmers. I envision bosses/clients using this to write tests for parts of website to make sure it works.

If they need more functionality, they can move to more complex browser automation later... if they need it

A specific use case, but I can see demand for it...

I’m getting AppleScript flashbacks…

Interesting concept, sounds promising.

Have you checked out Taiko? https://github.com/getgauge/taiko

Neat! When we were looking around, we didn't find this one, thanks.

Can it also do stuff like this:

  read ${sender} from row "Test email" column "Sender"
By the way, that works using cartesian lookup.

> pronounced Kuh-SAA-yuh

That's ambiguous and not helpful – particularly in English that has many varieties and also too many vowel phonemes to map onto letters. Use IPA instead.

For anyone else wondering, IPA = International Phonetic Alphabet.

I agree. Kuh-sha-ya in Sanskrit means a decoction made from any herb(s). And despite knowing that, I was a bit befuddled about how to pronounce the word, until I saw the logo.

Looks promising, but I don't want to install the JDK (and it's not practical for CI). Isn't it possible to do without it, using something like Puppeteer?

Will look into this!

might want to think about a different name. https://www.kaseya.com

Would be great if this "transpiled" into a more robust and popularly supported formalism so that the functionality could be refined over time.

I could see implementing this script as being a requirement for a new feature delivered by engineering, and then having test engineering use that functionality as a foundation for more thorough qualification.

I worry that this approach by itself will fall into the same issues that Cucumber does, which is the amount of manual definition you would need to implement through what they're referring to as "macros". Over time those become as brittle as the code you intend to test.

I think Selenium IDE is still around.

It's a drag-n-drop browser plugin that does everything Kasaya can do (both use Selenium under the hood) and can export to JS/Python/etc.

> Would be great if this "transpiled" into a more robust and popularly supported formalism so that the functionality could be refined over time.

Great idea. Would love to see something like that. For now I'll have to continue to roll my own generators/templates.

Why do they become brittle? Because they have to be fixed every time code changes?

I don't see how this is a scripting language and runtime; so, maybe I'm totally missing the boat and everything below is nonsense...

I think DSLs like this turn into one thing: maintenance nightmares.

I love the idea of them, I've just _never_ seen one be useful for more than a demo. Especially, for something as complex as interacting with the browser... Why not just use a visual tool and record the session with something like selenium? At this point, the idea of a "DSL" for non-programmers is pretty much a fantastic myth. I think DSLs should really only be used when they enforce quality, not to have a nicer looking statement. Same rule applies with macros, and lmao, if they aren't responsible for most DSLs.

I haven't even begun to touch on the problems with errors, debugging, warnings, deprecations, updates, etc. which also come with a DSL.

The SDK setup looks an awful lot like I'm just using selenium with node and this is on-top of that entire debugging nightmare, is there more to it than that and the DSL?

As well I'm curious, how do you reliably use the web without selectors? I see it referenced, but I don't see "how," and that quite honestly seems like the coolest thing on the readme.

Again, I love the idea, it looks slick. Just based on my experiences, it seems like how nightmares, not dreams, start...

The "blah blah this could have just been a library instead of a DSL blah blah no one will use this blah blah" conversation always comes up, but always ignores the fact that even if it were just a library, that alone wouldn't make people use it... It also ignores the fact that you'd get just as many "blah blah why not just use this other existing library blah blah" comments.

Cool. Related: the venerable WPT (https://webpagetest.org) is well-documented, and it's straightforward to set up a private instance (eg via AWS AMI) that supports scripting and a robust set of testing tools.

This is neat. I was just looking for something like this the other day.

Does it support loops? I don't see any example like that. Basically I wanted to load a search results page and check something about each of the results on the page.

Our current thinking is to not provide branching mechanisms (loops, conditionals), by design: both to keep the language simple and, more importantly, to force script writers to create one test per scenario. If you need an if statement, that's probably an indication that you need two tests.

For your use case, you'll need to write a macro and then call it the number of times you need.

  how to check for $something in search results for $thing

  check for "foo" in "bar"
  check for "foo2" in "bar"
  check for "foo3" in "bar"
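
For anyone wondering how a parameterized macro like this can stand in for a loop, here is a minimal Python sketch. It is not Kasaya's implementation: `compile_macro`, `expand`, and the body steps are all hypothetical, and it only assumes that each `$name` placeholder in the macro header is bound to a quoted argument in the call.

```python
import re

# Hypothetical macro header and body; the body steps are placeholders,
# not confirmed Kasaya commands.
HEADER = "check for $something in search results for $thing"
BODY = ["type $thing", 'press "enter"', "check that $something is visible"]


def compile_macro(header):
    """Turn a macro header into a regex that captures quoted arguments."""
    parts = re.split(r"(\$\w+)", header)
    pattern = "".join(
        f'"(?P<{part[1:]}>[^"]*)"' if part.startswith("$") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{pattern}$")


def expand(header, body, call):
    """Expand one macro call into its body steps with arguments filled in."""
    match = compile_macro(header).match(call.strip())
    if match is None:
        raise ValueError(f"call does not match macro header: {call!r}")
    args = match.groupdict()

    def bind(m):
        # Replace $name with the quoted argument; leave unknown names alone.
        name = m.group(1)
        return f'"{args[name]}"' if name in args else m.group(0)

    return [re.sub(r"\$(\w+)", bind, step) for step in body]


if __name__ == "__main__":
    # "Looping" is just calling the macro once per value.
    for term in ["foo", "foo2", "foo3"]:
        call = f'check for "{term}" in search results for "bar"'
        print(expand(HEADER, BODY, call))
```

Each call expands to a flat list of steps, so there is no branching at run time; the repetition lives in the script text itself.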

That works?

I think people often fail to grasp why natural language works for humans. It is because we can have a conversation back and forth and supplement with other things like illustrations or drawings.

That I can explain a task to a programmer in natural language and that he can implement it is only possible because he can ask questions back and gradually build up a mental model.

These natural language solutions often lack this feedback loop mechanism.

When you don’t have feedback you are better off with a more precise and more mathematical language.

Looks cool for really simple things but wouldn't use it for anything serious. I have been using Cypress[0] the past few months and so far I've been quite pleased with it.

[0] cypress.io/

Is it suitable to do automation? They seem to be mentioning testing only.

It is. Testing is simply automation with assertions.

I like the idea. Bookmarked for when I get back to my home machine.

One way this can be achieved is by using SpecFlow or Cucumber and having that drive Selenium or Puppeteer.

Probably a 10 minute job to set up some basic commands (like demoed here). Not saying you can replace this project in 10 minutes of course.

The advantage is that you can use pretty much the same language as used here, while also getting some of Gherkin's nice features, like tables for different test cases (or scraping cases!).
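
To make the table idea concrete, here is a rough Python sketch of how a Gherkin-style examples table can drive the same steps once per row. It does not use Cucumber or SpecFlow; `parse_examples`, `render_steps`, and the step strings are hypothetical stand-ins, and the only thing borrowed from Gherkin is the pipe-delimited table and the `<name>` placeholder convention.

```python
# Pipe-delimited table in the style of a Gherkin "Examples" block.
EXAMPLES = """
| query | expected |
| cats  | Cat      |
| dogs  | Dog      |
"""

# Hypothetical step templates; <name> is filled from the matching column.
STEPS = ['type "<query>"', 'press "enter"', 'check for "<expected>"']


def parse_examples(table):
    """Parse a pipe-delimited table into a list of {column: value} dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.strip().splitlines()
    ]
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def render_steps(steps, row):
    """Fill every <column> placeholder in the steps from one table row."""
    rendered = []
    for step in steps:
        for name, value in row.items():
            step = step.replace(f"<{name}>", value)
        rendered.append(step)
    return rendered


if __name__ == "__main__":
    # One generated scenario per table row.
    for row in parse_examples(EXAMPLES):
        print(render_steps(STEPS, row))
```

The rendered steps would then be handed to whatever actually drives the browser; that part is left out here.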

Kasaya would have the role in this case of defining the common language.

Does it use Selenium under the hood?

Yes it does...

Note that some sites block Selenium, since browsers report the use of WebDriver, and Selenium injects known predictable Javascript. Does Kasaya do anything to mitigate this?

Not yet, but someone suggested Chrome DevTools protocol. This is still in the very early stages, so we're looking into these things.

I wonder why you don't use Puppeteer[1], which is an established project for automating Chromium using the Chrome DevTools protocol.

[1]: https://github.com/puppeteer/puppeteer

Taiko was initially based on puppeteer, but it was hard to keep up with puppeteer's api changes.

Plus, the abstraction leaked. Taiko is now built on the excellent https://github.com/cyrus-and/chrome-remote-interface

This is possibly Kasaya's plan as well.

Is Selenium less crashy nowadays?

Back in 2017 when I had a testing automation job, I wrote a test automation system using Node, Selenium WebDriver, Cucumber and Vagrant.

It worked well once I managed to set up a Vagrant box that would take Cucumber tests from a local directory and keep a Node, Selenium WebDriver and Cucumber install cached, but WebDriver never really stopped crashing unpredictably.

I had to implement very coarse retry logic. Tests would take way too much time just because, on each run, a few of the tests would keep crashing for a minute or more until they finally succeeded.
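
For illustration, coarse retry logic of this kind can be as small as the Python sketch below. This is not the original Node code, which isn't shown here; `run_with_retries` and its parameters are hypothetical, and it simply reruns a whole test callable when it raises.

```python
import time


def run_with_retries(test, attempts=3, delay_seconds=0.0, retry_on=(Exception,)):
    """Run test() up to `attempts` times, re-raising the last error if all fail."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return test()
        except retry_on as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_test():
        # Simulates a browser that crashes on the first two attempts.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("browser crashed")
        return "passed"

    print(run_with_retries(flaky_test, attempts=5))
```

The cost is exactly the one described: every crash adds a full rerun plus the delay, so a handful of flaky tests can dominate the total run time.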

I parameterized the Vagrant box so that testers could run subsets of tests by running the "test" command with parameters, not because we had that many tests (only about 100) but because they were so slow.

It wasn't even that complicated of a SPA, and the backend engineers even added nice classes and IDs to elements that were to be tested.

The binding of Cucumber to JS to WebDriver was flawless, and so was adding new testing functions (e.g. "do something with some particular type of list of items"). It was just that the browser component kept crashing all. the. time.

I longed for a deterministic means of automating the browser then, preferably by hooking right into the browser code and integrating with it, so I would know why the thing kept crashing. That hasn't happened yet.

Dragging is a command I wish existed. I’m not super sure how to test draggable components - any thoughts?

Drag is implemented, but we haven't fully tested or documented it properly: https://github.com/syscolabs/kasaya/blob/master/kasaya.js#L8...

Inspiring ideas. Perhaps this will be useful for automated testing, accessibility, etc.

Trying to explain to my mother what to click in a user interface over the phone is next to impossible. So much for the superiority of natural language.

What you need is the ability to point and talk.

There seem to be a lot of different efforts going on in this space. While it's great to see people trying to make this area better, I'm pretty sceptical that almost anybody would be wise to try and adopt this: you will hit the limits of what you can do with such a limited language so fast. And the language is only marginally more interpretable than things like Geb[1], which even supports similar constructs to "near" etc., but is a full programming language (Groovy) when you need it to be.

[1] https://gebish.org/

Who is this for? It requires the JDK and Node.js to be installed, telling me that this is targeted to developers. If I'm a developer, I'm OK to use a browser automation framework that requires a bit of code (at least one that has conditionals and loops...).

I guess it'd just need to be packaged up nicely.

Why is the Java SDK needed?

I did think today :)

I am confused by the use of "WYSIWYG". It seems to be more a control console/REPL with Natural Language syntax.

Edit: A killer feature would be autocomplete for things found on the page.

You're right -- we just put it in double quotes to convey the meaning. It's not entirely accurate. The idea is that you write the script based on what you see, and not what's in the HTML. I guess we could do better with the description.

> Edit: A killer feature would be autocomplete for things found on the page.

Challenge accepted!

IMHO it would also benefit from a "raw typing" mode. It's a bit silly to have to explicitly write 'type "cat"' and 'press "enter"', when you're already typing 'cat' and 'enter'.

You could simply start this raw mode with a keyboard shortcut, and everything you type would automatically be transformed into these "type" and "press" commands until you exit the mode with either the same shortcut or Esc.
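
As a rough idea of what I mean, here is a small Python sketch of the transformation. Everything in it is hypothetical (`keystrokes_to_commands`, the key names, and using "escape" as the exit key), and it assumes keystrokes arrive as strings where a single character is typed text and a longer name is a special key.

```python
EXIT_KEY = "escape"  # assumed name for the key that leaves raw mode


def keystrokes_to_commands(keys):
    """Group plain characters into `type` commands and emit `press` for special keys."""
    commands = []
    buffer = []

    def flush():
        # Emit any pending characters as a single `type` command.
        if buffer:
            text = "".join(buffer)
            commands.append(f'type "{text}"')
            buffer.clear()

    for key in keys:
        if key == EXIT_KEY:
            break  # leave raw mode; later keystrokes are not recorded
        if len(key) == 1:
            buffer.append(key)
        else:
            flush()
            commands.append(f'press "{key}"')
    flush()
    return commands


if __name__ == "__main__":
    print(keystrokes_to_commands(["c", "a", "t", "enter"]))
```

So typing `cat` followed by Enter in raw mode would record the same two commands you would otherwise have to write out by hand.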

Or if the first thing you type is a quote if you are focused on an input, then it infers the type command, otherwise it's a focus search.

I was also confused by the term “English like”.

I think that better terms would be “simple” and “readable”.

Will take this into consideration when going out of beta...
