
Show HN: Kasaya – A scripting language and runtime for browser automation - hliyan
https://github.com/syscolabs/kasaya
======
BiteCode_dev
All projects of this kind share the same flaw: they use a DSL instead of an
existing language, so tooling/documentation/testing/support/modules are going
to be very weak, and 100% dependent on the creator for the first years, in a
domain that is already a niche.

It could be a library with a good API instead. Or even a special env setup for
an existing language with injected built-ins and automatic imports.

Now you could argue that the goal is to have this "simple DSL that looks like
english" so that normal people can use it.

This argument has existed for as long as DSLs have, and the result is always
the same:

\- the simplicity of the language will, in the end, limit its usefulness. Not
every domain is like SQL/HTML/CSS, where being descriptive is enough. Complex
domains need branches, loops, namespaces, etc. Eventually you will be forced to
add features in a twisted way to something that was not designed for it and
replace its simplicity with ugliness, or restrain yourself and be limited
forever. See the Ansible DSL for a good, or rather, horrible, example of that.

\- end users that can't use a normal programming language won't suddenly
become tech-savvy because your DSL is simple, but their use case will never
stop at simple. So they will start building complex systems as soon as they
use it for real, with a limited DSL and their limited ability. The system will
become a monstrosity, and you'll have no tooling to help.

\- successful DSLs are usually not simple. CSS is not. SQL is not. HTML is an
exception because it has no logic. This has logic.

\- it looks so cool. It's like candy for tech lovers. Even to me: I find it so
sexy. And so people will adopt it, ignoring the arguments above. Ignoring the
decades of such attempts that ended up in pain behind so many corporate
firewalls. There will be many tweets at adoption to talk about how nice it is,
but no blog post 2 years later to admit it was a bad idea. And then it will
happen again with $new_shiny_dsl_based_tool.

~~~
debaserab2
All these kinds of HN comments share the same flaw: they unnecessarily detract
from the work the OP has done by attacking use cases that the project never
claimed to solve.

DSLs, for all their problems, do simplify complex programming logic, and
people use them every day to do great things. DSLs like Ansible and Chef save
people an order of magnitude of time for server provisioning, which is why
they are wildly popular.

It seems reasonable to me that someone would build a DSL with the same goals
for web browser automation. The screen recording example gif they have looks
so intuitive even a non-programmer could do it.

I don't look at this project and expect it to have solved every problem with
web scraping, and reading the README, it doesn't look like they are claiming
it does either.

~~~
threatofrain
Using Ansible is what made me lose faith in DSLs in all the ways mentioned
(eventually wanted loops, conditions, variables and namespaces...). Ansible is
an API over a domain and if it were originally just represented through a
Python language library I don't see why it would be less accessible or
productive. Even something as simple as YAML can be screwed up and turned
complicated.

~~~
farisjarrah
Building infrastructure is complicated. Have you found an easier way than
Ansible to accomplish building infrastructure?

Ansible does make YAML more complicated; however, you don't need to use much
of that complexity for simple projects. Compared to Terraform, Chef, Google
Deployment Manager, and Windows Desired State Configuration, Ansible is by far
the simplest to get up and running on to do real work with.

~~~
erikpukinskis
AWS CDK would be an example of using a programming language rather than a DSL
to build infra.

~~~
ofrzeta
There's also Pulumi that covers a wider range of infrastructure than AWS only.

------
zabil
This looks great.

I also want to shamelessly plug something similar I am working on, Taiko. It
uses JavaScript and comes with a REPL that generates scripts like:

    
    
      await openBrowser();
      await goto("http://todomvc.com/examples/react/#/");
      await write("automate with taiko");
      await press("Enter");
      await click(checkBox(near("automate with taiko")));
    

The reasons we use JavaScript are familiarity, IDE support and the use of
existing Node.js libraries for testing.

For anyone who's interested:

[https://github.com/getgauge/taiko](https://github.com/getgauge/taiko)

~~~
kozhevnikov
FYI to avoid all those awaits and make it chainable you can wrap it in a
Proxy.

[https://github.com/kozhevnikov/proxymise](https://github.com/kozhevnikov/proxymise)
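
For readers unfamiliar with the trick, here is a minimal sketch of how a Proxy can make promise-returning calls chainable. It is not proxymise's actual implementation; `chain` and `fakeBrowser` are hypothetical names used only for illustration, and real libraries handle more edge cases.

```javascript
// Hypothetical helper: wraps a value (or promise) so that property access and
// calls are deferred until the underlying promise resolves.
function chain(target) {
  const promise = Promise.resolve(target);
  // The proxy target is a function so that the `apply` trap is available.
  return new Proxy(function () {}, {
    get(_, prop) {
      // Expose the promise interface so the chain can be awaited at the end.
      if (prop === "then" || prop === "catch" || prop === "finally") {
        return promise[prop].bind(promise);
      }
      // Defer the property lookup; bind methods so `this` stays intact.
      return chain(
        promise.then((value) => {
          const member = value[prop];
          return typeof member === "function" ? member.bind(value) : member;
        })
      );
    },
    apply(_, thisArg, args) {
      // Defer the call until the function itself has resolved.
      return chain(promise.then((fn) => fn(...args)));
    },
  });
}

// Stand-in for an async automation API (not a real library).
const fakeBrowser = {
  async open(url) {
    return {
      url,
      async title() {
        return "Title of " + url;
      },
    };
  },
};

// One `await` (or `.then`) at the end instead of one per step.
const titlePromise = chain(fakeBrowser).open("http://example.com").title();
```

The trade-off is that errors surface only where the chain is finally awaited, which can make a failing step harder to locate.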

~~~
egeozcan
Or ppipe, if you want more features (I'm the author):

[https://github.com/egeozcan/ppipe](https://github.com/egeozcan/ppipe)

~~~
imvetri
Completely irrelevant. Something I tried for front end frameworks

[https://github.com/imvetri/ui-editor](https://github.com/imvetri/ui-editor)

------
aloisdg
We can do this with Canopy in F# (repl/interactive style too!):

    
    
        //go to url
        url "http://lefthandedgoat.github.io/canopy/testpages/"
    
        //assert that the element with an id of 'welcome' has
        //the text 'Welcome'
        "#welcome" == "Welcome"
    
        //assert that the element with an id of 'firstName' has the value 'John'
        "#firstName" == "John"
    
        //change the value of element with
        //an id of 'firstName' to 'Something Else'
        "#firstName" << "Something Else"
    
        //verify another element's value, click a button,
        //verify the element is updated
        "#button_clicked" == "button not clicked"
        click "#button"
        "#button_clicked" == "button clicked"
    

[https://lefthandedgoat.github.io/canopy/](https://lefthandedgoat.github.io/canopy/)

~~~
Dontrememberit
The idea of this library is exactly to not use HTML ids (or css paths etc),
but use instructions you could give to a human browsing the web (enter the
page, press tab, type this..)

~~~
holtalanm
The problem with that is that anything that can 'detect' an HTML element based
on human language will eventually fail to find something described in human
language, due to crappy HTML on legacy systems.

------
ou_ryperd
Test automation guy here. Why reinvent the wheel? Selenium/WebDriver is already
a standard
([https://www.w3.org/TR/webdriver/](https://www.w3.org/TR/webdriver/)). It has
years of maturity. Maturity means that through use and development iterations
it can now cater for a lot of corner cases. How to do domain authentication in
IE. How to handle all the different types of modal dialogues. And so on. It
can be used in several different REAL programming languages so you can
interact with a database, or drop a message in a queue or call a web service
during browser interaction. I have done all of those. But sure, if you want a
tool for a specific small use like a business analyst doing one linear test
case, go ahead. If you don't believe all the corner cases, do yourself a
favour and look under the Selenium tag on SO.

~~~
webignition
I'm currently developing a test automation DSL as part of a full automation
service.

My partner worked for some time running on-site test automation courses. This
was for organisations where the devs were average 9-5 workers without any
passion for software development. Not the sort of places that would ever
feature on HN.

Manual testers transitioning to automation testers within such organisations
are, in most cases, fully incapable of doing so effectively.

Such testers cannot learn to code. Many could barely type with much
proficiency. All were great people and great at manual testing, but coding was
generally not what they were wired for.

There is a market for something easier.

It took me some time developing a plain-English DSL to realise myself that the
majority of browser automation coding isn't coding.

You can abstract away the hard parts. What you're left with is not coding but
configuration.

Given the right automation system you don't need to write code to define your
tests, you instead need to configure the system to test as needed.

A DSL to replace current automation coding as-is is indeed an odd task.

A DSL for a minimal-grammar configuration language within an automation system
can definitely work.

Will it work for everyone? No, absolutely not. Not you and not many who read
HN. We're the outliers.

Will it work for boring dusty companies that we've never heard of and which
can't afford to employ people who read HN? Yes, definitely.

~~~
ou_ryperd
I take your point, and kudos.

Where I'm coming from is having seen some tool vendors sell "scriptless" test
automation tools. UFT has it, and what used to be Rational Functional Tester
has it (I peddled RFT in a previous life). The vendors sold it very
successfully to non-technical managers, and it looks cool, the dusty companies
and large companies all fell for it. "Your Business Analysts can automate
tests". But a few months down the line, you realise that it is a rock muffin.
No modular code, but linear end-to-end scripts. The login page changed? Update
hundreds of test scripts. Who looks bad? Test automation as a profession.

~~~
webignition
I certainly feel your pain when it comes to non-modular linear end-to-end
scripts.

The DSL I'm working on is already quite modular so as to reduce repetition, to
hide complex-looking things like CSS selectors behind user-defined names and
to support the definition of data sets independent of the tests that use them.

A test for a given page can import test steps, adding further actions and
assertions if required and injecting one or more sets of data over which to
iterate.

Sets of data can be defined inline (to support quick learning) or defined in
separate files and imported and referenced (more ideal).

Properties of a page being tested can be defined separately to the test
itself, including aspects such as the URL and named locators expressed as
either CSS selectors or XPath expressions, referenced later as needed by the
user-defined name. This reduces to one the number of places many page-specific
details need changing, as well as allowing the tests (which reference by name
pre-defined locators) to flow more naturally.
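
To make the separation described above concrete, here is a minimal sketch in JavaScript. It is not the commenter's DSL; the object shapes, the `$name` placeholder convention and the `expand` helper are all assumptions made for illustration.

```javascript
// Hypothetical page definition: the URL and named locators live in one place.
const loginPage = {
  url: "https://example.com/login",
  locators: {
    "username field": { css: "#username" },
    "password field": { css: "#password" },
    "sign in button": { xpath: "//button[@type='submit']" },
  },
};

// Hypothetical test steps that refer to elements only by their names.
const loginSteps = [
  { action: "set", target: "username field", value: "$user" },
  { action: "set", target: "password field", value: "$password" },
  { action: "click", target: "sign in button" },
];

// Data sets defined independently of the steps that use them.
const dataSets = [
  { user: "alice", password: "secret1" },
  { user: "bob", password: "secret2" },
];

// Resolve names to selectors and substitute data, yielding one concrete
// list of steps per data set.
function expand(page, steps, sets) {
  return sets.map((data) =>
    steps.map((step) => {
      const locator = page.locators[step.target];
      if (!locator) throw new Error("Unknown locator: " + step.target);
      const value =
        typeof step.value === "string" && step.value.startsWith("$")
          ? data[step.value.slice(1)]
          : step.value;
      return { action: step.action, locator, value };
    })
  );
}

const runs = expand(loginPage, loginSteps, dataSets);
```

If the login form changes, only `loginPage.locators` needs updating; the steps and data sets stay as they are.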

I'd greatly appreciate your feedback in a few months when we have something
workable to demonstrate. My email is in my profile if you're happy to help.

~~~
ou_ryperd
So is mine. Send me a link when you have something.

~~~
webignition
Your email doesn't appear to be present in your public profile that I can see.

------
nubela
Interesting but I'm not sure about the syntax. If I'm writing code, why do I
want to write code in a language that is overly verbose and not that precise?
Just make it code.

And if it is not meant for programmers, then make it clickable+drag-and-drop.
Having a compromise in this case, makes it not a solution for anyone.

~~~
gitgud
_> Having a compromise in this case, makes it not a solution for anyone._

Drag n drop is surprisingly slow and limiting to use, it's also hard to
implement etc.

I think this DSL is approachable for non-programmers. I envision
bosses/clients using this to write tests for parts of a website to make sure
they work.

If they need more functionality, they can move to more complex browser
automation later... _if they need it_

A specific use case, but I can see demand for it...

------
subhashchy
Interesting concept, sounds promising.

Have you checked out Taiko?
[https://github.com/getgauge/taiko](https://github.com/getgauge/taiko)

~~~
hliyan
Neat! When we were looking around, we didn't find this one, thanks.

Can it also do stuff like this:

    
    
      read ${sender} from row "Test email" column "Sender"
    

By the way, that works using cartesian lookup.
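
The comment does not spell out what "cartesian lookup" involves. One plausible reading is: locate the row by its text, locate the column by its header, and take the cell where the two intersect. The sketch below illustrates that reading on a plain array; the `readCell` helper and the sample data are hypothetical, and Kasaya itself presumably works on the rendered page rather than on an array.

```javascript
// Assumed table shape: the first row is the header, the rest are data rows.
const inbox = [
  ["Subject", "Sender", "Received"],
  ["Welcome", "alice@example.com", "09:00"],
  ["Test email", "bob@example.com", "09:30"],
];

// Find the row containing `rowText`, find the column whose header is
// `columnName`, and return the cell where they intersect.
function readCell(table, rowText, columnName) {
  const [header, ...rows] = table;
  const columnIndex = header.indexOf(columnName);
  if (columnIndex === -1) throw new Error("No column named " + columnName);
  const row = rows.find((cells) => cells.includes(rowText));
  if (!row) throw new Error("No row containing " + rowText);
  return row[columnIndex];
}

const sender = readCell(inbox, "Test email", "Sender");
```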

------
bmn__
> pronounced Kuh-SAA-yuh

That's ambiguous and not helpful, particularly in English, which has many
varieties and also too many vowel phonemes to map onto letters. Use IPA
instead.

~~~
thosakwe
For anyone else wondering, IPA = International Phonetic Alphabet.

------
vincelt
Looks promising, but I don't want to install the JDK (and it's not practical
for CI). Isn't it possible to do without it, using something like Puppeteer?

~~~
hliyan
Will look into this!

------
mml
might want to think about a different name.
[https://www.kaseya.com](https://www.kaseya.com)

------
languagehacker
Would be great if this "transpiled" into a more robust and popularly supported
formalism so that the functionality could be refined over time.

I could see implementing this script as being a requirement for a new feature
delivered by engineering, and then having test engineering use that
functionality as a foundation for more thorough qualification.

I worry that this approach by itself will fall into the same issues that
Cucumber does, which is the amount of manual definition you would need to
implement through what they're referring to as "macros". Over time those
become as brittle as the code you intend to test.

~~~
someone7x
I think Selenium IDE is still around.

It's a drag-n-drop browser plugin that does everything Kasaya can do (both use
Selenium under the hood) and can export to JS/Python/etc.

------
rubyn00bie
I don't see how this is a scripting language and runtime; so, maybe I'm
totally missing the boat and everything below is nonsense...

I think DSLs like this turn into one thing: maintenance nightmares.

I love the idea of them, I've just _never_ seen one be useful for more than a
demo. Especially, for something as complex as interacting with the browser...
Why not just use a visual tool and record the session with something like
selenium? At this point, the idea of a "DSL" for non-programmers is pretty
much a fantastic myth. I think DSLs should really only be used when they
enforce quality, not to have a nicer looking statement. Same rule applies with
macros, and lmao, if they aren't responsible for most DSLs.

I haven't even begun to touch on the problems with errors, debugging, warnings,
deprecating, updates, etc. which also come with a DSL.

The SDK setup looks an awful lot like I'm just using Selenium with Node, and
this is on top of that entire debugging nightmare. Is there more to it than
that and the DSL?

As well I'm curious, how do you reliably use the web without selectors? I see
it referenced, but I don't see "how," and that quite honestly seems like the
coolest thing on the readme.

Again, I love the idea, it looks slick. Just based on my experiences, it seems
like how nightmares, not dreams, start...

------
thosakwe
The "blah blah this could have just been a library instead of a DSL blah blah
no one will use this blah blah" conversation always comes up, but always
ignores the fact that even if it were just a library, that alone wouldn't make
people use it... It also ignores the fact that you'd get just as many "blah
blah why not just use this other existing library blah blah" comments.

------
chrisweekly
Cool. Related: the venerable WPT
([https://webpagetest.org](https://webpagetest.org)) is well-documented, and
it's straightforward to set up a private instance (eg via AWS AMI) that
supports scripting and a robust set of testing tools.

------
losvedir
This is neat. I was just looking for something like this the other day.

Does it support loops? I don't see any example like that. Basically I wanted
to load a search results page and check something about _each_ of the results
on the page.

~~~
hliyan
Our current thinking is to not provide branching mechanisms (loops,
conditionals) by design. Both to keep the language simple, but more
importantly, to force script writers to create one test per scenario. If
you need an if statement, that's probably an indication you need two tests.

For your use case, you'll need to write a macro and then call it the number of
times you need.

    
    
      how to check for $something in search results for $thing
        ...
      end
    
      check for "foo" in search results for "bar"
      check for "foo2" in search results for "bar"
      check for "foo3" in search results for "bar"
    

That works?

------
socialdemocrat
I think people often fail to grasp why natural language works for humans. It
is because we can have a conversation back and forth and supplement with other
things like illustrations or drawings.

That I can explain a task to a programmer in natural language and that he can
implement it, is only possible because he can ask questions back and gradually
build up a mental model.

These natural language solutions often lack this feedback loop mechanism.

When you don’t have feedback you are better off with a more precise and more
mathematical language.

------
spectaclepiece
Looks cool for really simple things, but I wouldn't use it for anything serious.
I have been using Cypress[0] the past few months and so far I've been quite
pleased with it.

[0] cypress.io/

~~~
casperc
Is it suitable for automation? They seem to be mentioning testing only.

~~~
kevlened
It is. Testing is simply automation with assertions.

------
A4ET8a8uTh0
I like the idea. Bookmarked for when I get back to my home machine.

------
mc3
One way this can be achieved is by using SpecFlow or Cucumber and then making
that drive Selenium or Puppeteer.

Probably a 10 minute job to set up some basic commands (like demoed here). Not
saying you can replace this project in 10 minutes of course.

The advantage is that you can use pretty much the same language as used here,
but you can also use some of Gherkin's nice features, like the tables for
different test cases (or scraping cases!).

Kasaya would have the role in this case of defining the common language.
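
As a rough illustration of the "basic commands" idea above, the sketch below maps plain-English lines to handlers using regular expressions. It deliberately does not use the Cucumber or SpecFlow APIs; `defineStep`, `run` and the three commands are hypothetical, and the handlers only record calls where a real setup would drive Selenium or Puppeteer.

```javascript
// Hypothetical step registry: each entry pairs a sentence pattern with a handler.
const steps = [];
function defineStep(pattern, handler) {
  steps.push({ pattern, handler });
}

// Stand-in for a real browser driver: it only records what was requested.
const log = [];
defineStep(/^open "(.+)"$/, (url) => log.push(["open", url]));
defineStep(/^type "(.+)"$/, (text) => log.push(["type", text]));
defineStep(/^press "(.+)"$/, (key) => log.push(["press", key]));

// Run a script line by line, dispatching each line to the first matching step.
function run(script) {
  const lines = script
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
  for (const line of lines) {
    const step = steps.find((s) => s.pattern.test(line));
    if (!step) throw new Error("No step matches: " + line);
    // Pass the captured groups to the handler as arguments.
    step.handler(...line.match(step.pattern).slice(1));
  }
}

run(`
  open "http://example.com"
  type "cat"
  press "enter"
`);
```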

------
naushniki
Does it use Selenium under the hood?

~~~
hliyan
Yes it does...

~~~
bdcravens
Note that some sites block Selenium, since browsers report the use of
WebDriver, and Selenium injects known predictable JavaScript. Does Kasaya do
anything to mitigate this?
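
For context, the reporting mentioned above is part of the W3C WebDriver specification: `navigator.webdriver` is `true` when the browser is under WebDriver control. A minimal sketch of the kind of check a site might run follows; the `reportsWebDriver` helper is hypothetical, and a plain object stands in for `navigator` so the snippet runs outside a browser.

```javascript
// `nav` is the page's `navigator` object in a real browser.
function reportsWebDriver(nav) {
  return nav.webdriver === true;
}

const automated = reportsWebDriver({ webdriver: true });
const ordinary = reportsWebDriver({ webdriver: false });
```

Real detection scripts typically combine several signals, so this flag is only the simplest one.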

~~~
hliyan
Not yet, but someone suggested Chrome DevTools protocol. This is still in the
very early stages, so we're looking into these things.

~~~
maple3142
I wonder why you don't use Puppeteer[1], which is an established project for
automating Chromium using the Chrome DevTools protocol.

[1]:
[https://github.com/puppeteer/puppeteer](https://github.com/puppeteer/puppeteer)

~~~
zabil
Taiko was initially based on puppeteer, but it was hard to keep up with
puppeteer's api changes.

Plus, the abstraction leaked. Taiko is now built on the excellent
[https://github.com/cyrus-and/chrome-remote-interface](https://github.com/cyrus-and/chrome-remote-interface)

~~~
hliyan
This is possibly Kasaya's plan as well.

------
Middleclass
Is Selenium less crashy nowadays?

Back in 2017 when I had a testing automation job, I wrote a test automation
system using Node, Selenium WebDriver, Cucumber and Vagrant.

It worked well once I managed to set up a Vagrant box that would take Cucumber
tests from a local directory and keep a Node, Selenium WebDriver and Cucumber
install cached, but WebDriver never really stopped crashing unpredictably.

I had to implement very coarse retry logic. Tests would take way too much time
just because each run a few of the tests would keep crashing for a minute or
more, until they finally succeeded.
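
A coarse retry wrapper of the kind described might look like the sketch below. This is not the original code; `retry` and `flakyStep` are hypothetical, and the flaky step simulates a crash where a real test would call WebDriver.

```javascript
// Hypothetical helper: run an async step, retrying up to `attempts` times.
async function retry(step, attempts, delayMs = 0) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await step(i);
    } catch (error) {
      lastError = error;
      // Optionally pause before the next attempt.
      if (delayMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}

// Stand-in for a flaky browser step: fails twice, then succeeds.
let calls = 0;
async function flakyStep() {
  calls += 1;
  if (calls < 3) throw new Error("browser crashed");
  return "passed";
}

const outcome = retry(flakyStep, 5);
```

Retrying hides the underlying crash rather than fixing it, which matches the slow test runs described above.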

I parameterized the Vagrant box so that testers could run subsets of tests by
running the "test" command with parameters, not because we had that many tests
(just about 100) but because they were so slow.

It wasn't even that complicated of a SPA, and the backend engineers even added
nice classes and IDs to elements that were to be tested.

The binding of Cucumber to JS to WebDriver was flawless, as was adding new
testing functions (e.g. "do something with some particular type of list of
items"); it was just that the browser component kept crashing all. the. time.

I longed for a deterministic means of automating the browser then, preferably
by hooking right into the browser code and integrating with it, so I would
know why the thing kept crashing. That hasn't happened yet.

------
catchmeifyoucan
Dragging is a command I wish existed. I’m not super sure how to test draggable
components - any thoughts?

~~~
hliyan
Drag is implemented, but we haven't fully tested or documented it properly:
[https://github.com/syscolabs/kasaya/blob/master/kasaya.js#L8...](https://github.com/syscolabs/kasaya/blob/master/kasaya.js#L86)

------
lqs469
Inspiring ideas. Perhaps this will be useful for automated testing or
accessibility, etc.

------
socialdemocrat
Trying to explain to my mother what to click in a user interface over the
phone is next to impossible. So much for the superiority of natural language.

What you need is the ability to point and talk.

------
zmmmmm
There seem to be a lot of different efforts going on in this space. While it's
great to see people trying to make this area better, I'm pretty sceptical that
almost anybody would be wise to try and adopt this - you will hit the limits
of what you can do with such a limited language so fast. And the language is
only marginally more interpretable than things like Geb[1], which even supports
similar constructs to "near" etc., but is a full programming language (Groovy)
when you need it to be.

[1] [https://gebish.org/](https://gebish.org/)

------
jbob2000
Who is this for? It requires the JDK and Node.js to be installed, telling me
that this is targeted to developers. If I'm a developer, I'm OK to use a
browser automation framework that requires a bit of code (at least one that
has conditionals and loops...).

~~~
t0astbread
I guess it'd just need to be packaged up nicely.

------
tor291674
Why is the Java SDK needed?

------
veysel-im
I did think today :)

------
dariusj18
I am confused by the use of "WYSIWYG". It seems to be more a control
console/REPL with Natural Language syntax.

Edit: A killer feature would be autocomplete for things found on the page.

~~~
hliyan
> Edit: A killer feature would be autocomplete for things found on the page.

 _Challenge accepted!_

~~~
TuringTest
IMHO it would also benefit from a "raw typing" mode. It's a bit silly to have
to explicitly write _' type "cat"'_ and _' press "enter"'_, when you're
already typing _' cat'_ and _' enter'_.

You could simply start this raw mode with a keyboard shortcut, and everything
you type is automatically transformed into these "type" and "press" commands,
until you exit the mode with either the same shortcut or Esc.

~~~
dariusj18
Or if the first thing you type is a quote if you are focused on an input, then
it infers the type command, otherwise it's a focus search.

