
Wildcard: Spreadsheet-Driven Customization of Web Applications - akakievich
https://www.geoffreylitt.com/wildcard/salon2020/
======
lubujackson
Wow, I absolutely love this concept.

It is messy and overly ambitious, but promises something like a return to the
"view source" mindset of the old web - where data was in plain sight and
anyone curious and a little tenacious could reshape the web for their own
needs.

I have gone partway down this path for a related concept, and browser
extensions are really the only way to go. The biggest risk and hassle is a
reliance on brittle, site-specific logic to make things work well. I haven't
dug into this project yet to see how automated any of this is or might become,
but if there is an element of community sourcing (like a ruleset for scraping
AirBnB effectively) it opens up a potential attack vector like any
GreaseMonkey-type script, especially if passed routinely to less technical
users. Not a huge issue on day 1 but not an easily solvable issue.

~~~
gklitt
Thanks! "View source mindset" is a nice word for what we're trying to promote
with this project.

Brittle site-specific logic is indeed a challenge. So far we've started with
the simplest thing possible: programmers manually writing scraping code, so
we can focus on how the system works once you have the data available. That
has been enough to test the system out and build lots of useful modifications
ourselves.

I think eventually some degree of automation will be an important way to help
end users use this tool with any website. The "wrapper induction" problem has
been well studied and there are lots of working solutions for end-user web
scraping, so I expect to be able to integrate some of that work.

We're also interested in a community of shared scrapers, but as you point out
there are security considerations. I'm considering trying central code review
from the project to approve new site adapters and make sure they aren't doing
anything obviously malicious. Another solution could be carefully restricting
the expressivity of our scraping system (eg only specify CSS selectors, no
arbitrary code) but I doubt that would be sufficient for all cases. Would
appreciate any suggestions here.

------
gklitt
Author here, happy to answer any questions!

To bring it full circle, here are a couple recent demo videos I made of using
Wildcard to customize the HN front page:

[https://twitter.com/geoffreylitt/status/1229251217118892032](https://twitter.com/geoffreylitt/status/1229251217118892032)

~~~
carapace
This is really cool, congrats. :-)

Are you aware of Ted Nelson's ZigZag data model?
[https://en.wikipedia.org/wiki/ZigZag_(software)](https://en.wikipedia.org/wiki/ZigZag_\(software\))
It seems like that might work well with Wildcard.

------
fxtentacle
I don't think this can work.

It seems to rely on a willingness of the company owning the data to disclose
their full data set to you. Currently, with things like GraphQL, we are moving
in the opposite direction in that the server only sends you those columns that
are absolutely required to fill the fields in your GUI.

Since they used it as the example, I don't see any incentive for AirBnb to let
random people on the internet download their full internal data tables. Quite
to the contrary, AirBnb will block you from accessing their servers if they
believe that you are scraping.

So this is a new way for users to toy around with the limited incomplete data
set that the website operator was willing to give them. But it won't empower
users. What if AirBnb implements server-side pagination, so that your client
doesn't even receive the data for the cheapest apartment, because it's on a
different page?

Tools like this would be perfect in theory to enhance social networks like
LinkedIn with export and batch processing capabilities. But the company
claiming ownership of your contacts will surely prevent you from actually
getting a useful export.

Plus there's cases where the data is on a server because it's impractically
large. For example, try this to improve your Google search results.
Downloading a 100mio row spreadsheet as the first step?

~~~
gklitt
You're absolutely right that limited data access and pagination exclude
certain types of modifications.

So far, we've decided to defer thinking about that limitation, and first focus
on other questions like getting the spreadsheet interactions right. We're
making new site adapters every week and finding that we can build lots of
useful modifications for ourselves which work even with only one page of a
paginated list. For one example, see my demo of modifying HN front page [1],
which I find useful even though it only loads the current front page articles.

At some point, we're considering adding more features around fetching
subsequent pages of a table (as explored in Sifter [2], which sorts an entire
list of search results across pagination boundaries) or scraping each detail
page from a table (as explored in Helena [3]).

[1]:
[https://twitter.com/geoffreylitt/status/1229251217118892032](https://twitter.com/geoffreylitt/status/1229251217118892032)

[2]:
[https://doi.org/10.1145/1166253.1166274](https://doi.org/10.1145/1166253.1166274)

[3]: [http://helena-lang.org/](http://helena-lang.org/)
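
A minimal sketch of the cross-page idea mentioned above (fetch every page, then sort the combined list). This is not Wildcard's or Sifter's actual implementation; `fetch_page`, `fetch_all`, and the listing data are hypothetical stand-ins for a real paginated site.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical paginated data: three "pages" of listings, as a site might serve them.
FAKE_PAGES: List[List[Row]] = [
    [{"name": "Loft", "price": 120}, {"name": "Studio", "price": 95}],
    [{"name": "Cabin", "price": 60}, {"name": "Villa", "price": 300}],
    [{"name": "Room", "price": 45}],
]


def fetch_page(page: int) -> List[Row]:
    """Stand-in for a network request; returns [] past the last page."""
    return FAKE_PAGES[page] if 0 <= page < len(FAKE_PAGES) else []


def fetch_all(fetch: Callable[[int], List[Row]], max_pages: int = 50) -> List[Row]:
    """Collect rows page by page until an empty page or the page cap."""
    rows: List[Row] = []
    for page in range(max_pages):
        batch = fetch(page)
        if not batch:
            break
        rows.extend(batch)
    return rows


def cheapest(rows: List[Row]) -> Row:
    """Return the row with the lowest price."""
    return min(rows, key=lambda r: r["price"])


first_page_only = cheapest(fetch_page(0))
across_pages = cheapest(fetch_all(fetch_page))
print(first_page_only["name"], across_pages["name"])
```

Sorting only the first page picks a different "cheapest" row than sorting the full list, which is the limitation raised in the parent comment. The `max_pages` cap is there so a sketch like this does not hammer a server indefinitely.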

~~~
irq-1
Tell the websites what is being done with the data/spreadsheet. If hacker news
is being filtered to exclude domains, or people are searching for all things
LISP, the admins could use that information to change the website. Try making
a sharing website (like greasemonkey scripts) -- users post scripts and
discuss what they're trying to do, and website admins can comment and post
changes or scripts, etc...

------
angleofrepose
Wow, this is the real deal. First time I've heard of this work and I've got
some more digging to do. Has all the right language and some great references
with pretty awesome related work on digital tools should anyone want to keep
digging.

Low floor, high ceiling is the best case for that framework, and should be
every toolmaker's ideal.

The Airbnb story reads like a sign of the times. Platforms can do as they
like, users just have to conform. PCs and the internet promised the kind of
programmatic control described here (I wonder if there is a better term than
"programmatic" control?), end users should be able to come up with arbitrary
representations of the data they query on the fly and realize them as quickly
as possible.

Web UIs are stupidly underpowered, table based queries for flights as
presented here seem much more usable. Michel Beaudouin-Lafon has a few great
ideas to explore here, "One is Not Enough" which he described in a different
context but I think can apply to the desire for composability between multiple
tools here (Airbnb + walkability) and "software is not soft" describing the
boundaries placed on software users. I have many tools for manipulating
strings or sorting numbers, why can't I use them on the Airbnb table listings,
served up on my computer?

Thanks for sharing.

------
warpech
This is the most inspiring implementation of live web scraping that I have
seen. However, I think it will only work well on semantic HTML. I don't know
about AirBnb, used in the paper, but I can say many good words about GitHub.
GitHub is an awesome example of a customizable web app thanks to solid,
semantic HTML structure. You can see hundreds of web extensions and
Tampermonkey user scripts for GitHub that work consistently. I wrote a few of
my own.

As a co-founder of Handsontable, I am proud to see it used in this paper.
Handsontable is a commercial spreadsheet component, however it is free for
non-commercial purposes such as education, research, study, and personal use,
testing and demonstration:
[https://github.com/handsontable/handsontable/blob/master/han...](https://github.com/handsontable/handsontable/blob/master/handsontable-non-commercial-license.pdf)

~~~
gklitt
Thanks for building Handsontable! It's been essential for quickly prototyping
this project and I'm a fan of the API design.

re: scraping, it's true that semantic HTML makes things easier, but we've also
been building site adapters for a variety of modern sites that use frontend
frameworks, "utility CSS", etc. Most promising solution so far is something
I'm calling "AJAX scraping" -- observe the JSON request made by the client
and just directly extract structured data from there.
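
A rough sketch of what "AJAX scraping" could look like once a JSON response has been captured: flatten each item into one table row. The payload shape, the `results` key, and the function names are assumptions for illustration, not Wildcard's adapter API.

```python
import json
from typing import Any, Dict, List

# Hypothetical response body, as it might be captured from a site's own JSON request.
CAPTURED_RESPONSE = """
{"results": [
  {"id": 1, "title": "Loft", "pricing": {"amount": 120, "currency": "USD"}},
  {"id": 2, "title": "Cabin", "pricing": {"amount": 60, "currency": "USD"}}
]}
"""


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into flat column names like 'pricing.amount'."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def rows_from_response(body: str, list_key: str = "results") -> List[Dict[str, Any]]:
    """Extract one table row per item in the JSON list under `list_key`."""
    payload = json.loads(body)
    return [flatten(item) for item in payload.get(list_key, [])]


rows = rows_from_response(CAPTURED_RESPONSE)
for row in rows:
    print(row)
```

Working from the JSON avoids depending on class names or DOM structure, though it still breaks if the site changes its response format.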

------
elamje
I know this is front page, but I’m surprised more people aren’t chiming in.
Maybe the web used to be this way where you can easily manipulate views to
your liking, but as a 20 something I’ve never even envisioned end users
crafting their own views of pages. It honestly makes a lot of sense to
represent pages how you see fit as a user, and no, inspecting each page and
changing the source isn’t practical at all in Web 2.0 div spaghetti. It seems
pretty practical to have a spreadsheet formatted UI for your most popular
sites.

~~~
emj
I've found this dream is not about age; people just think about this
differently, and some are jaded, saying we will never get there.

As a 50 something it has been one of my ultimate dreams, but it has proven to be
hard all through my very short history with computers. Letting the user modify
their view in a GUI is always a hard task to solve.

The curl trick worked for so long[1], it's nice to see that you can get a
better experience with wildcard with the div/js spaghetti today.

[1] curl https://example.com/apartment/[0-1000] -o \#1.html

------
willberman
I've been following this lab's work for a while and actually suggested to them
that the implementation for this be based on an RDF style data model. Ontology
languages are the level of abstraction up from a spreadsheet and are an atomic
unit in semantic web technologies. It looks like the way this fits in to the
existing architecture is that the site adapters would extract data as RDF
triples.
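
To make the idea concrete, here is a small sketch of turning adapter-style table rows into (subject, predicate, object) triples. It uses plain Python tuples rather than a real RDF library or serialization, and the row data and `rows_to_triples` name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, object]


def rows_to_triples(rows: Iterable[Dict[str, object]], id_column: str) -> List[Triple]:
    """Emit one (subject, predicate, object) triple per non-id cell."""
    triples: List[Triple] = []
    for row in rows:
        subject = str(row[id_column])
        for column, value in row.items():
            if column != id_column:
                triples.append((subject, column, value))
    return triples


# Hypothetical rows, shaped like what a site adapter might produce.
listings = [
    {"id": "listing/1", "name": "Loft", "price": 120},
    {"id": "listing/2", "name": "Cabin", "price": 60},
]

triples = rows_to_triples(listings, id_column="id")
for triple in triples:
    print(triple)
```

Each spreadsheet cell becomes one triple, with the row's id as the subject and the column name as the predicate.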

Professor Daniel Jackson runs this lab. His book, Design by Concept, is a
phenomenal read. It made me understand why software can be so unintuitive for
people who haven't grown accustomed to its idiosyncrasies that I've come to
internalize.

~~~
jcelerier
Please don't bring RDF out of its coffin. It has been tried and failed because
it's overly complex and verbose. It's terrible, and technologies which still
use it are terrible to interact with to this day.

~~~
willberman
Interestingly enough I think that the core of rdf (the subject predicate
object triple) is a quite elegant abstraction for knowledge graph
representation.

I do agree that the layered system of different ontology languages as present
in current semantic web standards is not beginner friendly, but it doesn’t
mean they can’t be improved on.

I think you might be throwing the baby out with the bath water.

------
juped
This looks really excellent, and is the future (meaning, these sorts of worse-
is-better tools scraping loosely structured messes into very simple standard
structures are the future).

Something that's conceptually related but pretty different is Workbench from
the Columbia School of Journalism (although glancing at their page they may be
some kind of dumb startup now).

[https://workbenchdata.com/](https://workbenchdata.com/)

------
mmckelvy
I've said it a few times here on HN that I think the best UX for many web apps
(particularly business apps) would be a spreadsheet connected to an API (or
better yet, multiple APIs).

Of course most web apps don't expose an API, so here we are.

------
darkhorse13
This absolutely blew my mind. However, aren't these sorts of custom views
limited to things which work well as rows and columns?

~~~
gklitt
Yes, you're right! In practice, though, we're finding that many useful
customizations can fit into that framework. For example, the Expedia demo in
the paper shows a "1 row table" to represent an input form. It's worth
thinking about how many different things people use spreadsheets for...

I think another useful analogy for thinking about abstract data
representations is text streams in UNIX. It turns out many types of data can
be represented as newline-delimited text, which enables you to use a suite of
generic tools with that data. Inevitably, some data doesn't fit into that
format, but it's perhaps surprising how much does.
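
As a small illustration of the analogy (a sketch, with made-up data and helper names), one generic line-sorting function can be reused on any tab-separated text, in the same way `sort` works on any text stream:

```python
from typing import Callable, List


def sort_lines(text: str, key: Callable[[str], object]) -> List[str]:
    """Generic tool: sort the non-empty lines of any text by a caller-supplied key."""
    return sorted((line for line in text.splitlines() if line.strip()), key=key)


def column(line: str, index: int, sep: str = "\t") -> str:
    """Pick one field out of a delimited line."""
    return line.split(sep)[index]


# Two unrelated, hypothetical data sets that both happen to fit the line format.
flights = "AA10\t320\nUA22\t280\nDL05\t410\n"
stories = "Show HN: Foo\t42\nAsk HN: Bar\t7\n"

flights_by_price = sort_lines(flights, key=lambda line: int(column(line, 1)))
stories_by_points = sort_lines(stories, key=lambda line: int(column(line, 1)))
print(flights_by_price)
print(stories_by_points)
```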

------
brianberns
This seems to assume that all the relevant data is present in memory on the
client, but this is often not true (e.g. due to paging issues).

What happens if you try to sort listings by price and the actual item with the
lowest price hasn't even been fetched by the browser yet?

~~~
willberman
This is a very relevant engineering critique, but note that this is a research
project. The first step is to ask, "what if it worked this way?" After a
prototype has been developed to more accurately identify what is the actual
problem that is being solved, then problems like this can be addressed.

I had many interesting conversations in my undergraduate research lab trying
to find the right place to draw the line between engineering and research.
Problems can be more aptly classified as engineering when there is high
consensus on what the actual problem is. Research often addresses what
question should we be asking to determine the problem that may then be solved.
Most often there is a series of research and engineering iterations
intertwined with each other.

------
gitgud
I was under the impression that it would be another 'Google Sheet' used to
configure a web application...

This however is refreshingly different, and demonstrates the potential power
that users can have over the data displayed in browsers...

~~~
gitgud
It reminds me of useful extensions like [1] Honey (auto coupon code finder),
which are generalized enough to automatically detect coupon code _input
fields_ in eCommerce checkouts that it's never seen before.

 _"Wildcard"_, however, needs either AI to detect and classify unknown HTML as
rows in a table, or tonnes and tonnes of integration code (glue code) for all
the popular websites used... which seems to be the plan

[1]
[https://chrome.google.com/webstore/detail/honey/bmnlcjabgnpn...](https://chrome.google.com/webstore/detail/honey/bmnlcjabgnpnenekpadlanbbkooimhnj/related)

------
benmarks
My favorite example from the (ecommerce) domain in which I work is
[https://www.cobby.io/](https://www.cobby.io/) - I know the team behind it,
and while it perfectly solves the problem of product data for shops of a
certain size, the live-editable cell idea always sparks conversation about the
broader applications. Years later and we still see the genius of spreadsheets.

------
geocar
I've done something similar with Google Sheets and a sprinkling of JS
automation. This works well because Google Sheets is pretty good and I can
embed Google Sheets in an iframe. A server-to-server POST message sends the
(relevant) cells to my running application using a secret key (it's like 5
lines of JS).

Why should I invest in Wildcard's API instead of Google's? Am I missing
something?

------
tarcon
I don't get it. We have custom web frontends because we feel our problems can
be solved more efficiently in UIs different from a spreadsheet, don't we?

Granted, not every custom UI is better than its spreadsheet version would be.
But that's a different problem.

Otherwise, there are a lot of react datagrid and spreadsheet components to use
if you feel that would be the best UI solution for your app.

Am I missing something?

~~~
qubex
It’s not really true that we have custom front-ends because they are best for
the user. Consider the example of AirBnB in the article and the fact that in
2012 they stopped allowing ranking by price: one has to assume that if a
feature has been absent for eight years there’s no intention to reimplement it
(presumably because it behooves the company to have it absent). I’m guessing
that AirBnB knows/believes that the absence of this feature leads users to
choose slightly more expensive properties and this generates more income. The
spreadsheet intercept allows the user to regain control.

------
evrydayhustling
My company, frame.ai, uses a lightweight version of this pattern and has
gotten a lot of value!

One of our products helps teams ensure response times on shared Slack
channels. On some teams, the duty schedule of who should respond in these
channels evolves in complicated ways - for example, complex business hours and
holidays, account managers with backup reps, and so on.

Rather than attempt a one size fits all interface, we expose configuration via
an Airtable base that we prepare. Airtable makes it much more convenient to
enforce structure and give a nice interface to the configuration - plus an
API. Highly recommended.

We have been pretty surprised at the variety of processes folks have
implemented there, and it's easy for our support team to help them. Airtable
did a write-up about the pattern here:
[https://blog.airtable.com/api-content-series-frameai-convers...](https://blog.airtable.com/api-content-series-frameai-conversation-analysis/)

Excited to see the OP, which takes the concept much farther!

------
matijash
I like the demos a lot, it is easy to understand the idea from them! How hard
was it to write the browser extension and how well does it work for the
different sites?

Obviously the main challenge, as others mentioned, is that not all of the data
is present on the frontend. Also, the user cannot permanently change the app,
since just the DOM is changed and that is not persisted anywhere, am I right?

But the whole idea of being able to peek "under the hood" of an app and
customise/edit it sounds very appealing to me! I am actually working on an
open source project that has that aim, to "understand" the web app from
within.

But of course for that we had to go with a bottom-up approach, so we are
building a DSL for describing how a web app behaves:
[https://github.com/wasp-lang/wasp](https://github.com/wasp-lang/wasp)

Would be happy to hear your thoughts on it!

~~~
foreigner
wasp looks awesome! I signed up to your mailing list.

------
adamredwoods
This is essentially the early Filemaker Pro all-in-one desktop app (pre-
Apple), maybe even Microsoft Access, but in the browser. I always liked the
flexibility of Filemaker, being able to add a new field and pull data into it.
I could never find anything comparable. Good to see a revival of the concept.

------
iddan
If you want to integrate an interactive spreadsheet into your React application
check out:
[https://iddan.github.io/react-spreadsheet/](https://iddan.github.io/react-spreadsheet/)

------
thesuperbigfrog
Spreadsheet-Driven Customization is a great way to enable non-technical users
to customize and configure software.

I used that technique for a one-off Java application back in 2011. The Java
application did not do any live synchronization with the spreadsheet like
Wildcard does. It just read the spreadsheet at application start-up to get
configuration data needed to drive the application logic.

Spreadsheet-driven customization allowed the application's users to edit the
spreadsheet to grow and maintain the dataset that drove the application logic.

I would not be surprised if others have done something similar before me.

~~~
capableweb
Yup, doing the same for a frontend we're currently building. Basically some of
the core features expose warnings (on purpose) to the user once they do an
action that they might want to do in a different way. These errors just have
error codes assigned to them on the backend side and the frontend loads the
real messages from AirTable on boot, which are then used to show the user-
friendly title and description. We're doing the same thing for a couple of
things and it has really cut down on development time, as the product team can
now change the frontend themselves by just editing cells in AirTable instead of
creating tasks for the development team.
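
A minimal sketch of that pattern: load a code-to-message table once at startup and look messages up by error code. A CSV string stands in for the AirTable base here (no AirTable API is called), and the codes, column names, and function names are all made up for illustration.

```python
import csv
import io
from typing import Dict

# Stand-in for the externally edited table; in the comment above this lives in AirTable.
MESSAGES_CSV = """code,title,description
E_DUP,Duplicate entry,This item already exists.
E_BIG,File too large,Try a file under the size limit.
"""


def load_messages(source: str) -> Dict[str, Dict[str, str]]:
    """Read the table once at startup and index its rows by error code."""
    reader = csv.DictReader(io.StringIO(source))
    return {
        row["code"]: {"title": row["title"], "description": row["description"]}
        for row in reader
    }


def describe(code: str, messages: Dict[str, Dict[str, str]]) -> str:
    """Map a backend error code to user-facing text, with a fallback."""
    entry = messages.get(code)
    if entry is None:
        return f"Unknown error ({code})"
    return f"{entry['title']}: {entry['description']}"


MESSAGES = load_messages(MESSAGES_CSV)
print(describe("E_DUP", MESSAGES))
print(describe("E_NOPE", MESSAGES))
```

The fallback branch matters in practice: a non-developer editing the table can delete or rename a row, and the app should still show something sensible.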

------
denster
This is very cool!

While I'm biased [-], I think this only scratches the surface of what
spreadsheets can do.

What if spreadsheets could be used to create all software? Could the software
be of the same quality & sophistication as that built with code?

To fellow readers on HN -- what do you think, is it possible?

[-] Founder of MintData, [https://mintdata.com](https://mintdata.com)

~~~
yumaikas
I think a lot is possible if you could use spreadsheets to create all software
(I even have an idea on the back burner to try in that direction).

That being said, $95/month isn't an accessible price point for anyone that
isn't trying to integrate this into a business of some sort. Which might be
fine for your market, but it doesn't work for most hobbyists, or people
wanting to write personal projects. Heck, Adobe Creative Cloud can be had for
half of that price.

I realize your market probably doesn't include those people right now, which
is a valid business decision. I've got plans to try to make a scriptable web
spreadsheet application that allows people to make their own websites with
that sort of thing.

That being said, if you had a $5/month or $10/month tier, or an option to
self-host without some of the fancier functionality, I'd be all over trying
those out.

------
secondary
Some ten years ago there was an online spreadsheet startup called Hypernumbers
which, I believe, at one point pivoted to letting people build websites using
their spreadsheet. Intriguingly close to this, but without the mashup angle,
which is important.

------
chrisweekly
Brilliant! This really resonates for me, as someone who's used to dev tools
for centralized state management (a la Redux or especially Mobx) ... thanks
for sharing and good luck! :)

------
slifin
Seems like you'd want to look into EAVT databases for a universal schema;
Datomic has a lot of resources explaining its schema

------
foolinaround
nice, this would be very useful in enterprise/internal facing apps, where the
users are 'power users' who already know and use the app on a day-to-day
basis.

------
agumonkey
I feel like I'm watching the 2020 version of Access

