It is messy and overly ambitious, but promises something like a return to the "view source" mindset of the old web - where data was in plain sight and anyone curious and a little tenacious could reshape the web for their own needs.
I have gone partway down this path for a related concept, and browser extensions are really the only way to go. The biggest risk and hassle is the reliance on brittle, site-specific logic to make things work well. I haven't dug into this project yet to see how automated any of this is or might become, but if there is an element of community sourcing (like a ruleset for scraping AirBnB effectively), it opens up a potential attack vector, like any Greasemonkey-type script, especially if passed routinely to less technical users. Not a huge issue on day 1, but not an easily solvable one either.
Brittle site-specific logic is indeed a challenge. So far we've started with the simplest thing possible, programmers manually writing scraping code, so we can focus on how the system works once you have the data available. That has been enough to test the system out and build lots of useful modifications ourselves.
I think eventually some degree of automation will be an important way to help end users use this tool with any website. The "wrapper induction" problem has been well studied and there are lots of working solutions for end-user web scraping, so I expect to be able to integrate some of that work.
We're also interested in a community of shared scrapers, but as you point out, there are security considerations. I'm considering central code review by the project to approve new site adapters and make sure they aren't doing anything obviously malicious. Another option could be carefully restricting the expressivity of our scraping system (e.g., only specifying CSS selectors, no arbitrary code), but I doubt that would be sufficient for all cases. Would appreciate any suggestions here.
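To make that concrete, here's a purely illustrative sketch (not our actual format) of what a restricted adapter could look like: pure data, CSS selectors only, so the worst a malicious adapter can do is scrape the wrong thing, and a reviewer can audit it at a glance. The interface and selectors below are invented.

  // Hypothetical declarative site adapter: CSS selectors only, no code.
  interface DeclarativeAdapter {
    urlPattern: string;   // which pages the adapter applies to
    rowSelector: string;  // matches one element per table row
    columns: { name: string; selector: string; attribute?: string }[];
  }

  const airbnbSearchAdapter: DeclarativeAdapter = {
    urlPattern: "airbnb.com/s/*",
    rowSelector: "[itemprop=itemListElement]",  // invented selector
    columns: [
      { name: "title", selector: "[data-testid=title]" },
      { name: "price", selector: "[data-testid=price]" },
      { name: "link", selector: "a", attribute: "href" },
    ],
  };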
To bring it full circle, here are a couple of recent demo videos I made of using Wildcard to customize the HN front page:
Are you aware of Ted Nelson's ZigZag data model? https://en.wikipedia.org/wiki/ZigZag_(software) It seems like that might work well with Wildcard.
Are your eventual plans pure research, productization, or open-sourcing it?
It seems to rely on the willingness of the company owning the data to disclose its full data set to you. Currently, with things like GraphQL, we are moving in the opposite direction: the server only sends you those columns that are absolutely required to fill the fields in your GUI.
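For example, a typical GraphQL round trip (hypothetical schema) returns exactly the fields the UI asked for and nothing more, so there are no spare columns left over to repurpose:

  // Hypothetical GraphQL query: the server returns only the requested
  // fields, so the client never sees the rest of the row.
  const query = `
    query SearchListings($city: String!) {
      listings(city: $city) {
        title
        nightlyPrice
      }
    }`;

  const res = await fetch("/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables: { city: "Berlin" } }),
  });
  const { data } = await res.json(); // only title and nightlyPrice come back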
Since they used it as the example, I don't see any incentive for AirBnb to let random people on the internet download their full internal data tables. Quite to the contrary, AirBnb will block you from accessing their servers if they believe that you are scraping.
So this is a new way for users to toy around with the limited, incomplete data set that the website operator was willing to give them. But it won't empower users. What if AirBnb implements server-side pagination, so that your client doesn't even receive the data for the cheapest apartment, because it's on a different page?
Tools like this would be perfect in theory to enhance social networks like LinkedIn with export and batch-processing capabilities. But the company claiming ownership of your contacts will surely prevent you from actually getting a useful export.
Plus there are cases where the data is on a server because it's impractically large. For example, try using this to improve your Google search results. Downloading a 100-million-row spreadsheet as the first step?
So far, we've decided to defer thinking about that limitation, and first focus on other questions like getting the spreadsheet interactions right. We're making new site adapters every week and finding that we can build lots of useful modifications for ourselves that work even with only one page of a paginated list. For one example, see my demo of modifying the HN front page, which I find useful even though it only loads the current front page articles.
At some point, we're considering adding more features around fetching subsequent pages of a table (as explored in Sifter, which sorts an entire list of search results across pagination boundaries) or scraping each detail page from a table (as explored in Helena).
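As a rough sketch of the pagination direction, with an invented selector and URL scheme rather than actual Wildcard code, the idea is to fetch each page and concatenate the rows client-side before sorting:

  // Sketch: pull every page of a paginated list into one client-side
  // table, so sorting can work across pagination boundaries.
  async function fetchAllRows(baseUrl: string, pageCount: number): Promise<Element[]> {
    const parser = new DOMParser();
    const rows: Element[] = [];
    for (let page = 1; page <= pageCount; page++) {
      const html = await (await fetch(`${baseUrl}?page=${page}`)).text();
      const doc = parser.parseFromString(html, "text/html");
      rows.push(...doc.querySelectorAll(".listing-row")); // invented selector
    }
    return rows;
  }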
Low floor, high ceiling is the best case for that framework, and should be every toolmaker's ideal.
The Airbnb story reads like a sign of the times: platforms can do as they like, and users just have to conform. PCs and the internet promised the kind of programmatic control described here (I wonder if there is a better term than "programmatic" control?). End users should be able to come up with arbitrary representations of the data they query on the fly, and realize them as quickly as possible.
Web UIs are stupidly underpowered; table-based queries for flights, as presented here, seem much more usable. Michel Beaudouin-Lafon has a few great ideas to explore here: "One is Not Enough", which he described in a different context but which I think applies to the desire for composability between multiple tools here (Airbnb + walkability), and "software is not soft", describing the boundaries placed on software users. I have many tools for manipulating strings or sorting numbers; why can't I use them on the Airbnb table listings served up on my computer?
Thanks for sharing.
As a co-founder of Handsontable, I am proud to see it used in this paper. Handsontable is a commercial spreadsheet component; however, it is free for non-commercial purposes such as education, research, study, personal use, testing, and demonstration: https://github.com/handsontable/handsontable/blob/master/han...
re: scraping, it's true that semantic HTML makes things easier, but we've also been building site adapters for a variety of modern sites that use frontend frameworks, "utility CSS", etc. The most promising solution so far is something I'm calling "AJAX scraping": observe the JSON requests made by the client and directly extract structured data from them.
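The basic shape of it, as a sketch rather than our exact code (the endpoint below is made up): wrap window.fetch in the page and watch the JSON responses the app is already requesting:

  // "AJAX scraping" sketch: observe the app's own JSON traffic instead
  // of parsing a mangled DOM.
  const originalFetch = window.fetch;
  window.fetch = async (...args: Parameters<typeof fetch>) => {
    const response = await originalFetch(...args);
    const url = args[0] instanceof Request ? args[0].url : String(args[0]);
    if (url.includes("/api/search")) {            // hypothetical endpoint
      const data = await response.clone().json(); // clone: the app still gets its body
      console.log("structured rows:", data);
    }
    return response;
  };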
I especially love the Excel-like formula feature.
As a 50-something, it has been one of my ultimate dreams, but it has proven to be hard all through my very short history with computers. Letting the user modify their view in a GUI is always a hard task to solve.
The curl trick worked for so long; it's nice to see that you can get a better experience with Wildcard, even with today's div/JS spaghetti.
 curl https://example.com/apartment/[0-1000] -o \#1.html
Professor Daniel Jackson runs this lab. His book, Design by Concept, is a phenomenal read. It made me understand why software can be so unintuitive for people who haven't grown accustomed to the idiosyncrasies that I've come to internalize.
I do agree that the layered system of different ontology languages found in current semantic web standards is not beginner-friendly, but that doesn't mean they can't be improved on.
I think you might be throwing the baby out with the bath water.
(But even the draft version is a fantastic read!)
Something that's conceptually related but pretty different is Workbench from the Columbia School of Journalism (although, glancing at their page, they may be some kind of dumb startup now).
Of course most web apps don't expose an API, so here we are.
I think another useful analogy for thinking about abstract data representations is text streams in UNIX. It turns out many types of data can be represented as newline-delimited text, which enables you to use a suite of generic tools with that data. Inevitably, some data doesn't fit into that format, but it's perhaps surprising how much does.
What happens if you try to sort listings by price and the actual item with the lowest price hasn't even been fetched by the browser yet?
I had many interesting conversations in my undergraduate research lab trying to find the right place to draw the line between engineering and research. Problems can be more aptly classified as engineering when there is high consensus on what the actual problem is. Research often addresses which question we should be asking to determine the problem that may then be solved. Most often there is a series of research and engineering iterations intertwined with each other.
This, however, is refreshingly different, and demonstrates the potential power that users can have over the data displayed in browsers...
"Wildcard" however either needs; AI to detect and classify unknown HTML as rows in a table. OR tonnes and tonnes of integration code (glue code) for all the popular websites used... which seems to be the plan
Why should I invest in Wildcard's API instead of Google's? Am I missing something?
One of our products helps teams ensure response times on shared Slack channels. On some teams, the duty schedule of who should respond in these channels evolves in complicated ways - for example, complex business hours and holidays, account managers with backup reps, and so on.
Rather than attempt a one size fits all interface, we expose configuration via an Airtable base that we prepare. Airtable makes it much more convenient to enforce structure and give a nice interface to the configuration - plus an API. Highly recommended.
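Reading the config back out is then just a few lines against the Airtable REST API (the base ID, table, and field names below are invented):

  // Fetch the duty-schedule rows our customers maintain in Airtable.
  const resp = await fetch(
    "https://api.airtable.com/v0/appXXXXXXXXXXXXXX/DutySchedule",
    { headers: { Authorization: `Bearer ${process.env.AIRTABLE_API_KEY}` } },
  );
  const { records } = await resp.json();
  for (const r of records) {
    console.log(r.fields["Channel"], r.fields["OnCallRep"], r.fields["Backup"]);
  }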
We have been pretty surprised at the variety of processes folks have implemented there, and it's easy for our support team to help them. Airtable did a write-up about the pattern here: https://blog.airtable.com/api-content-series-frameai-convers...
Excited to see the OP, which takes the concept much farther!
Granted, not every custom UI is better than its spreadsheet version would be. But that's a different problem.
Otherwise, there are a lot of React datagrid and spreadsheet components to use if you feel that would be the best UI solution for your app.
Am I missing something?
This browser extension is targeted at the end user.
Obviously the main challenge, as others mentioned, is that not all of the data is present on the frontend. Also, the user cannot permanently change the app, since only the DOM is changed and that is not persisted anywhere, am I right?
But the whole idea of being able to peek "under the hood" of an app and customise/edit it sounds very appealing to me! I am actually working on an open source project with that aim: to "understand" the web app from within.
But of course for that we had to go with a bottom-up approach, so we are building a DSL for describing how a web app behaves: https://github.com/wasp-lang/wasp
Would be happy to hear your thoughts on it!
I used that technique for a one-off Java application back in 2011. The Java application did not do any live synchronization with the spreadsheet like Wildcard does. It just read the spreadsheet at application start-up to get configuration data needed to drive the application logic.
Spreadsheet-driven customization allowed the application's users to edit the spreadsheet to grow and maintain the dataset that drove the application logic.
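In modern terms, the pattern is roughly this (sketched in TypeScript for brevity rather than the Java I used; the file and column names are invented):

  // Read a spreadsheet (exported as CSV) once at startup; its rows
  // become the configuration that drives the application logic.
  import { readFileSync } from "fs";

  interface Rule { pattern: string; action: string }

  const rules: Rule[] = readFileSync("config.csv", "utf8")
    .trim()
    .split("\n")
    .slice(1) // skip the header row
    .map((line) => {
      const [pattern, action] = line.split(",");
      return { pattern, action };
    });

  for (const rule of rules) {
    console.log(`when input matches ${rule.pattern}, do ${rule.action}`);
  }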
I would not be surprised if others have done something similar before me.
While I'm biased [-], I think this only scratches the surface of what spreadsheets can do.
What if spreadsheets could be used to create all software? Could the software be of the same quality & sophistication as that built with code?
To fellow readers on HN -- what do you think, is it possible?
[-] Founder of MintData, https://mintdata.com
That being said, $95/month isn't an accessible price point for anyone who isn't trying to integrate this into a business of some sort. That might be fine for your market, but it doesn't work for most hobbyists or people wanting to write personal projects. Heck, Adobe Creative Cloud can be had for half that price.
I realize your market probably doesn't include those people right now, which is a valid business decision. I've got plans to try to make a scriptable web spreadsheet application that allows people to make their own websites with that sort of thing.
That being said, if you had a $5/month or $10/month tier, or an option to self-host without some of the fancier functionality, I'd be all over trying those out.