I posted this maybe a year ago. It was originally an Electron-based desktop app, which was cool but pretty hard to maintain and to get people to download and try out. Given that there would likely always need to be a cloud component anyway, I decided it would be easier to port it to a web app and maintain it that way.
Since then I've taken a job and am pretty burned out on the project. I'm thinking about open sourcing it to see if it gets more interest that way, potentially as a self-contained Docker image that folks could pull down and run.
Here is the current API. It probably needs some de-scoping so that it only exposes the important endpoints, but it's a look under the hood if anyone is curious: https://app.gluedata.io/docs#/
Anyway, if anyone has comments, suggestions, or feedback, please let me know. Happy to share any details or thoughts.
If you think you were burned out before, becoming an open source maintainer probably won't help :) If I were you, I would put out a call for maintainers, and maybe consider a license that keeps modifications open source. That way a company that modifies it will have to contribute back to the project. If you do open source, I recommend moving your roadmap to a GitHub Project, put together a contributors guide/agreement.
Companies will definitely be averse to using a project that has trademark or licensing issues. Apparently somebody else is already using the name (https://alter.com/trademarks/glue-85698439). You could probably find a way to make your use distinct from that one, and you may want to register the name to protect it. IANAL.
I think it has a lot of potential, but finding people who want to build it out/run the project will be the challenge.
"If you think you were burned out before, becoming an open source maintainer probably won't help" - Ha, yeah, I believe that.
"That way a company that modifies it will have to contribute back to the project. If you do open source, I recommend moving your roadmap to a GitHub Project, put together a contributors guide/agreement." - Thanks for the advice.
Very cool. A GUI pandas. As a seasoned pandas user, I find the way it works clear. I get the potential appeal of a no-code solution like this: "Glue is to pandas as Airtable is to relational databases."
The question in my mind: how many users want to build these kinds of data transformation pipelines but are unwilling to learn Python/pandas (or R, or some other equivalent)? Once you need to build a custom lambda, or do something downstream of the transformation like a custom data visualization, you need to know how to write code.
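To make the point concrete, here is a minimal pandas sketch. The `sales` data, its column names, and the revenue threshold of 20 are made up for illustration. The filter, derived column, and group-by are the kind of steps a no-code node can cover, while the final `apply` with a lambda is the sort of custom logic that still needs code.

```python
import pandas as pd

# Hypothetical input data; columns are illustrative only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "west"],
    "units": [10, 4, 7, 3],
    "unit_price": [2.5, 3.0, 2.5, 4.0],
})


def run_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    # "Standard" nodes: filter rows, derive a column, aggregate per group.
    out = df[df["units"] > 3]
    out = out.assign(revenue=out["units"] * out["unit_price"])
    out = out.groupby("region", as_index=False)["revenue"].sum()
    # "Custom lambda" node: arbitrary per-row logic no fixed node anticipates.
    out["tier"] = out["revenue"].apply(lambda r: "high" if r >= 20 else "low")
    return out


result = run_pipeline(sales)
print(result)
```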
Which I think gets to the heart of the matter: this replaces the easy part of data science. The hard parts are things like understanding what the columns really are, cleaning up messy data, and doing statistical modeling that yields valid insights.
I think open-sourcing it is a good idea. I can see how this could be added to a more complete software ecosystem designed for non-coders, like an in-house ELN (electronic lab notebook) for a biotech. But as a paid product, I'm almost certainly never going to try it.
Just a note: the first thing I did was check whether the project was open source, and that was before reading your comment about possibly publishing the source. I did that because I'm not willing to get locked into another walled garden.
That does not, of course, imply that publishing it with a FOSS license is the right move for you! I just thought it might be helpful to report my behavior.
Yeah, I get it, and I appreciate the comment. Part of the reason I'm considering open sourcing it is to give the project more life than I can currently give it myself.
I can fully understand you burning out on something like this. How about declaring a time-out period where development is paused while you take a break? Maybe triage bug reports and feature requests in the meantime, but take some time to yourself to recover and reset your perspectives.
This site makes my phone slow down to a crawl, to the point that my music app stopped. And it's just the landing page. This is insane. What is the resolution on those auto-playing videos?
Outputs are the same (currently excluding databases). Expanding this set dramatically would probably be the most time-consuming part of the project, given how many possible connections there are, but this is where open source could help.
It's hard to know how technical to make this. For example, could there be a Python input, i.e. an arbitrary Python script that pulls in data? That would allow basically any input or output, but it's a harder sell for non-technical users. Open to ideas.
Yeah, I still appreciate the callout and the opinion. It's not obvious to me who is best to target. As a full-stack engineer with analytics experience, I mainly use the tool for scheduling and running pipelines remotely, not for the data-munging UI, since I can write that code pretty easily myself. But for non-technical users, the no-code data-munging piece might be really helpful. It's hard to serve both a marketer and a developer when their needs are so different. On the other hand, I assume a marketer is more likely to pay for the tool (and thus keep it going) than a developer, who might be a harder sell. I built this tool hoping to serve both, which is maybe where I bit off more than I could chew.
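One way such a Python input could work, sketched under assumptions: the user pastes a script that defines a function (here hypothetically named `load`) returning rows as a list of dicts, and the node executes it and validates the result. Both `run_python_input` and the `load` convention are hypothetical, not part of Glue. Because `exec()` runs arbitrary code, this only makes sense for trusted users or inside a sandboxed worker.

```python
from typing import Any

Row = dict[str, Any]


def run_python_input(source: str, entrypoint: str = "load") -> list[Row]:
    """Execute a user-supplied script and call its entry point to get rows.

    WARNING: exec() runs arbitrary code; this assumes a trusted user
    or an isolated/sandboxed worker process.
    """
    namespace: dict[str, Any] = {}
    exec(source, namespace)  # defines the user's functions in `namespace`
    fn = namespace.get(entrypoint)
    if not callable(fn):
        raise ValueError(f"script must define a callable named {entrypoint!r}")
    rows = fn()
    # Validate the shape so downstream nodes can rely on it.
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise TypeError("entry point must return a list of dicts")
    return rows


# Hypothetical user script; a real one might call an API or read a file.
user_script = '''
def load():
    return [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
'''

rows = run_python_input(user_script)
print(rows)
```

The same convention could be mirrored for outputs (a script that receives rows and writes them somewhere), which is what would make "basically any input or output" possible at the cost of requiring code.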
I'd say you should still work toward whatever your own interest with the tool is. If it seems like too much work, make a list of all the functionality you want, figure out the easiest way to implement each, and work on the least-effort/highest-reward things first. And do whatever feels fun!
Oh, this is very interesting; I had never heard of it. Yeah, I could potentially use this. I do think using a third-party library or provider makes sense to massively expand the input/output options. It seems like there is a standard schema for the different tap configs that I could pull and wrap in UI forms in a generalized way.
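A minimal sketch of wrapping a tap's config in a generic UI form, assuming the settings are described as JSON Schema (using `properties`, `type`, `format`, and `required`). As far as I know the Singer spec treats a tap's config file as free-form JSON, so whether a given tap publishes such a schema is an assumption; the widget names, the sample schema, and the `schema_to_form` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FormField:
    name: str
    label: str
    widget: str
    required: bool


# JSON Schema "type" -> UI widget kind (widget names are hypothetical).
WIDGETS = {"string": "text", "integer": "number", "number": "number", "boolean": "checkbox"}


def _json_type(prop: dict[str, Any]) -> Any:
    # JSON Schema allows a list of types, e.g. ["integer", "null"].
    t = prop.get("type")
    if isinstance(t, list):
        t = next((x for x in t if x != "null"), None)
    return t


def schema_to_form(schema: dict[str, Any]) -> list[FormField]:
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        widget = WIDGETS.get(_json_type(prop), "text")
        if prop.get("format") == "date-time":
            widget = "datetime"
        label = prop.get("title", name.replace("_", " ").capitalize())
        fields.append(FormField(name, label, widget, name in required))
    return fields


# Hypothetical tap config schema.
tap_schema = {
    "type": "object",
    "properties": {
        "api_key": {"type": "string"},
        "start_date": {"type": "string", "format": "date-time"},
        "page_size": {"type": ["integer", "null"]},
    },
    "required": ["api_key", "start_date"],
}

form = schema_to_form(tap_schema)
for field in form:
    print(field)
```

The UI layer would then render one control per `FormField`, so adding a new tap means adding a schema rather than a hand-built form.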
I actually use Stitch so I don’t have to host orchestration, which is how I came to know about Singer in the first place.
It’s true that Stitch got acquired by Talend, but Singer seems independent enough. I don’t understand why Stitch would “abandon” Singer. They’re a sponsor, and it seems like they have an incentive to keep that ball rolling. Their bread and butter seems to be hosting orchestration, so more taps means more customers for them. There are other stakeholders in Singer, too, like Meltano and all the self-hosters.
The GitHub activity seems robust to me, but I don’t really have any in depth knowledge about the Singer community. Saying it’s “dying” doesn’t seem accurate, though.
Hey! Yeah, I used G6 (https://g6.antv.vision/en), although if I could start again I might try X6 (https://x6.antv.vision/en). X6 wasn't as well built out when I started, but it's probably better suited to my use case given its interaction support. If you know of other libraries that serve this purpose, let me know, because I did a lot of searching and that was the best I found at the time.
Digging around a bit, I must say I'm very impressed by all the data-viz work the AntV group does. I would really like to try their GraphInsight tool (https://github.com/antvis/GraphInsight/blob/master/README.en...), but it seems to be Chinese-only for the moment.
If you have other suggestions for the tool symbols, let me know. With this project, the right balance between technical and non-technical is hard to hit.
Glue is pretty similar to these commercial projects: https://www.alteryx.com/, https://www.dataiku.com/, https://parabola.io/, and other DAG-based data pipeline generators. I also saw https://datablocks.pro/ posted on Hacker News within the last year or so, which is pretty cool too (and more of an indie project).
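Since these tools, like Glue, are DAG-based, here is a minimal sketch of how such a pipeline could be executed: each node runs only after its upstream nodes, and their outputs are passed in as arguments. It uses Python's standard-library `graphlib` (Python 3.9+); the node format, the sample pipeline, and the `run_dag` helper are hypothetical and not how Glue or the tools above actually work.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable

# Hypothetical node format: name -> (function, names of upstream nodes).
Pipeline = dict[str, tuple[Callable[..., Any], list[str]]]


def run_dag(pipeline: Pipeline) -> dict[str, Any]:
    # Reject edges that point at nodes which don't exist.
    missing = {d for _, deps in pipeline.values() for d in deps} - set(pipeline)
    if missing:
        raise ValueError(f"unknown upstream nodes: {sorted(missing)}")
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    results: dict[str, Any] = {}
    # static_order() yields every node after all of its dependencies and
    # raises CycleError if the graph is not actually a DAG.
    for name in TopologicalSorter(graph).static_order():
        fn, deps = pipeline[name]
        results[name] = fn(*(results[d] for d in deps))
    return results


pipeline: Pipeline = {
    "load": (lambda: [3, 1, 2], []),
    "sort": (sorted, ["load"]),
    "total": (sum, ["load"]),
    "report": (lambda s, t: f"{s} sum={t}", ["sort", "total"]),
}

results = run_dag(pipeline)
print(results["report"])  # prints "[1, 2, 3] sum=6"
```

A scheduler for remote runs would call something like `run_dag` on a timer or trigger; the graph editor in the UI only needs to produce the node-to-upstream mapping.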