> Most prominently, we used it during our Olympics coverage to monitor the results of the API we built and let us know if the data ingestion pipeline ever grew stale. To do that, we set up a pipeline
I hope Huginn does better. I like their copywriting: "You always know who has your data. You do."
Very useful, if a little awkward. The Huginn project sounds like a great alternative!
Was it worth it? As a programmer, no. I'm very familiar with scraping (raw) web/RSS feeds for data, then processing it. I was hoping Pipes would be smart enough that I could subscribe to (cooked) data sources, then split and refine the results.
In practice, Pipes worked, but the data always required further post-processing, which was awkward to do in Pipes. You have to be a dev to understand what your system is doing, but you don't have easy access to all the standard dev things.
I look forward to seeing Pipes take off, or another technology (Huginn? Ifttt?) replace it. It was a lot of fun to wire things up graphically then for example get a text when someone's RSS feed changed.
> You have to be a dev to understand what your system is doing, but you don't have easy access to all the standard dev things.
Curious: what do you think is the minimal subset of unix tools to do this? i.e. instead of pretending the problem is simpler than it is, accept the complexity, but minimize it.
I'm thinking of a tool like "jq" (sed for json) for json data sources... but I don't think its raw-text manipulation is up to the task (and of course you need tools to monitor the feeds etc).
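As a point of comparison, the kind of filtering jq does over a JSON feed can be sketched in a few lines of Ruby stdlib. This is a minimal sketch with a made-up feed payload; in practice the JSON would come from an HTTP fetch of whatever data source you're monitoring.

```ruby
require 'json'

# Hypothetical feed payload -- a stand-in for a fetched JSON data source.
feed = <<~JSON
  {"items": [
    {"title": "Job A", "location": "Los Angeles"},
    {"title": "Job B", "location": "Boston"}
  ]}
JSON

# Roughly the jq filter:
#   .items[] | select(.location == "Los Angeles") | .title
titles = JSON.parse(feed)["items"]
             .select { |item| item["location"] == "Los Angeles" }
             .map { |item| item["title"] }

puts titles
```

The awkward part, as noted above, is everything around the filter: polling the feed, detecting staleness, and deduplicating results is where a tool like Huginn earns its keep over a one-liner.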
They're all quite simple. The most complex one uses the "parse location into lat/long" Pipes feature to automatically find me jobs in the Los Angeles area.
# Getting Started
## Quick Start
If you are unsure of our project and just want to play around, you can
get things set up quickly by:
1. Clone this repository and…
2. Do something
3. Do the other thing
If you need more detailed instructions, have no fear. We are not going
to look down on you if you are not an expert. We took the time to write
a setup guide for newcomers: [Novice setup guide][novice-setup-guide].
Everybody has to start somewhere.
## Real Start
Follow these instructions if you wish to deploy your own version or
contribute back to the project. There is nothing we hate more than
READMEs that ignore all of the practical concerns related to setting up
a long-term installation. Follow these steps and it will be easy for you
to keep up with updates to the project and still retain all the tweaks
you made to suit your idiosyncrasies.
## Odds and Ends
### Optional features
Not everybody needs an XYZ plugin or wants to share their every action
with PQR. You can enable these features by…
### Rare corner cases
In certain rare circumstances you may need to prevent X or implement Y.
Prevent X by…
If you need to implement Y…
And for more substantial tasks it'd be possible to write an agent in another language and then call it from ruby in some way.
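One simple way to do that is to run the external agent as a subprocess and exchange JSON over stdin/stdout. This is a hedged sketch of that pattern, not anything Huginn prescribes; here the "agent" is just a one-liner invoked via `ruby -e`, standing in for a program written in any other language.

```ruby
require 'json'
require 'open3'

# The external "agent": reads a JSON event on stdin, writes a JSON result
# on stdout. In real use this could be any executable in any language.
agent_cmd = %q{ruby -e 'require "json"; ev = JSON.parse($stdin.read); puts({result: ev["x"] * 2}.to_json)'}

event = { "x" => 21 }

# Hand the event to the agent and capture its reply.
out, status = Open3.capture2(agent_cmd, stdin_data: event.to_json)
raise "agent failed" unless status.success?

result = JSON.parse(out)["result"]
puts result  # => 42
```

The JSON-over-pipes contract keeps the two sides decoupled: the Ruby host doesn't care what language the agent is written in, only that it speaks the agreed message format.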
This just has another N bolted on to the end and does something completely different.
(I also have this complaint about a lot of the single-word named OS X applications... Unless they have a LOT of traction then it's hard to find specific info on them.)
BTW, a great idea and an impressive side project!
In brief, I could provide an appliance on something as trivial as a Raspi that updates itself over VPN, and would let you run the services on your own systems. Would that work for you if one of these providers did the same?
Obviously we could do better with a custom system deployed onsite, but the idea is to simplify the process and potentially eliminate cost of getting started; similar to Square sending out card readers.
> "Make a public fork of Huginn. [...] Make a private, empty GitHub repository called huginn-private. Duplicate your public fork into your new private repository[. ...] Checkout your new private repository. Add your Huginn public fork as a remote to your new private repository[. ...] When you want to contribute patches, do a remote push from your private repository to your public fork of the relevant commits, then make a pull request to this repository."
(For the record I can't wait to try out Huginn; I've been using Yahoo Pipes for years... I've apparently got one pipe from before they started using only hex characters as pipe IDs.)
If you want to run a machine learning algorithm on 100 machines then Storm is what you want. Want a service to check the weather for your location? Huginn looks good.
Unsure if there's a better lib out there, but here's what I worked on... Warning: it needs lots of love, e.g. tests: https://github.com/rickyc/find-my-ios-device
Basically for web scraping: with multiple threads, each making requests from a separate IP address, you'd have a better chance of avoiding rate limits or blocks than you would from a single IP address.
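That setup can be sketched as a round-robin assignment of workers to an HTTP proxy pool. Everything here is a placeholder (the proxy hosts and URLs are made up), and the actual fetch is left commented out so the sketch runs without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical proxy pool -- placeholder hosts, not real proxies.
PROXIES = [
  ["proxy1.example.com", 8080],
  ["proxy2.example.com", 8080],
  ["proxy3.example.com", 8080],
]

# Round-robin: worker i always exits through the same proxy, so N workers
# spread their requests across N source IPs.
def assign_proxy(proxies, worker_index)
  proxies[worker_index % proxies.size]
end

# Fetch a URL through a specific HTTP proxy (Net::HTTP.start accepts the
# proxy host and port as extra positional arguments).
def fetch_via(proxy_host, proxy_port, url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, proxy_host, proxy_port) do |http|
    http.get(uri.request_uri).body
  end
end

urls = ["http://example.com/a", "http://example.com/b",
        "http://example.com/c", "http://example.com/d"]

# Dry run of the assignment only; wrap fetch_via in Thread.new to go live.
urls.each_with_index do |url, i|
  host, port = assign_proxy(PROXIES, i)
  # Thread.new { fetch_via(host, port, url) }
  puts "#{url} via #{host}:#{port}"
end
```

Pinning each worker to one exit IP (rather than rotating randomly) also keeps per-IP request pacing predictable, which is usually what matters for staying under rate limits.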
Twitter is also just another rules engine, with pretty simple rules about which tweets you receive. And yet it's also so much more. It's a platform, and it's a social network. And it's something that many people love to use.
Who cares if it's just another rules engine under the hood?
Rules engines are typically a terrible idea and I say this as someone who has worked at two large corporations, one a bank, where rules engines were heavily used so they could avoid having the larger development staff they really needed. Rules engines fail miserably every single time and eventually have to be replaced.
The problem is that as time goes on people who don't know any better end up writing larger and more complex rules and workflows without an understanding of the side effects those rules generate. The end result inevitably becomes a huge mess that is extremely fragile and nearly impossible to follow.
...not that my hypothetical rule is useful, but you understand what I'm getting at.
Typically what happens is that business discovers they can now implement every last feature they desire without getting any push back from engineering so they go wild implementing new features without realizing the consequences. There is no VCS, no QA, no testing. There is no one telling them they cannot do something because it won't scale, it isn't secure or it won't be maintainable.
Their only metric for success is that they get the result they want now and the long term consequences be damned. Worse yet, every single person using the rules engine is acting independently and not as a team. There is no code review, when the rule works to their satisfaction it gets pushed into production.
At first everything works fine and people get promoted for saving money on engineering costs but then the rules start getting more complex, start becoming composed of other rules, need to have more complex actions or need to integrate with third party systems. Eventually the simple rules engine turns into a bastardized programming language that everyone adds onto and never modifies because no one understands how a modification will affect the 4000 other rules in the engine. At that point you end up having to do a complete re-write, which is something I have had the displeasure of doing in the past.
Typically, by the time you get the dev team to fully implement the solution, it has missed its mark and the analysts have moved on.
Players in the mashup landscape are "trying" to provide scalable and robust, yet flexible and easy-to-use systems.
plug - flowreports.co is one of these ... and it can be self-hosted.
Previous discussion of the project:
https://news.ycombinator.com/item?id=7582316 (yesterday)