Huginn: Like Yahoo Pipes plus IFTTT on your server (github.com)
524 points by ColinWright on Apr 14, 2014 | 94 comments

Also relevant: How the New York Times interactive team uses Huginn


> Most prominently, we used it during our Olympics coverage to monitor the results of the API we built and let us know if the data ingestion pipeline ever grew stale. To do that, we set up a pipeline
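A staleness check of the kind the quote describes might look roughly like the sketch below. This is not the NYT setup: the function name `is_stale`, the 15-minute threshold, and the timestamps are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated, now, max_age=timedelta(minutes=15)):
    """Return True when the newest record is older than max_age (hypothetical threshold)."""
    return now - last_updated > max_age


# Made-up timestamps standing in for "time of the most recent ingested result".
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh_record = datetime(2020, 1, 1, 11, 55, tzinfo=timezone.utc)
old_record = datetime(2020, 1, 1, 11, 0, tzinfo=timezone.utc)

for label, stamp in (("fresh", fresh_record), ("old", old_record)):
    status = "STALE, send an alert" if is_stale(stamp, now) else "ok"
    print(label, status)
```

In a real monitor the timestamp would come from the API being watched, and the alert would go out by email or text rather than `print`.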

I always liked the Yahoo Pipes concept... but it didn't seem to take off... and I personally found it too limited for everything I tried to do with it. Perhaps it's just another case of the old "visual programming language" being harder than it looks.

I hope Huginn does better. I like their copywriting "You always know who has your data. You do."

Agreed. I did a multipart Yahoo Pipes project to find my current apartment. It grabbed info from two sites, tossed out the uninteresting ones, filtered it a bit, then texted me if a new apartment in my price/location range appeared.

Very useful, if a little awkward. The Huginn project sounds like a great alternative!
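For readers who would rather do this in code, here is a minimal sketch of the apartment pipe described above: merge listings from two sources, drop the uninteresting ones, and report only the ones not seen before. The `Listing` fields, the sample data, and the `notify` stand-in are all hypothetical; the original pipe's sites and filters are not known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    url: str
    price: int
    neighborhood: str


def new_matches(listings, seen_urls, max_price, neighborhoods):
    """Return listings within budget and area that have not been reported yet."""
    matches = []
    for listing in listings:
        if listing.url in seen_urls:
            continue  # already reported, or duplicated across sources
        if listing.price > max_price or listing.neighborhood not in neighborhoods:
            continue  # outside the price/location range
        matches.append(listing)
        seen_urls.add(listing.url)
    return matches


def notify(listing):
    # Stand-in for sending a text message; a real version would call an SMS service.
    return f"New apartment: {listing.neighborhood} ${listing.price} {listing.url}"


# Made-up data standing in for the two scraped sites.
site_a = [
    Listing("https://example.com/a/1", 1400, "Downtown"),
    Listing("https://example.com/a/2", 2600, "Downtown"),
]
site_b = [
    Listing("https://example.com/b/7", 1500, "Harbor"),
    Listing("https://example.com/a/1", 1400, "Downtown"),  # cross-posted duplicate
]

seen = set()
wanted = {"Downtown", "Harbor"}
first_run = new_matches(site_a + site_b, seen, max_price=1600, neighborhoods=wanted)
second_run = new_matches(site_a + site_b, seen, max_price=1600, neighborhoods=wanted)

for listing in first_run:
    print(notify(listing))
```

The `seen` set is what turns a one-off filter into an alert: running the same feeds through again yields nothing until a new listing shows up.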

Just curious, how many hours of work did that take? Do you think it was worth it?

It's hard to tell -- I tweaked it quite a bit, and rewrote it from scratch 2-3x. I'd say 4-10 hours.

Was it worth it? As a programmer, no. I'm very familiar with scraping (raw) web/RSS feeds for data, then processing it. I was hoping Pipes would have enough intelligence, so that I could subscribe to (cooked) data sources, then split and refine the results.

In practice, Pipes worked, but the data always required further post-processing, which was awkward to do in Pipes. You have to be a dev to understand what your system is doing, but you don't have easy access to all the standard dev things.

I look forward to seeing Pipes take off, or another technology (Huginn? IFTTT?) replace it. It was a lot of fun to wire things up graphically and then, for example, get a text when someone's RSS feed changed.

  You have to be a dev to understand what your system is doing, but you don't have
  easy access to all the standard dev things.
Interesting, this mismatch may be a good description of the problem of visual languages.

Curious: what do you think is the minimal subset of unix tools to do this? i.e. instead of pretending the problem is simpler than it is, accept the complexity, but minimize it.

I'm thinking of a tool like "jq" (sed for json) for json data sources... but I don't think its raw-text manipulation is up to the task (and of course you need tools to monitor the feeds etc).

Trying to manipulate structured data as text makes about as much sense as parsing XML with a regex.


Python :-) there are libraries specifically for parsing malformed html. I'm happy using Unix tools for scraping and parsing, but you run into a brick wall rather quickly. Python is more reliable, flexible, and easier to integrate.

I published a few pipes -- enjoy! http://pipes.yahoo.com/pipes/person.info?guid=W4YBIUCXEVLHMS...

They're all quite simple. The most complex one uses the "parse location into lat/long" Pipes feature to automatically find me jobs in the Los Angeles area.

The best part of Huginn is being able to self-host and write any arbitrary agents you want.

I don't know, the documentation is excellent. That is certainly one of the best READMEs I have ever seen. The organization of the setup section of the README is superb. The author of the README clearly thought about the needs of new users as well as seasoned veterans. More projects need to adopt this general format:

  # Getting Started

  ## Quick Start

  If you are unsure of our project and just want to play around, you can
  get things set up quickly by:

  1.  Clone this repository and…
  2.  Do something
  3.  Do the other thing

  If you need more detailed instructions, have no fear. We are not going
  to look down on you if you are not an expert. We took the time to write
  a setup guide for newcomers: [Novice setup guide][novice-setup-guide].
  Everybody has to start somewhere.

  ## Real Start

  Follow these instructions if you wish to deploy your own version or
  contribute back to the project. There is nothing we hate more than
  READMEs that ignore all of the practical concerns related to setting up
  a long term installation. Follow these steps and it will be easy for you
  to keep up with updates to the project and still retain all the tweaks
  you made to suit your idiosyncrasies.

  ## Odds and Ends

  ### Optional features

  Not everybody needs an XYZ plugin or wants to share their every action
  with PQR. You can enable these features by…

  ### Rare Corner cases

  In certain rare circumstances you may need to prevent X or implement Y.

  Prevent X by…

  If you need to implement Y…

I wish the agents could be language-agnostic though.

There's already support for agents written in javascript: https://github.com/cantino/huginn/blob/master/app/models/age...

And for more substantial tasks it'd be possible to write an agent in another language and then call it from ruby in some way.

I submitted an issue just now to add support for Heroku-buildpack-style agents (just a directory of executables)


Same. I wonder how hard it would be to bake in some kind of local webhook system for that purpose. Haven't looked too closely at this, though.

There are myriad ways of linking software languages.

This looks really awesome for managing an office. We're currently automating things using Google scripts and other custom glue to do things like order food, get feedback on lunch and mail people weekly digests activities. Sounds like this could be a great solution for this.

Writeup please if you do this.

weekly digestion activites...?

Weekly digests :-) We're about 35 people split in 5 teams. On a weekly basis each team gets a "weekly update" mail containing a Google document that gets created off a template doc. The weekly update contains some questions that basically ask the team what they did that week. It's a shared Google doc so the team can collaborate to fill it in. Those filled-in docs get aggregated into a single PDF, which gets sent to everyone on a Monday morning. So everyone stays in the loop with the other teams' progress.

Zapier is also good with lots of integrations, but it's a little pricey. Yet if you calculate what your time is worth and include the amount spent on making this work plus customizations, it's probably less. Depends on if Zapier can do what you want.

This is a really frustrating name. Hugin is already used for panoramic photo stitching software: http://hugin.sourceforge.net/

This just has another N bolted on to the end and does something completely different.

I'm not sure about the etymology of Hugin, but Huginn is more than likely a reference to Norse mythology (for Anchorman fans, you'd recognize it as "Great Odin's raven!"):


It specifically states it in the readme that Huginn is a reference to the raven.

Yep -- I understand where the name comes from, I just personally find it very frustrating when two OSS projects are so closely named. It gets really hard to search for one or the other once both become successful.

(I also have this complaint about a lot of the single-word named OS X applications... Unless they have a LOT of traction then it's hard to find specific info on them.)

Had to debug a Cucumber problem involving recipes in a Chef cookbook. I was building up a TDID toolchain at the time. After wading through six google pages of salad, decided to use different tools.

That's a pretty amusing example, but I'd be surprised if prepending 'ruby' to your query wouldn't have fixed it.

A moot point now, but the query wasn't the problem.

One of the developers posted about this recently: https://news.ycombinator.com/item?id=7582316

... and in March 2013: https://news.ycombinator.com/item?id=5377651

BTW, a great idea and an impressive side project!

I am working on a similar project called Taskflow.io that is aimed at more backend, business-oriented tasks. It can do similar things through a flowchart editor interface, where you draw the actual flowchart that gets executed. I would still consider it a public beta. I would love your feedback.

Will this be provided As-A-Service, or will it be a downloadable product that can be deployed in-house? This is exactly what I have been looking for for a while, but there's absolutely zero chance we're going to send any of our business information to a remote service.

I've wondered about this quite a bit, since I run computationally intensive analysis on sensitive data, and some of the same thinking would apply in this context.

In brief, I could provide an appliance on something as trivial as a Raspi that updates itself over VPN, and would let you run the services on your own systems. Would that work for you if one of these providers did the same?

Obviously we could do better with a custom system deployed onsite, but the idea is to simplify the process and potentially eliminate cost of getting started; similar to Square sending out card readers.

It depends. We've got pretty strict security requirements as we operate in the medical and government sector. A black box appliance or something that auto-updates outside of normal patching rounds is probably out of the question.

It's as a service now. I've had other people express the same kind of concern with data security. We can talk about a self hosted version.

Hey, I tried it, but I got stuck trying to set up a webhook, as there's nothing in the dropdown. Here is a screenshot:


Crowded space... We're trying to do the same thing for business intelligence applications with a product we're launching called flowreports.co...

There are other companies doing workflow automation, but their products seem clunky and not aimed at web-services, which companies are increasingly relying on. I want to be the IFTTT of back room office tasks. I want anyone in an organization to be able to create a workflow to automate some mundane process they have at their job. I would love any feedback you have of Taskflow.io so I can get my product to that level. Here is a link for any feedback you might have http://eddie.taskflow.io/start_process/201?return_url=http%3...

Another Pipes+IFTTT tool: https://wewiredweb.com

Like many of the others that have been posted here, it's not self hosted

Will this run on a standard heroku stack? The wiki says it will run on OpenShift and CloudFoundry. https://github.com/cantino/huginn/wiki

It will run, but the default Procfile spins up 4 processes, so Heroku might be expensive. If someone wants to figure out how to get everything to run easily in one process, that would make free Heroku hosting possible. I run it on a small VPS.

I would also love to have this running on a free Heroku process.

Anyone know why this project encourages using a private fork to do contributing development?

> "Make a public fork of Huginn. [...] Make a private, empty GitHub repository called huginn-private. Duplicate your public fork into your new private repository[. ...] Checkout your new private repository. Add your Huginn public fork as a remote to your new private repository[. ...] When you want to contribute patches, do a remote push from your private repository to your public fork of the relevant commits, then make a pull request to this repository."

Just to let you keep any private changes private. Perhaps it's not the best recommendation.

Ah. It seems unnecessarily complicated for people trying to get started. Perhaps preface it with a note saying something like "if you'd like to keep your commits private, follow this brief guide" so it doesn't seem required?

I agree, thanks for pointing this out. I've extracted that section to the wiki.


(For the record I can't wait to try out Huginn; I've been using Yahoo Pipes for years... I've apparently got one pipe from before when they started using only hex characters as pipe IDs.)

Exciting stuff, it would be amazing to build an AI layer on top of this that mines your browsing habits (depending on your paranoia settings) and automatically generates agents based on your interests.

Excluding the UI, I wonder if storm is a more robust, if more complex, option to do the same types of things: http://storm.incubator.apache.org/

Storm doesn't naturally support dynamic topologies and is rather resource hungry, which needs a bit of advance planning. I was looking at Storm for my own pipelining product (bip.io) very early on and shied away, as it was too high an opportunity cost for self-hosting users/devs to be bothered with. On a Raspberry Pi for example, forget about it. Without being able to create dynamic graphs it otherwise just ends up being a simple message bus (anti-pattern).

bip.io looks very cool. Do you think our efforts should be combined?

Exploring somehow combining efforts and/or the two projects has my vote!

Storm is a framework for coordinating computation. It's not really designed to "perform automated tasks for you online" - although of course you could make it do that.

If you want to run a machine learning algorithm on 100 machines then Storm is what you want. Want a service to check the weather for your location? Huginn looks good.

You are right that Storm is certainly more robust for large amounts of data. But Storm just provides underlying infrastructure. Huginn builds on top of something like that to add different agents for twitter, weather etc. So afaict, Huginn is an app built on top of something like Storm.

This sounds like an excellent project to make use of my raspberry pi.

I was just building exactly this, only worse. Looks really great.

This would be very cool for automating parts of AWS. Inclement weather coming? Or an earthquake? Start spooling up servers in another region.

Does this have a companion android/iOS app to upload location data? I really like the idea of self hosting something like this.

I extended a Find My iDevice API lib about a year ago with the intention of creating a Huginn agent for exactly this, but life got in the way. I also wanted to add a geofencing agent, too.

Unsure if there's a better lib out there, but here's what I worked on...warning: needs lots of love, eg. tests: https://github.com/rickyc/find-my-ios-device

For Android there is https://github.com/jcs/triptracker -- which I did not try yet..

You could probably get all your data from google as well since it's stored on their servers, without having to upload from your phone

Does Google provide programmatic access to their location history system without scraping?

I don't know about an API, but there's an export to KML download link


This would be a great addition to Huginn, if you'd like to submit a PR!

I'll create a bounty for this if someone wants to do it (as my username communicates, I don't have the time :( )

Huginn is in bountysource, if you want to make one!

iOS App (requires RubyMotion) https://github.com/cantino/post_location

Where can you get an invite code? http://snag.gy/xh6uk.jpg

The default invite code is 'try-huginn'

Is there an online sandbox anywhere to check it out? A project like this simply calls out for there to be a live demo.

This is awesome, but is there a tool like this in PHP? I am looking for an easy visual scraper.

What would be great is if each agent was somehow able to obtain its own IP address.

With IPv6 there's no reason you couldn't do this, but what is the use case? I'm not seeing what you could do with individually addressable agents that you couldn't do otherwise.

Can you elaborate how one can do this with IPv6? What hosts have this? How many IP addresses can you get?

Basically for web scraping. If you had multiple threads and each of them had separate IP addresses, you'd have a better chance than doing it with one IP address.

Just about any host with IPv6 support will assign you a /64 block which is way more addresses than you'd need for this. Your case would then depend on the site you're scraping supporting IPv6, though.

Not really. Websites can easily block scrapers from the same /64 block, so it doesn't matter if you've got 1,000 different IPs.
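Both points in this exchange can be checked with Python's standard `ipaddress` module: a /64 really does hold far more addresses than any scraper needs, and a site that blocks by /64 prefix treats all of them as one client. A small sketch follows; the addresses come from the `2001:db8::/32` documentation range and the helper name `same_slash_64` is made up for illustration.

```python
import ipaddress


def same_slash_64(addr_a, addr_b):
    """Return True when both IPv6 addresses fall inside the same /64 block."""
    # strict=False lets us derive the enclosing /64 from a host address.
    block = ipaddress.ip_network(f"{addr_a}/64", strict=False)
    return ipaddress.ip_address(addr_b) in block


block = ipaddress.ip_network("2001:db8:1234:5678::/64")
addresses_in_block = block.num_addresses  # 2**64 addresses in a single /64

agent_1 = "2001:db8:1234:5678::1"
agent_2 = "2001:db8:1234:5678:ffff:ffff:ffff:ffff"
other_host = "2001:db8:1234:9999::1"

print(addresses_in_block)
print(same_slash_64(agent_1, agent_2))     # same block: a prefix-based ban hits both
print(same_slash_64(agent_1, other_host))  # different block
```

So per-agent addresses inside one allocation only help against sites that rate-limit on the full 128-bit address rather than on the prefix.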

Am I missing something or is this just another rules engine?

Did you dismiss twitter with the same line?

Twitter is also just another rules engine, with pretty simple rules about which tweets you receive. And yet it's also so much more. It's a platform, and it's a social network. And it's something that many people love to use.

Who cares if it's just another rules engine under the hood?

Twitter isn't really a rules engine as it doesn't have the ability to create workflows or manage rule priority afaik but let's ignore that for a second.

Rules engines are typically a terrible idea and I say this as someone who has worked at two large corporations, one a bank, where rules engines were heavily used so they could avoid having the larger development staff they really needed. Rules engines fail miserably every single time and eventually have to be replaced.

The problem is that as time goes on people who don't know any better end up writing larger and more complex rules and workflows without an understanding of the side effects those rules generate. The end result inevitably becomes a huge mess that is extremely fragile and nearly impossible to follow.

Yes, but for simple, personal rules that won't have any major repercussions if they fail (e.g. email me when I get 10 likes on my last Instagram photo), I can see how they'd be useful.

...not that my hypothetical rule is useful, but you understand what I'm getting at.

Absolutely. For simple, isolated, inconsequential tasks this works fine. The problem is rules engines always start very innocent and simple, then users request the ability to have rules call each other, then they want to store results, etc. In an ideal world this is a good thing but I have yet to hear of any place where rules engines, used at any significant scale, aren't a complete disaster.

Is it typically a disaster because business users are given access to create their own rules and they don't know what they're doing, or because the complexity of the system grows to the point where that complexity is better managed by other tools (version control systems, QA environments, rigorous testing, etc.)?

In my experience, both. It starts with business users creating rules without having a clue what they are doing and it ends with the system becoming so complex that it would have been better off being written by engineers using tools more appropriate for the job.

Typically what happens is that business discovers they can now implement every last feature they desire without getting any push back from engineering so they go wild implementing new features without realizing the consequences. There is no VCS, no QA, no testing. There is no one telling them they cannot do something because it won't scale, it isn't secure or it won't be maintainable.

Their only metric for success is that they get the result they want now and the long term consequences be damned. Worse yet, every single person using the rules engine is acting independently and not as a team. There is no code review, when the rule works to their satisfaction it gets pushed into production.

At first everything works fine and people get promoted for saving money on engineering costs but then the rules start getting more complex, start becoming composed of other rules, need to have more complex actions or need to integrate with third party systems. Eventually the simple rules engine turns into a bastardized programming language that everyone adds onto and never modifies because no one understands how a modification will affect the 4000 other rules in the engine. At that point you end up having to do a complete re-write, which is something I have had the displeasure of doing in the past.

I think businesses still need flexible / easy to use systems that allow end-users to create solutions quickly. This may involve analysts, IT professionals, and devs working together on the same platform. For example, an IT pro writes the SQL queries, an analyst writes the regression algorithms, and the devs write the output adapters.

Typically, by the time you get the dev team to fully implement the solution, it has missed its mark and the analysts have moved on.

Players in the mashup landscape are "trying" to provide scalable and robust, yet flexible and easy-to-use systems.

plug - flowreports.co is one of these ... and it can be self-hosted.

Businesses have hundreds of flexible easy systems to let end-users create quick business rule based solutions (particularly for reporting purposes) and have had them since the late 80s maybe earlier, the corporate landscape is littered with them. I wish you the best of luck, that's a tough market to get into.

Thank you for the feedback. Perhaps if we add things like support for version control systems, play well with releasing code to multiple environments, and add UI elements that enforce cloning chunks of code so that changes are isolated and the system can maintain its coherence even among disparate users, we can avoid some of those issues. Something to think on...

That's definitely a step in the right direction. The biggest hurdle you'll have to overcome is getting your application to enforce a process, that's a lot harder than you think it is because people tend to take the path of least resistance and process is rarely that path. You'll have to get buy-in from very high levels of any organization you work with, otherwise things will devolve quickly. Either way, good luck, I'll check out your product demo when you've completed it.

I pretty much just know what rules engines are from a pretty high level. Any suggested reading for digging deeper into them and alternatives to them?

There are very few places where business rules change so quickly that a rules engine is needed. Rules engines are essentially a poor practice used by businesses who don't clearly define what their goals are and stick to them. The alternative is to have a highly modularized system that is flexible enough for engineers to make changes to the code base in a timely manner, but that requires business to sit down and define the problem(s) they are trying to solve with their software. Getting that sort of time investment is difficult.

Not only that, but the interface is always a huge part of any product that's simple under the hood, and I'm going to check it out and evaluate right now...


You are doing it wrong. Colin's style is more like this:

  Previous discussion of the project:

  https://news.ycombinator.com/item?id=7582316 # Yesterday

See commit log. (you're on HN after all)
