
Huginn: Like Yahoo Pipes plus IFTTT on your server - ColinWright
https://github.com/cantino/huginn/blob/master/README.md
======
danso
Also relevant: How the New York Times interactive team uses Huginn

[https://source.opennews.org/en-US/articles/open-source-
bot-f...](https://source.opennews.org/en-US/articles/open-source-bot-
factory/?utm_content=buffereacd3&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer)

> _Most prominently, we used it during our Olympics coverage to monitor the
> results of the API we built and let us know if the data ingestion pipeline
> ever grew stale. To do that, we set up a pipeline_
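
The staleness check described there takes only a few lines; the JSON shape and the `last_updated` field below are illustrative assumptions, not the Times' actual API:

```python
import json
import time
import urllib.request

STALE_AFTER = 15 * 60  # seconds without fresh data before alerting

def is_stale(payload, now=None):
    """True if the feed's `last_updated` Unix timestamp is older than STALE_AFTER."""
    now = time.time() if now is None else now
    return (now - payload["last_updated"]) > STALE_AFTER

def check_feed(feed_url):
    """Fetch a JSON status endpoint and return (stale, payload)."""
    with urllib.request.urlopen(feed_url) as resp:
        payload = json.load(resp)
    return is_stale(payload), payload
```

In Huginn terms, an agent would run `check_feed` on a schedule and emit an event (email, SMS, etc.) whenever it returns `True`.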

------
hyp0
I always liked the Yahoo Pipes concept... but it didn't seem to take off... and
I personally found it too limited for everything I tried to do with it. Perhaps
it's just another case of the old _"visual programming language" is harder than
it looks_.

I hope Huginn does better. I like their copywriting "You always know who has
your data. You do."

~~~
shavenwarthog2
Agreed. I did a multipart Yahoo Pipes project to find my current apartment. It
grabbed info from two sites, tossed out the uninteresting ones, filtered it a
bit, then texted me if a new apartment in my price/location range appeared.

Very useful, if a little awkward. The Huginn project sounds like a great
alternative!
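
The grab/filter/notify pipeline described above might look something like this in plain Python; the field names (`price`, `area`, `id`) are invented for illustration, and texting on a match is left to whatever notifier you like:

```python
def filter_listings(listings, max_price, areas):
    """Keep listings within budget and in the desired neighborhoods."""
    return [l for l in listings
            if l["price"] <= max_price and l["area"] in areas]

def new_listings(listings, seen_ids):
    """Return only listings not seen before, marking them as seen."""
    fresh = [l for l in listings if l["id"] not in seen_ids]
    seen_ids.update(l["id"] for l in fresh)
    return fresh
```

Run on a schedule against a couple of scraped feeds, anything `new_listings` returns is worth a text message.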

~~~
scuba7183
Just curious, how many hours of work did that take? Do you think it was worth
it?

~~~
shavenwarthog2
It's hard to tell -- I tweaked it quite a bit, and rewrote it from scratch
2-3x. I'd say 4-10 hours.

Was it worth it? As a programmer, no. I'm very familiar with scraping (raw)
web/RSS feeds for data, then processing it. I was hoping Pipes would have
enough intelligence, so that I could subscribe to (cooked) data sources, then
split and refine the results.

In practice, Pipes worked, but the data always required further post-
processing, which was awkward to do in Pipes. You have to be a dev to
understand what your system is doing, but you don't have easy access to all
the standard dev things.

I look forward to seeing Pipes take off, or another technology (Huginn?
Ifttt?) replace it. It was a lot of fun to wire things up graphically then for
example get a text when someone's RSS feed changed.

~~~
hyp0

      You have to be a dev to understand what your system is doing, but you don't have
      easy access to all the standard dev things.
    

Interesting: this mismatch may be a good description of the problem with visual
languages.

Curious: what do you think is the minimal subset of unix tools to do this?
i.e. instead of pretending the problem is simpler than it is, accept the
complexity, but minimize it.

I'm thinking of a tool like "jq" (sed for json) for json data sources... but I
don't think its raw-text manipulation is up to the task (and of course you
need tools to monitor the feeds etc).
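
For what it's worth, the select/refine step jq handles (e.g. `jq '[.[] | select(.price < 1000)]'`) is also a one-liner in a general-purpose language, which helps once the post-processing outgrows raw-text manipulation. A minimal sketch in Python:

```python
import json

def refine(raw_json, predicate):
    """jq-style `.[] | select(...)` over a JSON array, in plain Python.

    Roughly equivalent to `jq '[.[] | select(.price < 1000)]'` when the
    predicate is a price check, but with a full language available for
    the messier steps (date parsing, text cleanup, cross-feed joins).
    """
    return [item for item in json.loads(raw_json) if predicate(item)]
```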

~~~
TylerE
Trying to manipulate structured data as text makes about as much sense as
parsing XML with a regex.

[http://stackoverflow.com/questions/1732348/regex-match-
open-...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-
except-xhtml-self-contained-tags/1732454#1732454)

------
albertsun
The best part of Huginn is being able to self-host and write any arbitrary
agents you want.

~~~
platz
I wish the agents could be language-agnostic, though.

~~~
albertsun
There's already support for agents written in javascript:
[https://github.com/cantino/huginn/blob/master/app/models/age...](https://github.com/cantino/huginn/blob/master/app/models/agents/java_script_agent.rb)

And for more substantial tasks it'd be possible to write an agent in another
language and then call it from ruby in some way.

------
malanj
This looks really awesome for managing an office. We're currently automating
things using Google scripts and other custom glue to do things like order
food, get feedback on lunch and mail people weekly digests activities. Sounds
like it could be a great solution.

~~~
anentropic
weekly digestion activites...?

~~~
malanj
Weekly digests :-) We're about 35 people split in 5 teams. On a weekly basis
each team gets a "weekly update" mail containing a google document that gets
created off a template doc. The weekly update contains some questions that
basically ask the team what they did that week. It's a shared google doc so
the team can collaborate to fill it in. Those filled-in docs get aggregated
into a single PDF that gets sent to everyone on Monday morning. So everyone
stays in the loop with the other teams' progress.

------
yukichan
Zapier is also good, with lots of integrations, but it's a little pricey. Then
again, if you calculate what your time is worth and include the time spent
making this work plus customizations, it's probably cheaper. It depends on
whether Zapier can do what you want.

------
c0nsumer
This is a really frustrating name. Hugin is already used for panoramic photo
stitching software:
[http://hugin.sourceforge.net/](http://hugin.sourceforge.net/)

This just has another N bolted on to the end and does something completely
different.

~~~
jmduke
I'm not sure about the etymology of Hugin, but Huginn is more than likely a
reference to Norse mythology (for Anchorman fans, you'd recognize it as "Great
Odin's raven!"):

[http://en.wikipedia.org/wiki/Huginn_and_Muninn](http://en.wikipedia.org/wiki/Huginn_and_Muninn)

~~~
14113
The readme specifically states that Huginn is a reference to the raven.

~~~
c0nsumer
Yep -- I understand where the name comes from, I just personally find it very
frustrating when two OSS projects are so closely named. It gets really hard to
search for one or the other once both become successful.

(I also have this complaint about a lot of the single-word named OS X
applications... Unless they have a LOT of traction then it's hard to find
specific info on them.)

~~~
inopinatus
Had to debug a Cucumber problem involving recipes in a Chef cookbook. I was
building up a TDD toolchain at the time. After wading through six Google pages
of salad, I decided to use different tools.

~~~
lepht
That's a pretty amusing example, but I'd be surprised if prepending 'ruby' to
your query wouldn't have fixed it.

~~~
inopinatus
A moot point now, but the query wasn't the problem.

------
FroshKiller
One of the developers posted about this recently:
[https://news.ycombinator.com/item?id=7582316](https://news.ycombinator.com/item?id=7582316)

~~~
jschulenklopper
... and in March 2013:
[https://news.ycombinator.com/item?id=5377651](https://news.ycombinator.com/item?id=5377651)

BTW, a great idea and an impressive side project!

------
fasteddie31003
I am working on a similar project called Taskflow.io that is aimed at more
backend, business-oriented tasks. It can do similar things through a flowchart
editor interface, where you build the actual flowchart that gets executed. I
would still consider it a public beta. I would love your feedback.

~~~
zorbo
Will this be provided As-A-Service, or will it be a downloadable product that
can be deployed in-house? This is exactly what I have been looking for for a
while, but there's absolutely zero chance we're going to send any of our
business information to a remote service.

~~~
rch
I've wondered about this quite a bit, since I run computationally intensive
analysis on sensitive data, and some of the same thinking would apply in this
context.

In brief, I could provide an appliance on something as trivial as a Raspi that
updates itself over VPN, and would let you run the services on your own
systems. Would that work for you if one of these providers did the same?

Obviously we could do better with a custom system deployed onsite, but the
idea is to simplify the process and potentially eliminate the cost of getting
started, similar to Square sending out card readers.

~~~
zorbo
It depends. We've got pretty strict security requirements as we operate in the
medical and government sector. A black box appliance or something that auto-
updates outside of normal patching rounds is probably out of the question.

------
alxndr
Anyone know why this project encourages contributors to develop in a private
fork?

> "Make a public fork of Huginn. [...] Make a private, empty GitHub repository
> called huginn-private. Duplicate your public fork into your new private
> repository[. ...] Checkout your new private repository. Add your Huginn
> public fork as a remote to your new private repository[. ...] When you want
> to contribute patches, do a remote push from your private repository to your
> public fork of the relevant commits, then make a pull request to this
> repository."

~~~
tectonic
Just to let you keep any private changes private. Perhaps it's not the best
recommendation.

~~~
alxndr
Ah. It seems unnecessarily complicated for people trying to get started.
Perhaps preface it with a note saying something like "if you'd like to keep
your commits private, follow this brief guide" so it doesn't seem required?

~~~
tectonic
I agree, thanks for pointing this out. I've extracted that section to the
wiki.

~~~
alxndr
Nice!

(For the record I can't wait to try out Huginn; I've been using Yahoo Pipes
for years... I've apparently got one pipe from before when they started using
only hex characters as pipe IDs.)

------
rcyeager
Another Pipes+IFTTT tool: [https://wewiredweb.com](https://wewiredweb.com)

~~~
finnn
Like many of the others that have been posted here, it's not self-hosted.

------
thomasfl
Will this run on a standard heroku stack? The wiki says it will run on
OpenShift and CloudFoundry.
[https://github.com/cantino/huginn/wiki](https://github.com/cantino/huginn/wiki)

~~~
tectonic
It will run, but the default Procfile spins up 4 processes, so Heroku might be
expensive. If someone wants to figure out how to get everything to run easily
in one process, that would make free Heroku hosting possible. I run it on a
small VPS.

~~~
alxndr
I would also love to have this running on a free Heroku process.

------
jayxie
Exciting stuff! It would be amazing to build an AI layer on top of this that
mines your browsing habits (depending on your paranoia settings) and
automatically generates agents based on your interests.

------
platz
Excluding the UI, I wonder if Storm is a more robust, if more complex, option
for doing the same types of things:
[http://storm.incubator.apache.org/](http://storm.incubator.apache.org/)

~~~
feralmoan
Storm doesn't naturally support dynamic topologies and is rather resource
hungry, which calls for a bit of advance planning. I was looking at Storm for
my own pipelining product (bip.io) very early on and shied away; it was too
high an opportunity cost for self-hosting users/devs to be bothered with. On a
Raspberry Pi, for example, forget about it. Without being able to create
dynamic graphs, it otherwise just ends up being a simple message bus
(anti-pattern).

~~~
tectonic
bip.io looks very cool. Do you think our efforts should be combined?

~~~
crawfordcomeaux
Exploring somehow combining efforts and/or the two projects has my vote!

------
weavie
This sounds like an excellent project to make use of my raspberry pi.

------
okhan
I was just building exactly this, only worse. Looks really great.

------
zwentz
This would be very cool for automating parts of AWS. Inclement weather coming?
Or an earthquake? Start spooling up servers in another region.

------
kzahel
Does this have a companion android/iOS app to upload location data? I really
like the idea of self hosting something like this.

~~~
platz
You could probably get all your data from google as well since it's stored on
their servers, without having to upload from your phone

~~~
toomuchtodo
Does Google provide programmatic access to their location history system
without scraping?

~~~
platz
I don't know about an API, but there's an export to KML download link

[https://maps.google.com/locationhistory/b/0/](https://maps.google.com/locationhistory/b/0/)
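
Once that KML file is downloaded (the link sits behind a Google login, so this assumes a saved export rather than live scraping), pulling coordinates out of it needs only the standard library:

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def placemark_coords(kml_text):
    """Extract (lon, lat) pairs from the Placemarks in a KML document."""
    root = ET.fromstring(kml_text)
    coords = []
    for pm in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        node = pm.find(".//kml:coordinates", KML_NS)
        if node is not None and node.text:
            lon, lat = node.text.strip().split(",")[:2]
            coords.append((float(lon), float(lat)))
    return coords
```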

~~~
tectonic
This would be a great addition to Huginn, if you'd like to submit a PR!

~~~
toomuchtodo
I'll create a bounty for this if someone wants to do it (as my username
communicates, I don't have the time :( )

~~~
tectonic
Huginn is on Bountysource, if you want to make one!

------
kirk21
Where can you get an invite code?
[http://snag.gy/xh6uk.jpg](http://snag.gy/xh6uk.jpg)

~~~
tectonic
The default invite code is 'try-huginn'

------
SloughFeg
Is there an online sandbox anywhere to check it out? A project like this
simply calls out for there to be a live demo.

~~~
sundip
[http://runnable.com/Ut9idpiQaiM-AACi/huginn-example-setup-
fo...](http://runnable.com/Ut9idpiQaiM-AACi/huginn-example-setup-for-ruby-on-
rails-and-open-source)

------
fujipadam
This is awesome, but is there a tool like this in PHP? I am looking for an
easy visual scraper.

------
notastartup
What would be great is if each agent was somehow able to obtain its own IP
address.

~~~
kej
With IPv6 there's no reason you couldn't do this, but what is the use case?
I'm not seeing what you could do with individually addressable agents that you
couldn't do otherwise.

~~~
notastartup
Can you elaborate on how one can do this with IPv6? Which hosts offer this?
How many IP addresses can you get?

Basically for web scraping. If you had multiple threads and each of them had
separate IP addresses, you'd have a better chance than doing it with one IP
address.

~~~
kej
Just about any host with IPv6 support will assign you a /64 block, which is
way more addresses than you'd need for this. Your use case would then depend
on the site you're scraping supporting IPv6, though.
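
As a rough sketch of the setup being described, here is how you might pick distinct source addresses out of a /64 and bind outgoing connections to them. The prefix is the RFC 3849 documentation block, and `connect_from` assumes the addresses are already configured on a local interface:

```python
import ipaddress
import itertools
import socket

def source_addresses(prefix, count):
    """Pick `count` distinct host addresses out of an IPv6 block."""
    net = ipaddress.IPv6Network(prefix)
    return [str(a) for a in itertools.islice(net.hosts(), count)]

def connect_from(host, port, source_ip):
    """Open a TCP connection bound to a specific local IPv6 address.

    The address must already be assigned to an interface (e.g. with
    `ip -6 addr add`), or the underlying bind() will fail.
    """
    return socket.create_connection((host, port), timeout=10,
                                    source_address=(source_ip, 0))
```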

~~~
AznHisoka
Not really. Websites can easily block scrapers from the same /64 block, so it
doesn't matter if you've got 1000 different IPs.

------
psaintla
Am I missing something or is this just another rules engine?

~~~
perlgeek
Did you dismiss twitter with the same line?

Twitter is also just another rules engine, with pretty simple rules about
which tweets you receive. And yet it's also so much more. It's a platform, and
it's a social network. And it's something that many people love to use.

Who cares if it's just another rules engine under the hood?

~~~
psaintla
Twitter isn't really a rules engine, as it doesn't have the ability to create
workflows or manage rule priority AFAIK, but let's ignore that for a second.

Rules engines are typically a terrible idea and I say this as someone who has
worked at two large corporations, one a bank, where rules engines were heavily
used so they could avoid having the larger development staff they really
needed. Rules engines fail miserably every single time and eventually have to
be replaced.

The problem is that as time goes on people who don't know any better end up
writing larger and more complex rules and workflows without an understanding
of the side effects those rules generate. The end result inevitably becomes a
huge mess that is extremely fragile and nearly impossible to follow.

~~~
themoonbus
Yes, but for simple, personal rules that won't have any major repercussions if
they fail (e.g. email me when I get 10 likes on my last Instagram photo), I
can see how they'd be useful.

...not that my hypothetical rule is useful, but you understand what I'm
getting at.

~~~
psaintla
Absolutely. For simple, isolated, inconsequential tasks this works fine. The
problem is rules engines always start very innocent and simple, then users
request the ability to have rules call each other, then they want to store
results, etc. In an ideal world this is a good thing but I have yet to hear of
any place where rules engines, used at any significant scale, aren't a
complete disaster.

~~~
johnobrien1010
Is it typically a disaster because business users are given access to create
their own rules and they don't know what they're doing, or because the
complexity of the system grows to the point where that complexity is better
managed by other tools (version control systems, QA environments, rigorous
testing, etc.)?

~~~
psaintla
In my experience, both. It starts with business users creating rules without
having a clue what they are doing and it ends with the system becoming so
complex that it would have been better off being written by engineers using
tools more appropriate for the job.

Typically what happens is that business discovers they can now implement every
last feature they desire without getting any push back from engineering so
they go wild implementing new features without realizing the consequences.
There is no VCS, no QA, no testing. There is no one telling them they cannot
do something because it won't scale, it isn't secure or it won't be
maintainable.

Their only metric for success is that they get the result they want now and
the long term consequences be damned. Worse yet, every single person using the
rules engine is acting independently and not as a team. There is no code
review, when the rule works to their satisfaction it gets pushed into
production.

At first everything works fine and people get promoted for saving money on
engineering costs but then the rules start getting more complex, start
becoming composed of other rules, need to have more complex actions or need to
integrate with third party systems. Eventually the simple rules engine turns
into a bastardized programming language that everyone adds onto and never
modifies because no one understands how a modification will affect the 4000
other rules in the engine. At that point you end up having to do a complete
re-write, which is something I have had the displeasure of doing in the past.

~~~
larsf
I think businesses still need flexible, easy-to-use systems that allow end-
users to create solutions quickly. This may involve analysts, IT
professionals, and devs working together on the same platform. For example, an
IT pro writes the SQL queries, an analyst writes the regression algorithms,
and the devs write the output adapters.

Typically, by the time you get the dev team to fully implement the solution,
it has missed its mark and the analysts have moved on.

Players in the mashup landscape are "trying" to provide scalable and robust,
yet flexible and easy-to-use systems.

plug - flowreports.co is one of these ... and it can be self-hosted.

~~~
psaintla
Businesses have hundreds of flexible easy systems to let end-users create
quick business rule based solutions (particularly for reporting purposes) and
have had them since the late 80s maybe earlier, the corporate landscape is
littered with them. I wish you the best of luck, that's a tough market to get
into.

~~~
johnobrien1010
Thank you for the feedback. Perhaps if we add things like support for version
control systems, play well with releasing code to multiple environments, and
add UI elements that enforce cloning chunks of code (so that changes are
isolated and the system can maintain its coherence even among disparate
users), we can avoid some of those issues. Something to think on...

~~~
psaintla
That's definitely a step in the right direction. The biggest hurdle you'll
have to overcome is getting your application to enforce a process, that's a
lot harder than you think it is because people tend to take the path of least
resistance and process is rarely that path. You'll have to get buy-in from
very high levels of any organization you work with, otherwise things will
devolve quickly. Either way, good luck, I'll check out your product demo when
you've completed it.

