
HostDB - aritraghosh007
https://github.com/Flipkart/HostDB
======
eLobato
It sounds like a lightweight version of the Foreman [1] with fewer
integrations for virtual servers and configuration management? In any case I
starred it to keep track of future updates...

Just as a curiosity, CERN uses something similar to this (LANDB) as its source
of truth for the physical infrastructure. [2]

[1] [http://theforeman.org/](http://theforeman.org/)

[2] [http://landb.sourceforge.net/](http://landb.sourceforge.net/)

~~~
gopalv
This keeps getting reinvented, because using LDAP for this sucks.

Google had slack, which mapped MAC address -> roles

Yahoo had Igor, which did nearly the same (which used yinst, which had package
activation for rollbacks)

Then we saw Puppet/Chef do the same thing, except with conf files you can
check in & version control.

This was reinvented again at Yahoo! for Hadoop with ZooKeeper, for roles with
an even shorter life-span. Also now with ZooKeeper doing elections and such,
within itself.

Zynga used ZooKeeper to build their own role management system, but added the
Igor-like features into it and started using it as a C&C master to push
file/code updates with.

I see that possibly several more people will reinvent it, because these
solutions start out as something simple and grow into a feature-creep
mess & then someone reinvents an easier solution, rinse, repeat.

And all of them will be the right answer, except for people who are not using
it :)

~~~
njt
HackerDB was mentioned in the Q&A session after one of the config management
talks at the Usenix LISA conference last week.

I'm glad there are tools like this being actively developed.

I think perhaps the reason they keep being redeveloped is that the current
crop of open source CMDBs doesn't meet needs, so I gladly welcome projects
like this.

------
cryptolect
I've been trying out a few different options in this space. I'm doing work
with Docker, and in certain scenarios I'd like my containers to 'learn' their
configuration at startup. I initially spent some time trying out Etcd which
came close to fitting the bill, then Redis, then I just packed it all in and
rolled a rest-service for them to talk to (the config data was pretty small).

My particular need has come close to disappearing since the advent of Docker
linking, but linking doesn't help when containers have an unscheduled restart.
If the IP of one of your app's dependencies (e.g. the database container) has
changed post-startup, you need a way for your app to learn about it. That's
where I see the value of things like Etcd and HostDB.
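
A minimal sketch of that idea, assuming a hypothetical in-memory `Registry` standing in for Etcd, HostDB, or a small REST service (the class names, service name, and IPs below are all made up for illustration). The point is only that the app resolves the dependency's address on every use instead of caching what it saw at startup:

```python
class Registry:
    """Hypothetical stand-in for Etcd, HostDB, or a small REST config service."""

    def __init__(self):
        self._addresses = {}

    def register(self, service, host, port):
        # Called by (or on behalf of) a container whenever it (re)starts.
        self._addresses[service] = (host, port)

    def lookup(self, service):
        # Raises KeyError if the service has never registered.
        return self._addresses[service]


class App:
    def __init__(self, registry):
        self.registry = registry

    def database_address(self):
        # Resolve on every use so a restarted database container is picked up.
        return self.registry.lookup("database")


registry = Registry()
registry.register("database", "172.17.0.2", 5432)
app = App(registry)
before = app.database_address()

# The database container restarts and comes back with a new IP.
registry.register("database", "172.17.0.9", 5432)
after = app.database_address()
```

With link-time configuration alone, `after` would still be the stale address; with a lookup per connection attempt, the app follows the container to its new IP.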

------
xorgar831
What would be more interesting is a modern framework and standard around
querying inventory information. Something whose minimum implementation
requirement is simple enough for consumer products to implement, and that
scales up to enterprises. And of course supports handling legacy/current
discovery methods as well.

Seems like the main issue would be that there are only limited cases where
inventory systems are currently needed, and the existing systems aren't completely
broken just yet. But you can imagine at some point, being able to track and
manage all of the devices in your home will be a big issue, and there may be
an incentive for companies that produce enterprise and consumer goods to not
have to develop and maintain duplicate standards. At that point it seems like
it would drive some more innovation here.

In the meantime, my preference is to query data in real time from its source
(e.g. AWS, VMware), vs. syncing data to a CMDB-like database. Unmanaged
physical inventory of course still requires a CMDB-like database.

------
cookrn
Tumblr has a tool named Collins, which they refer to as a configuration
management database (CMDB).

[http://tumblr.github.io/collins/](http://tumblr.github.io/collins/)

------
rohitnair
Flipkart had a blog post about this, which has some background, screenshots
etc.

[http://tech-blog.flipkart.net/2013/10/hostdb/](http://tech-blog.flipkart.net/2013/10/hostdb/)

------
senthilnayagam
Good to see an Indian startup open sourcing one of their internal tools.

Only question: why Perl and FCGI? Why not Golang, Python or Ruby for it?

I last used fcgi 6-7 years back when that was the only option to host rails
apps.

~~~
X-Istence
Why not FastCGI? It is just another way to connect from the web server to a
long running app, just like WSGI is...
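
For the WSGI half of that comparison, a minimal sketch using only Python's standard-library `wsgiref` (the response body and the `serve` helper are illustrative assumptions). The web server hands each request to a long-running callable; a FastCGI setup has the same overall shape, with a different protocol between the server and the app process:

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    # A WSGI application is a callable that takes the request environ and a
    # start_response callback, and returns an iterable of bytes.
    body = b"hello from a long-running app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    # wsgiref's reference server, suitable for local experiments only.
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` starts the reference server; in production the same `app` callable would sit behind a real WSGI server.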

~~~
senthilnayagam
Most modern apps and frameworks don't recommend FCGI.

Rails deprecated its use 4 years back (
[https://github.com/rails/fcgi_handler](https://github.com/rails/fcgi_handler)
), FastCGI is deprecated in Django as well(
[https://docs.djangoproject.com/en/dev/howto/deployment/](https://docs.djangoproject.com/en/dev/howto/deployment/)
)

The common pattern is a reverse proxy/load balancer in front of a C-based
server, with or without an evented loop, or some compiled implementation on
the JVM, Golang, BEAM etc.

~~~
X-Istence
I am familiar with Python ... and I can understand that FastCGI was never
really supported in the first place there, especially since Python has WSGI.

I am not as familiar with Ruby.

But just because "modern apps and frameworks" don't recommend it doesn't mean
it is a bad protocol. PHP with PHP-FPM is awesome and well supported. Fast CGI
still has a place in this world, and I don't see why just because some
apps/frameworks have decided it isn't worth their time to support it, it would
be an invalid choice to make for a new app...

~~~
senthilnayagam
Perl is a language which has not seen any new adoption by developers; the same
is true of FastCGI. They were the best/state of the art 5-10 years back.

was it a design decision or they chose it as no alternate mechanisms/libraries
is available in perl.

These decisions are important to know for an open source project; they will
determine its adoption, its number of contributors, and whether it will flourish.

~~~
draegtun
_> perl is a language which has not seen any new adoption by developers..._

This is incorrect. Perl is still seeing good adoption by new developers.

Now your argument could be that it's probably not at the same rate compared to
some of the _top_ languages around at this moment but Perl is still growing.

Some facts...

\- CPAN continues to grow -
[http://cpants.cpanauthors.org/stats/uploads](http://cpants.cpanauthors.org/stats/uploads)
|
[http://cpants.cpanauthors.org/stats/authors](http://cpants.cpanauthors.org/stats/authors)

\- According to this tweet Perl still produces more new repos on Github than
languages like Scala, Clojure & Go -
[https://twitter.com/dberkholz/status/395922559151009792](https://twitter.com/dberkholz/status/395922559151009792)

 _> was it a design decision or they chose it as no alternate
mechanisms/libraries is available in perl_

No idea what their decision criteria were; however, there are plenty of
alternative mechanisms/libraries in Perl.

See PSGI/Plack for Perl's _state of the art_ solution -
[http://plackperl.org](http://plackperl.org)

------
threeseed
Looks like a lightweight ZooKeeper.

[http://zookeeper.apache.org](http://zookeeper.apache.org)

------
thu
This seems to be a use-case for `etcd`.

------
babo
Chef or Puppet has very similar functionality; who is the target audience of
this tool?

~~~
IgorPartola
Can you query them for servers of a specific type? This sounds more like a
service registry than a configuration management system to me.

~~~
lobster_johnson
Puppet has an inventory system. They call it PuppetDB [1], but it's basically an
interface to an SQL database (e.g., PostgreSQL) where the main Puppetmaster
process stores all the information it compiles from manifests. It's a
simple schema which allows you to easily query all the hosts, resources and so
on.

An interesting side effect of PuppetDB is that it can be exploited within the
manifests themselves, via what's known as "exported resources" [2], allowing
nodes to gather information from each other. For example, a web server module
can declare that it exports a URL endpoint that needs to be monitored. Then
the Nagios module can declare that it wants to know about all URLs to be
monitored. The syntax is a bit weird, but it's a rather elegant system.

[1]
[http://docs.puppetlabs.com/puppetdb/1/](http://docs.puppetlabs.com/puppetdb/1/)

[2]
[http://docs.puppetlabs.com/puppet/2.7/reference/lang_exporte...](http://docs.puppetlabs.com/puppet/2.7/reference/lang_exported.html)
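
The export/collect idea described above can be modelled in a few lines of Python. This is a toy sketch, not Puppet syntax and not PuppetDB's API; the `Catalog` class, the `monitored_url` type, and the hostnames are all hypothetical:

```python
from collections import defaultdict


class Catalog:
    """Toy shared store playing the role PuppetDB plays (hypothetical)."""

    def __init__(self):
        self._exported = defaultdict(list)

    def export(self, node, resource_type, **params):
        # A node publishes a resource for other nodes to collect.
        self._exported[resource_type].append({"node": node, **params})

    def collect(self, resource_type):
        # Another node gathers every exported resource of the given type.
        return list(self._exported[resource_type])


catalog = Catalog()

# Each web server exports the URL it wants monitored.
catalog.export("web01", "monitored_url", url="http://web01.example/health")
catalog.export("web02", "monitored_url", url="http://web02.example/health")

# The monitoring node collects all of them without knowing the web servers.
checks = [r["url"] for r in catalog.collect("monitored_url")]
```

The exporting and collecting sides never reference each other directly; they only agree on the resource type, which is what makes the pattern work as an inventory query.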

------
brokenparser
Congratulations, you have reinvented DNS. You can even pretty much
s/HostDB/DNS/g:

    
    
        DNS: an old tool to help manage data center inventory and write
        applications around it.
    
        DNS is our attempt to solve the problem of finding hosts and
        their purposes in a large environment. DNS acts as a Single
        source of truth about all Physical and Virtual servers and is
        used to define their purpose. It helps us group our servers
        through domains and all the software written by the operations
        team revolves around DNS. DNS acts as the centralized
        configuration store for all sorts of information.

~~~
subliminalbrad
You must not have read past the introduction. I just skimmed it, but it's
clear that this stores considerably more information than DNS.

~~~
vidarh
DNS can store whatever information you want to put into it.

I agree it's not necessarily going to be that practical for larger items of
data, though.

LDAP would be a better comparison, though I can't say I fault people for
reinventing LDAP on a regular basis given how annoying it is to work with.

~~~
brokenparser
It can be practical for very large records if, say, you use a NAPTR record to
point at a web server which serves the actual contents. It just requires a
little out-of-the-box thinking. They can both use the same SQL backend so
everything is still in the same database.

In this regard, DNS isn't broken and doesn't need fixing. HostDB seems
especially obnoxious since it also stores the FQDN and IP address of each
host, information which ends up being duplicated in DNS.
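
A rough sketch of the NAPTR idea, assuming the record's regexp field uses the usual delimiter/pattern/delimiter/replacement/delimiter/flags layout. The hostnames and the inventory URL are hypothetical, the DNS lookup itself is left out, and escaped delimiters inside the pattern are not handled:

```python
import re


def apply_naptr_regexp(regexp_field, name):
    """Rewrite `name` using a NAPTR-style regexp field; None if no match."""
    # The first character is the delimiter separating pattern, replacement
    # and flags. Simplification: assumes the delimiter never appears escaped.
    delim = regexp_field[0]
    pattern, replacement, flags = regexp_field[1:].split(delim)
    re_flags = re.IGNORECASE if "i" in flags else 0
    match = re.search(pattern, name, re_flags)
    if match is None:
        return None
    # Expand backreferences such as \1 in the replacement string.
    return match.expand(replacement)


# Hypothetical record content pointing host names at an inventory web server.
field = r"!^(.*)\.dc1\.example\.com$!http://inventory.example/hosts/\1.json!"
url = apply_naptr_regexp(field, "web01.dc1.example.com")
```

The large record then lives behind `url` on the web server, while DNS only carries the pointer.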

