
Ask HN: I'm trying to represent an entire building as a graph - li4ick
This entire graph would have to represent every entity in the building, including the control nodes. It's an IoT graph, where a node may represent a light switch, connected to a particular desk (also a node). This graph should also show directions from one entity to another.
For now, I have a Photoshop pipeline which generates a simplified graph from a color-coded image, using NetworkX. This is a temporary replacement until the actual BIM comes along. But the graph remains.
Are there any libraries out there that would help me or should I just roll my own system?
I'm not sure I would like to add Neo4j to the stack. Storing everything in Postgres seems to work fine. It's the in-memory representation that I have problems with.
======
shoo
A thing to think about: often there are many different graphs you can create
to represent the same thing. Each graph representation can focus on or hide
different aspects of the real system.

Why are you trying to model the building as a graph? What are the use cases?
What operations do you want this data structure to be able to perform
efficiently?

It might turn out that a single graph (or any graph) is not the most effective
way of modelling an approximation of the real system.

How many nodes and edges will your graphs typically have? Python and NetworkX
work okay for bashing out prototype code and may be good enough for an MVP or
even a large number of releases if your data size is small and the operations
you perform on the graph are linear in the graph size (e.g. connectivity
checks, traversals).

I've also seen C/C++ codebases get pretty far by rolling their own
domain-specific graph data structures: e.g. define your own node and edge struct
types, give each node and edge pointers to the edges/nodes they connect to,
and hack domain-specific fields onto the structs as necessary. Then just implement
each graph algorithm as you need it. This may end up in an unmaintainable mess
after a few years, but I've seen this work well enough so that the product
based on this is worth enough money that there's enough cash to hire software
engineers to come clean things up!
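
A minimal sketch of that roll-your-own approach, written in Python rather than C/C++ for brevity. The `Node` and `Edge` classes, the `kind` and `relation` fields, and the entity names are all hypothetical, and breadth-first search is just one example of "implementing each algorithm as you need it" (here, finding directions from one entity to another):

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class Node:
    name: str
    kind: str  # hypothetical domain-specific field, e.g. "desk" or "light_switch"
    edges: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    source: Node
    target: Node
    relation: str  # hypothetical domain-specific field


def connect(a, b, relation):
    """Link two nodes in both directions by storing an Edge on each endpoint."""
    a.edges.append(Edge(a, b, relation))
    b.edges.append(Edge(b, a, relation))


def find_path(start, goal):
    """Breadth-first search; returns the node names on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node is goal:
            path = []
            while node is not None:
                path.append(node.name)
                node = previous[node]
            return path[::-1]
        for edge in node.edges:
            if edge.target not in previous:
                previous[edge.target] = node
                queue.append(edge.target)
    return None


switch = Node("switch-1", "light_switch")
desk = Node("desk-1", "desk")
room = Node("room-101", "room")
printer = Node("printer-1", "printer")  # deliberately left unconnected
connect(switch, desk, "lights")
connect(desk, room, "located_in")
```

Calling `find_path(switch, room)` walks the pointers from the switch through the desk to the room; an unreachable entity such as `printer` yields `None`.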

Another thing to think about: are your graphs dynamic or static? If they are
large and static, graphs have a lot in common with sparse matrices.
You can encode your graphs in memory in CSR- or CSC-like sparse matrix
formats: no objects, just giant arrays full of indices. This isn't a good
idea if you want to dynamically add or remove nodes and edges, but it is
memory efficient.
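
To make the CSR idea concrete, here is a rough sketch using plain Python lists and assuming nodes are numbered 0..n-1. The function names `build_csr` and `neighbors` are made up for illustration; a real implementation would more likely use typed arrays or an existing sparse-matrix library:

```python
def build_csr(num_nodes, edges):
    """Pack a directed edge list into two flat arrays (CSR layout).

    offsets[i]:offsets[i + 1] is the slice of `targets` holding node i's neighbors.
    """
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count outgoing edges per node
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]   # prefix sum turns counts into offsets
    targets = [0] * len(edges)
    cursor = list(offsets[:-1])        # next free slot in each node's slice
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Return the outgoing neighbors of `node` as a list slice."""
    return targets[offsets[node]:offsets[node + 1]]


offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
```

Adding or removing a single edge means rebuilding or shifting both arrays, which is why this layout suits static graphs only.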

~~~
jacques_chester
> _A thing to think about: often there are many different graphs you can
> create to represent the same thing. Each graph representation can focus on
> or hide different aspects of the real system._

This is usually the start of my argument in favour of representing data in a
relational form. Any one graph is _a_ projection of the domain. It privileges
certain reads and writes over others. You inevitably find queries and updates
that don't fit the graph's original shape and then suddenly it's a giant PITA
to work with.

If I build a project management system, I might have a graph that runs
[Project] -> [Workers] -> [Timesheets]. If I want to calculate the sum of time
on a particular project that's fairly efficient. But if I want to get the sum
of time for a particular worker, I will need to traverse every project looking
for them.

In relational form I'd have [Projects] n..n [Workers], [Workers] 1..n
[Timesheets] and [Projects] 1..n [Timesheets]. When I need to sum in a
project, I join on that. When I need to sum on a worker, I join on that.
Neither is privileged over the other.
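
A small illustration of that point using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are invented for the example, and the n..n assignment table between projects and workers is left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE workers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE timesheets (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES projects(id),
    worker_id INTEGER REFERENCES workers(id),
    hours REAL
);
INSERT INTO projects VALUES (1, 'Atrium'), (2, 'Garage');
INSERT INTO workers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO timesheets (project_id, worker_id, hours) VALUES
    (1, 1, 3.0), (1, 2, 2.0), (2, 1, 4.0);
""")


def hours_for_project(name):
    """Sum timesheet hours by joining timesheets to projects."""
    row = conn.execute(
        "SELECT COALESCE(SUM(t.hours), 0) FROM timesheets t "
        "JOIN projects p ON p.id = t.project_id WHERE p.name = ?",
        (name,),
    ).fetchone()
    return row[0]


def hours_for_worker(name):
    """Sum timesheet hours by joining timesheets to workers instead."""
    row = conn.execute(
        "SELECT COALESCE(SUM(t.hours), 0) FROM timesheets t "
        "JOIN workers w ON w.id = t.worker_id WHERE w.name = ?",
        (name,),
    ).fetchone()
    return row[0]
```

Both queries have the same shape and cost profile; only the join column changes, so neither access path needs a traversal through the other entity.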

------
grzewarz
I work with big graphs: over 1 billion entities and 40 billion edges.

The best open source graph database is ArangoDB; it has master-to-master
clustering. The fastest and best technology for graphs is the commercial TigerDB, but
you must pay >300k annually.

NetworkX is great, but it loads the full model into memory. For what you are
doing it should be enough. :)

~~~
doh
ArangoDB is fine, but dgraph [0] can scale easily to that size (we have a graph
an order of magnitude larger). In reality, you can use any resilient key/value
store and use hexastore [1] as a storage format.
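
As a rough sketch of the hexastore idea: every (subject, predicate, object) triple is indexed under all six orderings, so any combination of known fields becomes a prefix scan. The flat key layout, the `Hexastore` class, and the sorted Python list standing in for an ordered key/value store are assumptions made for this illustration, not the exact layout from the paper:

```python
import bisect
from itertools import permutations

ORDERS = ["".join(p) for p in permutations("spo")]  # spo, sop, pso, pos, osp, ops
SEP = "\x00"  # separator; assumes values never contain this character


class Hexastore:
    def __init__(self):
        self.keys = []  # sorted list standing in for an ordered key/value store

    def add(self, s, p, o):
        """Write the triple once per ordering, six keys in total."""
        triple = {"s": s, "p": p, "o": o}
        for order in ORDERS:
            key = SEP.join([order] + [triple[c] for c in order]) + SEP
            i = bisect.bisect_left(self.keys, key)
            if i == len(self.keys) or self.keys[i] != key:
                self.keys.insert(i, key)

    def query(self, s=None, p=None, o=None):
        """Return matching (s, p, o) triples; None means 'any value'."""
        bound = {"s": s, "p": p, "o": o}
        known = {c for c in "spo" if bound[c] is not None}
        # Choose the ordering whose leading fields are exactly the known ones.
        order = next(x for x in ORDERS if set(x[:len(known)]) == known)
        prefix = SEP.join([order] + [bound[c] for c in order[:len(known)]]) + SEP
        results = []
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            found = dict(zip(order, self.keys[i].split(SEP)[1:4]))
            results.append((found["s"], found["p"], found["o"]))
            i += 1
        return results


store = Hexastore()
store.add("switch-1", "controls", "lamp-1")
store.add("switch-1", "located_at", "desk-1")
store.add("lamp-1", "located_at", "desk-1")
```

For example, `store.query(p="located_at", o="desk-1")` scans only the `pos` ordering to find everything located at that desk. The cost is six writes per triple in exchange for uniformly cheap reads.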

For instance JanusGraph [2] has support for a lot of different backends, built
by the old team behind TitanDB.

[0] [https://dgraph.io](https://dgraph.io)

[1]
[http://karras.rutgers.edu/hexastore.pdf](http://karras.rutgers.edu/hexastore.pdf)

[2] [https://janusgraph.org](https://janusgraph.org)

------
alkonaut
Have you looked at just using IFC? It’s the canonical “building as a graph”
BIM format. Whether the in-memory IFC model allows querying is, I suppose, up to
the implementation; I have only used xbim:
[https://github.com/xBimTeam](https://github.com/xBimTeam)

~~~
aothms
There is also IfcOWL, the RDF (semantic web / linked data) representation of
IFC. It'll allow you to use standard tooling (e.g. SPARQL and storage
engines).
[https://technical.buildingsmart.org/standards/ifc/ifc-format...](https://technical.buildingsmart.org/standards/ifc/ifc-formats/ifcowl/)

------
motohagiography
The question with graphs is whether you want the graph to be a representation
of a dataset, or to persist the graph as the data itself.

For persistence, I use Neo4j to represent hundreds of dynamic graph
ontologies, and I use the hosted version on GrapheneDB, which has worked just
fine for my purposes.

For some views, I just use NetworkX to generate interactive d3.js pages from
data I have queried from the graph, or Python/Flask with py2neo to generate
JSON for d3 visualizations. Some others use Cytoscape for visualization, but I
find that a bit dramatic for most purposes.

Depending on how you would like to represent it, I can also recommend
Webprotege and WebVOWL, since the graph you are creating is also in effect an
ontology.

------
hugofirth
Using Neo4j community edition would certainly make your life easier in terms
of the queries you might want to express, but for this amount of data you can
get away with Postgres, as you observe.

If you want a simple in-memory graph modelling library then check out Apache
TinkerPop. It's great.

------
breck
I would recommend storing your data using Tree Notation in a Tree Language.
That way your data is as simple as possible. You could then compile it to
whatever format is needed by your downstream tools. Here's a quick start:

[http://treenotation.org/sandbox/build/#grammar%0A%20nodeType...](http://treenotation.org/sandbox/build/#grammar%0A%20nodeType%20graph%0A%20%20root%0A%20%20description%20A%20new%20language%20for%20graphs%20in%20response%20to%20an%20HN%20comment.%0A%20%20catchAllNodeType%20catchAllError%0A%20%20inScope%20node%0A%20cellType%20keyword%0A%20%20highlightScope%20keyword%0A%20cellType%20intCell%0A%20%20highlightScope%20constant.numeric.integer%0A%20cellType%20deviceTypeCell%0A%20%20enum%20Router%20RaspberryPi%0A%20cellType%20nodeIdCell%0A%20%20highlightScope%20meta.annotation.identifier%0A%20nodeType%20node%0A%20%20cells%20nodeIdCell%0A%20%20firstCellType%20keyword%0A%20%20inScope%20deviceTypeNode%20floor%20connectsTo%0A%20nodeType%20floor%0A%20%20cells%20intCell%0A%20%20firstCellType%20keyword%0A%20nodeType%20connectsTo%0A%20%20cells%20nodeIdCell%0A%20%20firstCellType%20keyword%0A%20nodeType%20deviceTypeNode%0A%20%20match%20type%0A%20%20cells%20deviceTypeCell%0A%20%20firstCellType%20keyword%0A%20nodeType%20catchAllError%0A%20%20baseNodeType%20errorNode%0Asample%0A%20node%20thermometer%0A%20%20floor%201%0A%20%20type%20RaspberryPi%0A%20%20connectsTo%20router1%0A%20node%20router1%0A%20%20type%20Router%0A%20%20floor%201)

------
eternalban
This reminded me of drafting electric diagrams in plans and elevations. And of
course the notions of a 'plan' and an 'elevation' are also front and center.

Why were graphs not embraced by other domains? Plans and elevations are, by
definition, constrained viewpoints. (Hint: for the same reason the highly
available and highly consistent flavors of graph databases require paid
licenses :)

Graphs are difficult to decompose into 'unit' assemblies. Plans and elevations
are 'standard units'. (Again: it may help to think of plans and elevations
as tables and reverse indexes, respectively.)

Note that requirements such as "show directions from one entity to another"
are also present for documenting the electrical systems, or HVAC, in a
building.

A modular system for representing graphs of arbitrary scale is the minimal and
trivial 'one node per modular unit'. The alternative is throwing huge amounts
of processing power to allow arbitrary views into a graph at any scale.

IoT, as an 'integrated component' of building systems, will find a very happy
place on plans and elevations.

------
agentultra
Postgres works well. So does Neo4j. Persistence is pretty well solved.

So is the in-memory representation! Have you thought about using lazy
structures? The graph can be conceptually infinite in size but your program
only loads the pieces being used as they are needed and offloads old ones that
are not.
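
One way to read "lazy structures" is a graph that fetches adjacency on demand and evicts the least recently used entries. The sketch below assumes a hypothetical `load_neighbors` callback (for instance a Postgres query); the `LazyGraph` class, its `max_cached` parameter, and the `loads` counter are illustrative names only:

```python
from collections import OrderedDict


class LazyGraph:
    def __init__(self, load_neighbors, max_cached=1000):
        self._load = load_neighbors  # hypothetical callback hitting the backing store
        self._cache = OrderedDict()  # node -> neighbors, ordered by recency of use
        self._max = max_cached
        self.loads = 0               # how many times the backing store was queried

    def neighbors(self, node):
        """Return a node's neighbors, loading them only on first use."""
        if node in self._cache:
            self._cache.move_to_end(node)  # mark as most recently used
            return self._cache[node]
        result = tuple(self._load(node))
        self.loads += 1
        self._cache[node] = result
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # offload the least recently used node
        return result


# A dict stands in for the persistent store in this example.
backing = {
    "room-101": ["desk-1", "switch-1"],
    "desk-1": ["room-101"],
    "switch-1": ["room-101"],
}
graph = LazyGraph(lambda n: backing.get(n, []), max_cached=2)
```

Traversal code only ever calls `graph.neighbors(...)`, so the working set in memory stays bounded by `max_cached` regardless of how large the stored graph is.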

------
parabiii
May be useful:

[https://github.com/eBay/beam](https://github.com/eBay/beam)

[https://github.com/gchq/Gaffer](https://github.com/gchq/Gaffer)

------
chaz6
I do not know if it is right for your use case, but you could have a look at
[https://github.com/Tulip-Dev/tulip](https://github.com/Tulip-Dev/tulip) for
visualization.

------
winrid
What kinds of queries/aggregations? Write throughput? Reliability
requirements?

------
rat9988
As you talked about BIM, I guess you are already aware of IFC. The real
problem is that your use case isn't very clear to me. What are you trying to
achieve? Visualize a graph?

~~~
li4ick
Visualization is a small part, yes. I'd like to control the building. An
employee can enter a room and the system would do stuff. I can say "it's
kinda dark here" and the system would either open the blinds or switch on the
light, depending on a lot of parameters. The thing is, the BIM doesn't have
the interior elements such as desks and printers, just the layout of the
floors. That's why my current hack is to have a color-coded floor plan which
generates everything. The graph is always changing throughout the day, and
there are a lot of ML algorithms running in the background, including graph
classification. That's why I'm trying to represent everything as a graph.

------
espeed
See GraphBLAS [http://graphblas.org](http://graphblas.org)

Previous:
[https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&pr...](https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&prefix&page=0&dateRange=all&type=all)

------
pvaldes
As usual, TeX can help. See TikZ and PGF. If you want to represent a building
you may need to draw electric circuits. See also CircuiTikZ.

[http://www.texample.net/tikz/examples/pressurized-water-reac...](http://www.texample.net/tikz/examples/pressurized-water-reactor/)

------
taherchhabra
For our marketing analytics product, we are using AWS Neptune and are very happy
with it. We first started out with Azure Cosmos graph, but due to its incomplete
TinkerPop support many queries were not supported. You can give AWS Neptune a
try.

------
mhh__
How many vertices?

------
stanislavb
For an in-memory graph representation, you can have a look at RedisGraph
[https://oss.redislabs.com/redisgraph/](https://oss.redislabs.com/redisgraph/)

------
wiradikusuma
I'm currently tinkering with graphs using
[https://dgraph.io/](https://dgraph.io/); it will soon support GraphQL.

------
baybal2
May I ask what purpose you are doing this for?

I'm doing something quite similar at the moment.

------
contingencies
graphviz

~~~
pvaldes
Graphviz is fine, especially for small graphs, but probably not what is
required here. Plain PDF could be less painful if your graph program chooses
to reorder the data automatically to shorten paths (mixing
nodes from the second and fifth floors and then drawing the first floor above
the others, for example, is probably not what you want).

It depends on what is really needed there: whether it is interactive or not, and
how complex it is. To draw a very complex building, something that will not place
nodes and move lines around by itself (AutoCAD, etc.) could easily be the weapon of
choice here.

~~~
contingencies
Graphviz is a family of tools and a file format.

Being text based it is easily manipulable and readily scalable to many nodes.
It also supports subgraphs.

I am surprised at being downvoted when suggesting what is quantifiably the
most mature tool in the space.

PDF is a presentation format, and has nothing to do with graphs in the sense
of data structures. The fact that the original question references Photoshop
shows that the person asking is not approaching the problem from a formal
standpoint. This is asking for trouble.

Frankly, I am not sure a graph is the right data structure for this problem.
It is more likely that a general purpose database would fulfil current and
future requirements more effectively. Graphs could be generated from that
trivially as representations.

~~~
pvaldes
I use Graphviz often and I'm very happy with it. I'm aware of the cluster
tool. Still, sometimes I miss some things that Graphviz does not currently have
(and Lisp, Python or R could provide). If you need to represent a node in the form
of a computer, a switch or a room, Graphviz is limited in that sense.

I agree with the database focus.

