
Show HN: The Codex – a graph database project for the digital humanities - argimenes
http://the-codex.net
======
argimenes
In 2015 I quit my full-time job as an ASP.NET developer to build what I think
of as an "atlas of history" for the Italian Renaissance. It is a semantic-web
style database built in Neo4j, .NET MVC, and KnockoutJS, and is an attempt to
build a map of historical events and personalities for the digital humanities.

[http://the-codex.net](http://the-codex.net)

I am currently the sole developer, product designer, and researcher on the
project -- but I am looking for collaborators who would be willing to help me
take this further.

As an "atlas of history" is a broad concept, I decided to give the project
clarity by focusing on primary source documents from the Italian Renaissance. The
two main sources at present are the 'Florentine Diary' of Luca Landucci and
the letters of Michelangelo. I have entered about 40 years' worth of entries
from Landucci's diary and a good portion of Michelangelo's early letters from
his Roman period. In the process I have added hundreds of historical
personalities, places, artworks, etc., in order to give the user real data to
work with. I have also built various screens with data visualisation tools to
mine the historical events. And of course I have built an extensive back-end
for managing the data and relationships.

Is anyone interested in helping me out? I'd love input from anyone with an
interest in art history, graphic design, data visualisation tools, or Neo4j,
or anyone who wants to help me research and enter data.

Feel free to email me any time at: iian.d.neill@gmail.com

In the meantime, why not check out the Control Panel on Leonardo da Vinci's
dataset. Clicking any links in the text will load the datasets for those
entities; or you can search for them by name. Why not try adding
Michelangelo's dataset to the mix? You can then switch between the three data-
vis modes at the bottom, fiddle with the date filters, etc.

[http://the-codex.net/Time/ControlPanel](http://the-codex.net/Time/ControlPanel)

Many thanks, Iian

~~~
goldfeld
I appreciate what you've built and I'm interested in helping out. What is the
way forward for the project? Do you plan on monetizing it? Is the code
eventually gonna be public?

~~~
argimenes
Hi,

Thanks for your interest! I have no plans for monetizing the project, partly
because I cannot see a meaningful way to do that, but mostly because it is a
research endeavour. The code is currently public on my Bitbucket repo, and
I'll shoot through the URL as soon as I can.

Basically, I want to explore the limits of how graph databases can be applied
to the visualisation of history. My immediate goals are to:

(A) expand the range of datasets that are collected; (B) improve the tools for
entering and annotating the data to make the process quicker and more
automated; (C) add more powerful data visualisations; and (D) complete the
input of the primary source datasets mentioned and add more.

I want to build a graph database time machine of the Italian Renaissance so
you can pick a day in history and see what was happening across Italy with
various sources, see events plotted or even animated on a map (e.g., watch a
battle unfold or even weather events), etc. The system is underpinned by
subject tags which are taxonomically organised in an 'is-a' hierarchy, which I
hope will provide a 'semantic zoom in/out' functionality.
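
As a rough illustration of the 'semantic zoom' idea, here is a minimal Python
sketch. It is not The Codex's actual code (which runs on Neo4j and C#); the
tag names, the `IS_A` table, and the function names are all invented for the
example. Each tag points to its parent in the 'is-a' hierarchy, and zooming
out replaces a specific tag with a broader ancestor before counting events.

```python
from collections import Counter

# Hypothetical 'is-a' taxonomy: each tag maps to its parent (None at the root).
IS_A = {
    "event": None,
    "conflict": "event",
    "battle": "conflict",
    "siege": "conflict",
    "weather": "event",
    "flood": "weather",
}


def ancestors(tag):
    """Return the chain from `tag` up to the root, starting with `tag` itself."""
    chain = []
    while tag is not None:
        chain.append(tag)
        tag = IS_A[tag]
    return chain


def zoom_out(tag, levels):
    """Return the ancestor `levels` steps above `tag`, stopping at the root."""
    chain = ancestors(tag)
    return chain[min(levels, len(chain) - 1)]


def summarise(tags, levels):
    """Count event tags after zooming each one out by `levels` steps."""
    return Counter(zoom_out(t, levels) for t in tags)
```

With these made-up tags, `summarise(["battle", "siege", "flood"], 1)` groups
the two military events under "conflict" and the flood under "weather". In a
graph database the same walk would presumably be a variable-length traversal
over the 'is-a' relationships rather than a dictionary lookup.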

I understand that the current platform of C#, ASP.NET MVC, and KnockoutJS may
be a turn-off ... and I am open to other technologies if that is crucial,
although conversion to another platform would certainly take a lot of time and
manpower. To be honest, I think the hard work is mainly in the product
design, the visualisations, the data-entry, and the Neo4j Cypher queries.

What kind of collaboration did you have in mind?

~~~
goldfeld
I could whip up an alternative frontend implementation if you'd like. I've
been looking for a project in which to flex my ClojureScript/Om Next (based on
React) muscles. I'm very experienced with Clojure, with the frontend, and
with interface design. I actually worked with KnockoutJS (in a C# shop I
used to work at) many years ago, but yeah, I don't have fond memories of
MVC/MVVM for large projects/efforts; the React model speaks more to me.

~~~
argimenes
ClojureScript and React sound interesting to me, too, and might be a good fit
for the rich UI Control Panel page, which will need to handle large JS
datasets and update visualisations quickly. I've been finding that KnockoutJS
is a bit sluggish when wiring up the filters as computed function observables.
It could be something in my filter code, too, but I have seen benchmarks in
which KnockoutJS doesn't perform well.

We should talk further ... drop me a line at my email? iian.d.neill@gmail.com

------
ilya1
Hi Iian, great work!

I built a recruitment CRM with 1,000,000+ cached nodes and can imagine how
much time and effort you have invested in your project.

I assume that .NET MVC and KnockoutJS were your first choice as you are used
to them ... and that's fine.

I personally found a lot of frustration dealing with Neo4j full-text search in
multiple languages (English, Russian and Japanese). There were no clear
guidelines on Lucene integration at the time.

How has your experience with Neo4j been? Do you have to sync Neo4j data with
traditional databases like M$ SQL?

Do you want to make your project a community effort, or do you see it as a
potential business?

The last question is much more important. I assume that a lot of developers
won't be excited about .NET or KnockoutJS stuff; however, they might consider
sharing some crawled data, D3.js graph visualization code, etc. with you.

~~~
argimenes
Hi Ilya,

Thank you for your kind words and your excellent feedback re: the chosen
platform and the OSS community. To be honest I hadn't thought about the
language and platform angle until you mentioned it, but it is definitely
something I may need to consider. C# is a mature language comparable to most
other quasi-imperative/quasi-functional ones, so I don't think conversion
would be hard so much as time-consuming. The brain of the system is ultimately in
the data structures, the Neo4j Cypher queries, and the product
design/architecture.

Can I ask what technologies you think would be most likely to attract
community involvement, but wouldn't sacrifice static typing? Perhaps some Java
framework for the back end, Angular or React for the client-side framework? I
would not be able to immediately convert the project to those technologies,
though, given the size of the codebase, but it is something to keep in mind ...

Many thanks, Iian

------
argimenes
I also recorded a screenshare presentation of how The Codex works:

[https://www.youtube.com/watch?v=_R0ESfLBuHo](https://www.youtube.com/watch?v=_R0ESfLBuHo)

------
tuvalie
Hey, Iian. Not sure if this will be useful to you, but you might consider
checking out [http://endlessorigins.com/](http://endlessorigins.com/) (the
largest structured collection of human events, available for download as a
single TSV file). And good luck with your project! :)

~~~
argimenes
Hi Tuvalie, thank you for your encouragement and for the link to this
fascinating resource! Can't wait to open up the dataset and check it out soon.
It may be possible to import this into the Codex if the data structures are
broadly compatible ... :-)

