
Apache Jena - pplonski86
http://jena.apache.org
======
anon1253
A lot of "blast from the past" comments! But as a counter-example, we still
use Jena extensively (in combination with its server Fuseki) to deal with
biomedical ontologies and taxonomies for use in Natural Language Processing
(e.g. the Human Phenotype Ontology, the Gene Ontology, etc.). We even recently
made the step to add some PROLOG-ish inference rules [1]. I have nothing but
love for the RDF ecosystem, in the sense that I love its ideas even if some of
the implementations are a bit wonky. For example, the performance of property
paths (i.e. the + and * operators in SPARQL) sometimes leaves something to be
desired. Not to mention the funny looks from some devs when you say you use
RDF, but I take that as a badge of honor! It took me a while to get it, so I
finally decided to write down what it all means a couple of years ago [2].
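
To make that concrete, here's roughly what such a rule plus a property-path
query look like through Jena's Java API. This is only a sketch: the input file,
the rule, and the term URI are hypothetical placeholders, not from our actual
setup.

    import java.util.List;
    
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
    import org.apache.jena.reasoner.rulesys.Rule;
    
    public class RulesAndPaths {
        public static void main(String[] args) {
            Model data = ModelFactory.createDefaultModel();
            data.read("hp.ttl"); // hypothetical ontology file in Turtle
    
            // A PROLOG-ish forward rule: materialize the subClassOf closure
            List<Rule> rules = Rule.parseRules(
                "[subClassTrans: (?a rdfs:subClassOf ?b) (?b rdfs:subClassOf ?c)"
                + " -> (?a rdfs:subClassOf ?c)]");
            InfModel inf = ModelFactory.createInfModel(new GenericRuleReasoner(rules), data);
    
            // Querying the inference model sees the rule-derived triples
            String withRule = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
                + "SELECT ?anc WHERE { <http://example.org/HP_0001250> rdfs:subClassOf ?anc }";
            try (QueryExecution qe = QueryExecutionFactory.create(withRule, inf)) {
                qe.execSelect().forEachRemaining(row -> System.out.println(row.get("anc")));
            }
    
            // The property-path alternative (the '+' operator) on the raw data;
            // this is the construct whose performance can be hit-or-miss
            String withPath = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
                + "SELECT ?anc WHERE { <http://example.org/HP_0001250> rdfs:subClassOf+ ?anc }";
            try (QueryExecution qe = QueryExecutionFactory.create(withPath, data)) {
                qe.execSelect().forEachRemaining(row -> System.out.println(row.get("anc")));
            }
        }
    }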

[1]:
[https://jena.apache.org/documentation/inference/index.html](https://jena.apache.org/documentation/inference/index.html)

[2]: [https://joelkuiper.eu/semantic-web](https://joelkuiper.eu/semantic-web)

------
MrBuddyCasino
A blast from the past. Semantic Web used to be a good buzzword for receiving
EU research grants, and RDF triples have been useful in some areas. What else
remains?

~~~
salex89
I chuckled at the EU research grants part. My God, when I remember the
monstrosities I was thinking up just to bodge in Semantic Web/Linked Data.
Those guys are purely fueled by buzzwords. Maybe we met somewhere :) .

"Semantic data is the future and always will be" - unknown author.

~~~
CeiII
"Semantic data is the future and always will be" \- unknown author."

\- Pretty sure it was Peter Norvig

------
andrewstellman
I'm seeing several people asking what RDF is useful for. If you're curious, I
use it for basketball analytics:
[https://github.com/andrewstellman/pbprdf](https://github.com/andrewstellman/pbprdf)

Here's an article about my system, pbprdf: [https://www.zdnet.com/article/nba-
analytics-and-rdf-graphs-g...](https://www.zdnet.com/article/nba-analytics-
and-rdf-graphs-game-data-and-metadata-evolution-and-occams-razor/)

And an example of its use:
[https://gist.github.com/andrewstellman/4872dbb9dc7593e56abdd...](https://gist.github.com/andrewstellman/4872dbb9dc7593e56abddbe8b998b509)

Here's an example of what the RDF files generated by pbprdf look like.

First, the ontology, which defines the vocabulary it uses:
[https://github.com/andrewstellman/pbprdf/blob/master/generat...](https://github.com/andrewstellman/pbprdf/blob/master/generated/ontology.ttl)

And this is what the data looks like:

      <pbprdf/games/2017-11-29_Warriors_at_Lakers/230> pbprdf:shotPoints "3"^^xsd:int ;
            pbprdf:shotAssistedBy <pbprdf/players/Klay_Thompson> ;
            pbprdf:shotType "26-foot three point jumper" ;
            pbprdf:shotMade "true"^^xsd:boolean ;
            a pbprdf:Shot ;
            pbprdf:shotBy <pbprdf/players/Stephen_Curry> ;
            a pbprdf:Play ;
            pbprdf:forTeam <pbprdf/teams/Warriors> ;
            pbprdf:inGame <pbprdf/games/2017-11-29_Warriors_at_Lakers> ;
            pbprdf:time "10:23" ;
            pbprdf:period "3"^^xsd:int ;
            a pbprdf:Event ;
            rdfs:label "Warriors: Stephen Curry makes 26-foot three point jumper (Klay Thompson assists)" ;
            pbprdf:secondsIntoGame "1537"^^xsd:int ;
            pbprdf:secondsLeftInPeriod "623"^^xsd:int .
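
And to give a taste of querying it, here's a minimal Jena sketch that counts
made three-pointers over a file like the one above. Caveat: the pbprdf prefix
URI and the input file name are made-up placeholders; check the repo for the
real ones.

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    
    public class MadeThrees {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.read("game.ttl"); // hypothetical pbprdf output file
    
            // Count made three-pointers; FILTER compares by value, so the
            // xsd:int/xsd:boolean typing in the data doesn't get in the way
            String q = "PREFIX pbprdf: <http://example.org/pbprdf/>\n"
                + "SELECT (COUNT(?shot) AS ?threes) WHERE {\n"
                + "  ?shot a pbprdf:Shot ;\n"
                + "        pbprdf:shotMade ?made ;\n"
                + "        pbprdf:shotPoints ?pts .\n"
                + "  FILTER(?made && ?pts = 3)\n"
                + "}";
            try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
                ResultSet rs = qe.execSelect();
                while (rs.hasNext()) {
                    System.out.println(rs.next().getLiteral("threes").getInt());
                }
            }
        }
    }

On larger dumps you'd point the same query at a SPARQL endpoint (e.g. Fuseki)
instead of an in-memory model.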

~~~
simo-climb
Hi Andrew

Thanks for posting your project in full. Too much of RDF/linked data is in the
abstract, too big to see the moving parts, or behind proprietary doors. I'm at
the beginning of the learning curve, so it's much appreciated, and quite a
number of things about the data workflow clicked into place - nice to see a
graph and instance creation process in rdf4j.

I'm wondering if you've come across an approach for pushing the outputs of
quantitative SPARQL queries, such as your shot-points percentage, to a
visualization tool... but I'm looking for a semantically aware approach.

So as to be informative to this forum, what do I mean? I'm not talking about
the basic flow of outputting a flat file (e.g. CSV) and digesting it with a
generic tool - take your pick of zillions of libraries here, though Power BI
is my current bugbear, where Microsoft has sold the promise of self-serve BI
but leaves everyone else to manage the chaos of cleaved, chewed and duplicated
data and a fragile, disconnected calculation (DAX measure) code base.

So what am I looking for?

Let's call them "measures", but in the RDF construct they are SPARQL queries,
as you've documented so well. The measure operates on data that meets
constraints on its type and cardinality, amongst other things, but which has,
if required, been automatically changed to conform to the expected "pattern"
using constraint rules. I then build my client application with visuals, e.g.
a chart or map that displays the SPARQL query results. The visual changes
based on properties or constraints on that data. Moreover, the actual measure
is stored with the data and encapsulated in the client application. Plus it
has full provenance included. (I note here that general ontology and instance
visualization tools abound, but not what you could call BI tools for charting
etc.)

I know these have been conceived and prototyped before. See:
[https://composing-the-semantic-
web.blogspot.com/search?q=cha...](https://composing-the-semantic-
web.blogspot.com/search?q=chart)

I have been building my skills and workflow in a team that's adopting SHACL
and SPIN rules to drive data ingestion through to interfaces in the TopBraid
tool set. The space is coming along, but for this use case of charting and
visualizing it seems to have stalled, with the above UISPIN work now
deprecated and waiting... maybe for SHACL and some SHACL JavaScript mappings
to come to the rescue.

I've found some interesting new work using web components (Polymer/LitElement)
that makes sense: [https://blog.resc.info/reboot-of-using-interface-
encapsulati...](https://blog.resc.info/reboot-of-using-interface-
encapsulation-to-listen-to-linked-data-predicates/) But it feels a long way
off for me to tackle, conceptually and skills-wise, with yet another code
framework to get on top of.

Hoping you've seen some potential paths, mate.

Cheers

Simon

------
amirouche
I think Resource Description Framework (RDF) is overkill.

The underlying idea, the triple store (reminiscent of the Entity-Attribute-
Value model), is a good one because it lets you model data with less overhead
than a graph database would for the same problem. Think of a list of items
attached to a node, or hypergraphs. All of that is easier to do in a triple
store.
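
For instance, here's a small Jena sketch of what "a list of items attached to
a node" looks like as triples (all names are made up for illustration):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;
    
    public class TripleSketch {
        public static void main(String[] args) {
            Model m = ModelFactory.createDefaultModel();
            Resource node = m.createResource("http://example.org/node1");
            Property hasItem = m.createProperty("http://example.org/hasItem");
    
            // No schema migration needed: a property is multi-valued by default,
            // each value is just one more (subject, predicate, object) triple
            node.addProperty(hasItem, "first item");
            node.addProperty(hasItem, "second item");
            node.addProperty(hasItem, m.createResource("http://example.org/node2"));
    
            m.write(System.out, "TURTLE");
        }
    }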

Actually, I think that triple stores are not given enough buzz. Most of the
RDF buzz is around ontologies (i.e. standard vocabularies for describing
things). Datomic proves that the triple store is a great idea in itself.

Datomic is RDF in disguise. That is, it implements a versioned triple store
and a query language similar to SPARQL (based on core.logic, Clojure's
miniKanren).

When you think about it, a versioned database is a gem when it comes to
debugging. Versioning a database is next to the best idea of the decade and
that would not have been possible with another model than the triple store
model.

The idea of database versioning, or more generally versioning of structured
data, especially versioning à la git, is making its way through academia (see
[https://project-hobbit.eu/](https://project-hobbit.eu/)) and outside academia
(cf. [https://qri.io/](https://qri.io/)) to help with the vast amounts of data
that are flowing.

~~~
jacques_chester
> _When you think about it, a versioned database is a gem when it comes to
> debugging. Versioning a database is next to the best idea of the decade and
> that would not have been possible with another model than the triple store
> model._

This is a pretty bold claim. Why isn't it possible with another model?

~~~
amirouche
Well, you are correct. It is possible to implement historisation / an audit
trail in other database models. Sorry. Maybe I should blog about it :)

------
openasocket
I actually stumbled across this a couple weeks ago, while thinking about the
design for a project. I'm working with threat intelligence/security data and I
need a nice way for users to save/share that data with the team, as well as
query it in a simple fashion. People are pushing for a graph database. So I
did some research and learned about RDF, OWL, Jena, etc.

I'm still not sure if this is the right road to go down. Just from reading
papers it's hard to tell how much of this is hype and over-engineering and how
much is solid. Anyone have some RDF/OWL/Jena/SPARQL stories they want to
share?

~~~
ktk
Unfortunately a lot of the stuff going on in the RDF domain is behind
firewalls, so it's a bit hard to give a lot of details. But I can contribute
some public and some private use-cases where RDF is used:

The Refinitiv (formerly Thomson Reuters Financial and Risk) knowledge graph is
built completely on the RDF stack:
[https://www.refinitiv.com/en/products/knowledge-graph-
feed](https://www.refinitiv.com/en/products/knowledge-graph-feed)

When I talked to them in late 2017, they told me they have 100 billion triples
in their database, plus more in a versioning back-end. Their triplestore is
open source: [https://github.com/CM-Well/CM-Well](https://github.com/CM-
Well/CM-Well)

Several government agencies all over the world are starting to build public
RDF knowledge graphs. I'm closely involved in the Swiss government's; see my
presentation from last week: [http://presentations.zazuko.com/Swiss-LOD-
Platform/](http://presentations.zazuko.com/Swiss-LOD-Platform/)

There are similar projects in other countries like the Netherlands, Belgium,
the UK, etc. This stack makes a _lot_ of sense for open data, as you can do
some pretty crazy queries without spending two days preparing your data. See
for example the Swiss Open Data Advent Calendar of 2018:
[https://twitter.com/linkedktk/status/1076064066525949952](https://twitter.com/linkedktk/status/1076064066525949952)

As I said, there are many "behind the firewall" use-cases where people use the
stack exactly because of features like OWL. Yes, it comes at a price
(bootstrapping is not really super easy), but this is stuff we will still be
running 40 years from now. I see it in:

Finance: fraud detection, compliance, customer 360° views, ... Stardog
([https://www.stardog.com/](https://www.stardog.com/)) lists Moody's, BNY
Mellon and National Bank of Canada as customers. Last week I met someone from
Credit Suisse who is Mr. RDF there.

Production: you have a ton of databases containing the products you create,
but there is no way to figure out what a final product consists of, as the
data is scattered across at least 5 of them. The automotive supplier I talk
about here is using RDF to get that view.

Life sciences: the largest RDF dataset available to the public is UniProt and
related datasets. In total they provide a SPARQL endpoint (RDF database) with
50 billion (!!) triples. This is a highly popular dataset and is used in
pretty much every larger pharmaceutical enterprise as well. See
[https://www.uniprot.org/](https://www.uniprot.org/) as a starting point. I
know of at least one large life-sciences company that just recently decided
that RDF will be the basis of all future data unification standards within the
organization.

Insurance business: one of our customers is using RDF to unify a ton of
different systems and get that 360° view of their customers as well.

RDF is an absolutely amazing stack and I do not see anything else available
that comes remotely close to its power. The day I find something more
powerful, I will be the first to use it. But most of the time, people
dismissing RDF have zero clue what it can really do.

~~~
cmutel
I am part of a team building an RDF database to be used for environmental
footprinting and industrial ecology
([https://github.com/BONSAMURAIS](https://github.com/BONSAMURAIS)), and am
also slowly becoming part of the Swiss open data scene - I would appreciate a
chance to chat with you about your experiences!

For us, RDF seems like the only technology that can easily adapt to the large
number of data types that we envision collecting.

~~~
ktk
Sure, more than happy to. You will find me at @linkedktk on twitter or
adrian.gschwend @ zazuko . com

------
mark_l_watson
I have been a fan of RDF/RDFS/OWL for a long time. I started adding RDF in
various forms to my main web site shortly after TBL et al. wrote the original
Scientific American article. I have also written two semantic web/linked data
books.

The uptake for the semantic web has been spotty. I was hired as a contractor
at Google to work with the Knowledge Graph and over the years I have had a lot
of consulting work in related areas.

That said, if I were building a custom Knowledge Graph for a company or
organization today, I would likely use a graph database like Neo4J. Maybe not
though - it would depend on the application.

~~~
blablabla123
Really? Why not Jena, or Marmotta with a Postgres backend? I have little
experience but just started setting up the latter, so I'm curious.

~~~
zwifi
I can't speak for OP, but in my humble opinion, they don't really serve the
same purpose. Jena and Marmotta are implementations of standards, while Neo4J
is more of a proprietary system, and the underlying paradigm differs between
triple graphs (for Jena and, to a lesser extent, Marmotta) and property graphs
(for most graph databases, e.g. Neo4J, Apache Tinkerpop, OrientDB...).

To give a very short summary, RDF is more concerned with the possibility of
linking data, so each piece of information is identified by a dereferenceable
URI and can be described with an explicit model called an ontology (a fancy
word for a vocabulary used to describe data). On the other hand, Neo4J is more
concerned with performance, but does not consider linking data across the Web
or using an explicit schema. As for Marmotta, it is in kind of a bad place
right now: development seems a bit stalled, and the standard it implements is
quite complicated compared to the majority of the problems it solves. This
might evolve, however, since Linked Data Platform (said standard) is now
promoted by SOLID
([https://solid.inrupt.com/](https://solid.inrupt.com/)), a new initiative by
Tim Berners-Lee et al. to enable a truly distributed Web.

~~~
jimmy_ruska
I don't see what you can't do with Neo4j, Neptune, or Datomic. Neptune offers
a SPARQL interface. You could easily specify an explicit schema. If you wanted
to make an ontology that's globally unique, you could force it to be defined
at a centrally defined application level. If you wanted an inferring
traversal, for example applying some type of hierarchy, you could write your
own iterator in Java in Neo4j, or just apply multiple types to the vertex.

------
azatris
What even is the Semantic Web in modern terms?

~~~
jimmy_ruska
Things like JSON-LD, microdata, and RDFa:
[https://developers.google.com/search/docs/data-
types/product](https://developers.google.com/search/docs/data-types/product)

Wikidata also has data in RDF:
[https://wikidata.org/wiki/Wikidata:Database_download#JSON_du...](https://wikidata.org/wiki/Wikidata:Database_download#JSON_dumps_\(recommended\))

I wonder what killer feature is to be had. If you want highly connected data,
you can use graph databases like Neptune, Neo4j, or Datomic. If you want logic
programming, you still have SWI-Prolog, or something like Picat, ECLiPSe, or
Mercury, which can easily model triples or a custom ontology. There's also
Apache Tinkerpop and similar, which give querying a more object-oriented feel.
I see Prolog can interop with Jena, but if it can, why not parse & query
RDF/OWL in Prolog itself? Can't Prolog do everything SPARQL can?

What is the value offering of Apache Jena?

~~~
tannhaeuser
RDF, and in particular OWL2 (a reformulation of RDF tech based on description
logic), is about decidable fragments of first-order logic, whereas Prolog is
an existential Horn fragment over terms with Turing-complete extra-logical
additions such as negation-as-failure and "cut", and hence has undecidable
decision problems. I actually think the EU-granted research belittled in
another comment did a good job of carving out fragments of FOL with desirable
complexities of decision problems for relevant applications. But from a
practical PoV, RDF, OWL, and SPARQL are bordering on unusable (starting from
the fact that open-world semantics alone isn't applicable in many real-world
scenarios), though Jena and rdflib work fine. Think of OWL2/description logic
as a variable-free representation of axioms with just two logical variables
(plus some with three, such as the axiom of transitivity).
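
Spelled out in ordinary textbook notation (nothing tool-specific here), a
class inclusion axiom unfolds into a two-variable FOL sentence, while
transitivity is one of the axioms that needs a third variable:

    % a DL inclusion axiom as a two-variable FOL sentence
    C \sqsubseteq \exists r.D \;\equiv\; \forall x\,\bigl( C(x) \rightarrow \exists y\,( r(x,y) \wedge D(y) ) \bigr)
    
    % transitivity needs a third variable
    \mathrm{Trans}(r) \;\equiv\; \forall x\,\forall y\,\forall z\,\bigl( r(x,y) \wedge r(y,z) \rightarrow r(x,z) \bigr)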

Today there's a renewed interest in Prolog and Datalog, which makes me happy
after RDF had captured the field for almost two decades.

------
rilut
- Are there any examples of using Apache Jena for reasoning/inference over
CMU's NELL/RTW [0] data?

- Are there any tools for ontology inference like Apache Jena, but simpler,
and preferably built in JS or Python?

[0] [http://rtw.ml.cmu.edu/rtw/](http://rtw.ml.cmu.edu/rtw/)

------
th0ma5
Big fan of Jena. The command-line tools alone, for running SPARQL queries and
converting between formats, are great. Can anyone compare and contrast Jena in
a Java project and/or operating against flat files with Neo4J or other graph
systems?

~~~
eecc
What do you think of [http://rdf4j.org](http://rdf4j.org) ?

~~~
karimtr
Interesting. I'll have a look. Thanks for sharing :D

------
dominotw
[http://rya.incubator.apache.org/](http://rya.incubator.apache.org/)

How does it compare to Rya?

~~~
kinow
Looks like Rya is a triple store on top of Accumulo. Jena has SDB (stores data
in a relational DB) and TDB (a single-machine, single-JVM, transactional,
on-disk store [with mmap, journal, cache, etc.]).

The Jena API is also extensible, and I remember someone on the mailing list
talking about an example with HBase, I believe, as the backend for the triple
store.

Both HBase and Accumulo can be scaled to multiple nodes. So you would have to
weigh the pros and cons of using either (or Rya), as well as the pros and cons
of using Jena.

Finally, I believe Jena is not simply a triple store. It has a web layer via
Fuseki, services for managing graphs, command-line tools, and other tools
useful for data processing with semantic web technologies.
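
For a flavour of the TDB side, here's a minimal sketch of its transactional
API; the store directory and input file are placeholders:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.tdb.TDBFactory;
    
    public class TdbDemo {
        public static void main(String[] args) {
            // A persistent, transactional, on-disk store in a local directory
            Dataset ds = TDBFactory.createDataset("/tmp/tdb-demo");
    
            ds.begin(ReadWrite.WRITE);
            try {
                ds.getDefaultModel().read("data.ttl"); // hypothetical input file
                ds.commit();
            } finally {
                ds.end();
            }
        }
    }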

------
ktk
My comment is a repost of one I made in another RDF-related thread. It is
still valid, so here we go:

I'm an engineer who did relational DBs for a long time. One day a customer of
a friend came with an issue that was, in my opinion, impossible to solve with
relational DBs: he described data that is in flux all the time, and there was
no way we could come up with a schema that would fit his problem for more than
one month after we finished it. Then I remembered that another friend had once
mentioned this graph model called RDF and its query language SPARQL, and I
started digging into it. It's all W3C standards, so it's very easy to read
into, and there are competing implementations.

It was a wild ride. At the time I started there was little to no tooling, only
a few SPARQL implementations, and SPARQL 1.1 was not released yet. It was a
PITA to use, but it still stuck with me: I finally had an agile data model
that allowed me and our customers to grow with the problem. I was quite
sceptical whether it would ever scale, but I still didn't stop using it.

Initially one can be overwhelmed by RDF: it is a very simple data model, but
at the same time it's a technology stack that allows you to do a lot of crazy
stuff. You can describe the semantics of the data in vocabularies and
ontologies, which you should share and re-use; you can traverse the graph with
its query language SPARQL; and you have additional layers like reasoning that
can figure out hidden gems in your data and make life easier when you consume
or validate it. And most recently people have started integrating machine
learning toolkits into the stack, so you can directly train models based on
your RDF knowledge graph.

If you want to solve a small problem, RDF might not be the most logical choice
at first. But then you start thinking about it again and you figure out that
this is probably not the end of it. Sure, maybe you would be faster using the
latest and greatest key/value DB and hacking some stuff together in fancy web
frameworks. But then again, there is a fair chance the customer will want you
to add stuff in the future, and you are quite certain that at one point it
will blow up because the technology cannot handle it anymore.

That will not happen with RDF. You will have to invest more time at first: you
will talk about things like the semantics of your customer's data, and you
will spend quite some time figuring out how to create identifiers (URIs in
RDF) that are still valid years from now. You will have a look at existing
vocabularies and only refine the things that are really necessary for the
particular use case. You will think about integrating data from relational
systems, Excel files or JSON APIs by mapping them to RDF, which again is all
defined in W3C standards. You will mock up some data in a text editor, written
in your favourite serialization of RDF. Yes, there are many serializations
available, and you should most definitely throw away any book/text that starts
with RDF/XML; use Turtle or JSON-LD instead, whichever fits you best.
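
As a tiny illustration of what such a mock-up can look like round-tripped
through Jena (the resource and the schema.org terms are examples I picked, not
from any particular project):

    import java.io.StringReader;
    
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    
    public class MockupDemo {
        public static void main(String[] args) {
            // A Turtle mock-up, re-using an existing vocabulary (schema.org)
            String turtle =
                "@prefix schema: <http://schema.org/> . "
                + "<http://example.org/alice> a schema:Person ; "
                + "    schema:name \"Alice\" .";
    
            Model m = ModelFactory.createDefaultModel();
            m.read(new StringReader(turtle), null, "TURTLE");
    
            // The same graph in another serialization, here JSON-LD
            m.write(System.out, "JSON-LD");
        }
    }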

After that you start automating everything: you write some glue code that
interprets the DSL you just built on top of RDF and appropriate vocabularies,
and you start adjusting everything to your customer's needs. Once you go live
it will look and feel like any other solution you've built before, but unlike
those, you can extend it easily and increase its complexity once you need to.

And at that point you realize that this is all worth it, and you will most
likely not touch any other technology stack anymore. At least that's what I
did.

I could go on for a long time; in fact I teach this stack in companies and
gov organizations over several days, and even then I can only scratch the
surface of what you can do with it. It does scale, I'm convinced of that by
now, and the tooling is getting better and better.

If you are interested, have a look at the Creative Commons course/slides we
started building. There is still lots of content to be added, but I had to
start somewhere: [http://linked-data-
training.zazuko.com/](http://linked-data-training.zazuko.com/)

Also have a look at Wikipedia for a list of SPARQL implementations:
[https://en.wikipedia.org/wiki/Comparison_of_triplestores](https://en.wikipedia.org/wiki/Comparison_of_triplestores)

Would I use other graph databases? Definitely not. The great thing about RDF
is that it's open: you can cross-reference data across silos/domains and
profit from work others did. If I create another silo in a proprietary graph
model, why would I bother?

Let me finish with a quote from Dan Brickley (Google's schema.org) and Libby
Miller (BBC) in a recent book about RDF validation:

> People think RDF is a pain because it is complicated. The truth is even
> worse. RDF is painfully simplistic, but it allows you to work with real-
> world data and problems that are horribly complicated. While you can avoid
> RDF, it is harder to avoid complicated data and complicated computer
> problems.

Source:
[http://book.validatingrdf.com/bookHtml005.html](http://book.validatingrdf.com/bookHtml005.html)

I could not have come up with a better conclusion.

------
narrator
I think the semantic web never worked because of SEO spam. The closest it got
to adoption in any form was the keywords meta tag. We know how that ended up.

~~~
acdha
Less that and more the combination of bad tools, labyrinthine “standards”
which were usually poorly implemented by said tools, and a lack of concrete
use-cases. Almost nobody had a case where implementing it showed clear
business value, so most of the “you should do this” advocacy was along the
lines of “this will become useful later”, and that was a hard sell because it
took many years before you could go to a webpage, read some decent
instructions, mark something up, and run it through a validator or another
client and get the expected data back out.

~~~
namedgraph
That is exactly what is happening: people and companies are marking up their
pages with JSON-LD and RDFa metadata because it impacts SEO:
[https://developers.google.com/search/docs/guides/intro-
struc...](https://developers.google.com/search/docs/guides/intro-structured-
data)

~~~
acdha
That was my point: to the extent that the semantic web has happened at all,
it's because someone can go to their boss and say “If we do this, here's a
non-hypothetical benefit we'll see immediately” and because Google invested in
high-quality documentation and tools which make it easy to do correctly.

------
ragerino
I prefer rdf4j as the RDF API for clients and GraphDB from Ontotext as the RDF
store. I never liked programming with Jena.

------
martinsbalodis
How is semantic web data being used today? Has there been any research into
building a general AI using it?

------
tqkxzugoaupvwqr
What is the origin of the name / What does it mean? Couldn’t find an answer on
the website.

~~~
evanb
Jena is a city in the eastern part of Germany. It is historically famous for
its university and optics innovation, and as an epicenter of Bauhaus
architecture.

~~~
la_fayette
That is why people from AKSW Leipzig built a similar thing for PHP, named
Erfurt ([https://github.com/AKSW/Erfurt](https://github.com/AKSW/Erfurt)), for
their semantic wiki OntoWiki
([https://github.com/AKSW/OntoWiki](https://github.com/AKSW/OntoWiki)).

------
pragmaticlurker
Why Java?

~~~
karimtr
Easier to build something quickly, but definitely not the best option for DBs.
It's always about tradeoffs, I guess.

~~~
pragmaticlurker
For "quickly", I would have chosen a dynamic language (Python, PHP).

