
RC1 ArangoDB 3.4 – What’s new? - rubbercasing
https://www.arangodb.com/2018/09/rc1-arangodb-3-4-whats-new/
======
emmanueloga_
I'm new to ArangoDB but this release looks impressive just by the sheer number
of new features. Congrats!

I was wondering if this release allows FULLTEXT indexes when the backend is
RocksDB (now that it is the default storage engine)? The new ArangoSearch
features look cool, but honestly a bit daunting vs the simple setup of a
FULLTEXT index.

By the way, the ArangoSearch tutorial casually talks about "ArangoDB views of
type 'arangosearch'", but I haven't come across the concept of views before in
the documentation. Are there other types of views?

~~~
hkernbach
Currently there are no other types of views. But they are planned and will
follow.

------
lbhnact
Love each of the new releases. Would appreciate hearing any stories of
performance implications with the 'Distributed COLLECT' improvements.

~~~
jsteemann
Thanks! We have a very brief description of the "distributed COLLECT" feature
here:
[https://github.com/arangodb/arangodb/blob/3.4/Documentation/...](https://github.com/arangodb/arangodb/blob/3.4/Documentation/Books/Manual/ReleaseNotes/NewFeatures34.md#distributed-collect)

More detail will be added to this before the GA release.

The benefits of distributed COLLECT will come into play for queries that can
push the aggregate operations onto the shards. Previous versions of ArangoDB
shipped all documents from the database servers to the coordinator, so the
coordinator would do the central aggregation of the results from all shards to
produce the result.

With distributed COLLECT we now create an additional shard-local COLLECT
operation that performs part of the aggregation on the shards already. This
allows sending just the aggregated per-shard results to the coordinator, so
the coordinator can finally perform an aggregation of the per-shard
aggregates.

This will be beneficial in many cases when the per-shard aggregated result is
much smaller than the non-aggregated per-shard result.

Following is a very simple example. Let's say you have a collection "test"
with 5 shards and 500k simple documents that have just one numeric attribute
(plus the three system attributes "_key", "_id" and "_rev"):

    
    
        // create a collection with 5 shards and fill it with 500k documents
        db._create("test", { numberOfShards: 5 });
        for (let i = 0; i < 500000; ++i) {
          db.test.insert({ value: i });
        }
    

Running a query that will calculate the minimum and maximum values in the
"value" attribute can make use of the distributed COLLECT:

    
    
        FOR doc IN test 
          COLLECT AGGREGATE min = MIN(doc.value), max = MAX(doc.value) 
          RETURN { min, max }
    

The database servers can compute the per-shard minimum and maximum values, so
they will each only send two numeric values back to the coordinator.
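
As a rough illustration of this two-step aggregation, here is a sketch in plain JavaScript. It is a simulation only and not ArangoDB code: the round-robin shard layout and the function names `shardLocalCollect` and `coordinatorCollect` are assumptions made up for this example.

```javascript
// Simulation only: plain JavaScript, no ArangoDB involved.
// Assumption: documents are spread round-robin over 5 shards
// (a real cluster distributes documents by hashing the shard key).
const numberOfShards = 5;
const shards = Array.from({ length: numberOfShards }, () => []);
for (let i = 0; i < 500000; ++i) {
  shards[i % numberOfShards].push({ value: i });
}

// Hypothetical shard-local step: reduce a shard's documents to one { min, max } pair.
function shardLocalCollect(docs) {
  let min = Infinity;
  let max = -Infinity;
  for (const doc of docs) {
    if (doc.value < min) min = doc.value;
    if (doc.value > max) max = doc.value;
  }
  return { min, max };
}

// Hypothetical coordinator step: aggregate the per-shard aggregates.
function coordinatorCollect(partials) {
  return {
    min: Math.min(...partials.map((p) => p.min)),
    max: Math.max(...partials.map((p) => p.max)),
  };
}

const partials = shards.map(shardLocalCollect);
const result = coordinatorCollect(partials);

// Each shard ships two numbers instead of all of its values.
const valuesShipped = partials.length * 2;
console.log(result, valuesShipped);
```

In this simulation the coordinator step only ever sees 10 numbers (two per shard), regardless of how many documents each shard holds.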

Without the optimization, the database servers send either the entire
documents or a projection of each document (containing just the "value"
attribute) back to the coordinator. Even with the projection, each shard
still has to send 100k values on average.

With a local cluster that has 2 database servers and runs them on the same
host as the coordinator, this simple query is sped up by a factor of 2 to 3
when the optimization is applied. In a "real" setup the speedup will be even
higher, because there will be additional network roundtrips between the
cluster nodes. Real-world documents also tend to contain more data, and
collections tend to have more documents, which increases the speedup further.

------
spamizbad
Graph database aficionados, I have a question: What's the deal with most graph
databases having a very limited number of types? It would be nice if we had
more robust numeric types (integers and decimals rather than just doubles) and
timestamps for example.

~~~
janemanos
This is on the list of more than one teammate here at ArangoDB. But also not
trivial to implement.

~~~
nh2
Can you elaborate why that is? Of all types I could come up with (beyond
bool), integers and unsigned integers seem the most basic.

~~~
janemanos
The simplest explanation is that ArangoDB uses JSON as its data format to the
outside world. JSON doesn't support types such as arbitrary-precision exact
decimals or timestamps. Although ArangoDB uses VelocyPack internally, which is
capable of much more than JSON, a user will import JSON and get JSON back.
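
To make the limitation concrete, here is a small sketch of what happens to numbers when JSON is parsed in a JavaScript runtime, where every JSON number becomes an IEEE 754 double. It only illustrates the general JSON problem and says nothing about how ArangoDB or VelocyPack store values; other parsers may behave differently.

```javascript
// 2^53 + 1 cannot be represented as a double, so the parsed value is off by one.
const big = JSON.parse('{ "id": 9007199254740993 }');
const idWasRounded = big.id === 9007199254740992;

// Decimal fractions are stored as binary approximations, so sums are not exact.
const amounts = JSON.parse('{ "a": 0.1, "b": 0.2 }');
const sumIsExact = amounts.a + amounts.b === 0.3;

console.log(idWasRounded, sumIsExact); // true false
```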

You can of course use datetime
[https://docs.arangodb.com/3.3/AQL/Functions/Date.html](https://docs.arangodb.com/3.3/AQL/Functions/Date.html)
and decimals with a precision of 10E38 in ArangoDB, but it is not as precise as
in a relational database. If we want to be as precise as a relational DB, then
we would have to say goodbye to JSON.

~~~
fulafel
MongoDB uses this to express decimals as JSON: { "$numberDecimal": "<number>"
}
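
A minimal sketch of how such a wrapper could be consumed on the client side, in plain JavaScript. The `DecimalString` class and the `reviveDecimals` function are hypothetical names for this example, not part of MongoDB or ArangoDB; the class only carries the exact digits as a string and does no arithmetic.

```javascript
// Hypothetical holder type: keeps the exact digits as a string.
class DecimalString {
  constructor(digits) {
    this.digits = digits;
  }
  // Serialize back into the wrapper form when passed to JSON.stringify.
  toJSON() {
    return { $numberDecimal: this.digits };
  }
}

// Reviver for JSON.parse: turns { "$numberDecimal": "<number>" } into a DecimalString.
function reviveDecimals(key, value) {
  if (
    value !== null &&
    typeof value === "object" &&
    typeof value.$numberDecimal === "string" &&
    Object.keys(value).length === 1
  ) {
    return new DecimalString(value.$numberDecimal);
  }
  return value;
}

const input = '{"price":{"$numberDecimal":"12345678901234567890.0000000001"}}';
const doc = JSON.parse(input, reviveDecimals);
const output = JSON.stringify(doc);

// The digits survive the round trip because they never pass through a double.
console.log(doc.price.digits, output === input);
```

The trade-off is that the value is opaque to anything that does not know the convention: sorting, comparing, or summing such fields needs support on the database or client side.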

~~~
e12e
I suspect creating a specification based on what mongodb does/did might be the
better approach - but a quick search for "typed json" turned up:

[https://www.tjson.org/](https://www.tjson.org/)

Not sure if I'm a fan of the syntax - but some kind of open, sane, standard
would be nice.

