
Aerospike goes Open Source - ZenoArrow
http://www.aerospike.com/blog/entrepreneurs-break-all-the-rules-aerospike-goes-open-source/
======
ZenoArrow
Worth pointing out that unlike some NoSQL engines, Aerospike does have access
to its own query language (AQL) that is syntactically similar to SQL...
[https://docs.aerospike.com/pages/viewpage.action?pageId=3807...](https://docs.aerospike.com/pages/viewpage.action?pageId=3807532)

From the AQL query documentation you have... SELECT name, age FROM
users.profiles WHERE age BETWEEN 20 AND 29 ...which is pretty easy to
understand.
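
To make the semantics of that query concrete, here is a minimal Python sketch of what it selects, run against a plain list of dicts rather than a real Aerospike namespace. It assumes BETWEEN is inclusive on both ends, as in SQL; the sample records and the `select_name_age` helper are made up for illustration and are not part of any Aerospike API.

```python
# Illustrative only: mimics
#   SELECT name, age FROM users.profiles WHERE age BETWEEN 20 AND 29
# over in-memory records. No Aerospike client or server is involved.


def select_name_age(records, low, high):
    """Return (name, age) for records whose age lies in [low, high]."""
    # Assumption: BETWEEN is inclusive on both bounds, as in SQL.
    return [(r["name"], r["age"]) for r in records if low <= r["age"] <= high]


# Hypothetical sample data standing in for the users.profiles set.
profiles = [
    {"name": "alice", "age": 19, "city": "Oslo"},
    {"name": "bob", "age": 20, "city": "Lima"},
    {"name": "carol", "age": 29, "city": "Pune"},
    {"name": "dave", "age": 30, "city": "Kyiv"},
]

if __name__ == "__main__":
    print(select_name_age(profiles, 20, 29))
```

Only the projected bins (name and age) come back; the city field is dropped, just as it would be by the SELECT list.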

There's also a Python client (Apache licensed)...
[https://github.com/aerospike/aerospike-client-
python](https://github.com/aerospike/aerospike-client-python)

~~~
ZenoArrow
This video shows off how easy it is to manage, pretty impressive (I don't work
for Aerospike by the way, I'm just happy to share about it)...
[https://www.youtube.com/watch?v=CF83TmR-
NME](https://www.youtube.com/watch?v=CF83TmR-NME)

------
yeukhon
The AGPL license caught my eye. Does anyone take this license into account
(deciding between an Apache/MIT/BSD DB vs an AGPL DB) when they use it in their
service / software stack?

For example, this OpenStack thread always keeps me wary of using AGPL when I am
developing a solution. [http://lists.openstack.org/pipermail/openstack-
dev/2014-Marc...](http://lists.openstack.org/pipermail/openstack-
dev/2014-March/030510.html)

and here is MongoDB's FAQ explaining AGPL in plain English:
[http://blog.mongodb.org/post/103832439/the-
agpl](http://blog.mongodb.org/post/103832439/the-agpl)

~~~
ZenoArrow
I can't see it being an issue. As mentioned in the useful MongoDB link you
shared, the licence will require sharing only when modifying the database
code, but not require automatic sharing of the rest of the software stack.

Have there been any court cases involving AGPL violations? I wonder if some of
Gil Yehuda's fears are partly out of lack of clarity on where the reach of the
AGPL ends? For example claiming the MongoDB drivers 'violate the AGPL
license', I'd prefer to see a response from GNU on this.

~~~
yeukhon
Well in the case of companies like Google or Facebook, they might actually
modify the database for various reasons (e.g. internal policy, speed, etc).
I don't have first hand experience with modifying databases so I thought I
would ask :)

~~~
_delirium
If someone modified the database and deployed it publicly (or as part of a
webapp that was deployed publicly), they'd need to share the modified source,
yes. AGPL is roughly GPL but where the traditional definition of "shipping
software" is expanded to include deploying on a network.

My guess is that they have two motivations, both of which are fairly
traditional GPL motivations: 1) sell AGPL exceptions to commercial licensees;
and 2) prevent a competitor from making a private commercial fork, where the
competitor improves the DB and licenses their version to clients, without
sharing the source to their improvements.

~~~
e12e
To be pedantic: deployed it as an _external_ service, not necessarily a public
service (I suppose to a legal entity that would not normally share license
rights, such as an individual not part of the organisation or to another
organisation). Those external users would have to be given an option to access
the full source with modifications.

As for the "part of a web service"-bit, I'm not sure what the agpl's actual
"reach" is. My understanding is that with a (modified) db under agpl powering
eg a web app running in php, the end users (accessing only the web server)
would _not_ be entitled to the db source. If the agpl covered the web server
itself or a php library on the other hand, the users would be entitled to that
code?

Similarly if one sold a modified db-as-a-service, modifications would be
covered by the agpl.

~~~
belorn
While it's interesting to theorize about databases and AGPL, I am rather sure
there hasn't been any actual case or a legal reason to think that the
database would become a derivative work when used together with a webserver.

If it was, the EULA for SQL server and oracle would have to include copyright
permission for derivative works. That we do not see that should be a clear
sign that the scope of copyright has not reached that far yet.

------
pwnna
You know, when I first saw this I thought it's the aerospike engine:
[http://en.wikipedia.org/wiki/Aerospike_engine](http://en.wikipedia.org/wiki/Aerospike_engine)

~~~
wlesieutre
Same here, I'm assuming that's what the name is borrowed from.

------
remon
"A database literally ten times faster than existing NoSQL solutions, and one
hundred times faster than existing SQL solution".

The odds of that claim being true and/or supported by benchmarks are somewhere
well below the 1% mark. Why do companies keep making those sorts of obviously
questionable claims knowing the negative backlash that surely will follow?
Boggles the mind really.

~~~
bbulkow
CTO, co-founder, initial coder here.

Aerospike really is a lot faster than Mongo and Cassandra. It's open source,
and you can run whatever benchmarks you'd like yourself. It's about as fast as a
well-tuned multi-core sharded Redis system, except you don't have to write or
configure the sharding, and you can have a combination of RAM or Flash,
different data in each, of course Flash is cheaper/slower but that's why we
give you both.

You can run a single c3.8xlarge on amazon and see 1m tps, or 250K on a
c3.2xlarge. We're doing a lot of benchmarks on EC2 and GCE because they're
"reference platforms" that you'll all believe. More details in the coming
weeks from us, or publish your own.

Just try it yourself; this isn't marketing.

Everyone I talk to coming from Cassandra is seeing a server reduction of
4x~5x, with higher levels of stability (overhead for peaks). I was at a
conference late last week and the company I was with (adform's founder, Jakob)
said they had a major Cassandra outage that week that cost them a lot of
money, and Adform is a Cassandra contributor and knows what they're doing.

Same thing with Mongo shops. They do about 5x reduction and see much higher
performance.

Technical points of why we're faster:

* Coded in C, multithreaded, with reference counting.

* Avoid malloc, but if you have to malloc, avoid the CLib memory allocator. We do a lot of slab allocation (a la memcache) and use JEmalloc for variable sized allocs.

* Use epoll directly and be careful about IO. Don't use mmap, which is 4x slower than read and write.

* Code directly to device, with your own data layout. Databases are a reliability layer, everything else is extra complexity. O_SYNC is better than fsync.
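
As a rough illustration of the last point, the two durability styles can be sketched side by side. This is not Aerospike's code (which is C and writes to raw devices); it is a small Python sketch on ordinary files, assuming a POSIX system where `os.O_SYNC` exists. The helper names are invented for the example, and it makes no claim about which variant is faster on any given kernel or device.

```python
import os
import tempfile

# Assumption: a POSIX system. O_SYNC is missing on some platforms, so fall
# back to 0 (a plain buffered write) rather than failing at import time.
O_SYNC = getattr(os, "O_SYNC", 0)


def write_with_fsync(path, data):
    """Buffered write, then an explicit fsync() to force it to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # small payload; real code would loop on short writes
        os.fsync(fd)        # second system call, issued after the write
    finally:
        os.close(fd)


def write_with_o_sync(path, data):
    """Open with O_SYNC so write() itself returns only once data is synced."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | O_SYNC, 0o600)
    try:
        os.write(fd, data)  # no follow-up fsync call
    finally:
        os.close(fd)


def read_back(path):
    """Return the full contents of the file as bytes."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, writer in (("fsync.bin", write_with_fsync),
                             ("osync.bin", write_with_o_sync)):
            path = os.path.join(d, name)
            writer(path, b"record-1")
            print(name, read_back(path))
```

Both helpers leave the same bytes on disk; the difference is whether the sync is a separate call after the write or a property of the file descriptor.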

There's a lot of smaller tricks in the code, but it all adds up to speed, and
I don't expect you to believe me. I've spent 25 years in silicon valley
writing high performance software, and so has most of the team. We come from a
strong background of embedded, settop box, cell phone programmers.

Let me tell you a short story. I brought my particular bag of tricks to a
streaming video server company in the mid 90's. I produced an internal product
that was 100x faster - that is, required 100x lower cost hardware than the
company's existing product (133mhz Pentium instead of high end sun machines).
The product got buried - because the sales guys couldn't make their commission
checks.

I'm tired of that mentality.

Aerospike has been running in production at seriously high loads for years. I
work with a lot of guys who say - "What else am I going to use?" For the use
case where you want KVS, with decent API support (redis-like lists and UDFs),
and a little analytics, and scale-out adding nodes under production load, it's
the right choice.

If you're thinking of a Mongo KVS, Cassandra, Redis, you really need to look
at Aerospike. Do yourself, and your startup, a favor.

( And, yes, the name is based on the Aerospike engine, but we were thinking
more of the Trident II D5, which uses an Aerospike at the front, to
essentially extend the aerodynamic length of the missile. The problem with
sub-based missiles is they have to be short to fit in the sub, and a use of
the aerospike was one of many techniques for making the US based deterrent
accurate. We used the name Aerospike because there are a lot of small
techniques that make an "unbelievable" difference - that's what engineering
is, compared to theory. )

PM me directly if you're having trouble running benchmarks or anything.

~~~
regularfry
> Don't use mmap, which is 4x slower than read and write.

For what sorts of access patterns?

~~~
bbulkow
I believe my numbers were 4k random access. Admittedly, this was an ages-ago
kernel (2.6.18 derivative). mmap as a pattern has issues with concurrency,
because you burn threads, and you can't cancel them. The only way out is to
try to predict when an mmap page access will block and thread it differently,
but then your code path gets longer.

There might be single-thread single-core patterns where mmap works best, or if
newer kernels have changed. The reality: you have an IO, you put this "action"
aside, you need to be woken up when complete, do you want to burn a thread or
an IO context?

We also have recent numbers about using Linux's epoll / eventfd / signal
mechanism, like Nginx seems to use, and it's so deeply inferior to doing Linux
AIO that it's hard to choose that path, as seductive as single-event-loop is.
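
For anyone who wants to poke at the 4k random access pattern discussed here, the two read paths look roughly like this in Python. It is only a sketch of the access pattern, assuming a POSIX system (`os.pread`) and a small scratch file; the helper names are invented, and it does not reproduce the 4x figure or the Linux AIO comparison.

```python
import mmap
import os
import random
import tempfile

BLOCK = 4096  # 4 KiB, the access size mentioned above


def make_file(path, blocks):
    """Create a file of `blocks` 4 KiB blocks, each filled with its index byte."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(bytes([i % 256]) * BLOCK)


def read_block_pread(fd, index):
    """Explicit read: one system call copies the block into a new buffer."""
    return os.pread(fd, BLOCK, index * BLOCK)


def read_block_mmap(mapping, index):
    """Mapped read: slicing may page-fault, blocking this thread on IO."""
    start = index * BLOCK
    return mapping[start:start + BLOCK]


def random_reads(path, blocks, count, seed=0):
    """Read `count` random blocks both ways and return the (pread, mmap) pairs."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapping:
            results = []
            for _ in range(count):
                i = rng.randrange(blocks)
                results.append((read_block_pread(fd, i),
                                read_block_mmap(mapping, i)))
            return results
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        make_file(path, 64)
        pairs = random_reads(path, 64, 1000)
        print(all(a == b for a, b in pairs))
```

Wrapping each helper in a timer, on a file much larger than RAM, would be the starting point for checking the claim on a current kernel.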

------
perlgeek
Now I'm waiting for aphyr to test aerospike under cluster partition :-)

~~~
remon
Indeed.

------
alexnewman
OK I can't find any tests. It also disturbs me when people "open source"
projects without any real revision history.

Also is it true that [https://github.com/aerospike/aerospike-
server](https://github.com/aerospike/aerospike-server) hasn't been updated?

~~~
ryanobjc
A system without comprehensive tests is one that cannot be changed.

The big question is, do they have tests or not?

~~~
alexnewman
AFAIK basically not. Check my above comment. Those they have seem to be
cryptic and few and far between.

------
TheCondor
It's an impressive cache. Last I looked at it they were using lua and it
looked like they were going squarely after mongo

~~~
ZenoArrow
The core code is written in C, but they're definitely using Lua in places, I
believe they've incorporated some code from AlchemyDB...
[https://code.google.com/p/alchemydatabase/](https://code.google.com/p/alchemydatabase/)

From what I've read it could easily surpass Mongo, just look at the cost
savings...
[http://www.datanami.com/2013/09/06/aerospike_says_secret_to_...](http://www.datanami.com/2013/09/06/aerospike_says_secret_to_nosql_speed_is_simplicity/)
"The second comparison (a video ad serving platform) had much bigger
requirements, including a 5TB database processing 500,000 TPS. The hybrid SSD-
DRAM setup running the AeroSpike database was able to handle the load with
just 14 servers, at a total cost of $322,000, compared to 186 servers using
NoSQL running on clusters of servers that use a lot of DRAM and cost $5.6
million."
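
The arithmetic behind that quote is easy to check. The numbers below are taken straight from the excerpt (they are vendor-supplied figures reported by Datanami, not independent measurements); the variable and function names are just for illustration.

```python
# Figures quoted in the Datanami excerpt above.
TPS = 500_000
aerospike = {"servers": 14, "cost": 322_000}
dram_nosql = {"servers": 186, "cost": 5_600_000}


def per_server(cluster, tps=TPS):
    """Return (TPS per server, cost per server) for a cluster."""
    return tps / cluster["servers"], cluster["cost"] / cluster["servers"]


server_ratio = dram_nosql["servers"] / aerospike["servers"]  # about 13.3x
cost_ratio = dram_nosql["cost"] / aerospike["cost"]          # about 17.4x

if __name__ == "__main__":
    print(per_server(aerospike))
    print(per_server(dram_nosql))
    print(round(server_ratio, 1), round(cost_ratio, 1))
```

So the quoted setup works out to roughly 13 times fewer servers and roughly 17 times lower cost, assuming both clusters really carried the same 500,000 TPS load.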

They're ACID compliant as well, which Mongo is not (AFAIK)...
[https://www.youtube.com/watch?v=nnxj77NNEeg](https://www.youtube.com/watch?v=nnxj77NNEeg)

~~~
nlavezzo
They don't have ACID transactions. We had to put up a page to dispel some of
the misuse of the term ACID. Aerospike's excerpt:

"Aerospike does not provide true ACID transactions. Just like Cassandra 2.0,
Aerospike only provides compare and set, and misleadingly labels it as ACID."

[https://foundationdb.com/acid-claims](https://foundationdb.com/acid-claims)

~~~
ZenoArrow
That's interesting, thank you. In practical terms, what does ACID provide that
compare and set does not?

~~~
nlavezzo
The page linked goes into more detail, but among other things, Compare and Set
operations are only concerning one data element at a time - meaning you cannot
do Atomic updates to multiple keys. The ability to do multiple key updates
atomically makes it possible to build higher level abstractions by reliably
combining data from multiple keys under concurrent workloads.

Basically, transactions that can span an arbitrary number and set of keys are
what makes it possible to build rich data models from simple ones. SQL
databases are a perfect example of this - most use a simple transactional data
store on the bottom to store complex relational data structures. A single SQL
operation may require many key-level updates - but this is OK if you can wrap
them all in ACID transactions. Without ACID transactions you can't guarantee
data consistency because keys will be getting updated at different times,
allowing for a mix of old and new values.
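
A toy example of that "mix of old and new values", as a sketch only: the `CasStore` class, its `transact` method and the `transfer_with_cas` helper are hypothetical and are not the API of Aerospike, Cassandra or FoundationDB. It assumes an in-memory store where each key carries a version number, and it simulates the concurrent reader with a callback between the two writes.

```python
import threading


class CasStore:
    """Toy key-value store: per-key versions, compare-and-set on one key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        """Return (version, value); unknown keys read as (0, None)."""
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write only if the key's version is unchanged. One key at a time."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False
            self._data[key] = (version + 1, value)
            return True

    def transact(self, updates):
        """Hypothetical multi-key transaction: all updates appear together."""
        with self._lock:
            for key, value in updates.items():
                version, _ = self._data.get(key, (0, None))
                self._data[key] = (version + 1, value)


def total(store, keys):
    """Sum the current values of `keys`."""
    return sum(store.get(k)[1] for k in keys)


def transfer_with_cas(store, src, dst, amount, observe):
    """Move `amount` using two independent single-key CAS writes.

    `observe` is called between the two writes and stands in for a
    concurrent reader.
    """
    version, balance = store.get(src)
    if not store.compare_and_set(src, version, balance - amount):
        raise RuntimeError("source changed concurrently")
    observe()  # a reader here sees the debit but not yet the credit
    version, balance = store.get(dst)
    if not store.compare_and_set(dst, version, balance + amount):
        raise RuntimeError("destination changed concurrently")


if __name__ == "__main__":
    store = CasStore()
    store.transact({"a": 100, "b": 0})
    seen = []
    transfer_with_cas(store, "a", "b", 30,
                      lambda: seen.append(total(store, ["a", "b"])))
    print(seen, total(store, ["a", "b"]))
```

Each compare-and-set is atomic on its own key, yet the reader in the middle sees a total that never should have existed. With `transact`, both keys change under one lock, so no reader can land between them.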

It's a shame to see vendors trying to change the meaning of ACID to fit the
limitations of their databases. It means more confusion and bad decisions in a
market that needs clarity and honesty for people to make the right decisions
for their applications.

~~~
baotiao
Yes, I really hate this. When I first saw that Aerospike supports ACID
transactions, I thought, wow, that is really amazing. After I read the paper
and saw that "ACID" means single-key transactions, I was disappointed. I don't
even want to look at their code again.

------
meritt
I don't think this press release has nearly enough buzzwords.

~~~
jaseemabid
Yep. Everything about Aerospike so far has been filled with buzzwords. There
was a flash talk by an employee of theirs recently at a conference in
Bangalore and it was just marketing BS. I'm still skeptical.

~~~
arrryarr
I think the independently written article from AppLovin (link above) points to
it actually working as promised.

------
alexnewman
Common at least has some tests:

    posix4es-MacBook-Air-3:aerospike-common posix4e$ cloc src/main/ - 6603
    posix4es-MacBook-Air-3:aerospike-common posix4e$ cloc src/test/ - 1247

I worry about test coverage stats like that

Not to mention if you look at the tests

    /**************************************************************************
     * TEST CASES
     **************************************************************************/

    TEST( msgpack_roundtrip_integer1, "roundtrip: 123" ) {
        as_integer i1;
        as_integer_init(&i1, 123);

        as_integer i2;
        as_integer_init(&i2, 456);

        as_val * v2 = roundtrip((as_val *) &i1);

        assert_val_eq(v2, &i1);

        as_integer_destroy(&i1);
        as_val_destroy(v2);
    }

Not exactly terse and readable

------
shabinesh
Aerospike guys were at a big data workshop at Bangalore last week. Its
performance is pretty impressive. They claimed >1M TPS with just 3 nodes
and compared it with Couchdb, which is claimed to achieve 1M TPS with 330 nodes.
But their benchmarking method was unclear.

~~~
riyer
I believe that was cassandra... latest from netflix on that
[http://techblog.netflix.com/2014/07/revisiting-1-million-
wri...](http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-
second.html)

------
qmaxquique
You can try Aerospike in a Terminal.com container. I've created a snapshot
with a simple CRUD example at
[https://terminal.com/tiny/yEITIwNyLT](https://terminal.com/tiny/yEITIwNyLT)

------
rolfvandekrol
Hmm, if the community edition server is AGPL licensed, are they even allowed
to have an enterprise edition that is not AGPL licensed? I suppose the
enterprise edition is an altered version of the community edition, so can they
be 'forced' to publish their changes?

~~~
infinite8s
This seems to be a common misconception about the interplay of copyright and
GPL style licenses. Since they own the copyright, they can relicence it
however they choose. The GPL and other open source licenses just give non-
copyright holders additional rights beyond what copyright law provides (which
for most works is nothing beyond a bit of fair-use). In that way the GPL is a
clever hack of copyright, since it relies on the default of no-rights granted
by copyright to enforce its terms.

~~~
rolfvandekrol
I get that part, but how does that work when other people also contribute
changes for which they own the copyright and which are also AGPL licensed, but
then to the Aerospike devs?

~~~
nl
Usually the way it works is the company requires copyright assignment to them.

~~~
coldpie
Right. The owner could also refuse outside contributions, or perhaps there
simply haven't been any outside contributions.

~~~
infinite8s
Well, since the project was just released as opensource, presumably all work
on it so far has been by employees of Aerospike (who automatically grant
copyright to their employer by virtue of the work-made-for-hire exception).

------
weitzj
If the server uses the AGPL instead of the GPL, why does it matter that the
clients are under an Apache License? I thought if you use the AGPL you have to
contribute back the client code as well, when you use the server remotely.

~~~
DoubleMalt
That is not true.

If you expose a service based on an AGPL licensed service, you have to make the
source code available to the services that use it.

For example you could modify WordPress (which is GPL licensed), put it on your
server and let it serve pages without providing your modified source code to
anyone.

If WordPress was AGPL licensed you would have to provide your modified source
code to anyone using the system.

This also affects services that use libraries that are AGPL licensed (like
newer versions of iText), but not services that use other services.

The point is AGPL only adds that if you consume it over the network, you have
the right to the source code. If you use it as a network service, for your
webapp, your webapp is the consumer.

MongoDB has the same licensing model, and nobody sued Foursquare for the
source code, so I guess this is legally tested ;)

------
ZenoArrow
To give some idea of speed...[http://www.aerospike.com/blog/aerospike-doubles-
in-memory-no...](http://www.aerospike.com/blog/aerospike-doubles-in-memory-
nosql-database-performance/)

It's got some impressive responses from people in industry too, check out this
post about its use at eBay... [http://www.aerospike.com/blog/ebay-helps-
retailers-know-your...](http://www.aerospike.com/blog/ebay-helps-retailers-
know-your-holiday-shopping-list/)

~~~
ris
"To give some idea of speed"

Purported speed. Please, we've all seen enough wildly hyped NoSQL databases
now to remain a little cynical, haven't we?

~~~
ZenoArrow
Here's a benchmark from 2013 from Thumbtack Technology, comparing Aerospike,
Cassandra, MongoDB and Couchbase... [http://www.odbms.org/2013/01/ultra-high-
performance-nosql-be...](http://www.odbms.org/2013/01/ultra-high-performance-
nosql-benchmarking-analyzing-durability-and-performance-tradeoffs/)
[http://www.odbms.org/wp-
content/uploads/2013/11/NoSQLBenchma...](http://www.odbms.org/wp-
content/uploads/2013/11/NoSQLBenchmarking.pdf)

In this benchmark, Couchbase gets some impressive results, but it does appear
that Aerospike is the overall winner when it comes to speed and reliability.
Anyway, the code is free to install, it's easy enough to validate the speed
claims... [http://www.aerospike.com/blog/aerospike-doubles-in-memory-
no...](http://www.aerospike.com/blog/aerospike-doubles-in-memory-nosql-
database-performance/)

~~~
ris
In the former study, if you read it carefully, Aerospike were essentially able
to choose the hardware for the test.

"Anyway, the code is free to install, it's easy enough to validate the speed
claims"

So why don't you do so and come back with your own results that can at least
pretend to be neutral instead of spreading empty hype around here?

~~~
hackalyst
/me aerospike employee here - we did a live demo of running aerospike server
on AWS EC2 during the fifth elephant last weekend at bangalore. The demo had
1M TPS and latency for 80/20 load (80% read, 20% write) was <1ms for >99.8% of
queries.

This demo was done on 4 r3.4xlarge nodes - We did earlier runs on r3.2xl as
well with similar results.

[https://twitter.com/anshprat/status/492971667493122048](https://twitter.com/anshprat/status/492971667493122048)

I didn't do a latency screenshot grab but those who saw the demo can comment.

------
blitzprog
How does this perform in comparison to Riak (cluster, v2.0)?

------
simi_
> Aerospike’s mission is to rain bullshit on the entire field of databases by
> offering an addictive proposition: a database literally ten times faster
> than existing NoSQL solutions, and one hundred times faster than existing
> SQL solutions.

Gotta love Disrupt to Bullshit:
[https://chrome.google.com/webstore/detail/disrupt-to-
bullshi...](https://chrome.google.com/webstore/detail/disrupt-to-
bullshit/mahaemfhlcjficbbkbpmkbhhenfnikcf)

