
Apache Geode: Distributed, in-memory database - espeed
http://geode.incubator.apache.org/
======
sandstrom
Didn't understand much from the linked page, but I found this website (from
Pivotal, the commercial entity behind Geode) quite informative. Perhaps it's
useful to others.

[https://pivotal.io/big-data/pivotal-gemfire](https://pivotal.io/big-data/pivotal-gemfire)

===

I found this interesting deployment:

    
    
        China National Railways use Geode to run railway ticketing for the entire 
        country with a 10 node cluster, managing 2 TB of "hot data" in memory, 
        and 10 backup nodes for high availability and elastic scale.
        
        Holiday travel periods [Chinese New Year's] create peaks of 15,000 tickets 
        sold per minute, 1.4 billion page views per day and 40,000 visits per second.
    

[http://pivotal.io/big-data/case-study/scaling-online-sales-f...](http://pivotal.io/big-data/case-study/scaling-online-sales-for-the-largest-railway-in-the-world-china-railway-corporation)
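
To put the quoted peak figures on a common per-second scale, here is a quick back-of-the-envelope conversion. It only re-expresses the numbers from the quote; the per-second page view figure is a daily average, not a measured peak.

```python
# Convert the quoted case-study figures to per-second rates.
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60

tickets_per_minute = 15_000          # quoted peak ticket sales
page_views_per_day = 1_400_000_000   # quoted daily page views

tickets_per_second = tickets_per_minute / SECONDS_PER_MINUTE
avg_page_views_per_second = page_views_per_day / SECONDS_PER_DAY

print(f"{tickets_per_second:.0f} tickets sold per second")
print(f"{avg_page_views_per_second:,.0f} page views per second (daily average)")
```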

~~~
trhway
To understand where it comes from historically and
ideologically/architecturally, the keyword here is Smalltalk:
[https://en.wikipedia.org/wiki/Gemstone_%28database%29](https://en.wikipedia.org/wiki/Gemstone_%28database%29)

The guys have been sitting for 3 decades on the forested Oregon river bank
(the place is called Beavertown for a reason :) thinking straight and clear in
Smalltalk... :)

~~~
GregChase
Geode and its predecessor GemFire are pure Java.

The original engineering expertise came from the team that built the Gemstone
OODB.

Beaverton isn't so forested anymore, but lots of indigenous object oriented
expertise for sure.

~~~
trhway
>Geode and its predecessor GemFire are pure Java.

I know (I was talking at some point about joining, already here in SV), and
this is why I wrote only "3 decades": I don't know the precise dates of GemFire
and only suppose it began in the early 2000s. My simple arithmetic mistake
though: 2002 - 1982 is 2 decades. Sorry, I see where the confusion comes from :)

------
tuyguntn
The landing page and documentation should be reconsidered. Lots of unanswered
questions:

\- _distributed, in-memory database_ \- How does it compare to SAP HANA? Or
Redis? If it is a "... database", where is the query language? If you have your
own query language other than SQL, please show us.

\- _Performance is key_ \- tell us about some benchmarks

\- _Consistency is a must_ \- CA or CP in CAP theorem?

\- Can I fully drop RDBMS in favor of Geode?

\- The web site is built as a landing page/PR for the product, then all of a
sudden starts Community/Contributors/Getting started columns, with a loooooong
list of mailing lists, smaller content for contributors, and the very important
part (getting started) presented as a tiny link :(. Please put some useful info
about the product there; you already have Community/Contribute menus.

~~~
finnh
> \- Consistency is a must - CA or CP in CAP theorem?

There is no CA. So, they must mean CP.

Remember, kids: if they promise CA, run away.

~~~
fweespeech
CA is possible if it's read-only, essentially, with all updates synchronized
when a partition is not present [which is the majority of the time in the real
world].

~~~
eloff
It's semantics, but I'd argue that if you're read-only then you're changing the
definition of available.

~~~
GregChase
Geode is write-intensive, and generally optimizes for consistency over
availability. That being said - there's a lot built into Geode to ensure
availability as well.

------
Brainix
Feedback on the documentation:

It's not immediately obvious to me how Geode is different from Redis. When
would I want to use Geode over Redis, and vice versa?

~~~
GregChase
Redis and in-memory data grids are pretty different animals. I would
characterize IMDGs like Geode as concurrent-write intensive, with flexible
data models. Geode also scales out better than Redis, in a more automated
fashion.

Redis is a great read-intensive cache. It also has a powerful data model, but
you have to use their data models. Example: If you want to run calculations on
lists or sets, they have powerful operations you can call.

IMDGs such as Geode were built with the rise of automated trading in the
finance industry.

------
espeed
Two Videos:

1\. "Open Sourced GemFire In-Memory Distributed Database and Apache
Contributors" ([https://www.youtube.com/playlist?list=PL62pIycqXx-TTMXsq09BE...](https://www.youtube.com/playlist?list=PL62pIycqXx-TTMXsq09BEGfacLjeHZM28))

2\. "Creating a Highly Scalable Stock Prediction System with R, Geode & Spring
XD" ([https://www.youtube.com/playlist?list=PL62pIycqXx-Rzd_HcjU7Y...](https://www.youtube.com/playlist?list=PL62pIycqXx-Rzd_HcjU7YXq1kIOa28M8X))

------
orless
I'm starting to get lost in the new Apache animals.

How does Apache Geode compare to Apache Ignite (advertised as an "in-memory
data fabric")?

~~~
GregChase
Apache Geode and Apache Ignite are more similar than they are different.

Apache Ignite, based off the commercial distribution GridGain, is newer to
market.

Apache Geode, based off the commercial distribution GemFire, has a long
history in the market.

~~~
orless
I wonder why Apache would need to have two "more similar than different"
products.

~~~
wmf
Apache accepts anything that companies donate. (They say they don't, but it's
hard to find anything they've rejected.)

~~~
rectang
That's not quite the right emphasis.

Apache is happy to provide a home for any community that is willing to adhere
to our governance rules and traditions. Competing projects are OK.

Projects are almost never rejected because preparing a proposal for incubation
is rigorous and many projects who would be a poor fit self-select out.

Source: former VP Apache Incubator, who has both helped prepare successful
proposals and privately counseled projects who decided not to come to Apache.

------
lobster_johnson
Can anyone comment on Geode's non-Java support?

I'm asking because a lot of Java "big data" stuff tends to prioritize Java
clients (ZooKeeper, Kafka, Hadoop HDFS, Storm, VoltDB and HBase come to mind),
and while there are sometimes clients in other languages, they tend to be
second-class citizens that take years to reach feature/performance parity with
the Java stuff.

For example, last I checked there still wasn't a mature, feature-complete
Kafka client (consumer and producer with built in offset management) for Go.

~~~
mclarenfan
GemFire (on which Geode is based) has fully featured C, C++ and C# clients
with feature parity with the Java client. I don't know if Pivotal is going to
open source these clients too.

There is, however, a REST API and a Python client:
[https://github.com/gemfire/py-gemfire-rest](https://github.com/gemfire/py-gemfire-rest)

------
wiradikusuma
Sorry for the stupid question: how do in-memory DBs deal with power failures?
E.g. someone walks into the server room and trips over a power cable.

I can understand for read-only in-memory, but what about writes?

~~~
markito
In the case of Geode, you can make a "Region" (think table) persistent on disk
and use a shared-nothing architecture [1] to avoid SPOFs.

What's also interesting is that we offer a very efficient way to recover data
from disk as well[2] in the case of a crash of a single node or the entire
cluster.

[1]
[https://en.wikipedia.org/wiki/Shared_nothing_architecture](https://en.wikipedia.org/wiki/Shared_nothing_architecture)
[2] [http://gemfire.docs.pivotal.io/docs-gemfire/latest/managing/...](http://gemfire.docs.pivotal.io/docs-gemfire/latest/managing/disk_storage/how_startup_works_in_system_with_disk_stores.html)
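
To illustrate the general idea behind persisting an in-memory store (not Geode's actual disk store implementation), here is a minimal toy sketch: every write is appended to a log file and synced to disk before the in-memory map is updated, and the log is replayed on startup. The class name `DurableMap` and the JSON-lines log format are assumptions made up for this example; it ignores torn writes, log compaction, and replication across nodes.

```python
import json
import os


class DurableMap:
    """Toy in-memory map backed by an append-only log (hypothetical example)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()  # rebuild in-memory state from disk, if a log exists
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # Write to disk first, so an acknowledged write survives a power cut.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def get(self, key):
        # Reads are served purely from memory.
        return self.data.get(key)

    def close(self):
        self._log.close()
```

After a crash, constructing a new `DurableMap` on the same path replays the log and restores every write that was acknowledged before the failure.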

------
jacques_chester
FWIW, Pivotal is hiring in our Big Data team, largely based in Palo Alto.
Geode (incubating), HAWQ (incubating), Greenplum, Pivotal HD, MADlib etc are
all mostly developed with engineering effort that we donate.

Hit me up with an email (jchester@pivotal.io) or visit pivotal.io/careers if
you're interested.

------
LoSboccacc
> Gemcached > Geode servers can be configured to talk memcached protocol.

Hey, this is very interesting: it could work as a persistent ACID memcached
drop-in replacement!
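
For readers unfamiliar with what "talking memcached protocol" means on the wire, here is a small sketch of how the classic memcached text protocol frames `set` and `get` requests. The helper names are made up for this example, and it assumes the server accepts the text (ASCII) variant of the protocol; the resulting bytes would be written to a TCP socket connected to whatever host and port the server listens on.

```python
def build_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a memcached text-protocol 'set' request.

    Layout: set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get_command(key: str) -> bytes:
    """Frame a memcached text-protocol 'get' request: get <key>\\r\\n"""
    return f"get {key}\r\n".encode("ascii")


print(build_set_command("user:1", b"alice"))
print(build_get_command("user:1"))
```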

~~~
hyc_symas
There's already memcacheDB for that
[https://github.com/LMDB/memcachedb](https://github.com/LMDB/memcachedb)

~~~
LoSboccacc
That one doesn't seem distributed. But I just glanced at the readme.

~~~
hyc_symas
Distribution is usually handled by the memcache client libraries.
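
As a rough illustration of client-side distribution, here is a minimal sketch in which the client hashes each key to pick one server from a fixed list. The function name and server addresses are hypothetical; real client libraries often use consistent hashing instead, so that adding or removing a server remaps fewer keys than the simple modulo shown here.

```python
import hashlib
from typing import Sequence


def pick_server(key: str, servers: Sequence[str]) -> str:
    """Map a key to one server by hashing the key (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Hypothetical server list; every client must use the same list in the same order.
servers = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]
for key in ("user:1", "user:2", "session:abc"):
    print(key, "->", pick_server(key, servers))
```

The servers themselves know nothing about each other; each one simply stores the keys that clients happen to route to it.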

------
rdtsc
Source here:

[https://github.com/apache/incubator-geode](https://github.com/apache/incubator-geode)

------
dberg
I believe this competes with another well known open-source In-Memory Data
Grid, Hazelcast. Worth checking out.

~~~
threeseed
Last time I checked Hazelcast couldn't be run standalone. Also I wouldn't use
it as a database for anything more than a few MB of data.

This is probably closer to Apache Ignite aka Gridgain.

~~~
dberg
Can you explain why? I remember reading a ton about the off-heap memory work
they were doing.

------
markito
Geode FAQ Wiki page
[https://cwiki.apache.org/confluence/display/GEODE/Technology...](https://cwiki.apache.org/confluence/display/GEODE/Technology+FAQ)

------
GregChase
If you are interested in joining the developer community around Apache Geode,
subscribe via: dev-subscribe@geode.incubator.apache.org

------
Thaxll
It looks similar to Hazelcast / Oracle Coherence?

~~~
wener
Yes, Geode, Coherence, Ignite and Hazelcast are all quite similar: grid
computation.

------
avodonosov
Would you choose it over Datomic?

~~~
espeed
Datomic's immutable storage and time-travel query capabilities are awesome,
and I often miss them in other DBs. But Datomic currently isn't designed for
write-intensive workloads. And while you can shard Datomic's transactor and
then combine multiple DBs in a query
([http://nosql.mypopescu.com/post/19310504456/thoughts-about-d...](http://nosql.mypopescu.com/post/19310504456/thoughts-about-datomic#comment-465580077)), that's only going to get you so far.

However, Apache Geode lets you add custom indexes so it might not be too hard
to add Clojure's persistent data structures as a custom index scheme and hook
in Apache Geode as a backend to Clojure Datalog:

Clojure Datalog:
[https://github.com/fogus/bacwn](https://github.com/fogus/bacwn)

DataScript:
[https://github.com/tonsky/datascript](https://github.com/tonsky/datascript)

Clojure's Persistent Data Structures for Java:
[https://github.com/grignaak/clj-ds](https://github.com/grignaak/clj-ds)

------
fiatjaf
Let me guess: it's written in Java?

~~~
fiatjaf
Yes.

Why are all Apache projects written in Java?

~~~
zidad
Apache isn't written in Java :)

[https://en.wikipedia.org/wiki/Apache_HTTP_Server](https://en.wikipedia.org/wiki/Apache_HTTP_Server)

~~~
rylee
That's Apache httpd. Apache is a software organization.

