
HBase | An open-source, distributed, column-oriented store modeled after Big Table - iamelgringo
http://hadoop.apache.org/hbase/#Getting+Started
======
jdavid
here are some notes on getting your own single-computer Hadoop cluster running on EC2
(update: these notes are rough)

Getting hadoop and hbase running in single server mode

I am going to track my progress and take some notes on how a newcomer to the Linux
world would go about setting up a Hadoop node running HBase on EC2. I have
chosen Ubuntu as my core OS as it offers all the perks of Linux and is, for
me, more usable than, say, Red Hat, SUSE, or Mandrake; I have tried them all,
and between the tools and the community, I feel that Ubuntu is the best Linux
OS out there (I am so excited about Canonical's decision to hire UI designers
for Linux, go-go gadget Ubuntu!).

2:16 I just got one of Eric Hammond's Ubuntu images booted on Amazon. I used
image ami-179e7a7e, which is a base install of Ubuntu 8.04 LTS, and I used
the Firefox 3 extension ElasticFox to boot the AMI.

I am not going to explain how to get an ami booted here, but there are other
documents like these that can get you started with your first AMI boot.

2:28 Looking over the wiki.apache.org/hadoop website for setup guides; there
seem to be a few quick-start ones and maybe a few that are just for Ubuntu.

2:30 I am going to work off of the following tutorial and make changes along
the way... I hope it works:
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

The tutorial targets Java 1.5, Ubuntu 7.10, and Hadoop 0.14.2, and is dated
October 2007. I am going to target Ubuntu 8.04 LTS, Java 1.6, and Hadoop 0.18.

2:32 apt-get install sun-java6-jdk

- this will take about 2-5 minutes to download and install the Sun JDK

2:55 After looking at a few text editors to replace vi or vim, I decided to
just grab nano, jed, and gedit, as I remember one of those being more
user-friendly than vi.

apt-get install nano jed gedit

- 2-5 minutes to install

3:07 Used jed to edit /etc/jvm for Java 6 by adding this line

/usr/lib/jvm/java-6-sun

to the top of the list
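For reference, the same edit can be scripted. This is a sketch demonstrated against a scratch copy rather than the real /etc/jvm (the stand-in contents are made up):

```shell
# Prepend the Java 6 path so it sits first in the list -- shown on a
# scratch copy of /etc/jvm (the existing line below is a made-up stand-in)
scratch=/tmp/jvm-copy
printf '/usr/lib/jvm/java-1.5.0-sun\n' > "$scratch"
sed -i '1i /usr/lib/jvm/java-6-sun' "$scratch"
head -1 "$scratch"    # -> /usr/lib/jvm/java-6-sun
```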

3:10 Was able to follow the tutorial's directions

3:20 Filed a minor bug for the image, it reports as 7.04 when using ssh.

3:23 Going to skip disabling IPv6, because it requires a reboot, and on EC2
that is a pain in the ass: I would need to set up and back up my image first.
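If skipping that step causes trouble later, there is a commonly cited reboot-free alternative: telling the Hadoop JVMs to prefer IPv4 instead of disabling IPv6 system-wide. A sketch of the line to add to conf/hadoop-env.sh:

```shell
# Reboot-free alternative to disabling IPv6 system-wide: have the Hadoop
# JVMs prefer the IPv4 stack (goes in conf/hadoop-env.sh)
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
```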

3:51 Make sure you are back at the root login and use wget to download the
file from one of the Hadoop mirrors; you will have to find one that still has
the 0.18 tarball if you are following along.

The Hadoop package was only 16MB, so it downloaded really fast on my EC2
server; I downloaded it to /tmp.

4:00 After downloading, I had to extract the files to a directory. Since the
tutorial extracted the tarball to /usr/local/hadoop, I did the same from the
/tmp dir, but first I had to make the directory:

mkdir /usr/local/hadoop

tar xzvf hadoop-0.18.0.tar.gz -C /usr/local/hadoop
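One thing worth noting about that tar invocation: -C drops the tarball's own top-level directory inside /usr/local/hadoop, so the install root ends up at /usr/local/hadoop/hadoop-0.18.0 (keep that in mind wherever the tutorial refers to <HADOOP_INSTALL>). A quick self-contained demo of the layout, using a throwaway tarball:

```shell
# Demonstrate the layout tar -C produces, with a throwaway tarball that
# mimics hadoop-0.18.0.tar.gz
mkdir -p /tmp/stage/hadoop-0.18.0
touch /tmp/stage/hadoop-0.18.0/build.xml
tar czf /tmp/hadoop-demo.tar.gz -C /tmp/stage hadoop-0.18.0
mkdir -p /tmp/extract-here
tar xzf /tmp/hadoop-demo.tar.gz -C /tmp/extract-here
ls /tmp/extract-here    # -> hadoop-0.18.0
```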

4:10 In that directory I saw a build.xml, so for good measure I decided to
install Ant:

apt-get install ant

4:11 Break

5:50 Break over and time to get more coffee. Met a few people from geeksugar.

6:50 I found out I was working on the wrong server, and now I am back to where
I left off; plus, the hadoop user in the hadoop group now owns the hadoop dir,
as the tutorial suggests.

7:02 Editing the conf/hadoop-env.sh file for Java 6 with jed:

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

gets changed to

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

7:15 OK, at this point in the tutorial there are some major edits that need to
be typed, and I couldn't stand not being able to cut and paste in a terminal
window, so I decided to configure WinSCP so I could connect with a GUI and a
Windows text editor.

I am also going to use the /mnt dir as the base for my datastore, as the one
in the tutorial is on the root partition, which is limited to 10GB on Eric's
setup; /mnt is set to grow.
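The tutorial steers the datastore location through the hadoop.tmp.dir property, so pointing it at /mnt would look roughly like this in conf/hadoop-site.xml (the /mnt/hadoop-datastore path is my assumption, not from the tutorial):

```xml
<!-- conf/hadoop-site.xml (sketch): base the datastore on /mnt instead of
     the tutorial's root-partition path -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop-datastore/hadoop-${user.name}</value>
</property>
```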

In the future, it might make sense to set up an EBS (elastic block storage)
for more persistent data access.

7:45 You have to run this command as the hadoop user if you are following this
tutorial, or else the directory will be owned by root:

hadoop@ubuntu:~$ <HADOOP_INSTALL>/hadoop/bin/hadoop namenode -format

7:51 Make sure that all of the dirs in your Hadoop install directory are
writable by the hadoop user. It will try to write to the logs dir during
startup.
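A quick way to check, sketched here against a scratch dir standing in for <HADOOP_INSTALL>/logs (substitute your real install path):

```shell
# Verify a directory is writable before starting the daemons; the scratch
# dir stands in for the real logs dir
logs=/tmp/fake-hadoop-logs
mkdir -p "$logs"
if [ -w "$logs" ]; then echo "logs dir writable"; else echo "fix ownership first"; fi
```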

------
mlok
Anyone interested in open-source BigTable clones may also want to check out
BigData and Hypertable:

* BigData (on SourceForge) <http://sourceforge.net/projects/bigdata/>

* Hypertable <http://www.hypertable.org/>

[http://www.skrenta.com/2008/01/open_source_bigtable_clone_hy...](http://www.skrenta.com/2008/01/open_source_bigtable_clone_hyp.html)

------
incomethax
How is this different from Hypertable, other than being written in java rather
than c++?

<http://www.hypertable.org>

~~~
SwellJoe
How is Django different from Ruby on Rails, other than being written in Python
rather than Ruby? (Also known as: how is everything on the Internet different
from everything else, other than being different?)

------
jdavid
We have been looking extensively into HBase on Hadoop; it's a fascinating
technology.

I would love to know how it runs as a basic DB for simple queries.

From what we have seen, HBase and Hadoop are about 10x slower than their
Google counterparts. Other tests seem to indicate the request time hovers
between 50 and 250ms. I am in SF, and I would like to hack on Hadoop and HBase
with anyone in the area to run some tests.

------
rgrieselhuber
I just looked at the PoweredBy section in the wiki and I didn't see any
entries. I'd love to find out who is using this in production. The docs are
rather thin at this point.

~~~
jrockway
Hadoop is a Yahoo project, so it's likely that they are using something like
this.

<http://developer.yahoo.com/blogs/hadoop/>

~~~
jwilliams
Yeah, but using Hadoop doesn't necessarily imply using HBase - HBase is built
using Hadoop core.

Btw, the HBase Wiki is here: <http://wiki.apache.org/hadoop/Hbase>

------
yawl
Powerset (a search engine acquired by MS) is behind HBase development.

I have been using hbase in my project -- an interesting experience.

------
pmorici
Are there any distributed databases that can handle massively large joins?

