Hacker News new | past | comments | ask | show | jobs | submit login
MariaDB 10 (mariadb.org)
221 points by conductor on Mar 31, 2014 | hide | past | web | favorite | 66 comments

Wow, really impressive set of features. Never really used MySQL a lot outside Epiware and KDE but it sure looks like things have started to shape up quite nice for MariaDB. Multi master replication, built-in sharding and ability to access Cassandra data inside MariaDB are some things that should make Enterprise users more comfortable using MariaDB instead of Oracle.

Using the CONNECT engine you can read (and I think write?) data to ODBC data sources. This is essentially a very cheap IBM Federated system...it's even written by one of their original devs.

Just checked the docs - looks like you can do select, insert and update using CONNECT ODBC. Others are supported too via EXECSRC. Yeah that's quite cool.

I was at a Meetup.com Mysql meeting and one of the talks was on the CONNECT engine. The demo showed CONNECT working with Sybase using all the SELECT, UPDATE and DELETE commands, although the EXPLAIN command didn't show any indexes being used. Still impressive.

As someone running the default mysql install for a SaaS app, are there any downsides to moving to MariaDB? I'm assuming they are still fully compatible with the native mysql protocol?

> As someone running the default mysql install for a SaaS app, are there any downsides to moving to MariaDB?

MariaDB is essentially a drop in replacement; its also faster than MySQL:



"For all practical purposes, MariaDB is a binary drop in replacement of the same MySQL version (for example MySQL 5.1 -> MariaDB 5.1, MariaDB 5.2 & MariaDB 5.3 are compatible. MySQL 5.5 will be compatible with MariaDB 5.5)."

"This means that for most cases, you can just uninstall MySQL and install MariaDB and you are good to go. (No need to convert any datafiles if you use same main version, like 5.1). You must however still run mysql_upgrade to finish the upgrade. This is needed to ensure that your mysql privilege and event tables are updated with the new fields MariaDB uses."

>> its also faster than MySQL

I have heard this is not true for high core count scale up type systems. Has anyone here done that kind of testing?

Is 10 still a drop-in replacement for MySQL 5.5?


"With this release, MariaDB 10.0 is now the current stable version of MariaDB. It is an evolution of the MariaDB 5.5 series with several entirely new features not found anywhere else and with backported and reimplemented features from MySQL 5.6."

I want to say yes, but as always, TEST FIRST.

You can literally just type 'apt-get install mariadb-server' into Ubuntu (once you've added the sources) and you won't even know MySQL is gone. You can even still type 'mysql -p' to log in to MariaDB locally.

Careful! At least for mariadb 5.5 setup on Ubuntu 12.04 LTS, you need to pin the mariadb versions of packages because Ubuntu breaks stuff with their updates.

  # cat /etc/apt/preferences
  Package: *
  Pin: origin repo.percona.com
  Pin-Priority: 1003
  Package: *
  Pin: origin ftp.osuosl.org
  Pin-Priority: 1002

  # tail /etc/apt/sources.list
  deb ftp://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu precise main
  deb-src ftp://ftp.osuosl.org/pub/mariadb/repo/5.5/ubuntu precise main

>'apt-get install mariadb-server'

This simple command replaces mysql in one shot? I might test it tonight if so. Do I have to run mysql_upgrade?

Got a good recommendation for a guide?

The MariaDB page has great instructions for various distros:


Just pick your distro and you're good to go. I didn't have to run anything else.

None that I've ever seen. MariaDB is a straight drop in replacement for MySQL.

I, however, am a postgres guy now. Our last MySQL project was retired a few months ago.

I can second this, it's a drop in replacement.

Postgres is still my preference for a RDBMS. Schemas, Triggers, JSON, Geo, PL, robust data types, it just offers so much more then MySQL/innoDB/MyISAM.

Absolutely, multiple schemas is a really neat feature in a SaaS app context.

You can give each tenant his own schema, that way you can even make some nice split-testing when you incorporate new features in your app requiring changes in you database. Once a fraction of your tenants have tested/approved (much better if they choose themself to take part of a beta) the new model, you are good for rolling out progressively a full implementation to all of your tenants.

I really like this idea. Have you found any persistence frameworks that support the use of schemas WITH connection pools? Also, running multiple versions of a domain model concurrently seems like it'd be a headache.

Well, you're right it's not just as easy as I wrote it. For the moment I'm just doing it the ugly way when testing 2 versions whith differences I maintain and detect directly in the application code logic. Not nice.

Yeah, it's really hard, but I think you're on to something that would be really useful. For connection pooling, if there were an easy way to set the schema search path prior to using a connection object in application code, something like that might work. I obviously don't like the extra query per request, but there are worse things! For the multi-version domain model, the only way I could think of doing it was partitioning the deployment environment according to version. So, 2n servers, n running each version, and slowly transitioning users from the old segment to the new. That sounds really hard, but not impossible. It also starts getting into some tough environment, human resource and priority questions, but for what it's worth, it'd keep the application logic cleaner! :)

Anyway, if you ever end up with something elegant on this, I would be truly interested in reading about it. It's a good example of how a DBA concept can help app dev teams, who, in alarming numbers, prioritize an RDBMS' ease of use over its power.

Well, let me explain the structure I choose :

Each tenant is assigned a subdomain (No vhost, all routed by the app to the same codebase) with the name he choosed when he subscribed. So this name is retrieved by inspecting the host header the browser send while connecting to the webapp, this name is checked in the public schema to find an appropriate uuid assigned (if it exists of course).

Something like :

  App_DB (database)
  -- public (public schema)
     -- tenants (tenants table)
        -- tenant_uuid uuid
        -- tenant_name text (same as subdomain name)
        -- coderev numeric (for split testing)
        -- some tenant general info like creation date, choosen plan, ...
     -- some other tables accessible by all tenants like shared stats, queues, etc ...
I then use this tenant_uuid as the schema name for this tenant :

  -- "b6e42fd1-d5b9-4de4-ba6b-6eca1dae06ff" (tenant_uuid schema name)
     -- users (it's a multi-user webapp per tenants, so ...)
        -- email (for auth)
        -- password (bcrypt for auth)
        -- etc ...
     -- other tables needed for the webapp
This for each tenant, so some tenants can have a different schema stucture based on their public.coderev

When the user login, his credentials are checked in his tenant schema, the tenant name and uuid are set in his session for not messing with other tenants data.

I think there are 2 ways to deal with connection pooling (If we are both talking about PgBouncer || pgpool connection pooling to be sure).

- The first one is to create a new PostgreSQL user for each tenant in order to access only his schema. Then there is no need to set the schema search path prior the connection object as by definition his search_path will be set to $user,public (http://www.postgresql.org/docs/9.1/static/ddl-schemas.html) But then, I can't really see the point of a connection pooling. I don't really like this solution, so many users, roles, passwords ... - The second (which I choose) is to connect with a role having access to all the schemas. You can set this one for connection pooling. The search_path is only set to public, and the queries use the qualified name to access the tenant data, ie: SELECT * FROM "b6e42fd1-d5b9-4de4-ba6b-6eca1dae06ff".users You can use the connections readily made available by the connection pool and query the data for the tenant you need without touching the search_path. The security is now dealt with the application and not the DB.

For the multi-version domain model, the coderev set in the public.tenants schema is retrieved with the tenant_uuid associated with the tenant name and then used by the webapp controller to route to the associated code version. No need for a second server, just a different branch on the same server. Transitioning a user then just really mean updating his schema and routing it through the new codebase. And that's why I said it's very ugly, because I still need to implement a good DDL and DML stategy in order to navigate between different versions without losing data.

Like you said, it's a tough environment. Unfortunately, I have nothing elegant to propose but I'm also interested in reading how others deal with that kind of stuff. I still lack fluency to write a blog post or something like that which could encourage debating or discussions. I'd be very glad if yourself find some interesting stuff on this topic to share it with me, my email address is my hn username @ gmail.com

As a long-time MySQL user who hasn't used Postgres for over a decade, can you share (in non-documentation language) what schemas are good for and how they're useful?

I'll take a typical context of a SaaS app, multi-tenant architecture. When it comes to store the data of your tenants, you have to decide between different approaches which will impact the level of isolation of each tenant's data. I will not discuss the benefits and disadvantages of each because it clearly depends of your needs, security guidelines and sometimes regulations regarding your business, backup strategy, geo-isolation, sharding, etc ...

With MySQL, you can choose 2 way to architecture your data : - the fully isolated way : each tenant has his own database. - the shared way : all tenants share the same schema, you usually use a tenant_id column on tables to query your data adequately.

Some other RDBMS like PostgreSQL offer you a third way between the fully isolated a shared data system : - all tenant access the same database but their data are stored on their own schema. It's up to you to decide if the tenants access their schema with the same database role (kind of user) or if you create a role for accessing solely their own schema.

Example :


  -- Public schema

     -- tenant_id uuid

     -- tenant_name (subdomain name for example)

     -- etc ...

  -- tenant (named after the public.tenant_id)

     -- your application tables

  -- tenant (named after the public.tenant_id)

     -- your application tables

  -- tenant (named after the public.tenant_id)

     -- Some beta tables model for version n+1 of your app or whatever, you can add more table columns for this very tenant if he needs special features for example.

I hope it's non-documentation language but still english language :) (Need to improve my english and my writing skills)

Thank you, that's really helpful :) Only last week I was sketching the database design (in MySQL) for a SaaS app, thinking how messy it was to add a tenant_id column and parameter onto every query...

One compelling use case I've heard for schemas is versioning.

Say you want to upgrade a stored function in your database. You need to make sure that both the new and the old version of the function are available until all application code that uses the stored procedure has been updated.

Without schemas, you'd have to add a version identifier to each function, like get_customer_5(), get_customer_6() etc.

With schemas, you can just put different versions of the functions into different schemas, and just change the search_path in your application code to use functions from the new schema. When all application code has been updated, you can drop the old schema.

Mysql has schemas too, they are just mislabeled as "databases". The fact that mysql doesn't offer multiple databases is a bit of a concern of course.

Plus IIRC you can't foreign key between them.

I was using it for a long time, and then after some update I was completely unable to get it to run on any new servers. Not sure what changed, but using the package manager route was no longer viable. Building from source also failed for reasons that are totally ungoogleable. Surely my incompetence plays a role, but I had to abandon it after every single workaround in their troubleshooting section failed to resolve the issue.

If you use replication with MySQL 5.6 and and GTID (Global Transaction ID) I heard some issues about not being fully compatible but I don't remember what it was exactly.

I'm researching migrating from Sybase to MySQL and this looks like an interesting alternative considering MariaDB is released under GNU GPL and is a fork of MySQL outside of Oracle control. Are there any weak points in MariaDB worth noting?

By all means read all the good and bad articles about MySQL/MariaDB. Just keep in mind that the presence of detrators of a technology is not a sign the technology should not be used. Just think of all those articles proclaiming Java is dead.

Weakest (or strongest, depending on your goals and tastes) are comparative newness and divergence from MySQL 5.6. This is a nice short summary from the Percona dude:


A fair portion this guy's posts seem to be criticizing people for using certain tools/techniques and how much various things suck.

Does that make him wrong?

As much as cherry-picking makes anyone wrong. Replication is one of the main reasons some big project choose MySQL and the guy ignores that point and only mentions replication in passing. I am sure you can have as good a selection of reasons to avoid PostgreSQL or any other. That does not really matter though. Some guy who never bothered to read MySQL documentation is now switching to PGSQL and will run it without bothering to read any documentation. It does not matter because his project is so trivial it would run on anything, but he is happy because he uses "the real database". So be it. And the guys who really do know what's going on and run systems with hundreds of millions of users don't waste their time writing bullshit articles like this. Guy recomends MongoDB as an alternative, what can I say.

I see this explanation a lot. However, from what I can gather:

- pgsql supports synchronous master-slave replication natively, with excellent performance

- mysql supposedly does multi-master replication, but does not solve the hard problems. You can't write on the same row on two masters simultaneously or replication stops; sequential ids are not synchronized between masters; you can't get serial transaction isolation between masters.

All in all, I'd rate replication support between both databases as nearly identical. Pgsql does master-slave better, mysql has a half-hearted attempt at multi master. Given that on all else pgsql is markedly better, I can't see replication as the reason for picking mysql.

What use is replication if the data itself has been mangled by MySQL? The significant arguments against MySQL revolve around it doing a poor job of enforcing data integrity. You go on to claim he hasn't bothered to read the documentation but give no evidence other than wild fanboy accusations that you have read it either. There are plenty of assertions in that blog post, none of which you refute; from which I can conclude you have no arguemnts against what he actually said.

replication in mysql? Well, kind of random, I remember @work people not getting it right for 3 years... because they forgot to read the documentation.

We may have too much clueless developer in the wild, imitating for security.

Like the more linux is spreading, the more the sect of the post devil spread, you recognize them with their dark incatation: chmod 777 (the number of the post devil) and chown root:root.

Lots of people build up massive amounts of technical debt by using the wrong things. Things like mysql literally hold back progress. "Stop using garbage" is an important message to spread.

Dismissing things as garbage when they clearly are not - that's just the other side of the same coin. It can equally cloud your vision for choosing the right tool for the job.

>Dismissing things as garbage when they clearly are not

That is not the case here. Did you read the site?

I read the site but I didn't drink the kool-aid!

Have you considered PostgreSQL? MariaDB would probably be fine and has the advantage of having a lot more tutorials on the internet (makes it a tiny bit easier to get started), but Postgres is the best open source database there is (in my opinion at least). I don't bother with MySQL/MariaDB for my own projects anymore (I started on it 10 years ago). I only ever use it if I'm installing something that won't work with anything else.

Why do you want to migrate away from Sybase? You'd be giving up a lot of capabilities moving to Maria/MySQL.

Given the common ancestry, your easiest transition would be to SQL Server. But if your reasons are to get out from under from vendor control, that isn't really achieving anything.

Hedging bets against license cost inflation due to the recent SAP acquisition.

I used MySQL for all my projects. However I'm following the tide and moving to PSQL. Apparently it is more flexible and has more options to tinker with.

Has anyone used the NoSQL stuff in Maria? How does it compare with Mongo and Cassandra?

I am curious about this as well. Should we take "NoSQL" in Maria literally? For instance, is it possible to talk to the database using db.find() and db.save() APIs similarly to how MongoDB works without writing any SQL?

I'm confused...isn't this just a matter of wrapping sql select, insert and update statements in the db.find() and db.save() interface? If so, how is it relevant to MariaDB rather than a translation layer in a client driver, for example?

What I'm wondering is if the database itself provides a standard set of such wrappers or if it's left to users to make them themselves.

I would love to see AWS offer a MariaDB option for RDS.

But then how is Amazon supposed to lock you into their proprietary solution to make it harder for you to migrate from their services?

What are you talking about? RDS now supports MySQL. What would the difference be in terms of lock-in?

I don't think they knew what RDS is/was. We heavily rely on AWS but can move off with little work. You have to decide from the start if you don't care about possibly migrating off AWS or not.

We chose to keep that ability to migrate off. In hindsight it was a little more thought up front but little to no additional overhead now.

AWS provides a ton of open source compliant services. When you decide to use a proprietary one just build in the appropriate abstractions.

I like MariaDB improvements over MySQL (and the technical documentation of the query's optimizations), but I'm a bit lost regarding WebScaleSQL's [1] choice to tracks MySQL. I don't know what to think... Google, Facebook, Twitter & co have immediate production needs, so are they simply in "wait and see" mode toward MariaDB ? Or is there something deeper ?

[1] http://webscalesql.org/


My guess is there isn't an enforced MySQL version across all of Google.

"We asked Google for more information, and the company sent us a statement which said: "Google's MySQL team is in the process of moving internal users of MySQL at Google from MySQL 5.1 to MariaDB 10.0. Google's MySQL team and the SkySQL MariaDB team are looking forward to working together to advance the reliability and feature set of MariaDB."

Because companies like Facebook are more interested in stuffs like InnoDB improvements in MySQL 5.6/5.7 than those nosql/sharding features which they already have their own workarounds.

From the FAQ on that page:

"Q: Why didn't you base this on MariaDB, Percona Server, Drizzle, etc....

A: We reached a consensus that MySQL-5.6 was the right choice for this, as it has the production-ready features we need to operate at scale, and the features planned for MySQL-5.7 seem like a fitting path forward for us. We will continue to revisit this decision as the ecosystem evolves."

MariaDB 10 just hit after the launch, much less conception, of Webscale, and still doesn't have all of the features from MySQL 5.6 in it.

No clue. I'd personally stick with Maria or Percona if you're looking for serious improvements to MySQL by extremely talented engineers.

I have been playing with the multi-source replication feature for quite some time. It works like a charm for most versions. It had some issues replicating events from Percona 5.6 but that was with 10.0.2, I believe. I'm hoping they have been resolved by 10.0.10 The advantage of Maria is that they have features over and above the ones they pull in from MySQL branch.

Although it's a fork of MySQL, but turns out to be a successful fork http://news.dice.com/2013/05/22/mariadb-vs-mysql-a-compariso...

Probably because it was made by the original creator of MySQL.

Is it true that for Java and an other JVM languages MySQL/MariaDB is preferred over PostgreSQL cause their Java driver is better supported(better quality and performance) and is more feature complete?

Seems like a tendency to name software after female names from the victorian era. Interesting.

MaxDB, MySQL and MariaDB are all named after Monty's children


Maria was the 79th most popular female name in California in 2012 (the most recent year data is available for)[1].

Far from being a Victorian-era name, it was most popular in the 1960's, and remains more popular today than it was during the Victorian era[2].

However, it was a popular name in literature during the Victorian era[3], with mentions hitting a peak in 1908 (technically that was the Edwardian era, but close enough). Depending on your smoothness settings, it does appear that Maria was actually more commonly mentioned in 1999 than in 1908, though.

Interestingly, there is a dramatic dip in mentions during the 1920s. I don't have a good theory as to why this was.

[1] http://www.ssa.gov/cgi-bin/namesbystate.cgi

[2] http://www.babynamewizard.com/baby-name/girl/maria

[3] https://books.google.com/ngrams/graph?content=Maria&year_sta...

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact