Hacker News new | past | comments | ask | show | jobs | submit login
The lack of proper “alter table” support in SQLite (dustycloud.org)
90 points by paroneayea on June 16, 2013 | hide | past | web | favorite | 48 comments



I am very familiar with SQLite internals. The answer is already in there. SQLite stores each row as each column value encoded sequentially corresponding to the declared order of the columns. Changing column order or deletions/inserts require a rewrite of every row. The one special case that is allowed is adding a column on the end of the schema providing it has a default value.

A SQLite provided ALTER TABLE implementation would do exactly what was stated - start a transaction, rename the existing table to a temporary name, create a new one with the desired schema, and copy data across mangling as appropriate before deleting the old table and finishing the transaction. For plain tables this is no big deal, but for more complicated ones there are a lot of issues such as foreign key references, constraints, indices. The majority of the code would be dealing with all these conditions and interactions.

It also wouldn't be any more efficient than code the developer writes - ie there are no shortcuts available to the SQLite developers that aren't available to developers using SQLite. The net effect would be a larger library (they limit to a 250kb library size so something else would need to be chopped), some complex documentation and a heck of a lot of testing code. For something that is relatively rarely needed (see requests on the mailing lists).

The chance of something like this ending up in the SQLite core is extremely slim, so you could never depend on it being there anyway.

SQLite does have several things to help. There is a user_version pragma you can use to keep track of the schema version and use for upgrading. You can temporarily disable foreign key and constraint enforcing. There are numerous pragmas to get table metadata. The table definitions are stored as SQL strings in a sqlite master table, and a pragma allows you to make that writeable.


Thanks for that helpful reply! Interesting to see that what I wrote out isn't far from what reality would be. I appreciate your informed input here.

But still, as I described in the post, all these bits of tooling that other developers write seem to break all the time anyway. If you look at the quote from the Alembic developer, you can read the level of frustration and spent time that's been put into this. If this is how it's going to work, why not just do it in one location, as I said? All this tooling that's trying to add workarounds seems to be constantly breaking. I think that sqlite could do a better job implementing that logic itself. At the moment, with MediaGoblin, we're trapped and our developers and users are fighting these problems on a very regular basis.

I believe that the constant reimplementation of this in migration frameworks is indication enough that it really is desired and would be useful. Again, I encourage: if there's an experienced developer of sqlite, try testing my theory by running a crowdfunding campaign to add the feature. I bet you'll have proof enough through donations that this is something people want.


I don't see what the actual difficulty is. Being able to do arbitrary schema migrations is considerably harder than making it work for your own data. In my own setup code I look at the schema version and know how to go from version 3 to 4 to 5. It is usually a handful of lines of code for each.

I've reread the content again, and something unstated appears to be the use of an ORM (there are several good posts about why they are evil). ORMs already place several layers of indirection and abstraction between code and the database. It is hardly surprising that the contortions make it harder for them to do things. However I couldn't find an explanation as to why they think the copy and adjust approach would be so hard. Perhaps they are unaware of SQLite functionality like getting the table as a schema or a column by column basis.

I've been on the sqlite-users mailing list for a decade, and use the frequency of requests there to gauge the mood. Alter table is very infrequently mentioned.

It should be noted that just having some code that implements alter table is insufficient. If it doesn't become a standard part of the SQLite core then you can't depend on it being there. And even if it did, you would have to wait years before it is present on the SQLite versions used by stable/enterprise distributions.

If you want to crowdfund this, contact the SQLite developers asking for a quote and use that. http://www.sqlite.org/support.html

You'll also need to establish what you want to happen - eg if a column affinity changes what happens to existing data? What happens to constraints and foreign keys? What should it do if a transaction is already active?


Let me jump in as somebody very familiar with two of the tools mentioned in the OP--sqlachemy and alembic.

> posts about why they are evil

There are legitimate flame wars to be had about ORMs being "evil", but they all have to be had in the context of using them to solve a particular thing. It is true that ORMs do not solve all problems for all people, but it is also true that they solve some problems for some people. There is no point in hashing it out without knowing what OP or me or you or some particular person is doing.

> being able to do arbitrary schema migrations is considerably harder

Being able to do arbitrary schema migrations is not a feature anybody is seriously suggesting, either for SQLite, sqlalchemy, alembic, or any other tool. What is being suggested is merely that SQLite support ALTER TABLE, a very particular, well-defined feature that has been standardized since at least 1992. In the intervening two decades I am not aware of any serious proposals to the effect that the standardized behavior is flawed, although there are various vendor-specific extensions. If you would like to raise an objection to the standard, I would be greatly intrigued.

> I couldn't find an explanation as to why they think the copy and adjust approach would be so hard

All snark aside, four out of five SQL vendors support alter table. They would be doing this logic solely for SQLite. Why would they do that? The path of least resistance is just to tell people to migrate to MySQL. That's what people who need migration support end up doing.

> alter table is very infrequently mentioned

You seem to anticipate that users will discover that SQLite does not support alter table correctly, and that following this discovery they will continue to use SQLite. My experience is at variance with your anticipation.

Specifically, I don't post to SQLite-users, first and foremost, because I am not a SQLite user, because it does not support alter table. The sampling bias would actually preclude you from hearing about any really critical feature requests.

> you would have to wait years before it is present on SQLite versions used by stable/enterprise distributions

Literally an argument against doing anything to SQLite ever. Is this a common objection to feature requests on the mailing list?

> You'll also need to establish what you want to happen

OP needs to establish nothing, this has literally been decided a full eight years before the first line of SQLite was ever written.

I know you're thinking "SQLite is fine, go use MySQL if you need this alter table nonsense". And that's precisely what people who do a lot of migrations end up doing. I've migrated every SQLite codebase I've committed to this year to either MySQL or Postgres or NoSQL. Every last one.


Neither I nor the SQLite team think that SQLite is the solution to everyone's database use. As they say it is a good alternative to fopen: http://www.sqlite.org/whentouse.html - it is not a project goal to be a complete SQL implementation or to be a drop in replacement for MySQL/Postgres/Oracle etc

SQLite's manifest typing is a complication for ALTER TABLE that doesn't affect other databases.

On the ORM side, my point was that layers of abstraction and indirection will make things harder. Is your statement that they would have to have SQLite specific code the only reason why it isn't done, or is there something else considered hard?

While it is true your voice isn't heard, enough people do post to sqlite-users to get some idea of what the various issues are and a rough idea of their popularity. For example a far more common issue is that SQLite does not work well if you use a networked filesystem (you will eventually get corruption).

Timeliness is a practical issue. ALTER TABLE can't be used until it is available. The way the SQLite developers consider best practise is you embed the SQLite library statically inside your app and you write specifically to SQLite: http://www.sqlite.org/amalgamation.html

There are no legal or technical reasons preventing anyone to cause this to be implemented separately from the SQLite core. Wanting it to be part of the core is very different.

You seem to have an underlying assumption of zero sum games or SQLite being in some competition with all other database engines. It isn't. It doesn't lose by developers using MySQL or MongoDB. Heck I use MongoDB far more than I use SQLite. Developers who need comprehensive ALTER TABLE support and find the copy table with modifications approach too onerous are far outside of SQLite's sweet spot. That is ok.


I agree with you. This kind of functionality is much better implemented in a third-party tool like sqlalchemy-migrate or alembic that OP mentions in the article.

I much rather a smaller sqlite library than one that is burdened with functionality you just don't use 99% of the time.

The benefit of making an "official" migration library for SQLite is probably outweighed by the opportunity cost of having it developed, tested and kept up-to-date.


This makes no sense. Who's more capable of keeping a solution for this developed, tested, and up-to-date, the core developers of the database or a third party that's probably hoping to achieve something other than a migration tool?

It's worth noting that it's not like rewriting the whole table is a solution to this that's unique to SQLite. MySQL's two main table types also do this in order to alter a table in certain (very common) ways. But they don't make you do it yourself, because it's deceptively difficult to do it right.


Look at the issue a different way. What functionality needs to be removed from SQLite in order to add more comprehensive alter table? SQLite is defined by being "lite" and that involves a lot of saying "no". You can use other database solutions.

SQLite is also public domain. You can cause this code to be written and have complete freedom to redistribute/keep private as you see fit.

So what is actually being demanded here is that the SQLite core developers spend their time on the feature. SQLite used to have no table altering functionality at all. The ability to append a column (which had few technical risks or ambiguity) was paid for by AOL.


Yep, and I said in the article that I'd be happy to pay for adding other ALTER TABLE methods... if someone ran a funding campaign, I'd donate quite a bit of money. I still suspect that if such a campagign was run, we'd see a good set of money come in from the community for it.


The problem though is that Alembic doesn't want to engineer itself around implementing the tooling because it's so painful (and the author of that tool is himself asking for this kind of alter table support) and sqlalchemy-migrate breaks all the time. And sqlalchemy-migrate breaks all the time, and when I used South and django-migrations (granted, that was years ago), those also broke all the time too. The external migrations path isn't working.


So write and apply such modification scripts manually. It's not difficult, and you can tailor it to your table's unique or unusual properties, as needed.

I've seen too many people waste far too much time and effort with automated migration systems, for what amounts to very little saving even in the cases where it does work well.


> This kind of functionality is much better implemented in a third-party tool like sqlalchemy-migrate or alembic that OP mentions in the article.

Yes because now every SQL migration library of every language has its own partial bug-ridden of a full alter table for sqlite. That does sound so much better than having a single implementation in sqlite itself (wait no, it does not even remotely).


But now every user of SQLite needs to deal with a library that's larger, and includes non-critical functionality that goes unused.

The whole point of using SQLite in the first place is to avoid such overhead, knowing full well that some non-essential functionality is being lost as part of this tradeoff.

I think it's far more important to keep SQLite small and nimble for the many users who are facing constraints beyond their control (imposed by limited hardware resources, for instance), rather than bloating it for the sake of a small number of users who are unwilling (or too lazy?) to write proper manual migration scripts now and then.


SQLite make it clear when you could use it: http://www.sqlite.org/whentouse.html

It is okay not use SQLite, and no one will be upset if a different database is used. Every source file starts with a blessing!


I believe that in the vast majority of cases MySQL's ALTER is 'rename, copy-with-changes, drop' - memory sayeth that LiveJournal used to only ever add columns during normal migrations, and then go back and clean up the 'dead' ones every so often during a maintenance window.

Having the relevant rename, copy-with-changes, drop logic implemented -once- within SQLite would seem to me to be a nice thing to have; it's probably the thing that annoys me most about SQLite (honestly I'd rather have got support for this than foreign keys, which I think speaks to the "you'd have to drop something else" argument).

Perhaps a workable compromise would be to write a C extension that handles it once, and then try and convince vendors to add that extension to the 'recommends' list for the SQLite package - and then, if that code proves widely used, have the argument about implementing it in core at that point.


Side question: how much in demand are recursive CTE's? I, particularly, hack recursive functions around all SQL View queries, but it kludges the code up quite a bit. Am I the only one with this problem?


I don't remember anyone asking on the mailing list, but you can search the archives. SQLite does let you do select operations on selects: SELECT ... FROM (SELECT ... FROM ..)

The other important thing SQLite has available is user defined functions. Because SQLite is running in the same process as your query code, you can add your own scalar and aggregate functions that do anything you want.

It also has a virtual table mechanism so you can provide tables of any shape and contents you want based on other data.


A sub-select is not a recursive CTE; think more along the lines of "find me all children of this Hacker News post #" where the scheme is simply "id | parent-id | poster | text": to do this requires recursion. I could totally be misunderstanding the reason you bring them up, however.


I brought up subselects because they are a solution some of the time to some of the problems.


ALTER is in itself an odd command, useful only in development, never in production, and bringin with it deep architectural implications and an endless list of problems in basically all SQL systems today.

In SQLite, an embedded database, it is certainly out of place. Drop, rebuild, and move on with life.


Why is ALTER never useful in production?


I wouldn't say that ALTER TABLE is never useful in production, but you should try to avoid it, especially if you have large tables. Many implementations of ALTER TABLE, lock the table, which means that no process can write to the table, until the ALTER TABLE command completes. If the ALTER TABLE statement takes minutes, or a few hours to complete, then you will have degraded performance or downtime.


Oh yeah that's very true, many SQL engines seem to lock tables whenever a DDL statement is run. I do like that Postgresql tends not to, it is a huge change from the hoops you have to jump through to add columns in a production MySQL scenario.


"ALTER is in itself an odd command, useful only in development, never in production"

How would you then ever modify an existing database?


> How would you then ever modify an existing database?

You would use techniques like:

https://github.com/soundcloud/large-hadron-migrator

"""

That is until your tables grow to millions of records. At this point, the locking nature of ALTER TABLE may take your site down for an hour or more while critical tables are migrated. In order to avoid this, developers begin to design around the problem by introducing join tables or moving the data into another layer. Development gets less and less agile as tables grow and grow. To make the problem worse, adding or changing indices to optimize data access becomes just as difficult.

Side effects may include black holes and universe implosion. There are few things that can be done at the server or engine level. It is possible to change default values in an ALTER TABLE without locking the table. The InnoDB Plugin provides facilities for online index creation, which is great if you are using this engine, but only solves half the problem.

At SoundCloud we started having migration pains quite a while ago, and after looking around for third party solutions, we decided to create our own. We called it Large Hadron Migrator, and it is a gem for online ActiveRecord and DataMapper migrations.

The basic idea is to perform the migration online while the system is live, without locking the table. In contrast to OAK and the facebook tool, we only use a copy table and triggers.

Lhm requires a monotonically increasing numeric Primary Key on the table, due to how the Chunker works.

"""


.. or you use a real database engine that support schema changes in background transactions, like postgres.

Also, it is good to remember that not all production databases will grow large. Many stay small with a limited number of rows, but will still need to be updated as the product is updated.


OK, I guess the issue is with the "never in production". One size rarely fits all. I can have a lookup table with three rows in my database, for which Large Hadron Migrator is hardly an appropriate solution. ALTER will work just fine. For large tables - yes, a more complex migration solution is in order.


The ALTER doesn't have to be in the embedded code. It can be a separate utility.


The migration libraries mentioned in the article basically are the "separate utility" that you describe. Yet apparently the developers of these libraries are unable to implement this functionality properly on their own, for whatever reason. So they want the SQLite developers to do so instead.


tell that Drupal when they go from one version to the next.


> a successor called Alembic founded by the same core author as sqlalchemy-migrate

Alembic is not founded by the same core author as sqlalchemy-migrate. Alembic is founded by Mike Bayer who is the core author of SQLAlchemy itself.


Thanks for that correction. I updated the post.


It's probably relevant to note that both Android and iOS ship with SQLite as the de-facto database storage engine, so there's very much a use case for having reliable in-place migrations in SQLite on modern software.

I could certainly see the value of having more robust ALTER TABLE support in SQLite itself, and I don't find the argument that such functionality will go unused very convincing. In real world scenarios almost everyone is going to have to migrate at some point, and the easier this is to accomplish, the better.


> I'm pre-pledging $200.00 towards fixing the problem

Nice idea, but wouldn't he have to find about one hundred other pledgers to make this financially viable?


I believe that's the entire point of this blog post. I'm rather puzzled to see the negative attention it has received. Is this a huge pain point? No, but it is a constant niggling annoyance when using SQLite to power things like desktop apps. If not built in to the core, I believe that SQLite supports compile-time extensions, would someone more knowledgeable comment? I wonder if the OP would be satisfied with one of those? I personally would be fine building my own SQLite to get the functionality and it looks like the author would as well. In short, I'm kind of confused as to why everybody seems so down on this idea. If an extension could be developed to do this, and if this extension would meet the OP's needs, I'm not sure why we're all going on about how it would ruin the sanctity and tininess of core SQLite. I'll put my $200 pledge here.


Yeah, I'm also surprised to see people so upset about the idea of the feature mostly for the argument that it doesn't presently exist. As for the compile-time-extension, my main concern with that as a free software project is that we try to play nice with distro packaging and requiring users to compile their own sqlite would be setting them up for a bit of pain. But I suspect it doesn't matter: if the feature was coded with the aim of pushing it upstream, I'm suspecting there's a good chance upstream would accept it if of good quality.

Then again, I'm not upstream. :)


Code is unlikely to be accepted upstream because it very rarely is (assuming you mean by the SQLite team). They are very picky about code contributions and almost always implement themselves. One part is legal - there has to be very clear lineage/authoring to the code, patent clearances etc.

The second part is that the actual code to implement any particular functionality or even a bug fix is relatively trivial. There is far more effort and far more lines of code for testing: http://www.sqlite.org/testing.html - and implementing code to enable the rigourous testing usually results in a different structure.

The third part would be conflicting with the existing goals: http://www.sqlite.org/whentouse.html

OTOH if the intention isn't to be part of the core then upstream doesn't matter and the code can be distributed in any way you deem fit.


> but we also want people to be able to run smallish installations for themselves or their friends and family as well

So, what's the problem in running PostgreSQL or MySQL in small installations? They really need small amount of space and memory to be installed and operated.


On a extremely big picture 50000 foot level, there's the windows model where you put everything into one piece of software and you run that one piece of software on one server (real or virtual). Its philosophically anti-windows to have more than one piece of software installed per OS. So you want a "something" and a database server, well now you need two installs / licenses / hardware / images / admins / whatever not just one. And in windows philosophy you never run user stuff on a server anyway so right there you may have issues with a "user" app and a "server" db system requiring two systems.

Its hardly all peaches and unicorns on the unix side either. Seriously, you're going to install 17 completely separate, different versions of "magic embedded mysql" or whatever, and have no way to do joins across DB servers or system wide DB backup system? Crazy talk. Why not have every programmer write their own homemade strcmp() complete with different buffer overflow bugs on each version... So the whole concept of a "private" database violates quite a few unix philosophy principles. Handling multiple versions of DBs across different distros is a puzzler for the devs rather than just including one version and calling it good.

The whole mess from the 100K standpoint is DBs and DB apps are too new, too much technical churn, too much incompatibility. For example, why are you installing a new DB and replacing all your API code if all you want is a different engine perhaps (or perhaps not) better suited to your workload? Its conceptually like switching your whole infrastructure to a new word processor in 1985.


No, you just install single software, and dependencies like mysql or postgresql are installed automatically as dependencies (just as other libraries).


There's still time and friction involved in getting an instance up or adding a database to an existing instance.


It should all be done via apt-get's after-install hooks of your software. More on that, you can do things like supporting any of mysql or postgresql if they're already installed.


SQlite might be meant to be light, but surely they could implement rudimentary alter table support (perhaps through the create new and rename strategy mentioned here)? The alternative is that each client have to code their own version. I can understand if they say no to features which are rarely used, but if something would be used by the majority of the users, I cannot understand that they would say no for 'lightness' sake.


SQLite is a replacement for fopen(), not a database.


The H2 open source java database implements "alter table" exactly as per the description of sql alchemy: create a new table with the new structure, copy all the data across, delete the original table, rename the new table to the origina name.

This could perhaps be acceptable for sqlite.


SQLLite is "lite"

If you want alter table support, use a SQL database that provides it. There are many.


So he gets a free, lightweight, immensely practical tool. His answer: complaint.


If you're using a given piece of software, and you run into problems, it's perfectly acceptable to complain. In fact, it's one of the most helpful things you can do. Even if it doesn't help resolve the problem directly, it at least helps make the problem more widely known. This can be very useful information for people investigating the use of such software.

It doesn't matter how much, if any, you may have paid for the software. If it doesn't work in a particular case, let it be known.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: