"How hard can it be to implement?" (37signals.com)
134 points by adamhowell on Sept 8, 2010 | 97 comments



Is there really no simpler way to solve this problem?

"Moving a message needs to move all of the message's comments, and all of the comments' files, and all of the comments' files' versions."

Why can't you just change some top level reference in the database? I'm imagining a Projects table and a TodoLists table. Each TodoList has something like a projectID foreign key right? Why can't you just change that and automatically have all of the messages, etc, come along with the TodoList?

"But I hope you will see that sometimes even the simplest feature can be much more complicated than it looks from the outside."

I agree there, just wondering why the simple solution doesn't work in this case.
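For reference, the "just change the top level reference" idea might look like the sketch below. It assumes a fully normalized schema; the table and column names (projects, todolists, messages, project_id, todolist_id) are made up for illustration and are not Basecamp's actual schema.

```python
import sqlite3

# Hypothetical, fully normalized schema: none of these names come from Basecamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE todolists (id INTEGER PRIMARY KEY,
                            project_id INTEGER REFERENCES projects(id),
                            title TEXT);
    CREATE TABLE messages  (id INTEGER PRIMARY KEY,
                            todolist_id INTEGER REFERENCES todolists(id),
                            body TEXT);
    INSERT INTO projects  VALUES (1, 'Old project'), (2, 'New project');
    INSERT INTO todolists VALUES (10, 1, 'Launch checklist');
    INSERT INTO messages  VALUES (100, 10, 'First'), (101, 10, 'Second');
""")

def move_todolist(conn, todolist_id, new_project_id):
    # Only the top-level reference changes; messages keep pointing at the list.
    with conn:
        conn.execute("UPDATE todolists SET project_id = ? WHERE id = ?",
                     (new_project_id, todolist_id))

def messages_in_project(conn, project_id):
    # Messages reach their project only through the list they belong to.
    rows = conn.execute("""
        SELECT m.id FROM messages m
        JOIN todolists t ON t.id = m.todolist_id
        WHERE t.project_id = ? ORDER BY m.id""", (project_id,))
    return [r[0] for r in rows]

move_todolist(conn, 10, 2)
```

In this toy schema a single UPDATE is enough, which is exactly why the question is reasonable. Whether the real schema allows it is what the rest of the thread argues about.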


You're falling into the same trap all programmers do when they perpetually underestimate the time to accomplish any task. The main difficulty in estimating time is that you don't know what you don't know, so you have to actually get into the nitty gritty of implementing it, and if you're smart enough you'll hopefully catch all the requirements before you actually launch it.

I'm going to take Sam's word on this that there is not a simpler way, because he's the one working on it, and I trust the competence of the 37s dev team. That's not to say that some armchair analysis here on HN might not have valuable insight, but without actually seeing the code, and in light of the full list of points Sam mentioned, I doubt very much that there is a much simpler solution of any form.


Yes, to paraphrase Mr. Rumsfeld, when estimating large system changes you can see the known knowns and have an idea of the known unknowns but the unknown knowns and unknown unknowns are hiding like the underwater part of an iceberg. Bad estimates are usually caused by not taking into account the unknowns.

How is that for a mixed metaphor?


> The main difficulty in estimating time is that you don't know what you don't know, so you have to actually get into the nitty gritty of implementing it, and if you're smart enough you'll hopefully catch all the requirements before you actually launch it.

That's certainly true for a low-level, high-precision estimate, which you can only make when you've got detailed requirements. Prior to that you still need to be able to deliver an estimate, and the best approach to take is to provide an estimate with Quantified Uncertainty. The phrase is important, because the estimate is for managers and managers are all about quantifying risks, and uncertainty in a development estimate is definitely a risk.

When you say "I don't know how long this will take, because I don't know how much I don't know about the solution" you're giving an estimate with uncertainty, but you're not quantifying the uncertainty. It's got no bounds, it's limitless. Managers can't make any decisions with that, so it's useless information. But if you say "This will take 4 to 8 effort weeks, and I'll be able to give a more precise estimate after 2 effort weeks of investigation" that gives your manager something to work with. You're still uncertain, but you've quantified the bounds on the uncertainty, and decisions can be made. For example, solving the problem might not be worthwhile unless it happens within the next two weeks (eg: to meet a release deadline) so the manager can just delay the work until later.

The trick with this approach is setting the bounds. The man who taught me this style of estimating called it a Surprise Range. The lower bound should be an estimate such that you'd be surprised if it took less effort than that, and the upper bound should be an estimate such that you'd be surprised if it took more effort than that. Given whatever information you have about the requirements, you just keep pushing your bounds until you can honestly say "I really don't think it would take less/more effort than that."

As you get more experienced, you'll naturally start accounting for the unknown unknowns, because your past experiences with them will be nagging in the back of your mind while you're trying to come up with an estimate for some new task. This is especially true if you're estimating work on a project you've worked on before; you'll have a feel for where the trouble spots are and whether or not the task in question is going to get into those spots.

The other trick with this approach is the "more precise estimate after some investigation" bit of the statement. You need to provide some estimate now, buy some investigation time, and commit to another estimate later. Most of the time the second estimate will fall somewhere within the bounds of the first one, but not always, and your manager should be aware of that possibility. But with experience, you'll usually wind up within the original bounds and narrower than the original bounds too, which is quantifiably less uncertainty than before so you can show progress.
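A minimal sketch of the bookkeeping behind a Surprise Range, assuming effort is tracked in weeks. The Estimate class and its method names are hypothetical, and the second estimate (5 to 7 weeks) is an invented example of what a re-estimate after investigation could look like.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimate:
    low_weeks: float   # you'd be surprised if it took less effort than this
    high_weeks: float  # you'd be surprised if it took more effort than this

    def __post_init__(self):
        if not 0 <= self.low_weeks <= self.high_weeks:
            raise ValueError("need 0 <= low <= high")

    @property
    def width(self):
        # The quantified uncertainty: how far apart the two bounds are.
        return self.high_weeks - self.low_weeks

    def contains(self, other):
        # True when a later estimate falls inside this one's bounds.
        return (self.low_weeks <= other.low_weeks
                and other.high_weeks <= self.high_weeks)

first = Estimate(4, 8)    # "This will take 4 to 8 effort weeks"
second = Estimate(5, 7)   # hypothetical re-estimate after the investigation
```

Here first.contains(second) holds and second.width is smaller than first.width, which is the "within the original bounds and narrower" outcome described above.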


Company wanted to port an OS to 32-bit (yes I know that dates me). I spent a day, reviewed the code (500 modules), wrote a report - two man-years. Project deferred (ashcanned).

A month later my peer, a senior Engineer with the Company, thought "We should run 32-bit! How long can it take?", checked out the source and started changing files. Gradually people noticed what he was doing, resources got added under the table, finally a year later he had something working. Lots of kudos, smiles, what a hero!

I showed him the report I filed a year ago - and he said "yeah, that's about what I had to do. I'm glad I didn't see that to begin with, I would never have started".


Two man-years can easily become one calendar year, if 2-3 people are working full-time on the project.

Also, "two man-years" isn't a complete estimate using my approach. "One to two man-years" would be a complete estimate, providing a range. Sometimes you really do come in near the bottom of the range, and maybe a month into it if you re-estimated you would have come up with a lower range.


This is a reasonable method, but I think it's worth remembering that you have not actually "quantified the bounds on the uncertainty." What you have done is quantified what you think are the bounds on the uncertainty.


Of course you're right, but part of the nature of uncertainty is that you can't know the bounds for sure. That's why it's an estimate rather than a commitment.

This brings up another point of contention between development and project management. When developers say "estimate", they mean "best guess in the face of uncertainty", but project managers tend to think of developer estimates the same as plumber, electrician, and car mechanic estimates. Those guys are saying "I'm not sure exactly how much work I'm going to have to do, but if you pay me $X I'll do it." They inflate $X so that most of the time they wind up overcharging in order to cover the occasions when they wind up doing much more work than expected. The $X is a commitment, and the estimate is how much work is needed.

It doesn't work this way in software development, because there are factors out of the developer's control. The developer can say how much effort a task is likely to take (the ranged estimate) but not how much time (the commitment). The time depends on other things the developer has to work on, both expected and unexpected, dependencies on other developers and tasks, holidays and days off for the developer and others the developer is dependent upon, etc. That's all stuff the project manager is supposed to keep track of, so the developer doesn't have a full picture. Unfortunately, most project managers don't recognize this, so they treat the effort estimate as a time commitment, set an arbitrary start date, and expect the developer to be done X days later.


Wow, that is my experience exactly.


> I trust the competence of the 37s dev team.

I don't, in large part because this shouldn't be a complicated problem to solve.


Yes, in your armchair analysis that takes into account none of the issues that they've dealt with in getting where they are today, any past architectural decisions that make this particular feature difficult to implement indicate incompetence.

Your confidence probably serves you well (let me guess, early 20s?), but in this case you literally don't know what you're talking about.


Are we really taking cheap shots about age now? This is not how an argument is won and not how we do things around here. Your point stands on its own without ageism.


Fair point, but this type of hubris is far more common (and more forgivable) in youth. Is it ageist to recognize this fact?


It is irrelevant and very disturbing for the discussion. So it may not be exactly ageism, but it is definitely inappropriate.


You're right. That was a mistake on my part, I'll take it into consideration for the future.


I've seen it at about the same rate in old and young. Though older people seem less likely to recognize and correct their error.

But that's from personal observation, and I haven't made a point of recording my findings for analysis. I suspect you haven't either.


I'm not sure why you consider it hubris for me not to trust the competence of other developers without evidence.


> (let me guess, early 20s?),

No.

> but in this case you literally don't know what you're talking about.

I don't need to know what exact decisions they made to be able to tell that they were poor decisions. Easy things should be easy, and when they're hard, they're hard because of incompetence somewhere in the process.

FWIW, I've worked on a site extremely similar to basecamp (but with far more traffic, at least according to Alexa). I'm not speaking from inexperience: if this problem is hard for 37signals, it's because of failures of the implementation, not because the problem is intrinsically hard. Engineers who don't recognize that are not ones whose competence I would put trust in.


Pick 10,000 random features and aspects of a software program to be implemented over a 10 year period. Some architectural choices will benefit some and harm others. There is no choice that will make "all easy problems easy". You can't just arrive with one specific feature pulled out of thin air 6 years later and claim that if it is not easy then the system architects were incompetent. Do you really not see that?


> You can't just arrive with one specific feature pulled out of thin air 6 years later and claim that if it is not easy than the system architects were incompetent.

The problem is easy. Seriously, it really is. Most of the programmers here have probably done something almost exactly like it ten or more times in their relatively short careers; I know I have. There is no real intrinsic difficulty to the problem.

I'm not saying that the programmers at 37signals were incompetent. I'm saying that I do not trust that they are competent. I have no reason, especially in the face of this evidence, to believe that they're competent. They made a successful software product: so have lots of other programmers of questionable competency.

The fact remains that at least in this example, one of their programmers points at an intrinsically easy problem and says, "This is really hard because I'd have to do a lot of stuff that I really shouldn't have to do" except he doesn't seem to realize that he shouldn't have to do those things. So tell me, in the face of that obvious fact, why should I trust 37signals' engineers' competence?


> The problem is easy. Seriously, it really is.

It's easy in the simplified mental model that you have constructed devoid of real world context. Any engineer worth their salt knows the devil's in the details. I'm going to stop arguing now because you're not even responding to my actual argument. It's not a zen thing, you should be able to get it if you actually read my comments, but just in case you need a koan, ponder this fact that I would bet my life savings on:

There exists a potential feature which could be implemented in Basecamp faster than in your product (and vice versa).


> It's easy in the simplified mental model that you have constructed devoid of real world context.

I constructed my mental model based on the detailed description by a BaseCamp engineer of what he'd have to do in order to solve the problem in his system!

> I'm going to stop arguing now because you're not even responding to my actual argument.

Welcome to my world: you've been arguing this whole time as if I were claiming that BaseCamp engineers were incompetent when I've only been saying that I don't trust that they are competent.

> There exists a potential feature which could be implemented in Basecamp faster than in your product (and vice versa).

There are probably many such features. That's really beside the point, unless you're claiming some additional knowledge here that the decisions made in this particular design contributed to the ease of those potential features. If you're only assuming that because you trust the competence of 37signals, your entire argument is circular and depends on the very claim I'm contesting.


> Welcome to my world: you've been arguing this whole time as if I were claiming that BaseCamp engineers were incompetent when I've only been saying that I don't trust that they are competent.

Do you realize how weaselly that is? What does that mean? Connotatively it takes a huge swipe at the 37s team without actually making any commitment. You might as well have said nothing at all if you're not going to take a real stand.


> Do you realize how weaselly that is? What does that mean?

There's a distinct difference between saying that you know someone is incompetent, and saying that you don't know that someone is competent. It's the same as the difference between agnosticism and atheism.

What I am also saying, and what you're probably really arguing with, is that the linked article constitutes evidence against 37s' competence. That's still very different than saying that it shows that they're incompetent, which is how you've been mischaracterizing my posts.


> There's a distinct difference between saying that you know someone is incompetent, and saying that you don't know that someone is competent. It's the same as the difference between agnosticism and atheism.

Understood, but it's still weaselly because you're offhandedly casting aspersions on a group of people's ability ("I have no reason to believe those people are competent") and then pretending like it doesn't hurt their reputation because of the exact words you spoke ("I never said they were incompetent, just that I have no reason to believe they are competent").

Agnosticism vs atheism, while logically the same distinction, does not have this libelous aspect to it.


Should you implement something with an array or a linked list? That depends. Should it be fast to locate an element in the middle? Or should it be cheap to add another element regardless of the size? Ideally, both, but you weigh the pros and cons and choose one. And even if it is the right choice, you might still run into situations where the other choice would have been better and people will comment that your decision was stupid.
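To make the trade-off concrete, here is a small sketch comparing a Python list (array-backed) with a hand-rolled singly linked list. The Node and LinkedList classes are illustrative only, not taken from any codebase discussed here.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

class LinkedList:
    def __init__(self):
        self.head = None
        self.size = 0

    def push_front(self, value):
        # O(1) regardless of size: nothing is shifted or copied.
        self.head = Node(value, self.head)
        self.size += 1

    def get(self, index):
        # O(n): reaching the middle means walking from the head.
        if not 0 <= index < self.size:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            node = node.next_node
        return node.value

array = []
linked = LinkedList()
for v in range(5):
    array.insert(0, v)    # O(n) per insert: shifts every existing element
    linked.push_front(v)  # O(1) per insert
```

Both end up holding the same sequence. array[2] is a constant-time lookup while linked.get(2) walks two links, and the cost of adding at the front goes the other way.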


This is not necessarily about whether the denormalized database structure is appropriate. It is more about:

Why the h... doesn't their ORM, or even better their database, already take care of cascading updates?

Why does it have to be implemented manually in every piece of code that updates something? (... which is of course expensive and prone to errors)


Exactly. I can imagine that they thought they were being very clever using ORM + MySQL instead of writing SQL to run on a more mature RDBMS. Well, decisions like that often result in you painting yourself into a corner. There's a reason that MySQL is free yet people still pay for Oracle...


I was not thinking of Oracle here. PostgreSQL is free software and offers almost everything you expect from a real database.

So I guess the main reason people pay for Oracle is because they only know about MySQL and not about PostgreSQL.


You're probably right. I've never understood why Postgres has such a small mindshare. Between Postgres and SQLite, there ought to be no ecological niche for MySQL to exist...


MySQL does (did, at least; Postgres has been improving) replication better. Replication is really important for high availability, which almost everyone needs.

The irony, of course, is that a single Postgres server is frequently (IME) more reliable than multiple MySQL servers even when the latter is set up for HA.


In addition, replication is the next thing where PostgreSQL is vastly improving (according to their road map).

Also note that if you want to trade some of the ACID properties for better performance and replication, the so-called NoSQL databases (CouchDB etc) seem to be a better trade-off than, say, MySQL with MyISAM instead of InnoDB.


Whatever the case, when I see a programmer doing a linear search through a linked list, I think you must agree that I'm justified in saying that I have no reason, based on that observation, to trust in that programmer's competence. In fact, I think it could easily be argued that in the absence of additional data, it would actually constitute a reason not to trust the programmer's competence.

While I fully agree that trade offs like yours do occur in our profession, their mere existence is no argument against my initial non-trust of 37signals' competence: you need to actually demonstrate that this problem is one of those cases, and I think that would be difficult to do.

The problem of associating entities in a way that their associations can be modified easily and consistently (i.e., normalization) is effectively a solved problem. Just like the programmer complaining that his program is slow because he's doing a linear search through a linked list, a BaseCamp programmer complaining that a requested feature is difficult to implement because their data is in an abnormal form does not inspire confidence, and certainly gives no reason to trust his competence.


> Whatever the case, when I see a programmer doing a linear search through a linked list, I think you must agree that I'm justified in saying that I have no reason, based on that observation, to trust in that programmer's competence. In fact, I think it could easily be argued that in the absence of additional data, it would actually constitute a reason not to trust the programmer's competence

In Erlang, the most common data structure is the linked list. There's no way to search them other than linearly. You are claiming that in general we should suspect that Erlang programmers are incompetent?


Only the ones that won't switch to a log-performance data structure when it's causing them trouble.


If the worst case size of that linked list is small, then I would say that I would trust that programmer more than the one that throws every searched linked list into a tree structure regardless of size.

In the case of sharding by project, it actually seems like a decent move given that a single Basecamp account has potentially large space requirements (75 GB), and you can't realistically provision enough space on a box to meet all the needs of a bunch of expanding accounts on that box. Being able to throw an entire project onto a new server and assuming that everything associated with a project is on that one server are both nice simplifying assumptions which seem like they would strike a nice balance.

Please stop painting everything with broad black and white strokes, programming and business in general are frequently more about striking the right balance of compromises than about knowing the "best" way to do things. There's usually not a universal "best". 37Signals seem to be doing a marvelous job balancing those compromises between their various disciplines, especially given their massive success with customers.

Edit: This is assuming they're storing attachment files on the same machine, which might very well be wrong. It would be pretty hard for one group to get up to 75GB of todos and other text content...


Basecamp is written in Ruby, so the choice is really a hash or a list. Either way, it's not likely to be your bottleneck in a web app.


The strategy you usually take with scaling a relational database is sharding, partitioning and de-normalization. And given that Basecamp has millions of users I don't imagine their database structure to be pretty or normalized.

This said, if they have their database sharded by project_id then it would not be that big an issue, but it can be that their database structure is very complex or messed up...


> And given that Basecamp has millions of users I don't imagine their database structure to be pretty or normalized.

Millions of users is not really that many. It's certainly within the realm of what can be reasonably vertically scaled.


You don't want to scale vertically once you hit millions of users and most other web companies like Google, Facebook, Yahoo and Microsoft have proven that horizontally scaling without big iron is the way to go. Currently the only way to scale a relational database such as MySQL or Postgre is by sharding and partitioning - these things ruin most good relational properties that your database structure may have.


> You don't want to scale vertically once you hit millions of users

Hundreds of the top websites in the world (the majority, I would hazard, though without the data to support it) have scaled vertically to millions and tens of millions of entities just fine. Far more than have scaled horizontally. It works. It's been done. And it doesn't give up the sort of transactional niceties that make problems like this easier.

> most other web companies like Google, Facebook, Yahoo and Microsoft

You're confusing "the very biggest web companies" with "most other web companies." Most other web companies continue to use commodity products, and Google/Facebook/Yahoo!/MS certainly would (and do) insofar as it's possible at that scale. Expending resources now to be as horizontally scalable as Google is wasteful premature optimization.

Notably, Yahoo! runs the largest PostgreSQL installation in the world, and Google and Facebook both continue to use MySQL.

> horizontally scaling without big iron is the way to go.

You can get 32-core machines with 128GB of ram from Dell (a mildly tweaked R910) for $30k these days. Is that big iron? How does its price compare with the amount of developer salary and benefits you'll have to spend to grok a non-relational data store, migrate your data to it, and reimplement the ACID features of a relational store in the code for your app? How many developer-days will you spend maintaining that code and how many developer-nights will you spend triaging a crashed site because of the complexity and likely bugginess of that reimplementation? How many users' feature requests will you have to reject as "too difficult to implement" because you feel the need to scale to Google/Facebook levels despite having only a few million users now and predicted growth which shows you'll never in a million years catch up to them?

> Currently the only way to scale a relational database such as MySQL or Postgre is by sharding and partitioning

It will be years before the vast majority of startups exhaust reasonable, cost-effective options for vertical scaling. The recent fervor for non-relational, horizontally scalable data stores is simply the new way of scratching the intellectually masturbatory premature optimization itch that programmers have had since ENIAC.

For what it's worth, I'm not the only crank who thinks this; Dennis Forbes has argued it much more eloquently and compellingly on his blog, e.g. http://blog.yafla.com/Getting_Real_about_NoSQL_and_the_SQL_P... .


Scaling depends a lot on how much data you've got, how much data you generate and how much you plan to grow. Given the size of Basecamp and 37signals' future projection I doubt it would be wise to hope that they can scale vertically, because once you hit the limit you are pretty screwed and need to buy _much_ more expensive hardware or rewrite most of your database related code and do lots of migrations. (And rewriting database related code to support sharding is usually error prone since you can't use joins or foreign keys, need to copy data around, etc.)

Do note that I am not saying that small websites should shard or scale horizontally, but big sites with millions of users and tons of data should not scale vertically (it can't pay off and at some point they'll hit the limit).


> Scaling depends a lot of how much data you got, how much data you generate and how much you plan to grow.

No doubt, but 37Signals shouldn't have a lot of relational data. The bulk of their per-project bytes, it seems likely, is non-relational stuff like attachments.

> Given the size of Basecamp and 37 Signal's future projection I doubt it would be wise to hope that they can scale vertically

Isn't it less wise to pay a cost you don't yet need and may never need to pay? You pay a significant price in development velocity by forgoing a relational database and using a non-relational data store. Certainly any reasonable organization should be able to project when they will actually need to pay that cost.

> big sites with millions of users

Single digit millions of users isn't that big.

> and tons of data should not scale vertically (it can't payoff and at some point they'll hit the limit).

It can certainly pay off if you never actually need to convert to a non-relational data store. The limit is a lot higher than you seem to think: banks and financial institutions process billions of transactions for hundreds of millions of users daily on the same ACID, relational data stores that you're saying a site like BaseCamp will hit the limit of. I don't buy it.


Also, even if there was a good reason for that strange database scheme, they could have simply specified "ON UPDATE CASCADE" on their foreign references, and the database would have taken care of the nasty details. That's what databases are good at. (Or did their ORM prevent them from doing so?)

So this is more an example of "How to make simple tasks difficult" rather than "Even simple tasks have their pitfalls".
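As a hedged illustration of what "ON UPDATE CASCADE" can do for a duplicated column, the sketch below uses SQLite (not MySQL) and an invented schema in which comments carry a denormalized project_id. A composite foreign key onto (id, project_id) lets the database propagate the change. The table names are assumptions, and MySQL's behavior depends on the storage engine in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE todolists (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        UNIQUE (id, project_id)          -- target for the composite foreign key
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        todolist_id INTEGER NOT NULL,
        project_id INTEGER NOT NULL,     -- denormalized copy of the list's project
        FOREIGN KEY (todolist_id, project_id)
            REFERENCES todolists (id, project_id) ON UPDATE CASCADE
    );
    INSERT INTO todolists VALUES (10, 1);
    INSERT INTO comments VALUES (100, 10, 1), (101, 10, 1);
""")

# Moving the list is one statement; the database rewrites the child rows.
with conn:
    conn.execute("UPDATE todolists SET project_id = 2 WHERE id = 10")

comment_projects = [r[0] for r in
                    conn.execute("SELECT project_id FROM comments ORDER BY id")]
```

After the update both comments report project 2 without any application code touching the comments table. This only covers rows that live in the same database, so it says nothing about the sharded case raised elsewhere in the thread.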


It can be if your data structure is a simple tree. But usually it's a complicated, maybe inconsistent, multi-rooted graph. It got that way because of incremental implementation. As the article illustrated, you then have to tease out the pieces.

I once inherited a large enterprise system that had been designed from the ground up to allow that kind of flexibility. The problem was that there were so many levels of abstraction and indirection in the database and the ORM model that performance was abysmal.


From the response:

"We can't use database transactions because performing a big move would slow Basecamp down for everyone. So we have to log the process of each step of the move, and make it so any failure in the move can be rolled back gracefully. That means a move is actually a series of copies and deletions instead of just changing a field for each moved item"


Why is a transaction necessary? Just change the one field project_id in the table todolists to point to the new project. Because the messages point at the todolist_id instead of at the project_id you don't need to modify anything else.


You're imagining files belong to comments, comments belong to items, items belong to lists, lists belong to projects, and changing the list's foreign key re-parents the whole hierarchy.

Now consider that Basecamp allows comments on messages, todo items, and milestones. What does your schema look like now? Add a feature to show a user all recent comments across all of his projects. What does your query look like? How does it perform? Get all of this (not to mention the other features) working at scale.

Maybe you can build something like Basecamp that works at Basecamp's scale without resorting to denormalization, sharding, and/or partitioning. But I doubt it.
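One way to picture the "comments on anything" schema is a type-plus-id pair on each comment, which is the column naming Rails uses for polymorphic associations (commentable_type, commentable_id). Whether Basecamp stores comments this way is an assumption, and every table below is invented. The point is only that the "recent comments for a user" query grows a branch per commentable type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE memberships (user_id INTEGER, project_id INTEGER);
    CREATE TABLE messages    (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE milestones  (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE comments    (id INTEGER PRIMARY KEY,
                              commentable_type TEXT,
                              commentable_id INTEGER,
                              body TEXT);
    INSERT INTO memberships VALUES (7, 1), (7, 2), (8, 3);
    INSERT INTO messages    VALUES (1, 1), (2, 3);
    INSERT INTO milestones  VALUES (1, 2);
    INSERT INTO comments VALUES
        (100, 'Message',   1, 'on a message in project 1'),
        (101, 'Milestone', 1, 'on a milestone in project 2'),
        (102, 'Message',   2, 'on a message in project 3');
""")

# One SELECT branch per kind of thing that can carry comments.
RECENT_COMMENTS = """
    SELECT c.id FROM comments c
    JOIN messages m ON c.commentable_type = 'Message' AND c.commentable_id = m.id
    JOIN memberships u ON u.project_id = m.project_id
    WHERE u.user_id = :user
    UNION ALL
    SELECT c.id FROM comments c
    JOIN milestones s ON c.commentable_type = 'Milestone' AND c.commentable_id = s.id
    JOIN memberships u ON u.project_id = s.project_id
    WHERE u.user_id = :user
    ORDER BY 1 DESC
"""

def recent_comment_ids(conn, user_id):
    return [r[0] for r in conn.execute(RECENT_COMMENTS, {"user": user_id})]
```

Adding todo items would add a third branch with two more joins (item to list to project), which is the growth in query complexity this comment is pointing at.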


I don't see how this is a problem. The todolist messages still point to a todolist instead of at a project. Can you elaborate?


Looking at just comments on items in todo lists: comments have an item_id, items have a list_id, and lists have a project_id. You want to get all comments across all projects a user is a member of, so you end up with a small handful of joins. Remember that these are reasonably large tables. When you consider the fact that comments can appear on nearly anything, you end up with more joins. Are you certain this will stand up under Basecamp's load even before you consider all of the other features you need? This is all before we run into sharding or partitioning as a scaling strategy.


No, but what is the fast schema that you have in mind that makes it hard to move messages? Perhaps fast and easily movable messages are not exclusive.


I missed this reply, but in case you come back to see it: the common approach to this issue is denormalization. That is, instead of needing to join comments to todo items to todo lists to get the project_id, you duplicate the project_id on comments and you can just get all the relevant comments with very few joins. Obviously, this makes moves need to hit the comments table.
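A rough sketch of that trade, using invented tables (lists, items, comments) rather than anything known about Basecamp: the denormalized query needs no joins, but a move now has to rewrite the duplicated column as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists    (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE items    (id INTEGER PRIMARY KEY, list_id INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, item_id INTEGER,
                           project_id INTEGER);   -- duplicated from lists
    INSERT INTO lists    VALUES (10, 1);
    INSERT INTO items    VALUES (20, 10), (21, 10);
    INSERT INTO comments VALUES (100, 20, 1), (101, 21, 1);
""")

# Normalized lookup: walk comment -> item -> list to find the project.
NORMALIZED = """
    SELECT c.id FROM comments c
    JOIN items i ON i.id = c.item_id
    JOIN lists l ON l.id = i.list_id
    WHERE l.project_id = ? ORDER BY c.id"""
# Denormalized lookup: the project_id is already on the comment.
DENORMALIZED = "SELECT id FROM comments WHERE project_id = ? ORDER BY id"

def comment_ids(query, project_id):
    return [r[0] for r in conn.execute(query, (project_id,))]

def move_list(list_id, new_project_id):
    # The price of the shortcut: the move must rewrite every duplicated copy.
    with conn:
        conn.execute("UPDATE lists SET project_id = ? WHERE id = ?",
                     (new_project_id, list_id))
        conn.execute("""
            UPDATE comments SET project_id = ?
            WHERE item_id IN (SELECT id FROM items WHERE list_id = ?)""",
            (new_project_id, list_id))

move_list(10, 2)
```

If move_list skipped the second UPDATE, the two queries would disagree about which project the comments belong to.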

That's a small piece of the picture. With partitioning, a move could entail removing data from one table and inserting it into another. With sharding, a move could mean removing the data from one database and inserting it into another, which probably means manually ensuring consistency because your typical transaction won't span databases.
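The cross-database case could be sketched as follows, with two in-memory SQLite connections standing in for two shards. The function name, the comments table, and the assumption that comment ids are unique across shards are all hypothetical. The shape to notice is two separate transactions plus a hand-written compensation step.

```python
import sqlite3

def make_shard():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments "
                 "(id INTEGER PRIMARY KEY, project_id INTEGER, body TEXT)")
    return conn

def move_comments(source, target, project_id, new_project_id):
    rows = source.execute(
        "SELECT id, body FROM comments WHERE project_id = ?",
        (project_id,)).fetchall()
    # Step 1: copy into the target shard inside that shard's own transaction.
    with target:
        target.executemany(
            "INSERT INTO comments (id, project_id, body) VALUES (?, ?, ?)",
            [(cid, new_project_id, body) for cid, body in rows])
    try:
        # Step 2: delete from the source shard in a separate transaction.
        with source:
            source.execute("DELETE FROM comments WHERE project_id = ?",
                           (project_id,))
    except sqlite3.Error:
        # Compensate by hand: no single transaction spans both connections.
        with target:
            target.executemany("DELETE FROM comments WHERE id = ?",
                               [(cid,) for cid, _ in rows])
        raise
    return len(rows)

shard_a, shard_b = make_shard(), make_shard()
with shard_a:
    shard_a.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                        [(1, 1, "a"), (2, 1, "b"), (3, 9, "other")])
moved = move_comments(shard_a, shard_b, 1, 2)
```

A crash between step 1 and step 2 would still leave the rows in both places, so a real system would need a durable record of the move in progress. The sketch leaves that out.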


What's the point of having ACID when you don't use its advantages, one might wonder.


(in reply to all threads stemming from your comment when I posted this)

What's all this talk about "re-implementing" and "if" they have transactions in their DB? They explicitly state that they can't use transactions because it would slow things down, not because they don't have them. They also state that "Since moving one milestone could potentially result in hundreds of database operations..."

This means they have a transaction supporting database. There's no re-implementing onto a different system. There's an apparently-broken implementation.


So your solution is to reimplement Basecamp on top of another database?


From a couple of web searches it sounds like they're on MySQL. Targeting a database that has working transactions has to be less work than implementing transactions yourself from scratch, which it seems everyone stuck on MySQL eventually has to do (that's what they're describing, and I know we've done similar things).
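For a sense of what "implementing transactions yourself" involves, here is a toy version of the approach the quoted response describes: copy, then delete, logging each step so a failure can be undone. Plain dicts stand in for the two projects, and the fail_on parameter exists only to simulate an error. None of this is Basecamp's code.

```python
def move_items(keys, source, target, fail_on=None):
    """Copy each item to target, then delete it from source.

    Every completed step is journaled so a failure can be rolled back.
    Returns True on success, False if the move was undone.
    """
    log = []  # journal of completed steps, newest last
    try:
        for key in keys:
            if key == fail_on:
                raise RuntimeError(f"simulated failure on {key}")
            target[key] = source[key]
            log.append(("copied", key))
        for key in keys:
            value = source.pop(key)
            log.append(("deleted", key, value))
    except Exception:
        # Undo in reverse order: restore deletions, then remove copies.
        for entry in reversed(log):
            if entry[0] == "copied":
                target.pop(entry[1], None)
            else:
                source[entry[1]] = entry[2]
        return False
    return True

old_project = {"message": "m", "comment-1": "x", "comment-2": "y"}
new_project = {}
keys = ["message", "comment-1", "comment-2"]

failed = move_items(keys, old_project, new_project, fail_on="comment-2")
state_after_failure = (dict(old_project), dict(new_project))
succeeded = move_items(keys, old_project, new_project)
```

Even this toy has to decide what order to undo things in. A real version also has to survive a crash halfway through the rollback, which is the part a database transaction normally handles for you.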


Depends what MySQL table types they are using. I think MyISAM has no transactional ability.


I'd argue any database on modern hardware where you can't do a single transaction on a few hundred rows without noticeably affecting performance does not have "working transactions".


Rails has always been very MySQL-centric, probably because 37S & DHH use it for Basecamp. Perhaps now that PostgreSQL v9.0 has built in master/slave replication they will finally make the switch.


Duh


Well you're ignoring possible disadvantages, like locking an inordinate number of rows in several tables while a long-running copy takes place. Transactions are all well and good, until they run counter to the usability and stability of your app.


How amusing. Absolutely not an issue in any grown-up database that implements row versioning/MVCC a la Postgres or Oracle. The sad thing is he (and most MySQL users) probably believe this is inherent to all RDBMSs.


I mean this as a genuine question since I don't use it, but: MySQL's engine locks the whole table when a transaction touches it?


The default MyISAM engine does. The behavior of InnoDB is similar to that of SQL Server: a row lock is a data structure held in memory, separate from the row data. It's computationally expensive to lock this way. However, SQL Server escalates row locks into page locks if it thinks it will help (e.g. "lock this row, and the next one, and the next one..." is translated on the fly to "lock this entire page", where a page is an on-disk allocation containing many rows), so now we have a few page locks to manage rather than many row locks. MySQL can't, so it struggles when you need to lock many rows at once. Oracle manages this by keeping a row's lock status in-line (e.g. in the block buffer cache) so there is no additional overhead per-lock.

I see this all the time; developers who have "grown up" in an environment where locks and cursors are expensive pick up some odd habits that don't translate well when they code in an environment (such as Oracle) where locks are cheap and cursors are free.


Might be trickier to do this if the foreign key is across a shard or if the db record for the comment wasn't structured like this. One situation could be because they store the threaded comments with the thread key in each record and also the parent comment key. This would be useful to pull down all the comments associated with a thread in one db query but also allow you to have the nested comments.
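To make that layout concrete, here is a minimal sketch, assuming each comment row carries both a thread key and a parent comment key. The column names (thread_id, parent_id) and the sample rows are made up for illustration; nothing here is known about Basecamp's actual schema.

```python
# Hypothetical flat rows, as a single query filtered on thread_id might return them.
# Every row stores the thread key AND the parent comment key.
rows = [
    {"id": 1, "thread_id": 10, "parent_id": None, "body": "top-level"},
    {"id": 2, "thread_id": 10, "parent_id": 1, "body": "reply"},
    {"id": 3, "thread_id": 10, "parent_id": 2, "body": "nested reply"},
    {"id": 4, "thread_id": 10, "parent_id": None, "body": "second top-level"},
]


def build_tree(rows):
    """Group rows by parent_id, then nest them without any further queries."""
    children = {}
    for row in rows:
        children.setdefault(row["parent_id"], []).append(row)

    def attach(parent_id):
        return [
            {"id": r["id"], "body": r["body"], "replies": attach(r["id"])}
            for r in children.get(parent_id, [])
        ]

    return attach(None)


tree = build_tree(rows)
```

The trade-off is that the thread key is duplicated on every comment, so re-homing a thread means touching every one of those rows rather than a single parent record.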


37signals don't use sharding. They use one database server with enough RAM to load the db into memory


They do that for everything but Basecamp[1]. I wonder what they use for Basecamp.

[1] Second paragraph: http://37signals.com/svn/posts/2479-nuts-bolts-database-serv...


One database server for the whole of basecamp?

Edit: I found this:

> With that in mind, we went looking for an option to host the Basecamp database, which is becoming a monster. As of this writing, the database is 325GB and handles several thousand queries per second at peak times.

325GB RAM?! But now they have multiple servers. Read more: http://37signals.com/svn/posts/2479-nuts-bolts-database-serv...


HP sells servers with up to 64 cores and 2 TB of memory. Considering the car that DHH just bought himself, they should be able to buy/lease several of them.


Are these x86 servers? Or do you mean a single chip with 64 cores?




You can get servers that can house 512GB or 1TB of RAM from HP/Sun/Dell. Scaling out your app servers and scaling up your DB is common in systems that don't need to be truly "web scale".


That might - might - solve the first 4 points, but certainly nothing after that.


I also agree with the sentiment that nothing is ever as simple as it seems. But, yeah, isn't this what relational databases are for?


Maybe we're seeing a downside of the YAGNI, shoot from the hip, do the simplest thing design philosophy.


agree. most of the described changes seem to follow from a bad db design.

lets see:

- Moving a message needs to move all of the message's comments, and all of the comments' files, and all of the comments' files' versions.

Not really. comment should only be 'tied' to message, i.e. there should be a comments.message_id column, so moving a message somewhere else shouldn't require moving comments (unless you fucked up the db schema to begin with)

- Moving any file needs to move its thumbnail, too, if it's an image.

No need to move the file to begin with, see above.

- Moving a milestone needs to move any associated to-do list and messages. And of course those to-do lists can have to-do items with comments, and attached files, and multiple versions of those files.

This is probably the one that indeed needs special treatment even in a reasonably designed db. There are ways to skip this if you anticipate the need at the very beginning, but it will require some convoluted db hacks to allow a simple O(1) move (i.e. make everything, including the project, just an item with a parent_id, then have messages be children of the milestone. but this will have other problems, performance and complexity etc. so I wouldn't go this way)

- Moving a to-do list whose to-do items have associated time tracking entries needs to move those time entries to the destination project too.

Not really. time tracking entries should not have project_id in it. just todo_id

- Moving a message or file needs to re-create its category in the destination project if it doesn't already exist.

Indeed.

- If a moved file is backed up on S3 we need to rename it there. If it's not, we need to make sure it doesn't get backed up with its old filename.

why does the file path on S3 have the project id in it? attachments.id would suffice to uniquely identify stuff. it's not like you need to invent multilevel directories on S3 like you do on a file system.

- When someone posts a message to the wrong project, then moves it to the right place, we need to make sure that everyone who received an email notification from the original message can still reply to the message via email.

If the email link just has the message_id and not the project_id it solves itself. You might need to play with some ACL though.

- Similarly we need to make sure that when you follow a URL in an email notification for a moved message, comment or files, you are redirected to its new location.

Don't include project_id in the url to begin with. Use 'flat' routes.

- Since moving one milestone could potentially result in hundreds of database operations, we need to perform the move asynchronously. This means storing information about the move, pushing it into a queue, and processing it with a pool of background workers.

Not really. Even moving a milestone should only require 2 transactions: move the milestone, and move all the milestone's children. That might be a non-trivial UPDATE sql operation but it's not hundreds of queries.

- We also have to build a new UI for displaying the progress of a move. It needs to poll the Basecamp servers periodically in the background to check to see if the move is done yet, and take you to the right place afterwards.

Probably right.

- We can't use database transactions because performing a big move would slow Basecamp down for everyone. So we have to log the process of each step of the move, and make it so any failure in the move can be rolled back gracefully. That means a move is actually a series of copies and deletions instead of just changing a field for each moved item.

If you eliminate most of the complications above then you can use transactions.
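A minimal sketch of the schema argued for above, using SQLite purely for illustration. The table and column names are assumptions, not Basecamp's real schema. Only messages carries project_id, so a move is one UPDATE inside one short transaction and the comments follow through message_id.

```python
import sqlite3

# Hypothetical normalized schema: comments point at messages, never at projects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL REFERENCES projects(id),
        title TEXT
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL REFERENCES messages(id),
        body TEXT
    );
    INSERT INTO projects VALUES (1, 'Wrong project'), (2, 'Right project');
    INSERT INTO messages VALUES (100, 1, 'Kickoff notes');
    INSERT INTO comments VALUES (1, 100, 'First'), (2, 100, 'Second');
""")


def move_message(conn, message_id, dest_project_id):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE messages SET project_id = ? WHERE id = ?",
            (dest_project_id, message_id),
        )


move_message(conn, 100, 2)

# Count the comments reachable from the destination project; none were updated.
moved = conn.execute(
    """SELECT COUNT(*) FROM comments c
       JOIN messages m ON m.id = c.message_id
       WHERE m.project_id = ?""",
    (2,),
).fetchone()[0]
```

This only covers the database rows. The S3 filenames, categories, email replies and redirects from the original list are outside what a foreign key change can fix.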


The comments here remind me of an old Irish joke where a hopelessly lost tourist asks an old man by the side of the road "Can you tell me how to get to Dublin?". After a few minutes thinking, the man replies "Well, you don't want to start from here".

I doubt 37signals wanted to be in a place where an apparently simple change would involve so much work, but that's where they found themselves. They did what they had to do. There's no point snarking about their starting place without knowing how and why they got there.


Is this an old Irish joke? I thought I knew for a fact that it was French!

- "Pouvez-vous me dire comment aller à Paris"? - "Ben si j'étais vous je ne partirais pas d'ici!"

Very French in its non-helpful but matter-of-fact way... but maybe the Irish are the same!


Your French version feels authentic to me, but then so does the Irish one. I doubt one can say where jokes like these originate. All cultures and languages probably have versions of them.


The way I heard the joke in Vermont always involved a tourist asking an old farmer for directions and getting the answer "Well, you can't get there from here."


I frequently quote the way Joel phrased this:

If you've spent more than 20 minutes of your life writing code, you've probably discovered a good rule of thumb by now: nothing is as simple as it seems.

http://www.joelonsoftware.com/articles/NothingIsSimple.html


Sounds like they're in need of some hefty normalization. This is the sort of thing that should be handled by changing a single foreign key on each todo / on a single todo list entry; everything else should be pointing to each individual entity, and need no modification.

Unless they're denormalized for optimization purposes? Threaded comments can result in some massively deep queries.


> Unless they're denormalized for optimization purposes?

Bingo?


Though not directly related to this post, the sentence 'well ... how hard can it be?' is a signature quote from the lead Top Gear presenter.

This almost invariably leads into a challenge that seems easy in theory but the practical execution is always plagued with unforeseen hurdles, crap, unexpected mishaps and other random elements.

It's not that hard to draw parallels between software development and the crazy Top Gear challenges, which is probably a reason that many coders who've never been behind a wheel enjoy watching them.


This would be easy in a graph db. Just move the node and its children come along for the ride.


Not exactly. If you look a bit closer, you'll see that this isn't just a strict child, otherwise they might have managed by swapping a foreign key or something. In a graph db, you'd still have to reconnect dangling edges that no longer make sense, like for comments and users. Also, the transaction rollback issue. Which is to say: tanstaafl.


Doesn't need to be a strict child when using a graph (not to confuse with hierarchical) database. Not sure what you mean by dangling edges, at least the Neo4j graphdb guarantees that edges are always connected at both ends. It's also ACID compliant so rollbacks are supported. Sure, tanstaafl, but when dealing with data in a graph structure there's much better tools than RDBMS nowadays!


The comments and user nodes are moved with the parent. It's like moving a folder with subfolders in the filesystem.
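A toy in-memory sketch of the folder analogy, assuming a strict parent/child tree. This is plain Python, not the Neo4j API, and the node names are made up. Moving a node re-points one edge and the whole subtree comes along.

```python
class Node:
    """A tree node that knows its parent and its children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def move_to(self, new_parent):
        # Re-point a single edge; descendants are not touched at all.
        if self.parent is not None:
            self.parent.children.remove(self)
        new_parent.add(self)

    def root_name(self):
        # Walk up to the root to find which project this node belongs to.
        node = self
        while node.parent is not None:
            node = node.parent
        return node.name


project_a = Node("Project A")
project_b = Node("Project B")
milestone = project_a.add(Node("Milestone"))
todo_list = milestone.add(Node("To-do list"))
comment = todo_list.add(Node("Comment"))

milestone.move_to(project_b)
owner = comment.root_name()
```

Edges that point outside the moved subtree, such as a category that belongs to the source project, are not handled by this and would still need reconnecting by hand.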



if I were to sum up the counter-reaction to a lot of folks here who think that the claimed difficulty is too high:

You're right, under the right conditions. Namely: small site, early in lifecycle, and a perfectly normalized and non-sharded/non-partitioned database.

But I suspect it's more complicated because:

1. they have a ton of traffic

2. they want to maintain a seamlessly perfect UX everywhere

3. their database is denormalized, sharded and/or partitioned

plus possibly:

4. it's gotten at least a little crufty with age (shrug, it happens)


Great post but note that the total complexity/LOE of adding that feature would be significantly reduced if (1) it was delivered earlier in the app's lifecycle (less users, simpler code, lower expectations, less/no customers, etc), and/or (2) quality/UX standards were relaxed somewhat.

I agree that even a seemingly simple feature can be hard to deliver, but on the flip side there are dials a developer can turn to adjust his LOE up or down as desired. Tradeoffs as always.


> quality/UX standards were relaxed somewhat

Yow.


why "yow"?


Quality and UX appear to be what 37s is selling; if they skimped on that there'd be nothing left.


In the same vein, I highly recommend the article "How many Microsoft employees does it take to change a lightbulb?". This was featured in Joel Spolsky's "The Best Software Writing 1" (http://www.amazon.com/gp/product/1590595009?ie=UTF8&tag=...).

Link to article: http://blogs.msdn.com/b/ericlippert/archive/2003/10/28/53298...


Nice affiliate link. ;-)



