I guess you could call it "attempted merger".
The purpose of a breakup fee is to compensate the target for costs associated with the failed acquisition and various other losses or expenses. Mostly, however, a breakup fee compensates the target for the business opportunities it didn't pursue while the acquisition was being attempted.
In cases where the acquirer is a direct competitor to the target, the breakup fee guarantees that the acquirer can't simply look at the target's books and assets, walk away, and use that knowledge to clone the target's business, without paying for it.
Even though the deal could still fall through, I could easily imagine Mike taking a day or two to celebrate instead of jumping back in and answering questions, particularly on the instagram-api list, since nurturing a slowly-growing ecosystem is not Instagram's best path to a successful exit at this point. I think he did it simply because he's a nice guy.
I also did the same with all the folks who posted technical details on their experience dealing with the AWS outage. Great stuff to hold on to for future reference.
http://news.ycombinator.com/item?id=3306027 (223 points, 220 days ago), so I'd say this is a repost. Though I have no idea how the same link can be reposted to HN.
Guess they figured that one out in less than 221 days.
Can someone who knows more about various DBs opine on this? Is it better to run your own DB, as Instagram seems to be doing, or is relying on SimpleDB good enough if you don't need such high performance?
Also, as happens with many startups, how easy or difficult is data migration when you try to scale and need to scale fast?
SimpleDB is a non-relational store that automatically indexes everything, but can only store up to 10GB. It's very flexible but limited, good for prototyping.
DynamoDB is like SimpleDB's grown-up version. At greater cost and with more up-front configuration, it scales automatically to huge workloads. It's still a non-relational store, with the drawbacks that implies.
Relational Database Service (RDS) is literally managed MySQL instances: Amazon spins them up for you and handles configuration, backups, and restores. One of RDS's primary value-adds is that it can automatically partition your data across multiple EBS volumes (like hard drives), which helps get around the relatively low I/O performance of EBS volumes.
So pick your poison -- if you don't need high performance, SimpleDB or a small RDS instance will work; it depends on whether you want relational data or not. I can't speak to the difficulty of migrating; we stuck with running our own MySQL instances from the start.
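To make the "more up-front configuration" point about DynamoDB concrete, here's roughly what creating a table looks like in Python with the boto3 SDK (the table name, key, and capacity numbers are just placeholders):

    # Minimal sketch: create a DynamoDB table with provisioned throughput.
    # Table name, key schema and capacity values are hypothetical.
    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")
    dynamodb.create_table(
        TableName="photos",
        KeySchema=[{"AttributeName": "photo_id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "photo_id", "AttributeType": "S"}],
        # The up-front part: you declare read/write capacity in advance.
        ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    )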
For a lot of reasons (that I can enumerate if you'd like), we think running your own DBs is the best option.
The longer answer is that we don't use RDS because it relies on EBS, and we do not trust EBS for any critical applications. Instead, we put our data on instance storage (aka "ephemeral" storage).
This has two big disadvantages:
a) portability: you can't detach the drive and move it to a new instance like you can with EBS -- to clone or back up, you have to copy over the network, which is much slower (and obviously, if you kill the instance, you lose the data).
b) storage: you are limited in how big your DB can be. An AWS large instance these days gives you nearly 1TB of instance storage, but if you have a single DB larger than that, you need to use EBS if you're on Amazon. (Of course, if you care about performance and your database is > 1TB, you should probably be looking at sharding across multiple machines anyway)
However, using instance storage has two big advantages that we think outweigh those:
a) performance. EBS is basically a network drive, and total I/O operations per second (IOPS) are punishingly low. If you have a high transaction rate on your database, you're going to really hate it. As I mentioned, RDS tries to mitigate this by using multiple EBS drives, but we consider that a band-aid on a pretty fundamental problem with EBS. Instance storage, on the other hand, is physically local to the VM's host machine and is therefore much faster. (A crude way to see the difference for yourself is sketched after this list.)
b) reliability. After 3 years on AWS, our trust in EBS is zero. It fails too often, and its failure pattern is awful: you tend to lose big batches of EBS drives at the same time, and whenever there has been a major EBS failure, the API used to launch replacement volumes has failed along with it, making replacement impossible. Again, we think this is a fundamental problem with the nature of EBS and unlikely to change.
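If you want to see the IOPS difference for yourself, even a crude random-read loop will show it. This is only a sketch -- the paths are hypothetical, the OS page cache will inflate the numbers, and a serious comparison should use fio with direct I/O -- but it gets the idea across:

    # Crude random-read IOPS estimate. Paths are hypothetical; the page cache
    # will flatter the results, so treat this as a sketch, not a benchmark.
    import os, random, time

    def rand_read_iops(path, seconds=10, block=4096):
        size = os.path.getsize(path)
        fd = os.open(path, os.O_RDONLY)
        ops = 0
        start = time.time()
        while time.time() - start < seconds:
            offset = random.randrange(0, size - block)
            os.pread(fd, block, offset)  # one random 4K read
            ops += 1
        os.close(fd)
        return ops / seconds

    # print(rand_read_iops("/mnt/instance-store/testfile"))  # instance storage
    # print(rand_read_iops("/mnt/ebs-volume/testfile"))      # EBS-backed volume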
So if your MySQL storage is ephemeral, how do you cope with an outage? Replicate it off AWS?
Re: outages, we use multiple replicated servers in different availability zones -- an outage is usually (though not always!) limited to a single zone. For a region-wide outage, we have emergency backups being sent to a different AWS region (east -> west), and if shit completely hits the fan we have off-AWS backups.
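For illustration only (not our actual tooling), the cross-region piece can be as simple as shipping a nightly dump to a bucket in another region. Here's a sketch in Python with boto3; the bucket, key, and dump path are made up:

    # Sketch: copy a nightly MySQL dump to an S3 bucket in a different region.
    # Bucket name, key and dump path are hypothetical.
    import boto3

    s3_west = boto3.client("s3", region_name="us-west-2")
    s3_west.upload_file(
        Filename="/mnt/backups/mysql-nightly.sql.gz",
        Bucket="example-db-backups-usw2",
        Key="mysql/mysql-nightly.sql.gz",
    )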
I know it's very new, so I haven't seen any advice on it yet, whereas I don't think I've ever seen a pro-EBS point of view from people with non-trivial experience with it.
If I had a very large, rapidly-growing key-value application and a shortage of experienced ops engineers that made maintaining my own solution (e.g. a Cassandra cluster) impractical, I would look hard at Dynamo.
However, as a matter of principle I am very suspicious of the lock-in that comes with proprietary solutions, no matter how clever. We try not to buy cloud services that only have one vendor.
Dedicated server pricing is half or less of what Amazon charges, and you get better performance to boot. Seems like a no-brainer to me (but then again, I've been doing "dev ops" stuff since the late 90s and learned many lessons the hard way).
Mark Mayo's thoughts on abstracted block storage are spot on: http://joyeur.com/2011/04/24/magical-block-store-when-abstra...
- I don't think any of us have compared it to PostgreSQL, but I can tell you we have clients doing 500+ queries per second with it, including text search. I've yet to see a DB do good text-based fuzzy matching, and combining two systems (DB + search) via a join is usually slow. YMMV.
- The main goal of the implementation is to add point-based search to text search. It is not a general purpose replacement for an r-tree, etc.
- You are not required to use haversine. The distance function is pluggable and we have other options implemented. Also, in many cases, you have other clauses in your query that restrict down the set of documents that need to be scored by distance.
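For a rough idea of what combining the two looks like in a single query, here's standard Solr spatial syntax via pysolr (the field name, point, and radius are made up, and our own setup may differ):

    # Sketch: full-text query plus a geo filter and distance sort in one request.
    # The "location" field, the point and the radius are hypothetical.
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/places")
    results = solr.search(
        "coffee roaster",              # the text part of the query
        **{
            "fq": "{!geofilt}",        # restrict to a radius around pt
            "sfield": "location",      # indexed lat,lon field
            "pt": "37.7749,-122.4194",
            "d": "5",                  # radius in km
            "sort": "geodist() asc",   # nearest first
        }
    )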
Does anyone have particular insight to share on this? Last I checked, Solr's geospatial search methods are rather inefficient -- haversine across all documents, bounding boxes that still rely on haversine, and Solr 4 was looking into geohashes (better, but with some serious edge cases where they fall apart).
Meanwhile PostgreSQL offers r-tree indexing for spatial queries and is blazing fast.
Am I missing some hidden power in Solr's geospatial lookups that makes them faster/better than an r-tree implementation?
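For comparison, here's roughly what the PostgreSQL side looks like from Python (psycopg2; the table, column, and DSN are made up, and the <-> nearest-neighbor ordering needs 9.1+ with a GiST index):

    # Sketch: GiST index on a point column, then index-assisted nearest-neighbor
    # ordering via the <-> operator (PostgreSQL 9.1+). Names/DSN are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=example")
    cur = conn.cursor()
    cur.execute("CREATE TABLE places (id serial PRIMARY KEY, location point)")
    cur.execute("CREATE INDEX places_location_idx ON places USING gist (location)")
    # Ten nearest places to a query point, nearest first.
    cur.execute(
        "SELECT id FROM places ORDER BY location <-> point(%s, %s) LIMIT 10",
        (-122.4194, 37.7749),
    )
    print(cur.fetchall())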
Having this exposed through an API that is standardized and maintained by someone else is also nothing to sneeze at. I'd trade a bit of performance for that kind of standardization and turnkey use in the right scenario.
I dunno, I can't envision Solr being more efficient than a properly designed RDBMS for these situations. If you were integrating full-text search I'd absolutely believe that to be the case, but...
Also, there's some new thing I don't understand super well, SP-GiST -- do you have any thoughts on that?
As far as I can tell, you take the point's latitude and longitude and interleave the binary bits - so if your record's latitude is 11111111 and your longitude is 10000000, your geohash is 1110101010101010. You index on that, then when you do a spatial search for the point nearest to 11111110,10000011, you look up key 1110101010101101 and a prefix search finds that the closest value in the index is the record you inserted earlier. Presumably you then realize there could be an even closer record at 11111111,01111111, which would have got stuck at 1011111111111111 in the index, so you look there too just in case, take the closer of the two search results, and Bob's your mother's brother.
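The interleaving step itself is a couple of lines of Python, assuming the latitude and longitude have already been quantized to equal-length bit strings as in the example above:

    # Sketch: interleave two equal-length bit strings (lat bit first, then lon),
    # reproducing the worked example above.
    def interleave(lat_bits, lon_bits):
        return "".join(a + b for a, b in zip(lat_bits, lon_bits))

    print(interleave("11111111", "10000000"))  # -> 1110101010101010  (indexed record)
    print(interleave("11111110", "10000011"))  # -> 1110101010101101  (query point)
    print(interleave("11111111", "01111111"))  # -> 1011111111111111  (the "just in case" lookup)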
I suppose that might work pretty well.
Can we stop labeling the set of "not an RDBMS" data storage mechanisms with the stupid fucking "NoSQL" moniker?