

SQL Server 2005 vs. MySQL for a startup? - tejus

We are wondering whether to use SQL Server 2005 or MySQL as our database. The database will store the following information (for stocks & bonds):

1. Market activity information – price, volume, open, high, low, close for various instruments.
2. Fundamental information – balance sheet line items, income statement line items, news stories, etc.
3. Technical indicator values – commonly used moving averages, RSIs, etc.

The typical use scenario is that a browser will request the following information:

1. Data for a stock/bond.
2. Data for a certain day/time (across a bunch of stocks).

The predominant concern here is SPEED. The application should be able to handle requests very quickly and spit out the necessary data to a .NET client.

Do you have any idea what might be the fastest solution in this situation? Can you ask your coders for any suggestions?

I am tempted to go with MySQL, since it is free versus high licensing costs for SQL Server.

The main reason I am considering SQL Server is a notion in my head that if it's a .NET/Windows application, SQL Server may be faster.

Thanks!
======
JFred
PostgreSQL is free, and possibly less encumbered than MySQL in the long run.
PostgreSQL is fully open source, whereas MySQL is owned by Sun Microsystems
and is not free for all commercial cases. PostgreSQL is.

MySQL gets its speed reputation from the MyISAM table type. But you don't want
that for financial data; you want a stronger database. MySQL has several other
table types that are more robust and standards-compliant, but not as fast.

PostgreSQL is plenty fast for large installations when well tuned. And it has
good referential integrity, a long enough history that you can trust it, and
stored procedures. MySQL is simpler to get started with.

I've never heard that SQL Server was faster, but I'd expect it to have very
nice integration with .NET, since it's from the same company. That nice
integration will undoubtedly lock you into their database forever, though. And
who knows how their price will change?

Whichever you choose, you will gain if you have at least one person around who
has used the DB (and, for MySQL, the table type) before, even once in a while.

Paid consultants are available for any database you choose.

------
iamelgringo
I tend to be fairly tech-agnostic, so even though I tend to prefer a Unixy/OSS
environment for web apps, I researched MS offerings for a startup idea last
summer. I'd say that if you're going with a .NET stack, you're probably better
off going with MSFT if you can afford it.

Have you considered an Action Pack subscription? If not, you might want to
check into it:
[https://partner.microsoft.com/Indonesia/program/managemember...](https://partner.microsoft.com/Indonesia/program/managemembership/actionpack/mapscontents)
You get a +lot+ of MSFT software for around $300. Granted, the license is only
for two years, but that should be enough to get a startup off the ground. If
you don't have enough money to buy MSFT licenses after two years, you have
bigger problems than licensing costs.

And as long as you're willing to keep going the MSFT way, there are ways to
get cheap bulk subscriptions to MSFT licenses; you just have to keep up with
the MSFT certification process. They actually make it pretty easy to start
your own software company based on the MSFT stack.

I really can't speak to the speed of SQL Server vs. MySQL, however. It sounds
like most of the data in your DB is going to be text-based. I'd be more
concerned about speed if you were going to be serving video or a lot of
graphics out of your DB.

One option you might consider before investing a year or two in a DB is
downloading a trial version of each RDBMS and running some tests.

MSSQL Server 2005 download: <http://msdn2.microsoft.com/en-us/bb498185.aspx>

MySQL 5 download: <http://dev.mysql.com/downloads/mysql/5.0.html#downloads>

Your schema sounds pretty simple. So design your schema and run some tests;
find out how quickly your server will run with each DB. I bet that if you
spent a week or two running tests like that, the time spent would more than
pay for itself in the long term. Platform and DB choices tend to lock you in
for a long time, and if you hate your platform, that's a headache you'll have
for quite a while.
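A trial run like the one suggested above can be sketched in a few lines. Here sqlite3 stands in for whichever RDBMS is on trial (swap in the MySQL or SQL Server driver to compare); the schema, symbol names, and row counts are made-up placeholders:

```python
import sqlite3, time

# Tiny trial harness: load a toy price table, then time point lookups.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (symbol TEXT, day INTEGER, close REAL)")
rows = [("SYM%d" % (i % 100), i // 100, 100.0 + i % 7) for i in range(10_000)]
con.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
con.execute("CREATE INDEX idx_sym_day ON prices (symbol, day)")

start = time.perf_counter()
for _ in range(100):
    result = con.execute(
        "SELECT close FROM prices WHERE symbol = ? AND day = ?",
        ("SYM42", 7)).fetchall()
elapsed = time.perf_counter() - start
print("100 point lookups: %.4f s" % elapsed)
```

Repeating the same script against each candidate database's driver gives a rough apples-to-apples comparison on your own schema.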

------
ggruschow
Skip both of them.

Just keep it all in memory, with a (very frequent) periodic write of the
entire dataset to a file named after the time you saved it: write/read a block
of memory (a big array) to/from disk. What's better is that modifying or
accessing this data is a normal, straightforward memory access; very likely
less work than writing SQL queries and very, very likely easier to unit test.
So this should not only be faster, it should also take less work.

You probably only care about ~10,000 US stocks & bonds, with maybe 50 numbers
per instrument, all of which are represented well by a 64-bit double (most are
probably fine in a 32-bit float... yes, even US bond prices; they trade in
32nds, so binary works fine). 10,000 instruments x 50 numbers x 8 bytes = 4M
bytes. All three dimensions could increase by an order of magnitude and the
answer would probably remain the same or very similar.

You mentioned two requests, and a primary goal of SPEED.

1. Data for a stock/bond: it's in memory. Heck, there's a good chance the
entire current dataset fits in your CPU cache. You just can't beat that (well,
you don't need to; put down that VHDL tutorial).

2. Data for a certain day/time (across a bunch of stocks): just read the file
off disk for that day/time. Even if this isn't immediately up to your needs,
normal optimizations should buy you tons of performance here: buy RAM and
disk mirrors. The data should be contiguous on disk, so you should have very
minimal seeking (which is normally the speed killer). You could come up with
something that beats this by pulling out fewer than all the stocks at once,
but your complexity would skyrocket. This should at least beat most MySQL or
SQL Server 2005 setups that haven't had a metric ton of tuning.
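The scheme described above can be sketched roughly as follows; the instrument and field counts come from the sizing estimate in this comment, while the flat field layout and the snapshot file naming are illustrative assumptions:

```python
import array, os, tempfile, time

N_INSTRUMENTS = 10_000   # rough US stocks & bonds universe (from the estimate above)
N_FIELDS = 50            # numbers tracked per instrument

# The whole current dataset as one flat block of doubles: about 4 MB.
data = array.array('d', [0.0] * (N_INSTRUMENTS * N_FIELDS))

def cell(instrument, field):
    # Index of one number in the flat block.
    return instrument * N_FIELDS + field

# Modifying or accessing the data is a plain memory access, no SQL involved.
data[cell(42, 3)] = 101.25

# Periodic snapshot: dump the entire block to a file named after the time.
snap_dir = tempfile.mkdtemp()
path = os.path.join(snap_dir, time.strftime("%Y%m%d-%H%M%S") + ".bin")
with open(path, "wb") as f:
    data.tofile(f)

# Request 2, "data for a certain day/time": read that snapshot back in
# one contiguous disk read.
restored = array.array('d')
with open(path, "rb") as f:
    restored.fromfile(f, N_INSTRUMENTS * N_FIELDS)
```

Because each snapshot is a single contiguous write and a single contiguous read, disk seeking stays minimal, which is the point the comment is making.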

I'm guessing you forgot to mention that you need a time series for price and
volume information (if only for a graph to throw up alongside some boring
numbers). Consider handling that separately, especially since all the other
numbers you're interested in don't change anywhere near as often (and those
that do are based on price/volume and can be trivially calculated on the fly).
Some headaches to watch out for, though: if you try opening 10,000 file
handles at the same time, something is likely to get mad at you. Also note
that if you're thinking of opening and closing 10,000 files frequently, you're
likely to send the disk seeking around like crazy updating on-disk metadata.

------
dfranke
It sounds like your dataset will fit comfortably in memcached. Run that on top
of whatever DBMS you trust and whose SQL dialect you prefer to work with. For
me that means PostgreSQL or Oracle, although MySQL 6 looks promising and I
might have to drop my grudge against them once it's released. SQL Server 2005
is very buggy, and sooner or later it will bite you in terrible ways.

Bear in mind when you take this advice that the DBMSs I dislike the most are
also the ones I have the most experience with, so it's possible that
PostgreSQL and Oracle are just as bad and I just haven't discovered it yet :-)

Don't store technical indicators in your database; compute them on the fly.
Memoize them if necessary, but it's probably better not to, since I don't
think you'll be CPU-bound and the added latency will be negligible.
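As a rough illustration of computing an indicator on the fly, with optional memoization, here is a simple moving average; the price series is made up:

```python
from functools import lru_cache

# Hypothetical closing prices, most recent last.
prices = [100.0, 101.5, 99.8, 102.2, 103.0, 101.1, 100.7, 104.3]

@lru_cache(maxsize=None)   # memoization, only if profiling shows you need it
def sma(end, window):
    """Simple moving average of the `window` prices ending at index `end`."""
    start = max(0, end - window + 1)
    chunk = prices[start:end + 1]
    return sum(chunk) / len(chunk)

print(sma(len(prices) - 1, 4))  # average of the last 4 prices
```

Dropping the `lru_cache` line gives the pure compute-on-request version the comment recommends as the default.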

If you're dealing with live quotes, don't store them in your database. Just
keep them in memcached and re-fetch them if your server goes down. Do all your
DB writes at once, at the end of the day after the market closing bell. (Aside
from user-specific data, which obviously needs to be committed immediately.)
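A minimal sketch of the cache-and-batch idea above, with a plain dict standing in for memcached; the feed function, table name, and quote format are all hypothetical:

```python
# Live quotes live only in the cache; lost quotes are simply re-fetched.
live_quotes = {}
pending_writes = []

def fetch_quote_from_feed(symbol):
    # Hypothetical market-data feed call; returns a made-up quote.
    return {"symbol": symbol, "last": 100.0}

def get_quote(symbol):
    # Cache miss: re-fetch from the feed, never from the database.
    if symbol not in live_quotes:
        live_quotes[symbol] = fetch_quote_from_feed(symbol)
    return live_quotes[symbol]

def on_quote_update(symbol, price):
    live_quotes[symbol] = {"symbol": symbol, "last": price}
    pending_writes.append((symbol, price))  # queued until the closing bell

def flush_at_close(db_executemany):
    # One batched DB write at end of day instead of a write per tick.
    db_executemany("INSERT INTO eod_quotes (symbol, last) VALUES (?, ?)",
                   pending_writes)
    pending_writes.clear()
```

User-specific data would bypass `pending_writes` and be committed immediately, as the comment notes.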

------
henning
If speed is the overriding concern, see if you can use Berkeley DB, which has
ACID transactions and blows away all relational databases in speed. It sounds
like your data model may be simple enough for it.

You could also consider a mix of the two: use BDB as a kind of cache for rapid
delivery, while the relational database handles non-time-critical processing
and storage.
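As a rough sketch of the key-value style of access BDB offers, the stdlib dbm module stands in for Berkeley DB here (a real deployment would use an actual BDB binding); the keys and record format are made up:

```python
import dbm, os, tempfile

# On-disk key-value store: one record per (symbol, day), no SQL involved.
path = os.path.join(tempfile.mkdtemp(), "quotes.db")

with dbm.open(path, "c") as kv:
    kv[b"AAPL:2008-02-15"] = b"open=100.0,close=102.0"
    kv[b"IBM:2008-02-15"] = b"open=90.0,close=88.0"

# Lookup by key is a single hash/btree probe, which is where the speed
# claim for this style of store comes from.
with dbm.open(path, "r") as kv:
    record = kv[b"AAPL:2008-02-15"]
```

If the data model really is "fetch everything for one instrument/day", this access pattern maps onto it directly.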

------
prakash
Start with an open-source DB (you can't go wrong with PostgreSQL or MySQL). If
you start hitting a wall, look into a database optimized for financial data,
like Kx (<http://kx.com/>).

Do you have any SLA/other constraints on how fast you have to deliver
financial data?

------
kogir
I chose SQL Server 2005 for Loopt, but that's because at the time I made the
choice MySQL didn't meet all my requirements. I should have looked more into
PostgreSQL, but didn't.

In your case, it sounds like MySQL should work just fine. For fast response
times, just partition the data across a lot of cheap servers that can keep it
all in RAM; this approach lends itself to a free database. That said, speed
will be affected much more by the pre-processing you do before inserting the
data (to optimize lookups) and by how you design your schema than by your
choice of database vendor.
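The partition-across-cheap-servers idea might be sketched as a simple hash router; the server names are placeholders, and the choice of CRC32 is an illustrative assumption:

```python
import zlib

# Hypothetical shard names; in practice these would be connection strings
# for the cheap servers each holding a RAM-sized slice of the data.
SERVERS = ["db1", "db2", "db3", "db4"]

def shard_for(symbol):
    """Route a symbol to one server; crc32 is stable across runs/processes,
    unlike Python's built-in hash()."""
    return SERVERS[zlib.crc32(symbol.encode()) % len(SERVERS)]
```

Every request for a given symbol then lands on the same shard, so each server only needs enough RAM for its own slice.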

If you go with MySQL or PostgreSQL, be prepared to fix bugs in the ADO
connectors from .Net to the database. I've used PostgreSQL with .Net and
npgsql needs some work. Perhaps MySQL connectors are better.

I'd start with PostgreSQL or MySQL and only switch to SQL Server 2005 if you
run into problems. The switching cost won't be that high if you avoid stored
procedures and stick to generic ADO.Net functionality.

Microsoft has been a great partner and their licensing model hasn't caused us
any problems, so don't be afraid to go with them if they're the right choice.
You'll take some heat for it though ;)

~~~
dfranke
> If you go with MySQL or PostgreSQL, be prepared to fix bugs in the ADO
> connectors from .Net to the database.

Likewise for SQL Server and ODBC, but then the bugs are in SQL Server so you
can't fix them. However, .NET has a perfectly good ODBC client library. Use
that, not ADO, to talk to MySQL.

~~~
kogir
If you find a legitimate bug in SQL Server, you can open a support ticket, and
Microsoft will fix it. Frequently these fixes will make it into the next
service pack, or if they're severe, a Windows update. This is all at no
(extra) cost to you.

In speed tests I've seen, ODBC was as much as 10x slower than npgsql at
talking to PostgreSQL. I didn't look into why, and just used ADO.Net.

~~~
dfranke
I don't think I ever encountered a bug for which I could have written a
coherent support ticket. They tended to be horrendous mandelbugs that were
difficult to reliably reproduce due to caching. If it were an open-source
project, I could at least have set some watchpoints and sent a useful
backtrace, but with closed source, you're SOL.

------
vanekl
The raw performance difference between SQL Server and MySQL is pretty much a
wash. If you base your decision primarily on that, you are overlooking quite a
few more important issues: hardware (RAID level, RAM, CPU), DB driver,
programming language, and caching all matter more.

Non-speed issues:

1. SQL Server 2000 had locking problems such that many transactions would
deadlock and automatically get rolled back. This was due to page locking; I
think MS fixed this in 2005 with record-level locking.

2. SQL Server has very nice stored procedure support, which can be extremely
useful. If you think you may need this capability, then SQL Server is the
better choice.

If you do decide to go with SQL Server, you will probably end up getting a
copy of VS 2008 and doing some C# programming. It's almost impossible to
maintain SQL Server without it, unless you want to maintain the DB by hand
every day.

It doesn't sound like you have much db experience. Get a couple of experienced
db admins in to talk with you before you make any big decisions, even if it's
just a two hour consult.

------
gscott
I have found that speed depends more on the SQL query you are using and, once
you have the data, on the method you use to display it. If you have highly
optimized queries and you are using arrays or GetString (not sure what they
call those in .NET), then you will be in good shape.

For $42 from Amazon you can get the SQL Server 2005 Developer edition, which
is the "enterprise" version licensed just for development, to get you started:
[http://www.amazon.com/Microsoft-SQL-Server-Developer-2005/dp...](http://www.amazon.com/Microsoft-SQL-Server-Developer-2005/dp/B000BHQ5JW/ref=pd_bbs_sr_1?ie=UTF8&s=software&qid=1203198012&sr=8-1)

Personally, I am sticking with MSSQL 2000 because I know and already own it.
You can get that on eBay for about $800.

I hate that I tend to get out of date on Microsoft stuff; I stick with what I
have and don't experiment once something is working the way I want it. I did
buy the MSSQL 2005 Developer edition because the price is right; one day I
will try it out.

------
carpal
I'd go with MySQL. There are better tools, it is less expensive, and it is
easier to scale. I don't have any straight-up speed evidence other than what
is anecdotal.

If you're worried about the MySQL license, you should try PostgreSQL. I've
been using it for three years and have had nothing but extremely positive
results. Version 8.3 is stupidly fast.

------
bayareaguy
_1. Data for a stock/bond. 2. Data for a certain day/time (across a bunch of
stocks). The predominant concern here is: SPEED._

I'm a big fan of PostgreSQL, but based on your description you might want to
look into one of the existing kdb solutions from www.kx.com. Those guys have
been solving problems in that domain for the last two decades.

------
watmough
Go with what you have experience in. If you build it and it is not fast
enough, then throw caching at it. There probably won't be a huge difference
between most databases, if used effectively.

If you have special requirements, such as true isolation in transactions, be
aware that this is "leading edge" for SQL Server 2005, and probably for MySQL.
On the other hand, Oracle has been doing it for years.

If the data is being streamed in, and timestamped, and stored, you may want to
just hold it in memory (32 Gigs holds a lot of data) or write to flat files.
For simple data types, that are used in a predictable way, not much beats
well-written custom code.

------
DarrenStuart
I would go for SQL Server 2005/8 in a .NET environment.

I have never had any speed problems. Also, if you are going to use the ASP.NET
membership stuff, it makes sense to go with it. There is a MySQL membership
provider, but from my experimentation it's not great.

Also, the cost is not going to be that much, because you can do an awful lot
of processing with one MS database server.

I am sure that with some googling you can find benchmarks of all the major DBs.

I use discountasp.net for my .net apps as a starting point.

One last thing: finding someone to work on your MS SQL database might be
easier.

------
mmmurf
Have you thought of trying Amazon's SimpleDB? I just got my beta login, and it
seems pretty awesome so far.

What about CouchDB? It's still young, but it can be clustered and will
probably way outperform any relational DB for the stuff you're describing.

Last time I did benchmarks (a few years ago), MSSQL totally blew away MySQL
and Postgres on Windows. It turns out this was because I was using a
"dependent subquery", which in my opinion is a much simpler way to write a lot
of queries.

Now that I use MySQL (and Linux and Solaris) exclusively, I don't use
dependent subqueries at all, and all is well.
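For readers unfamiliar with the term, a dependent (correlated) subquery references the outer query's current row, and it can often be rewritten as a join that some optimizers handle much better; a sketch using sqlite3 with made-up quote data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE quotes (symbol TEXT, day INTEGER, close REAL);
    INSERT INTO quotes VALUES
        ('AAPL', 1, 100.0), ('AAPL', 2, 102.0),
        ('IBM',  1,  90.0), ('IBM',  2,  88.0);
""")

# Dependent (correlated) subquery: latest close per symbol. The inner
# SELECT re-runs for each outer row because it references q.symbol.
dependent = con.execute("""
    SELECT symbol, close FROM quotes q
    WHERE day = (SELECT MAX(day) FROM quotes WHERE symbol = q.symbol)
    ORDER BY symbol
""").fetchall()

# Equivalent join rewrite: compute the max day once per symbol, then join.
joined = con.execute("""
    SELECT q.symbol, q.close
    FROM quotes q
    JOIN (SELECT symbol, MAX(day) AS day
          FROM quotes GROUP BY symbol) m
      ON q.symbol = m.symbol AND q.day = m.day
    ORDER BY q.symbol
""").fetchall()
```

Both queries return the same rows; which one is faster depends on how well the engine's optimizer de-correlates the subquery, which is consistent with the benchmark difference described above.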

------
j1o1h1n
The financial industry does not use databases much for this kind of thing. For
keeping records and back-office trade post-processing, yes, but not for market
data. Try Wikipedia: Event Stream Processing, Complex Event Processing,
TimesTen (now inside Oracle), StreamBase... all very expensive. And if you
don't need that kind of speed, then probably all of the databases you listed
are fine; none has a significant (10x) lead over the others. Work with what
matches your skill set.

------
darklighter3
The actual design and implementation of your data model and application are
going to matter far more for performance. My suggestion would be to start on
MySQL and migrate to another platform if that becomes necessary. You also have
to ask what kind of skill sets you have around and what you are likely to have
in the future. A well-configured and well-maintained installation of either
platform will outperform a bad one. So, are you Windows geeks or Linux geeks?

------
xirium
MySQL Server is cheaper and faster. However, MySQL Server is technically only
free if you use it with GPL2 applications or never distribute your app.

~~~
hhm
I was suspicious, but I read MySQL's page on licensing and that seems to be
true. I'm quite shocked by it; I thought you were free to use MySQL in
non-open products without having to buy their commercial license.

~~~
immad
You can use it all you want; just don't distribute it. E.g., you are
prohibited from taking it, making tweaks, and releasing your own DB software.

------
eusman
I would argue that, assuming it's stock trading, your concern should be
transaction safety, so you need a database with ACID support. Beyond that,
which one you choose comes down to your experience.

Speed is really not a concern anymore, because you can use memcached or
another distributed caching system when you really need that much scalability
and fast access to raw data.

------
ubudesign
MySQL on Linux is better than its Windows version, and I believe the Windows
version has a licensing cost. I guess they call it mysql-max and mysql-nt;
maybe that was the older versions.

If you are going with .NET, then your server is presumably a Windows server,
so go with MS SQL.

I would not go with .NET unless other things depended on it and there was a
gun pointed at my head :)

------
sohail
Neither, pgsql

------
ken
Do you have any evidence that the database is going to be your bottleneck, at
this point in the game?

