- 50 MB is >> the total cache on most systems you'll use.
- This is far from the only function you need to optimize.
--> The best-performing algorithm in a micro-benchmark can be sub-optimal in real use.
- You've got a fixed-for-the-day list of fewer than 10k unique upper-case strings, each under 9 letters.
- The bulk of the ones you'll process are <= 4 letters (CMCSA/CMCSK be damned).
- The median length is 3 and the mean is < 3.6.
- The symbols aren't anywhere close to uniformly active.
--> Benchmark with a typical data stream, not a uniformly distributed one.
- You're likely not even considering trading most of them.
- How much better is your algorithm than the obvious binary-search?
- Or worse -- plain ol' front-to-back linear search?
- Are you sure you understand 100% of how the computer works?
--> Yes? Mail a check to Goldman and find a better line of work.
--> No? Test it with a quick benchmark.
- Is binary search cache friendly?
- Why not? Plot out the memory accesses if need be.
- What's wrong with taking a linear-interpolated guess? (sketched just after this list)
- Is that cache-friendly? ...or could it be?
- When your guess misses, what's that cost?
- Can you think of a cheap modification to lower that cost?
- Is the distribution really linear?
- What's wrong with a "perfect hash function"?
- Does it need to be perfect?
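A minimal sketch of that interpolated-guess idea, assuming the symbols are packed into uint64 keys so integer order matches lexicographic order (every name here is invented for illustration):

    #include <stdint.h>

    /* Symbols are < 9 upper-case letters, so each one packs into a
     * uint64; short symbols are zero-padded on the right. */
    static uint64_t pack_symbol(const char *s)
    {
        uint64_t k = 0;
        for (int i = 0; i < 8; i++)
            k = (k << 8) | (uint64_t)(unsigned char)(*s ? *s++ : 0);
        return k;
    }

    /* keys[] is sorted ascending. With < 10k symbols the whole array
     * is under 80 KB -- it fits in L2, and 8 keys share a cache line. */
    static int find_symbol(const uint64_t *keys, int n, uint64_t key)
    {
        if (n == 0 || key < keys[0] || key > keys[n - 1])
            return -1;
        uint64_t span = keys[n - 1] - keys[0];
        /* Linear-interpolated first guess; it only has to land near
         * the answer, not on it. */
        int i = span
              ? (int)((double)(key - keys[0]) / (double)span * (n - 1))
              : 0;
        /* Cheap recovery when the guess misses: a short linear walk,
         * which mostly stays inside the cache line the guess loaded. */
        while (i > 0 && keys[i] > key) i--;
        while (i < n - 1 && keys[i] < key) i++;
        return keys[i] == key ? i : -1;
    }

Whether that beats a plain binary search, or a cheap non-perfect hash over the same packed keys, is exactly what a typical-stream benchmark should decide.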
... is anyone really using hardware to do this, by the way?
Hardware regexing at this scale is practically a COTS part, or was when I stopped doing network security products in '05.
This is where it gets tricky. FPGA-based systems don't really give an edge here, and Tilera-based solutions are still nascent.
... because "general purpose cache" wasn't helpful, but "exploitable locality" definitely was.
In later GPUs they switched (mainly due to programmer laziness :-) ) to a more CPU-like, automatically managed general-purpose cache. The problem was that in most cases, being able to quickly write reasonably fast code mattered more to developers than being able to squeeze out every cycle (though I believe it's still possible to disable the cache and manage it manually...)
The problem with an automatic cache is the same as with all "intelligent" CPU features: the CPU tries to predict what the program is doing, so the programmer has to predict what the CPU will predict in order to optimize for that CPU. In the next generation, the CPU vendor will again optimize for the programs currently around. In the end, optimizing becomes much more complex.
There's certainly an advantage to keeping the logic "dumb" and simply having the software manage everything.
Personally, I think specialized hardware is a bad idea.
With 5 bits per letter, the bulk of stock tickers (4 letters or fewer) fit in 20 bits, so the data structure could just be one big direct-indexed array: 2^20 slots of 128 bytes each (the per-symbol struct rounded up from 109) comes to 128 MB.
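To make that arithmetic concrete, a sketch of the layout (the struct contents are placeholders; only the packing and the sizes are the point):

    #include <stdint.h>
    #include <stdlib.h>

    struct sym_state {
        char payload[109];  /* whatever per-symbol state you keep */
        char pad[19];       /* round 109 up to 128 for alignment  */
    };                      /* sizeof == 128 */

    /* 'A'..'Z' -> 1..26 in 5 bits each, 0 padding short symbols,
     * so any symbol of up to 4 letters maps into 20 bits. */
    static uint32_t pack4(const char *s)
    {
        uint32_t idx = 0;
        for (int i = 0; i < 4; i++)
            idx = (idx << 5) | (*s ? (uint32_t)(*s++ - 'A' + 1) : 0u);
        return idx;         /* 0 .. (1 << 20) - 1 */
    }

    /* 2^20 slots * 128 bytes = 128 MB -- one calloc, no hashing,
     * no probing: the packed symbol is the array index. */
    static struct sym_state *table;
    /* table = calloc(1u << 20, sizeof(struct sym_state)); */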
(In 2005, I built a system to do interesting computations on each stock option tick. The volume was pretty high for the time: each time IBM stock, for example, moved a penny, it generated 400 quotes. While small by today's volumes, 100k quotes, each driving an OAS calculation with a binomial matrix, was a challenge. So I am not totally unfamiliar with these throughput issues.)
And Python was used, but only on the GUI side, presenting the results. While I didn't measure it, I feel quite confident in saying that Python isn't going to keep up, at the risk of being a premature optimizer.
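For a feel of the per-quote work: what follows is not the OAS model from that system, just a plain Cox-Ross-Rubinstein binomial call pricer (an OAS calculation wraps a lattice like this inside a root-finding loop over the spread). Multiply something of this shape by 100k quotes and the challenge is obvious.

    #include <math.h>

    /* Plain CRR binomial pricer for a European call -- illustrative
     * only. Requires steps <= 255. */
    double crr_call(double S, double K, double r, double sigma,
                    double T, int steps)
    {
        double dt   = T / steps;
        double u    = exp(sigma * sqrt(dt));
        double d    = 1.0 / u;
        double p    = (exp(r * dt) - d) / (u - d); /* risk-neutral prob */
        double disc = exp(-r * dt);
        double v[256];

        /* Terminal payoffs... */
        for (int i = 0; i <= steps; i++) {
            double ST = S * pow(u, (double)(steps - i)) * pow(d, (double)i);
            v[i] = ST > K ? ST - K : 0.0;
        }
        /* ...then roll the lattice back to t = 0. */
        for (int n = steps - 1; n >= 0; n--)
            for (int i = 0; i <= n; i++)
                v[i] = disc * (p * v[i] + (1.0 - p) * v[i + 1]);
        return v[0];
    }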
Before you optimize the stock look-up function, you might want to look ahead a bit. veyron has commented below or elsewhere that a very important calculation is going to be your total holdings on each tick. This is how you will calculate your risk, or at least the fundamental start.
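A minimal sketch of that running total, reusing the direct-indexed table idea from above (names invented, prices in integer cents). The trick is to adjust total exposure incrementally on each event rather than re-summing the whole book:

    #include <stdint.h>

    struct pos {
        int64_t shares;    /* signed: shorts are negative   */
        int64_t last_px;   /* last seen price, in cents     */
    };

    static struct pos book[1u << 20];  /* indexed by packed symbol   */
    static int64_t exposure_cents;     /* == sum of shares * last_px */

    /* O(1) per tick: adjust exposure by the price move alone. */
    void on_tick(uint32_t sym, int64_t px)
    {
        struct pos *p = &book[sym];
        exposure_cents += p->shares * (px - p->last_px);
        p->last_px = px;
    }

    /* On a fill, mark the old shares to the fill price and add the
     * new quantity's contribution. */
    void on_fill(uint32_t sym, int64_t shares, int64_t px)
    {
        struct pos *p = &book[sym];
        exposure_cents += p->shares * (px - p->last_px) + shares * px;
        p->shares  += shares;
        p->last_px  = px;
    }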
Another thing to consider is your goal in trading. How will you be making money? For example, will you be buying/selling on, say, 10 second intervals and collect profits from that? Or longer intervals? Shorter?
Whatever your overall strategy function is, you will spend some time in that calculation. Will you do it on every tick? At this point its computation time may well be unknown, and if it is an interesting calculation, it may well be complex.
So how is a startup going to deal with this firehose of a problem?
Well, consider splitting up the processing. One obvious way is to filter the tickers you are interested in on one box and push a much more manageable stream to your main logic box. Perhaps start with one frequently-traded stock and tune your algorithms. This will be a rich learning experience.
Then watch the CPU percentage. If it is above 60% at 0910 Central, maybe start profiling your code. The time-consuming parts will surprise you.
If it is going well and you want to expand, consider adding one box for each stock you plan to trade. You can do it for about $200 diskless, quantity one retail. Perhaps put the quote stream, or a divided quote stream, on a multicast channel, two NICs per box (a bare-bones sketch of such a filter/forward box follows). Allocate a pre-determined amount of capital to each stock so the boxes don't need to coordinate to give you your risk calculation.
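A bare-bones sketch of such a filter/forward box, assuming UDP multicast with made-up group addresses and ports, and a pretend wire format where the symbol is the first 8 bytes of each packet:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Hypothetical predicate: does this packet carry the one symbol
     * this box owns? */
    static int symbol_is_mine(const char *pkt)
    {
        return memcmp(pkt, "IBM\0\0\0\0\0", 8) == 0;
    }

    int main(void)
    {
        int in_fd  = socket(AF_INET, SOCK_DGRAM, 0);
        int out_fd = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in local = { 0 };
        local.sin_family      = AF_INET;
        local.sin_port        = htons(5000);
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(in_fd, (struct sockaddr *)&local, sizeof local);

        /* Join the raw feed's multicast group (address made up). */
        struct ip_mreq mreq = { 0 };
        inet_pton(AF_INET, "239.1.1.1", &mreq.imr_multiaddr);
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(in_fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof mreq);

        /* The thinned stream goes out on a second group. */
        struct sockaddr_in dest = { 0 };
        dest.sin_family = AF_INET;
        dest.sin_port   = htons(5001);
        inet_pton(AF_INET, "239.1.1.2", &dest.sin_addr);

        char buf[1500];
        for (;;) {
            ssize_t n = recv(in_fd, buf, sizeof buf, 0);
            if (n < 8) continue;
            if (symbol_is_mine(buf))
                sendto(out_fd, buf, (size_t)n, 0,
                       (struct sockaddr *)&dest, sizeof dest);
        }
    }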
Also, the idea of high-frequency trading is kind of a fractal concept. On the low end, you have a trading situation slightly more active than a day trader's. On the high end, you have the boys who rewrite network stacks in Windows or Linux for boxes co-located with exchange points in New Jersey, where the cables are measured to the inch so everybody sees the tick at the same time. Those are the 100-nanosecond boys. These are the fellows who order servers by the multiple racks of 30. And they trade in a whole different style: their net hold time might average out to zero, but because of agreements they have with the exchanges, they have to stick with the market when it takes a dive.
So don't try to be them at the start.
In summary, success here will come more from the engineering than from the computer science, and from not trying to beat the big boys at their game.
It doesn't construct books or anything else sophisticated, but it may be of interest to ETL folks.
I had thought that, due to the extreme low latency required, a hardware solution would be necessary. So that may be the next place to start experimenting.