

IMHO you would be much better off just finding someone to deal with all this. A contractor or a partner.

[Edit: read: someone experienced in this.]


I need to understand the technical requirements first, but thanks for the suggestion.


You already posted this, and you are making comments from what look like your IRC logs or comments from the previous post.

http://news.ycombinator.com/item?id=361730


This is a "follow-up thread" to the link you posted. I already explained this in the first post in this thread and also in the previous thread.

Look, there are lots of sub-topics to deal with in this project, and the previous thread was totally messed up and disorganized -- meaning I couldn't follow any of the sub-topics effectively. That's why I re-posted some of the comments as separate posts in this thread ... so I can continue to discuss them in a clearer, easier-to-follow manner.

Is there something wrong with doing this that I'm not aware of?


I think you are saturating the channel. Why didn't you do one thing at a time, maybe one a day?

I think the responses you are getting here would cost at least a few thousand dollars from a decent contractor.

I've never seen posts in this form on HN, at least in the last few months. As a new user I try to understand the micro-culture and etiquette of the group before posting, rather than just doing things the way I think they should be done.

Just a thought.


Saturating the channel? Hmmm, I'm not sure what this means. If I'm not mistaken this is a forum where people try to help each other and try to learn and understand new things. If people don't care about this topic they certainly won't respond or even look at it, so I hardly see how I can be 'saturating the channel' as you claim I am.

Unfortunately it seems you would rather complain and find fault than contribute something useful. Your suggestion that I go somewhere else and pay someone with experience to 'deal with all this' won't work when I need to gain at least a general understanding of how all these issues relate to each other first!

It would be much nicer if you would stop trying to derail my attempts to learn something here, and just "remain silent and observe" if you don't have anything useful or constructive to offer.

But complaining about the way I've separated sub-topics by posting them separately in this thread, or about the way I've quoted comments from the previous thread so people don't have to go back and review them in the old thread anymore ... well, it just seems like you're trying to find fault with my efforts any way you can -- and it strikes me as precisely the wrong thing to do in a forum like HN, that's all.

:(


Business Detail #1: IT'S A LIVE ONLINE AUCTION

I'm not at liberty to discuss the industries or specific applications in which this platform will be used so please don't ask. But I can tell you that the platform is structured like a live online auction with some unusual / unique requirements:

Each auction will run for about an hour, beginning and ending at a pre-specified time. In the specific industry where we will launch this platform we can absolutely limit every auction to 200,000 concurrent bidders, and we expect only about 100,000 of them to be logged in and bidding at any one time.


What jurisdiction will the auction be under? In most countries auctions are governed by various laws.

I worked at an online, realtime auction startup in the UK in '99 (they folded) - and they encountered a whole raft of legal challenges.

If such laws apply, they might influence the design. You might be in a legal predicament if (a) the system core dumps, (b) someone sues and you don't have a log of each bid, or (c) both.

This might be a challenge for your in-memory approach. You might need to keep a redo log that can be used to replay/verify the auction after the fact.

EDIT: ps. The startup I was at used this approach. In memory, with redo logs. The client was a Java Applet - JavaScript wasn't an option in those days.

Is it real money changing hands? If so you need to be concerned about security, e.g. you might need SSL or some other cryptographic protection. HTTPS should be fine with the JavaScript approach, but it will be a big wallop on your server requirements... You might want to consider an edge device that has SSL acceleration.


Legal and financial issues are outside the topic of this thread, but I will say this:

If the auction proceeds to completion the RAM data is written to disk. This file becomes the 'auction log'. If the auction fails to complete because of technical issues we simply purge the data from RAM, re-schedule the auction and try again later.

There's no need to write data to disk during the auction itself because of this 'try again' policy if anything screws up. I also think this eliminates the need for a separate re-do log since each bid can be time-stamped and stored in the auction log that's written after the auction ends -- but this brings up a question:

Can timestamps be created to the nearest 1/100 of a second?


Fair enough - just something to be careful of.

In the industry/jurisdiction I was working in, this was an issue - the view was that there was a liability if the system crashed. Perhaps in your case this doesn't apply or doesn't matter (if they are low cost items, you might just give them away for example).

Anyway. In our case a bid didn't apply until the redo log had been written. So it was pseudo-transactional in that sense.


> Can timestamps be created to the nearest 1/100 of a second?

Sure - but you need to be careful of the hardware and operating system involved. Windows tends to give lower accuracy - but there are builds of Linux that are the same.
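For example, something like this reads the clock to the nearest 1/100 of a second on a POSIX/Linux box -- just a sketch: clock_gettime reports nanoseconds in the struct, but the real accuracy depends on the kernel and hardware clock source, and older glibc needs -lrt at link time.

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec ts;
        if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
            perror("clock_gettime");
            return 1;
        }
        /* seconds since the epoch, plus hundredths of a second */
        printf("%ld.%02ld\n", (long)ts.tv_sec, ts.tv_nsec / 10000000L);
        return 0;
    }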


Thanks jwilliams, I'll more than likely be using a high performance Linux system so hopefully it will give me the precision I need.


I'll more-or-less repeat my advice from the other thread..

Keep the current highest bid, along with its timestamp and owner, in a very simple data structure in RAM. In another section of RAM, keep a data structure for each user that tracks their bids, the number of bids they have used in the auction, and their current bid.

When a new bid comes in, first look up the user's record and check whether the bid is legal, then pass it on to be compared with the current highest bid. This is your critical section: if the bid is higher than the current one, update; otherwise reject.
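Roughly, as a sketch (C with pthreads; the names, sizes and the "legal bid" check are placeholders, and this only handles the plain highest-bid comparison -- the unique-bid rule is covered further down):

    #include <pthread.h>
    #include <stdint.h>

    #define MAX_USERS         200000
    #define MAX_BIDS_PER_USER    200

    /* One record per user: bids made so far, with timestamps.
       Worst case this is roughly 0.5 GB of RAM at the sizes above. */
    typedef struct {
        uint32_t bid_count;
        uint32_t bids[MAX_BIDS_PER_USER];        /* bid values in cents */
        uint64_t timestamps[MAX_BIDS_PER_USER];
    } user_record;

    /* The single "current highest bid" record, guarded by a mutex. */
    typedef struct {
        uint32_t value;
        uint32_t owner;
        uint64_t timestamp;
    } high_bid;

    static user_record users[MAX_USERS];
    static high_bid    current_high;
    static pthread_mutex_t high_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Returns 1 if the bid became the new highest, 0 otherwise.
       (Assumes at most one in-flight request per user; otherwise the
       per-user record needs its own lock.) */
    int place_bid(uint32_t user_id, uint32_t value, uint64_t now)
    {
        user_record *u = &users[user_id];
        if (u->bid_count >= MAX_BIDS_PER_USER)   /* out of bids: reject */
            return 0;
        u->bids[u->bid_count] = value;
        u->timestamps[u->bid_count] = now;
        u->bid_count++;

        int won = 0;
        pthread_mutex_lock(&high_lock);          /* critical section: keep it tiny */
        if (value > current_high.value) {
            current_high.value = value;
            current_high.owner = user_id;
            current_high.timestamp = now;
            won = 1;
        }
        pthread_mutex_unlock(&high_lock);
        return won;
    }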


Thanks mseebach, I don't think your concept will work (and I've been known to be wrong before) ... but at least I understand how you're approaching the problem, and that helps me to get a better grasp of how I might have to go about it.

It seems to me that I'll have the fastest performance by storing all the bid values in a single 'column' (in RAM of course) to make it faster to find a new unique high bid when someone posts a bid that matches the existing unique high bid. So here's my current concept of how to structure the RAM data:

Create a RAM-based 'table' with only two 'columns', bidValue and bidStatus. The bidStatus column will store a single character such as 'D' for a duplicate bid, or 'U' for a unique bid, or 'H' for the unique high bid. Then go through this procedure when someone posts a new bid:

- See if his bid matches the 'H' bidValue:

- If so, change the bidStatus from 'H' to 'D' then identify the new unique high bid and change that bidStatus from 'U' to 'H'.

- If not, and if there is already a matching bidValue in the table, set that bidStatus to 'D'

- If not, and if there's no matching bidValue in the table, append a new row, then:

-- If the bidValue in the new row is lower than the one in the 'H' row, set the bidStatus to 'U'

-- If the bidValue in the new row is higher than the one in the 'H' row, set the bidStatus to 'H' and change the old 'H' status to 'U'

- Append the new bidValue, timeStamp and bidderID to another RAM-based table designed to store this data for each bidder (to create an archive of the auction).

- Send the status of the member's new bid back to him so he will see it in his browser.


The exercise is to make your critical region (i.e. the section of code that only ONE client can be executing at a time) as small as possible, since it will be the limiting factor.

If you don't think my solution will work, please explain, in terms of business requirements, why it won't - because I can't see what your alternative solution achieves, except being orders of magnitude slower than mine.

For example, you want to search for the current highest bidder. Why not always keep the current highest bidder in the same place? Looking in one place costs 1 cycle; looking in thousands of places (searching) costs thousands of cycles.

Write me an email (see my profile) if you want to continue the discussion. You can NDA me if you want to.


You probably should listen to this advice. (Searching through every bid is going to be painful). The highest unique bid complicates things slightly..

All of your bids have 3 possibilities compared to the current highest bid. (lower, same, higher).

LOWER: lower is easy - it's not a winner. Except for this scenario:

1. $10 --> becomes highest unique

2. $9 --> lower, not a winner

3. $10 --> duplicate with no. 1; no. 2 ($9) is now the highest unique

SAME: same is easy - the old highest unique is no longer a winner.

HIGHER: higher becomes a winner - UNLESS there are already 2 (or more) bids at that value.

Some thoughts - perhaps store three lists and a value?

- sorted list of duplicates lower than the Highest Unique Value (HUV)

- sorted list of duplicates higher than the HUV

- sorted list of uniques

- the current HUV

Take a new bid and compare it to the HUV:

- LOWER: not a winner.

- SAME: not a winner - the HUV is updated to the next highest unique, and the value is appended to the duplicates list.

- HIGHER: compare it to the higher-duplicates list. If it's not in there, update it as the new HUV - winner.

Behind the scenes - I assume you're logging each bid, details, etc.. You wouldn't have to wait to do this in order to return a result.

Have I understood your issue correctly?
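To make that concrete, here's one rough way the HUV bookkeeping could look in C. Note this sketch swaps the sorted lists for a simple count-per-value array (possible because bids are whole cents in a bounded range); the value cap is a placeholder, and the downward rescan is the lazy version -- a real implementation would keep a sorted structure instead.

    #include <stdint.h>

    #define MAX_VALUE_CENTS 1000000      /* placeholder upper bound on any bid */

    static uint32_t bid_count[MAX_VALUE_CENTS + 1];  /* bids seen at each value */
    static int32_t  huv = -1;                        /* -1 = no unique bid yet  */

    /* Scan downward for the highest value with exactly one bid. */
    static int32_t find_huv(int32_t from)
    {
        for (int32_t v = from; v >= 0; v--)
            if (bid_count[v] == 1)
                return v;
        return -1;
    }

    /* Record a new bid and return the updated HUV (-1 if none exists). */
    int32_t record_bid(uint32_t value)
    {
        bid_count[value]++;

        if (bid_count[value] == 1) {
            /* New unique value: it becomes the HUV only if it beats the old one. */
            if ((int32_t)value > huv)
                huv = (int32_t)value;
        } else if ((int32_t)value == huv) {
            /* The old HUV was just duplicated: find the next highest unique. */
            huv = find_huv(huv - 1);
        }
        /* A duplicate of a non-HUV value, or a unique value below the HUV,
           leaves the HUV unchanged. */
        return huv;
    }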


Thanks for the HUV acronym whatusername, it makes describing things easier. I think you have a pretty good understanding of the system now. I'll reword your description in an effort to clarify, and if this is not the way you understand it please tell me so I can try to make a better explanation.

Every bid has 3 possibilities compared to the current HUV: lower, same, higher.

LOWER = not a winner simply because it's lower

SAME = not a winner because it's a duplicate

HIGHER = additional checking required:

HIGHER and DUPLICATE = not a winner

HIGHER and UNIQUE = winner!

That's basically how to determine the status of a new bid. As far as programming tasks are concerned, the value of each new bid must be added to the list of existing bids if it's not already there. Then the status of the new bid must be updated ... and if this changes the status of the HUV, the new HUV must be determined:

Determining a new HUV is where sorting becomes necessary, but I think this may be the ONLY place where a sort is required.

>>> Behind the scenes - I assume you're logging each bid, details, etc.. You wouldn't have to wait to do this in order to return a result. <<<

Yes, logging each bid is important but previously I failed to 'separate' the logging task from the process described above. If the HTTP server can log the cookie value (bidderID) and the post data (bidValue) itself that's all I may need. On the other hand this assumes a new connection to the HTTP server every time a bid is posted -- and if I use Ajax to pass data back and forth without creating these new connections I will also need a separate app to take care of the logging.


Hi Martin,

One reason I said your solution might not work is because it doesn't appear to store the bids, and I need to store every bid posted, and the time it was received, and who posted it. This can be done separately of course but I wasn't thinking in those terms when I replied to your post.

The other reason I thought your approach wouldn't work is because I thought I understood it, but after reading your follow-up post and whatusername's reply it occurs to me that maybe I didn't "get it". I'll email you so we can continue this discussion privately because I'm very interested in understanding how your proposed system might work in conjunction with the business requirements. I can give you more details via email, and don't worry, an NDA won't be necessary.


So did you want to e-mail me? I haven't received anything.


Topic: GENERAL PLAN OF ACTION

lallysingh said: "So, for advice:

1. Ignore database bullshit. You don't need it, it won't help. If you want a DB for other purposes, fine. A snapshot process writing to your DB is fine, just don't put it in the critical path.

2. Build a load simulator. A raw mode that just sends over the handful of bytes, and a cooked mode that bothers to printf' a GET request.

3. Start with a reasonable prototype, and work your way to something performant. Hell, you can probably do it in java if you don't mind buying a few more CPU cores.

4. Integrate as you need for the rest of your requirements. For example, have another box serve the rest of your webapp, and dedicate a stripped down apache box with a custom module for this stuff.

In essence, I'm telling you to treat it as a very smallish HPC problem, instead of some sort of nightmare webapp problem. It fits better, and suddenly you have lots of people/knowledge/COTS equipment available to you."

my reply: Thanks for your outline of the best way to approach this project. I don't understand parts of what you said here but I think I get the general idea. Do others have any suggestions to add to this, or to change it?


Topic: CUSTOM DATABASE STRUCTURE IN RAM?

ig1 said: "I've worked on a number of high volume systems (million+ client interactions/minute), and you don't want a conventional database. Either use a custom data-structure to keep it in memory (even if it's across multiple machines) or if you really want to use a database use one thats designed for that kind of usage (think tickerplant databases, kx, etc.)"

my reply: Others have suggested similar alternatives to using a traditional database program, and right now I'm tempted to learn more about doing this -- unless someone can tell me why I shouldn't?

By the way, I've never heard of tickerplant DBs before but I'll look into them if others think they may provide an even better solution than a custom-coded C application. For those of you who have used tickerplant DBs, what do you think of their use in my situation?


Tech Issue #4 - BATCH PROCESSING?

I've never done batch processing before, but some have suggested it, and I don't understand how it works just yet. My current concept is that I would receive (for example) 15,000 bids during a one-second period and I would store them in RAM as they are received. Then at the end of this one-second period I would have the software process the batch -- which probably means updating the position of the current unique high bid, determining the status of each of the 15,000 bids in relation to this new position, and creating and delivering 15,000 unique HTTP responses. Is this description totally wrong, or is this somewhat close to the way "batch processing" might work for me?
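For illustration only (I'd obviously have an engineer write the real thing), here's a sketch of what that one-second batching might look like in C with pthreads -- the capacity, types and the actual receive/respond code are all placeholders:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    #define BATCH_CAPACITY 20000     /* ~15,000 bids/second plus headroom */

    typedef struct { uint32_t bidder_id, value; uint64_t timestamp; } bid;

    typedef struct {
        bid    items[BATCH_CAPACITY];
        size_t count;
    } batch;

    static batch buffers[2];
    static int   active = 0;         /* buffer currently receiving new bids */
    static pthread_mutex_t batch_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Called as each bid arrives (e.g. from the HTTP front end). */
    void enqueue_bid(bid b)
    {
        pthread_mutex_lock(&batch_lock);
        batch *cur = &buffers[active];
        if (cur->count < BATCH_CAPACITY)
            cur->items[cur->count++] = b;
        pthread_mutex_unlock(&batch_lock);
    }

    /* Called once per second: swap buffers, then process the finished batch
       (update the unique high bid, work out each bid's status, build the
       responses) while new bids keep landing in the other buffer. */
    void process_batch(void (*handle)(const bid *, size_t))
    {
        pthread_mutex_lock(&batch_lock);
        batch *done = &buffers[active];
        active = 1 - active;
        buffers[active].count = 0;
        pthread_mutex_unlock(&batch_lock);

        handle(done->items, done->count);
    }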


Tech Issue #3 - KEEP-ALIVE OR BROWSER PLUG-INS?

If a connection can be opened and remain open for the entire hour-long auction this might dramatically reduce the overhead of individual HTTP connections. Is something like this possible with as many as 200,000 bidders? If so, how many HTTP servers (and what kind) might I need to maintain this many open connections?

If this would require a Firefox plugin on the client end, it is theoretically possible to make this a requirement. Obviously I prefer to avoid this requirement, but if nothing else works it may be a functional alternative.
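From what I've read so far, an event-driven server (epoll on Linux) with HTTP keep-alive seems to be the standard way to hold many mostly-idle connections without any plugin -- the limits are apparently file descriptors and per-connection RAM more than CPU. Here's a bare-bones sketch of what I understand the accept/wait loop to look like (the port is a placeholder, request parsing and responses are omitted, and the fd limit has to be raised for six-figure connection counts):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_EVENTS 1024

    int main(void)
    {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);                   /* placeholder port */
        if (bind(listener, (struct sockaddr *)&addr, sizeof addr) != 0 ||
            listen(listener, SOMAXCONN) != 0) {
            perror("bind/listen");
            return 1;
        }

        int epfd = epoll_create(1);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listener) {
                    /* New bidder connecting: register the socket and keep it open. */
                    int client = accept(listener, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
                } else {
                    char buf[4096];
                    ssize_t got = read(fd, buf, sizeof buf);
                    if (got <= 0) {                    /* client went away */
                        epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                        close(fd);
                    }
                    /* else: parse the bid request and queue a response here */
                }
            }
        }
    }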


Tech Issue #2 - CAN JAVASCRIPT SPEED THINGS UP?

If I understand correctly, some have said JavaScript in the browser can reduce or eliminate the HTTP overhead and dramatically reduce both data transfer and bandwidth requirements. Is this true?

I don't know how to do this but I welcome a simple explanation that illustrates how it might work. Right now I'm thinking that HTTP headers still need to be sent upon each request and they will use much more bandwidth than the data itself.

Is the ultimate solution to find a way to transfer data without constantly opening and closing HTTP connections?


Tech Issue #1 -- WRITING DATA TO DISK

I've always used a database to accomplish tasks like these before, but it seems that what I really need is a faster (RAM-based) way to receive and store every bid value received during the hour-long auction -- along with the bidder's unique ID so I know who made each bid.

Once the auction has ended I can write this data to disk, but there's no need to write to disk during the auction itself -- because if the system screws up I'll have to re-run the auction anyways, and in this case the old bids can be deleted.
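Even to a non-programmer like me it's clear the end-of-auction dump can be very simple -- something like this sketch, where the record layout, file name and CSV format are all placeholders:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint32_t bidder_id, value; uint64_t timestamp; } bid;

    /* Write every recorded bid to a plain-text auction log. Returns 0 on success. */
    int write_auction_log(const char *path, const bid *bids, size_t count)
    {
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fprintf(f, "bidder_id,value_cents,timestamp\n");
        for (size_t i = 0; i < count; i++)
            fprintf(f, "%u,%u,%llu\n", bids[i].bidder_id, bids[i].value,
                    (unsigned long long)bids[i].timestamp);
        fclose(f);
        return 0;
    }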


Have you checked out memcachedb? It's a wrapper around BerkeleyDB that uses the memcached protocol (so support is already available in your language of choice). They're showing benchmark stats that meet your requirements (> 1MM writes per minute on a single box; obviously more boxes = more writes): http://memcachedb.org/benchmark.html

It's simply key-value pair storage but it sounds like that might be all you need.


Hi smoody,

No I've never investigated memcachedb. Lots of the tools people have been suggesting are new to me. I'm reading and trying to learn as much as I can so I can communicate better with folks that have been doing this level of programming for years, but it takes a while since there's a lot to grasp here, and I have to keep it all "in perspective" relative to the business requirements.

Thanks for your suggestion, I'll keep it in mind and mention it to the engineers I discuss this project with when I'm ready to hire them. When an engineer can explain something to me in simple terms that I can understand -- even when I don't really have the background to understand it -- I know I've found an engineer who has the kind of communication skills that make him exceptionally valuable to me.

:)


Topic: LOAD TESTING DURING DEVELOPMENT

owkaye (that's me) said: "This is not a system I can grow into, it must be capable of this performance from the very beginning."

then ericb said: "I would suggest you load test extensively and make load testing a part of your development process from the get-go. Initially, I would load test to evaluate approaches and estimate hardware needs."

and lallysingh said: "Build a load simulator. A raw mode that just sends over the handful of bytes, and a cooked mode that bothers to printf' a GET request."

my reply: I don't know how to do this (or C coding or JavaScript coding) personally, but thanks for pointing out these technical needs and more. Your suggestions will be put to good use when I have a complete concept of the basic requirements, and I can "fill in the blanks" by hiring programmers and buying equipment as I go along.


Topic: MY EMAIL ADDRESS, TOP OF THE PAGE

lallysingh said: "If you're doing something game-ish, talk to me privately. Scalability of video games is my phd topic."

my reply: It certainly seems like I'm doing something 'game-ish' although it's not likely to be used by the typical gamer. I've posted my email at the top of the page so please send me an email and we can discuss the details privately. Others can email me privately as well to discuss this or other topics, thanks.


Topic: ONE OR MORE SERVERS FOR THE DATA?

ig1 said: "Figure out how to partition your data/algorithms so you can split it across multiple machines..."

but lallysingh said: "A single dell box could handle your load if you wrote the whole thing in C on a modern quad-core desktop."

my reply: If I'm not mistaken you guys have a difference of opinion here regarding whether the data can be processed on a single machine. Can you expand on the reasons for this apparent difference of opinion?


Business Detail #2 - LIMITED NUMBER OF BIDS

Every bidder gets the same limited number of bids in a particular auction, for example 200 bids each. This means the server software must "count the bids" for each bidder and disable bidding for every person who uses all his/her bids prior to the end of the auction.


Business Detail #3: ONE MILLION HITS A MINUTE MAX

Each bidder who remains online and participates during the auction may submit as many as 5 new bids per minute, thus requiring the server to "deal with" as many as one million hits a minute.

The auction structure is atypical because the highest unique bid at auction's end wins, and the bidding range is limited to 0-10% of the item's true market value.


Business Detail #4: CLOSED BIDDING WITH LIVE STATUS REPORTS

Every bid is a "closed bid" which means no other bidders know its actual value. Bidders are given the "status" or "position" of their bids in relation to the current unique high bid, but they never see the values of other people's bids.

I must send the answers to these two questions to each bidder every time he posts a new bid:

-- Is his new bid unique or has someone else posted the same bid value previously?

-- Is his new bid > = < the currently unique high bid?

If his last bid = the currently unique high bid, the bidder is in an enviable position because he is currently winning the auction. Other bidders must bid the same exact amount (to the penny) in order to cause him to lose his position as the unique high bidder. When this happens the unique high bid value must be determined by the software again, and with 100,000+ bidders it is likely that someone else will become the new unique high bidder.

If his last bid is not unique or if it is > or < the currently unique high bid, the bidder must post another bid which he hopes will become the new unique high bid. But he does not know the value of other people's bids, all he knows is the value of his own bids and their position relative to the current unique high bid. He must therefore use intelligence, strategy, and quick thinking (game theory?) to determine the value of his next bid, and the one after that, etc., until he succeeds in becoming the new unique high bidder.
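In code terms, the reply sent back for each bid only needs to answer those two questions -- something like this sketch (the count and the current unique high value would come from whatever structure tracks the bids; all names are placeholders):

    #include <stdint.h>

    typedef struct {
        int is_unique;     /* 1 if no one else has bid this exact value      */
        int cmp_to_huv;    /* -1 below, 0 equal to, +1 above the current HUV */
    } bid_reply;

    /* count_at_value: bids seen at this exact value, including this one;
       huv: current highest unique value in cents, or -1 if none exists yet. */
    bid_reply status_of(uint32_t value, uint32_t count_at_value, int64_t huv)
    {
        bid_reply r;
        r.is_unique = (count_at_value == 1);
        if ((int64_t)value > huv)
            r.cmp_to_huv = 1;
        else if ((int64_t)value == huv)
            r.cmp_to_huv = 0;
        else
            r.cmp_to_huv = -1;
        return r;
    }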



