TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data (highscalability.com)
73 points by svx on June 27, 2011 | 23 comments



"You own your code and its effects - you design, you test, you code, you monitor. If you break something, you fix it."

I couldn't agree more. The more I work with code and people, the more I realize the importance of this statement. Having separate roles for "Architect", "Programmer", "Tester" and "Support" is excellent if you want to introduce bureaucracy and process into the workplace and consider programmers as just "cogs in a machine". It's also excellent if you want to see how work can grind to a halt.

Let people own their work and they will only produce things that they are proud of.


The other thing it introduces though is "not my code". There used to be a guy we used to call the "Teflon Don". You couldn't get a bug to stick to him. He could always find a way to move a bug to someone else -- even if it came back, he would spend more time trying to figure out how to pass it on to someone else, rather than just fix it.

At the end of the day hire good ppl. They'll do the right thing most of the time. Hire the wrong people and you'll find that your incentive structure will always lead to deviant behavior.


(disclosure - I am the author of this article)

"At the end of the day hire good ppl. They'll do the right thing most of the time." - absolutely agree.

No system is perfect, and ours is not. You need to decide what is important to your organization and emphasize it - the way we do things here at TA has its downsides, but overall, it works very well for us and what we want to do. The benefits for us significantly outweigh the drawbacks.

It is critical to have a boss who understands how software development works, and that mistakes get made. If anything, my boss wants my team to make more mistakes than we do :-)


But how do you find good engineers who are also good graphic designers? That seems to be the sticking point to me.


I think that's a slight misunderstanding of the article. When Andy talks about 'design' he means the engineering part - design of the code, architecture etc. We have a separate team of graphic designers who do just that and aren't engineers.

Disclosure: I'm an engineer at TripAdvisor.


amen


Really good article.

One point I do disagree with is this:

"It is better to deliver 20 projects with 10 bugs and miss 5 projects by two days than to deliver 10 projects that are all perfect and on time."

You never actually have such a choice, but if one did exist I'd pick 10 perfect on-time projects. The problem is I don't know what the 10 bugs are. While I'd like them to all be typos in the footer of the contacts page, they could also be data corruption or a password leak.

If I had ppl that delivered on time with no bugs, but at half the speed, it then becomes my job to prioritize the projects well, so that the half we do is the right half. I think the constraint of being half as fast might even force us to make better choices.


For any project, you're always balancing quality, schedule, budget, scope and risk to try to get the best business outcome. Because of this, I don't believe in any of the hard and fast rules about the "right" way to code. It's far too situational.

That said, you make an excellent point about the importance of project prioritization. A management team who is good at identifying high ROI projects will stomp all over one who sprays pointless change requests at their dev team.


(disclosure - I am the author of this article) I think it is a matter of judgment. For bugs that bring the site down, I agree. For minor bugs that most people will not notice, on a consumer site where we can quickly patch them, I would rather get more projects out. Where you land in between these positions depends on a lot of things.


I've helped my grandmother manage her business listing on there. It has brought many happy customers and for at least 4 months a year we have a full booking of all spaces. However, the product is poorly done. It is hard to navigate and there are no tools to share accounts or have multiple listings within one account. I've also encountered numerous technical errors and limitations that make me think the product might not be secure. It feels like the more recent features are sitting on an outdated and overly complex codebase (just a guess).

So if any TripAdvisor employees are listening - thank you for a product that has worked well for us. But if you can improve it I am sure you could bring in more customers on both ends (managers and travelers).


Hi jokull,

Would like to understand more. Is there a way to contact you?

Andy


jokull@solberg.is


Some reflections:

Culture

  - The "all engineering is organized by business function" approach is a business trick I've seen old dot-coms implement. When you have a business function that goes from making you money to costing you, just fire everyone in that business function and sell it off. You quickly get rid of what's dragging you down and continue your profitable business functions. However, this also creates duplication of some jobs, waste, and an insidious bureaucracy where teams are fighting each other. (I could have read that section wrong, but that's what it reminded me of)

  - Engineering Swaps: Really? You just uproot people so they have to re-learn how something else works and take time away from getting stuff done? Couldn't you just do a couple hour-long knowledge sharing sessions between developers?

Random thoughts

- Don't design too far ahead - doesn't fighting fires and coming up on sudden, unexpected deadline changes due to lack of foresight kind of drag on your employees? You can't keep overworked code monkeys forever. It's one thing to code "quick and dirty" to get a job done. It's another thing not to plan for the future.

- Put end to end responsibility on a single engineer - so when that guy's on vacation nobody knows how to fix the thing he owns that broke?

- Process - Doesn't say if you require change control, but some sort of change alert/control system is really useful to tell people when you're changing something which may affect others, so they can immediately see what could have affected the broken thing and call the person who could have broken it.


(disclosure - I am the author of this article) I think this comes down to judgment.

Design appropriately. Short sighted designs are as bad as ivory tower designs. Every situation is different.

If a person does not own something, no one does - everything comes down to the individual. This does not mean that there is no overlap in knowledge.

Source control is a must. So are scripts that detect changes in your area. So are tests.


"It is far better to do two queries (get the set of reviews with their member ids, then get all of the member from this set of ids and merge it at the app level) than do a join."

Is this way really faster? We've been moving in the opposite direction in a Rails app (from iterating over data in Ruby to joins). We have a fast RDS instance that seems to far outperform our app running on Heroku for complex data manipulation.


(disclosure - I am the author of this article) I am not sure about "faster", but it is more scalable and more flexible - especially considering that the datasets to be "joined" are small, page-sized datasets. We have a lot of different content types, with a central member database. Not having to have all of our content on one machine and not having the memory demand of doing joins allows us to scale more easily. This is not to say we "never" do joins (our member database has multiple tables); it is a matter of finding the most appropriate modularity.
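
The two-query pattern discussed above can be sketched as follows. This is a minimal illustration, not TripAdvisor's code: the table names, columns, sample rows, and the `reviews_with_members` helper are all assumptions, and two in-memory SQLite databases stand in for a content database and the central member database that could live on separate machines.

```python
import sqlite3

# Two separate databases (hypothetical schemas): one content store, one member store.
content_db = sqlite3.connect(":memory:")
member_db = sqlite3.connect(":memory:")

content_db.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, member_id INTEGER, body TEXT)"
)
content_db.executemany(
    "INSERT INTO reviews (id, member_id, body) VALUES (?, ?, ?)",
    [(1, 10, "Great view"), (2, 11, "Noisy room"), (3, 10, "Friendly staff")],
)
member_db.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
member_db.executemany(
    "INSERT INTO members (id, name) VALUES (?, ?)",
    [(10, "alice"), (11, "bob"), (12, "carol")],
)


def reviews_with_members(limit):
    """Return one page of reviews, each merged with its author's name."""
    # Query 1: a page-sized set of reviews, each carrying a member id.
    reviews = content_db.execute(
        "SELECT id, member_id, body FROM reviews ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    if not reviews:
        return []

    # Query 2: fetch only the members referenced by that page.
    member_ids = sorted({member_id for _, member_id, _ in reviews})
    placeholders = ",".join("?" * len(member_ids))
    rows = member_db.execute(
        f"SELECT id, name FROM members WHERE id IN ({placeholders})", member_ids
    ).fetchall()
    names = dict(rows)

    # Merge at the application level instead of joining in the database.
    return [
        {"review_id": rid, "body": body, "member": names.get(mid)}
        for rid, mid, body in reviews
    ]


page = reviews_with_members(limit=2)
```

Because the second query is keyed only on the ids from the first, neither database needs to know about the other, which is the flexibility being argued for here. The cost is one extra round trip per page.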


If they are joining across partitions, then perhaps. If they are forced to localize data to make it joinable, whereas doing the "join" at the app level makes it more horizontally scalable, then that might be an impact.

However, against a single instance, if you can join in your app faster than the database can for a trivial join, something is seriously wrong with your implementation. I have never, ever seen such a case that wasn't one where they should have analyzed their query plan and discovered a monstrous issue they needed to resolve.

In many high performance database systems the IPC to the database level is actually the most expensive operation. Doing two calls instead of one is always a net negative unless you're doing something wrong or fit isolated horizontal scaling scenarios.


(disclosure, I am the author of this post)

Hi hn_decay,

Our use case goes like this: a member database of over 100M records, and a number of content databases each with tens or hundreds of millions of records (reviews, video, lists, wiki, this, that, the other thing; one content table has over a billion records), where all the content records have a member id. Our primary usage pattern is to grab a set of content records (say 10-200 at a time) with their member information.

Putting everything in one database and doing the join there does not scale for us, and severely reduces flexibility. We would need to continue to scale up our hardware to handle the sum of the content sets, and new content sets are being created on a regular basis. By putting these all into different databases you then have the choice (not the necessity) of keeping them on one or more machines. You can put on one machine a bunch of content sets that are relatively small, and put the big ones on their own machine. You can also scale the hardware to individual content sets - infrequently accessed content sets do not have to be on powerful machines, very frequently accessed sets can be scaled on bigger machines.

There are downsides, the two-query hit being the least significant: the extra query on a tuned database is on the order of 1ms. Even if the hit were larger, I would still live with it; scalability != performance.

Andy
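
The placement choice described in this comment (small content sets sharing a machine, large or hot ones getting their own) could be expressed as a simple lookup table. This is only a sketch under assumptions: the thread does not say how content sets are mapped to machines, and the host names, the `PLACEMENT` table, and both helper functions are hypothetical.

```python
# Hypothetical placement map: content set -> database host.
# Moving a content set to different hardware is a one-line change here.
PLACEMENT = {
    "members": "db-members-1",  # central member database
    "reviews": "db-large-1",    # large, frequently accessed: its own machine
    "video": "db-shared-1",     # relatively small sets share one machine
    "lists": "db-shared-1",
    "wiki": "db-shared-1",
}


def host_for(content_set):
    """Return the host that stores the given content set."""
    return PLACEMENT[content_set]


def sets_on(host):
    """Return the content sets currently placed on a host, sorted by name."""
    return sorted(name for name, h in PLACEMENT.items() if h == host)
```

For example, `sets_on("db-shared-1")` lists the sets co-located on the shared machine, and promoting `"video"` to its own hardware would only mean changing its entry in the table, since no query joins across content sets inside the database.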


I was replying to the context of the post, and specifically spoke to horizontal scalability so I am confused that you felt it appropriate to "correct" that.

Having said that, hundreds of millions of records equals a small dataset. I still don't understand when that's held as some sort of edge case when it's easily accommodated on commodity low-end hardware.


I find myself surfing to tripadvisor often when travelling. They would have the potential to build an airbnb right in there..


I think they have.. AirBnB like sites have existed for a while: http://www.flipkey.com/?utm_source=ta&utm_medium=foot...


Full Disclosure: I work at TripAdvisor.

Andy's assessment is accurate and is part of the reason I really, really like it here. The other part is the folks I get to work with.

If you're interested in joining us, drop me a line. Info in my HN profile.


Why does your site use behind-the-window popups even though I have popup blocking explicitly enabled in my browser? If I were hired at TripAdvisor could my first project be to get rid of such scumbag behavior?



