
Real estate agents post their own listings on Craigslist.

There are companies that provide the fancy HTML templates you see real estate agents use, but Craigslist doesn't allow automated posting of listings.

-----


I worked at Placester for a couple of years and built the system that imports data from real estate agencies. When I left last year, we had coverage of around 90% of the MLSs in the US. Most of what you say is right, but here are some clarifications and context:

You don't need to be a brokerage to get access to an MLS feed. Each MLS has its own policies, though, for how you can display the data and what logos, sizing, and text need to be shown on the page with its listings. That makes it unrealistic to build a Zillow/Trulia-style site off of MLS data. Placester builds sites for individual real estate agents, which makes keeping the MLSs happy significantly easier.

Some MLSs are great and will give you access to the data without much hassle; others aren't, and you have to pay a lot of money. Even once you get access, you'll get almost no technical help or useful documentation for integrating with them. And since MLSs are almost never affiliated with each other, you still need to talk to 300+ different companies to get coverage of the US.

There is a standard that most MLSs follow for their data called RETS [1]. I would say about 80% of MLSs use it; the problem is that RETS is a standard in the same sense that CSS was a standard 10 years ago. The original library I wrote for RETS [2] is open source and littered with examples [3], [4], and [5] (to name a few) of inconsistencies across RETS servers.
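
For flavor, a minimal session sketch (this is from memory, so treat the option names as approximate; the endpoint, class, and field names are all per-MLS and made up here):

    require "rets"

    # Log into the MLS's RETS server; credentials and URL are per-MLS.
    client = RETS::Client.login(
      :url      => "http://rets.example-mls.com/login",
      :username => "user",
      :password => "pass"
    )

    # DMQL query for residential listings over $100k. Field names like
    # ListPrice vary across MLSs.
    client.search(
      :search_type => :Property,
      :class       => "RES",
      :query       => "(ListPrice=100000+)"
    ) do |listing|
      puts listing["ListPrice"]
    end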

If you can work through all of that, you're golden. It took us about 1.5-2 years of accumulated experience with how MLSs work to get the integration process down to 1-2 days, with RETS feeds typically requiring no (or minimal) engineering work.

[1] http://en.wikipedia.org/wiki/Real_Estate_Transaction_Standar...
[2] https://github.com/zanker/ruby-rets
[3] https://github.com/zanker/ruby-rets/blob/master/lib/rets/htt...
[4] https://github.com/zanker/ruby-rets/blob/master/lib/rets/htt...
[5] https://github.com/zanker/ruby-rets/blob/master/lib/rets/htt...

-----


Thanks for all your work on ruby-rets! It works like a champ for pulling in listings from MLSPin and CCIAOR. As you mentioned, dealing with all the "certified" RETS vendors is a nightmare. For the uninitiated (and fortunate), RETS has its own querying language called DMQL, which is inconsistent across versions and MLS vendors. Even trivial tasks like importing photos are handled vastly differently across MLS vendors.
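
For the truly uninitiated, DMQL queries are strings that look roughly like this (illustrative field names):

    # Even which operators are accepted differs from vendor to vendor.
    range_query  = "(ListPrice=200000-350000)"                    # numeric range
    lookup_query = "(Status=|A,C)"                                # lookup "OR": Active or Contingent
    since_query  = "(ModificationTimestamp=2012-09-01T00:00:00+)" # changed since a date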

Despite all the technical hurdles, given their resources, I would be shocked if Zillow and Trulia DIDN'T import the majority of their MLS data via RETS. Most MLS providers allow third-party access to their data feeds. There is no way all that listing data is re-entered by agents.

-----


Seconded! RETS is a goddamn nightmare. For years we've used librets, and it's the worst. When we found your Ruby-only library we got it working with our feeds within the day, and I can't tell you how relieved we are not to have to deal with librets compilation.

-----


Hah! Glad to hear some people got some use out of it. It was definitely a huge pain to work through all of that.

-----


That's what Errbit is for. You only use the Airbrake libraries for reporting errors; a locally hosted Errbit instance collects, notifies on, and displays them.

If you've already integrated with Airbrake, you just reconfigure the host it reports to so it points at your own Errbit server, and everything from deploy tracking to error reporting swaps over.
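
With the Airbrake gem, the swap is roughly this in a Rails initializer (host and key are placeholders for your own install):

    # config/initializers/errbit.rb: same Airbrake gem, different host.
    Airbrake.configure do |config|
      config.api_key = "API_KEY_FROM_YOUR_ERRBIT_APP"
      config.host    = "errbit.example.com"
      config.port    = 443
      config.secure  = true
    end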

Errbit is an active project (https://github.com/errbit/errbit), and error-reporting libraries don't tend to need much maintenance.

-----


Since you're only using 1GB, you'd pay 9.5 cents a month for the storage, then about $122.76/month for the bandwidth, since the first GB out is free and it's $0.120 per GB afterwards, up to 10TB.

That makes it around $122.855/month total. This change doesn't really reduce your costs by much, since most of the bill is bandwidth, which hasn't changed.
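
Worked out (assuming the ~1TB/month of transfer implied by the parent comment):

    storage   = 1 * 0.095            # 1GB at $0.095/GB-month, standard storage
    bandwidth = (1024 - 1) * 0.120   # first GB out is free, $0.120/GB up to 10TB
    puts storage + bandwidth         # ~= 122.855 ($/month)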

http://aws.amazon.com/s3/pricing/ (standard storage pricing hasn't been updated yet)

-----


I become more and more amazed at the price of storage, but in the same breath I am still horrified at the price of data at scale.

Right now I have a VPS at Carat Networks to throw crap on, and I pay $15/month. For that I get 50GB of space and 500GB of transfer. I understand the speed and reliability are greatly improved with S3, but as a simple file host it still makes sense for me to throw things on a VPS or low-end dedicated server at 1/10 to 1/3 the cost of the projection above.

-----


" … I am still horrified at the price of data at scale."

This isn't "the price of data at scale"; this is "the price of flexible, reliable, available data at scale".

I think what some people don't understand is that Amazon _aren't_ trying to compete on price.

With Amazon, you're paying a premium for the ability to scale, both up and down, very quickly.

Rackspace, Linode, and some-guy-subletting-racks-in-some-local-datacenter can easily beat EC2 prices for "general purpose servers". What Amazon does differently is let you quickly and easily go from 1 "server" to 10 or 100 servers, then switch most of them back off again 4 hours later. I deal with a great local hosting guy, who can (and does) fast-track provisioning for me at times, but if I called him and said "Ummm, the CEO is on Oprah tonight, I need 100 additional webservers, a load balancer or two, and a dozen database slaves to keep my not-architected-for-scale-but-suddenly-in-need-of-it web app alive at 8:30pm tonight", there's no way he'd be able to do it. And even if he _could_, there's no way he'd agree to it if I said "and I only want to pay for it all until midnight, then shut all the extras down and go back to charging me for my single instance".

"$1000 per terabyte per year" might seem crazy expensive if a sensible alternative for your data storage requirements is to go to BestBuy and grab a 2TB external drive for ~$100. But that's a _very_ different thing to what Amazon are selling...

-----


I would rather pay 1/5th the price of Amazon for all the other days, when I am not on Oprah, though.

Our current CDN provides for $4k per month what Amazon would charge $18k for.

Yes, that $4k is on a 12-month contract that we had to negotiate, and we're paying for about 4 times the bandwidth per month that we're actually consuming at the moment. But it's just so much cheaper overall, and the bandwidth we don't consume each month rolls over to the next. (We plan to consume it all one day!)

I firmly believe that the vast majority of AWS customers are paying for flexibility that they are not actually using 99% of the time.

-----


This is 100% accurate. My point was that S3 is good at scaling infinitely, but you can use low-end hardware to scale storage up to a point. That approach will fall over quickly for media companies, but for most web apps that need an image host or CDN, it'll go a long way at a fraction of what AWS charges. My problem, I guess, is that I see younger companies looking at AWS, Linode, and Rackspace as the _only_ solution, and I think that's unwise.

I realize I'm slowly going off-topic, sorry! AWS still rocks, and I wish I could use my Amazon gift cards there.

-----


Comparing a VPS to S3 is apples to oranges. The redundancy, backups, and scale S3 provides over a single VPS are very valuable to people. This argument is silly on a post about S3. Yes, if you want a cheap webserver to dump things on, you can get that. You can also get an EC2 instance with plenty of space pretty cheaply just to dump things on.

-----


S3 is not optimally priced for consumer file storage; I'd think it's better priced for web apps and such, where speed and reliability are super important.

If your web server is on EC2, you also get the lower latency of having everything in one place.

-----


Very true, and that's why S3 is excellent, but I still feel there's a lot of value in using low-end servers until you're running at a large enough scale that redundancy actually matters (not just "we should use this because it's what everyone else is doing").

-----


But the ultra-low-end tiers are free.

Your scale is limited only by your wallet. You don't have to refactor your infrastructure as your service grows.

-----


Not to forget that S3 keeps multiple copies of that data.

-----


How much of that 500GB of transfer do you actually use every month?

-----


I use around 100GB of it, but at $15/month I don't consider the other 400GB wasted; it's just available for tunneling.

-----


This is the reason companies like your host can offer 500GB, or terabytes, or even "unlimited bandwidth" for such a low price: they sell to a lot of people and pray that 90% of them won't even come close to using their full bandwidth allotment.

If everyone paying for 500GB were using anywhere close to 500GB at that price, the company would go bankrupt very quickly.

-----


I thought I would run the numbers.

http://drpeering.net/white-papers/Internet-Transit-Pricing-H... is a good rough estimate of transit costs on the internet.

For 2012, it's around $2.34 per Mbps per month.

Testing the math at break-even: $15 / $2.34 ≈ 6.4 megabits/second of transit. 6.4 megabits/second sustained for 30 days is about 2,073.6 gigabytes.

So there's enough margin for everyone to use their full 500GB without the ISP going bankrupt. (Yes, I realize they also have costs for servers, cooling, real estate, diesel, staff, security, etc., but this shows we're in the right ballpark, with roughly a 4x margin.)
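
Same numbers as a snippet:

    price_per_mbps = 2.34                # 2012 transit, $/Mbps/month (drpeering)
    mbps    = 15 / price_per_mbps        # ~= 6.4 Mbps of transit for $15/month
    seconds = 30 * 24 * 60 * 60          # a 30-day month
    gb      = mbps * seconds / 8 / 1000  # megabits -> gigabytes (decimal)
    puts gb                              # ~= 2076.9; 2073.6 if you round to 6.4 first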

-----


Right, at 500GB it's definitely a reasonable cost; I think VPS providers like the parent's host are much more sensible with what they advertise.

Virtual hosts like Bluehost ("UNLIMITED Domain Hosting, UNLIMITED GB Hosting Space, UNLIMITED GB File Transfer") and Dreamhost ("Disk Storage Unlimited TB + 50GB Backups, Monthly Bandwidth, Unlimited TB"), however, are the ones that are especially bad with their advertised offers (all for around $5-7 a month). Start using even a couple hundred GB of bandwidth or a few GB of storage and they're happy to kick you off for "abusing resources".

-----


Thanks, that's what I figured, but some of the terms, such as "instances", were confusing me.

-----


I started using Stripe recently for a project and have used Braintree extensively for work. Your comparison is spot-on.

There are some small differences, though. Stripe can send credit card details through AJAX to their servers, so the details are never sent to yours, whereas Braintree uses JavaScript encryption, so the encrypted values still pass through your server. But I haven't seen anything yet that only Braintree supports.

The tradeoff with Stripe is that while you get a much simpler API, it will likely cost you more than Braintree, depending on your scale and which cards are commonly used.
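
Roughly, the server side looks like this with the stripe gem (key and token are placeholders):

    require "stripe"
    Stripe.api_key = "sk_test_PLACEHOLDER"

    # The card details went browser -> Stripe via Stripe.js; the server
    # only ever sees an opaque token (the id below is made up).
    Stripe::Charge.create(
      :amount   => 1500,         # in cents
      :currency => "usd",
      :card     => "tok_abc123"
    )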

-----


Besides client-side (JavaScript) encryption, Braintree supports Transparent Redirect (https://www.braintreepayments.com/developers/api-overview), which lets you post data directly to Braintree.

The nice part of sending the data to your servers with client-side encryption is that you can do validation before sending it on to the payment gateway. For example, if you want to ensure everyone enters a cardholder name, you can validate the non-encrypted fields before eating the cost of calling the gateway.

You can do some of these validations in JavaScript, but JavaScript is easy to bypass (Firebug) and less flexible (you have access to a lot more data server-side).
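
A hypothetical sketch of that server-side check:

    # Assumes the braintree gem is configured elsewhere. Field names and
    # the amount are made up; with client-side encryption the card fields
    # arrive as opaque encrypted blobs and pass through untouched.
    def charge(params)
      if params[:cardholder_name].to_s.strip.empty?
        return "Cardholder name is required"    # fail fast, no gateway call
      end

      result = Braintree::Transaction.sale(
        :amount      => "10.00",
        :credit_card => {
          :cardholder_name => params[:cardholder_name],
          :number          => params[:number],           # encrypted blob
          :expiration_date => params[:expiration_date]   # encrypted blob
        }
      )
      result.success? ? "Thanks!" : result.message
    end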

-----


Good info, thank you both.

-----


The aggregation framework is meant to fill the gap left by SQL's SUM, COUNT, AVG, etc. without requiring a full map/reduce. The Hadoop integration is unrelated; it's just a nice little bonus that they added.
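
For example, a SUM/COUNT-style rollup that would otherwise mean writing a map/reduce looks roughly like this with the Ruby driver (collection and field names are made up):

    require "mongo"

    db = Mongo::Connection.new.db("app")   # 1.x-era driver
    # Sum and count completed orders per customer; no map/reduce needed.
    db["orders"].aggregate([
      { "$match" => { "status" => "complete" } },
      { "$group" => { "_id"   => "$customer_id",
                      "total" => { "$sum" => "$amount" },
                      "count" => { "$sum" => 1 } } }
    ])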

-----


Seems that they do (http://www.videolan.org/vlc/download-ios.html), unless it's a different iOS version of VLC from the one that was removed from the App Store.

-----


I'm debating buying an iPad, and not being able to play all media formats is a big concern. Can I drag 720p MKV files onto my iPad, and will VLC play them if I can get it installed via the above? Also, how's the battery life when playing a 720p or 1080p file?

-----


FWIW, my Android Transformer plays everything using MX Player. There's a separate install for the codecs, but it's easy and painless. It's all in the Android Play Store; no sideloading needed.

-----


Perfect! Thank you for finding that! I skimmed their site but obviously not deep enough.

-----


In addition, they changed the private repo indicator so it's no longer a very obvious yellow background, just a single label that says "PRIVATE".

It was actually useful to have a very obvious indicator that your repo was private and nobody else could see it.

-----


It would be nice if "Plans & Pricing" on https://www.mybalsamiq.com/ actually took you to http://balsamiq.com/buy?p=myb rather than to http://balsamiq.com/products/mockups/mybalsamiq#pricingtable, where you have to click another "Plans and Pricing" link.

Looks cool though, will definitely have to check it out.

-----


Fixed, sorry about that.

-----


It was crypt-MD5; calling it "MD5 with salt" is generous at best. They seem to have made the decision to move to crypt-MD5. I don't really have any faith in their ability to secure their servers.

-----


True, but I think this has been fixed, assuming their new site is live:

"The new Mt. Gox site features SHA-512 multi-iteration, triple salted hashing and soon will have an option for users to enable a withdraw password that will be separate from their login passwords."

-----


Which means that they're not using bcrypt, which means they still have no idea what they're talking about and are probably insecure.

-----


They could be using PBKDF2, but if they were, they probably would have said the magic words. Also, the iteration count is kind of important: if it's literally triple-iterated, that won't do much good.
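
For reference, done properly, PBKDF2 over SHA-512 looks something like this in Ruby 1.9.3+ (a minimal sketch; the iteration count and output length here are illustrative):

    require "openssl"

    password = "example only"
    salt     = OpenSSL::Random.random_bytes(16)   # one random salt per user

    # PBKDF2-HMAC-SHA512: the iteration count is the work factor and needs
    # to be in the tens of thousands or more; 3 buys you nothing.
    hash = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, 100_000, 64,
                                      OpenSSL::Digest::SHA512.new)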

-----


Even with the iteration count, SHA-512 is not exactly meant to be slow. They're taking the long way around to try to get the security of bcrypt... without just using bcrypt.

-----


> Even with the iteration count, SHA-512 is not exactly meant to be slow.

Increasing the iteration count is precisely how you intend something to be slow. bcrypt itself uses a default of 2^10 rounds in most bindings. PBKDF2 plus a NIST-studied hashing algo like SHA-512 is a perfectly valid method.
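
With bcrypt-ruby, that work factor is just the cost parameter (a minimal sketch):

    require "bcrypt"   # the bcrypt-ruby gem

    # cost is a log2 work factor: the default of 10 means 2^10 rounds,
    # and each +1 doubles the hashing time.
    hash = BCrypt::Password.create("s3kr1t", :cost => 12)
    BCrypt::Password.new(hash) == "s3kr1t"   # => true; the salt lives in the hash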

-----


Iteration is valid, but what is this about "triple salting"?

Googling "triple salted" sha -gox gives me 13 results, of which 3 are about caramel cupcakes and none are serious evaluations of such an approach. It sounds like homebrew security.

-----


I can't see how it could mean anything at all. Your password is either salted or it isn't; a hash can't really be said to have multiple salts. Maybe they're using different salts in their various rounds of hashing, but I can't see how that would provide any more security.

-----


Not sure why I'm downvoted; SHA-512 is obviously better than MD5, and we don't know the details. The constant spewing that bcrypt is the only way to hash a password is getting old fast.

<edit> Ok, whatever, keep downvoting, fuckers.

-----


The reason you're being downvoted is that this has been explained a fair number of times on HN. The problem with using SHA-* or MD5 for password hashing is that those algorithms are designed to be fast. This means it's relatively easy for a cracker with a dump of the database to brute-force passwords, since they can try gazillions of combinations very quickly. Hell, they can even parallelize the task on EC2 and get it all done in an hour.

By contrast, computing bcrypt takes a significant amount of time and CPU. It's slow. It's designed to be slow. It's designed so that you will need a LOT of CPU power to bruteforce it.

So, no, SHA-512 is not much better than MD5. It's still a fail.
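
You can see the gap on any machine with something like this (absolute numbers vary by hardware and bcrypt cost):

    require "benchmark"
    require "digest"
    require "bcrypt"

    # 100,000 SHA-512 hashes vs. 10 bcrypt hashes; the SHA-512 loop still
    # finishes first on commodity hardware.
    puts Benchmark.realtime { 100_000.times { Digest::SHA512.hexdigest("guess") } }
    puts Benchmark.realtime { 10.times { BCrypt::Password.create("guess") } }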

-----


And bcrypt is better than SHA-512, so why use the inferior option when you don't have to? bcrypt both exists and is free.

-----


Many are forced to use insecure hashing for compatibility with outside vendors. Google email for orgs/colleges has two options for hash exchange (or used to; it may be different now): MD5 and SHA1. So you could not migrate user accounts unless the hashes were MD5 or SHA1.

-----
