

Show HN: Crawling car dealerships for a real-time consumer search engine - webtrill
http://www.demanjo.com/

======
ambirex
I have built something very similar to this in the past; here are a couple of
observations:

Scraping:

\- Dealer web sites are run by a handful of different data brokers. For the
most part, if you find a good way to scrape one (say, a dealer who uses
<http://www.dealer.com/>), then you can extend your scraper to get the others.

\- Dealer web sites, in general, are horrible to view.

\- Learn to love VIN explosion/decoding
(<http://www.researchmaniacs.com/VIN/VIN-Decoder.html>). Dealers enter
features in so many different ways that it is your best chance to normalize
the data.

\- Normalize the exterior color: create a distinct list of all the crazy color
names car makers give their cars and all the shorthand dealers use for them.
Create a map of the colors and apply it when scraping.

\- Scraping is like farming: a lot of initial work, and then constant upkeep as
sites change.
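A minimal sketch of the color map described above, assuming a small hypothetical alias table (a real one would be built from the distinct colors actually seen while scraping). Exact matches win; otherwise any single recognized word in the raw string decides the canonical color.

```python
import re

# Hypothetical alias table: canonical color -> raw names/shorthand seen on dealer sites.
COLOR_ALIASES = {
    "black": ["blk", "black", "ebony", "onyx", "jet black"],
    "white": ["wht", "white", "pearl white", "oxford white"],
    "silver": ["slv", "silver", "ingot silver", "billet silver"],
    "red": ["rd", "red", "ruby red", "race red"],
}

# Invert once so each lookup is a single dict access.
_ALIAS_TO_COLOR = {
    alias: canonical
    for canonical, aliases in COLOR_ALIASES.items()
    for alias in aliases
}


def normalize_color(raw: str) -> str:
    """Map a raw exterior-color string to a canonical color, or 'other'."""
    # Lowercase, turn punctuation/digits into spaces, collapse whitespace.
    cleaned = re.sub(r"[^a-z ]", " ", raw.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    if cleaned in _ALIAS_TO_COLOR:
        return _ALIAS_TO_COLOR[cleaned]
    # Fall back to word-level matching, e.g. "Tuxedo Black Metallic".
    for word in cleaned.split():
        if word in _ALIAS_TO_COLOR:
            return _ALIAS_TO_COLOR[word]
    return "other"
```

Unmatched strings land in "other", which makes them easy to collect and add to the table on the next pass.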

Display:

\- Users don't want to search by city as much as by what is close to them; you
should geocode the dealerships and display distance. For instance, if I search
West Des Moines, I would expect inventory in Des Moines to come up too.

\- Add searching by zip code; you can easily find a database of zip code
centroids. It can also be a cheap way of geocoding the dealerships.

\- Switch mileage to miles instead of km; it looks like most of the inventory
is in the US, and that is what users will expect.

\- Use an ip2geo lookup to set the initial location of the search; right now it
looks like it is all over the place. Check whether the browser supports
geolocation and optionally set the initial search from that.
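A sketch of the distance search suggested above, assuming dealers have already been geocoded (for example via a zip code centroid table). It uses the standard haversine great-circle formula; the dealer names and approximate city-center coordinates are hypothetical, for illustration only.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def dealers_within(origin, dealers, radius_miles):
    """Return (miles, dealer) pairs inside the radius, nearest first."""
    hits = []
    for name, coords in dealers.items():
        miles = haversine_miles(origin, coords)
        if miles <= radius_miles:
            hits.append((round(miles, 1), name))
    return sorted(hits)


# Hypothetical dealers, geocoded to approximate city-center coordinates.
DEALERS = {
    "Dealer A (Des Moines, IA)": (41.5868, -93.6250),
    "Dealer B (Chicago, IL)": (41.8781, -87.6298),
}
WEST_DES_MOINES = (41.5772, -93.7113)  # approximate search origin
```

With a 25-mile radius, a search from West Des Moines returns the Des Moines dealer (about 4.5 miles away) and not the Chicago one, which is the behavior the comment asks for.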

~~~
MatthewPhillips
How did you do bulk VIN explosion? Pay for it (rates are high in my research)
or scrape one of the free sites?

~~~
ambirex
I built my own by scraping some of the free sites. You can get basic make and
year from a plain VIN; then I scraped some of the free sites with breakdowns
of the various VINs to build a map.
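A sketch of what a plain VIN yields without a paid decoder: the world manufacturer identifier (first three characters; mapping it to a make needs a separate lookup table, not included), the model year from position 10, and the position-9 check digit used on North American VINs. The rule that a letter in position 7 selects the 2010-2039 year cycle is, as I understand it, specific to North American light vehicles; treat it as an assumption for other markets.

```python
# Model-year codes for VIN position 10, in order; the 30-character cycle
# skips I, O, Q, U, Z and 0.
YEAR_CODES = "ABCDEFGHJKLMNPRSTVWXY123456789"

# Letter values used by the North American check-digit calculation.
TRANSLITERATION = {
    **{c: i + 1 for i, c in enumerate("ABCDEFGH")},  # A=1 .. H=8
    **{c: i + 1 for i, c in enumerate("JKLMN")},     # J=1 .. N=5
    "P": 7,
    "R": 9,
    **{c: i + 2 for i, c in enumerate("STUVWXYZ")},  # S=2 .. Z=9
}
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]
DIGITS = "0123456789"


def check_digit_ok(vin: str) -> bool:
    """Validate position 9 (the check digit) of a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        return False
    total = 0
    for char, weight in zip(vin, WEIGHTS):
        if char in DIGITS:
            value = int(char)
        elif char in TRANSLITERATION:
            value = TRANSLITERATION[char]
        else:
            return False  # I, O, Q or punctuation never appear in a valid VIN
        total += value * weight
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def decode_model_year(vin: str) -> int:
    """Model year from position 10, disambiguated by position 7."""
    vin = vin.upper()
    year = 1980 + YEAR_CODES.index(vin[9])  # raises ValueError on a bad code
    # Assumption: North American light-vehicle rule; a letter in position 7
    # marks the 2010-2039 cycle, a digit marks 1980-2009.
    if vin[6].isalpha():
        year += 30
    return year


def decode_basic(vin: str) -> dict:
    """Split out what a plain VIN gives you without a commercial decoder."""
    vin = vin.upper()
    return {
        "wmi": vin[:3],  # world manufacturer identifier
        "model_year": decode_model_year(vin),
        "check_digit_ok": check_digit_ok(vin),
    }
```

For example, `decode_basic("1M8GDM9AXKP042788")` gives WMI `1M8`, model year 1989 and a valid check digit. Trim and options still need the scraped breakdowns or a commercial data source, as discussed below.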

------
deefour
I think this is nice, and I already found a deal I haven't come across yet on
a vehicle I'm interested in, so thanks!

A few initial thoughts:

\- I can't click the radio buttons themselves within each options group; only
the link/label itself is clickable

\- I would rather the mileage filters be < 10k, < 20k, < 36k, etc. If I can't
have this, I want to be able to select multiple mileage filters at once

\- I want to select multiples of other filters too. Year for example. eBay
allows me to enter "2009-2011" or "2009-". I can only view one year at a time
on this site.
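A sketch of accepting the eBay-style range syntax mentioned above ("2009-2011", "2009-"), plus "-2011" and a single year. An open end is represented as `None`, so the same check also covers cumulative filters such as "< 36k miles" (`(None, 36000)`) and a price filter with either endpoint absent. The function names are hypothetical.

```python
def parse_range(text):
    """Parse '2009-2011', '2009-', '-2011' or '2010' into (low, high).

    None means that end of the range is open.
    """
    text = text.strip()
    if "-" not in text:
        value = int(text)
        return value, value
    low_s, high_s = (part.strip() for part in text.split("-", 1))
    low = int(low_s) if low_s else None
    high = int(high_s) if high_s else None
    # Be forgiving if the user types the bounds backwards.
    if low is not None and high is not None and low > high:
        low, high = high, low
    return low, high


def in_range(value, bounds):
    """True if value satisfies both (possibly open) bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)
```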

\- It took me a minute to realize I had to select a make before a model (yes,
I feel stupid for this); showing an empty stub where the options will appear
once I have selected a make is not intuitive - how about not showing the not-
yet-ready filters until they are relevant?

\- Seeing the "Contact Dealer" red button directly below the phone number made
me initially think I was going to be calling the dealer. I finally clicked it
after seeing no other place to view the dealer's own listing. I'd make getting
to the dealer's site a bit more prominent

------
jbattle
A few thoughts:

\- I _think_ CarFax has an affiliate program that might let you link through
to their reports, keyed by VIN. These reports are a pretty valuable tool to
shoppers, as the condition of a used car drives 50% of the buying decision.

\- One data point that is really important to dealers is how long the car has
been on the lot. Hypothetically you could just track how long the same car has
appeared at the same dealership. The longer the car has been sitting there,
the more motivated they'll be to sell it. I remember hearing 30 days is a long
time for dealerships to sit on a car. For buyers this could be good
information to have.
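A sketch of the "track how long the same car has appeared at the same dealership" idea, assuming each crawl yields the VINs currently listed per dealer and that crawls are recorded in date order. The 30-day threshold comes from the comment's recollection, not a verified industry figure; the class and method names are hypothetical.

```python
from datetime import date

# Assumption taken from the comment: roughly 30 days is "a long time" on the lot.
STALE_DAYS = 30


class LotTracker:
    """Remember the first crawl date on which each (dealer, VIN) pair appeared."""

    def __init__(self):
        self.first_seen = {}

    def record_crawl(self, dealer_id: str, vins, crawl_date: date) -> None:
        for vin in vins:
            # setdefault keeps the first recorded date; crawls are assumed
            # to be recorded in chronological order.
            self.first_seen.setdefault((dealer_id, vin.upper()), crawl_date)

    def days_on_lot(self, dealer_id: str, vin: str, today: date):
        first = self.first_seen.get((dealer_id, vin.upper()))
        return None if first is None else (today - first).days

    def is_stale(self, dealer_id: str, vin: str, today: date) -> bool:
        days = self.days_on_lot(dealer_id, vin, today)
        return days is not None and days >= STALE_DAYS
```

Keying on (dealer, VIN) rather than VIN alone means a car moved to another dealership starts a fresh count.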

\- Getting the mobile experience right would be a huge win. So often the cars
the dealer lists aren't actually the cars they have on the lot (I'm not sure
how much of this is intentional and how much is a matter of how fast inventory
turns over). So when you get to the dealership, you want to quickly and easily
be able to comparison shop; get that right and you cover people in that
critical, uncomfortable moment.

------
Kynlyn
What about trim levels? Trim levels are crucial in the buying decision. For
example, if you search for a Ford F-150 on your site, there is no way to see
if it's an XLT, King Ranch, Raptor, etc. The price and equipment difference in
those vehicles is huge.

What about options and equipment? Does the car have navigation? Sunroof? Most
consumers have specific option packages in mind when searching for a car online.

This is why VIN explosion is necessary for any serious automotive shopping
site. If a consumer can't narrow the vehicles down to a trim and option
package level then it won't get wide adoption.

~~~
bfung
VIN explosion doesn't always return truck trims, as many times the actual
truck beds are added after the vehicle rolled off the assembly line.

So they will probably need more than VIN explosion for that; there are some
companies that do provide that data though.

~~~
Kynlyn
Trucks are certainly the most difficult to decode. But if you are using
something like ChromeData for the VIN data, and combine that with the info
from the dealer's site then you can usually narrow the vehicle down to a
specific trim level.

Not always however. Dealers frequently have incorrect or missing information
on their websites, so garbage in, garbage out.

This is why scraping dealer websites for data is always going to be
problematic. Far better to work with the providers to have them send you the
data. It's faster, easier and you get far better data.
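One way to read the "combine VIN data with the dealer's site" approach above, as a sketch: take the candidate trims a decoder returns for a VIN and keep only those the dealer's listing text actually names, falling back to all candidates when the text says nothing useful. The candidate list and listing strings are hypothetical, and simple keyword matching is an assumed technique, not how ChromeData or any provider works.

```python
import re


def narrow_trims(candidate_trims, listing_text):
    """Keep decoder trim candidates that the dealer's listing text mentions.

    Falls back to all candidates when the listing names none of them, since
    dealer text is often missing or wrong (garbage in, garbage out).
    """
    text = listing_text.lower()
    matches = [
        trim for trim in candidate_trims
        # Word boundaries stop "XL" from matching inside "XLT".
        if re.search(r"\b" + re.escape(trim.lower()) + r"\b", text)
    ]
    return matches or list(candidate_trims)
```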

~~~
bfung
I used to work for a competitor in the same space as AutoRevo =) Chrome was
OK, but there was another provider that had exact VIN matches in their
catalog. It was a bit more expensive, but it made the trim field a non-issue.
I don't remember the name; it's been too long.

~~~
Kynlyn
It might have been AutoData, but they have merged with Chrome. Edmunds has a
decoder, but it's pretty laughable. I'm not aware of any other major players
in that field outside of those.

Chrome offers 1-to-1 matches from VIN to style IDs for most OEMs, but it's an
additional cost.

------
harryf
Suggest a map that shows which dealer and where a car is located. There's a
subset of car buyers where the car itself is less important than the garage
they bought it from, in case they need to bring it back for repairs / tuning
etc. This is especially true of used cars, and it's very important on mobile.

~~~
viraptor
Maybe it's something country-specific, but why would you do repairs in the
same garage? Or did you mean the dealer's guarantee period?

~~~
pc86
For folks who lease cars (a poor financial decision long-term, but lower
monthly payments) you need to take the car back to the dealership you bought
it from at regular intervals (3-10K miles).

------
dmckeon
Overall, impressive data, but UI/UX could be improved.

'Refine By' menu shows an arrow on mouseover, but the user can only click on
the words, not the arrow or the highlighted area.

'Refine By' pop-out menus show a square, value as hyperlink, count. Square
looks like a checkbox (with rounded corners) but clicking on square produces
no result (this user expected a checkbox response, & ability to select
multiple models, colors, etc.)

Change of other filters does not require a 'go' button, but change of search
radius does.

Results appear to have a wide radius, but pull-down for location does not show
what default/pre-selected radius is (by experimentation, 10 miles).

Generally, the user should be able to tell what filters are currently active,
and what their values are.

Clear Price filter action appeared to clear all filters.

US states and placenames with multi-word names should have all elements
capitalized: s/New mexico/New Mexico/ s/San luis obispo/San Luis Obispo/

Price filter should allow either end-point to be absent.

Mileage should allow arbitrary range end-points, like price.

Year should allow a range, or checkboxes.

Consider having Refine by Make hide the less popular makes behind a 'more'
button. So, by default display the top N makes & 'more'; 'more' displays the
top N*2 (or all) makes & 'more', then the top N*4 (or all), the top N*8 (or
all), etc. (otherwise the menu may grow to many dozens of obscure makes)
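The doubling reveal suggested above can be sketched as a pure function of how many times 'more' has been clicked, assuming makes are already sorted by popularity (for example by listing count). The function name is hypothetical.

```python
def visible_makes(makes_by_popularity, n, more_clicks):
    """Return (makes to show, whether a 'more' button is still needed).

    Shows the top n makes, then n*2, n*4, n*8, ... after each click.
    """
    limit = n * 2 ** more_clicks
    return makes_by_popularity[:limit], limit < len(makes_by_popularity)
```

Keeping the state as a click count rather than a list of shown items makes it easy to carry in a URL parameter.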

Support 'open in a new tab' on the hyperlinks shown below the first search box.
Searching <color> <make> <letter> displayed 3 links with counts, but displays
nothing - sigh. (perhaps the back-end function is not yet implemented?)

Again, impressive data.

------
stblack
Great job! Nicely done.

A suggestion, also a source of competitive advantage: Allow selecting more
than one Make per search. Seems nobody does this. I would like to see all SUVs
except those by the big North American manufacturers. To do this I need to
execute many separate searches.

~~~
webtrill
Thanks!!

Will deploy multiple selections of filters by week's end.

------
nkozyra
Servers couldn't handle it.

~~~
pc86
I really wish people would test their sites before doing this. I can
understand if someone else submits your stuff and you had no idea it was going
to happen (but even then...), but if you create an account with no history for
the explicit purpose of submitting to HN as the OP did, you should at least
test it under some semblance of load.

~~~
mootothemax
_I really wish people would test their sites before doing this._

I'm conflicted; whilst ultimately you're right, it's very easy to type those
words, and not as easy to test for a realistic load.

It's not as simple as throwing ab at your website, you need to use a proper
tool (e.g. JMeter), make sure you're testing realistic user behaviour (even
basics such as whether images and CSS have an impact), and ensure that you're
not getting a false sense of security (e.g. how many connections are
_actually_ hitting the server at the same time?).

So yeah, whilst I kinda agree with you, I think it's a lot easier to say than
to carry out.

~~~
beachstartup
amplified log playback works best in my experience.

of course, this requires at least some public traffic to play back. i've only
used proprietary tools for this in my personal experience, but i believe
jmeter has this functionality (log sampling).
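A sketch of the scheduling half of amplified log playback, assuming an access log in Common Log Format: parse each request's timestamp and path, then compress the gaps between requests by an amplification factor so the same traffic shape arrives faster. Actually sending the requests (HTTP client, concurrency, ramp-up) is left out, and this is not how JMeter or the in-house tools mentioned here implement it.

```python
import re
from datetime import datetime

# Captures the timestamp, method and path of a Common Log Format line.
CLF_PATTERN = re.compile(r'\[([^\]]+)\] "(\S+) (\S+)')


def parse_log(lines):
    """Return (timestamp, method, path) tuples, skipping malformed lines."""
    entries = []
    for line in lines:
        match = CLF_PATTERN.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        entries.append((ts, match.group(2), match.group(3)))
    return entries


def amplified_schedule(entries, factor):
    """Return (send_at_seconds, method, path) with gaps shrunk by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not entries:
        return []
    entries = sorted(entries, key=lambda e: e[0])
    start = entries[0][0]
    return [((ts - start).total_seconds() / factor, method, path)
            for ts, method, path in entries]
```

A factor of 5 replays an hour of real traffic in twelve minutes, i.e. roughly five times the original request rate.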

~~~
pc86
This seems really interesting. When you say proprietary tools, I assume you
mean more custom-built solutions and less paid tools available to anyone?

~~~
beachstartup
yeah, tools built in-house to test specific apps, not commercial apps

------
Daduck
Just a weird error. If I accept the location tracking and use this url:
[http://www.demanjo.com/new/search?3=3670447604&0=3526732...](http://www.demanjo.com/new/search?3=3670447604&0=3526732295&1=2320905822)

Then it will show the search, but go to another search afterwards.

~~~
webtrill
Hi, can you provide more details about this error? What location are you in?

Works perfectly for me from my location.

That would be very helpful.

------
sologoub
Searched for WRX, but got a bunch of Hondas. Looks like Honda is a fallback if
model is not recognized...

------
machilin
It's clean and fast. How do you fetch the data and what's the revenue model?
Cheers.

------
agopaul
Funny, I made something very similar when I was looking to change my car 2
years ago or so, but didn't open source it. The UI was awful, but it had email
notifications when a search query matched.

------
dorolow
Seems very useful (looking for a car right now), but the filter by price range
function seems to return zero results with no regard to the values entered.

~~~
webtrill
Noticed, deploying a fix.

Thanks!

------
mgl
How did you manage to come into agreement with car dealers to crawl their
sites and use their photos in your aggregator? Congratulations!

~~~
carbocation
I would guess they did so in the same way that Google asks every website
operator before crawling and caching. (I.e., I suspect they didn't come to any
explicit agreement. If Google doesn't, why should they need to?)

~~~
Samuel_Michon
Given that Google has been sued over that countless times, as a small startup
I would be wary of following their example.

~~~
lincolnq
This is probably the wrong attitude towards founding startups. In general, you
shouldn't unnecessarily risk the business -- but if people are throwing
roadblocks in your way, lots of startups seem to generally do pretty well when
they play fast-and-loose with rules. The logic is that nobody's going to
bother to sue you until you get big and can defend yourself. Obviously taking
a big risk like this isn't ideal, but you shouldn't let it stop you from
moving forward with a business.

------
bd_at_rivenhill
I'm seeing a fair few duplicates in the results, probably need to work on the
algorithm for filtering these out.

~~~
webtrill
Noticed! Some dealerships are operating under several distinct domain names,
which creates these duplicates. We are currently working on a solution for
this.
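One possible approach to the duplicate problem, sketched under the assumption that listings carry a VIN: collapse listings that share a full 17-character VIN and keep the other domains' URLs as alternates. The field names are hypothetical, and this is not necessarily the fix webtrill shipped. One trade-off: a car legitimately listed by two different dealers (e.g. during a dealer trade) would also be merged.

```python
def dedupe_listings(listings):
    """Collapse listings that share a VIN, keeping the first one seen.

    Each listing is a dict with at least "url" and, ideally, "vin".
    Duplicates' URLs are kept in "alt_urls" on the surviving listing.
    """
    merged = {}
    no_vin = []
    for listing in listings:
        vin = (listing.get("vin") or "").strip().upper()
        if len(vin) != 17:
            no_vin.append(listing)  # can't dedupe safely without a full VIN
            continue
        if vin in merged:
            merged[vin]["alt_urls"].append(listing["url"])
        else:
            merged[vin] = {**listing, "vin": vin, "alt_urls": []}
    return list(merged.values()) + no_vin
```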

------
lauri
There is a similar site in Russia: <http://auto.yandex.ru/>

------
pirer
Is there any reference for designing classified websites, in terms of
usability and experience?

------
negrit
How come there are no cars in San Francisco? Maybe you should think about
parsing Craigslist?

~~~
webtrill
Our crawler is currently in motion, and I see about 239 cars in SF.

We are solely focusing on dealership websites.

~~~
negrit
The only suggestions I get when I type "San Fra" are: San Clara, MB; San
Josef Bay, BC; San Josef, BC; San Joseph Bay, BC.

~~~
webtrill
Your IP address resolves to Canada.

You can only search for cars in your country.

Will have to change this if required at some point

~~~
negrit
That's even more weird because I'm in France, using a French ISP without any
proxy or whatever and I can see cars from Texas, Nevada, ...

------
thejosh
You should be able to click the image on the homepage to go to the listing.

~~~
webtrill
Thanks.

Will incorporate.

Cheers!

~~~
thejosh
Great, the website is a really good idea!

------
GuriK
Great job. Love the idea. Can you tell us which technologies you used?

~~~
webtrill
Everything is proprietary! That is our big advantage.

Data store --> code name "Saycron", which sits on Demanjo's Distributed File
System.

Indexing --> uses a data structure I call "Octo-tree". It outperforms any
multidimensional index I came across in any academic paper.

Web server --> code name "SlimAPE"; simple, non-blocking.

Web framework --> code name "HerikX".

I will write about all of these technologies later on my blog, whenever I get
around to setting one up.

~~~
inovica
I'd like to get in touch. Looked at your profile and no email so can you drop
me one if you get the opportunity? Thanks!

