
Ask HN: Labeling new datasets as a bootstrapped startup - haggy
Hi all. I'm trying to validate and PoC an idea for a tool/product that centers around the complexities of local city traffic signage. The central idea is to simplify parking in cities by highlighting things on a map like "Street Cleaning Schedules", "Tow Zones with variable parking times", "Parking unavailable due to long-term construction", etc. This product will require some form of data analysis and ML. The initial dataset I was planning to use was Google street views for larger cities and their satellite towns.

My question is, as a lean bootstrapped idea, what services are available to help me label data without breaking the bank? This is not proprietary or overly complicated data, but it can require several forms of labeling. I'm thinking just the basics to start, such as "Has street sign (yes/no)", "Street Cleaning sign (yes/no)" (asked only if "Has street sign" is yes), etc. Eventually I'll need to feed that labeled data into image processing pipelines that can extract what the signs actually enforce.

I know there are various companies out there like Amazon Mechanical Turk and others that employ teams to do this, but I'm not sure I want to sign an AWS contract before I've even validated the idea. Has anyone used these services before? If not, what are the alternatives?

All help is much appreciated. Thanks in advance!
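
For concreteness, the two-level yes/no labels described above (a follow-up question asked only when a sign is present) could be recorded roughly like this. This is only a minimal sketch: `ImageLabel`, `validate`, and the field names are made up for illustration and do not come from any labeling service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageLabel:
    """One labeler's answers for one image (hypothetical schema)."""
    image_id: str
    has_street_sign: bool
    # Follow-up question; None means "not asked" (no sign in the image).
    is_street_cleaning_sign: Optional[bool] = None


def validate(label: ImageLabel) -> List[str]:
    """Return a list of problems; an empty list means the label is consistent."""
    errors = []
    if label.has_street_sign and label.is_street_cleaning_sign is None:
        errors.append("sign present but follow-up question unanswered")
    if not label.has_street_sign and label.is_street_cleaning_sign is not None:
        errors.append("follow-up answered although no sign was marked")
    return errors


if __name__ == "__main__":
    ok = ImageLabel("img_001", has_street_sign=True, is_street_cleaning_sign=False)
    bad = ImageLabel("img_002", has_street_sign=False, is_street_cleaning_sign=True)
    print(validate(ok))
    print(validate(bad))
```

A check like this would let me reject inconsistent answers before they reach any training pipeline, whoever ends up doing the labeling.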
======
sixhobbits
There are a bunch of services that will let you do this.

At your stage, probably put up some notices in your local university offering
an hourly $ amount (or beer/coffee) in return for some manual labour.

Also look at Figure Eight, CrowdAI, Eureka, etc. There are a lot of
competitors in this space.

If you're looking for tools to help with this, look at
[https://prodi.gy/](https://prodi.gy/).

I don't think Amazon Mechanical Turk requires a contract. There are also a lot
of other freelancing platforms out there where you can find low-skilled labour.

Feel free to contact me (details in profile) to discuss more. I am exploring
this area at the moment in any case.

~~~
haggy
I checked out CrowdAI and it appears they mainly focus on road and building
features but are open to working with customers to build out new features? I'm
kind of confused by that one actually.

Prodigy looks interesting if I end up going down the "do it myself" route.
Admittedly I'm no expert in ML yet, so I was hoping to focus most of my
attention there, but maybe labeling my own data to start is the way to go?

Figure Eight reminds me of your typical "consulting while using our tools in
the background" kind of service which might be a way to go.

I'm just thinking out loud, so please correct me if I got any of that wrong :)

------
mtmail
> The initial dataset I was planning to use was Google street views

Does the Google Street View license allow this? The result could count as a
dataset derived from a licensed database.

[https://www.mapillary.com/datasets](https://www.mapillary.com/datasets) is in
a similar business. They have pre-labelled datasets but also work with
developers and researchers to create more labels. There's
[https://www.mapillary.com/app/marketplace](https://www.mapillary.com/app/marketplace)
to submit tasks.

~~~
haggy
[https://www.mapillary.com/dataset/trafficsign](https://www.mapillary.com/dataset/trafficsign)

Found that at Mapillary and I'm going to look into it. Could be useful!

------
codingdave
Almost every city and state DOT already has this data, because they create it
when they place the signs. If your business revolves around having unique
data, this is the wrong path. But if the business model is about what you do
with the data, I'd skip all efforts to gather it yourself and just get it from
your local jurisdictions.

~~~
haggy
Yeah, it's 100% about what I'm doing with the data, not about curating a
unique dataset myself.

