Show HN: Trying to pollute unethical America PAC's database with synthetic data (github.com/amarcheschi)
7 points by amarcheschi 48 days ago | 2 comments
Before starting: I posted about America PAC's unethical data collection a few days ago. https://news.ycombinator.com/item?id=41139801

Please read the CNBC story linked there; it explains almost everything.

Long story short, America PAC collects user data while lying about and hiding its true intentions. Apparently, the data they collect might (and probably will) be used for door-to-door canvassing for Donald Trump.

I find this deeply unethical and shady. I thought a nice way to waste the door-to-door canvassers' time would be to pollute the database with fake names, fake emails, and so on.

I took the most common first names and surnames from the web. I build an email by mixing and sometimes shuffling a name and a surname, occasionally adding a year. I get a random real, geocodable address from the random address package, generate a random phone number with an area code from one of two swing states (Georgia and Arizona), and make up a birthday. Then, through Selenium, I continuously fill the database.

Of course, people might argue that this isn't ethical and that it could be done better. The first point is open to debate. The second is true because I did this in the shortest timeframe possible: I read the article yesterday and worked on it just today.

Now, for the unethical part: I feel like what they do is much worse, and I feel justified in doing something that could make their canvassers waste time instead of actually contacting people who signed up without expecting what they were signing up for.

For the "this could have been done better" crowd: you're completely right lol. I don't know whether US phone numbers have specific fixed digits; I just know about the area codes. In Italy we don't have area codes, but the first 1-2 digits are fixed or can only take a few values, so it's easy to spot a fake number.
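On the "fixed digits" question, here is a rough sketch of a plausibility check for US numbers. It assumes the standard North American Numbering Plan layout: a 3-digit area code and a 3-digit exchange that each start with 2-9, followed by 4 digits. The function name `is_plausible_nanp` and the accepted separators are my own choices for illustration, not something taken from the repo.

```python
import re

# NANP layout: NXX-NXX-XXXX, where N is 2-9 and X is 0-9.
# Accepts optional parentheses around the area code and space, dot, or dash separators.
NANP = re.compile(r"^\(?([2-9]\d{2})\)?[\s.-]?([2-9]\d{2})[\s.-]?(\d{4})$")


def is_plausible_nanp(number: str) -> bool:
    """Return True if the string has the shape of a US/NANP phone number."""
    m = NANP.match(number.strip())
    if not m:
        return False
    area, exchange, line = m.groups()
    # N11 codes (211, 411, 911, ...) are service codes, not area codes or exchanges.
    if area[1:] == "11" or exchange[1:] == "11":
        return False
    # 555-0100 through 555-0199 is reserved for fictional use.
    if exchange == "555" and line.startswith("01"):
        return False
    return True


if __name__ == "__main__":
    for candidate in ["(602) 234-5678", "404-123-4567", "404-555-0123"]:
        print(candidate, is_plausible_nanp(candidate))
```

So unlike Italian numbers there is no fixed leading digit, but a number whose area code or exchange starts with 0 or 1 is just as easy to spot as a fake.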

The email generation really sucks too. I basically build the address by randomly choosing whether to use the name, the surname, the year, or all of them (the year can go in different positions, such as the beginning or the end), and sometimes I also shuffle the parts to give it a twist. To me this sucks because these emails are still kinda easy to spot once you've seen a few of them; they could probably be filtered out with a regex.
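To show why I think a regex could catch these, here is a minimal sketch of the scheme described above next to a pattern that matches everything it produces. The name lists, the `example.com` domains, the probabilities, and the function names are all placeholders for illustration; the actual script differs in the details.

```python
import random
import re

# Placeholder data; the real lists are the most common names and surnames from the web.
NAMES = ["james", "mary", "john", "linda"]
SURNAMES = ["smith", "johnson", "brown", "davis"]
DOMAINS = ["example.com", "example.org"]  # reserved domains, for illustration only


def make_email(rng: random.Random) -> str:
    """Build an address from a name, a surname, and sometimes a year."""
    parts = [rng.choice(NAMES), rng.choice(SURNAMES)]
    if rng.random() < 0.5:
        rng.shuffle(parts)  # sometimes swap the order of name and surname
    local = "".join(parts)
    if rng.random() < 0.5:
        year = str(rng.randint(1950, 2005))
        # The year goes either at the beginning or at the end.
        local = year + local if rng.random() < 0.5 else local + year
    return f"{local}@{rng.choice(DOMAINS)}"


# Local part: only lowercase letters, with an optional 4-digit year at either end.
GENERATED = re.compile(r"^(?:(?:19|20)\d{2})?[a-z]+(?:(?:19|20)\d{2})?@")


def looks_generated(email: str) -> bool:
    return GENERATED.match(email) is not None


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        email = make_email(rng)
        print(email, looks_generated(email))
```

The pattern also matches plenty of real addresses, so on its own it only narrows things down. The giveaway is that every generated address fits it, with no dots, underscores, or other separators that real people use.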

I get the addresses through a library called random address. These are public-domain addresses not linked to people or businesses, whatever that means. I've tried a few, and they mostly look like addresses that aren't clearly tied to a home.

Unfortunately, random address only has addresses from 2 of the 6 or 7 swing states, Georgia and Arizona. If someone really has time to lose, it might be worth finding a way to get addresses from other states, as well as fixing things that might be wrong because I don't live in the USA (e.g. the phone number format? You tell me).




I like the impulse to digital direct action, even if too many years programming has me screaming “refactor!” constantly.


I know the code probably sucks, but I was kinda annoyed and it looked interesting to do. So I just did it fast while breaking things, perhaps this time for a good reason rather than to make life more miserable for anyone else.



