I gave a preliminary poster about this a few months ago:
This project is under active development as I wrap up my PhD.
I nevertheless think there are a number of issues raised by the possibility of a human population database, but to be clear: pplapi does not have actual people in it. I am currently writing about some of these issues.
That said, this is very impressive and has my imagination tingling with ways to apply it.
And secondly, I don't believe the author should pander to people who won't even visit the site and instead judge from titles.
I just googled the word "virtual":
not physically existing as such but made by software to appear to do so.
in effect or essence, if not in fact or reality; imitated, simulated.
simulated in a computer or online.
I think part of the problem here is that I've invented something we don't have precise language for, yet. I call the agent population synthetic and I call the database virtual, and that's just the nomenclature I'm using.
I'm absolutely shocked at the number of people who feel deceived that this isn't an actual database of the entire human population.
This is a serious problem, though. My work exists at the intersection of social psychology and artificial intelligence, and these fields do not share enough vocabulary. There are major barriers to communication between "intellectual silos," and I think all of the confusion in this comment thread about the word "virtual" illustrates this point very nicely.
What do we call such a thing? After reading the author's paper, I think "synthetic" is most appropriate because we would be synthesizing a city like NYC, but not a virtual representation of NYC. In fact, I find the word "virtual" in this context somewhat misleading.
I then searched the site and couldn't find a description of what the project was or what data it contained.
Finally I came here to the comments to find out.
I consider the site a failure if it doesn't answer obvious questions about what it is. This is an unfortunate failure that is common in new "lean startup" pages.
In defense against sites that want me to sign up before I know what they are, I simply forget them and their names and never mention them to anyone. It helps me sleep knowing I'm only supporting honest sites that explain what they are and what they do. YMMV.
Me: Mom, y didn't you sign me up for this?
Mom: Cuz you never asked for it...
Me: Uhhhh, I was little. I barely knew anything.
Have you given any thought to the possibility of providing this synthetic population on a finer grained basis? It could be extremely useful if this were available for, e.g., each U.S. county (or even something smaller than that).
Edit: I noticed that each person has a lat/long location. So maybe a different way of framing my question would be, at what level of geographic granularity does this reflect differences in the distribution of the various characteristics? And, assuming that it reflects sub-national variations, have you considered allowing a random agent to be selected within an arbitrary geographic area?
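To make the last question concrete, here is a minimal sketch of picking a random agent inside a bounding box. It is not pplapi's API: the `AGENTS` list, its field names, and both functions are hypothetical, and only the first coordinate pair comes from an agent quoted in this thread. The other two points are illustrative.

```python
import random

# Hypothetical agent records. Only the lat/long idea comes from the thread;
# the field names and the second and third points are illustrative assumptions.
AGENTS = [
    {"id": 1, "lat": 36.073868, "lon": -103.923638},
    {"id": 2, "lat": 40.7128, "lon": -74.0060},
    {"id": 3, "lat": 51.5074, "lon": -0.1278},
]

def agents_in_box(agents, south, west, north, east):
    """Return the agents whose coordinates fall inside the bounding box.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    return [
        a for a in agents
        if south <= a["lat"] <= north and west <= a["lon"] <= east
    ]

def random_agent_in_box(agents, south, west, north, east, rng=random):
    """Pick one agent uniformly at random from the box, or None if it is empty."""
    candidates = agents_in_box(agents, south, west, north, east)
    return rng.choice(candidates) if candidates else None

# Roughly the continental United States
print(random_agent_in_box(AGENTS, 30.0, -110.0, 45.0, -70.0))
```

A linear scan like this would not scale to billions of agents; a real service would presumably need a spatial index, which is outside the scope of this sketch.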
This, again, is really really interesting. When it comes to the fine-grained U.S. data, we may be approaching if-you-dont-build-it-I-will territory. :)
I'd adopt one for $1 to help fund your research.
You need a way to find the one closest to you.
Religion: Muslim 10-15%
Income: $18466 USD
Edit: after browsing more random "agents", I see a lot of 5-year-olds making tens of thousands of dollars per year:
Country: United States
GPS: (36.073868, -103.923638)
Income: $57834 USD
Ahh, to be young.
So any time you lack cross-correlation data (and age vs income isn't a widely available public data source) it will assume the data is uncorrelated, and you'll get this kind of error.
Whether it reduces the utility of the data depends on the use case. I suspect it often will.
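A minimal sketch of that failure mode, under stated assumptions: the age range, the income values (two of which are borrowed from agents quoted in this thread), and the working-age cutoff of 15 are all illustrative, and nothing here is taken from how pplapi actually generates agents. Drawing age and income independently produces child earners; conditioning income on age does not.

```python
import random

# Illustrative marginals, not pplapi's real distributions.
AGES = list(range(0, 90))
INCOMES = [0, 18466, 57834]
WORKING_AGE = 15  # assumed cutoff, for illustration only

def sample_independent(rng):
    """Draw age and income separately, as if they were uncorrelated."""
    return {"age": rng.choice(AGES), "income": rng.choice(INCOMES)}

def sample_conditional(rng):
    """Draw age first, then draw income conditioned on age."""
    age = rng.choice(AGES)
    income = 0 if age < WORKING_AGE else rng.choice(INCOMES)
    return {"age": age, "income": income}

def count_child_earners(agents):
    """Count agents below the cutoff who still report a positive income."""
    return sum(1 for a in agents if a["age"] < WORKING_AGE and a["income"] > 0)

rng = random.Random(0)
independent = [sample_independent(rng) for _ in range(10_000)]
conditional = [sample_conditional(rng) for _ in range(10_000)]

print("child earners, independent draws:", count_child_earners(independent))
print("child earners, conditional draws:", count_child_earners(conditional))
```

The conditional version needs a joint age/income table, which is exactly the cross-correlation data that is hard to find publicly.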
I did think that referencing social networks is a bit off, since this isn't a social network model. We have those; this isn't one.
Is the precision limited to the country level?
Of course, the devil is in the details, so when you zoom in on any individual you can see the "uncanny valley" of how the simulated agents aren't quite right.
Edit: I should have read the documentation first.
> The current database contains 7,171,922,938 agents and is approximately 6.8 TB in size.
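As a rough back-of-the-envelope check on those two figures, the sketch below divides the stated size by the stated agent count. Whether "TB" means 10^12 or 2^40 bytes, and whether the 6.8 TB includes indexes or other overhead, is not stated, so both readings are computed and the result is only an order-of-magnitude estimate.

```python
# Figures quoted from the documentation.
AGENT_COUNT = 7_171_922_938
SIZE_TB = 6.8

# Two possible readings of "TB"; which one applies is an assumption.
bytes_decimal = SIZE_TB * 10**12  # terabytes (SI)
bytes_binary = SIZE_TB * 2**40    # tebibytes

per_agent_decimal = bytes_decimal / AGENT_COUNT
per_agent_binary = bytes_binary / AGENT_COUNT

print(f"bytes per agent (10^12 reading): {per_agent_decimal:.0f}")
print(f"bytes per agent (2^40 reading):  {per_agent_binary:.0f}")
```

Either way, the quoted numbers work out to roughly one kilobyte of storage per agent.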
Thanks for the report!
What does age 0 (unborn) mean, and how are those entries in there?
Do you plan to simulate deaths as time goes by, once an agent's stated birthdate puts their age above expected mortality, and add new "babies" as well? Is that what I'm seeing? If not, it would be cool.
Let's say I was looking for a startup opportunity. What kind of products can I build with this?
Can someone download the entire data set in JSON format, even for a small fee?