You can find a set of requirements that aren't. E.g. 2-factor can include a phone number. And activity requirements can be based on repo maturity (not just pushing to random empty repos).
And while some bot accounts may have them, I doubt many have most.
Also, you argue on semantics but the general idea of setting up a legitimacy test that factors in various things is very easily doable, the factors can be kept private, and you definitely can find ones that are generally hard to game.
> You can find a set of requirements that aren't. E.g. 2-factor can include a phone number. And activity requirements can be based on repo maturity (not just pushing to random empty repos).
Then you have people complaining about being "shadowbanned" (because there's no recourse if you're a person and the algorithm thinks you're not active enough), or that GitHub is being anti-privacy (by requiring a phone number). It's hard to win here.
I think the point is that these requirements are not published, and they are not requirements to use stars. Anyone can star, no one knows whether their account is contributing to the star count. Now, presumably you could star a thing and check if the number went up, but maybe introduce slight randomness or delay to obfuscate even those details. I remember when reddit removed the total upvote/downvote counts from the UI.
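To make the "slight randomness" idea concrete, here is a minimal sketch of a fuzzed public counter. The function name `displayed_star_count` and the `max_jitter` parameter are made up for illustration; nothing here reflects how GitHub or reddit actually do it.

```python
import random

def displayed_star_count(true_count: int, rng: random.Random, max_jitter: int = 3) -> int:
    """Return the true count plus a small random offset, never below zero.

    Hypothetical helper: the jitter width is an arbitrary assumption.
    """
    jitter = rng.randint(-max_jitter, max_jitter)
    return max(0, true_count + jitter)

# Someone who stars a repo and reloads the page cannot tell from one
# observation whether their own star was counted.
rng = random.Random(42)
samples = [displayed_star_count(100, rng) for _ in range(5)]
print(samples)  # five values, each between 97 and 103
```

This only blurs a single observation. With independent noise on every page view, averaging many reloads would still recover the true number, so a real system would need more than this.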
The point is that this is not arguing on semantics nor is it as simple as just a "set of requirements" that they just follow. Battling fraud online is an entire business in itself. Take Spotify plays, YouTube views, Google search ranking, Amazon reviews, reddit votes, etc. These organizations have significantly more incentives than GitHub to reduce fraud in these metrics, and while they do, it's still really really hard and it's very easy to show how these metrics are gamed/faked all the time.
It's not a matter of "here is a list of requirements that no one knows about, and here is slight randomness/delay to obfuscate".
How much do you think it takes to pay an actual human from a poor country to come to work each day at 8am, create one github account after another, enter them in a database, and leave at 5pm?
If you want to "study" how GitHub handles stars because there is a legitimate financial incentive in it for you, for $100 a day you can pay 10 or 20 of those people to create a few thousand accounts a day. Do it a few times a month, and throw these accounts into an automated system that creates random repos, pushes a few commits here and there, etc. Also "introduce some slight randomness or delay to obfuscate these events". Do some A/B testing to figure out how the 300k accounts under your control affect a repo star system, then advertise a "GitHub stars service": "$0.50 per guaranteed star on GitHub". Your average VC-funded startup could get 10k stars for $5k. They probably give AWS 10 times that a month.
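The back-of-the-envelope numbers above work out as follows. This is only a sketch of the arithmetic: every figure is the rough guess from the comment itself, not measured data, and the variable names are made up.

```python
# All figures are the rough guesses from the comment above, not real data.
DAILY_LABOR_BUDGET_USD = 100    # "for $100 a day"
PRICE_PER_STAR_USD = 0.50       # "$0.50 per guaranteed star"
STARS_WANTED = 10_000           # "10k stars"
ACCOUNTS_CONTROLLED = 300_000   # "the 300k accounts under your control"

def campaign_price(stars: int, price_per_star: float) -> float:
    """What a buyer pays for a given number of stars."""
    return stars * price_per_star

price = campaign_price(STARS_WANTED, PRICE_PER_STAR_USD)

# How many days of the $100/day account-creation labor one order pays for.
labor_days_covered = price / DAILY_LABOR_BUDGET_USD

# The "10 times that a month" AWS bill implied by the comparison.
implied_aws_bill = price * 10

print(f"Price for {STARS_WANTED:,} stars: ${price:,.0f}")
print(f"Days of labor covered by one order: {labor_days_covered:.0f}")
print(f"Implied monthly AWS bill: ${implied_aws_bill:,.0f}")
# An account can star a given repo only once, so the pool size caps
# the stars any single repo can receive.
print(f"Max stars per repo from the pool: {ACCOUNTS_CONTROLLED:,}")
```

So a single 10k-star order at that price would pay for 50 days of the account-creation labor.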
Once github changes their requirements, do more testing, figure out what the requirements now are, then you're back in the game. If people do it all the time to Spotify, YouTube, Google, Amazon, Reddit, and Twitter, why do you think GitHub would somehow crack that nut?
As someone working with people on the other end of this table, I can tell you there’s a limit of risk, clarity and tech complexity that they are ready to bear. And it’s pretty low. It all works for them only because threads like this usually end up with an “it wouldn’t work anyway if I, a six figure guy, had all the time and budget in the world to defeat it, so let’s do nothing” type of non-solution. Which creates a defeatist culture. Paying third-world workers is often economically and structurally unviable for the most low-hanging bot-like activities, and it doesn’t even stay that cheap once the demand grows due to technical barriers. I, being a lot less paranoid and defeatist, also tell these guys that it won’t work because of this and that, but then it works, because the solutions the defending side comes up with are either laughable or from the “so dumb, I feel I’m gonna faint” category. You won’t believe the elephants that can fly under their radar.
> people do it all the time to Spotify, YouTube, Google, Amazon, Reddit, and Twitter, why do you think GitHub would somehow crack that nut?
Because the listed projects do basically nothing, a bare minimum. They don’t even care as long as bots don’t play against their direct interest. Who cares at a media company, or a sales company, who exactly is at their top, as long as they are both not bad enough? Profits come either way. They are all the shittiest examples of it: they created this problem, incorporated it, and are themselves part of it.
It’s akin to an immune system. Its goal is not to protect you from every HIV and cancer, but to avoid constant infections from stupid low-effort attacks. You don’t have to make it perfect, but it must be there. The more cryptic it is, the less welcoming it is to game it through basic means, the better.
> the point is that these requirements are not published
Well-connected people will get the tip off. And your PR team will have to keep batting down conspiracy theories, since if there's one thing the nutters love it's black boxes.
Hmm. I don't have SSH, but have many GH projects, and have been active for a decade. So, I would be filtered out as not an active dev, with the spammers?
I've heard of people gaming GitHub stars by asking their friends to star their projects, which would get around all of your bullets. Hence why I said it would be hard to game.
There are various reasons webs-of-trust don't take off, but I can imagine a system where the metrics I see are only aggregated from friends-of-friends, and any other signal is just considered untrustworthy and therefore not worth observing.
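A minimal sketch of what that could look like, assuming a follow graph stored as a plain dict and treating "friends-of-friends" as anyone within two follow hops. The names `within_hops` and `trusted_star_count` are made up for illustration, and the data is invented.

```python
from collections import deque

def within_hops(follows: dict[str, set[str]], start: str, max_hops: int = 2) -> set[str]:
    """Users reachable from `start` in at most `max_hops` follow edges (excluding `start`)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the trust horizon
        for nxt in follows.get(user, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    seen.discard(start)
    return seen

def trusted_star_count(stargazers: set[str], follows: dict[str, set[str]], viewer: str) -> int:
    """Count only the stars given by accounts inside the viewer's two-hop circle."""
    return len(stargazers & within_hops(follows, viewer))

# Invented example data.
follows = {
    "me": {"alice", "bob"},
    "alice": {"carol"},
    "bob": set(),
    "carol": {"dave"},
}
stargazers = {"alice", "carol", "dave", "bot1", "bot2"}

print(trusted_star_count(stargazers, follows, "me"))  # 2: alice and carol; dave is three hops away
```

Every viewer would see a different number for the same repo, which is the point: stars from accounts outside your own circle simply do not register.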
Do you still trust that system when your friends-of-friends are the ones gaming the system? Given the inherent network effects of manipulating webs of trust, I wouldn't be surprised if everyone had at least one friend-of-a-friend they shouldn't necessarily trust.
Given all the obvious bots and sketchy recruiters that try to connect with me on LinkedIn, who all appear to have at least one mutual connection, it probably won't work.
Do we have a similar issue on GH? I think the nature of the service and its target audience affect this problem in a big way. You can follow anyone on GH, but there's no mutual connection option at all. LI has following and mutual connections. LI also has a much wider audience.
How might a 'connection' look on GH? Will people freely connect, or will they appraise requests more closely?
I can only speak for me personally. For me the way that I use GitHub I don’t think the concept of “friends of friends” would be all that useful on GitHub.
There are a handful of people that I know IRL that I follow on GitHub. And a few hundred that I follow in total. Out of the handful of people I know IRL, and who I follow on GitHub, only two or three of them are active there any given week. All of the other people I follow I have very little idea who they are. Usually I follow people I don’t know if I come across their profile and either the profile itself or their projects make me follow them. But I star way more different repos than the number of people I click follow on.
For me, the main ways of discovering new repos are:
- Frontpage of HN, and comments in posts on HN.
- Specific search results on Google when I have searched for libraries or programs that do specific things.
- Libraries on crates.io that I think might be interesting to look into in the future.
Maybe once or twice a month I happen to click on the main page of GitHub itself and see mentions of repos that have been committed to or starred or created by people I follow.
So for me I don’t think “friends of friends” is a particularly great signal for things to look at. Most of the people I follow, I don’t know much about them.
Likewise, for anyone who follows me, the fact that I happen to follow someone else is not a strong signal that this other person's activity should be shown to my follower or weighted as more significant.
If you do want a strong signal for who to boost for my followers based on my own activity, go and look at the dependencies that I am using in my own projects. That’s a pretty good indicator that I put some amount of effort and interest into looking at something. This could be done by GitHub itself, parsing the Cargo.toml files of my projects and extracting the dependencies section and looking up which of those dependencies are hosted on GitHub.
I can imagine access to raw data instead of some stupid come-on-game-me-able predefined indicator, and that I can run some private statistical analysis over it. People would use (and share) different algorithms and gamers will at least wander through this collectively created mud without any understanding except for the defaultest measures.
But of course this is too complex and “no one will use it” (tm). So we’d better have a screwed up recommendation system that doesn’t work at all, cause that’s simpler!
Maybe so, but in this case, I don't think 'stars' is a good candidate for one of those metrics. I think the people worried about 'fake stars' are doing it wrong, and should just ignore the metric entirely.
Any metric that becomes a target ceases to be a good metric.
The wrinkle is that measures that don't easily quantify are more resistant. For example, showing provable use by other reputable or trusted projects, or a significant amount of resources allocated to maintenance, or ...
Really just anything that can't be reduced to a single number in a canonical way will in the long run prove far more useful for longer.
This of course shifts some of the burden onto potential users to assess things more critically, and forecloses direct numerical comparison. But the idea that you could just look at a number and make such comparisons was faulty from the get go.
Most people are happy living in a fair ecosystem - it's only the 1-2% of the population that seek control, money and power that start trying to exploit the system.
It's only if we let that minority keep manipulating the system without consequences that it becomes the driving market force that the rest of the population also feels they have to comply with, to go along, as has already happened in finance, academia, etc.
I'd push back against the 1-2%. I think the reality is, 1-2% is the group of people who will exploit the system and, more importantly, have the means to do so. The number of people who would exploit the system is probably quite a bit higher, but it doesn't matter because they don't have the means to do so.
I don't really think so. The Amish have a nice system. Their society has many fewer bad actors compared to general society.
Actually one of the keys is repeated contact. People who have to interact again and again will try and game the system less. Not sure how to build that into a star system but why give up so easily? Do programmers give up when you say "this algorithm can't be made any faster?"
I don't think it's just the Amish. Collectivist cultures in general have (or maybe are perceived to have, I don't know) fewer bad actors compared to individualistic cultures.
It doesn't matter if people have to interact frequently if there are no real consequences to that interaction. The punishment in those collectivist cultures involves social shunning, shaming, etc. Individualistic cultures almost pride themselves on how much they can disregard social shunning and shaming. Shameless people are celebrities and elected officials. They are admired as opposed to shunned and ignored. A bad actor in an Amish community is expelled and loses access to what that community offers. That would be illegal in the general society unless their "bad act" was actually illegal. Discriminating against someone for being a dickhead who exploits loopholes and unregulated corner cases (without explicitly breaking the law) would be illegal in many contexts.
> Not sure how to build that into a star system but why give up so easily? Do programmers give up when you say "this algorithm can't be made any faster?"
I don't think people have given up. Online fraud detection is a massive industry as is. Spotify plays, YouTube views, Google search, Amazon reviews, reddit upvotes, twitter's retweets, facebook likes/shares, etc. all fall into exactly the same bucket. There is even a significant dollar amount attached to many of those, more so than GitHub stars. All are frequently gamed/faked, and it's a battle between the platforms and the adversary.
Good points. I'll only add that I just mentioned the Amish because it's the only culture ("subculture?") that I've read thoroughly about. But I think in collectivist cultures it is indeed much harder to be a bad actor. Perhaps we should have a little more shunning...
For example, Tokyo has a lot of people and they actively dislike interacting with strangers, but if you leave your laptop unattended while peeing at a coffee shop, it's very unlikely to be stolen.
Me neither. I don't give a darn about GitHub. It could burn in hell for all I care. But my comment was about this phenomenon in general, not how it affects some Microsoft service.
It's not fighting against human nature, it's fighting against the incentives of our economic system and the people who exploit them. What you're doing here is sometimes referred to as naturalizing - acting like something is the natural state of things, when it is specific to a present social system.
I believe networks of human individuals can solve this to a good degree assuming a particular topology exists.
Like, imagine a decent-sized group of professionals, all specializing in a similar field, and having lots of strong connections between each other where they have ample opportunities to share information. It would be hard for an outsider to come in and astroturf their product without immense effort (like hiring shills to attend conferences). In-person networks also obviously solve the problem of stars as reputation: reputation spreads naturally in these sorts of networks.
I think the problem comes with algorithmic scale. Maybe a solution would be to have more community building activities (maybe preferably offline).
> showing _regional_ stars like Apple/Google would be a start.
What does that mean? I thought regions only impact ranking not the net amount of stars (assuming we're talking about Apple/Google Maps). Which as far as I know, github doesn't do ranking.
People already use github as a professional portfolio. Facebook uses real ID, and how popular is it?
> What does that mean? I thought regions only impact ranking not the net amount of stars (assuming we're talking about Apple/Google Maps). Which as far as I know, github doesn't do ranking.
At least on iOS, reviews and ratings are by country. I don't actually know about Google Play though. (I don't have an Android to check since I am not poor)
I can see github platform internals caring about this for anomaly detection, but as a developer, who cares? I suppose a botnet could be making fake stars on a malware project or supply chain attack, but the problem there doesn't seem like it's the number of stars.