All metrics will be gamed at some point. I don't know exactly how you could even...

nwienert · 2025-01-02T20:11:46 1735848706

Only show "Active Developer Stars" by default:

- Only accounts that have a decent amount of activity (pushing code, commenting, etc)

- Has set up SSH

- Older than 2 years

- Account active consistently for at least a year

- Must have 2-factor enabled

- Filled out profile

etc
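A minimal sketch of what such a filter could look like. The `Account` fields and the "10 active months" threshold are hypothetical stand-ins for the bullets above, not anything GitHub is known to expose or use.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Account:
    # All fields are hypothetical; they mirror the bullet list above.
    created: date
    has_ssh_key: bool
    has_2fa: bool
    profile_filled: bool
    active_months_last_year: int  # months with at least one push or comment


def counts_as_active_developer(acct: Account, today: date) -> bool:
    """Return True only if the account passes every check in the list."""
    checks = [
        acct.has_ssh_key,
        acct.has_2fa,
        acct.profile_filled,
        today - acct.created >= timedelta(days=2 * 365),  # older than ~2 years
        acct.active_months_last_year >= 10,  # assumed meaning of "consistently"
    ]
    return all(checks)


def active_developer_stars(stargazers: list[Account], today: date) -> int:
    """Count only the stars given by accounts that pass the filter."""
    return sum(1 for a in stargazers if counts_as_active_developer(a, today))
```

The raw star count stays available; this would only change which number is shown by default.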

eddythompson80 · 2025-01-02T20:35:27 1735850127

All of those are very, very easy to automate. There are plenty of bot accounts that have unintentionally checked the full list.

krick · 2025-01-03T01:22:07 1735867327

Ironically, I suspect there are more "real" accounts that don't check the full list. Reminds me of most primitive captchas somehow.

nwienert · 2025-01-04T09:22:05 1735982525

https://news.ycombinator.com/item?id=42583109

nwienert · 2025-01-02T21:19:50 1735852790

You can find a set of requirements that aren't. Eg 2-factor can include phone number. And activity requirements can be based on repo maturity (not just pushing to random empty repos).

And while some bot accounts may have them, I doubt many have most.

Also, you argue on semantics but the general idea of setting up a legitimacy test that factors in various things is very easily doable, the factors can be kept private, and you definitely can find ones that are generally hard to game.

gruez · 2025-01-02T21:34:51 1735853691

>You can find a set of requirements that aren't. Eg 2-factor can include phone number. And activity requirements can be based on repo maturity (not just pushing to random empty repos).

Then you have people complaining about being "shadowbanned" (because there's no recourse if you're a person and the algorithm thinks you're not active enough), or that github is being anti-privacy (by requiring phone number). It's hard to win here.

wholinator2 · 2025-01-02T21:59:27 1735855167

I think the point is that these requirements are not published, and they are not requirements to use stars. Anyone can star, no one knows whether their account is contributing to the star count. Now, presumably you could star a thing and check if the number went up but maybe introduce slight randomness or delay to obfuscate even those details. I remember when reddit removed the total upvote/downvote counts from the ui

eddythompson80 · 2025-01-02T22:19:55 1735856395

The point is that this is not arguing on semantics nor is it as simple as just a "set of requirements" that they just follow. Battling fraud online is an entire business in itself. Take Spotify plays, YouTube views, Google search ranking, Amazon reviews, reddit votes, etc. These organizations have significantly more incentives than GitHub to reduce fraud in these metrics, and while they do, it's still really really hard and it's very easy to show how these metrics are gamed/faked all the time.

It's not a matter of "here is a list of requirements that no one knows about, and here is slight randomness/delay to obfuscate".

How much do you think it takes to pay an actual human from a poor country to come to work each day at 8am, create one github account after another, enter them in a database, and leave at 5pm?

If you want to "study" how github handles stars because there is legitimate financial incentive for you in it, for $100 a day you can pay 10 or 20 of those people to create a few thousand accounts a day. Do it a few times a month, and throw these accounts in an automated system that creates random repos, pushes a few commits here and there, etc. Also "introduce some slight randomness or delay to obfuscate these events". Do some A/B testing to figure out how the 300k accounts under your control affect a repo star system, then advertise a "GitHub stars service": "$0.50 per guaranteed star on Github". Your average VC funded startup could get 10k stars for $5k. They probably give AWS 10 times that a month.
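The arithmetic can be checked with a rough sketch. The $0.50 price, the 10k stars, the $100 daily wage bill, and the 300k accounts come from the paragraph above; the 3,000 accounts per day is an assumed reading of "a few thousand", and both function names are made up for illustration.

```python
import math


def star_order_cost(stars: int, price_per_star: float) -> float:
    """What a buyer pays for a batch of stars at a flat per-star price."""
    return stars * price_per_star


def account_farm_cost(target_accounts: int, accounts_per_day: int,
                      daily_wage_bill: float) -> float:
    """Labor cost to hand-create a pool of accounts, one working day at a time."""
    days = math.ceil(target_accounts / accounts_per_day)
    return days * daily_wage_bill


# Buyer side: 10k stars at $0.50 each.
print(star_order_cost(10_000, 0.50))

# Seller side: 300k accounts at an assumed 3,000 accounts per $100 day.
print(account_farm_cost(300_000, 3_000, 100))
```

Under those assumptions the whole account pool costs about as much to build as two 10k-star orders bring in.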

Once github changes their requirements, do more testing, figure out what the requirements now are, then you're back in the game. If people do it all the time to Spotify, YouTube, Google, Amazon, Reddit, and Twitter, why do you think GitHub would somehow crack that nut?

wruza · 2025-01-03T06:29:52 1735885792

As someone working with people on the other end of this table, I can tell you there’s a limit of risk, clarity and tech complexity that they are ready to bear. And it’s pretty low. It all works for them only because threads like this usually end up with “it wouldn’t work anyway if I, a six figure guy, had all the time and budget in the world to defeat it, so let’s do nothing” type of non-solution. Which creates a defeatist spirit culture. Paying third-world workers is often economically and structurally unviable for the most low-hanging bot-like activities and it doesn’t even stay that cheap either once the demand grows due to technical barriers. I, being a lot less paranoid and defeatist, also tell these guys that it won’t work because this and that, but then it works, because the solutions the defending side comes up with are either laughable or from “so dumb, I feel I’m gonna faint” category. You won’t believe the elephants that can fly under their radar.

> people do it all the time to Spotify, YouTube, Google, Amazon, Reddit, and Twitter, why do you think GitHub would somehow crack that nut?

Because the listed projects do basically nothing, a bare minimum. They don’t even care as long as bots don’t play against their direct interest. Who cares at a media company, or a sales company, who exactly is at their top, as long as they are both not bad enough? Profits come either way. They all are shittiest examples of it who created, incorporated and are themselves part of this problem.

It’s akin to an immune system. Its goal is not to protect you from every hiv and cancer, but to avoid constant infections from stupid low-effort attacks. You don’t have to make it perfect, but it must be there. The more cryptic it is, the less welcoming it is to game it through basic means, the better.

JumpCrisscross · 2025-01-02T22:25:35 1735856735

> the point is that these requirements are not published

Well-connected people will get the tip off. And your PR team will have to keep batting down conspiracy theories, since if there's one thing the nutters love it's black boxes.

precommunicator · 2025-01-02T23:13:58 1735859638

> Eg 2-factor can include phone number

In GitHub organization settings you can require members to use only secure 2FA, which kicks out anyone who uses SMS 2FA.

the__alchemist · 2025-01-02T20:42:03 1735850523

Hmm. I don't have SSH, but have many GH projects, and have been active for a decade. So, I would be filtered out as not an active dev, with the spammers?

nwienert · 2025-01-02T21:18:40 1735852720

Sure, but at least stars would be net more useful.

zitterbewegung · 2025-01-02T20:24:57 1735849497

I've heard of people gaming GitHub stars by asking their friends to star their projects, which would get around all of your bullets. Hence why I said it would be hard to game.

stevage · 2025-01-02T20:23:03 1735849383

So now all the bots are pushing code, have SSH etc...

jazzyjackson · 2025-01-02T20:06:52 1735848412

there are various reasons webs-of-trust don't take off, but I can imagine a system where the metrics I see are only aggregated from friends-of-friends, and any other signal is just considered untrustworthy and therefore not worth observing
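A minimal sketch of that aggregation, assuming the follow graph is available as a plain dict of who follows whom. The function names and the two-hop cutoff are illustrative assumptions, not an existing GitHub feature.

```python
def trusted_circle(follows: dict[str, set[str]], me: str) -> set[str]:
    """Accounts I follow, plus the accounts they follow (two hops at most)."""
    friends = follows.get(me, set())
    friends_of_friends: set[str] = set()
    for friend in friends:
        friends_of_friends |= follows.get(friend, set())
    return (friends | friends_of_friends) - {me}


def trusted_star_count(stargazers: set[str], follows: dict[str, set[str]],
                       me: str) -> int:
    """Count only the stars that come from inside my trusted circle."""
    return len(stargazers & trusted_circle(follows, me))


follows = {"me": {"alice"}, "alice": {"bob", "me"}, "bob": {"carol"}}
stargazers = {"alice", "bob", "carol", "bot123"}
print(trusted_star_count(stargazers, follows, "me"))
```

Here `carol` is three hops away and `bot123` is unconnected, so neither star is counted for this viewer.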

drusepth · 2025-01-02T20:20:48 1735849248

Do you still trust that system when your friends-of-friends are the ones gaming the system? Given the inherent network effects of manipulating webs of trust, I wouldn't be surprised if everyone had at least one friend-of-a-friend they shouldn't necessarily trust.

morkalork · 2025-01-02T20:25:16 1735849516

Given all the obvious bots and sketchy recruiters that try to connect with me on LinkedIn, who all appear to have at least one mutual connection, it probably won't work.

jagged-chisel · 2025-01-02T20:40:15 1735850415

Do we have a similar issue on GH? I think the nature of the service and its target audience affect this problem in a big way. You can follow anyone on GH, but there's no mutual connection option at all. LI has following and mutual connections. LI also has a much wider audience.

How might a 'connection' look on GH? Will people freely connect, or will they appraise requests more closely?

codetrotter · 2025-01-02T20:55:57 1735851357

I can only speak for me personally. For me the way that I use GitHub I don’t think the concept of “friends of friends” would be all that useful on GitHub.

There are a handful of people that I know IRL that I follow on GitHub. And a few hundred that I follow in total. Out of the handful of people I know IRL, and who I follow on GitHub, only two or three of them are active there any given week. All of the other people I follow I have very little idea who they are. Usually I follow people I don’t know if I come across their profile and either the profile itself or their projects make me follow them. But I star way more different repos than the number of people I click follow on.

For me, the main ways of discovering new repos are:

- Frontpage of HN, and comments in posts on HN.

- Specific search results on Google when I have searched for libraries or programs that do specific things.

- Libraries on crates.io that I think might be interesting to look into in the future.

Maybe once or twice a month I happen to click on the main page of GitHub itself and see mentions of repos that have been committed to or starred or created by people I follow.

So for me I don’t think “friends of friends” is a particularly great signal for things to look at. Most of the people I follow, I don’t know much about them.

Likewise, for anyone that follows me it’s not necessarily any strong signal that I follow someone else in order to determine if activity from that someone else should be shown or weighted as more significant to my follower just because I happen to follow that other person.

If you do want a strong signal for who to boost for my followers based on my own activity, go and look at the dependencies that I am using in my own projects. That’s a pretty good indicator that I put some amount of effort and interest into looking at something. This could be done by GitHub itself, parsing the Cargo.toml files of my projects and extracting the dependencies section and looking up which of those dependencies are hosted on GitHub.

wruza · 2025-01-02T20:42:40 1735850560

I can imagine access to raw data instead of some stupid come-on-game-me-able predefined indicator, and that I can run some private statistical analysis over it. People would use (and share) different algorithms and gamers will at least wander through this collectively created mud without any understanding except for the defaultest measures.

But of course this is too complex and “no one will use it” (tm). So we’ll better have a screwed up recommendation system that doesn’t work at all, cause that’s simpler!

kube-system · 2025-01-02T22:32:56 1735857176

Maybe so, but in this case, I don't think 'stars' is a good candidate for one of those metrics. I think the people worried about 'fake stars' are doing it wrong, and should just ignore the metric entirely.

yencabulator · 2025-01-03T15:05:46 1735916746

For my account, only count stars by the top 50% of the contributors to the projects I have starred?

1propionyl · 2025-01-02T20:41:53 1735850513

Any metric that becomes a target ceases to be a good metric.

The wrinkle is that measures that don't easily quantify are more resistant. For example, showing provable use by other reputable or trusted projects, or a significant amount of resources allocated to maintenance, or ...

Really just anything that can't be reduced to a single number in a canonical way will in the long run prove far more useful for longer.

This of course shifts some of the burden onto potential users to assess things more critically, and forecloses direct numerical comparison. But the idea that you could just look at a number and make such comparisons was faulty from the get go.

begueradj · 2025-01-02T20:07:13 1735848433

It comes down to fighting against the human nature. And that's a lost battle.

Set any law you want, our nature will push us to circumvent it even legally.

mentalgear · 2025-01-02T20:36:57 1735850217

Most people are happy living in a fair ecosystem - it's only the 1-2% of the population that seek control, money and power that start trying to exploit the system.

Only if we let that minority keep manipulating the system without consequences does it become the driving market force that the rest of the population also feels they have to comply with, to go along, as has already happened in finance, academia, etc.

JumpCrisscross · 2025-01-02T22:28:20 1735856900

> Most people are happy living in a fair ecosystem

For varying and self-serving definitions of fair. (Almost everyone in the rich world is in an unfairly-advantaged minority.)

tcmart14 · 2025-01-03T03:58:36 1735876716

I'd push back against the 1-2%. I think the reality is, 1-2% is the group of people who will exploit the system and, more importantly, have the means to do so. But the number of people who would exploit the system is probably quite a bit higher; it just doesn't matter because they don't have the means to do so.

vouaobrasil · 2025-01-02T20:55:49 1735851349

I don't really think so. The Amish have a nice system. Their society has many fewer bad actors compared to general society.

Actually one of the keys is repeated contact. People who have to interact again and again will try and game the system less. Not sure how to build that into a star system but why give up so easily? Do programmers give up when you say "this algorithm can't be made any faster?"

eddythompson80 · 2025-01-02T22:50:16 1735858216

I don't think it's just the Amish. Collectivist cultures in general have (or maybe are perceived to have, I don't know) fewer bad actors compared to individualistic cultures.

It doesn't matter if people have to interact frequently if there are no real consequences to that interaction. The punishment in those collectivist cultures involves social shunning, shaming, etc. Individualistic cultures almost pride themselves on how much they can disregard social shunning and shaming. Shameless people are celebrities and elected officials. They are admired as opposed to shunned and ignored. A bad actor in an Amish community is expelled and loses access to what that community offers. That would be illegal in the general society unless their "bad act" was actually illegal. Discriminating against someone for being a dickhead who exploits loopholes and unregulated corner cases (without explicitly breaking the law) would be illegal in many contexts.

> Not sure how to build that into a star system but why give up so easily? Do programmers give up when you say "this algorithm can't be made any faster?"

I don't think people have given up. Online fraud detection is a massive industry as is. Spotify plays, YouTube views, Google search, Amazon reviews, reddit upvotes, twitter's retweets, facebook likes/shares, etc all fall exactly into the same bucket. There is even a significant dollar amount attached to many of those, more so than GitHub stars. All are frequently gamed/faked and it's a battle between the platforms and the adversary.

vouaobrasil · 2025-01-02T23:25:29 1735860329

Good points. I'll only add that I just mentioned the Amish because it's the only culture ("subculture?") that I've read thoroughly about. But I think in collectivist cultures it is indeed much harder to be a bad actor. Perhaps we should have a little more shunning...

lupire · 2025-01-03T02:10:20 1735870220

You are describing small communities, not collectivist ones.

Large "Collectivist" communities have body count in the hundreds of millions.

yencabulator · 2025-01-03T15:09:32 1735916972

Repeated contact is just one mechanism, there are others that scale to country size. The end result is called a high trust society.

https://en.wikipedia.org/wiki/High-trust_and_low-trust_socie...

For example, Tokyo has a lot of people and they actively dislike interacting with strangers but if you leave your laptop unattended while peeing at a coffee shop, it's very unlikely to have been stolen.

keybored · 2025-01-03T00:06:36 1735862796

We’re talking about a stupid gamification and notoriety metric. I won’t be losing sleep over “human nature” failing to honor it.

vouaobrasil · 2025-01-03T10:18:39 1735899519

Me neither. I don't give a darn about GitHub. It could burn in hell for all I care. But my comment was about this phenomenon in general, not how it affects some Microsoft service.

seventytwo · 2025-01-03T04:56:41 1735880201

Amish culture doesn’t scale across humanity though. It’s a walled garden.

JumpCrisscross · 2025-01-02T22:27:48 1735856868

> one of the keys is repeated contact

The other is hierarchy. You can't automate reputation scoring.

thrance · 2025-01-02T20:17:27 1735849047

Not nature no, it's all about incentives. Oftentimes it's financial, for github stars it's prestige and visibility.

banannaise · 2025-01-03T15:58:54 1735919934

It's not fighting against human nature, it's fighting against the incentives of our economic system and the people who exploit them. What you're doing here is sometimes referred to as naturalizing - acting like something is the natural state of things, when it is specific to a present social system.

ozim · 2025-01-02T23:02:45 1735858965

Make a website where you hire people to check out the libraries and publish scores.

But people who have chops for that probably have high enough paying jobs not to care. As most likely no one would pay for reviews of libraries.

uludag · 2025-01-02T20:22:08 1735849328

I believe networks of human individuals can solve this to a good degree assuming a particular topology exists.

Like, imagine a group of professionals of decent size, all specializing in a similar field, and having lots of strong connections between each other where they have ample opportunities to share information. It would be hard for an outsider to come in and astroturf their product without immense effort (like hiring shills to attend conferences). In-person networks also obviously solve the problem of stars as reputation: reputation spreads naturally in these sorts of networks.

I think the problem comes with algorithmic scale. Maybe a solution would be to have more community building activities (maybe preferably offline).

mentalgear · 2025-01-02T20:29:52 1735849792

Doesn't mean we shouldn't fight back. That's exactly why we need research projects like these: to maintain the balance.

jasoneckert · 2025-01-02T20:03:29 1735848209

Neither do I.

I believe the only thing anyone can do is take metrics of how the metrics are gamed, as this particular paper has done.

sedatk · 2025-01-02T20:42:21 1735850541

Prioritize the stars given by accounts you follow in the UI. Done.

p1esk · 2025-01-02T20:45:27 1735850727

I don’t want to follow anyone, but I do give stars to repos I like.

sedatk · 2025-01-02T20:54:32 1735851272

Then you'll have to start following the creators of repos you like to build a web of trust.

aydyn · 2025-01-02T20:22:24 1735849344

Requiring real ID and showing _regional_ stars like Apple/Google would be a start.

stronglikedan · 2025-01-02T20:40:35 1735850435

> Requiring real ID

Sir, this is an HN.

aydyn · 2025-01-02T23:27:07 1735860427

Oh thank god I thought we were at Wendy's

eddythompson80 · 2025-01-02T20:39:03 1735850343

> Requiring real ID

Yeah, people would love that for sure.

> showing _regional_ stars like Apple/Google would be a start.

What does that mean? I thought regions only impact ranking not the net amount of stars (assuming we're talking about Apple/Google Maps). Which as far as I know, github doesn't do ranking.

aydyn · 2025-01-02T23:26:35 1735860395

People already use github as a professional portfolio. Facebook uses real ID, and how popular is it?

> What does that mean? I thought regions only impact ranking not the net amount of stars (assuming we're talking about Apple/Google Maps). Which as far as I know, github doesn't do ranking.

At least on iOS, reviews and ratings are by country; I don't actually know about Google Play though. (I don't have an Android to check since I am not poor)

awkward · 2025-01-02T21:02:55 1735851775

I can see github platform internals caring about this for anomaly detection, but as a developer, who cares? I suppose a botnet could be making fake stars on a malware project or supply chain attack, but the problem there doesn't seem like it's the number of stars.