I've been participating in Numerai for a few months now. (I've only made some beer money from it, nothing serious.) When you get the data, you have no idea what it is. It's just a file with ~70,000 data points, each of which has 21 features. Each feature is uniformly distributed between 0 and 1. All you have to do is make a binary classification of 0 or 1 (or, more accurately, the probability that the data point is in class 0 or 1). They don't tell you what the 21 features represent.
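For anyone who wants a feel for the shape of the problem, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the data is synthetic, the weak dependence on feature 0 is invented, and the helper names (`make_toy_data`, `fit_logistic`, `predict_proba`) are hypothetical. It is not Numerai's data or anyone's actual method, just a tiny logistic regression that outputs a probability for class 1.

```python
import math
import random


def make_toy_data(n_rows=2000, n_features=21, seed=0):
    """Synthetic stand-in: every feature is uniform on [0, 1), target is 0 or 1."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_features)]
        # Assumption: the target leans weakly on feature 0.
        # The real relationship between features and target is unknown.
        p = 0.5 + 0.2 * (x[0] - 0.5)
        targets.append(1 if rng.random() < p else 0)
        rows.append(x)
    return rows, targets


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Predicted probability that row x belongs to class 1."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(rows, targets, epochs=20, lr=0.1):
    """Logistic regression trained with plain stochastic gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            err = predict_proba(w, b, x) - y  # gradient of log loss w.r.t. the logit
            b -= lr * err
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
    return w, b


if __name__ == "__main__":
    rows, targets = make_toy_data()
    w, b = fit_logistic(rows, targets)
    print("P(class 1) for first row:", predict_proba(w, b, rows[0]))
```

The real file is about 35 times larger than this toy default, so a pure-Python loop like this would be slow on it; the point is only the input and output format.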

As far as you know, these predictions could be used to make currency trades, or stock predictions, or real estate purchases, or something more exotic. You really have no idea. And since you don't know what these data points represent, you can't use any insider knowledge about anything to help you.

EDIT: This is total conjecture.

Only 1, or at most a few, of those 21 features represent real data. The real data represents information similar to the insider information they wish to act upon.

Example data prep:

1. Insider source says that a contract is falling through, a patent is being filed for, quarterly numbers have been missed/surpassed etc.

2. Similar information is gathered from historic performance data of the company, similar companies or market segments.

3. The information is correlated with whatever metric they wish to move along with and encoded in one of the 21 feature classes.

4. Repeat for whatever relevant information that can be linked to the insider source - i.e. competing companies, re-encoding separately for long and short positions etc.

5. Fill in remainder of 21 feature classes with noise.

6. Profit.
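A toy way to see what "one real feature, the rest noise" looks like statistically, in keeping with the conjecture above. This is a sketch under loud assumptions: the data is synthetic, the signal column index (7) is arbitrary, the signal is made far stronger than anything realistic, and the helper names are hypothetical. It only shows that a single informative column among uniform noise stands out under a simple per-feature correlation check.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def make_mixed_data(n_rows=5000, n_features=21, signal_col=7, seed=1):
    """All features are uniform noise; only signal_col influences the target."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_features)]
        # Assumption: P(target = 1) equals the signal feature's value.
        targets.append(1 if rng.random() < x[signal_col] else 0)
        rows.append(x)
    return rows, targets


def rank_features(rows, targets):
    """Return (abs correlation, column index) pairs, strongest first."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, targets)), j))
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    rows, targets = make_mixed_data()
    for score, col in rank_features(rows, targets)[:3]:
        print(f"feature {col}: |corr| = {score:.3f}")
```

With a signal this blunt, the informative column is obvious at a glance. A realistic signal would be much weaker and far harder to tell apart from the noise columns.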

I'm not sure how seriously you were proposing this, but I would rank its plausibility as being in the vicinity of Guam tipping over and sinking from the weight of a military base.

Inside traders are usually caught because they make profitable trades shortly before significant company events, or they're caught communicating with the tipper. This wouldn't protect against either.

Once they're being investigated, having a plausible reason for making the trade is mostly irrelevant. If you have no reason other than inside information for trading and just say "I felt like gambling", it doesn't matter. If they can't actually prove you had the inside information, you're innocent.

If they can prove you had illegal inside information it doesn't matter if you can prove that you made the trade for unrelated reasons. You're guilty.

Needlessly convoluted. It would be much simpler to create Numerai as described and participate oneself, uploading the 'predictions'. The key Numerai person can de-anonymize the data by looking up the mapping for the one stock with insider information and adding a big buy on that one. (I'm not sure if participants upload their model to be run by Numerai or just provide their predictions based on a public data feed; I think it's the latter, but even if it's the former, you can just create a model optimized to emit a big buy on the key stock and otherwise random, self-canceling output.) The rest of the data can be genuine and released as described, and should be in case of an SEC investigation, since using large-scale real data generates lots of paper trails such as payments to data providers. Who knows, the participants might actually find real signals which are profitable and also mask the insider trading.

What would be the point of all this? Step 1 is the illegal part, and the other steps don't erase that.

Well of course it is illegal, you just don't want to get caught, and this allows you to have plausible deniability.

It's like when law enforcement uses a legally questionable tactic to get information about someone. Then, with the help of that illegally obtained knowledge, they can go and re-create it (along with a legal paper trail) via other perfectly legal means.

Seems that it would make detecting it more difficult.

aka parallel construction

The grandparent's point wasn't that the people doing the analysis are injecting insider information; it's that the data dump you've been presented with could be some fancy encoding of insider information.

How would that work? If Numerai already has insider information, why not just use it directly? Why go to the bother (and risk) of anonymizing and distributing it?

As someone else suggested, think of it like parallel construction in law enforcement: an agency already knows someone is guilty, but they know it from illegal surveillance they can't use in court. So they give a tip to an unsuspecting officer to watch out for a car with a broken tail light at a particular location/time, and in the course of the traffic stop the officer finds drugs in the car, then they claim that's how they found out (despite the officer only being there because they already knew).

Plausible deniability.

There would be none if they were aware the information they hand out contains insider info.

How would you prove it? It's just an anonymous list of features to train a model.

I think it's unlikely that insider information comes in some nicely formatted dataset with 21 features week in and week out. I think of insider information as usually being a one-off tip that a company is going to take a particular action before they make that information public. This feels more like prosaic financial data. (But I'm not in finance so maybe my perceptions are way off!)

Hence why I phrased it "encodes insider information"

So you get your tip, and you generate a data set that will result in some data analyst getting the conclusion you want.

No, grandparent's point was that the AI data dump is misdirection from the actual trading logic.

Can I participate as an investor putting my money into the fund? Or can I only participate as a data scientist?

>They don't tell you what the 21 features represent.

Half of 42.
