oli5679's comments

This is pretty ridiculous.

A. Below is a list of OpenAI's initial hires from Google. It's implausible to me that there wasn't quite significant transfer of Google IP.

B. Google published extensively, including the famous 'Attention Is All You Need' paper, but OpenAI, despite its name, has not explained the breakthroughs that enabled o1. It has also switched from a charity to a for-profit company.

C. Now this company, with a group of smart, unknown machine learning engineers, presumably paid a fraction of what OpenAI researchers are paid, has created a far cheaper model, and has openly published the weights and many methodological insights, which will be used by OpenAI.

1. Ilya Sutskever – One of OpenAI’s co-founders and its former Chief Scientist. He previously worked at Google Brain, where he contributed to the development of deep learning models, including TensorFlow.

2. Jakub Pachocki – Formerly OpenAI’s Director of Research, he played a major role in the development of GPT-4. He had a background in AI research that overlapped with Google’s fields of interest.

3. John Schulman – Co-founder of OpenAI, he worked on reinforcement learning and helped develop Proximal Policy Optimization (PPO), a method used in training AI models. While not a direct Google hire, his work aligned with DeepMind’s research areas.

4. Jeffrey Wu – One of the key researchers involved in fine-tuning OpenAI’s models. He worked on reinforcement learning techniques similar to those developed at DeepMind.

5. Girish Sastry – Previously involved in OpenAI’s safety and alignment work, he had research experience that overlapped with Google’s AI safety initiatives.


> A. Below is a list of OpenAI's initial hires from Google. It's implausible to me that there wasn't quite significant transfer of Google IP.

I agree there's hypocrisy but in terms of making a strong argument, you can safely remove your list of persons who (drum roll)... mostly _didn't_ actually work at Google?


my_ridiculous_list = ["Ilya Sutskever"]


I think this project is awesome and am quite disappointed with some cynical commentary from large American labs.

Researchers at Meta or OpenAI spend hundreds of millions on compute, and are paid millions themselves, while not publishing any of their learnings openly. Here, a bunch of very smart, young Chinese researchers have had some great ideas, proved they work, and published details that allow everyone else to replicate them.

    "No “inscrutable wizards” here—just fresh graduates from top universities, PhD candidates (even fourth- or fifth-year interns), and young talents with a few years of experience."

    "If someone has an idea, they can tap into our training clusters anytime without approval. Additionally, since we don’t have rigid hierarchical structures or departmental barriers, people can collaborate freely as long as there’s mutual interest."


Why did you group Meta with OpenAI here?


Lendable | Data Science & Software Engineering | Hybrid & Remote | UK, London | Full Time

Lendable have built the big three consumer finance products from scratch: loans, credit cards and car finance. Help us improve the best-in-class credit analytics that powers credit-decisioning.

Data Scientist https://jobs.eu.lever.co/lendable/dfdcc454-b226-4d6a-af1c-74...

PHP Engineer https://jobs.eu.lever.co/lendable/a7363938-4313-4f1e-b1ae-47...

Kotlin Engineer https://jobs.eu.lever.co/lendable/121800c7-c2d7-457b-aaf3-7e...

React Native Engineer https://jobs.eu.lever.co/lendable/3a9e02d3-6680-487b-b416-e1...

Security Engineer https://jobs.eu.lever.co/lendable/033ac43c-f3a2-4047-bc5c-b9...

Other jobs https://jobs.eu.lever.co/lendable?lever-via=8wqtPJSbHB&lever...


https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004...

Mixture density networks are quite interesting if you want probabilistic estimates from a neural network. Here, your model learns to output an array of Gaussian distribution parameters (means and standard deviations) and mixture weights.

These weights are specific to individual observations, and are trained to maximise likelihood.
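A minimal numpy sketch of the idea (illustrative names, with a single linear layer standing in for the network): each observation gets its own mixture of Gaussians, and training would minimise the negative log-likelihood computed below.

```python
import numpy as np

def mdn_forward(x, W, b, n_components):
    # One linear layer mapping input features to 3*K outputs:
    # K means, K log standard deviations, K mixture logits.
    out = x @ W + b
    K = n_components
    mu = out[:, :K]
    sigma = np.exp(out[:, K:2 * K])            # positive std devs
    logits = out[:, 2 * K:]
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi = pi / pi.sum(axis=1, keepdims=True)    # mixture weights sum to 1
    return mu, sigma, pi

def mdn_nll(y, mu, sigma, pi):
    # Negative log-likelihood of y under each observation's own mixture.
    dens = np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.log((pi * dens).sum(axis=1)).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
W = rng.normal(scale=0.1, size=(4, 9))   # 3 components -> 9 outputs
b = np.zeros(9)
mu, sigma, pi = mdn_forward(x, W, b, 3)
y = rng.normal(size=8)
loss = mdn_nll(y, mu, sigma, pi)
```

In a real MDN you would backpropagate `loss` through the network's weights; the forward pass and loss are the parts specific to the technique.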


This approach characterizes a different type of uncertainty than BNNs do, and the approaches can be combined. The BNN tracks uncertainty about parameters in the NN, and mixture density nets track the noise distribution _conditional on knowing the parameters_.


I've used fyxer.ai with my work Gmail for the last few months and have found it amazing.

I send about 50% of the drafts without editing, and I’m surprised by how much better at spam filtering it is than gmail.

Interested to try this out and compare.


Super interesting! Out of curiosity: would you be interested in a web version of this project?


I just spend loads of time on email.

Web version sounds great.


Adding one to both the numerator and the denominator when calculating average ratings isn’t a terrible idea.

In situations where you’re estimating probabilities (like the average rating of an item based on user reviews), there is a Bayesian interpretation of this adjustment.

The Beta distribution is a conjugate prior for the binomial distribution, meaning the posterior distribution after an update is also a Beta distribution.

By adding one to the numerator (the number of positive reviews) and two to the denominator (the total number of reviews), you’re effectively reporting the posterior mean under a Beta(1,1) prior (a uniform distribution), which is Laplace's rule of succession.

This approach smooths the estimated average, especially for items with a small number of reviews. This is a useful regularisation, pulling extreme values towards the mean and reflecting the uncertainty inherent in limited data.

https://en.wikipedia.org/wiki/Beta_distribution
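A tiny sketch of the smoothing (the function name is mine, not from the article): the posterior mean of a Beta(a, b) prior after s positives out of n reviews is (s + a) / (n + a + b).

```python
# Smoothed rating estimate: Beta(1,1) uniform prior over the probability
# of a positive review, updated with the observed counts.
def smoothed_rating(positive, total, prior_a=1, prior_b=1):
    # Posterior mean of Beta(prior_a, prior_b) after observing
    # `positive` successes out of `total` reviews.
    return (positive + prior_a) / (total + prior_a + prior_b)

# An item with a single positive review no longer beats one with 90/100.
print(smoothed_rating(1, 1))     # 0.666...
print(smoothed_rating(90, 100))  # ~0.892
```

Note how the estimate for the one-review item is pulled towards 0.5, reflecting how little evidence there is.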


If you withhold a small amount of data, or even retrain on a sample of your training data, then isotonic regression is good for solving many calibration problems.

https://scikit-learn.org/dev/modules/generated/sklearn.isoto...

I also agree with your intuition that if your output is censored at 0, with a large mass there, it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.
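A minimal calibration sketch with scikit-learn's IsotonicRegression, on synthetic held-out scores (the data and the miscalibration here are made up for illustration):

```python
# Post-hoc calibration with isotonic regression, assuming we held out a
# small calibration set of (raw_score, outcome) pairs.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(42)
raw = rng.uniform(0, 1, 500)  # uncalibrated model scores
# Simulate a model whose true hit rate is score**2, i.e. overconfident.
outcome = (rng.uniform(0, 1, 500) < raw ** 2).astype(float)

iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw, outcome)

# Monotone, piecewise-constant mapping from raw score to probability.
calibrated = iso.predict([0.2, 0.5, 0.9])
```

The fitted mapping is monotone by construction, so it fixes the calibration curve without re-ranking your predictions.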


I hadn't heard of isotonic regression before but I like it!

> it's good to create two models: one for the likelihood of zero karma, and another for expected karma, conditional on it being non-zero.

Another way to do this is to keep a single model but have it predict two outputs: (1) likelihood of zero karma, and (2) expected karma if non-zero. This would require writing a custom loss function which sounds intimidating but actually isn't too bad.

If I were actually putting a model like this into production at HN I'd likely try modeling the problem in that way.
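For what it's worth, the two-separate-models variant can be sketched in a few lines of scikit-learn (synthetic data and illustrative names, not HN's actual features):

```python
# Hurdle-model sketch: one classifier for P(karma > 0), one regressor
# for E[karma | karma > 0], combined as a product.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
latent = X @ np.array([1.0, -0.5, 0.2])
is_nonzero = (latent + rng.normal(size=400) > 0).astype(int)
karma = np.where(is_nonzero, np.exp(latent / 2), 0.0)

clf = LogisticRegression().fit(X, is_nonzero)
reg = LinearRegression().fit(X[is_nonzero == 1], karma[is_nonzero == 1])

# Expected karma = P(nonzero) * E[karma | nonzero]
expected = clf.predict_proba(X)[:, 1] * reg.predict(X)
```

The single-model, two-output version would replace the two fits with one network and a custom loss, but the combination rule at the end is the same.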


Did you dictate this? It looks like you typo'd/brain-o'd "centered" into "censored", but even allowing for phonetic mistakes (of which I make many) and predictive text flubs, I still can't understand how this happened.


I was thinking of censoring, maybe I should have said another word like floored.

The reason I think of this as censoring is that there are some classical statistical models that model a distribution with a large mass at a minimum threshold, e.g. "tobit" censored regression.

https://en.wikipedia.org/wiki/Censoring_(statistics)
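A quick numpy illustration of what censoring at zero does to the observed distribution (the parameters are arbitrary):

```python
# Censoring at zero, as in a tobit-style model: the latent value can be
# negative, but we only observe max(latent, 0), which produces a large
# point mass exactly at the floor.
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(loc=0.5, scale=1.0, size=10_000)
observed = np.maximum(latent, 0.0)

share_at_zero = (observed == 0).mean()  # roughly P(N(0.5, 1) < 0) ~ 0.31
```

A plain regression on `observed` would be biased near the floor, which is why tobit-style or two-part models treat the mass at zero separately.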


Thanks for the explanation. I never paid much attention in my stats lectures so I deserve to have missed out on that term-of-art. I think the physics lingo would be to call it "capped" or "bounded" or "constrained".


Thanks, it's very understandable that you thought I was mistyping 'centred'.


I'm not the parent commenter, but whisper based dictation is getting pretty awesome nowadays. It's almost as good as sci-fi.

(Fully dictated, no edits except for this)


I also thought that the commenter spoke "centered" and the speech recognition model output "censored".


I get this error when I try signing up.

ValueError at /accounts/signup
The given username must be set

Request Method: POST
Request URL: http://www.rashomonnews.com/accounts/signup
Django Version: 2.2.10
Exception Type: ValueError
Exception Value: The given username must be set
Exception Location: /home/deployer/newsbetenv/lib/python3.5/site-packages/django/contrib/auth/models.py in _create_user, line 140
Python Executable: /home/deployer/newsbetenv/bin/python
Python Version: 3.5.2
Python Path: ['/home/deployer/rashomon', '/home/deployer/rashomon', '/home/deployer/newsbetenv/bin', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/home/deployer/newsbetenv/lib/python3.5/site-packages']
Server time: Mon, 28 Oct 2024 16:36:54 +0000


Ok thanks for letting me know. I haven't encountered this before, but I appreciate you telling me about it. Thank you so much!!!


One thing that can be useful with sampling is sampling a consistent but growing sub population. This can help maintain a consistent holdout for machine learning models, help you sample analytical data and test joins without null issues etc.

If you use a deterministic hash, like farm_fingerprint, on your id column (e.g. user_id) and keep rows where the hash modulo N equals 0, you will keep the same growing list of users across runs and queries to different tables.
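A Python analogue of the same trick (farm_fingerprint is BigQuery-specific; md5 is a stand-in here that is likewise stable across runs and machines, unlike Python's built-in hash()):

```python
# Deterministic hash-based sampling: the same user ids fall into the
# sample on every run, so the holdout stays consistent as data grows.
import hashlib

def in_sample(user_id: str, n: int = 10) -> bool:
    # md5 of the id, interpreted as an integer; keep ~1/n of ids.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % n == 0

sampled = [u for u in ("user_1", "user_2", "user_3") if in_sample(u)]
```

Because membership depends only on the id, joins between sampled tables never drop rows: a sampled user is sampled everywhere.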


My understanding of why bagging works well is because it’s a variance reduction technique.

If you have a particular algorithm, the bias will not increase if you train n versions in an ensemble, but the variance will decrease, as anomalous observations won't persistently be identified across the submodels' random samples and so won't persist through the bagging process.

You can test this: the difference between train and test AUC will not increase dramatically as you increase the number of trees in an sklearn random forest, for the same data and hyperparameters.
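A rough way to check this with scikit-learn on synthetic data (the exact gap values depend on the data, so treat this as a sketch rather than a benchmark):

```python
# Check that growing more trees does not blow up the train/test AUC gap:
# bagging reduces variance while leaving bias roughly unchanged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gaps = {}
for n_trees in (5, 200):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    train_auc = roc_auc_score(y_tr, rf.predict_proba(X_tr)[:, 1])
    test_auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
    gaps[n_trees] = train_auc - test_auc
```

Typically the gap shrinks (or at least stays flat) from 5 to 200 trees, because the extra trees improve test AUC while train AUC is already near its ceiling.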

