
Thanks!


Creative idea - I'll think about this.


Check out the paper about the AI in the original FEAR. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... Much of the "smart" behavior players loved was actually just a simple state machine with nice animations.


Awesome idea!


The author misunderstands how simulated data is created by GANs, VAEs, and other non-physics-based simulations. Say you have a dataset and would like to create synthetic data from it with a GAN: you are estimating the distribution D of the data. To do so the GAN learns the joint distribution P(X1, X2, ..., Xn) (where in the image case each X is usually a pixel) so that one may sample from D and obtain a new, synthetic image. One will indeed generate novel data, but the estimated distribution D is at best a description of the original data, and in practice a little (or a lot) off.

Now turn to the machine learning problem we sought to solve with the new synthetic data: what is P(y | X1, X2, ..., Xn), where y is usually a class like "bird"? In other words, given an image, predict its label. Since the synthetic data was generated knowing only the statistics of the original data, it can add no value beyond plausible examples derived from the original data itself.
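To make the point concrete, here's a minimal sketch where a Gaussian fit stands in for the density a GAN estimates (a deliberate simplification; real GANs learn far richer distributions, but the logic is the same): the "synthetic" samples are just draws from the fitted distribution, carrying no information the fit didn't already contain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" data: 1000 samples from an unknown 2-D distribution.
real = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)

# A generative model estimates the joint density of the data; here a
# plain Gaussian fit stands in for that estimate.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# "Synthetic" samples are just draws from the estimated distribution:
# plausible new points, but nothing beyond what the fit describes.
synthetic = rng.multivariate_normal(mu, cov, size=1000)
```

Sampling more points from the same estimate densifies the training set but cannot reveal modes of reality the original data never touched.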

Will this improve the accuracy of a model by providing additional edge-case examples and filling in gaps? Somewhat. Will it capture data not represented in the original dataset and substitute for more thorough, diverse datasets? Absolutely not.

In terms of model improvement, yes, synthetic data can help. In terms of the arms race? No. True examples provide knowledge that is unique. If one uses a physics engine (GTA is popular for self-driving cars) one can gather truly novel data; this is not the case for GANs.

It's concerning how willing people are to write articles on this subject without understanding the mathematics underlying the technology.

Do your homework and RTFM.


You are ignoring the fact that generative AI is not a closed-loop algorithm. You can synthesize expected features in a dataset and feed them to the detector, outside the bounds of the generative neural network, which rather serves the purpose of mapping into (a subset of) the proper input space.

The power of synthesis is not within the GAN or VAE; it is in the outside mechanism that guides the creation of content with specific domain knowledge about the feature space.

This might not replace the value of real data, but it will help accelerate bootstrapping, improve coverage (at the cost of accuracy), or provide free environments for auxiliary processes like CI/CD in many deep learning applications.
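A minimal sketch of what "an outside mechanism with domain knowledge" can mean, using plain geometric augmentation rather than a GAN (the transforms and labels here are illustrative assumptions): the knowledge that labels are invariant to flips and rotations comes from the domain, not from the learned statistics of the data.

```python
import numpy as np

def augment(image):
    """Yield label-preserving variants of an image.

    The invariances encoded here (flip, 90-degree rotations) are domain
    knowledge supplied from outside any generative model: if the label is
    invariant to these transforms, each variant is a valid new example.
    """
    yield image
    yield np.fliplr(image)           # horizontal flip
    for k in (1, 2, 3):
        yield np.rot90(image, k)     # 90-degree rotations

img = np.arange(9).reshape(3, 3)
variants = list(augment(img))        # 5 training examples from 1 image
```

The same pattern scales up: simulators, procedural scene generators, and rule-based perturbations all inject knowledge the raw dataset's statistics do not contain.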

There is a lot of published material on synthetic data augmentation if you actually look for it.


Nothing you said disputes the above comment; it agrees with its core premise:

"In terms of model improvement, yes synthetic data can help. In terms of the arms race? No. True examples provide knowledge that is unique. "


I was rather commenting on the first part, which implies that training a neural network on the statistical distribution that comes out of a GAN or VAE adds no value beyond the generative model's capabilities.

I do not agree, because as I explained, with domain knowledge it is very much possible to shape the generated data for augmented learning, beyond the plain statistical variations of GANs and similar models, which are obviously of very limited value in training.


We'd all be happier writing math; writing code is just a nuisance.


The Julia programming language's development started explicitly to address this sentiment.


It's just about package support and the community. If researchers and practitioners chose a language on merit alone it would probably be Julia, for its native speed and support for scientific computing. It's nice to have a toy language you appreciate, but recall the goal is to turn math into algorithms; the language is just a tool.


Ditto. I didn't realize people were getting picked up enough by the authorities to merit building a Windows clone.


The choice of language usually comes down to the packages. In any of the three aforementioned languages one can easily and quickly manipulate matrices short of an unwillingness to learn. Julia is nice because it's fast with native code. Python is nice because of Scipy. Matlab is nice because it decides how to spend your money without cause.

I'm an AI researcher / practitioner. For me code accompanying papers is very useful and usually this code is in Python. Occasionally it's Matlab but let's be honest, who cares about those papers :). I'd love to use Julia but the package support just isn't there. Ironically people like me are supposed to be writing this code but with a demanding job and a family it's not likely I will be improving their DataFrame effort anytime soon.

Anyway the MAIN reason I use open source software is because if it isn't working correctly I simply fix the code myself. This isn't possible in the proprietary world. Why would you trust your research or production work with code you can't see and edit?

There's been a lot of talk about documentation. Docs are secondary sources, like WIRED; read the code if you're serious about being correct. Even (especially) hired hands make mistakes and fail to write good tests.

This article reminded me of the fictional Simpsons news clipping "Old Man Yells at Cloud". It's funny, and he may have a point, but it has no relevance.


What kind of flipping are you talking about? I can transpose a matrix in numpy with X.T. What is cumbersome about this?
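For reference, a quick sketch of what `X.T` does in numpy: it's a one-character attribute access that returns a view of the same data, so nothing is copied.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)   # 2x3 array
Xt = X.T                         # 3x2 transpose, a view of X's buffer

# Transposition only swaps strides; no data is moved or copied.
same_buffer = np.shares_memory(X, Xt)
```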


In python, I hate hate hate having to do

    import numpy
    import scipy.sparse
    import scipy.sparse.linalg
just to begin writing something.

You cannot create a literal array without calling a function. You cannot concatenate arrays without calling a function, and moreover this function has a different name depending on whether your arrays are dense or sparse. The @ notation for matrix products is horrendous (and .dot is even worse). Arrays are not first-class objects of the language; you have to use an external library. This is made more infuriating by the fact that other, more complex data structures like dictionaries or strings are natively supported, even though they are mostly useless for numerical computation.

Compare the clean matrix flipping in octave

      z = [ kron(speye(rows(y)), x) ; kron(y, speye(cols(x))) ]
to the python monstrosity

      z = scipy.sparse.vstack([
              scipy.sparse.kron(scipy.sparse.eye(y.shape[0]), x),
              scipy.sparse.kron(y, scipy.sparse.eye(x.shape[1]))
          ])


lol don't blame the tools for your ignorance of them

    from scipy.sparse import eye, kron
    from scipy.sparse import vstack as vs

    z = vs([
            kron(eye(y.shape[0]), x),
            kron(y, eye(x.shape[1]))
          ])


It's exactly the same thing that I wrote, isn't it?

The thing that kills my soul is the need for the "vs" function. Why aren't the symbols [] enough?


Because python is a different language from Matlab, and python lists are lists, not 1d arrays?
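For what it's worth, numpy does have a bracket-driven block builder for dense arrays, `np.block`, which is about as close as the language gets to Matlab's `[A, B; C, D]` literal syntax (for sparse matrices, `scipy.sparse.bmat` plays the analogous role). A small sketch with arbitrary example matrices:

```python
import numpy as np

A = np.eye(2)
B = np.zeros((2, 2))

# np.block assembles a matrix from nested lists of blocks:
# the outer list is rows of blocks, each inner list is one block-row.
Z = np.block([[A, B],
              [B, A]])   # 4x4 block-diagonal result
```

The `[]` symbols alone can't do this because a Python list literal builds a list of array objects; it's `np.block` (or `vstack`/`hstack`) that interprets that nesting as block structure.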


When I was in my early twenties I had some amazing experiences that prompted me to ask the question, "if I died today, would I be ok with it?". I'm thirty five now and no matter how hard a day has been I have always been able to answer "yes".

Often the journey is hard, really hard, but if you are moving in a direction you are excited about and being true to yourself it's ok.

There is no success, there is no winning, there is only your own comfort when the lights turn off forever.

That's not to say you can't have financial success and career satisfaction. It's just to say that those things won't give you the freedom to die.

