Ask HN: Has anyone tried to use AI to generate an emergent programming language?
49 points by aniijbod 5 days ago | 41 comments
Even if nobody has tried it yet, can we guess what the training data for a neural language-generating AI might look like, or whether it would use symbolic logic instead, or even both?

> In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

> “What are you doing?”, asked Minsky.

> “I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

> “Why is the net wired randomly?”, asked Minsky.

> “I do not want it to have any preconceptions of how to play”, Sussman said.

> Minsky then shut his eyes.

> “Why do you close your eyes?”, Sussman asked his teacher.

> “So that the room will be empty.”

> At that moment, Sussman was enlightened.


I suppose it shouldn’t really surprise anyone that we are still having this argument perhaps 60 years later.

What's the over/under on having it still 60 years from today?

We’ve been arguing about tabula rasa since Aristotle, so it seems pretty likely to me that it will continue for another 60 years.

AlphaZero and MuZero: "Hold my perceptron."

Anyone care to explain the meaning of this?

This seems to be displaying a common disconnect between the perceptions of "AI" and what modern AI actually is.

There is no "emergent" phenomenon currently in AI. I think the closest concept to what the author is asking for is a generative model, but even that does not really come close to performing tasks like "discovering" a new programming language. And it certainly would not discover an "emergent" anything. If it did "discover" anything, it would be an object very statistically similar to the objects used to construct the model in the first place (e.g., deepfakes).

Everything in current AI is data first (read: data only). You can sometimes synthesize data and use that, but you always start with real-world data, and the models you end up with are parameterizations of the data you trained on. Always. So, while there is a future in which a technical answer can appropriately be supplied to this question, we are as yet not in that future.

I consider this a technically valid answer and not a dismissal of the author's question, btw.

> There is no "emergent" phenomenon currently in AI.

This is a very interesting claim!

Are you making this claim on a purely empirical basis, as in:

(1) "no AI has ever 'discovered' anything notably new/important/useful/interesting/valuable",

or are you instead claiming/suspecting that:

(2) AI could not 'in principle' discover/generate anything falling into the positive categories above?

Great explanation. I've had this conversation about production systems multiple times before, and my experience is that people are willing to throw quite a lot of money at magical thinking before realizing that there isn't really any "emergent" phenomenon.

I pursued this question in grad school [1], leaning towards symbolic logic:

Suppose we want to learn a "natural" programming language. The training data would be example programs that we believe should be easy to express in any language. Since each of those programs will be expressed in a particular language, we'll need a notion of program equivalence across languages. As a toy framework, let "language" mean a basis of combinators in pure lambda calculus; this is convenient because we have a (Hilbert-Post-)complete theory of behavioral equivalence among programs (H*, see Barendregt's classic book), and because the combinatory basis problem has been well studied since the 1950s.

Applying machine learning, we can try to "fit a combinatory basis to data" in the sense of finding a finite weighted set of combinators, giving more weight to language primitives with shorter spellings. The loss function is the Kolmogorov complexity of the training data; gradients actually behave better if we use a softmax-Kolmogorov complexity. I used gradient descent to update the weights of existing language primitives, and greedy sparse dictionary learning to propose new language primitives. Most of the work was in proving equivalence and approximating Kolmogorov complexity.

It was a cute experiment, but hopelessly far from practical.

[1] http://fritzo.org/thesis.pdf (2009)
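The "fit a basis to data" idea can be sketched in a few lines. Everything below is my own toy reconstruction, not the thesis's actual method: terms are binary application trees over named primitives, "complexity" is a crude description length (bits to name each leaf plus roughly one bit per tree node), and the dictionary-learning step simply promotes the compound subterm whose promotion saves the most leaf symbols.

```python
import math
from collections import Counter

# Terms are primitives (strings) or applications (2-tuples).

def leaves(term):
    if isinstance(term, str):
        yield term
    else:
        for t in term:
            yield from leaves(t)

def n_leaves(term):
    return sum(1 for _ in leaves(term))

def subterms(term):
    yield term
    if not isinstance(term, str):
        for t in term:
            yield from subterms(t)

def description_length(corpus, weights):
    # Crude stand-in for the softmax-Kolmogorov complexity mentioned above.
    total = sum(weights.values())
    bits = 0.0
    for term in corpus:
        for leaf in leaves(term):
            bits += -math.log2(weights[leaf] / total)  # bits to name the leaf
        bits += 2 * n_leaves(term) - 1                 # ~1 bit per tree node
    return bits

def best_new_primitive(corpus):
    # Greedy dictionary-learning step: promote the compound subterm
    # with the largest savings, count x (size - 1).
    counts = Counter(t for term in corpus for t in subterms(term)
                     if not isinstance(t, str))
    return max(counts, key=lambda t: counts[t] * (n_leaves(t) - 1))

def rewrite(term, pattern, name):
    if term == pattern:
        return name
    if isinstance(term, str):
        return term
    return tuple(rewrite(t, pattern, name) for t in term)

# A corpus in which the compound ("S", ("K", "S")) recurs:
B = ("S", ("K", "S"))
corpus = [(B, ("K", "I")), (B, B), ("I", (B, "K"))]
weights = {"S": 1.0, "K": 1.0, "I": 1.0}

before = description_length(corpus, weights)
pattern = best_new_primitive(corpus)           # finds ("S", ("K", "S"))
corpus2 = [rewrite(t, pattern, "B") for t in corpus]
after = description_length(corpus2, {**weights, "B": 1.0})
# Adding the new primitive shortens the corpus: after < before.
```

The real version replaces this greedy bit-counting with gradient descent on a differentiable complexity, but the loop is the same: score a basis by how compactly it expresses the data, then grow the basis.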

You win Risk by starting in Australia. What's a smaller problem with the same properties? User interface.

Our phones should learn a private language with us. My dog learns after one repetition; Zoom should learn to arrange my windows as I like, at least after 47 repetitions.

For the microdose-at-work developers, the trippiest arena would be in the visual realm, where our brains aren't so crippled by convention. Make a tablet/pen drawing app that develops a common language with the user.

> You win Risk by starting in Australia.

That depends. If there are two parties going for Australia, let them eat each other while you go for LatAm. Closer to the end of the game, the two Australian armies count for very little against the number of armies you get for winning at least one battle every turn. Pro tip: hold off on exchanging any sets as long as you can. Have fun!

In my experience you win Risk by declining to play.

Board games are fun. And with kids around the house I'd rather play Risk with them than see them spend all their time on their tablets. Distraction in the real world beats distraction online. I should know ;)

Don't get me wrong, I love board games and play them frequently with my kids and with other adults. I just don't play Risk.

Not to generate a compiler (very hard IMO) but to generate something that compiles and runs.

My next thought is to use a language-agnostic spec and train an AI to create programs that adhere to that spec. OpenAPI for RESTful interfaces would be a good fit.

IMO, using A.I. to generate programs (or languages) is the ultimate nerd snipe. I, sovietmudkip, have to admit defeat, else my own productivity and happiness will suffer.

There is someone else out there whose personal incentives align with engaging in this difficult endeavor. But for me, I measure myself in projects realized, and that holds me back sometimes.

This is an interesting question. I think the first thing you need to decide is what will compile the language.

If you decide the language generator itself should be doing the compiling and processing, then congratulations, you are interested in the GPT-3 self-prompting community, and should go read everything Gwern wrote about GPT-3, then join the openAI GPT-3 slack. The short answer is that text transformers output a vector of likely ‘next tokens’ based on a token input. You can choose from this output vector using whatever rule you like, and feed it back in as input.
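That "choose from the output vector and feed it back in" loop can be sketched directly. This is a minimal illustration, with a toy bigram table standing in for a real transformer's output head (the table, token names, and function names are all invented for the example):

```python
import random

# Toy stand-in for a model's next-token distribution: a bigram table.
TABLE = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"<e>": 1.0},
    "dog": {"<e>": 1.0},
}

def next_token_probs(tokens):
    # A real transformer conditions on the whole prefix;
    # this toy table conditions only on the last token.
    return TABLE[tokens[-1]]

def generate(seed, max_len=10, rng=None):
    rng = rng or random.Random(0)
    tokens = [seed]
    for _ in range(max_len):
        probs = next_token_probs(tokens)
        # The selection "rule" can be anything: argmax, sampling, top-k...
        choices, weights = zip(*probs.items())
        tok = rng.choices(choices, weights=weights)[0]
        if tok == "<e>":
            break
        tokens.append(tok)  # feed the choice back in as input
    return tokens[1:]
```

Self-prompting in the GPT-3 sense is this same loop, with the model's own completions spliced back into its prompt.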

If you’re wondering how an AI might ‘talk to itself’ and program its own behavior internally, then you might like papers like this as a starting point: https://www.sciencedirect.com/science/article/pii/S105120041.... Short answer: because of how they are wired, NNs tend to have activation ‘areas’ as they process things, and this is represented as the connections between and weights of data flowing through very large matrices; not a thing that’s super easy for humans to interpret as ‘language’.

I believe OpenAI also is publishing more on interpreting how AIs work and think behind the scenes, so you may want to check their blog / published papers.

I know very little about AI. But I think one could try to get AI to write an existing language.

Let me explain: Python, for example, is a language specification which has different implementations in different languages, the most popular being CPython (commonly referred to simply as Python).

So if you fed the existing implementations of Python to an AI and asked it to create a new one in some other language, I think this could very well be a fruitful experiment.

Areas of consideration would include teaching the model what a human wants from a language. An AI might see that not everybody speaks one language and lean towards a symbol-based language that transcends any individual human-language bias.

Then, how would you train that? Show it the computer languages that have come before? Would it weight them by how widely they are used, to get an idea of what humans prefer? The prospect of some Java-C-COBOL mutant language does seem a logical output, perhaps.

Areas in which I'd like to see AI focus would be optimisation and code auditing, which may well prove easier as you have a solidly defined goal. Such endeavours would also prove invaluable down the line if you wanted to have AI come up with a programming language, as to do that you would need to model how humans communicate/think and how computers communicate/think, and meet in the middle.

[EDIT - spelongs]

> teaching the model what a human wants from a language

Yes, I think this kind of requirement would lead us to a very interesting specification exercise which would address questions like:

(1) what are the different use cases relevant to identifying 'as yet unaddressed' programming language design issues?

(2) what are the 'success metrics' of a new language design?

(3) what are the 'unresolved shortcomings' of existing languages?

(4) What are the issues related to 'non-human language design'?

This last question is about the opposite of 'readability as we know it': you could anticipate a language suited to creating non-human-intelligible code, which might require tools to translate it into human-intelligible form, somewhat like the assembly-language to higher-level-language issue, but in reverse.

A question/challenge that I was hoping would be posed: "but what would be the point?" Here are my answers:

(1) to see if it was possible to make a programming language that was 'better' (in terms of performance, flexibility, readability, debug-ability, ease of learning, or any other metric) than any existing programming language;

(2) to see if AI would or could create a language profoundly different from any existing 'human-created' programming language;

(3) to see if this kind of exercise could give us any new insights into aspects of the nature of programming languages;

(4) to see if it was possible to use such an exercise to improve upon or refine existing 'language development tools' for creating Domain-Specific Languages (DSLs).

One can argue the neural architecture + weights is the programming language of AI (for now).

Hmmm, interesting question. Maybe you could pass in thousands of code-bases and train an AI to figure out a common general-purpose programming language...

Not sure it could be done, but it would be interesting to see if an AI could come up with new abstractions that it sees in many different examples... Maybe like a new type of iteration or assignment

You don't just want the average of all other languages.

You'd want to somehow codify the effectiveness of each language: add up stats on the frequency and severity of errors, maybe weighted by time spent either developing or repairing per desired product, and use that as a metric of overhead.
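As a rough sketch of what "codify the effectiveness" might mean, here is a hypothetical overhead metric. All the numbers, field names, and language labels are invented for illustration; the point is only the shape of the metric: error frequency weighted by severity and repair time.

```python
# Hypothetical per-language stats (entirely made up):
# errors per kloc, mean severity (1-5), mean hours to repair.
langs = {
    "A": {"err_rate": 4.0, "severity": 2.0, "repair_h": 1.5},
    "B": {"err_rate": 1.0, "severity": 3.0, "repair_h": 3.0},
}

def overhead(stats):
    # Expected repair cost per kloc: frequency x severity x time.
    return stats["err_rate"] * stats["severity"] * stats["repair_h"]

scores = {name: overhead(s) for name, s in langs.items()}
lowest_overhead = min(scores, key=scores.get)  # "B" on this toy data
```

In practice the hard part is collecting honest values for those fields across languages, not computing the score.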

For the language to be useful, it has to be useful to humans, not just AIs.

Otherwise, the only use we could get out of something like that is that maybe an AI could use it to write down a portable version of itself, or of anything else, which we could "use" only to store, replicate, and reconstitute whatever the AI wrote.

Facebook's TransCoder[1] does github-trained code translation between languages, suggesting the possibility of a polyglot or manticore language, cobbled together from existing languages.

[1] https://news.ycombinator.com/item?id=23463812

> suggesting the possibility of a polyglot or manticore language

This does not have to be true in order to translate languages.

But in the case of current programming languages, they all compile to the same standard instruction set architectures following the same basic laws of boolean logic, so it should not be surprising that PLs are translatable. Likely, the model has discovered some basic statistical representation of those things.

> This does not have to be true in order to translate languages.

I'm unclear on what doesn't have to be true?

I intended to observe that a system capable of recognizing idioms in multiple languages, built on the large training sets of those languages' existing code, suggests the possibility of idiom-level combinational synthesis of a novel language. Most easily as a coarse-grain polyglot (combining large chunks of such code), and increasingly speculatively as grain size decreases (e.g. different languages in the same file, or function, or expression), or as language options are pruned (a manticore: representational redundancy is removed as a feature present in multiple languages becomes available in only some or one of its forms).

Yes, I am writing a language which can be used to construct computer programs by pattern-matching function inputs and outputs to form chains of computation. It's still in the early stages and the A.I. component hasn't been proven yet, even though early versions are in use at large tech companies.

Can you link this project?

Can AI instead be used to program in existing languages and develop applications? At least to develop something more than a skeleton template, something able to accommodate various requirements.

Why not train an algorithm to use a tool like Scratch or Blockly?

I think this question is in the direction of Stephen Wolfram's approach to AI. Namely: build a programming language that is capable of solving all kinds of problems (Wolfram Language), and then you will have condensed all the knowledge of the world into a programmable structure with which you can more easily build AGI. They have the added benefit of collecting billions of user queries with Wolfram Alpha, which might serve as training data.

I’ve always fantasized about the eval() function in JS: if you feed it the correct random string, a program could modify itself (which even the human brain can’t do). The AI version of Bogosort ;)

If you try to generate language artifacts, maybe you need to gather the sources of major projects, which large companies can do easily with their internal codebases.

You can modify quite a few objects in JavaScript, but you can’t modify JavaScript itself using eval. There’s no way to add an operator to it, for example (AFAIK; I’m not following its rapid evolution closely).

If you want that, use Smalltalk, a Lisp or a Forth.

Also, the brain does modify itself at least as much as you can modify JavaScript. For example, my brain nowadays accepts that 0.999999… is equal to 1, that there are equally many primes as natural numbers, etc.
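The distinction can be shown concretely in Python, whose eval/exec behave analogously to JS's eval on this point (the function names below are invented for the example): dynamic evaluation can rebind names at runtime, which changes program behavior, but it cannot extend the grammar itself.

```python
# eval/exec can rebind names at runtime (self-modification of behavior)...
ns = {}
exec("def greet(): return 'hello'", ns)
assert ns["greet"]() == "hello"

exec("def greet(): return 'goodbye'", ns)  # redefine the same function
assert ns["greet"]() == "goodbye"

# ...but they cannot extend the syntax: a new operator is just a SyntaxError.
try:
    eval("1 <+> 2")
    new_operator_worked = True
except SyntaxError:
    new_operator_worked = False
assert not new_operator_worked
```

Languages like Smalltalk, Lisps, and Forths blur this line because their syntax is itself reachable from within the running program.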

Take code from Rosetta Code and do what Google did with translating between two natural languages, where the network generated a third language as a bridge; that's your language.

Why do you want yet another language? ;)

Although it's interesting stuff; maybe it would make software development easier :)

What methodology do you envision for this project?

Why not use blockchain?

A way that I could see blockchain being relevant?

Well, once you had developed a tool for automatically generating Domain Specific Languages (presumably based upon getting the tool to educate itself about the domain in question) you could try to get it to develop a language specific to the Blockchain domain (perhaps intending to come up with an alternative to Solidity?).

