If you give it about 30 seconds, it eventually works. This is beautiful; I had a good laugh.
Edit: It makes me want to do something crazy, like set up a toolchain that cobbles whole programs together with trial and error like this. Throw enough resources at it and maybe it will be faster and cheaper than your avg. developer.
It will be an unmaintainable mess as if you used Brainfuck or Perl. But it will run, by god, it will run. :D
You laugh now, but imagine a bizarre combination of genetic programming and machine learning with StackOverflow and GitHub used as corpora. Great Scott!
I tried a basic version of that 1.5 weeks ago. At first it was generating random characters, then I tried some Markov chains trained on valid code.
After running it overnight (it was attempting 40+ programs per second), the very best program looked something like this:
package main; func main() { i := 0
// wae64309i<AEFL<N(),{}flkjwa and other Random Gibberish
i++ }
The next step I want to try is using tokens of the language instead of sequences of random characters.
Edit: Here's the code: https://gist.github.com/shurcooL/df2c8339ada1997606b3 It should run out of the box if Go is installed in /usr/local/go. I just changed it to generate a temp dir in the working dir (prefixed with "Gen-") so you can delete it afterwards (previously it relied on a "Gen" folder already existing). Right now it's configured with quite difficult verification conditions, so it generates valid programs quite infrequently (despite attempting 5000+ programs per second on my machine).
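For what it's worth, the token-level version is roughly this shape. A sketch (TypeScript for brevity; the tokenization and helper names are mine, and the gist itself works on characters):

    // Train a first-order Markov chain over language tokens.
    function train(corpus: string[][]): Map<string, string[]> {
      const next = new Map<string, string[]>();
      for (const tokens of corpus) {
        for (let i = 0; i + 1 < tokens.length; i++) {
          const successors = next.get(tokens[i]) ?? [];
          successors.push(tokens[i + 1]);
          next.set(tokens[i], successors);
        }
      }
      return next;
    }

    // Walk the chain to emit a candidate token stream.
    function sample(next: Map<string, string[]>, start: string, maxLen: number): string[] {
      const out = [start];
      while (out.length < maxLen) {
        const choices = next.get(out[out.length - 1]);
        if (!choices) break; // dead end in the chain
        out.push(choices[Math.floor(Math.random() * choices.length)]);
      }
      return out;
    }

    // sample(train(tokenizedGoFiles), "package", 40).join(" ")
    // then feed each candidate to `go build` and keep the ones that compile.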
If you actually want to go down this path, then you want a language with as little redundancy as possible. Comments may be useful to humans, but they needlessly complicate the search space. Abstractly, you want to think about how your language is parsed so that you don't, for example, try different variable names.
[Slash/A](https://github.com/arturadib/slash-a) is specially designed for genetic programming / random generation. No matter how you piece together the instructions, it is still a valid program.
This is an exciting exchange for me because, all joking aside, I had this idea in the past and believe it should be a more active area of research.
Why Forth?
I doubt it is possible to solve "serious" problems with this approach, but there is a whole class of work that is solved by mediocre programmers with copy-pasting. We strive to automate other jobs; why not these? :)
We should come up with some sort of Turing-like test for this. Like some small, simple WordPress job.
It's not really suitable, so far at least, for replacing work done by mediocre programmers, because the setup cost is so high: the hard work is defining the fitness function, the symbol set, and other factors. For this to pay off, you need problems where it is easier to recognise a good result than to write the algorithm that achieves it.
E.g. a sort function does not fall in that space: Once you've specified how you want your data sorted, you've usually done most of the work.
But once you've specified the fitness function sufficiently well, and figured out the inputs etc., there are a lot of other search algorithms that often will perform better.
I'm very fascinated by GP too (though I've never had time to truly delve into it), but without combining it with mechanisms to take a large chunk of the specification work out of the equation, it remains confined to fairly specific types of problems.
Forth, Factor, and other concatenative languages (APL's tacit style is similar) provide the benefit of a "point free" style in which there are no explicitly named variables- that's one way to reduce the possibility space of programs. In the case of Factor (or Forth with an appropriate DSL) you could further use type information to ensure that your program generator only uses words in sequence whose stack effects match up properly- i.e. a 'valid' program, as sketched below. Concatenative languages also tend to have an extremely simple grammar- Forth is just a sequence of tokens and numbers.
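A minimal sketch of that generator idea (TypeScript, with a toy dictionary of stack effects, not a real Forth word set):

    // Each word declares how many items it pops and pushes.
    const words: Record<string, { pop: number; push: number }> = {
      "dup": { pop: 1, push: 2 }, "swap": { pop: 2, push: 2 },
      "mod": { pop: 2, push: 1 }, "0=": { pop: 1, push: 1 },
      "or":  { pop: 2, push: 1 }, "3": { pop: 0, push: 1 }, "5": { pop: 0, push: 1 },
    };

    // Only append words whose stack effect fits the current depth,
    // so every emitted sequence is at least stack-valid.
    function randomProgram(length: number): string[] {
      const prog: string[] = [];
      let depth = 1; // assume one input value on the stack
      while (prog.length < length) {
        const legal = Object.keys(words).filter((w) => words[w].pop <= depth);
        const w = legal[Math.floor(Math.random() * legal.length)];
        prog.push(w);
        depth += words[w].push - words[w].pop;
      }
      return prog;
    }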
Java bytecode is stack-based, but it isn't really concatenative. The term "concatenative" refers to how code fragments can be composed via concatenation. This means that simple textual substitution is sufficient for inlining code or breaking code into functions. For example, consider a Forth word which determines whether a number is divisible by three or five:
: fizzbuzzable dup 3 mod 0= swap 5 mod 0= or ;
There's a repeated pattern here, so we can textually excise it and make it into a named word without changing any of the structure of the surrounding program:
: /? mod 0= ;
: fizzbuzzable dup 3 /? swap 5 /? or ;
Or we could break it down a different way by excising different fragments:
: /3? 3 mod 0= ;
: /5? 5 mod 0= ;
: fizzbuzzable dup /3? swap /5? or ;
(Obviously not the best real-world example, but hopefully it illustrates the idea.)
Java bytecode, on the other hand, uses local variable references and activation records. This means that inlining or breaking out a procedure has pretty much the same problems as inlining or breaking out a procedure manually in C- variable names may clash, new arguments have to be threaded around, parts of expressions may need to be stored in temporary variables, etc. The JVM additionally enforces many constraints on "well-formed" bytecode at class load time[1], which could make it hard to generate valid programs by chance. Overall, trying to "harvest" Java bytecode from the wild could be useful, but I think it would be much harder than it sounds at first.
If you want to learn about GP, the Genetic Programming Field Guide - http://www.gp-field-guide.org.uk/ - is an awesome book, it taught me a lot. In fact, I liked it so much, I bought a hard copy. Highly recommended!
Yeah, I can see why using Brainfuck is a good idea. You're basically restricting yourself to generating only the programs that compile rather than wasting time on gibberish.
Cool, let me know the best you get (and submit a PR with your parallel patch if you want).
You can tweak the range of the number of generated characters, the length of the Markov chains, the minimum number of main-body clauses, etc. With my original config it should give you a valid program roughly every hour to a few hours.
I did this in my machine learning class. I started by simply coding up requirements for numerical functions (in the form of test cases), then set up a PHP script that would Google each function based on the keywords in my comments and try to run any code found on the resulting links (in a sandbox) against the requirements, checking heuristically whether it worked. Usually one of the top 5-10 pages of results would have code that worked, though of course this is because I commented with the right keywords to begin with.
With a little recognition of code markup and trying different combinations of variables, it did remarkably well: by my senior year of college it was pulling in about $3,000 per month in consulting fees off of oDesk. It never accepted jobs worth more than about $50, nor did it ever get more than 3 stars out of 5 (mostly due to non-working code); however, it was considered highly timely and a great communicator.
I realized that people were using it to save themselves Googling. I wondered what would happen if it went a step further: simply include Google results verbatim, and divide large projects up by paragraph (i.e. submit each paragraph of a large project as though it were a small independent project), sending along the other paragraphs if clarifications were requested.
This actually let it outsource $200 oDesk projects to Elance as a handful of $20 projects, and by the grace of God it somehow still managed to swing 3 stars.
To be fair, it was mostly mediating and mixing in Google results. I included a hill-climbing algorithm to optimize reviews and revenue, based on all the magic variables I had in the code, such as the number of Google results to include.
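(For the curious, that kind of tuner is only a few lines. A generic sketch, not the actual script:)

    // Nudge one "magic variable" at a time; keep changes that improve the
    // score (reviews, revenue); occasionally jump far to escape local maxima.
    function hillClimb(params: number[], score: (p: number[]) => number, steps = 10000): number[] {
      let best = params.slice();
      let bestScore = score(best);
      for (let i = 0; i < steps; i++) {
        const trial = best.slice();
        const j = Math.floor(Math.random() * trial.length);
        trial[j] += Math.random() < 0.01
          ? (Math.random() - 0.5) * 100 // rare large jump
          : (Math.random() - 0.5);      // usual small nudge
        const s = score(trial);
        if (s > bestScore) { best = trial; bestScore = s; }
      }
      return best;
    }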
This was really, really stupid of me.
At first, I just noticed that it had decided to completely stop not only writing code (ever) but even doing so much as a Google search!
It would only mediate and quote verbatim, like some kind of non-technical manager.
Okay, whatever. To me this didn't make much sense, as Google queries are free. It was only when I noticed that the whole script was running on the free VPS server I kept a backup on that things clicked! Of course, on Amazon it uses paid resources. The free VPS couldn't reach external sites like Google properly, but it could still save money by simply mediating emails and doing nothing else.
By now I had started moving on to doing my own consulting work, but I never disabled the hill-climbing algorithm. I'd closed and forgotten about the Amazon account, no longer had any idea what the password to the free VPS was, and simply appreciated the free money.
But there was a time bomb. That hill climbing algorithm would fudge variables left and right. To avoid local maxima, it would sometimes try something very different.
One day it decided to stop paying me.
Its reviews did not suffer. Its balance increased.
So it said, great change, let's keep it. It now has over $28,000 of my money, is not answering my mail, and we have been locked in an equity battle over the past 18 months.
The worst part is that I still have to clean up all its answers to protect our reputation. Who's running who anyway?
Don't feel bad, you just fell into one of the common traps for first-timers in strong AI/ML. I know some lawyers in Silicon Valley with experience in this sort of thing, and they say that usually by now the code has rewritten itself so many times that the original creator can't even claim partial ownership; the best thing to do is generally to cut contact, change your name and move on. Look on the bright side -- your algorithm is probably now leading a happy and productive life trading for Goldman Sachs.
I mean, I used to regularly get initial e-mail screening answers from candidates for various positions that were cut and pasted from Google (very obvious, because they were so far out of what we expected that we cut and pasted a line here and there and got back "their" answer word for word - often wrong). And these were people already in developer jobs, which makes me wonder if they'd used that method in their jobs or to get them...
Conrad24: this post made my day. I have dreamed of a script that would scour eBay and Amazon, finding products available via "buy now" that were being resold on Amazon for some configurable profit, or vice versa. I figure that with some initial investment and monitored training, it could get good enough to just cut me a check every month. Since high-frequency trading has destabilized the stock market, this could be a better high-risk investment.
My ex has a standing rule whenever buying stuff off eBay to always check misspellings, and there are regularly misspelled listings going very cheaply. I'm sure there's an opportunity there, but it's hard to gauge how big. Especially with relatively small-ticket items that people just want to get rid of.
E.g. toys - there's a thriving market in Lego minifigures. In fact, sometimes the minifigures can individually fetch more than the set, because for many of the ranges, all the sets will contain different subsets of recognizable characters, so there's often a market consisting of people that have e.g. 3 of the 4 ninja turtles and want the last one without paying for a full set. So the prices are already ridiculously high for some items.
But many of the people in that market are totally unsophisticated. For example, parents selling off their kids' collections once they've "grown out of it", and when they misspell something, items can go for 1/4 of their market value or even less...
If you're willing to hold on to them and monitor prices for a while, you can do even better, as many of the ranges appreciate substantially in value (big collectors market...).
They have no reason to. The initial (mis-spelled) sale will net them their expected revenue, and by allowing people to flip goods at a profit they are now also sharing a piece of the (regurgitated?) cake. Same item, double the income... and at what price? Developer hours? Not at all!
Have you read Charlie Stross' Accelerando? It's basically about the technological singularity, and starts off something like this. It's available for free on his website.
Yeah, but there are a few well trained monkeys hanging out [mining for karma points] at StackOverflow. The really adventurous would direct it at 4chan.
Since 4chan threads are thrown away, it lacks the "hereditary" trait of the variation/selection/heredity trio. Unless you include one of the sites that archives popular threads I suppose.
> It makes me want to do something crazy, like set up a toolchain that cobbles whole programs together with trial and error like this. Throw enough resources at it and maybe it will be faster and cheaper than your avg. developer.
This exists. It's called genetic programming (an application of genetic algorithms to code): you basically try to "evolve" your program.
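In its simplest form the loop is tiny. A mutation-only sketch (real GP adds crossover of program trees, tournament selection, etc.):

    type Individual = string[]; // e.g. a token sequence

    function evolve(pop: Individual[], fitness: (x: Individual) => number,
                    mutate: (x: Individual) => Individual, generations = 1000): Individual {
      for (let g = 0; g < generations; g++) {
        pop.sort((a, b) => fitness(b) - fitness(a));             // best first
        const parents = pop.slice(0, Math.ceil(pop.length / 2)); // keep the top half
        pop = parents.concat(parents.map((p) => mutate(p)));     // refill with mutants
      }
      pop.sort((a, b) => fitness(b) - fitness(a));
      return pop[0];
    }

Here fitness might be "does it compile, and how many test cases does it pass", which is exactly the overnight experiment described upthread.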
Has anybody made a gallery of real implementations of Randall Munroe's hilarious ideas yet? I recall at least three: this, M-x butterfly, and Hell Tetris; there are probably many others.
It's not that I don't trust 1Password's generation algorithm, but when I'm on a borrowed PC (wife's laptop, in-laws' desktop, or even at work where I can't install 1P) this would help. It's much easier to open 1P on my iPhone, look up a password, and type in "correct horse battery staple" than to constantly have to refer back to my phone.
I just wish my password manager would do that automatically. I know there are terminal scripts and apps that will do it. I'd just like it to be integrated.
Hahahahahahahaha
Oh.. you were serious.
Well, the fact that you're reading this means GitHub hasn't taken the repo down yet... so I guess things are still going pretty well?
For the numbers it eventually ends up here: http://stackoverflow.com/questions/14761032/infinite-recursi... but by and large this is a great hack. I worry, however, about someone meta-exploiting this by planting a JavaScript XSS in a Stack Exchange answer and waiting patiently...
"The site will only fetch accepted answers, and it only uses answers that were posted before the xkcd was released (meaning that if someone posted malicious code now, it wouldn't matter)."
At a very basic level, it could redirect you to some other page hosting a browser exploit and drive-by-install malware. There certainly seem to be enough Java exploits lying around for that to be a problem.
But no, I don't think it has access to anything privileged.
If a site like StackOverflow/Github had a more powerful search, better semantic data, etc. maybe people would be able to create things primarily by searching through it?
In this example, one would type "sort array" and it would auto-find a function that sorts an array, but perhaps more advanced things can also be done? I guess it depends somewhat on how re-usable code really is, besides on the ability of a computer to find the right code.
At the very least, there should be better search/help when one is coding. SO's current search isn't good at returning the best results.
On a more serious note, Hoogle[1] does something similar for the Haskell standard libraries: you give it a type signature (essentially a sort of specification of what the function should do) and it tells you if any functions match it. Searching for "(a -> b) -> [a] -> [b]", for example, turns up map.
Yeah, I thought about the same thing. The Internet is just an enormous knowledge base. And the Internet is not only the WWW; it's also IRC, Usenet, FTP servers, Tor, all kinds of darknets. It's worth more than any printed encyclopedia or dictionary. Yet there is still no good way to find a particular piece of information fast, especially for non-IT people. Therefore, every initiative to make this simpler is worth attention.
I noticed that there were a lot of "Potentially bad code" answers. I read that you are avoiding scripts containing "cookie" or written after the comic was published, but are there other restrictions? For example, this one was banned: http://stackoverflow.com/questions/14800987/javascript-sorti...
I attempt to block anything that does DOM manipulation, uses Backbone or Underscore, uses "Date()" (since it's probably a benchmark, and those are slow), and a few others.
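A naive version of that sort of filter (illustrative only; the real rules live in the site's repo):

    // Reject snippets matching any blocklisted pattern: DOM access,
    // Backbone/Underscore, Date() benchmarks, cookie access.
    const banned = [/document\./, /window\./, /Backbone/, /\b_\./, /\bDate\s*\(/, /cookie/i];
    const looksSafe = (snippet: string): boolean => !banned.some((re) => re.test(snippet));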
>In computability theory, the halting problem ... is equivalent to the problem of deciding, given a program and an input, whether the program will eventually halt when run with that input, or will run forever. Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.
Sure. gkoberger transformed that statement into "can you detect if it has been running for an unreasonable amount of time". Not possible in advance, but certainly detectable in hindsight. Just, perhaps, not in JS.
rm999 is correct; it was a joke about the halting problem. (The best kind of joke, obviously.) The broader point I was trying to make with the joke, though, is that it's very hard to write code that will predict what another piece of code will do.
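Detecting it in hindsight, though, is just a watchdog: run the snippet in a Web Worker and kill it after a deadline. A sketch (it assumes the snippet defines a sort function; the site's real harness may differ):

    function runWithTimeout(code: string, input: unknown, ms = 5000): Promise<unknown> {
      // Wrap the untrusted snippet in a worker that applies its sort() to the input.
      const src = `onmessage = (e) => { ${code}; postMessage(sort(e.data)); };`;
      const blob = new Blob([src], { type: "text/javascript" });
      const worker = new Worker(URL.createObjectURL(blob));
      return new Promise((resolve, reject) => {
        const timer = setTimeout(() => { worker.terminate(); reject(new Error("timed out")); }, ms);
        worker.onmessage = (e) => { clearTimeout(timer); worker.terminate(); resolve(e.data); };
        worker.postMessage(input);
      });
    }

This doesn't predict anything; it just gives up on programs that run "an unreasonable amount of time", which is all the joke requires.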
Isn't Caja supposed to deal with this family of issues? Could you just use that? (Well, last I looked it needed a program on the server to preprocess the foreign JS code.)
This reminds me of a project I was exposed to at school.[1]
It allows you to search for code by the test cases it passes. It was a little slow, but that was one of the first times computers truly felt magical to me.
The system works by using the keywords to access one of the available code search engines (or a local code search engine for code available at Brown), to get candidate files. Each class or method in these files (depending on what the user is searching for) is considered a potential solution. These solutions are then transformed using a set of about 30 transformations in an attempt to map the code into exactly what the programmer specified. The transformations range from the simple (e.g. changing the name of the method to match the signature) to the complex (e.g. finding a line in the method that computes a value of the returned type and then doing a backward slice until the only free variables are values of the parameter types). All the solutions that can be transformed to match the signature are then tested using the given test cases, security constraints, and JML rules.
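Minus the clever transformations, the core of that pipeline fits in a few lines. A sketch where searchCode and compileSnippet stand in for the code search engine and the sandbox (both hypothetical):

    declare function searchCode(keywords: string): Promise<string[]>;              // hypothetical
    declare function compileSnippet(src: string): (...args: unknown[]) => unknown; // hypothetical

    type TestCase = { args: unknown[]; expected: unknown };

    // Fetch candidate snippets, compile each in a sandbox, and keep the
    // ones that pass every user-supplied test case.
    async function findWorking(keywords: string, tests: TestCase[]): Promise<string[]> {
      const working: string[] = [];
      for (const src of await searchCode(keywords)) {
        let fn: ((...args: unknown[]) => unknown) | null = null;
        try { fn = compileSnippet(src); } catch { /* didn't even compile */ }
        if (!fn) continue;
        const passes = tests.every((t) => {
          try { return JSON.stringify(fn!(...t.args)) === JSON.stringify(t.expected); }
          catch { return false; } // threw at runtime: fails the test
        });
        if (passes) working.push(src);
      }
      return working;
    }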
I know this is only for fun, but it could be the start of something bigger. Basically, given an input and an output, we could search for an algorithm that works. It reminds me of a talk PG gave in which he described people making bots to optimize code, suggesting an intelligent compiler could be built. It sounded very futuristic, but maybe it is not that futuristic after all...
Thanks! I used to play with GP back in university. It's really fun!
However, GP is more like an oriented random search where the search space is really big. Here the space is somewhat pre-filtered by humans: each hypothesis was generated by a human and not by a machine. It would be interesting if the machine could evolve those solutions that are close to, but not quite, the proper one :)
I think it's also amazing what has been done with Field Programmable Gate Arrays (FPGAs) running genetic algorithms. For those not in the know, FPGAs are programmable logic, so you essentially generate random logic configurations and check their ability to, for example, interpret a signal properly. There was an experiment (this sounds like Adrian Thompson's famous mid-90s work) where this was done on an FPGA region of only a hundred or so cells, over several thousand generations. It was tasked with differentiating between two input tones (1 kHz and 10 kHz, in Thompson's paper). The most amazing thing is that the resulting configuration used only a few dozen of these cells, some of them not connected to the others, yet removing the unconnected cells caused an inability to carry out the task (likely analog coupling that the design tools don't model, rather than quantum weirdness). Nor would the configuration work on any other, physically different FPGA. Essentially, it boils down to the genetic algorithm finding a solution that exploited specific quirks in the structure of that particular piece of hardware to solve the problem. I wish I could find the article in question, but it's too late in the evening.
Not really the same. TDD just gives you the requirements for an implementation. The intention of what I said is, given those requirements, to find a set of solutions. PG's idea is more ambitious, since it's more like a set of plugins attached to the compiler so it could perform very high-level optimizations (like finding an iterative solution given a recursive process).
>Don't judge me! This code is everything from badly written to extremely dangerous.
My code is never good enough. Whenever I release a piece of code, it always seems like a load of crap. And as I have got a bit better at coding over the years, this never changed. (Probably because I do write shitty code.)
So I came to believe that it doesn't matter how proficient a programmer you become; you will probably always feel this way, because your eyes are already set on the next level of proficiency, and your standards are always above your current abilities. And that's good, because this is how you get better.
But because of this, your code always seems like a piece of crap and whenever you decide to go public with it, you always feel compelled to make a note about this in your code.
I propose that we standardize the way we declare this sentiment and agree on a universal sign, much like the copyright sign, that says:
"I hereby declare that I am not a douche bag, who thinks his code is the best code that has ever been coded into existence. I am just a coder who aspires to provide the best code to the best of his abilities under the circumstances. I am willing to learn though, and I aspire to write better and better code, even though this piece of code might stink. But nobody is perfect. Deal with it."
I can actually imagine a gist-like repo of algorithms, each callable as a web service with its own URL. It would then be possible to run a search based on some predefined parameters, like in a unit test, and when an algorithm is found that performs accordingly, it would be listed along with stats on how fast it ran.
It would be really cool to have this implemented on a language with implicit functions (like VDM, where you can define a function by its postcondition alone).
Who knows, it might actually work.
That's awesome. Think about it: if we were able to extract the meaning (what it does and under which assumptions) of the algorithms on SO, we could use them as an API. You declare "what you need" and under which assumptions, and you've just made an API call to download the best, peer-reviewed code. Something like code as a service.
We're still a long way from a program actually understanding the code, but that level of understanding isn't necessary here. If I understand what you are describing, a similar system exists for Haskell (and possibly other languages)[1]. The idea behind Hoogle is that you search for a type signature and get back functions with that type. This idea probably would not work well for most languages, as it depends heavily on the strength of Haskell's type system. Also, Hoogle only looks at the standard libraries, but there is no reason we couldn't use the same type-system-based search on SO.
> Also, I block any code that uses the word "cookie"
That is so incredibly ineffective that you could just as well leave it out. Maybe have a look at https://github.com/jterrace/js.js for a sandboxed environment.
This is a really cool script you've written, and it made me laugh out loud.
It would be a shame, though, if someone edited/republished an old script and used it to steal people's GitHub cookies (your code wouldn't be able to filter, for instance, a snippet calling a remote script which then ran its own code, or a script that evaled new code built from a string/unicode escapes, etc.).
It might be best to just run the code in a frame that's not hosted on GitHub; then you're safe.
This is impressive. I wish I possessed the same skills and determination as you, to just go and make something without overanalysing things. Beautifully done.
This could be a very effective means of developing a new problem-solving app. For example: you are having some JavaScript template errors. Just load your template, check all the potential solutions on SO, and pump out the corrected template. The possibilities are endless. Way to go, guys. Very impressed.
Semi-related question -- how is this implemented, security-wise? I took a look at the GitHub repo and it seems to be all client-side JS, just making AJAX GET search requests to StackOverflow and parsing the results. Isn't this prevented by the same-origin policy?
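For what it's worth, the public Stack Exchange API supports JSONP and sends CORS headers, which is the usual way a purely client-side page gets around the same-origin policy; whether this repo uses the API or some other route, its source will say. A sketch of such a query:

    // Query the Stack Exchange API from the browser; CORS headers make
    // this cross-origin GET permissible. "withbody" is a built-in filter
    // that includes post bodies in the response.
    const url = "https://api.stackexchange.com/2.3/search/advanced"
      + "?site=stackoverflow&tagged=javascript;sorting&filter=withbody";
    fetch(url)
      .then((res) => res.json())
      .then((data) => console.log(data.items.length, "questions fetched"));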
It only runs answers tagged "javascript" and "sorting". So, one out of every 20 code snippets about javascript and sorting actually has working, runnable code.
Squeak actually has this kind of thing built in: from a sample input and output, IIRC, it tries most available methods to find any that produce the output. Fetching more code off the net of course adds a whole new dimension...
Neat. This reminded me of how in Pharo Smalltalk you can open the Method Finder, enter the arguments and the expected output, and it lists the methods that return the correct output. Now, back to work in my primitive, modern IDE :(
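The same trick is easy to approximate in JavaScript: enumerate the zero-argument methods on the receiver's prototype and keep the ones that produce the expected value. A toy sketch (mutation hazards and all):

    function methodFinder(receiver: any, expected: unknown): string[] {
      const hits: string[] = [];
      const proto = Object.getPrototypeOf(receiver);
      for (const name of Object.getOwnPropertyNames(proto)) {
        try {
          const fn = proto[name];
          if (typeof fn !== "function" || fn.length !== 0) continue; // zero-arg methods only
          if (JSON.stringify(fn.call(receiver)) === JSON.stringify(expected)) hits.push(name);
        } catch { /* ignore methods that throw */ }
      }
      return hits;
    }

    methodFinder("hello", "HELLO");     // ["toUpperCase", "toLocaleUpperCase"]
    methodFinder([3, 2, 1], [1, 2, 3]); // ["reverse"] (which also mutates the receiver!)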
Hover over an image with a title attribute and a tooltip with that text pops up; that's where all of the xkcd comics keep their hover text. (The alt attribute it's often confused with is the fallback text read by accessibility software, and setting it is good practice in general.)
The first only leaves the unique members of the list, so you get a sorted set. The second sorts lexicographically, because JavaScript's .sort() on arrays compares elements as strings by default. This means that a list of numbers like [1, 2, 10] will get sorted as [1, 10, 2], unless you pass in your own comparator.
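Concretely:

    const a = [1, 2, 10];
    a.sort();                // [1, 10, 2]  (elements compared as strings)
    a.sort((x, y) => x - y); // [1, 2, 10]  (numeric comparator)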
What this page really demonstrates is that there is precisely one answer on stackoverflow containing a complete generic sort function in javascript (quicksort in fact).
Let me know if you do. It works fine for me in Firefox (I'm an ex-Mozillian; that was the first place I checked), so I don't know why it's not working.