Rather than brute forcing all possible permutations, we start from all possible final states and BFS backwards towards starting states. There are probably tons of opportunities for further optimization, but this takes ~800 ms on my desktop, and it only uses one core. We don't need to track the number of turns required to end each state, since the frontier will always consist of games which require the same number of turns to end.
Edit: Shrinking my data structures dropped runtime by 50%, and easy & cheap parallelism dropped runtime by another 75% on my four-core desktop. New runtime: 110ms (Intel i5 6600k). The key to the parallelism is that since the total amount of money never changes, we can assign a different total amount of money to each thread, and they'll never have to search the same states.
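For anyone curious what that looks like in code, here's a rough Java sketch of the same idea (not the actual implementation - `terminalStates` and `predecessors` are placeholders standing in for the game's real transition rule):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.stream.IntStream;

    public class BackwardBfsSketch {
        // Placeholder: every state in which the game is already over, for a given total.
        static List<Integer> terminalStates(int totalMoney) { return new ArrayList<>(); }

        // Placeholder: all states that reach 'state' in exactly one turn.
        static List<Integer> predecessors(int state) { return new ArrayList<>(); }

        // BFS backwards from the terminal states. Every state first reached at depth d
        // needs exactly d turns to finish, so no per-state turn counter is required.
        static Set<Integer> statesEndingWithin(int totalMoney, int maxTurns) {
            Set<Integer> seen = new HashSet<>(terminalStates(totalMoney));
            Deque<Integer> frontier = new ArrayDeque<>(seen);
            for (int depth = 1; depth <= maxTurns && !frontier.isEmpty(); depth++) {
                Deque<Integer> next = new ArrayDeque<>();
                for (int s : frontier)
                    for (int p : predecessors(s))
                        if (seen.add(p)) next.add(p);
                frontier = next;
            }
            return seen; // everything not in here needs more than maxTurns turns
        }

        public static void main(String[] args) {
            // The total amount of money never changes during a game, so each total can be
            // searched on its own thread with no shared state between workers.
            IntStream.rangeClosed(3, 300).parallel()
                     .forEach(total -> statesEndingWithin(total, 12));
        }
    }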
Hah, I knew there was a better way to solve it! I was sort of sick of the problem by this point and didn't feel like picking it up again :) Glad to see someone else found it, implemented it, and proved that it produces a much faster calculation.
I added this at the end with a link to your solution and your post. Honestly I never expected this post to make it to the front page so thank you so much for taking some time out to do so.
Wow, this is beautiful C++ code!
High level abstractions, not hiding any of the nitty gritty details, parallelism and all, yet very readable. Well done!
I felt the same - not that much worse than in a modern language (of course someone should rewrite it in Go with the new algorithm so we can decide if it would be good enough :)
Before I read the post I thought how I would solve it. Dynamic programming starting from end states should avoid lots of redundant work. Also making sure to sort the players by money, to reuse symmetries.
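Something like this is what I have in mind for the symmetry part (illustrative only, and it assumes the players really are interchangeable for the purpose of counting turns):

    import java.util.Arrays;

    public class CanonicalState {
        // Sort the players' money so permutations of the same amounts share one
        // memo-table entry.
        static long canonicalKey(int[] money) {
            int[] sorted = money.clone();
            Arrays.sort(sorted);
            long key = 0;
            for (int m : sorted) key = key * 1024 + m; // fits easily for small amounts
            return key;
        }

        public static void main(String[] args) {
            // (6,1,4) and (1,4,6) map to the same key
            System.out.println(canonicalKey(new int[] {6, 1, 4}) == canonicalKey(new int[] {1, 4, 6}));
        }
    }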
> EDIT - Thanks to Dolda2000 I have modified the Java version. It is now about the same speed as the GoLang version. Indeed the issue was the way the games were created, which caused the Java version to have to simulate more games to determine if the game went long enough. With the changes it is running in about ~6 seconds now and has restored my faith in Java.
This is great, this comment deserves to be at the top. Let's not criticize the original article, and instead celebrate that its publication resulted in a crowdsourced solution. I love to see the community coming together to provide knowledge to us all. :)
I love this sort of post, and it makes me miss programming (i'm a Linux sysadmin type now)
The golden age of this stuff was when I was writing stuff for the Atari ST and Amiga. Fixed hardware, manuals that came with assembly instructions and timings, etc.
My greatest moment was when rewriting the Atari ST text output function. For those that don't remember, the ST had an interleaved bitmap [1] and no hardware blitter support, so everything was done with the CPU. Therefore writing text on the screen was hard. I wrote a program that sped up the C code from the OS about 3x, then Darek Mihocka with Quick ST [2] beat my speedup. I was absolutely aggrieved about this, and spent weeks working out how he did it without any luck.
In the end the solution genuinely came to me while I was dreaming - the Motorola 68000 move.p instruction, which worked quite well with the interleaved bitmap. It gave me a nearly 2x speedup, and with the rest of the optimisations I had, I beat his code.
I do miss those days of a level playing field where the outcome was the result of your skills, rather than quibbling over the middleware, OS, graphics driver version used, etc etc.
No, it's better. I basically got burned out writing the same code for 5 different financial companies so tried something new. Loved Linux, loved the CLI, loved the power it gave me. I've literally saved a startup when paged in a nightclub at 3am and having to type very carefully to fix their database replication.
You need to pivot in work, especially as most of us will be working for 40+ years. I'm getting into the security stuff now, which I think is the new hotness. The only thing bringing me back to programming is golang, which is absolutely superb for server-side stuff.
I always seem to end up in sysadmin work. I guess I'm good at it. If you're smart and can use tools to your advantage you can create great value. Using ansible and judicious scripting I can do the work that used to take half a dozen sysadmins who just plodded through repetitive work over and over manually. That's a good argument to get a nice salary.
Continue with your CS degree. In my experience, all good sysadmins have significant development experience. This is especially useful when working at small companies/startups.
I'd like to counter (but not disagree) with: in my experience, all good developers have significant sysadmin experience, whether professionally or self-taught.
As a tech lead of a team that is largely junior, I keep running into places where a lack of sysadmin-type knowledge seriously harms the ability of a CS degree holder (anywhere from BSc to PhD) to do real-world software engineering work unless they have either been self-taught in sysadmin sort of things or have had a couple years experience that has forced that sort of experience on them.
To give an example, a fresh-out-of-school CS degree holder that never played with Linux servers for fun in his own time might be able to implement some software and then easily learn to Dockerize it, but may be stuck for three days on getting it to talk to other services both locally and in a test environment because of essentially zero exposure to ideas in networking - and yet the fix to the problem takes about 5 minutes for someone with exposure to how networking works. ("What's a bridge?")
Another example comes from general familiarity with the server/client model: the rule of thumb to have a max number of worker threads equal to twice the cores available unless there is a reason to re-evaluate that.
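For concreteness, that rule of thumb looks something like this (the 2x multiplier is only a starting point to re-evaluate, not a law):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            // Starting point from the rule of thumb: twice the available cores.
            // Re-evaluate if the workload is heavily I/O-bound or purely CPU-bound.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores * 2);
            pool.shutdown();
        }
    }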
All that said, I would still favor sticking with the CS degree as well. Basic practical familiarity with sysadmin concerns can be more or less picked up over time via personal projects and then as needed on the job, but the formal knowledge accumulated over a CS degree would be much harder to do that with. I would only suggest to make sure to spend time on personal projects that, for lack of a better term, are full-stack.
I'm a great developer, but a terrible sysadmin. I just don't have the mentality for it. I can pick up pieces, and I know the hardware, but I'll never be a good sysadmin. I have a lot of unix experience, but setting up my own laptop is always a beast of an effort. I don't ever want to set up a continuous integration system with a source repo. I'll make a mess of it.
However, if you haven't had networking experience in college, you went to a terrible program. I had exposure to networking (at Cal) in 3-4 classes.
If you can program well, and like some aspects of sysadmin-ing, then becoming a Site Reliability Engineer (at e.g. Google) might be for you. They are always looking for people.
(I used to work as an SRE, before I made my way back into functional programming and finance recently.)
Devops right now is very lucrative. Those that understand the cloud (AWS/GCE) and the tools (Terraform/CloudFormation/Ansible/Kub/ECS/etc...) are extremely sought after.
I'm going to define "long timespan" as being ~40 years (roughly how long the boomers expected to work for)
I don't think it's "hard" per se, but it does require intentional effort. I can also see how, with a difficult work schedule, or family commitments, it may become hard or even impossible to carve out the time.
I'm about 23 years into this journey, and although I still remember plenty of esoteric trivia that was useful for programming in the '90s and is useless now, there still appears to be room to learn new things every day.
It's illusory to think that we have a choice. Engineers are "dev-ops" now and sysadmins are "site reliability engineers." Everybody has to do everything.
Everybody needs to have _at least some degree of familiarity_ with everything, yes. But that has always been the case. There is not and there never has been a job posting for someone to design algorithms and do nothing else, nor to deploy HTTP servers and nothing else.
I expect that if you pick any part of any software/hardware stack, anyone with a reasonable amount of experience would be able to give real-world examples where familiarity with several (and sometimes all) other parts of that stack is not only beneficial but sometimes necessary. I also expect that illustrations could be found from any point in the history of computing, up to and including now.
I do hate the 'devops' prefix. It's used indiscriminately by everybody unfortunately, including at a recent client that didn't have devs. It's job title inflation, a bit like calling somebody a barista because they press the coffee button.
It's great when used well. I'm a (Java) developer/architect type, but being involved more in infrastructure and deployments makes me a better overall contributor. If I follow your reasoning, my job would have been "deflated", but it doesn't feel that way.
I am a bit confused about what you are referring to. Are you saying that we do not have many choices when it comes to work? I am about to graduate, but a lot of people seem to be saying that once you gain experience it is easy to hop to different jobs.
After a while you get pigeonholed. You can hop jobs easily, but it is the same shit, different name on the building. It is often easier to pivot inside a company than to jump to a new company.
Personally I would appreciate when those who got pigeonholed into finding their consecutive jobs "the same sh*t" don't foreshadow that onto those yet to graduate. It's good that those coming into the job market newly know what to look out for, but the opportunities mentioned by the previous poster also mean that you can steer away from such shotty (I'm going with the auto-complete here) jobs... if you choose so.
Agreed. I've been a sysadmin, tech support nerd, developer, architect and now product manager. With each switch the value you bring jumps.
Next switch is likely starting a business.
37 y.o., still not tired of this!
I can see how someone could get bitter if they stayed in the same role/position most of their career, but in my experience this has tended to be people with a negative outlook/attitude/chip on shoulder in general... It really is what you make of it, people are people wherever you go, some are dicks, some are not.
Just accept the reality of the game and deal with it.
You don't need to spread your aura of doom everywhere you go.
I found it pretty easy so far. But I am willing to relocate intercontinentally, so there's less pressure on me to agree to the pigeonholing anyone might want to attempt.
It is, if you avoid getting stuck in a ghetto. Mobility on your part helps, too. (Both in a metaphorical sense, but also very directly: you are obviously going to have a bigger pool of potential suitors when the world is your oyster than when you are restricting yourself to one metro area.)
tl;dr: The Go implementation being faster has nothing to do with Java or Go. The version implemented in Go used a slightly different algorithm, which allowed hot inner loops to terminate early, greatly improving the performance of the algorithm.
I like Go, but it's not dramatically faster than Java. Any contest between the two of them will probably just be a back and forth of optimizations. They share pretty much the same upper bound.
I tend to agree for the naive Java approach, but with Java and multiple processes on the same machine, they really should share a single JVM; many people end up with one JVM per process instead. The Go runtime is less greedy and lends itself to that way of working much more obviously.
To the poster's point about running multiple Java apps on a single machine, I think RAM is the real issue. Java in my experience makes far less frugal use of it. I've seen Java applications eat hundreds of MBs of RAM that, rewritten in Go, used 1-2 MB.
Every time I hear this conversation play out, the Java folks point out that the GC is highly tunable/pluggable, so you can shift some levers and get lower memory consumption in exchange for more frequent collections. I'm not a GC expert, so I'm just repeating what I've heard.
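For instance, the standard HotSpot knobs look roughly like this (the specific numbers are purely illustrative, not a recommendation):

    # Cap the heap, keep free space trimmed aggressively, and use the serial collector:
    # lower footprint in exchange for more frequent collections.
    java -Xms16m -Xmx64m \
         -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 \
         -XX:+UseSerialGC \
         MyApp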
This is true. The Java GC (and the whole VM) is one of the most impressive pieces of software in existence. There are several GCs available, they're tunable, and they support monitoring better than any other I've seen.
No GC I know is happy without a heap several times larger than the program's requirements.
OTOH, Java programs tend to be built using frameworks that soak up memory like crazy. So there's that.
I'm surprised the warmup problem in Java hasn't been invoked here yet. Java has bootstrap and warm-up overhead which for comparisons like this one may have a decisive impact on the perceived results.
What will happen is (after the JVM starts), there will be (by default) 10000 interpreted iterations, then shortly after that a stop-the-world pause to replace slow interpreted code with its native counterpart JITed by a compiler thread, and probably a few more of those either optimising other, less hot parts of the code, or applying more aggressive optimizations on already JITed sections.
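If you want to watch that happening, standard HotSpot flags will show it (the exact output format varies by JVM version):

    # Logs each method as it gets JIT-compiled; CompileThreshold is the classic
    # (non-tiered) invocation count before compilation kicks in.
    java -XX:+PrintCompilation -XX:CompileThreshold=10000 MyApp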
Go comes with AOT compilation and, as far as I know, doesn't have much initial overhead. For a test that takes a few seconds to complete, I'd say this is not how you want to compare performance.
There is no stop-the-world point. There are compilation threads always running in the background, patching hot code.
In Java 8 tiered compilation became the default. After a first threshold is reached, a method or loop will be compiled as a first approximation with further record keeping. At the second threshold, a more aggressively optimized version without record keeping will be put in place. This is kind of like the merging of the old C1 and C2 compilers into one to help get up to speed quicker.
When you want to measure this stuff you use JMH, the Java Microbenchmark Harness, an amazing piece of work that will warm up the JVM before executing performance tests. Be very careful writing your own Java timing code, and for the most part rely on JMH.
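A minimal JMH benchmark looks roughly like this (the body of simulateGames is just a stand-in for whatever you actually want to time):

    import org.openjdk.jmh.annotations.*;
    import java.util.concurrent.TimeUnit;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 5)        // JMH runs these before measuring, so JIT warm-up is excluded
    @Measurement(iterations = 10)
    @Fork(1)
    @State(Scope.Benchmark)
    public class GameBench {
        @Benchmark
        public long simulateGames() {
            // placeholder for the real simulation being measured
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += i;
            return acc;
        }
    }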
What you've said about the JVM overhead is true, but what the experiment showed is that on the time span of six seconds, implementing an algorithm that's this simple, the overhead does not result in the Java solution being slower than the Go solution.
Full native code compilation for a hot method can be triggered and completed in a fraction of a second after startup. The net result is that for algorithmic problems that finish in seconds rather than milliseconds, the JVM startup cost may not be an issue. This will only be true if, like this problem, you're loading a small number of classes. Most people's experience with crappy JVM startup time probably has a lot to do with web application containers like Tomcat that load thousands of classes in the first five seconds. Probably a good test to do to assess real-world performance would be something that does file-system operations and employs some parsing libraries, rather than just building data structures in memory and using System.out.
You're correct that if you want to compare peak performance, you should run the tests differently, so that you're not measuring the JVM startup cost.
I think that there is probably a large overlap between the population of people who believe this and the population of people that don't really know what bytecode or JIT compilation are.
In fact, I have asked a number of people under the impression that Java is slow to explain why they think it is slow. The answers have two flavors:
1. Its high level abstractions are computationally expensive. Sometimes this idea is supported by saying that garbage collection is very slow, sometimes by saying that the inability to directly manage memory means that it is done inefficiently, perhaps even inescapably so.
2. Compiling a program to an intermediary language that is not assembly but must be instead executed by something else (in this case, the JVM) must be slow.
Controlling memory layout is one area that is very important for performance, and where Java does not exactly shine. You essentially have to store your data in arrays of primitive types to avoid the memory overhead of objects (pointer to class, lock, GC bits) and the additional indirection during access. While not exactly impossible, Java makes it quite inconvenient and costly to write code that takes control of memory layout.
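To make that concrete, here's a small sketch of the usual workaround: parallel primitive arrays instead of an array of objects (the names are illustrative):

    public class LayoutSketch {
        // Array-of-objects: each element is a separate heap object (header + reference),
        // so iteration chases pointers.
        static class PlayerObj { int money; int position; }

        public static void main(String[] args) {
            int n = 1000;
            PlayerObj[] objects = new PlayerObj[n];
            for (int i = 0; i < n; i++) objects[i] = new PlayerObj();
            objects[0].money = 42;

            // Struct-of-arrays with primitives: contiguous and cache-friendly, no
            // per-element object header, at the cost of addressing fields by parallel index.
            int[] money = new int[n];
            int[] position = new int[n];
            money[0] = 42;
            position[0] = money[0];
        }
    }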
Hence the ongoing improvements targeted at Java 10, to change exactly that.
Until then there are the IBM and Azul extensions, off-heap memory, or optimizing in native code that 1% where it really matters, after using the profiler.
1) Controlling memory layout is not as big a deal as people think. The JVM is highly optimized (really it's probably the most optimized system ever devised) for allocating and deallocating objects on the heap.
2) It is in fact possible to control memory layout today because Java has access to offheap memory where you can have all the control you want.
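For example, a direct ByteBuffer gives you a flat, GC-invisible region with whatever layout you want (a toy sketch, not a recommendation of a particular layout):

    import java.nio.ByteBuffer;

    public class OffHeapSketch {
        public static void main(String[] args) {
            // Off-heap, fixed-layout records: 8 bytes per entry (two ints),
            // allocated outside the Java heap.
            int entries = 1_000_000;
            ByteBuffer buf = ByteBuffer.allocateDirect(entries * 8);
            buf.putInt(0, 42);   // field 0 of entry 0
            buf.putInt(4, 7);    // field 1 of entry 0
            System.out.println(buf.getInt(0));
        }
    }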
This has actually been the case for nearly half a decade but for some reason a lot of people are convinced that "big arrays" can knock over the JVM.
Only by those that don't have experience in Java tooling.
High startup costs you say? Then don't use OpenJDK.
Get an AOT compiler to native code like Excelsior Jet or switch to a JVM that supports both AOT compilation to native code and JIT code caching like IBM J9 or Azul.
Class scanning? Again, stop using OpenJDK and switch to a JVM that supports class sharing across VM instances, like IBM J9.
For those that don't ever want to use anything other than OpenJDK, some of those features might eventually come there, like the newly introduced support for AOT compilation of the base module for Linux x64, with everything else planned for Java 10.
Bloated libraries? Well, there are bloated libraries in any programming language.
And if by any chance you are thinking of JEE, I doubt you had the pleasure to use C or C++ alongside DCOM, CORBA and SUN RPC at enterprise level, to fully understand the relief it was to change to JEE.
Java is not slow in practice. Many top trading systems have been written in it, including the excellent and widely licensed NASDAQ architecture (OMX, Singapore, ASX, SWX, etc.) and the independently designed LMAX Disruptor. You do not have to use any library you don't want to. There is nothing stopping you from building everything in preallocated byte arrays, and never triggering GC. NIO2 is fast. The JVM is a leading foundation of fast, complex distributed systems, particularly if you need multiple programmers working on a single source tree.
In my experience NASDAQ's system is both a stellar example of low-latency Java and, due to its rarity, a demonstration of the apparent difficulty. Every other trading system I have seen implemented in Java has suffered noticeably from poor memory management (too much GC-related variance). I would expect that NASDAQ approaches their trading software much more like an embedded system, at which point the engineering discipline and mastery of the platform matters much more than the language.
Also, as mentioned elsewhere in the thread, trading systems don't suffer from frequent cold restarts, so the HotSpot warm-up work amortizes extremely well. Java is much better suited there than to, e.g., a command-line tool.
Yes, Java is as quick as anything else once it is up and running; the point the GP was making is that getting to this warmed-up state is what makes people perceive Java as slow.
A situation that only occurs if one doesn't make use of a JVM that caches JIT code, like IBM J9, or isn't willing to pay for an optimizing AOT compiler like Excelsior JET.
Also another example is those that perceive the JVM as slow due to using it with Clojure, when the slowness in this case is caused by the Clojure implementation itself.
Aren't you great. As a Java developer I am well aware of them too. For most people Java == Java applets; or "hello world" with `javac Hello.java` and then `java -cp . Hello`, followed by some interminable wait while all the apparatus of the JVM is marshalled for the sole purpose of emitting a static string and then exiting. And by interminable I mean 2 or 3 seconds, but still.
Even in more enlightened scenarios a REPL on jruby is just eye-wateringly painful unless you've all the "secret speedups" enabled.
In many more modern development cultures such secrets are generally looked upon as "Yak Shaving".
The reputation for slowness comes from the warm-up time. There is always a very noticeable delay starting even a simple Java app, whereas a native binary is relatively instantaneous. There are tricks to alleviate this issue in Java (e.g. you can keep a JVM running in the background using a tool like Nailgun), but the default vanilla java behaviour is what most people will be familiar with.
Not as fast as dietrichepp's solution, but not terrible either, and probably the winner in terms of memory use. Since there's no state to keep track of, adding parallelism with OpenMP was trivial, too.
Wouldn't it be possible to reduce the number of input combinations by treating all combinations where the proportions of money between the players are the same as equivalent? For example, (1,4,6) should give the same result as (2,8,12), just with a different scaling factor.
Find all random games for those proportions which end after 12 turns, then change the scaling factor to make it less than 255?
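Roughly, that would mean normalizing each triple by its GCD before searching, something like the sketch below - assuming the game really is scale-invariant, which is what the question hinges on:

    public class Normalize {
        static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

        // Divide out the GCD so scaled-up copies of the same proportions,
        // e.g. (1,4,6) and (2,8,12), map to one representative.
        static int[] normalize(int a, int b, int c) {
            int g = gcd(a, gcd(b, c));
            return new int[] { a / g, b / g, c / g };
        }

        public static void main(String[] args) {
            System.out.println(java.util.Arrays.toString(normalize(2, 8, 12))); // [1, 4, 6]
        }
    }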
The article says he doesn't buy that Java is slow. But the JVM is a JIT with a pretty significant warm-up time. It's designed for long-running processes, and if you're only doing things that run for a couple of seconds, then yes, Java is slow. Java becomes fast only after it's been running for a while.
The article is a counterexample to your claim. The Go program and the Java program ran in effectively the same amount of time.
I made this point in more detail upthread, but: for some CLI tasks that need to run in a few seconds, JVM performance will be adequate. For others, it will suffer.
I get the sense that the author doesn't know about algorithmic complexity. This has nothing to do with language choice and everything to do with the algorithms used.
I, on the other hand, get the feeling from reading the article that the author understands that this performance difference was due to implementation differences, but wanted a more attention-grabbing title.
My subjective impression is that this is the first time the author has come across this sort of consideration in implementation and wanted to write about it. There is nothing wrong with that - in fact, I think it's great! It's just that the title leads one to expect, well, quite a bit more.
Not the first time but the first time in a while and the first time I have ever written about the process. I was literally just writing down my experience post event.
The title was just taken from the Stackoverflow post which I agree in this context is a little click baity.
I never expected it to get this much attention though, as stated in the first sentence it was just something I did for myself more than anything.
Some of these types of problems can indeed be turned over to a constraint solver. I'd love to see a comparison with Prolog. Has anybody tried it?
The problem that I see with Prolog in this case is that a simple solution is simple because it is turning the search over to Prolog's built in constraint solver that knows nothing of the problem. This often leads to unacceptable performance with Prolog; performance problems can sometimes be addressed in practice by the appropriate placement of the cut operator within the logical specification of the constraints. This, at least in my experience, makes understanding what the program is doing very difficult.
Generally, I view Prolog (along with languages like Smalltalk) as a language appropriate for some problem domains that have come to the end of their lifetime because mainline, more generally useful, languages have adopted their features or incorporated them in libraries.
https://gist.github.com/depp/3a6f0377284fbb9b33984063856051b...