When optimizing C# for space use in particular (which ends up being a time optimization for I/O-heavy loads), I've leaned heavily on arrays of structs, and even bit-packed arrays (e.g. an Int32 array storing 14-bit integers, packed so that they don't align on index boundaries; there's a sketch of the idea below).
Programming at the level of C in C# loses most of the benefits, of course, but at least you can hide the optimizations behind pretty APIs and write the non-critical parts in terms of an easier-to-work-with lower layer.
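For what it's worth, the same packing trick carries over to Java almost unchanged. A rough sketch (the class and method names here are invented for illustration) of 14-bit values packed into an int[] with no alignment to element boundaries:

    // Rough sketch: pack 14-bit unsigned values into an int[] so that values
    // may straddle element boundaries. Names (Packed14, get/set) are made up.
    final class Packed14 {
        private static final int BITS = 14;
        private static final int MASK = (1 << BITS) - 1;
        private final int[] words;

        Packed14(int count) {
            words = new int[(int) (((long) count * BITS + 31) / 32)];
        }

        void set(int index, int value) {
            long bitPos = (long) index * BITS;
            int word = (int) (bitPos >>> 5);   // which int the value starts in
            int shift = (int) (bitPos & 31);   // bit offset within that int
            words[word] = (words[word] & ~(MASK << shift)) | ((value & MASK) << shift);
            if (shift > 32 - BITS) {           // value straddles two ints
                int spill = 32 - shift;        // bits that fit in the first int
                words[word + 1] = (words[word + 1] & ~(MASK >>> spill))
                                | ((value & MASK) >>> spill);
            }
        }

        int get(int index) {
            long bitPos = (long) index * BITS;
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            int result = words[word] >>> shift;
            if (shift > 32 - BITS) {
                result |= words[word + 1] << (32 - shift);
            }
            return result & MASK;
        }
    }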
For me, git's use case is being called from the command line, often interactively, between code editing sessions. It needs to start fast and finish fast, to not interrupt my workflow.
The JVM's cold start times are huge. On the order of a second. Noticeably slow. So slow that in the common case of committing a few files, the JVM startup time would totally dwarf the time spent actually doing work.
Seems like there are many other potential uses for this: other Java IDEs, web applications, integration into Java SCM tools; places where Git would be very useful and start-up time doesn't matter.
I did this with javac, and went from 5000ms (ant) to 200ms on average. It gets faster each time it runs: an enormous speedup after the first run, still very significant gains over the next ten or so, and smaller improvements at a slower rate from then on.
The client needs to be non-Java, to avoid the startup cost there. I run it as a console, so that is the "client" (wrapped in rlwrap, for filename completion/editing etc.). My workflow is to compile in a separate shell; that's why I did it this way.
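A compile server along these lines needs surprisingly little code. Here's a rough sketch of the idea (the port and the one-line "file names over a socket" protocol are invented; a real one would want error handling and some security), using the javax.tools compiler API so the warm JVM does the compiling:

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Sketch of a "warm JVM" compile server: pay the JVM/javac startup cost once,
    // after that each request is just a socket round-trip. The port (7777) and the
    // protocol (one line of whitespace-separated file names) are invented here.
    public class CompileServer {
        public static void main(String[] args) throws Exception {
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // needs a JDK
            try (ServerSocket server = new ServerSocket(7777)) {
                while (true) {
                    try (Socket client = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(client.getInputStream()))) {
                        String line = in.readLine();          // e.g. "Foo.java Bar.java"
                        if (line == null || line.trim().isEmpty()) continue;
                        int rc = javac.run(null, client.getOutputStream(),
                                           client.getOutputStream(), line.trim().split("\\s+"));
                        client.getOutputStream().write((rc == 0 ? "OK\n" : "FAILED\n").getBytes());
                        client.getOutputStream().flush();
                    }
                }
            }
        }
    }

The "client" can then be anything non-Java at all; even something like echo Foo.java | nc localhost 7777 from a shell alias or editor hook would do.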
I don't like anything calling out to JNI unless it's the only sane way and it's been heavily tested.
Unmaintained code is just not something I would ever use in a production capacity or rely on for my daily work, especially considering the changes that have come to Java and the JVM since 2005.
Would it be a speed win to keep some JVM processes "half-started" if load and memory use allowed? The next Java app to start would use one of the already half-started JVMs.
You wouldn't need to do anything fancy to keep the JVMs around; if a half-started JVM sat unused for so many seconds that it might be swapped out, you'd half-start another JVM and throw away the timed-out one.
(Hmm... I don't remember; could another process take over stdin/stdout/stderr? Or you could just limit this to non-console applications.)
How much of that time goes to initializing the JVM and how much is specific to initializing the application? Could the JVM's data structures (JIT output etc.) for an app be cached between runs?
Edit: Made this a little clearer. Moved the disclaimer to the top, so people know they can stop reading if I'm in the wrong forest...
Edit 2: I probably have read and unconsciously copied a similar idea that some Perl guy implemented?
Is it possible to build a high-level language that's as fast as C? Sure, but only if you hold programmer effort constant across both languages. If your application is one where it's worthwhile to spend a great deal of extra programmer time improving performance, a low-level language will always win, because it exposes more of the native machine.
The goal of a high-performance high-level language should be to provide C-like performance for a reasonably unoptimized application. Once you start optimizing a program down to the last instruction, staying on par with a low-level language becomes simply impossible for a high-level language.
In my experience, every level of abstraction you put between yourself and the machine costs you some amount of performance. This can be demonstrated without even leaving assembly language!
Fastest possible: raw assembly code in a NASM-like assembler. With this, you can write basically any code possible with no limitations, at the cost of an extremely large amount of programmer time.
Shortcut: Use inline assembler instead of NASM to simplify calling convention and other niceties.
Cost: There's now a whole bunch of stuff, like calling convention optimization and computed jumps, which you can no longer do.
Shortcut: Use compiler intrinsics instead of raw assembly.
Cost: You can no longer tweak your algorithm to minimize register spills because you aren't directly controlling spills anymore.
Shortcut: Use a set of macros (like my project does) for handling calling convention, MMX/SSE abstraction, and other such simplifications.
Cost: You tend to overlook optimizations that apply to one possible output of the abstraction and not others, resulting in either messes of ifdefs or suboptimal code--the former of which is of course violating the abstraction.
Shortcut: Use a framework like liboil to write SIMD assembly instead of native code.
Cost: By using generic SIMD operators, you lose access to specialized architecture-specific operations, along with the aforementioned issue of register spills.
Here we haven't even gotten beyond assembler and we're already losing performance. Now scale this up to C and beyond: abstraction inherently comes at a performance cost. It isn't even merely a function of language: abstractions within a language reduce performance as well.
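That last point is easy to see even within a single high-level language. A toy Java sketch (unscientific and invented purely for illustration; use JMH if you want real numbers): both loops compute the same sum, but the one behind the List abstraction does measurably more work than the raw array, purely because of boxing and indirection.

    import java.util.ArrayList;
    import java.util.List;

    // Toy illustration (not a rigorous benchmark; use JMH for real numbers):
    // both loops compute the same sum, but the List<Integer> version pays for
    // boxing, pointer-chasing and extra indirection that the int[] version avoids.
    public class AbstractionCost {
        public static void main(String[] args) {
            int n = 10_000_000;
            int[] raw = new int[n];
            List<Integer> boxed = new ArrayList<>(n);
            for (int i = 0; i < n; i++) { raw[i] = i; boxed.add(i); }

            long t0 = System.nanoTime();
            long sumRaw = 0;
            for (int i = 0; i < n; i++) sumRaw += raw[i];
            long t1 = System.nanoTime();

            long sumBoxed = 0;
            for (int v : boxed) sumBoxed += v;
            long t2 = System.nanoTime();

            System.out.printf("int[]:         %d in %d ms%n", sumRaw, (t1 - t0) / 1_000_000);
            System.out.printf("List<Integer>: %d in %d ms%n", sumBoxed, (t2 - t1) / 1_000_000);
        }
    }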
WRT SIMD operators, I'll point you to Mono.Simd.
FWIW, I think you'll make the same mistake pretty often if you continue to think of folks who play in the MS pond as bozos. There are reasons MS is a leader in many markets.
There's plenty of nomenclature abuse in that ecosystem, like using 'assemblies' to refer to packages or 'blittable' for SHM.
'Garbage Collected' (or GCed) works fine instead of 'managed', especially since 'runs on a VM' is pretty nebulous, and the whole C++/CLI thing adds some confusion.
Finally, more people using a word in an alternative way is the path to controlling its meaning, rather than avoiding the word altogether, cf "queer" and the like.
The combination of type and memory safety usually requires a garbage collector, though it's not strictly necessary.
I can't even count the number of times I've worked on or watched a project where performance was important and the following series of events occurred:
1. Module is designed with that quote in mind: it must be simple to implement and performance is unimportant. Note that this isn't a prototype: this is a plan for the actual final product.
2. Module is built and finished, then submitted for a review. The reviewers, knowing the importance of performance for this module, point out a number of ways that the module is extremely suboptimal.
3. Because of the assumptions so heavily ingrained into the module, implementing these performance improvements--which could have been foreseen as necessary far earlier--requires a near-complete rewrite of a large part of the module.
It's exactly as if we finished our software and our customer decided that his requirements were actually totally different--except in this situation, it's entirely our fault, because performance was a requirement and we ignored it because we didn't want to optimize "prematurely". If you want performance, you have to design with performance in mind; if you don't, and then decide later you want performance, the cost of your earlier decision is magnified a hundredfold.
I have almost never seen that quote used correctly.
So yes, optimizations made before they should be made should not be made. The quote tells us little about optimization beyond this: there is a time at which an optimization is premature, and doing it at that time is premature.
The important decisions around performance have to do with design, not optimization. Knuth's original principle was quite explicit: forget about most small efficiencies. But people commonly use it to justify not thinking about large ones. "Don't micro-optimize prematurely" somehow got changed into "don't think about performance until the end of the project".
I have always taken this to mean what the original "premature optimization" quote is all about. I felt that making the statement a little simpler helps my focus.
Given how fast computers are these days, the powerful standard libraries high-level languages give you, and the fact that bugs per line is roughly constant regardless of language, high-level languages start to look pretty good. Programs don't need to be as fast as possible, they just need to be fast enough.
If computing power is fungible, then a 10% performance improvement translates into a 10% saving on the hardware budget. This matters for data centers, and it matters for embedded systems where using a slightly cheaper component in a million devices might mean saving a million dollars.
I think the fallacy of the fast, modern computer comes from the fact that PCs are not very fungible at all; they come in discrete performance steps. And since your application is pretty unlikely to be the most demanding one, it will probably run just fine even without much optimization.
More like an observation that must be re-tested and re-established with each change in tooling.
Is the bugs-per-line rate for Java the same in IntelliJ IDEA as in Emacs?
(And of course let's remember: bugs per line when? During development? In production?)
Now, if I could find a link for you, that'd be awesome. My google-fu is weak today.
Of course one needs actual performance numbers (via profiling, for example) before knowing what to optimize. There is nothing worse than faith-based performance assumptions. I have seen months wasted rewriting code in a lower-level language to "make things faster", only to find out that it didn't make a difference: the code was disk- or network-bound, not CPU-bound.
"But, JGit performs reasonably well; well enough that
we use internally at Google as a git server."
Rings the 'good enough' bell to me.
The other point is that with a distributed system like git, all of the load is on your local machine. With a centralized system like Subversion, the server could do the heavy lifting (whether this happens in practice is another matter). My point is, if you want lightning fast rev control for some reason, probably better to not use distributed.
Because branching and merging are slow? I can see this argument getting stuck in something of a rut.
Also it's a lot easier to try something experimental on a branch. If it works, merge it in. If it fails, abandon the branch.
Git makes branching and merging fast enough that using them as the default workflow makes sense.
Sure it's optimal to work linearly, but in the real world things come up. Now if you are looking for a reason to tell stakeholders to suck it up and wait, then poor branching support may help you, but that's a technical solution to a managerial problem. If you're interested in having the most flexibility to solve real world problems, then git's power is indispensable. Hell, you can work linearly all the time, and retroactively convert anything into a branch. Even if you don't need branches. Even if you are the sole developer.
Sorry if that's a little harsh, but statements like "My point is, if you want lightning fast rev control for some reason, probably better to not use distributed" smack of willful ignorance. Go use git for a while and then come back and try to say that with a straight face.
That's just the way I work.
Similarly, I don't use threads in programs unless I absolutely categorically have to. It's a similar sort of thing. Call me crazy :/
I did use git for a bit, and thought 'meh'. It didn't solve any real world problem I had.
I've always worked to constrain non-linear development to the absolute minimum required, because the technical costs are easily dwarfed by the human communication overhead and the inherent organizational complexity engendered by multiple disparate branches of development.
What you're talking about seems to be multiple, long-running development or release branches. Those are hard to manage and generally a bad idea in any VCS. But, with a system like Git, you don't tend to have long-lived divergent development branches. Typical branch lifetimes are more like hours or days, rather than weeks or months. There are exceptions, but by-and-large, branches are just used differently in git.
I've worked with both git and hg.
There is no "inherent organizational complexity" in me having a private branch that I share with no one and which is only used for an hour or two while I work on a critical bug fix.
What's the value of this private branch beyond simply committing to the actual branch?
If it's just a bug fix, how big can it be?
If it's more than a bug fix, why am I hiding this code from the team by implementing it on a private, local, non-backed up, non-code-reviewed, non-centralized branch?
Typical branch lifetimes are more like hours or days, rather than weeks or months. There are exceptions, but by-and-large, branches are just used differently in git.
Why wouldn't I just commit this work incrementally to the actual upstream branch, rather than hiding it for "hours or days" from the rest of the team?
To quickly comment on a couple of your other concerns, my local hard-drive is backed up, so that's irrelevant; and I personally feel that code review at the level of individual commits has very little value, and you should instead be reviewing complete feature implementations.
svn cp -m "Create bugfix branch" ^/trunk ^/branches/tentonova-bugfix-x
svn co ^/branches/tentonova-bugfix-x ~/branch
I wouldn't call this a technical "hassle", and I'm not sure what organizational issues would arise here.
The vast majority of branches are never seen by more than one developer; they are simply an organizational tool to be used at your discretion, without imposing any overhead on anyone unless you have good reason to.
Hiding your development branches on a shared codebase often incurs either communication overhead, or the costs of lack of communication.
Hiding your development branches on a shared codebase often incurs either communication overhead, or the costs of lack of communication.
If you're using the limitations of your VCS to manage team communication then you have bigger problems. The obvious analog to the "problem" you mention is people not checking in code because it's not ready yet. Maybe you think this is better because you only want deployable software in your main branch, but for large features that makes the history opaque and leaves your developer effectively without any of the benefits of version control while they are working on the large feature.
Frankly, a lot of the arguments against DVCS smack of the same sort of ignorance that the Java zealots were leveraging against Ruby back when Rails started picking up steam in 2005/2006. There's this fear that powerful features will lead to chaos and are in effect too powerful to be used safely. And the reality is that yes, in environments where truly incompetent programmers work, there's definitely a strong argument to be made for limiting the damage they can do. But I think the past few years have borne out the fact that mediocre and merely-competent programmers can make strong use of these tools without leading to disaster.
Subversion 1.5, released in June of 2008, supports merge tracking.
If you're using the limitations of your VCS to manage team communication then you have bigger problems.
A simple but sufficiently powerful solution leads to simplified communication. If you're using the complexity of your VCS to hinder team communication and support cowboy coding, then you have bigger problems.
Frankly, a lot of the arguments against DVCS smack of the same sort of ignorance that the Java zealots were leveraging against Ruby back when Rails started picking up steam in 2005/2006 ... But I think the past few years have borne out the fact that mediocre and merely-competent programmers can make strong use of these tools without leading to disaster.
Nobody (intelligent) said there'd be disaster because of the "powerful features", just that operating in that manner would be more expensive than the much simpler alternatives.
Expending more effort with more powerful tools isn't actually an improvement; it's just busy-work -- constantly working on your muscle car instead of driving it.
This is a Blub attitude, plain and simple.
Apparently the new svn:mergeinfo property helps, but we haven't migrated at work so I don't know how well.
Every little bit not only adds up, it often multiplies in big enough projects.
In one of our big products, we have several build steps that touch the source control system. A 10-second VCS delay x 6 build steps, a 1-second slowdown from a lazily written C++ header x 300 files, and so on; before you know it, something that should take 10-15 minutes for a clean build takes ~1 hour. True story. :(
Luckily, enough of us got so annoyed about this that it got fixed after a few weeks of dedicated effort. We now make a constant, conscious effort to keep build times down.
XKCD "Compiling" http://xkcd.com/303/ is funny because we have all done it. If there is a noticeable lag in a development step (VCS operations, compiling, running the program, etc.), it is very likely that the developer will lose concentration and switch to reading email, browsing HN, reading /., BSing with co-workers, etc.
This takes the direct delays and turns them into much larger multipliers. The result is that 30 seconds of lag can easily become 30 minutes of wasted time.
Now I won't be tempted to go back from scripting language country for a few more years... :-)
Edit: I should add that I have very fond memories about my youth and C, too.
C & Java for example both compile much faster than c++.
In fact, I would argue that all things being equal, the Java program is more likely to be faster (and yes, I realize there are a metric ton of potential caveats to what I just said). Why? Because Java allows you to focus on the "big picture" optimizations that really make all the difference.
On the other hand, given an infinite amount of development time and experienced developers, the C program will likely be much faster. In some cases this is necessary. But for most cases, I personally would rather just ship something than try to squeeze every ounce of efficiency out of it.
I imagine a Git-in-Haskell would be very close in performance to the C git. (Then why is Darcs so slow? Because it uses an icky imperative, mutable model, whereas git uses an immutable functional model.)
The only way JGit would ever be equal-to or faster than Git is by using better algorithms. And somehow, I don't see this happening ;)
Not actually true in reality. Modern JVMs can make on-the-fly optimizations based on the runtime profile, optimizations that would need to be done by hand in C.
As said elsewhere, though, Java excels when used for long-running tasks (servers, backends, etc.) where it can optimize for the long term. It doesn't excel when you start up the JVM many times for quick individual jobs.
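To illustrate what "optimizations based on the runtime profile" means in practice, here's a toy Java sketch (invented for illustration, nothing to do with JGit): as long as only one implementation of the interface has been seen, HotSpot can devirtualize and inline the call in the hot loop, and it will deoptimize and recompile if another implementation shows up later. Getting a comparable effect in C takes profile-guided optimization or hand specialization.

    // Toy sketch of a profile-driven JIT optimization. While only Circle is ever
    // loaded and used, HotSpot can treat the area() call as monomorphic, inline it,
    // and compile the hot loop as straight-line arithmetic; if another Shape shows
    // up later it deoptimizes and recompiles. In C you'd need PGO or manual
    // specialization to get a comparable effect.
    interface Shape {
        double area();
    }

    final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    public class JitDemo {
        public static void main(String[] args) {
            Shape[] shapes = new Shape[1_000];
            for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(i);

            double total = 0;
            // Hot loop: after enough iterations the JIT compiles this with the
            // virtual call inlined, based on the observed runtime profile.
            for (int iter = 0; iter < 10_000; iter++) {
                for (Shape s : shapes) total += s.area();
            }
            System.out.println(total);
        }
    }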
I don't understand why people are optimizing source control :/ Fast vs. fast... meh, I don't think it's really an issue.
I mention this because programmers seem prone to this, and letting theory trump fact is a great way to make sure that you never learn anything.
I think it's also important to note that the slowness is relative to an incredibly fast C implementation - taking twice the time sounds painful, but in reality the pauses caused by Git are barely perceptible as-is, and double of that is not likely to be very annoying.
The title takes one specific application in a particular niche, used in a certain way which a particular programmer can't get to run fast enough, and concludes that Java isn't fast enough. Faulty logic.
For me, the argument about JGit vs. C git isn't something you can learn much from.
Really? In general? That statement seems worse than the original headline. Is Java faster than C for anything?
Yes Java can be faster than C in some situations.
Can C ever be faster than assembly? Well, in theory no. But in practice, sometimes.
Depends if you're a master assembly programmer who hand optimizes everything.
Unless the person writing the C implementation is grossly incompetent, I would like to see even one example for that.
I remember using the shortest possible integers and the register keyword far too many times to count in the late '80s.
The point of somewhat higher-level languages like Java (I refuse to say Java is a high-level language - it would be one in the late 80s, but not today) is not to make programs run faster, but to make it easier to make them run correctly.
BTW, if it's so obvious, why have you still not provided at least one non-trivial concrete example of Java being faster than C?
Not that it would matter much in this case anyway, I guess.
That's something I've heard ever since HotSpot came out, but it's not a silver bullet for beating C/C++ etc. Here are some server tests. Some are comparable, some are definitely not:
[Edit] Actually, the steady-state one averages over multiple restarts, including JITed and non-JITed runs.