

Show HN: CommitQ – Programming language-aware Git repository hosting - caustic
http://commitq.com/blog/2013/04/01/hello/

======
goodwink
This seems really cool, I'd love to see it integrated with GitHub so that when
I push my code to my repos it notifies CommitQ of the diffs and it can process
them. I don't need a new host for my code, but more insight into what's going
on there? Yes, please, I'll certainly pay for that!

~~~
lukeholder
This is what <https://codeclimate.com/> does.

They started with code static analysis metrics, but have now added security
analysis too. Ruby only, but same concept applies.

~~~
goldfeld
I can't help but sympathize with the Morozov rant on O'Reilly when
bleeding-edge tooling like this gets sold as services and not offered up as free
software for all the community to move forward (and it's not like both can't
coexist even.) It's all in the name of startups and "rockstar teams," when
open source projects themselves could benefit so much from code
analysis/review tools (possibly more so, given the more fractured nature of
the contributions.)

To the OP, please don't take the Code Climate (and so many others') road,
consider creating an open source community around your product and then
offering enterprise service on top of THAT. If you don't, you're gonna face
dire competition sooner or later, even if you are the first to market. If you
do, you have the chance to establish yourself an ecosystem that'll keep on
giving back, in both code and publicity.

~~~
rcfox
Free and open source software is awesome. However, if you spend all of your
time building great tools that you give away for free, then how are you supposed
to pay your rent and eat?

I know, there are plenty of open source projects that make money, and plenty
of open source projects that thrive despite not making money. None of those
are MVPs, like this project is. I doubt anyone who started on those projects
was depending on them to make a living right away.

~~~
profquail
I think what he's saying is that the tools could be sold with a model similar
to GitHub's -- the tools would be free to use for open source projects, but
you'd have to pay to use them for your closed source projects. The idea is
that open source software could benefit from having better tools to work with,
while the software company still turns a profit off of commercial users.

~~~
bfish510
How does one make sure that their code isn't used in a closed source project
though? What's to stop them from using it if no outsider will see their code?

~~~
profquail
You can't, if the code is hosted separately from the tool.

The solution is to provide the tool and the hosting together. For example,
GitHub or Bitbucket could acquire this service and add it to their offerings,
and it would fit right in with their existing business model.

~~~
rcfox
I think you're getting even further away from open source.

If this tool could connect to a remote repo (like the GitHub repo) and operate
on the changelog, then you'd be fine. No one would be able to use it privately
without publicly exposing their repo.

Perhaps then you could charge a fee to get an ssh key to add to your
authorized_keys file, which would allow for private use.

------
ziyadb
Interesting project. Semantically organizing code is a longstanding problem in
the world of software development. The primary concern with this project, or
any other attempt at solving this problem, is practicality and actual utility.

I suppose utility rests on whether you treat it as a collaborative development
platform or merely as a host for your code (and keep track of discussions,
proposed changes, and what-not through other means).

Essentially, whether or not it turns out to be useful assumes that it is used
as the central collaboration platform to an extent. Though on the other hand,
merely tracking changes on a higher-level could be sufficient.

~~~
caustic
> I suppose utility rests on whether you treat it as a collaborative
> development platform or merely as a host for your code (and keep track of
> discussions, proposed changes, and what-not through other means).

They both naturally blend together. I honestly don't see what difference
between the two you have in mind.

> Essentially, whether or not it turns out to be useful assumes that it is
> used as the central collaboration platform to an extent.

This is the initial plan.

> Though on the other hand, merely tracking changes on a higher-level could be
> sufficient.

Some day we can implement a client side structural diff tool.

------
roskilli
Giving more "engineering" tools to devs is finally getting more attention
which is awesome. I'd love to use this kind of tech but don't want to move
source repository host and also want something that would work with my stack
and intricately so, i.e. providing more than just code modification but
introduction of potential security holes and/or perf issues (like doing
database queries in for loops, etc), regardless of whether it was RoR or
ASP.NET MVC or X, Y, Z.
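
As a rough illustration of the "database queries in for loops" check, here is a minimal sketch in Python using the standard `ast` module. It is not how CommitQ or any existing product works; the method names in `QUERY_METHODS` are an assumption chosen for the example, and a real checker would need framework-specific knowledge of what counts as a query.

```python
import ast

# Hypothetical method names treated as "database calls" for this sketch.
QUERY_METHODS = {"execute", "query"}


def find_queries_in_loops(source: str) -> list:
    """Return sorted line numbers of query-like method calls inside a loop body."""
    tree = ast.parse(source)
    hits = set()
    for loop in ast.walk(tree):
        if not isinstance(loop, (ast.For, ast.While)):
            continue
        # Only the loop body repeats; the iterable of a for loop is evaluated once.
        for stmt in loop.body:
            for node in ast.walk(stmt):
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr in QUERY_METHODS
                ):
                    hits.add(node.lineno)
    return sorted(hits)


SAMPLE = """\
rows = db.execute("SELECT id FROM users")
for user_id in ids:
    db.execute("SELECT * FROM orders WHERE user_id = ?", (user_id,))
"""

print(find_queries_in_loops(SAMPLE))  # [3]
```

Matching on method names alone will produce false positives and misses, which is part of why a stack-aware tool is harder to build than a generic one.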

I would be really interested in some kind of open source project that offered
this kind of tooling with said vision that I could use and commit back to...

~~~
Ixiaus
On the topic of languages and static analysis:

You can do _some_ static-analysis on languages like Ruby/Python but their lack
of rigorous typing makes that job quite hard, actually.

Then you have other languages, king of the hill being Haskell - where
practically all of these tools exist (minus the web-browsable/social aspect)
because they are straight-forward (not easy, but straight-forward) to build as
the language's type system is much deeper and it also encourages the
programmer to be more thorough with their program.

Could you imagine having static-analysis built into your linter, where it
gives you recommendations, optimizations, and warnings WHILE you are writing
code? Yes, thank you Haskell. The best part about it too is the people
contributing to that eco-system aren't _average_ programmers, they are
typically PhD holding industry programmers or academics (who contribute for
free because there's no economic incentive to hide your work) that know far
more than I do; all I have to do is be smart enough _to use what they build_.

On the topic of these "engineering" tools available for devs:

It's funny because engineering (if you can even call it that) for web
applications only recently became a big juicy market [for engineers]. Most of
the "engineers" were working on software for embedded systems or really big
systems that were rarely ever end-user consumed as web applications are today.
So tooling for engineers has actually been around; _very good tooling_ in
fact.

It's just that we now have a slew of people writing software in a very big
market using languages that _ARE NOT SAFE_ to program in! That's the story
though, these languages {python | ruby | php | perl} are much easier to learn
and "get going" in than something like Haskell, O'Caml, Mercury, Scala,
Erlang, C++ &c... By being easier to pick up, [the languages] CREATED the
market we are all now participating in. The downside though, is we have a
sewer of code floating around because humans by nature, are terrible
engineers. Our #1 priority SHOULD BE building tools to check and verify our
end product. Some teams go so far as to have teams that are dedicated to
writing that software and another team dedicated to writing the software that
verifies the verification software! (this is typically when lives are at stake
though - side-thought: with the bitcoin thefts and loss of identity on the net
now though; one could argue that "lives" are at stake in an existential sense)

I consider Haskell itself to be top of the line tooling for engineers - of
all sorts, even web! Aside from the static analysis tools outside of the
language, the language _itself_ is a dream-come-true for real-world
programmers such as myself. I fuck up my programs all the time, and my Python
programs are ugly and if my tests don't catch an error in my programming my
users end up catching it. With Haskell the only thing my users EVER catch are
_business logic bugs_ and not programmer error.

~~~
rwos
While static analysis can be quite a powerful tool, it doesn't magically fix
everything. Some of the typical web-application vulnerabilities are shell/SQL
injections, XSS, and CSRF. Also, passwords saved in an insufficiently hashed
form, or even in plain-text. All of those are bugs on the architecture level,
or "business logic bugs".

~~~
tikhonj
You can certainly prevent shell/SQL injection very easily with a good type
system. Yesod, one of the Haskell web frameworks, certainly does this.
Basically, you just give strings gotten from the user a different type than
normal strings so that you can't use them without sanitizing or explicitly
circumventing the sanitizer.
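
A minimal sketch of that idea, written in Python rather than Haskell and not taken from Yesod's actual API: `UserInput`, `SqlFragment`, `sql_literal` and `build_query` are all hypothetical names. Python can only enforce the distinction at runtime (or with an external type checker), whereas in Haskell the mismatch would be rejected at compile time. The quote-doubling escape is a toy; real code should use parameterized queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInput:
    """Raw text that came from the user; deliberately not a str subclass."""
    raw: str


@dataclass(frozen=True)
class SqlFragment:
    """Text that is considered safe to splice into a query."""
    text: str


def sql_literal(value: UserInput) -> SqlFragment:
    """The only sanctioned route from UserInput to SqlFragment (toy escaping)."""
    escaped = value.raw.replace("'", "''")
    return SqlFragment("'" + escaped + "'")


def build_query(*parts: SqlFragment) -> str:
    """Join fragments; refuse anything that has not been through the sanitizer."""
    for part in parts:
        if not isinstance(part, SqlFragment):
            raise TypeError(f"unsanitized value of type {type(part).__name__}")
    return "".join(part.text for part in parts)


name = UserInput("x'; DROP TABLE users; --")
query = build_query(SqlFragment("SELECT * FROM users WHERE name = "), sql_literal(name))
print(query)
```

Passing `name` straight to `build_query` raises `TypeError`, which is the runtime stand-in for the compile error a typed language would give.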

I don't know enough about web development to tell you if you can prevent XSS
or CSRF, but I wouldn't be surprised if you could.

The important insight is that a good type system can fairly easily do much
more than most people realize. Certainly far more than you can do with
languages like Java!

------
gabriel
This is pretty amazing. I regularly have conversations with other programmers
where we talk about needing this sort of tool.

Question: For every language, are you writing a full-blown language parser to
get the semantic information you need? Can I hook in new parsers to add
language support?

~~~
caustic
> Question: For every language, are you writing a full-blown language parser
> to get the semantic information you need?

That is correct. The structural diff algorithm is generic: it operates on
language-neutral feature trees, and those are built with the
language-specific parsers.
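
For readers wondering what that split might look like, here is a small sketch in Python. It is an assumption-laden illustration, not CommitQ's algorithm: the feature tree shape (`kind`, `digest`, `children`) and both function names are invented for the example, and Python's own `ast` module stands in for a language-specific parser.

```python
import ast
import hashlib


def python_feature_tree(source: str) -> dict:
    """Language-specific step: parse Python into a neutral {name: feature} tree."""

    def visit(body):
        features = {}
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(node, ast.ClassDef) else "function"
                # Hash the AST dump so formatting and comments do not count as changes.
                digest = hashlib.sha1(ast.dump(node).encode()).hexdigest()
                children = visit(node.body) if kind == "class" else {}
                features[node.name] = {"kind": kind, "digest": digest, "children": children}
        return features

    return visit(ast.parse(source).body)


def diff_features(old: dict, new: dict, prefix: str = "") -> list:
    """Language-neutral step: compare two feature trees by name."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        path = prefix + name
        if name not in new:
            changes.append(("removed", path))
        elif name not in old:
            changes.append(("added", path))
        elif old[name]["digest"] != new[name]["digest"]:
            # Prefer reporting the changed children over the whole container.
            nested = diff_features(old[name]["children"], new[name]["children"], path + ".")
            changes.extend(nested or [("modified", path)])
    return changes


OLD = """\
class Repo:
    def clone(self):
        return 1

    def push(self):
        return 2

def helper():
    pass
"""

NEW = """\
class Repo:
    def clone(self):
        return 1  # comment only, not a change

    def push(self, force=False):
        return 2

def main():
    pass
"""

print(diff_features(python_feature_tree(OLD), python_feature_tree(NEW)))
# [('modified', 'Repo.push'), ('removed', 'helper'), ('added', 'main')]
```

Supporting another language would mean writing another function that produces the same tree shape; `diff_features` would not change. Matching purely by name means a rename shows up as a remove plus an add, which a real tool would want to handle better.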

> Can I hook in new parsers to add language support?

Writing a parser is not an easy job. But some day we hope to open an API for
developers to write custom plugins.

~~~
gabriel
This alone is awesome.

I've had the opportunity to write a small interpreter on the job before. Trust
me: I know it is no small task to bite off something like this! Tip of the hat
to you.

I'm currently stuck in the .NET world and one of the things I've been working
on lately was to have a program fix certain aspects of a medium-to-large code
base, but done via semantic parsing of the code base so that I know I'm
typesafe and such.

In C# there are open libraries like NRefactor, of course Mono itself, and in
the Microsoft world they are working on their own compiler as a service
product named Roslyn. Do you think any of these efforts would even help an
effort such as yours?

I'm asking not so much for C# stuff, but because I feel a momentum coming up
that could enable stuff such as a live coding environment (my code is in
execution as soon as I write it), and the idea of the debugger is the same as
my production run-time (not sort of the same, but really the same). I wish up
and coming languages would tackle this stuff head on today.

~~~
caustic
> Do you think any of these efforts would even help an effort such as yours?

If anyone else is going to make something similar for .NET, then yes.

But we are going to build our own custom C# language parser.

------
paulyg
This has some real promise. Congrats! Are you planning to parse PHP?

~~~
caustic
> This has some real promise. Congrats!

Thanks!

> Are you planning to parse PHP?

Sure, more languages are on the way! The problem with PHP, however, is that
when it is used in an HTML page as a template language, you can't get much from
it. On the other side, when it is a simple class file with no HTML markup,
things should work as with any other language.

~~~
pbiggar
When it's used as an HTML page with interspersed PHP, you just convert the HTML
to strings. Look at the phc parser (<http://phpcompiler.org>). You can
definitely get good static analysis info (I wrote the phc static analyser).
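
To make "convert the HTML to strings" concrete, here is a rough sketch in Python. It is not phc's implementation, just an illustration of the rewrite; `inline_html_to_echo` and `echo_literal` are hypothetical names, and the regex ignores short tags like `<?=`, an unclosed final `<?php`, and `?>` appearing inside PHP string literals.

```python
import re

# Matches one <?php ... ?> block; everything outside these blocks is inline HTML.
PHP_BLOCK = re.compile(r"<\?php(.*?)\?>", re.DOTALL)


def echo_literal(html: str) -> str:
    """Emit a PHP echo of a single-quoted string, escaping backslashes and quotes."""
    escaped = html.replace("\\", "\\\\").replace("'", "\\'")
    return "echo '" + escaped + "';"


def inline_html_to_echo(template: str) -> str:
    """Rewrite a mixed HTML/PHP template as PHP statements only."""
    out = []
    pos = 0
    for match in PHP_BLOCK.finditer(template):
        html = template[pos:match.start()]
        if html:
            out.append(echo_literal(html))
        out.append(match.group(1).strip())
        pos = match.end()
    tail = template[pos:]
    if tail:
        out.append(echo_literal(tail))
    return "\n".join(out)


TEMPLATE = "<h1>Hi</h1><?php foreach ($xs as $x) { ?><li><?php echo $x; ?></li><?php } ?>"
print(inline_html_to_echo(TEMPLATE))
```

For the template above this prints a plain sequence of PHP statements (`echo '<h1>Hi</h1>';`, the `foreach` header, two more echoes around `echo $x;`, and the closing brace), which an ordinary PHP parser can then analyse like any other file.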

------
rwos
The semantic diff is pretty cool. It will probably be pretty hard to apply
that to C (cpp) and Lisp (macros, and especially the programmable readers in
some Lisps). But maybe we don't actually need a solution for the general case
and displaying "most" changes in a semantic way is enough.

I imagine these higher-level diffs could be especially useful for things where
one doesn't have that much insight into the source code - for example,
scanning for API changes between two versions of a Framework (where maybe even
displaying only changes to public methods/exports would be enough). In any
case, diffs on the AST level seem like a good, and quite under-explored, idea.

~~~
caustic
> It will probably be pretty hard to apply that to C (cpp) and Lisp (macros,
> and especially the programmable readers in some Lisps).

C is really hard to parse because of the preprocessor. We built a prototype
parser for it, and it mostly works, but it's not ready for prime time yet.

Lisp must be a much simpler language to parse, though.

> scanning for API changes between two versions of a framework

Yes, we have this idea on the to-do list! Imagine that you are comparing two
different branches of development, or two different tagged releases, and see
high-level API changes between them.
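
A toy version of such an API comparison, sketched in Python for Python sources: this is only an illustration of the idea, not the planned CommitQ feature. `public_api` and `api_changes` are hypothetical names, "public" is assumed to mean "name does not start with an underscore", and only positional parameter names are compared.

```python
import ast


def public_api(source: str) -> dict:
    """Map each public function or method to its positional parameter names."""
    api = {}

    def collect(body, prefix):
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if not node.name.startswith("_"):
                    api[prefix + node.name] = [arg.arg for arg in node.args.args]
            elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
                collect(node.body, prefix + node.name + ".")

    collect(ast.parse(source).body, "")
    return api


def api_changes(old_source: str, new_source: str) -> dict:
    """Summarize added, removed, and re-signatured public names between two versions."""
    old, new = public_api(old_source), public_api(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


V1 = """\
def connect(host):
    pass

def _internal():
    pass

class Client:
    def send(self, data):
        pass
"""

V2 = """\
def connect(host, port):
    pass

def _internal(x):
    pass

class Client:
    def send(self, data):
        pass

    def close(self):
        pass
"""

print(api_changes(V1, V2))
# {'added': ['Client.close'], 'removed': [], 'changed': ['connect']}
```

The change to `_internal` is ignored on purpose, since it is not part of the public surface; the two sources could just as well be read from two tags or branches.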

------
miles_matthias
Cool stuff! This could be a real improvement over huge line diffs.
For some input, I generally don't use `git diff`, instead I alias `git df` to
`git diff --word-diff`, which makes it so much easier for me to see my
changes.
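
For anyone who hasn't tried it, here is a small Python sketch of what word-level diffing does, using the standard `difflib` module. It only imitates the look of git's default word-diff markers (`[-removed-]` and `{+added+}`) and is not how git implements it; `word_diff` is a made-up helper that splits on whitespace and ignores line structure.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
        if tag == "equal":
            out.extend(a[i1:i2])
    return " ".join(out)


print(word_diff("timeout = 30 seconds", "timeout = 60 seconds"))
# timeout = [-30-] {+60+} seconds
```

The alias itself can be set up with `git config --global alias.df 'diff --word-diff'`.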

I'd love to see you open source the diff part of things and figure out a
different monetization process. If you can solve looking at diffs well, I'd
love to see it on every code website - github, bitbucket, etc.

------
j_s
Plastic SCM's free-as-in-beer desktop diff/merge tools include enhanced diff
functionality, including help when refactoring:

<http://plasticscm.com/features/xmerge.aspx>

~~~
psantosl
And we're working on something even better...
<http://plasticscm.com/sm/index.html>

------
psantosl
You might find this one interesting too:
<https://news.ycombinator.com/item?id=5520321>. It is about the new desktop
semantic merge tool...

