This seems really cool, I'd love to see it integrated with GitHub so that when I push my code to my repos it notifies CommitQ of the diffs and it can process them. I don't need a new host for my code, but more insight into what's going on there? Yes, please, I'll certainly pay for that!
I can't help but sympathize with the Morozov rant on O'Reilly when bleeding-edge tooling like this gets sold as a service rather than offered as free software for the whole community to build on (and it's not as if the two can't coexist). It's all in the name of startups and "rockstar teams," when open source projects themselves could benefit so much from code analysis/review tools (possibly more so, given the more fractured nature of their contributions).
To the OP: please don't take the Code Climate (and so many others') road; consider building an open source community around your product and then offering enterprise services on top of THAT. If you don't, you're gonna face dire competition sooner or later, even if you're first to market. If you do, you have the chance to establish an ecosystem that keeps giving back, in both code and publicity.
Free and open source software is awesome. However, if you spend all of your time building great tools that you give away for free, then how are you supposed to pay your rent and eat?
I know, there are plenty of open source projects that make money, and plenty of open source projects that thrive despite not making money. None of those are MVPs, like this project is. I doubt anyone who started on those projects was depending on them to make a living right away.
I think what he's saying is that the tools could be sold with a model similar to GitHub's -- the tools would be free to use for open source projects, but you'd have to pay to use them for your closed source projects. The idea is that open source software could benefit from having better tools to work with, while the software company still turns a profit off of commercial users.
How does one make sure the tool isn't used on a closed source project, though? What's to stop someone from using it if no outsider will ever see their code?
You can't, if the code is hosted separately from the tool.
The solution is to provide the tool and the hosting together. For example, GitHub or Bitbucket could acquire this service and add it to their offerings, and it would fit right in with their existing business model.
I think you're getting even further away from open source.
If this tool could connect to a remote repo (like the GitHub repo) and operate on the changelog, then you'd be fine. No one would be able to use it privately without publicly exposing their repo.
Perhaps then you could charge a fee for an ssh key to add to your authorized_keys file, which would allow for private use.
That's an interesting proposition, especially considering that hosting is commoditized. Annotations and insights are the real value proposition in this case, so it might make more sense to offer those as a discrete service.
Interesting project. Semantically organizing code is a longstanding problem in the world of software development. The primary concern with this project, or any other attempt at solving this problem, is practicality and actual utility.
I suppose utility rests on whether you treat it as a collaborative development platform or merely as a host for your code (and keep track of discussions, proposed changes, and what-not through other means).
Essentially, whether or not it turns out to be useful assumes that it is used as the central collaboration platform to an extent. Though on the other hand, merely tracking changes on a higher-level could be sufficient.
> I suppose utility rests on whether you treat it as a collaborative development platform or merely as a host for your code (and keep track of discussions, proposed changes, and what-not through other means).
They both naturally blend together. I honestly can't see the difference between the two that you have in mind.
> Essentially, whether or not it turns out to be useful assumes that it is used as the central collaboration platform to an extent.
This is the initial plan.
> Though on the other hand, merely tracking changes on a higher-level could be sufficient.
Some day we could implement a client-side structural diff tool.
Giving devs more "engineering" tools is finally getting attention, which is awesome. I'd love to use this kind of tech, but I don't want to move source repository hosts, and I'd want something that works intimately with my stack, i.e. flagging not just code modifications but the introduction of potential security holes and/or perf issues (like doing database queries in for loops, etc.), regardless of whether it's RoR or ASP.NET MVC or X, Y, Z.
I would be really interested in some kind of open source project that offered this kind of tooling with said vision that I could use and commit back to...
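That "database queries in for loops" check doesn't need the whole product to exist, either. As a rough, hypothetical sketch (Python, with a made-up set of "query-like" method names as the heuristic; this is not anything the product actually does), it's a small AST pass:

```python
import ast

# Hypothetical heuristic: call names that look like database queries.
QUERY_NAMES = {'execute', 'query', 'find', 'fetch'}

def queries_in_loops(source):
    """Flag calls that look like database queries issued inside a
    for/while loop -- the classic N+1 query smell. Returns a list of
    (line number, call name) findings."""
    findings = []
    tree = ast.parse(source)
    for loop in ast.walk(tree):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if isinstance(node, ast.Call):
                    # handle both obj.execute(...) and execute(...)
                    name = getattr(node.func, 'attr',
                                   getattr(node.func, 'id', ''))
                    if name in QUERY_NAMES:
                        findings.append((node.lineno, name))
    return findings
```

A real tool would also track data flow (e.g. a query built in a helper called from the loop), but even this shallow pass catches the common case.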
You can do some static analysis on languages like Ruby/Python, but their lack of rigorous typing actually makes that job quite hard.
Then you have other languages, king of the hill being Haskell, where practically all of these tools already exist (minus the web-browsable/social aspect), because they are straightforward (not easy, but straightforward) to build: the language's type system is much deeper, and it encourages the programmer to be more thorough with their program.
Could you imagine having static analysis built into your linter, giving you recommendations, optimizations, and warnings WHILE you are writing code? Yes, thank you, Haskell. The best part is that the people contributing to that ecosystem aren't average programmers; they are typically PhD-holding industry programmers or academics (who contribute for free, because there's no economic incentive to hide your work) who know far more than I do. All I have to do is be smart enough to use what they build.
On the topic of these "engineering" tools available for devs:
It's funny, because engineering (if you can even call it that) for web applications only recently became a big, juicy market [for engineers]. Most of the "engineers" were working on software for embedded systems or really big systems that were rarely consumed by end users the way web applications are today. So tooling for engineers has actually been around for a long time; very good tooling, in fact.
It's just that we now have a slew of people writing software in a very big market using languages that ARE NOT SAFE to program in! That's the story, though: these languages {python | ruby | php | perl} are much easier to learn and "get going" in than something like Haskell, OCaml, Mercury, Scala, Erlang, C++, &c. By being easier to pick up, [the languages] CREATED the market we are all now participating in. The downside, though, is that we have a sewer of code floating around, because humans by nature are terrible engineers. Our #1 priority SHOULD BE building tools to check and verify our end product. Some teams go so far as to have a team dedicated to writing that verification software, and another team dedicated to writing the software that verifies the verification software! (This is typically when lives are at stake, though. Side thought: with the bitcoin thefts and loss of identity on the net, one could argue that "lives" are at stake in an existential sense.)
I consider Haskell itself to be top-of-the-line tooling for engineers of all sorts, even web! Aside from the static analysis tools outside the language, the language itself is a dream come true for real-world programmers such as myself. I fuck up my programs all the time; my Python programs are ugly, and if my tests don't catch an error in my programming, my users end up catching it. With Haskell, the only things my users EVER catch are business logic bugs, not programmer errors.
While static analysis can be quite a powerful tool, it doesn't magically fix everything. Some of the typical web-application vulnerabilities are shell/SQL injection, XSS, and CSRF. There are also passwords saved in an insufficiently hashed form, or even in plain text. All of those are bugs at the architecture level, or "business logic bugs".
You can certainly prevent shell/SQL injection very easily with a good type system. Yesod, one of the Haskell web frameworks, certainly does this. Basically, you just give strings received from the user a different type than normal strings, so that you can't use them without sanitizing or explicitly circumventing the sanitizer.
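To make the discipline concrete outside of Haskell, here's a toy Python sketch of the same idea (all names here are hypothetical, not Yesod's API; and note that Python can only enforce the rule at runtime, where Haskell's type checker enforces it at compile time):

```python
class UserInput:
    """Untrusted text from the user. Deliberately NOT a str subclass,
    so it can't be spliced into a query by accident."""
    def __init__(self, raw):
        self.raw = raw

class Sanitized:
    """Text that has passed through the sanitizer."""
    def __init__(self, text):
        self.text = text

def sanitize(user_input):
    # Toy escaping of single quotes; real code should use
    # parameterized queries rather than building SQL strings.
    return Sanitized(user_input.raw.replace("'", "''"))

def build_query(name):
    # Only Sanitized values may reach the query builder; raw strings
    # and UserInput are rejected, mimicking the type-level guarantee.
    if not isinstance(name, Sanitized):
        raise TypeError("query fragments must be Sanitized")
    return "SELECT * FROM users WHERE name = '%s'" % name.text
```

In Haskell the equivalent check costs nothing at runtime: passing the wrong string type simply fails to compile.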
I don't know enough about web development to tell you if you can prevent XSS or CSRF, but I wouldn't be surprised if you could.
The important insight is that a good type system can fairly easily do much more than most people realize. Certainly far more than you can do with languages like Java!
Languages with good type systems (Haskell, Scala, OCaml) can certainly address these. I haven't been following Lift (Scala) closely, but I give the author, DPP, credit when he says that he's addressed the first three.
This is pretty amazing. I regularly have conversations with other programmers where we talk about needing this sort of tool.
Question: For every language, are you writing a full-blown language parser to get the semantic information you need? Can I hook in new parsers to add language support?
> Question: For every language, are you writing a full-blown language parser to get the semantic information you need?
That is correct. The structural diff algorithm is generic: it operates on language-neutral feature trees, which are built by the language-specific parsers.
> Can I hook in new parsers to add language support?
Writing a parser is not an easy job. But some day we hope to open up an API for developers to write custom plugins.
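The split described above, a generic diff over a neutral tree with per-language parsers feeding it, can be sketched in a few lines. This is a purely hypothetical illustration (using Python's own `ast` module as the "language-specific parser"; the feature-tree shape is made up for the example, not the product's actual format):

```python
import ast

def feature_tree(source):
    """Language-specific front end: parse the source and emit a
    neutral tree, here a dict mapping (kind, name) to a structural
    dump of the node."""
    tree = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            tree[('function', node.name)] = ast.dump(node)
        elif isinstance(node, ast.ClassDef):
            tree[('class', node.name)] = ast.dump(node)
    return tree

def structural_diff(old_src, new_src):
    """Language-neutral back end: report added, removed, and changed
    features instead of changed lines."""
    old, new = feature_tree(old_src), feature_tree(new_src)
    return {
        'added':   sorted(k for k in new if k not in old),
        'removed': sorted(k for k in old if k not in new),
        'changed': sorted(k for k in old if k in new and old[k] != new[k]),
    }
```

With this shape, supporting a new language means writing only a front end that emits the same neutral tree; the diff itself never changes.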
I've had the opportunity to write a small interpreter on the job before. Trust me: I know it is no small task to bite off something like this! Tip of the hat to you.
I'm currently stuck in the .NET world, and one of the things I've been working on lately is a program that fixes certain aspects of a medium-to-large code base, done via semantic parsing so that I know I'm staying typesafe and such.
In C# there are open libraries like NRefactory, and of course Mono itself, and in the Microsoft world they are working on their own compiler-as-a-service product named Roslyn. Do you think any of these efforts would help an effort such as yours?
I'm asking not so much for C# stuff, but because I feel a momentum coming up that could enable stuff such as a live coding environment (my code is in execution as soon as I write it), and the idea of the debugger is the same as my production run-time (not sort of the same, but really the same). I wish up and coming languages would tackle this stuff head on today.
Sure, more languages are on the way! The problem with PHP, however, is that when it's used in an HTML page as a template language, you can't get much from it. On the other hand, when it's a simple class file with no HTML markup, things should work as with any other language.
When it's used as an HTML page with interspersed PHP, you just convert the HTML to strings. Look at the phc parser (http://phpcompiler.org). You can definitely get good static analysis info (I wrote the phc static analyser).
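The "convert the HTML to strings" step can be illustrated with a toy transformer (a Python sketch with a deliberately naive regex split; phc and real PHP parsers handle far more, e.g. `<?=` tags, heredocs, and strings that contain `?>`):

```python
import json
import re

def inline_html_to_statements(template):
    """Turn a page of interspersed HTML and PHP into a flat list of
    statements: HTML runs become echoes of string literals, so a
    single-language analyser can consume the whole page."""
    statements = []
    # Split into alternating HTML / PHP segments.
    pieces = re.split(r'<\?php(.*?)\?>', template, flags=re.S)
    for i, piece in enumerate(pieces):
        if i % 2 == 0:          # HTML between PHP blocks
            if piece.strip():
                statements.append('echo %s;' % json.dumps(piece))
        else:                   # code inside <?php ... ?>
            statements.append(piece.strip())
    return statements
```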
The semantic diff is pretty cool. It will probably be pretty hard to apply that to C (cpp) and Lisp (macros, and especially the programmable readers in some Lisps). But maybe we don't actually need a solution for the general case and displaying "most" changes in a semantic way is enough.
I imagine these higher-level diffs could be especially useful where one doesn't have much insight into the source code, for example scanning for API changes between two versions of a framework (where maybe displaying only the changes to public methods/exports would be enough). In any case, diffs on the AST level seem like a good, and quite under-explored, idea.
> It will probably be pretty hard to apply that to C (cpp) and Lisp (macros, and especially the programmable readers in some Lisps).
C is really hard to parse because of the preprocessor. We built a prototype parser for it, and it mostly works, but it's not ready for prime time yet.
Lisp must be a much simpler language to parse, though.
> scanning for API changes between two versions of a framework
Yes, we have this idea on the to-do list! Imagine comparing two different branches of development, or two different tagged releases, and seeing the high-level API changes between them.
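A minimal version of that release-to-release API scan is easy to picture. In this hypothetical Python sketch, the "public API" is just top-level functions whose names don't start with an underscore, together with their argument names (a real tool would cover classes, methods, exports, and so on):

```python
import ast

def public_api(source):
    """Map each public top-level function to its argument names."""
    api = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith('_'):
            api[node.name] = [a.arg for a in node.args.args]
    return api

def api_changes(old_src, new_src):
    """Compare the public surface of two versions of a module."""
    old, new = public_api(old_src), public_api(new_src)
    return {
        'added':             sorted(set(new) - set(old)),
        'removed':           sorted(set(old) - set(new)),
        'signature_changed': sorted(n for n in old
                                    if n in new and old[n] != new[n]),
    }
```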
Cool stuff! Huge line diffs are exactly the kind of thing that can be improved on. For some input I generally don't use plain `git diff`; I alias `git df` to `git diff --word-diff`, which makes it so much easier for me to see my changes.
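For anyone who hasn't used it, the effect is roughly this (a Python approximation using `difflib`; git's real implementation tokenizes and colorizes more carefully):

```python
import difflib

def word_diff(old, new):
    """Word-level diff in the spirit of `git diff --word-diff`:
    deletions come out as [-word-], insertions as {+word+}."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op == 'equal':
            out.extend(a[i1:i2])
        if op in ('delete', 'replace'):
            out.append('[-' + ' '.join(a[i1:i2]) + '-]')
        if op in ('insert', 'replace'):
            out.append('{+' + ' '.join(b[j1:j2]) + '+}')
    return ' '.join(out)
```

Instead of marking whole lines as removed and re-added, only the words that actually changed are highlighted.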
I'd love to see you open source the diff part of things and figure out a different monetization process. If you can solve looking at diffs well, I'd love to see it on every code website - github, bitbucket, etc.