Hacker News
Show HN: GigaDiff – derisk your release with automated bug prediction (gigadiff.com)
38 points by onlyrealcuzzo 5 months ago | 21 comments



Congratulations on launching! This looks like an interesting project.

Your website copy could use some spellchecking – here are some typos I noticed on https://gigadiff.com/features :

  * probility -> probability
  * chnage -> change
  * developes -> developers
  * liklihood -> likelihood
Good luck!


Hey Aaron, really appreciate you letting me know [=


Huh, GigaDiff strikes me as a terrible name, because it's actively misleading. Without reading the description, I'd think it's a diff that can run on really big files.


Hey pronoiac,

Appreciate the feedback.

To give some context: GigaDiff started as an improvement on "diff" -- almost a diff++ -- in that it would show users which changed lines are associated with bugs in JIRA, occur in stack traces, have been marked as technical debt, or violate lint rules.

But the reason I got all that data together was to use it as signals for ML to detect bugs.
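
(For illustration only, not GigaDiff's actual code: a minimal sketch of how changed lines could be paired with the signals described above and flattened into rows a model could train on. `LineSignals`, `annotate_diff`, `to_features`, and the sample data are all hypothetical names made up for this example.)

```python
from dataclasses import dataclass, field


@dataclass
class LineSignals:
    """Signals attached to one source line (all field names are hypothetical)."""
    bug_tickets: set = field(default_factory=set)  # e.g. JIRA keys linked to this line
    stack_trace_hits: int = 0                      # times the line appeared in stack traces
    tech_debt: bool = False                        # line was marked as technical debt
    lint_violations: int = 0                       # number of lint rules the line violates


def annotate_diff(changed_lines, signals):
    """Pair every changed (path, line_no) with its known signals.

    Lines with no recorded history get an empty LineSignals.
    """
    return {loc: signals.get(loc, LineSignals()) for loc in changed_lines}


def to_features(sig):
    """Flatten one LineSignals into a numeric row a model could train on."""
    return [len(sig.bug_tickets), sig.stack_trace_hits, int(sig.tech_debt), sig.lint_violations]


# Made-up history and diff, purely for demonstration.
history = {
    ("billing.py", 42): LineSignals({"PROJ-101", "PROJ-207"}, stack_trace_hits=5, tech_debt=True),
    ("billing.py", 43): LineSignals(lint_violations=1),
}
changed = [("billing.py", 42), ("billing.py", 99)]

for loc, sig in annotate_diff(changed, history).items():
    print(loc, to_features(sig))
```

The same annotated mapping can serve both uses mentioned here: shown next to the diff for a human reviewer, or fed as feature rows into whatever model does the prediction.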

A rebrand in the future is possible. I'm not even sure where I'm going with this project just yet.

Thanks!


Hey HN,

Happy New Year!

I hope to help some companies achieve a more stable environment this year.

Bugs and "tech debt" unfortunately take up the majority of development time where I work. So I wrote something to make it easy to identify problematic functions, giving developers the context they need to make intelligent decisions about when and how to change certain lines of code.

I'd love to get any feedback I can.

Thanks!


This looks interesting, however:

* Please make the pricing public. I do not work for that big of a company, and hidden pricing usually means "very expensive", which means I probably will not have budget for it.

* Have you considered a "free-for-open-source" plan ?

* The public documentation is really too limited to get a good idea of what it does. There are plenty of tools in this area (which often disappoint), and most people will not take the time to set up their project just to get an idea of what it does. Is it a webapp? Can I self-host it? What languages are supported? Which tools does it compare to (more like Sonar? more like ELK?) and why is it better?

* When logging in using GitHub OAuth, read+write permissions for all my public+private repos were requested.

* After logging in, the "Read the Docs" link gives an HTTP error 502.

* If I get it right, your tool will analyze all source code + runtime logs. I am pretty sure my company would not agree to closed-source software from a new, small company getting this level of access to our data, so you have to make very clear what data goes where...

Good luck!


Agreed on public pricing -- this looks like something that would be quite useful to my team, but I'm not even going to check it out unless I know the ballpark we are looking at.

The read/write access also is a red flag to me, unless there is really solid info on why write access is needed, and how everything is secured.


Hey Nicoulaj,

I'm planning to have this be free for open source. I may even open source it. The ballpark figure I want to target is $1k per month for medium-sized companies of 20-50 engineers/QA. But I'm still trying to get more data on that.

I'm offering a free trial until March right now. I do all of the integration for you. If you're interested in a demo, you can email me at yahn007 at gmail dot com.

Appreciate the feedback.


It sounds like it creates a mapping from lines to "bugginess." How much value do you think there would be in some sort of semantic analysis, e.g. "new function foo is suspect bc it calls bar, which has shown up in a lot of stack traces lately"?
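
To make the question concrete, here is a rough sketch of the kind of check I mean. It assumes you already have a call graph and a batch of recent stack traces in hand; `suspect_functions`, the data shapes, and the threshold are all made up for illustration and say nothing about how GigaDiff works internally.

```python
from collections import Counter


def suspect_functions(call_graph, stack_traces, threshold=3):
    """Flag functions that call a callee seen in at least `threshold` recent stack traces.

    call_graph:   {caller: set of callees}
    stack_traces: list of traces, each a list of function names
    """
    # Count each function once per trace so one deep recursion does not dominate.
    hits = Counter(fn for trace in stack_traces for fn in set(trace))
    hot = {fn for fn, n in hits.items() if n >= threshold}
    # Keep only callers with at least one "hot" callee, and report which ones.
    return {caller: sorted(callees & hot)
            for caller, callees in call_graph.items()
            if callees & hot}


# Made-up example: foo calls bar, and bar shows up in three traces.
call_graph = {"foo": {"bar", "baz"}, "qux": {"baz"}}
traces = [["main", "bar"], ["worker", "bar", "bar"], ["main", "bar"], ["main", "baz"]]
print(suspect_functions(call_graph, traces))
```

Here only `foo` gets flagged, because `bar` is the only one of its callees that crosses the threshold; `qux` only calls `baz`, which appears in a single trace.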


> How much value do you think there would be in some sort of semantic analysis

Disclaimer: I'm the founder of GitSense (https://gitsense.com), which is also focused on predictive defect analysis, among other things.

Incorporating semantic analysis, by cross-referencing semantic code changes with bug reports, static analysis reports, stack trace reports, etc., will be absolutely critical for defect analysis and automatic code generation, in my opinion. It's also not trivial from a computation, storage, and retrieval perspective. For example, running semantic diff analysis on every revision (on any possible branch) and cross-referencing the results with external data like bug reports, continuous integration results, etc. is a very expensive and complex operation.

In order for ML to work, you need good datasets and in order to generate good datasets, you need lots of raw data (static analysis results, code change history, etc.) that can be data mined and cross-referenced, to produce meaningful data. Creating good datasets for ML, in a scalable manner, will require you to rethink how to extract, store and retrieve code related information.

GitSense is designed to be installed on every developer desktop/workstation, which is how I solve the computation problem. Since every developer workstation is designed to be a continuous indexing machine, indexing can be distributed across dozens, if not hundreds or thousands, of machines. Being able to index and cross-reference as fast as possible is absolutely critical, since the goal is to prevent developer mistakes from happening.

Generating semantic analysis is fairly straightforward. Incorporating it is where the challenge lies.


Hey nickelbox,

This is among the several problems that I'm trying to solve. As a developer, when you're calling an existing function -- you don't really have any data at your disposal WRT the quality of that function. Likewise for modifying individual lines. To me, it seems obvious to want this information.

But I'm trying to test that hypothesis and see if other developers feel the same way.

Thanks,


Congrats on launching!

Why is the pricing non-public? Is it customized for each client / use-case? If yes, can you please briefly explain what factors made it infeasible to make it public?


Hey dotmanish, in short, there's some research I want to do and people I want to consult before making that public. If you're interested, you can email me at yahn007 at gmail dot com and we can chat.


I did something similar years ago for my BSc thesis, but for assessing quality. I’ll keep an eye on this.


SaaS? I would never ever trust a SaaS for tasks like that. Honestly speaking, I'm pretty tired of SaaS-everything. It just does not make sense and negatively affects the progress of humankind.


SaaS makes sense because many large organizations are incapable of running large data processing applications locally.

Think of it a bit like food delivery. In the past each restaurant had to hire their own delivery driver and manage the delivery operation. This is highly non-trivial and as a result only a few restaurants would take on food delivery. Now, we have services that specialize in food delivery, and they service multiple different restaurants. They have the expertise on how to operate a food delivery service and so customers get a consistent and good experience across a range of restaurants. It even enabled many restaurants that otherwise wouldn't have delivery to easily add food delivery to their offerings.

That's a win-win-win for all parties. Software as a service is a similar thing. It works well for large complex applications that require a team of sysadmins with application domain knowledge to run. Yes, some huge companies will take this on themselves, but much like food delivery it's not feasible for a great many companies.


Actually, it is the smaller and midsized companies who benefit from shared food delivery services. The big ones can run it themselves and not have to pay the middleman. (Though in the end they will join the service again as it is not only a service to deliver food but also to index restaurants and order. So to have people pick your restaurant for delivery you have to join the google of food. But that is not important here.)

Point being, enterprises happily run this locally. Either as/on their cloud on premise or on their servers. Just for our team we have 300 something servers running multiple VMs. I can easily spend a few on a solution like this to do data analysis on our repos. We already have a systems team doing only tooling (build servers, jenkins, test systems, regression dashboards etc.). This would be just another thing for their list.

Having our code leave, or be accessed from outside, our premises / DMZ is much, much harder. That would pull in some very heavy discussions with the 3rd party and our legal department.


It's not just having the servers, but also the expertise and domain knowledge. Putting aside application specific knowledge (e.g. running SAS or the like), not all large companies have in-house expertise that can administer a Linux virtual machine. In my experience, it's not surprising to find IT departments that don't even know about ssh.

Remember, there are more enterprises than technology enterprises. Finance, audit, law firms, oil and gas, insurance, these places often don't have the in-house experience to run non-Windows tech stacks.


But would those companies need this service? I can hardly imagine they do.


Large companies are great at running large data processing apps. They have whole IT teams to do that. Many SaaS products are very different from food delivery: they want to access your kitchen and see all the raw ingredients going into your recipes, the excuse being “we can’t trust you to pay us what our service is worth.” Well, doesn’t the trust go both ways?


I appreciate your analogy; food delivery is a great thing to turn into a service.

But source code analysis is not inherently a service. It can be done in-house with a better outcome. Win-win-win, and one more win for security when running in-house.

My point is that it is bad to offer SaaS-only solutions to problems that are not inherently services. This is a huge fad and delusion of our time.



