
Launch HN: DeepSource (YC W20) – Find and fix issues during code reviews - dolftax
Hi HN! We're Jai and Sanket, founders of DeepSource (https://deepsource.io). We're automating the objective parts of code review using static analysis to ensure code is free of common issues (anti-patterns, bug risks, performance bottlenecks, and security flaws) before a reviewer looks at it. This saves reviewers from manually pointing out objective issues and ensures those issues don't make it to production.

After college, Sanket co-founded DoSelect, where I joined as the first engineer. Both of us had been contributing to open-source projects for a few years by then. In the beginning, we didn't have any processes set up around code reviews. We had some IDE plugins to run the linters, and some team members used them as pre-commit hooks. We didn't have any tests back then, used to spend too much time on some pull requests pointing out improvements, and if a pull request was very large, we never reviewed it: direct merge. Then the engineering team started to grow, multiple folks started contributing to the same repositories, and pull requests were often stuck for 5-7 days without any activity. To make sure new commits were free of common issues, we added multiple static analysis tools to our CI jobs. This became a pain sooner than expected, as they threw hundreds of lines of logs in the CI and we had to fight through duplicate issues. Critical issues were hidden among minor issues and false positives, and often missed. Once in a while, we tweaked the linter config files to silence the issues that didn't make sense to us, to reduce noise in the CI logs. That stopped working after a while, so we invested in a couple of commercial code quality tools, but ended up disabling them as well. Their issues weren't categorized or prioritized, the analyzers were never updated with new rules, and there was no way to report false positives.

We came across a paper, "Lessons from Building Static Analysis Tools at Google" [1]. It is a beautiful paper with the following insights: 1) static analysis authors should focus on the developer and listen to their feedback; 2) careful developer workflow integration is key for static analysis tool adoption; 3) static analysis tools can scale by crowdsourcing analysis development.

We started building DeepSource in December 2018. The initial release supported Python and integrated with GitHub. Our approach was to first curate all the issues available from open-source static analysis tools, de-duplicate them, and add better descriptions with external reference links. You just add the Python analyzer to the `.deepsource.toml` file with some metadata (version, test patterns, exclude patterns, etc.), and analysis runs on every commit and pull request. To cut down the noise, by default we only show issues newly introduced in the pull request, based on the changeset, and not all the issues present in the changed files. We also provide a way to report false positives directly from the dashboard. If the report is valid, we update the analyzers to resolve it within 48-72 hours. After this release, we started writing our own rules by walking the abstract syntax tree to find patterns. So far, we have 520+ types of issues in the Python analyzer. Some of the custom issues we added recently: file opened without the `with` statement, `yield` used in a comprehension instead of a generator expression, and use `items()` to iterate over a dictionary.

A few months back, we released the Go analyzer and also added support for GitLab. We're working on supporting Ruby and JavaScript, and on integrations for Bitbucket and Azure DevOps. The analyzers are not limited to programming languages; we've added ones for Dockerfile and Terraform as well. DeepSource is free to use for open-source repositories, and we make money from private repositories via a per-developer, per-month/year subscription.

Lately, we realized some issues were occurring in tens of files. Though DeepSource reports them, one had to fix all the occurrences manually. We just released Autofix support in Python for the 15 most commonly occurring issues to start with. Autofix uses a concrete syntax tree to visit the issue location, modify the code the issue is raised for, and generate a patch for that modification. When an Autofix is available for an issue, you can view the suggested patch, and on approval a pull request is created with the fixes. We're working on improving the coverage of issues we can autofix across the analyzers we support.

Give us a try: https://deepsource.io/

Here is the documentation: https://deepsource.io/docs/

We would love to hear about your experience using these tools, and feedback/suggestions on how we can improve! Please let us know in the comments. We're also at founders [at] deepsource.io.

[1] https://research.google/pubs/pub46576/
======
jakearmitage
> Enable analyzers by adding .deepsource.toml

Why is this needed? Can't you imply the necessary analyzers from my codebase?

There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or
CSS support. Is there a roadmap?

Also, the name implies use of Deep Networks and AI. Am I mistaken? If not,
what kind of AI is used here? Seems like just an automatic runner of static
analysis tools.

~~~
dolftax
> Why is this needed? Can’t you imply the necessary analyzers from my
> codebase?

Sure. We can probably infer the languages used in the repository. But we need
metadata like test glob patterns, exclude patterns, and runtime versions
(Python 2 vs Python 3) to improve the accuracy of issues. For example: use of
the assert statement in application logic is discouraged, as it is removed
when compiling to optimized bytecode (`python -O`, producing *.pyo files).
Ideally, the assert statement should be used only in tests. Also, we haven't
found a way to infer Python 2 vs Python 3 accurately. Can you think of a way?
That would be helpful.
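The stripping can be observed directly by running the same snippet with and without `-O`; a quick sketch using only the standard library:

```python
import subprocess
import sys

# A check written as an assert: it disappears under `python -O`.
code = "assert 1 == 2, 'invalid input'\nprint('check skipped')"

normal = subprocess.run([sys.executable, "-c", code],
                        capture_output=True, text=True)
optimized = subprocess.run([sys.executable, "-O", "-c", code],
                           capture_output=True, text=True)

print(normal.returncode != 0)    # True: the AssertionError fires
print(optimized.stdout.strip())  # check skipped
```

Under `-O` the assert is compiled away entirely, so the "validation" silently disappears in optimized builds, which is why it belongs in tests rather than application logic.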

> There is no support for Javascript, Typescript, PHP, Java or C#. No HTML or
> CSS support. Is there a roadmap?

We strongly believe in starting with a few languages and adding as many
issues as we can (with the ability to autofix most of them) before we go
broad. That said, we released Ruby in beta a couple of weeks back and are
currently working on the stable release. We also started working on JavaScript
(with TypeScript support) a month back, and we should release the beta version
of the JavaScript analyzer in approximately a month.

> Also, the name implies use of Deep Networks and AI. Am I mistaken? If not,
> what kind of AI is used here? Seems like just an automatic runner of static
> analysis tools.

It’s just the name :) We do not use Machine Learning or AI at the moment — the
reason being we’re optimizing for high accuracy, and a rules engine that uses
AST parsing helps us do that reliably. We do plan to use learning in the
future to capture data around which issues are being fixed the most and which
are not, and then show issues in the most relevant order to users depending on
their context.

------
FanaHOVA
Congrats on the HN launch guys :) Excited to see Javascript being added to the
list of supported languages soon.

~~~
futhey
How can we get notified when Javascript (Node.js?) support launches?

~~~
dolftax
We'll tweet about it at
[https://twitter.com/deepsourcehq](https://twitter.com/deepsourcehq)

------
punkohl
How is this different (or better?) than existing products that offer the same
service, such as Codacy.

Have been a paying customer of Codacy’s for ~2 years and they support most
languages out of the box at this point, with Git integration similar to your
own.

Curious on your thoughts.

~~~
sanketsaurav
A few differentiators:

* More issue coverage — for Python, we detect 520+ issues. We also enable you to run things like type checking (if you're using type hints) just by enabling it in the config.

* Custom issues — we have an analyzer team that keeps adding new, novel checkers to the analyzer for common bugs and anti-patterns.

* Fewer false positives — we've optimized our analyzers for reporting less than 5% false positives. On the lowest level, we write augmentations to each checker to remove known false-positives and noise. On the application level, we enable users to very easily ignore issues (for a file, all test files, some file patterns), and also report a false positive. We monitor all false-positive reports and proactively improve our analyzers to resolve them.

* Autofix — we just released this, which allows you to automatically fix some commonly occurring issues directly from DeepSource. In the future, we will add more autofixers, so that at least 70% of the issues we detect can be reliably autofixed.

~~~
punkohl
Based on the points above, I am still not convinced this is significantly
better than existing players, but I could be wrong. Additionally, some of the
problems you've mentioned in other comments have already been solved by your
competitors.

Do you think the fact that you're a late entrant into this market makes it
difficult and/or challenging for your team? Why have your customers chosen you
over other platforms? I'm mostly curious and not trying to put you down.

------
bradleybuda
Congratulations on the launch! I've been following the team's progress for a
while and am truly impressed with the pace of feature development while
keeping the core product extremely simple. Happy to be a customer.

------
soumyadeb
This looks awesome - congrats on the launch.

Quick question: I tried setting it up, but it's asking for write access to
pull requests. I am a bit wary about giving write access; is this required?

~~~
dolftax
There are two GitHub apps we maintain. One with read access (DeepSource) and
one with write access (DeepSource Autofix).

By default, on signup, you would be installing the app with read access --
this enables us to pull source code from GitHub on every commit and pull-
request, run analysis and report issues as GitHub checks. This is sufficient
if you would like to use DeepSource only to flag issues.

With the release of Autofix -- when a fix is available for a flagged issue,
DeepSource creates a pull request to the repository with the patch. For this,
you would be asked to install the app with write access (DeepSource Autofix).
Note that DeepSource always creates a separate branch with the fixes and opens
a pull request. We do not perform any write operations beyond the scope
mentioned above.

------
Supermancho
This is a serverside pre-PR hook for an analyzer, as I understand it.

I was hoping this was a code review tool that allows you to modify the PR
without making a commit-merge-push loop, which could have approved changes
automagically pulled locally (for the loop). This would save a TON on small
edits that many PRs require, including any additional comments that people
might want to add to code that come up during PR... modern PRs are where
context goes to die.

~~~
dolftax
We went ahead with integration with providers like GitHub and GitLab to have
these checks in a central place as it is the easiest way for a team to adopt a
tool like ours. Also, just having a local or IDE plugin doesn't ensure these
issues never make it to trunk unless everyone in the team follows it strictly.

That said, for the convenience of developers, we're working on the ability to
run the analysis and the fixes using our CLI. [1] This opens up doors to use
the CLI and build IDE plugins in the near future.

[1]
[https://github.com/deepsourcelabs/cli/issues/15](https://github.com/deepsourcelabs/cli/issues/15)

------
sobolevn
There's a better solution: use open-source cli tools that do just that!

1\. 520 Python checks? Use `wemake-python-styleguide` (a wrapper around
flake8) that has a bigger number of checks: [https://github.com/wemake-services/wemake-
python-styleguide](https://github.com/wemake-services/wemake-python-
styleguide) There's also `pylint` with a set of awesome checks as well.

2\. Type checking? Use `mypy`: it's just a single command!

3\. Autofixing? Use `black` / `autopep8` / `autoflake`, and you can use
`pybetter` to have the same ~15 auto-fix rules. But it is completely free and
open-source.

I don't like this whole idea of such tools (both technically and ethically):

\- Why would anyone want to send all their codebase to 3rd party? We used to
call it a security breach back in the days

\- On moral side, this (and similar) projects look like thin wrappers around
open-source tools but with a monetisation model. How much do these companies
contribute back to the original authors of pylint, mypy, flake8? Ones who
created and maintained them for years. I will be happy to be wrong here

~~~
dolftax
> There's a better solution: use open-source cli tools that do just that!

We don't deny that you can run the open-source tools locally, be it with a
one-line command or by setting up pylint or flake8 with dedicated
configurations. DeepSource is meant to eliminate the need to set up all those
open-source tools locally or in your CI pipeline, so that you don't need to:

\- Fish for issues amongst hundreds of lines of logs in the CI

\- Figure out and update linter config to remove duplicates and false
positives (for example: Bandit flags `assert statement used` in a test file,
which is a false positive; Bandit doesn't know it is a test file by default)

\- Write better descriptions of why something is an issue (for example, why
should default file permissions be 0600, and why is that necessary?)

\- Run linters on all the files on every commit or pull request

\- Manually fix an issue that occurs in, say, 50 places

> 1\. 520 Python checks? Use `wemake-python-styleguide` (wrapper around
> flake8) that has bigger amount of checks: [https://github.com/wemake-
> services/wemake-python-styleguide](https://github.com/wemake-
> services/wemake-python-styleguide) There's also `pylint` with a set of
> awesome checks as well.

Our focus at the moment is not on style issues. In fact, among the categories
of issues we raise (anti-patterns, bug risks, performance, security, style,
documentation), style issues are the most debated by our users, as they are
really subjective. We're thinking of making style issues opt-in rather than
on by default, and we're working on running formatters like `black`, `yapf`,
etc. with a single line of config in `.deepsource.toml`. Our analyzer team
actively adds custom rules which you don't get from the open-source tools.
For example:

\- Raising another exception type when an `assert` fails is ineffective. For
example: `assert isinstance(num_channels, int), ValueError('Number of image
channels needs to be an integer')`. If the condition is not satisfied, the
user would expect a `ValueError`, but what is actually raised is
`AssertionError: Number of image channels needs to be an integer`.

\- `yield` used inside a comprehension (which breaks code in Python 3.8)

\- Write operation on file that is opened in read-only mode

\- I/O detected on a closed file descriptor
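The first example above can be reproduced in a few lines (a minimal sketch; `num_channels` follows the snippet quoted earlier):

```python
def bad(num_channels):
    # Anti-pattern: the ValueError instance is only the assert's
    # failure message; what actually gets raised is an AssertionError.
    assert isinstance(num_channels, int), ValueError(
        "Number of image channels needs to be an integer")

def good(num_channels):
    # What the author intended: raise ValueError explicitly.
    if not isinstance(num_channels, int):
        raise ValueError("Number of image channels needs to be an integer")

try:
    bad(3.5)
except Exception as e:
    print(type(e).__name__)  # AssertionError

try:
    good(3.5)
except Exception as e:
    print(type(e).__name__)  # ValueError
```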

> 2\. Type checking? Use `mypy`: it just a single command!

Sure, if one prefers running it locally or as part of their CI. But if you
already use DeepSource to flag issues, type checking can be enabled with a
single line in the `.deepsource.toml` file.

> 3\. Autofixing? Use `black` / `autopep8` / `autoflake` and you can use
> `pybetter` to have the same ~15 auto-fix rules. But, it is completely free
> and open-source

We are working on adding support for autopep8, black and autoflake in the
coming weeks. They mostly auto-patch stylistic issues [1]. Thanks for letting
us know about pybetter. It looks like a great tool and fixes ~9 issues [2].
Our aim with Autofix is to fix more than three-fourths of the issues we
detect, and we detect 522 issues in our Python analyzer. We have a dedicated
engineering team actively working on the analyzers. As of today, these are
some of the issues our Python analyzer can autofix (which I couldn't find
among the open-source tools):

\- No use of `self`

\- Use of dangerous default argument

\- Module imported but unused

\- Function contains unused argument

\- Debugger import detected

\- Debugger activation detected

\- Unnecessary comprehension

\- Unnecessary literal

\- Unnecessary call

\- Unnecessary typecast

\- Bad comparison test

\- Empty module

\- Built-in function `len` used as condition

\- Unnecessary `fstring`

\- `raise NotImplemented` should be `raise NotImplementedError`

\- `assert` statement used outside of tests

Same goes with Go and other analyzers we support.
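To illustrate the kind of rewrite involved, here is what the "built-in `len` used as condition" fix amounts to in principle (a hand-written sketch, not DeepSource's actual patch output):

```python
items = []

# Flagged pattern: calling len() just to test for emptiness.
if len(items) == 0:
    flagged = "empty"

# Autofixed form: rely on the sequence's truthiness.
if not items:
    fixed = "empty"

print(flagged == fixed)  # True: behavior is unchanged
```

Both branches are equivalent for sequences, so the autofix can apply the idiomatic form without changing behavior.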

> I don't like this whole idea of such tools (both technically and ethically):
> \- Why would anyone want to send all their codebase to 3rd party? We used to
> call it a security breach back in the days.

We follow strict security practices [3]. In a nutshell: 1) we do not store
your code; 2) source code is pulled into an isolated environment that has no
access to any of our internal systems or the external network; 3) as soon as
the analysis is completed, the environment is destroyed and all logs are
purged. Also, there are many tools developers use every day (Travis CI,
Circle CI, GitHub) where source code is sent to the cloud, so I don't think
it is accurate to call it a security breach. That said, an on-premise setup
of DeepSource is on the roadmap. We're working on SOC 2 Type 2 compliance as
well [4].

> On moral side, this (and similar) projects look like thin wrappers around
> open-source tools but with a monetisation model. How much do these companies
> contribute back to the original authors of pylint, mypy, flake8? Ones who
> created and maintained them for years. I will be happy to be wrong here

We have kept the tool completely free to use for open-source projects. We’ve
also partnered with GitHub Education and made it free for students. We’re an
early stage company trying to build a business in automating objective parts
of code review and making it easier for every developer to adopt and use
static analysis. With full transparency: we had plans to sponsor open-source
projects but got sidetracked for various reasons. We will be backing some of
these open-source projects in the next couple of weeks.

[1]
[https://gist.githubusercontent.com/jaipradeesh/6ad8404fef253...](https://gist.githubusercontent.com/jaipradeesh/6ad8404fef253547ec8ca9b7fd187938/raw/f34ebfde7933b9cc7f1015e0b9176d1eabaccc64/tmp.md)

[2]
[https://gist.githubusercontent.com/jaipradeesh/b8a0e6b526f73...](https://gist.githubusercontent.com/jaipradeesh/b8a0e6b526f73583cd95f46e2b440b86/raw/2163e83c4382fc7bdb52f9c5c189f88143dd1297/gistfile1.txt)

[3] [https://deepsource.io/security](https://deepsource.io/security)

[4] [https://vanta.com/guides/vantas-guide-to-
soc-2](https://vanta.com/guides/vantas-guide-to-soc-2)

------
jujodi
So our CI pipelines are always set up so that failed linting means blocked
merge capability. Your PR isn't ready for review if it's failing rubocop for
example. Do you intend to integrate your tool into this type of workflow but
by making the lint issues apparent via comment on the PR in GitHub vs in the
CI?

~~~
dolftax
DeepSource integrates with GitHub checks [1] and via the dashboard, you can
select the issue types (anti-patterns, bug risks, performance and security
issues, style, type checks and documentation), which when detected, will cause
analysis runs to fail and pull requests to be blocked.

[1] [https://pasteboard.co/IZfSThC.png](https://pasteboard.co/IZfSThC.png) [2]
[https://pasteboard.co/IZfT8uw.png](https://pasteboard.co/IZfT8uw.png)

------
slewis
Cool! Can you provide some real world examples of issues you flag? I poked
around on your site and didn’t see any.

~~~
sanketsaurav
A few issues from our Python analyzer:

* Dangerous mutable default argument passed in functions

```
def some_func(arg=[1, 2, 3]):
    ...
```

which should be

```
def some_func(arg=None):
    if arg is None:
        arg = [1, 2, 3]
    ...
```

* `yield` used inside a comprehension (which breaks code in Python 3.8)

* file opened with the "r" flag, but a write is attempted on the file

* i/o detected on a closed file descriptor

* providing an unexpected keyword argument in a function call
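The `yield`-in-comprehension case can be checked with the built-in `compile()`; on Python 3.8+ the pattern is rejected outright (a small sketch):

```python
# In Python 3.8+, `yield` inside a comprehension is a SyntaxError;
# on earlier versions it was merely deprecated, with surprising semantics.
src = "def gen():\n    return [(yield i) for i in range(3)]\n"

try:
    compile(src, "<example>", "exec")
    result = "compiled"
except SyntaxError:
    result = "SyntaxError"

print(result)  # SyntaxError on Python 3.8+
```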

------
pabs3
Is there a list of the open source static analysis tools that you are using?
Do you have any proprietary tools you have written?

------
einpoklum
When you offer support for C++, we'll talk. More challenging to parse and
analyze, of course.

~~~
dolftax
Sure. I've left you an email.

------
pedro596
Congrats! Any plans to add support for more languages?

~~~
sanketsaurav
Ruby is already in beta, stable release in the next 3-4 weeks. Next up is
JavaScript. Rust, Java, and PHP are further down the line.

~~~
ftonobo
How does it compare to static analysis as RuboCop does it, especially in
terms of who decides what the anti-patterns are?

~~~
sanketsaurav
For our analyzers, we do use existing static analysis tools behind the
scenes, in addition to the custom checkers we write by hand. So our Ruby
analyzer, which is in beta at the moment, uses RuboCop behind the scenes.
We're working towards the stable release of the Ruby analyzer, which uses
augmentations to remove false positives and decrease noise, since
guaranteeing less than 5% false positives is one of the primary values
DeepSource adds. As the analyzer moves towards stable, we'll add custom
issues to it.

The general categorization of anti-patterns is based on the consensus of the
community around the language, and also some obvious things based on objective
reasons. Although we understand that everyone has their own flavor of
conventions — so it is very easy to triage and ignore specific issues in
DeepSource.

