
Launch HN: Datree (YC W20) – Best practices and security policies on each commit - shimont
We are Shimon and Eyar, co-founders of Datree ([https://www.datree.io](https://www.datree.io)). We've built software to help engineering teams automate the adoption of development best practices, coding standards, and security policies.

When I (Shimon) was the manager of a 400-developer company's infrastructure engineering team, we had an issue where a developer committed AWS secret keys into a public GitHub repo. We were very, very lucky that the bad actors who quickly got ahold of the keys "only" spun up compute instances to mine bitcoin.

Mistakes happen and they happen to the best of us. No developer wants to make mistakes, especially ones impacting production. Those mistakes can be not only costly to the business, but emotionally painful for the developer.

After finding out about the issue, I had to search for any other leaked secret in our repositories to make sure we were no longer exposed. The next thing that I had to do was to take steps to help folks avoid making this mistake again.

It's easy to create a policy that says "do not commit secrets to GitHub" (which was what I did), but in reality this is much harder to implement. I would do things like sending a mass email to all of Engineering and having code reviewers check for it manually during code reviews. Problem is, these approaches don't work consistently, if at all.

The bigger the engineering team, and the faster it ships software, the bigger this problem becomes. Also, developers today operate more independently and have broader responsibilities; they are responsible not just for writing code, but also for testing and deploying to production. You might expect that developers would follow best practices, standards, and policies, but of course, in practice, these things fall through the cracks. That's why we built Datree.

What we built is a rules engine, which is essentially a server-side git-hook platform. We connect it to the organization’s source control, scan the layout of the repository, parse all structured files like YAML / JSON / XML / Dockerfile, and build a catalog with the organization’s metadata, such as packages used, container images, and all the properties in the structured files.

The engine performs an automatic check each time code is committed to GitHub. This happens before the code can be merged to master. It runs just like your CI tests. It checks if the rules you've set are followed, and tells the developer when they aren't and how to fix it. Unlike your CI configuration, though, Datree runs at the org level, so you can apply any rule to all of your repositories in just one click.

You may be asking “is this another static code analysis tool?” We see Datree as completing or complementing those tools, not competing with them. We’re seeing our customers create a rule with Datree to check and verify that a static code analysis step is integrated and executed as part of their CI flow, instead of going over each CI config file in their repositories and updating it manually.

Rules could be anything: development best practices, lessons learned from post-mortems, security policies, or compliance standards. For example, a very popular rule is to prevent secrets from being merged into the master branch. Leaking secrets to source control is a common and potentially costly mistake (see [https://news.ycombinator.com/item?id=19825202](https://news.ycombinator.com/item?id=19825202)).

Often people ask us, “what rules should we adopt?” Because of this, we started curating industry best practices and turning them into rules they can simply enable when they use our product. Datree now comes with dozens of rule packs for all kinds of popular technologies (like Docker and serverless), languages and frameworks, tools (like GitHub and Travis CI), and even use cases (like SOC 2 compliance). Of course, you are free to create your own custom rules.

To date, Datree has run 100,000+ checks for Engineering teams large and small, including Microsoft, Globalgiving, Cybereason, and Gigster (YC S15, 400+ engineers).

We’re sure many HN members will have encountered similar problems and/or have expertise in this area. We’d love to hear from you: How do you ensure the adoption of development best practices for your team? What works and doesn’t? Thank you!
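
To make the org-level idea above concrete, here is a minimal sketch of a rule that checks whether every repository's CI configuration runs a required job. It is not Datree's rule format or engine: the repository names, the `jobs` key, and the `missing_job` helper are all hypothetical, and JSON is used only so the example needs nothing beyond the standard library.

```python
import json

# Hypothetical CI configs keyed by repository name. A real catalog would be
# built by parsing the files checked into each repository.
REPO_CONFIGS = {
    "billing-service": '{"jobs": {"test": {}, "static-analysis": {}}}',
    "web-frontend": '{"jobs": {"test": {}}}',
}


def missing_job(configs, required_job):
    """Return the sorted names of repositories whose CI config lacks required_job."""
    failing = []
    for repo, raw in configs.items():
        # Assumed layout: a top-level "jobs" mapping of job name to settings.
        jobs = json.loads(raw).get("jobs", {})
        if required_job not in jobs:
            failing.append(repo)
    return sorted(failing)


if __name__ == "__main__":
    for repo in missing_job(REPO_CONFIGS, "static-analysis"):
        print(f"{repo}: CI config does not run 'static-analysis'")
```

The point of the sketch is that the rule is written once and evaluated against every repository, rather than being copied into each CI config file.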
======
ThePhysicist
Funny we built almost the same product 6 years ago (sold it to a competitor),
we even did automated refactoring of Python code. We also developed a regex-
like language that could operate on abstract syntax trees / annotated graphs,
which we wrote all our checks with. We were working on extending that with a
graph database backend and symbolic execution, basically building a large code
graph that we would perform pattern matching on. We didn’t finish this work as
we sold the company before that, in retrospect I often wonder what would have
happened if we had kept developing it.

From my experience it’s quite hard to monetize developer tools except maybe
when focusing on security, so it’s good you seem to have that as a focus as
well. Good luck!

~~~
shimont
I would love to hear more about your experience! Could you please email me at
Shimon [AT] Datree IO?

I believe that now is the right time for a solution like Datree because the
way we develop software has evolved: companies moved from Waterfall to Agile,
developers have more autonomy, and the move towards distributed micro-services
has left many companies with hundreds or thousands of git repositories, each
one with its own configuration files for CI, Docker, Kubernetes, etc. It's
really hard managing all of those distributed pieces :)

~~~
Terretta
Hi Shimon, this is a very cool direction, something we are working on hard
right now too. Talk to us if interested? Ping my coordinator to set up a quick
call, his email in my profile. Ask him to include head of dev journey in our
call too. Cheers!

~~~
shimont
Will do! thanks

------
cddotdotslash
This is really awesome! One area I'd recommend looking into is automated
scanning of cloud infrastructure templates (Terraform, CloudFormation,
Troposphere, ARM Templates, etc.) These get pushed to source control all the
time and often contain tons of policy violations.

The pricing feels a bit steep, especially considering that it's 3.5x the cost
per user of GitHub itself ($8 vs $28) but I suppose most enterprises wouldn't
mind at their scale compared to the cost of a breach.

~~~
eyarz
We do support Terraform, CloudFormation and ARM Templates because those are
all structured files. Unlike your source control, we are doing heavy compute
processing every time a new PR is created, so our costs are higher...

------
harrisonjackson
>What we built is a rules engine, which is essentially a server-side git-hook
platform.

Isn't it too late once it is committed to github? It seems like this would be
much more useful as a service running as a precommit hook on each workstation.
Probably harder to ship/monetize that but as far as actually solving the
problem wouldn't that be better?

~~~
shimont
Initially, we started as a CLI tool, but that is part of the problem: how do
you make sure all of your developers are using the CLI/pre-commit hooks?

This is why we chose to integrate at the pull-request level. It is not
perfect, but at least your plain text secrets will not be merged into master
and end up on your developers' laptops and your servers(less). :)

We try to find a balance between what is perfect and what our customers can
easily achieve.

~~~
lasryaric
But it’s already in the git objects and therefore accessible to anyone who
clones the repository? I am not 100% sure about that. Can someone confirm?

~~~
shimont
We educate our customers on how to delete the branch and remove it from
history:
[https://docs.datree.io/docs/do-not-include-secret-files](https://docs.datree.io/docs/do-not-include-secret-files)

~~~
ben509
I think you're miseducating your customers.

If creds leak, rotate those creds. Then, you check your logs to make sure
there was no intrusion.

"Rotate the creds" gives the absolute best guarantee that they're useless.
Three words I can explain to a nervous manager.

"What if someone got ahold of those creds?"

"Well, boss, here's the window in which it could have happened, and let's go
over these logs together to see if it did."

Scrubbing the repo? I'm skeptical that you're getting rid of anything without
push --force, and you sure as heck aren't running `git gc --prune` on the
remote system, let alone `bfg`.

~~~
shimont
I totally agree! You should rotate the keys! We explain how to get rid of the
secret in terms of Git, in addition to rotating it. Sorry for not being clear.

------
__jal
We do several of these things, and bundling them together looks nice; I
imagine troubleshooting the pipeline is much easier. We would need the
enterprise version because we are on-prem, and our user count compared against
the 'pro' edition makes me think this would be a hard sell - high 5
figures/year to replace a few shell scripts is tough.

~~~
eyarz
I believe we can provide more value than what can be achieved with a few shell
scripts, for example, the built-in best practices, the rules management
option, and more ;) A volume-based discount is also available for enterprise
customers. If you would like to hear more - feel free to reach out to Eyar
[AT] Datree IO

------
elpakal
>The engine performs an automatic check each time code is committed to GitHub

What if we don't use GitHub but something else? Are your hooks able to run
purely in git?

~~~
shimont
Currently, we support GitHub and are working on releasing support for GitLab
and BitBucket. We plan to run on top of existing git hosting solutions.

------
almathes
Why wouldn't someone just use GitHub Actions and token scanning?

[https://github.com/features/actions](https://github.com/features/actions)

[https://developer.github.com/partnerships/token-scanning/](https://developer.github.com/partnerships/token-scanning/)

~~~
__jal
For starters, that "just" is swallowing:

* Identify the relevant tokens you want to scan for, and create regular
expressions to capture them.

* Create a token alert service which accepts webhooks from GitHub that
contain the token scanning message payload.

* Implement signature verification in your token alert service.

* Implement token revocation and user notification in your token alert
service.

And that would replace one piece of what this does.
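
To give a feel for two of those steps, here is a rough sketch of token matching and signature verification. The regular expression covers only the widely documented AWS access key ID shape (`AKIA` followed by 16 uppercase letters or digits). The signature check assumes a generic shared-secret HMAC-SHA256 scheme; the actual token scanning program may sign payloads differently, so treat `signature_is_valid` and its parameters as hypothetical.

```python
import hashlib
import hmac
import re

# Long-term AWS access key IDs: "AKIA" plus 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_tokens(text):
    """Return every substring of text that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.findall(text)


def signature_is_valid(secret, payload, signature_hex):
    """Check an HMAC-SHA256 hex signature over payload (assumed scheme).

    secret and payload are bytes; signature_hex is the hex digest the sender
    attached to the request.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature_hex)


if __name__ == "__main__":
    sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"
    print(find_tokens(sample))
```

Even with both helpers in place, revocation, notification, and the alert service itself are still left to build, which is the point being made above.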

~~~
stevenpetryk
It always warms my heart to see someone fighting the "why not just..."
comments on here. Everyone underestimates how much goes into a project.

~~~
dang
Jerry Weinberg used to say that whenever you hear the word "just" on a
software project, replace it with "have trouble". Similarly, replace "should"
with "isn't". "That should be easy" -> "that isn't easy"; "we should just use
git" -> "we'll have trouble using git".

[https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...](https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=weinberg%20word%20just%20trouble&sort=byDate&type=comment)

------
toomuchtodo
Do you offer an on-prem version for orgs that couldn't use a SaaS provider for
this sort of functionality?

~~~
eyarz
yes, we do.

------
theanirudh
Looks good. We were trying to implement this using a mix of CI, pre-commit
hooks and Gitlab PR templates, but it was limiting. This looks like exactly
what we needed.

Regarding custom rules, does the tool run automated tests for those too?

~~~
eyarz
just to clarify, Datree runs automatic checks for each custom rule that a user
creates

~~~
theanirudh
We have a Java + Mongo stack and recently we had an issue in prod where code
was deployed with an @Indexed annotation. When deployed this will create an
index on startup and can lock up the DB. We had this issue years ago, but it
slipped past code review and we had a 10 minute downtime because of this. Can
this be prevented using custom rules in Datree?

------
yani
I like the idea and I would like to use the product but the pricing needs to
target more than large enterprises. $3,360 as a starting price is more than I
am willing to invest. For 10 users, I pay $1,620.40/annually to use Jira +
Slack + GitHub.

Also, I do not want to request a demo, I want to try it out myself, then
invite a few more people to try it out.

------
bluefox
Hello Shimon, nice to see your post here, all the best to you and your team
from an ex-colleague.

Would you mind sharing an example of a custom rule?

~~~
shimont
Hey :) Here are several examples of custom rules:

* Verify that CI configuration includes running certain jobs (e.g. third-party packages scanner).

* Ensure that all Docker containers are using a pinned down tag and not "latest"

* Verify that every commit is tied to an issue tracker (e.g. JIRA) ticket for traceability.
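
As an illustration of the second rule (pinned tags instead of "latest"), here is a small sketch that scans Dockerfile text directly. It is not Datree's rule syntax; `unpinned_images` is a hypothetical helper, and it ignores cases such as images supplied through `ARG` variables.

```python
def unpinned_images(dockerfile_text):
    """Return images from FROM lines that have no tag or use ':latest'."""
    offenders = []
    stage_aliases = set()
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # Remember "FROM image AS alias" so later "FROM alias" lines are skipped.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_aliases.add(args[2])
        if image in stage_aliases or image.lower() == "scratch":
            continue
        if "@" in image:
            continue  # pinned by digest
        # A tag follows a ':' in the last path segment; a ':' earlier in the
        # reference is a registry port, not a tag.
        name = image.rsplit("/", 1)[-1]
        tag = name.split(":", 1)[1] if ":" in name else ""
        if tag in ("", "latest"):
            offenders.append(image)
    return offenders


if __name__ == "__main__":
    example = "FROM node:12.16.1 AS build\nFROM nginx\nFROM build\n"
    for image in unpinned_images(example):
        print(f"unpinned base image: {image}")
```

A check like this could run on every pull request and fail when the returned list is non-empty.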

------
debaserab2
Any plans to expand to other VCS hosting services (bitbucket specifically)?

~~~
eyarz
Yes, we plan to add GitLab and BitBucket support in the next quarter.

------
dhagz
So this sounds like git-hooks-as-a-service. Am I right in that assessment?

~~~
elpakal
that's my take as well but maybe im missing something

~~~
elpakal
i don't mean to diminish though, very cool idea!

~~~
eyarz
The git hook is only the integration part. The core of the system is the
rules engine and the logic around that.

------
vira28
Wondering if there is any open source project that does similar things?
(I'd be surprised if there isn't.)

~~~
eyarz
you will need to glue together (and maintain) a bunch of different open-source
projects to achieve the same capabilities - here are some:
[https://github.com/danger/danger-js](https://github.com/danger/danger-js),
[https://github.com/probot/probot](https://github.com/probot/probot),
[https://github.com/Yelp/detect-secrets](https://github.com/Yelp/detect-secrets),
[https://github.com/github/licensed](https://github.com/github/licensed), and
many more...

------
simplify
Curious, what tech did you use to implement this?

~~~
eyarz
Micro-services on serverless architecture with NodeJS, GoLang, React and a
little bit of Python

------
Chico75
How do you deal with false positives?

~~~
shimont
Every policy can be edited and tweaked to your use case using our engine and
Regex. In addition, you can use our dashboard to view all executions

------
mtmail
The URL [https://www.datree.io/](https://www.datree.io/)

~~~
dang
Added above. Thanks!

