Show HN: Bearer – Open-source code security scanning solution (SAST)
106 points by gmontard on March 7, 2023 | 56 comments
Hi HN,

We’re the co-founders of Bearer, and today we’re launching an open-source alternative to code security solutions such as Snyk Code, SonarQube, or Checkmarx. Essentially, we help security & engineering teams discover, filter, and prioritize security risks and vulnerabilities in their codebase, with a unique approach through sensitive data (PII, PD, PHI).

Our website is at https://www.bearer.com and our GitHub is here: https://github.com/bearer/bearer

We are not originally security experts, but we have been software developers and engineering leaders for over 15 years now, and we thought we could bring a new perspective to security products, with a strong emphasis on developer experience, something we often found lacking in security tools.

In addition to building a truly developer-friendly security solution, we’ve also heard a lot of teams complaining about how noisy their static code security solutions are. As a result, they often have difficulty triaging the most important issues, and ultimately it’s difficult to remediate them. We believe a big part of the problem is the lack of a clear understanding of the real impact of any given security issue. Without that understanding, it’s very difficult to ask developers to remediate critical security flaws.

We’ve built a unique approach to this problem by looking at the impact of security issues through the lens of sensitive data. Interestingly, most security teams’ ultimate responsibility today is to secure that sensitive data and protect their organization from costly data loss and leakage, but until now, that connection has never been made.

In practical terms, we provide a set of rules that assess the variety of ways known code vulnerabilities (CWE) ultimately impact your application security, and we reconcile that with your sensitive data flows. At the time of this writing, Bearer provides over 100 rules.

Here are some examples of what those rules can detect:

- Leakage of sensitive data through cookies, internal loggers, third-party logging services, and into analytics environments.

- Non-filtered user input that can lead to breaches of sensitive information.

- Usage of weak encryption libraries or misusage of encryption algorithms.

- Unencrypted incoming and outgoing communication (HTTP, FTP, SMTP) of sensitive information.

- Hard-coded secrets and tokens.

- And many more you can find here: https://docs.bearer.com/reference/rules/

Rules are easily extendable so you can create your own; everything is YAML-based. For example, some of our early users used this system to detect the leakage of sensitive data into their backup environments, or missing application-level encryption of their health data.
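To give a sense of the shape, here is a rough sketch of what a custom rule could look like. All field names and values below are illustrative, not Bearer's exact schema; see the custom-rule documentation for the real format:

```yaml
# Illustrative only: approximate field names, not Bearer's exact schema.
patterns:
  - pattern: |
      logger.info($<DATA>)
languages:
  - ruby
severity: high
metadata:
  id: example_logger_leak
  description: "Sensitive data sent to an application logger"
  cwe_id:
    - 532
```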

I’m sure you are wondering how we can detect sensitive data flows just by looking at the code. Essentially, we also perform static code analysis to detect those. In a nutshell, we look for those sensitive data flows at two levels:

- Analyzing class names, methods, functions, variables, properties, and attributes, then tying those together into detected data structures, with variable reconciliation, etc.

- Analyzing data structure definition files such as OpenAPI, SQL, GraphQL, and Protobuf.

Then we pass this over to a classification engine that assesses 120+ data types from sensitive data categories such as Personal Data (PD), Sensitive PD, Personally Identifiable Information (PII), and Personal Health Information (PHI). All of that is documented here: https://docs.bearer.com/explanations/discovery-and-classific...
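As a hedged illustration (the names here are invented, not taken from Bearer's docs), this is the kind of Ruby pattern such classification plus rule matching would flag: a PII-classified field flowing into a logger.

```ruby
require "logger"
require "stringio"

# A field named "email" on a user-like structure is the sort of thing
# a sensitive-data classifier would tag as PII.
User = Struct.new(:email, :name)

log_output = StringIO.new
logger = Logger.new(log_output)

user = User.new("jane@example.com", "Jane")

# A rule like "leakage through internal loggers" would flag this line:
# PII flows directly into log output.
logger.info("login for #{user.email}")

# A safer alternative logs an opaque identifier instead.
logger.info("login for user id=42")
```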

As we said before, developer experience is key, which is why you can install Bearer in 15 seconds via cURL, Homebrew, apt-get, yum, or as a Docker image. Then you run it as a CLI locally, or as part of your CI/CD.

We currently support JavaScript and Ruby stacks, but more will follow shortly!

Please let us know what you think and check out the repo here: https://github.com/Bearer/bearer




Vote manipulation is against HN's rules and will get you banned here, so please don't do it again.

https://twitter.com/g_montard/status/1633119734991405058

https://twitter.com/g_montard/status/1633119274838392841

This is the one point that's in both the site guidelines and the FAQ:

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/newsfaq.html


Oh, I’m really sorry about that, I didn’t know (my fault) that mentioning we were on HN was against the rules. Calling that "vote manipulation" is quite exaggerated imho, but I get it.

Ultimately I think I got carried away by the great community reception.

Anyway thanks for letting me know, I’ll avoid doing so next time.


https://news.ycombinator.com/newsfaq.html#ring exactly describes the circumstance, and while it doesn't have its own heading in the guidelines:

> Don't solicit upvotes, comments, or submissions. Users should vote and comment when they run across something they personally find interesting—not for promotion.

seems to also match what dang is drawing attention to. My (outsider) suspicion is that the number of dead comments from new accounts on this thread drew attention to the goings-on.


Hello HN community,

I'm Cédric Fabianski, Co-founder and CTO @ Bearer.

This is a big milestone for me personally and I'm super happy to be able to contribute to the Security space and help improve the security of others' applications.

This is by far the most challenging project I've ever worked on but as people say, if you don't make security simple and accessible enough, there is no way engineers are going to care about it.

Let me know what you think! Any feedback is more than welcome!!


Thanks, this is very cool, I've been clicking around a lot! I like what you've got going, and I like how it has a resemblance to Rubocop.

My first feedback -

There are a few too many "clicks to code", given (1) how easy it actually is, and (2) that it's aimed at developers.

Personally, I'd slap the `brew install` + `bearer scan` on the initial landing page, just under the "Get Started" link (with an "Other Installation Options" link, 'cuz brew). You do a pretty good job of this (the GIF), but I look at "clicks to code" as an indicator of how focused on ease-of-use the provider is, and you're more focused on it than the landing page suggests to me. (Sinatra is the reigning champ at this.)

Next -

1. Pronto integration. I'd like to be able to plug it into things like Pronto, so we make sure we're not introducing new problems while we're not ready to deal with existing ones.

2. GitHub PR comments. It's not clear to me if the output of the GH action will create comments in a PR a la Pronto / Rubocop. It looks like it probably does, so just show me a picture so I know for sure?

3. YAML option for recipes

4. I needed to upgrade to Xcode 14.1 (from 14.0). Why was that necessary? Seems like it shouldn't be?

5. THANK YOU for providing links to source code from the docs! (I checked out a couple of rules). I would pick a couple of your favs and link to them from the "Custom Rule" page, too.

6. I'd definitely run a few from-scratch workshops for custom rules, recipes, etc; point people at the docs and ask them where they run into even the smallest friction. Your docs are really good but I did need to scroll and click around a bunch as I came to an understanding. Smoothing that out would be nice! (Think Rails Guides vs Rails Docs)


Elastic 2, for those who care about such things: https://github.com/Bearer/bearer/blob/v1.0.0/LICENSE.txt


Absolutely!

We wanted to find a good balance with a license that allows any team to use it for their own purposes, no strings attached, while at the same time protecting us against a big vendor tempted to package our work into their product without us getting a dime... Unfortunately, that happens in this world :(


AGPLv3 would ensure any changes by a big vendor would remain freely licensed. The current license for this project fails to meet the Open Source Definition (Criteria 6: No Discrimination Against Fields of Endeavor) since it restricts offering Bearer as a managed service.


That's right, we don't want someone offering a managed service on top of us without getting a license (or just an agreement). Basically, it's the AWS vs. Elastic case that resulted in this license.

Happy to revisit the license in the future when we feel more protected, but for now, we've seen so much bad behavior in this industry, with big vendors taking advantage of small companies like ours.


> …we've seen so much bad behavior in this industry, with big vendors taking advantage of small companies like ours.

Firstly, I have absolutely no problem with your choice of license so don’t take this as a criticism of your project.

What I do take issue with is people releasing software under a “free” license and then complaining about people taking them up on their offer. This isn’t “taking advantage”, this is taking what they are freely giving.


None taken; I just wanted to give the context for why we chose this license.


Nothing wrong with this license. Don't like it? Then don't use it. I don't need somebody to tell me how OSS is defined.


Always great to see more SAST options.

"Contact Us" for pricing immediately disqualifies any product I'm looking at, however. I'd suggest making pricing very clear on the site.


Once we're a bit more ready on the Cloud version, we'll release the pricing. Honestly, I also hate when pricing is not available, so I'd like us to avoid this going forward! Thanks for putting this back on my radar.

Anyway, with the OSS version, you don't need to care about pricing :)


How does this compare with Semgrep, which to my understanding is the dominating open-source SAST offering to date?


I wouldn't say dominating, tbh, but clearly one of the good solutions out there for sure.

Probably the biggest differentiator is our ability to detect sensitive data flows and map those to the different security findings. It allows finding unique risks, such as sensitive data leaking into loggers, but also dynamically prioritizing issues based on the type of sensitive data at risk, or even deciding an issue isn't important if none is involved.

Let's say you're connecting to an insecure API: we're going to assess whether or not you're sending sensitive data there, and depending on that we'll change the priority of the risk. If no sensitive data is involved, it would be a low risk; if PHI is involved, it would be critical.
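A minimal sketch of that prioritization idea (my own toy model, not Bearer's actual engine or data-type taxonomy): severity depends on both the transport and the kind of data involved.

```ruby
require "uri"

# Toy severity model: transport security plus the kind of data sent.
PHI_FIELDS = [:diagnosis, :blood_type].freeze

def risk_for(uri, payload_keys)
  return :none if uri.scheme == "https"  # encrypted transport: no finding
  (payload_keys & PHI_FIELDS).any? ? :critical : :low
end

insecure_api = URI("http://api.example.com/v1/events")

risk_for(insecure_api, [:event_name])  # no sensitive data over HTTP: low
risk_for(insecure_api, [:diagnosis])   # PHI over plain HTTP: critical
```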

For the rest, I'll let you be the judge of the UX, quality of findings, speed, etc.


Interesting! Thanks for responding. Sounds like it's bridging SAST and threat modeling.


The big missing feature for these kinds of tools is a workflow and relationship for dev teams to mark findings: marking them as "false positive", "only applies if these other conditions are true", or "yes, but we have a mitigation/exception", etc. A fast workflow that allows for fewer blockers, reduced noise, and a focus on things that actually matter.


Totally agree. I love the idea of SAST-in-CI, but I ran this on a handful of repos I manage (ranging from 40k-100k SLOC) and there were too many false positives for me to want to add this as build-breaking criteria in our CI pipeline. Not at all unique to Bearer, of course, as you point out, but still a real problem.

I suppose an alternative would be to not have this be a pass/fail part of CI, but maybe a qualitative summary that gets autogenerated as part of the PR / code review process. The noise issue is still a real one, as people will eventually ignore the noisy summaries or filter/whitelist them into relative oblivion.

I like the idea of "only applies if these other conditions are true". In all the false positives I encountered so far, if given the option I would be able to declaratively express when and when not to apply the rule. I'd even be ok with inline ignore comments to that end which, while not ideal, is something folks are already used to for other idioms like test coverage et al.
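A hypothetical inline ignore could look something like this. The `bearer:disable` directive and rule id below are invented for illustration, not necessarily Bearer's actual syntax:

```ruby
require "logger"
require "stringio"

log_output = StringIO.new
logger = Logger.new(log_output)

# bearer:disable example_logger_leak -- value is a synthetic test fixture,
# so suppressing the finding on this one line would be acceptable.
logger.info("seeding fixture user test@example.com")
```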


We need to open up the filtering and prioritization logic for configuration; it essentially does that today, but this way you can apply your own logic.

I'd advise starting today by looking only at critical alerts; with our scoring based on sensitive data impact, that should be a good first step in triaging.


Btw, if you have some examples, please share them, or even better, write an issue. We’d be super happy to look at them and fine-tune the rules.

It’s just a 1.0, we can do much better for sure :)


I'll cherry-pick an example: the default cookie config rule (https://github.com/bearer/bearer/blob/main//pkg/commands/pro...).

We have many places where `cookie: <EncryptedString>` is used in our code and it triggers that rule. There are a few issues with this:

- Most of the expressions where we use that pattern are used to send a full encrypted cookie string. The use of `cookie` is not the name of a key in the cookie string; it's the whole cookie.

- All of the data in the cookie string itself is encrypted and also sent over https. Just matching on a regex expression won't tell you this information without an accompanying AST to verify.

Notably, we're using hapi and not express, but my notes above would still apply to some use cases in express as well. It's possible I am missing the actual value of that rule, but just matching on the expression is going to generate a ton of false positives.
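A rough Ruby analog of the pattern described above (names are illustrative; the original is hapi/JavaScript): the entire header value is an opaque encrypted blob, so a regex match on `cookie:` alone can't distinguish it from a plaintext leak.

```ruby
require "openssl"
require "base64"

# Encrypt the entire session state with AES-256-GCM.
cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
cipher.random_key
iv = cipher.random_iv
sealed = cipher.update("session-state: user=42") + cipher.final

headers = {
  # Looks like `cookie: <EncryptedString>` to a pattern matcher,
  # but nothing sensitive is readable without the key.
  cookie: Base64.strict_encode64(iv + sealed + cipher.auth_tag)
}
```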


Thanks for the feedback here; it is much appreciated :) I know your point about catching encryption is more general than this example, but I’ve made a small improvement to the default cookie config rule regex to address one of the false-positive cases mentioned: https://github.com/Bearer/bearer/pull/754


This still generates the same false positive for me, in all of the previous repos I tested on.


Thanks for the report back; that's interesting. Perhaps I misunderstood your example. Feel free to write an issue if you like, and I can investigate further.


I’ve introduced the `absence` trigger that does that: if express is present but helmet is missing, then break.

Do you think that’d help achieve what you have in mind?


I think the design flaw in most of the problematic rules comes from too-simple regex matching. Finding a string pattern should be a clue to do some deeper analysis (maybe verify via the AST), not necessarily a reason to flag the string alone as a security failure.


The rules do work on the AST but the current cookie rule is not as advanced as it could/should be. For example, we really should treat encryption as sanitizing the value.

We'll take another look at the rules with this in mind. If you are able to share the (rough) approach you take to build the cookie string it would help us to ensure we're covering the specific case(s) you have.


Workflow is coming with our Cloud offering, with all the cool integrations you can think of, such as Jira or Slack.

On the "marking" part, we have two options that will be available super soon:

1) Directly in the code, by adding a special comment that will ignore findings.

2) In the Cloud, an ignore action will forever park an issue, even if it changes line, etc. (smart fingerprinting applied). We can't really have that in the OSS since it's stateless.


Gitlab has a great security dashboard for this. It organizes the output of multiple tools in a place where you can discuss, triage, ignore or track an issue to resolve it.

https://docs.gitlab.com/ee/user/application_security/securit...


Also super expensive; you need the $99 plan :) https://about.gitlab.com/pricing/

Integration with SCM is clearly a top priority for us, especially directly in PRs. GitHub SARIF is a nice way for third parties to integrate into their dashboard; we're committed to it.


100% agree

(Shameless plug: the product we've been working on for the last 1.5 years aims to solve exactly that, either via a PR bot, Slack / Teams, etc.) Ping me (see profile for details) if it's interesting.


GitHub actually has this feature (only for open source and Enterprise, IIRC) when there is SARIF output.


SARIF output is on our roadmap, btw!

GitHub code scanning is not so great from what we've heard so far, but it's also very expensive; you need to be on the Enterprise plan...


First of all, thank you for making and sharing this. I have a few technical questions, if I may.

Does Bearer perform data-flow analysis? If so:

1. Is the analysis inter-procedural?

2. Is it sound? (Does it only report findings it’s absolutely certain about, possibly missing others; or does it report all possible findings even if some of them are false positives?)

3. How are sources and sinks of information specified?

4. I see it supports JavaScript and Ruby. Any plans on adding other languages? Is the current analysis implementation amenable to adding support for other languages?

5. What’s the analysis behavior around dynamic language constructs (e.g. eval)?

6. What’s the analysis behavior around missing symbols/dependencies?


Thanks for your questions. Yes we do perform dataflow analysis:

1. Not yet but we are exploring ways to support that

2. The analysis part is sound. False +ves (mainly) come from limitations with what you can specify in the rule language. We're working on this however.

3. We don't make that distinction in the rules language currently. Sensitive data detection (which is built-in) is effectively treated as a source. But we need to allow rules to specify sources. I don't think the limitation matters to finding issues, but more to how well they are reported (you effectively only get the sinks reported at the moment).

4. We plan to add other languages but are mindful of the balance of depth vs breadth of support. Is there a particular language you'd like to see support for?

5. There is no support for these currently unfortunately.

6. As it's intra-procedural, we take quite a basic approach to these (with some special cases in the engine). In terms of dataflow, we treat unknown function calls as identity functions (assume the output is somehow influenced by all the inputs). Obviously this is not ideal in terms of false +ves, but we need to work on inter-procedural support first to do a good job of this. In terms of type analysis, we will try to infer unknown types locally from field/property access.
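A small sketch of that identity-function assumption (my own illustration, not the engine's actual representation): an unknown call's result is considered tainted whenever any of its arguments is tainted.

```ruby
# Each value carries a taint flag marking it as sensitive or not.
Tainted = Struct.new(:value, :tainted)

# Unknown functions are modeled as "identity" for taint purposes:
# the result is assumed influenced by all inputs, so it is tainted
# whenever any input is.
def unknown_call(*args)
  Tainted.new("<opaque result>", args.any?(&:tainted))
end

email = Tainted.new("jane@example.com", true)  # sensitive
stamp = Tainted.new("2023-03-07", false)       # not sensitive

unknown_call(email, stamp)  # result treated as sensitive
unknown_call(stamp)         # result treated as clean
```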


This is a great looking project - we've been looking for tools similar to this to add an extra layer of validation to our codebase. Are you thinking about supporting Java in the future?


Thank you! We were actually thinking Java or PHP for the next one, so I guess it's a +1 on Java :D


another +1 for Java then


We hear you


+1 for Java, because that, possibly, means… Clojure? :)


You're pushing it ^^


I wish these tools would just auto fix it for me. I hate messages like this:

> CRITICAL: Only communicate using SFTP connections.

If you know what’s wrong, then fix it. My integration or unit tests will fail if your fix doesn’t work.


Well, we're getting there, at least toward proposing some fixes.

Automatically fixing is tricky; it means changing your code, which can get automatically deployed to production without any other checks... Dangerous. Not sure you want to trust anyone to do that, tbh.

Also, considering all the edge cases there are, it's impossible to guarantee that a fix won't break your code. If someone guarantees that, they're just lying to you.

But I understand why you'd love that, as a developer, I do too :)


> changing your code that can get automatically deployed in production without any other checks.

I’ve never worked at a place that didn’t have at least two of these:

- Code review checks

- QA checks

- Automated testing

If an edge case breaks the code, then great! The developer can fix it (if the tool can’t). Even if the system fixes it properly only 2% of the time, that’s 2% of the time the developer didn’t have to roll up their sleeves.


I agree, in theory :)

But I’m happy you say that; it gives me hope that our future automated remediation suggestions can be easily adopted.


I think these tools have to have the automation baked into the checks from v0. Adding it later can be a mess without the right abstraction.


You can't just fix that in code. FTP and SFTP are completely different protocols that use different servers.

You need a new server to talk to in order to fix that. And if it's a customer's server, maybe it can only do FTPS rather than SFTP.


Yeah… so this example is saying “you need to redesign your infrastructure before you can merge this change in.”

If sftp is a requirement, it should have been captured earlier in the process and not after the integration code was written.


In an ideal world, security tools like this one would be useless… but unfortunately, we don’t all live in a world where security requirements are all captured, understood, and implemented correctly.

This was just an example; think about application-level encryption, leakage in logger messages, etc.


Tracking and mapping where your sensitive data goes is challenging, and manual approaches always fall short. This is a unique approach to preventing sensitive data leakage.


Also check out Wazuh, for another great solution in this area.


While it is great, Wazuh is not close to being a static analyzer.


Excellent product! I was a bit skeptical, but it worked on the first try on my Rails app and helped me discover a few issues!


Had the chance to try it a few weeks ago. Took only a couple of minutes to set up, and it gave me a few interesting warnings about PII on one of my projects.

Feels like it would be a great tool for a team that is just starting to pay attention to security risks and vulnerabilities.

Will follow the next evolutions of your tool. Thanks for sharing!


Congrats Bearer team, looks awesome



