
Looks like the `ets` readme has a direct comparison:

> The purpose of ets is similar to that of moreutils ts(1), but ets differentiates itself from similar offerings by running commands directly within ptys, hence solving thorny issues like pipe buffering and commands disabling color and interactive features when detecting a pipe as output. (ets does provide a reading-from-stdin mode if you insist.) ets also recognizes carriage return as a line separator, so it doesn't choke if your command prints a progress bar. A more detailed comparison of ets and ts can be found below.
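
To see why running under a pty matters, here's a minimal Python sketch of the check many tools do internally (my illustration, not code from ets):

    # Many CLI tools disable color/interactivity when stdout fails isatty().
    # ets runs the command inside a pty, so this check returns True and the
    # tool behaves as if attached to a terminal.
    import sys

    if sys.stdout.isatty():
        print("\x1b[32mcolor output (stdout is a terminal)\x1b[0m")
    else:
        print("plain output (stdout is a pipe or file)")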


I wrote up a Semgrep rule to add as a comparison! (also tree-sitter based; `pip install semgrep`, https://github.com/semgrep/semgrep, or play with the live editor link: https://semgrep.dev/playground/s/nJ4rY)

    pattern: |-
      def $FUNC(..., database, ...):
          $...BODY
    fix: |-
      def $FUNC(..., db, ...):
          $...BODY


So the argument is: because vulnerability lifetimes are exponentially distributed, focusing on secure defaults like memory safety in new code is disproportionately valuable, both theoretically and now evidenced by six years of data from the Android codebase.
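
As a back-of-envelope illustration (my numbers, not the paper's): with an exponential lifetime distribution, most surviving vulnerabilities are young, so protecting new code covers most of them.

    # Hypothetical half-life of ~2 years, for illustration only.
    import math

    half_life_years = 2.0
    rate = math.log(2) / half_life_years  # exponential decay rate

    for age in (1, 2, 5, 10):
        surviving = math.exp(-rate * age)
        print(f"fraction of vulns still alive after {age:>2} years: {surviving:.0%}")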

Amazing, I've never seen this argument used to support shift-left security guardrails, but it's great. Especially for those with larger, legacy codebases who might otherwise say "why bother, we're never going to benefit from memory safety on our 100M lines of C++."

I think it also implies that any lightweight vulnerability detection has disproportionate benefit -- even if it only looked at new code & dependencies rather than the backlog.


Absolutely agreed, and copying from a comment I wrote last year: I think the fact that tree-sitter is dependency-free is worth highlighting. For context, some of my teammates maintain the OCaml tree-sitter bindings and often contribute to grammars as part of our work on Semgrep (Semgrep uses tree-sitter for searching code and parsing queries that are code snippets themselves into AST matchers).

Often when writing a linter, you need to bring along the runtime of the language you're targeting. E.g., in Python, if you're writing a parser using the built-in `ast` module, you need to match the language version & features. So you can't parse Python 3 code with Pylint running on Python 2.7, for instance. This ends up being more obnoxious than you'd think at first, especially if you're targeting multiple languages.
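
A minimal illustration of that version coupling with the built-in `ast` module (the syntax is chosen for illustration):

    # `ast.parse` uses the grammar of the interpreter it runs on, so
    # match statements (Python 3.10+) fail to parse on older runtimes.
    import ast

    source = "match command:\n    case 'quit':\n        pass\n"

    try:
        ast.parse(source)
        print("parsed fine (running on 3.10+)")
    except SyntaxError as err:
        print(f"this interpreter's parser can't handle the syntax: {err}")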

Before tree-sitter, using a language's built-in AST tooling was often the best approach because it is guaranteed to keep up with the latest syntax. IMO the genius of tree-sitter is that it's made it way easier than with traditional grammars to keep the language parsers updated. Highly recommend Max Brunsfeld's Strange Loop talk if you want to learn more about the design choices behind tree-sitter: https://www.youtube.com/watch?v=Jes3bD6P0To
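
By contrast, here's a rough sketch with the Python bindings (`pip install tree-sitter tree-sitter-python`); exact API details vary across binding versions, so treat this as illustrative:

    # The grammar ships as a self-contained C library, so the parser
    # doesn't care which Python version wrote the source being parsed.
    import tree_sitter_python
    from tree_sitter import Language, Parser

    parser = Parser(Language(tree_sitter_python.language()))
    tree = parser.parse(b"def greet(name):\n    return f'hi {name}'\n")
    print(tree.root_node)  # prints the parse tree's root node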

And this has resulted in a bunch of new tools built on tree-sitter; off the top of my head, in addition to difftastic: Neovim, Zed, Semgrep, and GitHub code search!


Don't forget Zed! https://zed.dev


> Don't forget Zed!

Mac only, for now.


What's crazy is that the landing page doesn't even mention Mac at all.

I'm getting very annoyed by things that don't mention they only work on Mac until you go to install them.


Looks great! Does it have LSP support for code completion? Does it support C++?


LSP support is semi-built-in, but apparently lots of improvements are coming in that area to support more language servers. For Python it currently only has Pyright built-in, which is an annoyance if you're working with code where the venv is inside a container, but there are very active tickets on their GitHub about building out the LSP support. I currently use it as my second editor: I have Sublime set up to be pretty much perfect for my usage, but Zed is catching up fast. I'm very fussy about editors and can't get on with VSCode at all, but I feel warm and fuzzy toward Zed. The UX is great and the performance superb; external LSP support is probably the one feature stopping me from using it as my primary editor.


I've tried VS Code a ton of times. It is reasonably good, but I am SO used to Emacs that it's almost impossible for me to move away.

VS Code is better at debugging and maybe slightly better at remote connections, I'll grant that. But for everything else I am way more productive with Emacs than anything else.


Okay, but how does that work with language versions? Like, if I get a "C++ parser" for tree-sitter, how do I know if it's C++03, C++17, C++20, or what? Last time I checked (which was months ago, to be fair), this wasn't documented anywhere, nor were there any apparent mechanisms to support language versions and variants.


You can probably rely on backward compatibility of the language and use the "latest." The question is, which version is the grammar written against?


And then there's all the variants of SQL...


That's what I was looking at in the very beginning. Here's how it unfolds: the grammar page (https://github.com/tree-sitter/tree-sitter-cpp) references two documents at the very end:

- Hyperlinked C++ BNF Grammar (https://alx71hub.github.io/hcb/)

- EBNF Syntax: C++ (ISO/IEC 14882:1998(E)) https://www.externsoft.ch/download/cpp-iso.html

The second doc has a year in the title, so it's ancient af. The first one has multiple `C++0x` red marks (AFAIR that's how C++11 was named before standardization). It mentions `constexpr` but doesn't know `consteval`, for example, and doesn't mention any of the C++11 attributes, such as [[noreturn]]. So despite the "Last updated: 10-Aug-2021", it's likely pre-C++11, also ancient af, and of no use in the real world.

Who would have thought. /s


So I see nothing really changed :(.


Don't forget old man Emacs is now using tree-sitter.


Helix (https://helix-editor.com/) is using tree-sitter and LSP as well.


I'm surprised there are so many negative comments on this release, which I suppose is timed for discussion at Black Hat/DEF CON.

The report identifies its audience as four groups of stakeholders: (1) federal civilian executive branch agencies; (2) target-rich, resource-poor entities where federal assistance and support is most needed, including SLTT partners and our nation's election infrastructure; (3) organizations that are uniquely critical to providing or sustaining National Critical Functions; and (4) technology and cybersecurity companies with the capability and visibility to drive security at scale.

The overlap with the HN audience is probably primarily under the last category, where they have 5 objectives listed in the report (increasing threat modeling, secure software development frameworks, accurate CVE data, secure-by-design roadmaps, and publishing stats like MFA adoption and the % of customers using unsupported product versions). These all seem like good priorities for an agency like CISA, and I've been impressed by their level of direct industry interaction even in our company's corner of the security (appsec) space.


Looking at the details of the plan to secure America's IT infrastructure leads me to the Secure Software Development Framework, which leads to an Excel spreadsheet, which leads to a tick-the-box exercise I could get from any generic consultant.

https://csrc.nist.gov/files/pubs/sp/800/218/final/docs/nist....

This is how big corp rubber-stamps its security "review". As an American, I was hoping for the government to come up with a real solution. Like telling the big tech companies that if America goes down the toilet, so do they. So stop with the nonsensical security theater and come up with real solutions. Like how to identify who is doing what: real identity authentication and real logging. No more "VPN/TOR/I can use any IP address I want then spoof a federal employee." No more "I can arbitrarily change any setting/value because MSFT/UNIX doesn't believe in auditing."


You're doing a lot of hand-waving. As someone who has managed remote access to... Internal networks, I'll say it isn't as easy as shoulder surfing at a coffee shop anymore to get into a secure network.

And if that isn't known, I do consulting!


I’m a “target rich, resource poor” entity which is where federal assistance would most be needed, so I read the report eagerly. I didn’t like it and it doesn’t seem to contain any useful plans or roadmap.


Are you a federal agency? That's most of the target of CISA's strategic work.


I do work at the federal agency level, which is what I think you're asking. I am clearly the intended audience of CISA's strategic work, and that work is of very poor quality at the moment (as shown by the document we're discussing) and does not serve my interests. CISA also declined to take my feedback in an unpaid advisory role, which is the first time in my life that this has happened. I've never met another organization that goes to such lengths to avoid the possibility of hearing from its customers.

Where in the linked document do you see any part of their vision to listen to their targets and solve their problems? Their strategy does not include any mechanism for reporting problems and threats, nor for gathering feedback about the security issues on the ground. In fact, their document doesn't even contain basic contact information. It is an opaque document discussing non-threats while ignoring the gathering of information about threats and the understanding of and response to them. It is the worst strategic plan I've seen from any organization on any subject and fails to mention any mechanisms toward necessary outcomes.

https://www.cisa.gov/sites/default/files/2023-08/FY2024-2026...


I suspect the issue here is that you are not, in fact, the intended audience of this work.


I was relying on backup codes until I recently learned (via a PSA on HN) that you should actually back up the TOTP QR codes and not rely on the 2FA backup codes, because they may not provide the same level of access.

Specifically, the claim in the HN post was that using a backup 2FA code to get into your Google account won't allow you to add a new authenticator app.
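
The underlying reason: the QR code encodes the TOTP secret itself, and anything holding that secret can mint codes or be enrolled as a new authenticator. A sketch with the pyotp library, using a placeholder secret:

    # "JBSWY3DPEHPK3PXP" is a placeholder secret, not a real one.
    import pyotp

    totp = pyotp.TOTP("JBSWY3DPEHPK3PXP")
    print(totp.now())  # same 6-digit code any authenticator app would show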


Sad to see this as someone who enjoyed these as a kid and entered robotics competitions based on them; hopefully the new products will keep the spirit alive.

There was even some decent FOSS tooling developed on top of Mindstorms: I used NXC (Not eXactly C, https://bricxcc.sourceforge.net/nbc/welcome.html), a C-like language for programming Lego Mindstorms. It looks like the last release of NXC was in 2011.


Similar story here. I remember getting my first exposure to embedded development at a relatively young age through brickOS (https://brickos.sourceforge.net/), which was a complete replacement operating system for the original Mindstorms RCX (predating the NXT).

That version of the hardware was so old that it didn't even have non-volatile storage. Every time you changed the batteries, it would boot into a minimal ROM bootloader which was just powerful enough to download the rest of the firmware into RAM, via an infrared connection to your PC. That had the nice side effect of making the RCX very hacker-friendly, because it was almost impossible to permanently "brick" it (ha!).


My first real programming was NQC (Not Quite C, probably related to NXC?). I owe it my programming abilities.

Lego Mindstorms was one of the best creative learning tools you can give a child; my life started the day I stopped using the building manual and started building my own stuff by trial and error.

The world would be a better place if everyone grew up with the opportunities that I did. I wish schools would just let children do whatever with Lego instead of filling their days with restrictive lessons in loud classrooms.


I grew up using Mindstorms in FLL and I just began coaching a team this year. I'm pretty impressed with the new Spike Prime, and although it has its quirks, overall I'd say it's a good improvement over the last generation. Plus, I get to teach the kids Python, which has honestly been a blast!


Pybricks/Python runs well on both the Mindstorms EV3 and the new thing (Spike Prime).


Both Semgrep Supply Chain and govulncheck (AFAIK) are doing this work manually, for now. It would indeed be nice if the vulnerability reporting process had a way to provide metadata, but there's no real consensus on what format that data would take. We take advantage of the fact that Semgrep makes it much easier than other commercial tools (or even most linters) to write a rule quickly.

The good news is there's a natural power-law distribution: most alerts come from a few vulnerabilities in the most popular (and often large) libraries, so you get significant lift just by writing rules for the most popular libraries first.
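
A toy illustration of that skew (hypothetical numbers, purely to show the shape):

    # If alert volume per library roughly follows a power law, rules for a
    # handful of popular libraries cover the bulk of all alerts.
    alerts_per_library = {"lib-a": 500, "lib-b": 250, "lib-c": 125,
                          "lib-d": 12, "lib-e": 5}
    total = sum(alerts_per_library.values())
    top3 = sum(sorted(alerts_per_library.values(), reverse=True)[:3])
    print(f"top 3 libraries account for {top3 / total:.0%} of alerts")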


(Disclaimer: I work at Phylum, which has a very similar capability)

Not all of it has to be manual. Some vulnerabilities come with enough information to deduce vulnerability reachability with a high degree of confidence with some slightly clever automation.

Not all vulns come with this information, but as time goes on the percentage that do is increasing. I'm very optimistic that automation + a bit of human curation can drastically improve the S/N for open source library vulns.
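
For instance, when an advisory names the vulnerable symbol, the automation can be a simple AST scan. A rough sketch (the advisory metadata and symbol here are hypothetical):

    # If the advisory says the vulnerable function is `yaml.unsafe_load`
    # (hypothetical), walk the AST of your code looking for calls to it.
    import ast

    ADVISORY_SYMBOL = ("yaml", "unsafe_load")

    source = "import yaml\ndata = yaml.unsafe_load(open('c.yml'))\n"

    reachable = any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and (node.func.value.id, node.func.attr) == ADVISORY_SYMBOL
        for node in ast.walk(ast.parse(source))
    )
    print("vulnerable call reachable:", reachable)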

A nice property of this is: you only have to solve it once per vuln. If you look at the total set of vulns (and temporarily ignore super old C stuff) it's not insurmountable at all.


In what format is that information coming in? There are no function taints in GitHub's OSV data or NVD's data that I can see.


> Both Semgrep Supply Chain and govulncheck (AFAIK) are doing this work manually, for now.

Ya I get that, but surely you don't have 100% coverage. What does your code do for the advisories which you don't have coverage for? Alert? Ignore?


Since security vulnerability alerts are already created and processed manually (e.g., every Dependabot alert is triggered by some GitHub employee who imported the right data into their system and clicked "send" on it), adding an extra step to create the right rules doesn't seem impossibly resource-intensive. Certainly much more time is spent "manually" processing even easier-to-automate things in other parts of the economy, like payments reconciliation (https://keshikomisimulator.com/).


That's 100% coverage, which is ideal but will take time to get to.


All the engine functionality is FOSS: https://semgrep.dev/docs/experiments/r2c-internal-project-de... (code at https://github.com/returntocorp/semgrep), but the rules are currently private (that may change in the future).

As with all other Semgrep scanning, the analysis is done locally and offline -- which is a major contrast to most other vendors. See #12 on our development philosophy for more details: https://semgrep.dev/docs/contributing/semgrep-philosophy/

Including the relevant part of the changelog is a good idea. Others have also come out with statistical approaches based on upgrades other users made (e.g., Dependabot has a compatibility score, which is based on "when we made PRs for this on other repos, what % of the time did tests pass vs. fail").
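
In other words (my sketch of the idea, not Dependabot's actual implementation):

    # Compatibility score: of the repos that received this upgrade PR,
    # what fraction had their tests pass afterward?
    ci_outcomes = [True, True, False, True]  # hypothetical results elsewhere
    score = sum(ci_outcomes) / len(ci_outcomes)
    print(f"compatibility score: {score:.0%}")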


Ah okay, thanks for the information.


We added support to the Semgrep engine for combining package metadata restrictions (from the CVE format) with code search patterns that indicate you're using the vulnerable library (we're writing those mostly manually, but Semgrep makes it pretty easy):

    - id: vulnerable-awscli-apr-2017
      pattern-either:
      - pattern: boto3.resource('s3', ...)
      - pattern: boto3.client('s3', ...)
      r2c-internal-project-depends-on:
        namespace: pypi
        package: awscli
        version: "<= 1.11.82"
      message: this version of awscli is subject to a directory traversal vulnerability in the s3 module

This is still experimental and internal (https://semgrep.dev/docs/experiments/r2c-internal-project-de...) but eventually we'd like to promote it and maybe open up our CVE rules as well!


Here is a good writeup of some of the pros and cons of using a "reachability" approach.

https://blog.sonatype.com/prioritizing-open-source-vulnerabi...

>Unfortunately, no technology currently exists that can tell you whether a method is definitively not called, and even if it is not called currently, it’s just one code change away from being called. This means that reachability should never be used as an excuse to completely ignore a vulnerability, but rather reachability of a vulnerability should be just one component of a more holistic approach to assessing risk that also takes into account the application context and severity of the vulnerability.


Err, "no technology currently exists" is wrong, "no technology can possibly exist" to say whether something if definitively called.

It's an undecidable problem in any of the top programming languages, and some of the sub-problems (like aliasing) are themselves similarly statically undecidable in any meaningful programming language.

You can choose between over-approximation or under-approximation.
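
A tiny example of why this is undecidable in general (my illustration):

    # Whether `vulnerable()` ever runs depends on arbitrary runtime data,
    # so a static tool must either over-approximate (flag the call even
    # if it may never happen) or under-approximate (risk missing it).
    def vulnerable():
        pass

    def main(user_input: str) -> None:
        if hash(user_input) % 1_000_003 == 0:  # data-dependent condition
            vulnerable()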


I saw that Java support was still in beta. But it makes me wonder if it's going to come with a "don't use reflection" disclaimer, then...?

