Launch HN: SubImage (YC W25) – See your infra from an attacker's perspective
135 points by alexchantavy 28 days ago | 31 comments
Hi HN! I’m Alex, and along with my co-founder Kunaal, we are thrilled to introduce SubImage (https://subimage.io): a tool that lets your security team fix issues before they’re found by attackers. Teams use SubImage to map their infrastructure and emulate adversary behavior. Here’s a video of how I would use it to hack our own company: https://www.youtube.com/watch?v=P_meu4_aIVA.

SubImage is our hosted offering built on top of Cartography (https://github.com/cartography-cncf/cartography), the open source security graph that we created at Lyft in 2019, originally shared on HN here: https://news.ycombinator.com/item?id=19517977. You can think of us as an open-core Wiz alternative.

In 2016, I worked on Microsoft’s Azure Red Team, where we built an infra mapping service to find the shortest paths to exploit our targets. We were so effective that the Blue Team wanted it too. In 2019, I joined Lyft, where we applied the same ideas to AWS and beyond, helping build and open-source Cartography. Over the past six years, it’s been incredible to grow the community and see over 70 companies (that I know of) use it.

Kunaal and I first worked closely together in 2020 when we helped bootstrap Lyft’s vulnerability management program and used Cartography as its backbone: https://eng.lyft.com/vulnerability-management-at-lyft-enforc.... This is actually where the name SubImage comes from: Lyft services are made up of one or more “SubImages”, and modeling this properly was such a memorable engineering challenge that we decided to name our company after it.

Cartography pulls metadata from multiple sources -- SaaS, cloud service providers, a company’s internal services -- and writes it to a graph database. This simple technique is incredibly powerful in modeling otherwise unseen misconfigurations and attack paths in areas like access permissions, networking, and software vulnerabilities.
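
To make the idea concrete, here is a minimal sketch of turning metadata records into nodes and edges. It uses plain Python dictionaries instead of a real graph database, and the record fields (`role`, `can_read`) and edge labels are made up for illustration; this is not Cartography's actual schema or API.

```python
from collections import defaultdict

# Hypothetical metadata records, standing in for what a cloud or SaaS API returns.
records = [
    {"type": "EC2Instance", "id": "i-123", "role": "app-role"},
    {"type": "IAMRole", "id": "app-role", "can_read": ["bucket-pii"]},
    {"type": "S3Bucket", "id": "bucket-pii"},
]


def build_graph(records):
    """Return (nodes, edges): nodes keyed by id, edges as lists of (label, target)."""
    nodes = {r["id"]: {"type": r["type"]} for r in records}
    edges = defaultdict(list)
    for r in records:
        # An instance with an attached role gets an edge to that role.
        if "role" in r:
            edges[r["id"]].append(("ASSUMES", r["role"]))
        # A role that can read buckets gets one edge per bucket.
        for bucket in r.get("can_read", []):
            edges[r["id"]].append(("CAN_READ", bucket))
    return nodes, dict(edges)


nodes, edges = build_graph(records)
```

Once the data is in this shape, "which instances can reach a PII bucket" becomes a matter of following edges rather than joining API responses by hand.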

SubImage picks up where Cartography leaves off: it’s a fully-hosted solution that provides specific recommendations for the problems it finds. The fix-action depends on company size: small teams might run AWS CLI commands, while larger orgs require automated infrastructure-as-code pull requests.

Here’s a video demo showing how we can use SubImage to understand and take action if our Stripe API key is unexpectedly used: https://www.youtube.com/watch?v=RBCr35hb5Hk.

SubImage also provides a natural language interface to quickly answer questions about our infra: https://imgur.com/a/subimage-natural-language-interface-quer....

Security is a competitive space, but we have a few differentiators:

First, we allow a very deep level of customization where the security team can enrich their graph with their own internal data, not just data from the major cloud providers. If it can be expressed as structured JSON, you can graph it; here’s a demo: https://www.youtube.com/watch?v=rvwDJoZaO_w. This flexibility is needed to answer questions like: Which storage buckets contain PII? Who owns them? Who’s on-call for https://example.com/api/payment? Which company director owns the most risk?

Since it’s built on Cartography, teams can also just write custom plugins in Python if they’d like: https://cartography-cncf.github.io/cartography/dev/writing-i....

Second, our core principle is actionability. Security teams drown in alerts. SubImage traces paths from critical assets to the most exploitable misconfigurations, helping teams cut through the noise and prioritize real threats.
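
As a rough illustration of path tracing, the sketch below runs a breadth-first search over a tiny, hypothetical attack graph to find a shortest route from an internet-facing entry point to a critical asset. The node names and the graph are invented, and this is only one simple way to rank paths, not a description of how SubImage does it.

```python
from collections import deque

# Hypothetical attack graph: each key can reach the nodes in its list.
graph = {
    "internet": ["load-balancer", "bastion"],
    "load-balancer": ["web-app"],
    "bastion": ["admin-role"],
    "web-app": ["app-role"],
    "app-role": ["payments-db"],
    "admin-role": ["payments-db"],
}


def shortest_path(graph, start, target):
    """Breadth-first search; returns the first shortest path found, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path(graph, "internet", "payments-db")
# path is ["internet", "bastion", "admin-role", "payments-db"]
```

The shorter route through the bastion is the one to fix first, which is the kind of prioritization that cuts down a long list of findings.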

Finally, we’re built on open source. We created Cartography and as it improves, so does SubImage. Cartography is a CNCF project (https://eng.lyft.com/cartography-joins-the-cncf-6f6b7be099a7), which means that it is fully open source and will remain so.

Going forward, we’re maintaining Cartography while launching SubImage as a fully managed offering. Our roadmap includes Access Management (prune excessive permissions and enforce security invariants), Change Tracking (detect and alert on infra changes that introduce risk), and Cloud & SaaS Misconfigurations (expand visibility, including vulnerability management).

Thanks for reading! If this sounds interesting, try out https://github.com/cartography-cncf/cartography.

It’s an honor to share SubImage with HN, especially having followed projects here for over a decade. We’d love to hear your questions, feedback, and the challenges you face in security and infra!




Awesome project!

As someone deeply familiar with this problem (ex-JupiterOne), I'd caution against asserting that 'deep level of customization' is a differentiator. Your buyer (CISO) and userbase (Sec Engs) are drowning. They (and I) don't want yet another product to build on top of. This is a key reason why Wiz is so successful -- an operator can turn Wiz on and immediately receive value, no adjustments or additions needed.

I'd strategically focus on making the 'actionability' part the cornerstone of the product and really become obsessed with making that part of your product incredible. The Goliath-killing story you need will be formed by figuring out how to get your product to the point where someone can turn it on and immediately receive value for the most impactful security problems first (ex: Log4J) and the total surface area of problems the product solves for second.


I would second this. No security person says "I don't have enough problems to look into."

Security spending is down, so navel gazing products are going to be a really hard sell. Figure out how to actually solve problems in an automated/semi-automated way and ship that instead.

The other issue with all of these tools is handling onboarding/integrations and getting terrible visibility as a result. A big market gap I see is a tool that can use the vulnerabilities it discovers to further information collection just like a real attacker would. Found Splunk creds in a log? Awesome, start using them. Syslog in an S3 bucket... boom. You are now hitting the stuff that every other ASM/visualization tool has missed.


Makes sense -- we're focused on fixing problems over just being yet another Jira ticket generator.

> Found Splunk creds in a log? Awesome, start using them. Syslog in an S3 bucket... boom. You are now hitting the stuff that every other ASM/visualization tool has missed.

This is my dream :). This past weekend I was playing around with something where if I clicked on a SecretsManagerSecret node then it'd give me the CLI commands to assume the roles and then retrieve the secret. It'd be neat to take it a step further and be able to click here and get a shell -- I don't think we're _that_ far off from that (but for now to be very clear we're focusing on read-only actions only since a security tool with permissions to do scary things in your environment kinda defeats the purpose).
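
A minimal sketch of that "click a node, get the commands" idea might look like the following. The function name, the example ARN, and the secret name are hypothetical; `aws sts assume-role` and `aws secretsmanager get-secret-value` are the standard AWS CLI subcommands. The code only builds strings and never runs anything.

```python
import shlex


def commands_for_secret(role_arn, secret_id, session_name="graph-review"):
    """Build (but do not execute) the CLI commands to reach a secret via a role."""
    return [
        # Step 1: obtain temporary credentials for the role.
        f"aws sts assume-role --role-arn {shlex.quote(role_arn)} "
        f"--role-session-name {shlex.quote(session_name)}",
        # Step 2: read the secret. This assumes the temporary credentials
        # from step 1 have been exported into the shell environment.
        f"aws secretsmanager get-secret-value --secret-id {shlex.quote(secret_id)}",
    ]


cmds = commands_for_secret(
    "arn:aws:iam::123456789012:role/app-role",  # hypothetical role
    "prod/stripe/api-key",                      # hypothetical secret name
)
```

Keeping this as generated text keeps the tool read-only: a human decides whether to paste the commands into a terminal.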


Thank you, this is very helpful, especially given your experience in the space. I intended to frame this like "there are many tools that let a security team pull in data from the cloud providers and detect misconfigurations, but this becomes so much more useful when they're able to contextualize it against their internal data". If I'm responding to log4j, I want to know all of the services that are running that affected library, which ones are open to the internet, and who in the organization owns them. That last part is key for actionability.
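
To sketch what that log4j question looks like once cloud data and internal ownership data sit side by side: the inventory below is hypothetical, the field names are illustrative rather than a real schema, and the affected-version set is a stand-in for a proper advisory list.

```python
# Hypothetical inventory; in practice this would come from the graph.
services = [
    {"name": "checkout", "libraries": {"log4j-core": "2.14.1"},
     "internet_facing": True, "owner": "payments-team"},
    {"name": "reports", "libraries": {"log4j-core": "2.17.1"},
     "internet_facing": False, "owner": "data-team"},
    {"name": "search", "libraries": {"log4j-core": "2.14.1"},
     "internet_facing": False, "owner": "search-team"},
    {"name": "landing", "libraries": {},
     "internet_facing": True, "owner": "web-team"},
]

AFFECTED = {"2.14.1"}  # assumption: stand-in for the real affected-version list


def triage(services, library, affected_versions):
    """Return (name, internet_facing, owner) for affected services, exposed ones first."""
    hits = [s for s in services if s["libraries"].get(library) in affected_versions]
    # False sorts before True, so internet-facing services come first.
    hits.sort(key=lambda s: not s["internet_facing"])
    return [(s["name"], s["internet_facing"], s["owner"]) for s in hits]


result = triage(services, "log4j-core", AFFECTED)
# result is [("checkout", True, "payments-team"), ("search", False, "search-team")]
```

The owner column is what turns the finding into something actionable: it says who to page, not just what is broken.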


I was watching a competitor(?) of yours a few years ago who were trying to integrate https://github.com/WithSecureLabs/IAMSpy#iamspy with Cartography to have more insight into what, actually, the IAM Roles could do

Do you have similar plans or are those kinds of things left as an "exercise to the reader" via your Intel Plugins link? I do see https://cartography-cncf.github.io/cartography/modules/aws/s... but I also see https://github.com/cartography-cncf/cartography/blob/0.100.0... so it's hard to know what level of insight one wishes to support out of the box versus the localstack model of "open core, advanced features are $$$" type deal


> have more insight into what, actually, the IAM Roles could do

We 100% do this, see https://eng.lyft.com/iam-whatever-you-say-iam-febce59d1e3b.

We evaluate the policies for the IAM principal against the resources to determine what actions they can perform on each resource. This is configurable too; here's the set of the default permission relationships shipped in OSS: https://github.com/cartography-cncf/cartography/blob/master/...

It doesn't cover conditions since those can be wacky complicated, and it doesn't cover resource policies (yet!) but in my experience this is still a very good heuristic that is already more accurate than AWS IAM Analyzer when I played with it.
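
For readers who want a feel for this kind of heuristic, here is a minimal sketch of identity-policy evaluation with the same limits: no conditions and no resource policies. It is not Cartography's implementation. It also ignores `NotAction`/`NotResource`, and it uses `fnmatchcase` as an approximation of IAM wildcards (fnmatch also treats `[` specially, which IAM does not).

```python
from fnmatch import fnmatchcase


def _as_list(value):
    # IAM allows either a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def _matches(statement, action, resource):
    # Action names are compared case-insensitively; resource ARNs are not.
    action_ok = any(
        fnmatchcase(action.lower(), pattern.lower())
        for pattern in _as_list(statement["Action"])
    )
    resource_ok = any(
        fnmatchcase(resource, pattern)
        for pattern in _as_list(statement["Resource"])
    )
    return action_ok and resource_ok


def is_allowed(statements, action, resource):
    """Explicit Deny wins; otherwise at least one matching Allow is required."""
    matching = [s for s in statements if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)


# Hypothetical statements attached to one principal.
statements = [
    {"Effect": "Allow", "Action": "s3:Get*", "Resource": "arn:aws:s3:::app-data/*"},
    {"Effect": "Deny", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::app-data/secret/*"},
]

can_read_report = is_allowed(statements, "s3:GetObject", "arn:aws:s3:::app-data/report.csv")
can_read_secret = is_allowed(statements, "s3:GetObject", "arn:aws:s3:::app-data/secret/key.pem")
```

Here `can_read_report` is `True` and `can_read_secret` is `False`, because the explicit Deny overrides the broader Allow.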

The next step we're working on is to take this access map and correlate it with event data to see which permissions are used/unused so that we can prune them for ensuring least privilege. More to come here.
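
The core of that correlation can be sketched as a set difference between what is granted and what has been observed. The access map and event records below are hypothetical, and a real version would also need to decide how long a look-back window counts as "unused".

```python
# Hypothetical access map: principal -> set of (action, resource) grants.
granted = {
    "app-role": {
        ("s3:GetObject", "bucket-a"),
        ("s3:PutObject", "bucket-a"),
        ("sqs:SendMessage", "queue-1"),
    },
}

# Hypothetical event records, e.g. as might be distilled from audit logs.
events = [
    {"principal": "app-role", "action": "s3:GetObject", "resource": "bucket-a"},
    {"principal": "app-role", "action": "s3:GetObject", "resource": "bucket-a"},
    {"principal": "app-role", "action": "sqs:SendMessage", "resource": "queue-1"},
]


def unused_permissions(granted, events):
    """Return, per principal, the grants that never appear in the events."""
    used = {}
    for e in events:
        used.setdefault(e["principal"], set()).add((e["action"], e["resource"]))
    return {p: grants - used.get(p, set()) for p, grants in granted.items()}


unused = unused_permissions(granted, events)
# unused is {"app-role": {("s3:PutObject", "bucket-a")}}
```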

Edit: adding on for the part of your question about what features are paid or OSS, our paid offering is fully hosted and includes things like automatic suggested fixes, a natural language interface, customization with our dynamic schemas, and other bells and whistles. I'm not a fan of doing things like premium modules because I don't want to ever get in the position where I'm declining a pull request in open source because it covers a premium feature; that doesn't feel right.


Yes, that's why I linked to what I did and mentioned IAMSpy because the devil's in the details, especially with things like AWS SSO and OIDC providers, because those represent a whole class of principals that _could_ get into the Role but only a finite number of them that actually do, barring misconfiguration[1]

I think it would probably be unreasonable to say "IAM Conditions when?" in a Launch HN if one had to build those things from scratch. That would be ferociously hard and not a sane ask right out of the gate. But since IAMSpy already exists, and according to you there's some non-trivial amount of IAM evaluation already in Cartography, then what I'm asking is whether you envision your future as one of ("eh, it's good enough", "we're integrating more libraries that attempt to formalize IAM", or "we'll roll our own policy engine in python, how hard could it be")

Further illustrating my point, you linked to a .yaml file with "s3:GetObject" seemingly applied to an S3Bucket saying "can read" but that's for sure not systemically true for a monster list of reasons. I get the impression that Wiz makes their bread and butter on helping people understand when they actually have open S3 buckets and not just giving them a report full of false positives

I do appreciate this can come across as busting your chops, but I don't mean to shit on you, or your product, or your launch. I'm just pointing out that if you put "You can think of us as an open-core Wiz alternative" in the 2nd sentence of your announcement, there is a massive opportunity for expectations being out of alignment unless you have a plan to get from where you are to Industrial Grade Introspection. The other side of that coin is that if you do have the background for it, as your pseudo-resume implied, then it's a massive opportunity to give them a run for their $5 billion, too

1: and it's the misconfiguration that I would want a reasonable tool to chirp about, not "omfg token.actions.githubusercontent.com can get into your Role!"
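
A minimal sketch of the distinction in that footnote: flag a role only when its trust policy lets GitHub's OIDC provider in with no `sub` condition at all, rather than whenever the provider appears. The function names and example policies are hypothetical; the `token.actions.githubusercontent.com:sub` condition key is the one GitHub documents for AWS. The check does not judge whether a `sub` pattern that is present is too broad.

```python
GITHUB_OIDC = "token.actions.githubusercontent.com"


def _has_sub_condition(statement):
    # Condition keys are compared case-insensitively.
    key = f"{GITHUB_OIDC}:sub"
    for operands in statement.get("Condition", {}).values():
        if any(k.lower() == key for k in operands):
            return True
    return False


def unrestricted_github_trust(trust_policy):
    """Return Allow statements trusting GitHub OIDC without any `sub` condition."""
    flagged = []
    for stmt in trust_policy.get("Statement", []):
        principal = stmt.get("Principal")
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        federated = principal.get("Federated", [])
        if isinstance(federated, str):
            federated = [federated]
        trusts_github = any(f.endswith("oidc-provider/" + GITHUB_OIDC) for f in federated)
        if trusts_github and not _has_sub_condition(stmt):
            flagged.append(stmt)
    return flagged


provider_arn = "arn:aws:iam::123456789012:oidc-provider/" + GITHUB_OIDC  # hypothetical account

scoped = {"Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": provider_arn},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringLike": {GITHUB_OIDC + ":sub": "repo:example-org/example-repo:*"}},
}]}

wide_open = {"Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": provider_arn},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringEquals": {GITHUB_OIDC + ":aud": "sts.amazonaws.com"}},
}]}
```

With these inputs, `unrestricted_github_trust(scoped)` returns an empty list, while `unrestricted_github_trust(wide_open)` returns the one statement that any GitHub repository could use.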


Not at all, super appreciate the discussion and your detailed read!

> what I'm asking is whether you envision your future as one of ("eh, it's good enough", "we're integrating more libraries that attempt to formalize IAM", or "we'll roll our own policy engine in python, how hard could it be")

It's a combination of 2 and 3. Today we use the policyuniverse library for things like s3 bucket-policies, and we have that self-rolled policy engine described in that blog post I shared. I should've mentioned earlier that this is the first time I've seen IAMSpy, thanks for sharing.

I think we're currently pretty good at permissions evaluation since that feature gives lots of value as it is, but there is a lot more to do. Continuing to improve this and being able to connect that with other data is a priority since it's one of our main value propositions.

> there is a massive opportunity for expectations being out of alignment unless you have a plan to get from where you are to Industrial Grade Introspection.

Industrial Grade Introspection is absolutely the plan. I'll also add that we're especially interested in highlighting cases that involve permissions that go between providers - like Okta->AWS, Opal->AWS, etc - and we intend to be very competitive here.


Looked at your video demo, does SubImage actually recommend changes and generate terraform? For example, instead of exposing 80/443 on the EC2 instance, deploy an ELB in front of it that listens on 80/443 publicly and only allow the ELB to forward traffic to the EC2 instance. Also, attach a role to the EC2 instance to avoid storing AWS credentials in environment variables, though if the instance were compromised an attacker could still access the S3 bucket.


> does SubImage actually recommend changes and generate terraform?

We recommend changes, though we don't generate Terraform just yet. Great feedback on the specific fixes for this case, thanks.


Given that this is a paid product, are you liable if the chatbot misrepresents the data?

Website (on Firefox) nitpicks:

- The handle_complexity.png image is too small to read and can't be zoomed unless opened in another tab.

- The background effect is in the foreground of chatbot_cropped_gif.gif

- The yaml schema text should have a background like the rest of the text boxes


> Given that this is a paid product, are you liable if the chatbot misrepresents the data?

Good question. Right now the chatbot is in preview and we're currently figuring that out. That said, we do have it provide the underlying query that it used to answer the question so a user can double check with that.

Thanks for the comments on the landing page -- we're security nerds and definitely not great at frontend haha, will fix!


Wow this library has a lot of history being developed at Lyft! Have you seen a good response to the paid offering? I suppose all the OSS users self hosting will switch over!


This is cool, and really makes sense for large organizations. Do you foresee a release for smaller enterprises (something as simple as a lightweight aws integration?)


Depends on how small. Most seed stage and early companies are more worried about product-market fit than security. For now, we're probably the best fit for companies building out their first security teams, so that's _usually_ series A and later. That said, there might be something there, I'm open to figuring something out for a smaller company.


Could definitely see us releasing some form of this for smaller companies as well, it's crazy how many vendors and how much infra even these 2 month old startups have


Actionability >>> observability

If you can pull this off, you will have a great time


Agreed


Looks very cool! Wiz is a beast at the moment so I will be watching closely to see if you (or anyone else really) will be able to go up against them


Congratulations on the launch! Can you please provide some details on your business model?


Thank you! Our business model is B2B software as a service. We're offering a fully-hosted offering around Cartography where we add useful features that enterprises want like automatic fix actions, recommendations, a natural language interface, and others.


How about your go to market and pricing?


We're looking for customers who find value in Cartography but don't have the resources to self-host. Open source is big in helping us meet people and learn the needs of teams who would be interested in commercial support. For pricing, it varies because we can take on very different infra requirements depending on the size of the customer's environment or their data freshness needs.


How come things like this are not built into most cloud providers?


AWS has Config to give you an inventory, but that only covers AWS. My guess is that there's not much incentive for the major cloud providers to build a product to help you correlate data across other products.


Congrats on the launch!


Hi, interesting goal that you have in mind.

Working in a huge enterprise, I see a clear benefit for this kind of product, as we are really struggling to keep track.

I understand that you are very early in bootstrapping, but what I was missing while skimming over the videos and links and webpage is a better bird's-eye view or contextualization of the approach.

I was considering a demo, but it was a bit unclear to me what the two options (chat and quick chat) would achieve / how they are structured.

Again, I have full understanding that you are still working on this. Good luck with this project.


> I understand that you are very early in bootstrapping, but what I was missing while skimming over the videos and links and webpage is a better bird's-eye view or contextualization of the approach.

For a higher level view and contextualization, can you share more on what you mean? This would help give us a better idea on what to build.

> I was considering a demo, but it was a bit unclear to me what the two options (chat and quick chat) would achieve / how they are structured.

Ah, you're referring to our cal link (https://cal.com/team/subimage)? It's basically up to you -- we can show you something in 15 minutes or 30 minutes based on your availability and on what you're interested in -- would love to hear your feedback on the call!


absolutely awesome -- huge need


Looks great. Sent you a DM.


Congrats on the launch!



