I took a quick look at this. Is SigNoz primarily for server observability? One thing I’ve used Sentry for a lot in the past is client-side logging – client-side JavaScript, native iOS, and native Android. SigNoz doesn’t do that as far as I can see?
Important bit:
Updated Sentry’s data usage rights to include the right to use Service Data for AI model training and other product development efforts.
Effective on February 3, 2024; so essentially less than 30 days to find a new provider.
Country doesn't have anything to do with it.
Maybe think about self-hosting something if you are concerned about those things. Something like SigNoz: https://github.com/SigNoz/signoz
If you're migrating from Sentry to self-hosting, you might as well just self-host Sentry itself, no? I've never tried it myself, but it seems relatively straightforward: https://develop.sentry.dev/self-hosted/
We are solely processing data that is in scope of the GDPR on behalf of our customer in fulfillment of the contract. If you submit a GDPR deletion request, we also ensure that the data is deleted within the given period.
So if one of my error messages leaks sensitive data to Sentry, and therefore gets used for AI training, are you going to delete your entire AI model upon request?
You know that technically that would virtually be the only guaranteed way to comply with such requests, right?
Sure, but if you unilaterally change the terms of that contract and don't re-consent for the expanded use, the old lawful basis doesn't really apply, now does it? I mean, whatever, the ICO will explain it to you soon enough with big legal words. Good luck!
I’m JD, co-founder of Raygun (https://raygun.com). We’re a paid alternative in crash report logging and user performance metrics. We’ve been operating for years, not reliant on VCs, and are HQ’d in New Zealand.
I’ve met with many customers over recent months who have told me quite plainly that they have been told none of their data is to go into any AI/LLM system. This is often coming from legal and many understand this may not apply forever, but for now it’s a hard no. While many of these were large organizations, the message was almost universal.
To that end, at Raygun we have always run the business as though all data we receive is the customers’ data, treating it how I’d want our own data treated by one of our suppliers. As we look to release our own AI capabilities this year, we’ve been making it clear that these will be opt-in features. We take the trust placed in us with customer data very seriously. We’re already compliant with standards like GDPR, and tracking what’s coming around AI regulation is something I think we’re all paying close attention to as well.
Happy to chat with anyone looking, can match budgets: jdtrask@raygun.com
I'm also happy to answer specific questions about the TOS change or what we're intending on using.
As for the opt-out: given the scope of how we are using usage and service data it did not seem all that obvious an opt-out was needed. We did receive the feedback however and will follow up here.
Have you considered your target audience when writing or before sending that email about that TOS change?
What really peeved me was that bit:
> If you do not want to be governed by the updated TOS, be sure to close your account before the updated TOS applies to you.
My immediate reaction was to look for alternatives to Sentry and/or to self-host for the time being. Judging by the comments here I'm not the only one. Instead of this cold, matter of fact kind of tone, why not explain the reason for wanting to collect data, give an example, etc.
Why not sell this as a win instead of telling us to fuck off if we don't like it?
> As for the opt-out: given the scope of how we are using usage and service data it did not seem all that obvious an opt-out was needed. We did receive the feedback however and will follow up here.
A good rule of thumb for vacuuming customer data in enterprise environments is: everything needs an opt-out (I'd even say that almost every instance of it should be opt-in).
You can't assume that all of your customers are ok to provide their data to a black box owned by you, sometimes that might even create regulatory issues they need to be aware of. Even if historically your ToS restricts how you can use said data, it creates a risk that one day you might not be so restrictive, we are all very aware of companies slicing the salami to further encroach on customers' data.
It sounds really strange that a company serving enterprise customers is not thinking in those terms.
> it did not seem all that obvious an opt-out was needed
That doesn’t make it seem like the people making decisions at Sentry care about their customers. I left Sentry for other reasons, but this just reinforces my decision to leave.
Not super surprising that there are a bunch of negative comments here, but as a customer I appreciate the candor and especially the note that most competitors already have similar terms.
Why can't companies be honest that they want to gather data so they can get a higher valuation, and in some cases sell that data for profit, instead of making such shameless statements?
Correct. No such data is disclosed in any way other than to operate the service.
We might use that data to rank issues by severity within sentry.io, but this data is not disclosed to anyone.
True until they're hacked, or they're bought by some firm, or the management changes, or someone sneezes and makes a commit by mistake, or they just straight up change their mind.
After you pass in a tragic accident involving an escaped cyborg moose later this year, your fellows decide to leave for a new startup combining building castles in marshes with a singing school, and the company is bought by a private equity firm, will they too not disclose data and not sell data? Joking aside, we're now decades into the digital revolution, and anyone who has been around a while has seen this film before. A lot of times. Founders and employees can be absolutely 100% pure in their intentions and genuinely stick to that, no cynicism. But they're human, like the rest of us. The flow of time is inexorable, stored data doesn't spontaneously evaporate, and meanwhile the average age of companies, even established S&P 500 ones, not "startups", well [0,1]:
>"A recent study by McKinsey found that the average life-span of companies listed in Standard & Poor’s 500 was 61 years in 1958. Today, it is less than 18 years."
In 2003 we were all a lot younger; in the 90s, lots of data really was just thrown out, even objectively valuable stuff like final code/assets for media productions such as hit videogames, and some failure to predict or naivete made sense. Here in 2023, though, the default assumption is that any data you gather you will keep forever, and that you as an entity will not be deciding what happens with it forever. That's something you need to address head-on, for real internally as well. "We do not disclose data and we do not sell data" simply doesn't cut it anymore, any more than saying your values are "Don't Be Evil" would. Come on.
Our terms of service are historically very restrictive. Unlike many others in the space they do not permit us to use event data for analysis or improvements of the service.
For instance, SigNoz was mentioned in the thread here; their terms of service already contain a similar clause. I'm not sure how many companies offer opt-outs for that.
Clauses in terms of service allowing event and usage data to be used for product improvements, without an opt-out, are quite commonplace.
Here is the equivalent clause from Datadog:
> In order to provide and support the Service for the benefit of Customer, Customer hereby grants Datadog a worldwide, non-exclusive, royalty-free license […] Customer agrees that, so long as no Customer Confidential Information is publicly disclosed, Datadog may: (i) use Customer Data to refine, supplement or test Datadog’s product and service offerings; (ii) include aggregated and anonymized Customer Data in any publicly available reports, analyses and promotional materials; and (iii) retain anonymized, non-attributable Customer Data following any termination of this Agreement for use in connection with the foregoing.
My personal prediction is that part of the ongoing [gen] AI Revolution will see a massive return of self-hosting at co-located data centers (if not also @HQ).
I was a union electrician in such data centers back enough years ago that this was the then-popular method [owning your own hardware]. IMHO, good-riddance to The Everything Cloud™. Proprietary data should belong to their companies/people, only.
EDIT: I can't respond [anti-flame measure; this is genuine discussion]; my follow-up to your comment (below) is: DO THE TERMS YOU CITED ABOVE APPLY ON SELF-HOSTED INSTANCES? If I blackhole your DNS entry-points, will the service remain functional "off-site"?
> DO THE TERMS YOU CITED ABOVE APPLY ON SELF-HOSTED INSTANCES? If I blackhole your DNS entry-points, will the service remain functional "off-site"?
The only communication that a self-hosted Sentry has with sentry.io is the beacon, which is the first item in the self-hosted docs, including how to turn it on or off. It does not transmit any of your data other than the admin's email address for security updates, which can also be turned off. You can also look at the code to validate that it doesn't do anything dodgy. No need to firewall it off.
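If I remember the self-hosted docs right, turning it off is a single setting in the install's Python config (double-check the docs for the exact name on your version):

```python
# sentry/sentry.conf.py on a self-hosted install
# (setting name per the self-hosted docs; verify against your version)
SENTRY_BEACON = False  # stop the periodic beacon ping back to sentry.io
```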
I wonder if this is just them aligning themselves with the new EU AI Act at the same time that they are rolling out an EU region[1]. From my understanding, that act, soon to take effect, makes it a requirement to explicitly explain in your ToS the use cases for AI in your use of data. Before this law you didn't really have to say if you used AI.
Lots of negative comments here. No opt-out sucks but afaics the AI they’ll train will be purely useful to their customers. “You received exception XYZ; check these issues which might be the cause.”
That’s quite a cool assistant to have and very much on the safe side. I can always choose to ignore the AI’s advice.
Yes, you are correct. And even more specifically, it's a legal issue. The old ToS guaranteed to enforce data separation and now it does not. It's not about the technology, it's about loosening constraints on user-generated data. Do you think I care whether they expose my data through an LLM or a heuristics-based recommendation engine? This is not cake pictures from Instagram; it has legal ramifications.
I will get a hell of a lot of downvotes for this, but why do logs matter? If you’re logging personal or sensitive data, then you should reconsider your logging instead of changing providers.
That said, I think there should be legislation about companies changing their ToS whenever they want without consequences.
It's not just simple logs at the end of the day, so there are two things to be aware of:
1. Logs will, no matter how well you do your job, contain something sensitive. Typically this ends up being customer data (PII). PII is the largest concern for most customers.
2. IP is _very_ present in logs, and especially in the types of data Sentry gathers (e.g. stacktraces). While not everyone is hyper sensitive to their code being on other systems, some companies treat it much more seriously. One particular example is the gaming industry, where projects are kept tightly under wraps and often take a decade+ to complete.
Regarding your note on the ToS change: ours (Sentry's), and every other corporate ToS I've ever looked at, has always included some kind of clause allowing a change within a time window without updated consent. As an implementer I understand the need for this (think about maintaining old versions of software, same paradigm), but as a customer I understand your sentiment as well.
That said, many customers end up on annual terms of service with slight variants, particularly larger customers. This is a typical negotiation process as many of the needs/demands of organizations vary slightly.
I just wanted to say that we've heard your feedback and are going to follow up. I'm not going to say we'll make any changes to the ToS, but there is at least a huge lack of clarity and clearly some triggering language in a number of things.
There are many things we want to be able to do - much of it completely unrelated to the current generation of LLMs - that we simply cannot do under our current set of terms. A good example of something that you might not care about, but that is not explicitly clear in our terms: can we show how your production average latency compares to other applications? What about other web services? Other Python web services?
I think we can all agree we missed the mark on comms here, and we may need to clarify and/or re-think part of the data use strategy. We certainly are not intending to funnel customer IP or PII into a random chatbot, and at the end of the day trust is involved in these things as we cannot explicitly label every feature we have built or plan to build within our ToS.
While I don't think HN often represents our customer base 1:1, I do know our customer base also lurks here, and data security is and always has been of utmost importance to us. That isn't changing, and there's no conspiracy with investors or anything else going on.
I understand that you don't want to be in the business of sharing your customer's IP, but it's hard to not see using your customer's internal diagnostic data for training as exactly that.
A universal opt-out of sharing data for training would go a long way in assuring me that I'm not about to find out that another customer is getting a snippet of quasi-private information (not sensitive enough to be PII, but not public enough to share). I can imagine that this'll be fine for a lot of customers but it makes me very uneasy.
Heck, you could probably incentivize some users into sharing their data by gating "AI" features behind not opting out.
Agreed! There are some things we think customers won't care about that we'd like to leverage in general (latency benchmarking, for example), but not everything is like that.
Our current thought process is to support two additional consent mechanisms:
1) An opt-out for sharing data which has no risk of exposure to customers (e.g. think about our fingerprinting algo - we can train better heuristics for that without giving away any of your data; there's a rough sketch of the idea below).
2) An opt-in mechanism for sharing data, in exchange for access to the functionality, for anything that goes beyond the above. This is less well defined, and internally would be our default stance.
Our big TODO at this point is to clearly define those, build the consent mechanisms, and determine what, if any, updates we need/want to make to the ToS before enforcing it.
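To make "fingerprinting" concrete for readers who haven't dug into it: the rough idea is to collapse a stack trace into a stable grouping key so repeated occurrences of the same error land on one issue. A naive illustrative sketch (not Sentry's actual algorithm; field names invented):

```python
import hashlib

def fingerprint(exc_type: str, frames: list[dict]) -> str:
    """Reduce an exception to a stable grouping key.

    `frames` is assumed to be a list of dicts with 'module' and 'function'
    keys; line numbers are deliberately left out so small code shifts
    don't split one issue into many.
    """
    parts = [exc_type]
    for frame in frames:
        parts.append(f"{frame.get('module', '?')}.{frame.get('function', '?')}")
    return hashlib.sha1("|".join(parts).encode()).hexdigest()

# Two events with the same call path but different line numbers
# collapse to the same fingerprint:
frames = [{"module": "app.views", "function": "checkout"},
          {"module": "app.billing", "function": "charge_card"}]
print(fingerprint("KeyError", frames))
```

The hard part is choosing the normalization so errors group the way a human would, which is where aggregate data helps.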
A) This raises huge issues with IP and HIPAA, unless you add specific limitations of one model per customer with zero shared information.
B) I do not know what kind of product discovery process led you to think that "industry benchmarks" are what your customers want, but they're not. As an analytics startup owner I can tell you these are the kind of low-effort "features" people tell themselves their customers want, and it never holds up when you actually perform discovery.
C) Is this maybe something you want for your next VC round? Do you need to shove a proprietary LLM model in to expand the valuation? I hope you do not crash and burn doing this.
D) When I started my current startup I went with Sentry without even thinking about it, as it was what I used at my last company and it just worked. I'm sure as hell checking out Datadog and the alternatives as we speak. We are currently on a monthly plan; you're telling me I have until Jan 26 to drop the service?
B) These aren't fluff features we're building, and one of the most important features we need aggregate data for is core to our business (fingerprinting). The over-rotation on generative AI in this topic is the problem - that's not a focus for us.
C) This has nothing to do with investors, and Sentry's well capitalized both from prior rounds as well as cashflow. This is simply us needing to be able to develop stronger features that have moved beyond what we can easily capture in hand-to-hand development.
> can we show how your production average latency compares to other applications?
This is pretty standard: many companies explicitly add contract language to NOT be included in these benchmarks. The reasons should be obvious: offering these benchmarks on increasingly narrow segments walks right up to betraying competitive information.
source: I work at another observability company where we DON'T share our customers' information with each other
I can imagine there lots of practical ways they can use AI - guessing the cause & solution of issues (this seems to be an off by one error, or you might want to check for null values on this line), automatically finding & linking you to related code for new errors (this failing connection was probably created over here), and detecting which errors are important and which are noise that you can ignore (the biggest problem with Sentry in my experience). In each case, training on their existing data will help a lot with this.
All of that would be future product development, but I suspect this addition is for exactly that reason: so they can begin to prepare the ground for later development, not because they're going to immediately roll this out tomorrow.
They want the data for AI, but not just for their products. They want it as a business asset. It helps boost their valuation and makes them attractive to potential acquirers, but they can also sell de-identified data outside the company per their own terms. That just means they won't sell anything that individually identifies you; de-identified data aggregated across at least two customers is fair game.
See the terms:
> Sentry may use Usage Data and Service Data for analytics and product development (including to train or improve AI Features and generate Outputs). Sentry will not disclose Usage Data or Outputs externally, unless they have been de-identified so that they do not individually identify Customer, its Users or any other person and are aggregated with data across Sentry’s other customers.
That is NOT what their own terms and conditions, which I’m quoting in the comment you are replying to actually says:
> Sentry will not disclose Usage Data or Outputs externally, unless they have been de-identified so that they do not individually identify Customer, its Users or any other person and are aggregated with data across Sentry’s other customers.
You can write whatever you want in a blog post. It’s the legal terms and what’s in the contract that matters and presumably shows intent.
In this case they are clearly planning to show or sell de-identified data as it’s in their terms and conditions explicitly. What is being said in the blog and what the company is giving themselves a license to do are not the same.
If I were running product at Sentry I'd be working on an agent that opened pull requests to fix your Sentry errors. This would work today with GPT-4 on a nontrivial number of errors/codebases.
This would definitely not work for most bugs, simply because context sizes are too short and the models are too dumb.
If you have a "can't access key 'foo' of undefined" error, adding a check for whether the object is undefined is not a fix. It's simply silencing the error.
The bug might be way, way down in the stack, maybe even in the backend, or somewhere else in the client that made the backend store invalid data by mistake.
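To make that concrete, here's a toy Python analogue (names invented for illustration): the guard an automated fix would likely add makes the exception disappear, while the real defect, the upstream code that stored the bad data, stays untouched.

```python
# Hypothetical example: the "fix" just silences the symptom.
def render_invoice(order: dict) -> str:
    customer = order.get("customer")
    if customer is None:           # guard added to stop the crash...
        return "Unknown customer"  # ...but the invalid order still exists
    return f"Invoice for {customer['name']}"

# The actual bug lives upstream, where the invalid record was written.
def save_order(db: dict, order_id: str, order: dict) -> None:
    if "customer" not in order:
        raise ValueError(f"order {order_id} has no customer attached")
    db[order_id] = order
```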
I highly doubt that the current breed of models are capable of this.
Just what I need, pull requests with a bunch of AI generated drivel. I really don't want to spend my days figuring out if the PR that just got assigned to me contains an actual issue or some AI hallucination.
Today one of the largest challenges with Sentry at scale is that every error looks the same. If statements only get you so far. One of the ways in which we want to improve this is to use issue interaction and the event's content to predict the severity of an issue.
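As an illustration of what that could look like (purely my own sketch, not Sentry's implementation; the feature names are invented): train a simple classifier on per-issue signals such as users affected and how humans interacted with the issue, then use it to rank new issues.

```python
# Illustrative sketch only: rank issues by predicted severity using a
# plain logistic regression over invented per-issue features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per issue: [users_affected, events_per_hour, was_ignored_by_team]
X = np.array([
    [500, 120, 0],
    [3,     1, 1],
    [40,   15, 0],
    [1,     2, 1],
])
# Label: did a human ultimately treat this issue as high severity?
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[200, 60, 0]])[0, 1])  # predicted P(high severity)
```

In practice you would want far richer features and labels, but the shape of the problem, supervised ranking on aggregate interaction data, is the same.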
The second case is a version of suggested fixes that uses your code to propose actual fixes to your code. In this case the data never crosses your organization's boundary and is private to you.
At the moment they have a feature where you can click on a button on a crash report and it will use generative AI to explain the problem / suggest a fix. Sometimes it works, sometimes it doesn’t. They also have features tied into version control where you can tag a commit to close a crash report etc. So I would expect them to be able to train on the data this gives them to suggest more accurate fixes down to the source code level.
Looks like the AI pied pipers and the VCs have pressured Sentry to add this useless change that no-one asked for to make themselves look like an AI company because of the hype.
If anyone is looking for self-hosted alternatives, they should try SigNoz: https://github.com/SigNoz/signoz