Dear Search Guard Users (elastic.co)
136 points by praseodym 16 days ago | 64 comments

This is the commit containing the code that Elastic says was copied by Search Guard:


In particular, the change to getLiveDocs(): https://github.com/floragunncom/search-guard-enterprise-modu...

and numDocs(): https://github.com/floragunncom/search-guard-enterprise-modu...


At least in the case of numDocs(), both Elastic and Search Guard's implementations seem to be based on this bit of Apache-licensed code from Lucene (or perhaps this is just a common pattern for counting live documents?):
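For readers unfamiliar with the pattern under discussion, here is a minimal, language-neutral sketch of counting live documents. It is not the Lucene code, nor Elastic's or Search Guard's; the names `num_docs`, `max_doc` and `live_docs` are hypothetical. The only thing borrowed from Lucene is the convention that a missing live-docs bitset means nothing has been deleted.

```python
from typing import Optional, Sequence


def num_docs(max_doc: int, live_docs: Optional[Sequence[bool]]) -> int:
    """Count the documents that are not marked as deleted.

    live_docs[i] is True when document i is live. A value of None is
    taken to mean "no deletions", so every document counts.
    """
    if live_docs is None:
        return max_doc
    # Walk every document id and count the ones still flagged live.
    return sum(1 for doc_id in range(max_doc) if live_docs[doc_id])
```

With so few ways to write this loop, independently written versions would be expected to look alike, which is the point being raised here.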


These don't appear to be earth-shattering changes. In fact, they look pretty boilerplate.

As someone who maintains an Elasticsearch plugin, you often have to do A LOT of research into how Elastic themselves have dealt with their undocumented, constantly evolving APIs to figure out how to adapt your own code. Stuff like this happens all the time: some boilerplate code changes, and you copy the way Elastic themselves did it. Given that the code is now a mix of Elastic-licensed and Apache-licensed code, you can create a host of legal liability by accidentally taking the wrong piece of boilerplate.

Potential legal liability on top of the headaches of maintaining an open source project in my spare time with almost no support. Oh joy.

I am not sure why every comment critical of Elastic's move has been consistently downvoted until it is greyed out.

This has happened with my comments on other posts. The content was that instead of using the license-encumbered ELK product, people can use Solr, Vespa or Lucene, which are truly open source. It was downvoted enough to move it to the bottom or grey it out, making it difficult to read.

Solr, Vespa, and Lucene only replace the Elasticsearch part of ELK. Fluentd is an obvious replacement for Logstash, but I'm not sure what a good open source alternative to Kibana is that is compatible with Solr, Vespa, or Lucene.

Lucene is a low level library which is internally used by ElasticSearch and Solr. Vespa is built from the ground up - it does not use Lucene.

Grafana comes to mind as a Kibana replacement.

My understanding is that Grafana is good at visualization, not searching for specific events.

This is changing as Grafana evolves to move into logs in addition to metrics.

Interesting, however it doesn't seem like it is a full replacement, since it doesn't do full text indexing:

> Loki is meant to be complementary to existing solutions like Elasticsearch and Splunk that do full text indexing.

Unfortunately, since June 2018, we have witnessed significant intermingling of proprietary code into the code base. While an Apache 2.0 licensed download is still available, there is an extreme lack of clarity as to what customers who care about open source are getting and what they can depend on. For example, neither release notes nor documentation make it clear what is open source and what is proprietary. Enterprise developers may inadvertently apply a fix or enhancement to the proprietary source code. (From Amazon's Open Distro announcement.)

That's a nice way to spread FUD, when the complaint is that binary code was decompiled and copied.

>"Most of these instances of copying occurred before we opened our proprietary code last year, which means the Search Guard developers intentionally decompiled our binary releases in order to copy our code."

Hard to see how this could be anything other than deliberate infringement.

So, having read the court case (rather quickly, also thanks to 'Pyxl101 for uploading it to PACER - see https://www.courtlistener.com/recap/gov.uscourts.cand.347725...), my non-lawyer-but-software-engineer opinion is this shows vague signs of being an Oracle v. Google-ish case. Some of the examples of copied code look quite possibly like code where there's only one obvious way to implement them. The functions just use and connect some interfaces in a consistent way, and it might be the sort of thing like rangeCheck where the nature of the problem implies one correct implementation.

There is one suspicious part where there are five strings in a comment in Search Guard that are copied from five strings in X-Pack before X-Pack was open sourced. But that doesn't necessarily imply a decompiler / an attempt to copy X-Pack's implementation: they could have run a tool like 'strings' or even just monitored what APIs are called by X-Pack or something. And the strings in question appear to be API names, i.e., functional elements. I suspect it's possible that floragunn could say they legally reverse engineered X-Pack for the purpose of interoperability instead of decompiling it. On the one hand, if I were them I would stay away from X-Pack with a ten foot pole just so I wouldn't have to worry about it, but on the other hand, in my professional life I do run things like strings and strace and even gdb on closed-source binaries that we don't have a source license to, for the purpose of making them work / seeing why they're failing / etc., and I would be shocked if making use of what I learned from that counts as commercial copyright infringement.
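To make the 'strings' point concrete, here is a minimal sketch of what such a tool does: it scans raw bytes for runs of printable ASCII and involves no decompilation. The function name `printable_strings` and the sample bytes are made up for illustration; this is an approximation of the idea, not the actual `strings` utility.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical binary blob with two embedded identifiers.
blob = b"\x00\x01getLiveDocs\x00ab\x00numDocs\xff"
names = printable_strings(blob)
```

Identifiers such as API names survive compilation as plain text, so a scan like this could surface them without anyone reading the implementation.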

Remember that floragunn's goal here is to produce a plugin that works with Elasticsearch, not a competitor to Elasticsearch as a whole, so it's unsurprising that they'd end up with similarly-shaped functions. And again while I would personally stay away from the product I'm trying to clone, doing so is not legally required - see e.g. Sony v. Connectix, where it was ruled legal for Connectix to copy and even disassemble the PlayStation BIOS for the purpose of producing a compatible reimplementation.

I'd like to see floragunn's defense. I agree it seems difficult for them to have a good one, but it's not impossible, so I'm curious to hear the story in their telling. And I'm also worried about the precedent here, in that there's a world in which Elastic wins the case, morally should have won the case, but also the court phrases it in a way that makes running gdb on proprietary code illegal.

Elastic may present it this way, but it doesn't need to be. Personally I decompiled different binaries numerous times in order to understand how an implementation works, but it never occurred to me I could "copy" the code somehow - the decompiled version was usually a big mess, so I had to figure out what is essential and reimplement it myself. In many jurisdictions this is perfectly legal. The jurisdictions of the USA and EU explicitly allow decompilation for the sake of interoperability.

They clarify here that this would have required explicit decompilation of proprietary features.

I suppose Search Guard could claim that they interpreted the license to mean the code they were copying was Apache licensed and that their interpretation of the license grant is more valid than Elastic's, but that is going to be a stretch. This announcement from Amazon might look bad for Elastic, but if anything it means that Search Guard was on notice and should have taken more care.

That is concerning and raises the risk level for using that tool in some scenarios.

Search Guard site is here: https://search-guard.com/company/

My gut feeling here is that Elastic is probably right. The Search Guard team is very small; while that doesn't mean anything, it does make you wonder.

On the other hand, does anyone know if a company that small has any viable way to defend themselves against someone with deep pockets like Elastic?

> On the other hand, does anyone know if a company that small has any viable way to defend themselves against someone with deep pockets like Elastic?

The Search Guard code has been taken by Amazon and rebadged as https://github.com/opendistro-for-elasticsearch/security

So they're not really fighting Search Guard at all. They're fighting Amazon with this.

And Amazon would probably rather buy out Elastic (and it certainly has the money and the business case to do so) than lose this lawsuit by proxy - and maybe that's what Elastic is actually playing at.

Aha, that is important subtext.

Of course, ES and Amazon have been fighting a battle (of words/ideas/dislike) for a while now, since Amazon released a hosted ES, and an ES distro. So this is part of that battle.


.... did Amazon really not get someone to compare the sources against the sources of X-Pack before redistributing it? It feels like you could just hire an outside contractor to double-check it and not contaminate yourselves.

Now that I've had a look, it seems like ODES doesn't contain the particular change that floragunn had put into Search Guard that is mentioned in the filing, so at least in that particular instance that's not directly linked.

I still think however that Amazon will feel very involved.

The code is still there even if the commit isn't: https://github.com/opendistro-for-elasticsearch/security-adv...

Ah, my mistake, I had looked in the security repo, not this one.

It would have been better if they provided explicit examples of infringement.

The complaint submitted to the court presumably has more details. It would have been nice if the blog post had included a link to it directly, but they did include the case identifier.

Edit: I went ahead and dug the information up. Here are the details about the court case on PACER (requires login): https://ecf.cand.uscourts.gov/cgi-bin/iquery.pl?924893974295...

This appears to be a link to the specific complaint: https://ecf.cand.uscourts.gov/doc1/035018374190 (20 pages, ~1 MB)

Edit: I've uploaded the complaint to the RECAP archive, which I believe you can access for free as a PDF here: https://www.courtlistener.com/docket/16154366/elasticsearch-... (see document #1)

One of the claims that's easy to summarize is that Elastic alleges that floragunn made massive changes to its codebase shortly after Elastic released proprietary updates, contrary to floragunn's typical development practices, and that these changes included copies of Elastic's proprietary code:

> On June 7, 2018, just over one month after Elastic made the source code for XPack version 6.2.x publicly available under the Elastic License, floragunn made a sudden and very large change to the Search Guard code. This change comprised 244 additions and 145 deletions of code. Many of these changes involved the wholesale copying of the X-Pack code that Elastic opened little over a month before.

The complaint goes into more detail with specific alleged examples of source code copying, showing the code in Elastic's codebase and the corresponding code in floragunn's.

You could submit it to "RECAP", a site that hosts uploaded PACER documents.


Done! Thanks for the suggestion. I've edited my post to include a link to the case on RECAP.

I can see it without any sort of login. Thanks!

> Elastic is aware that there are likely third party adopters of floragunn’s infringing Search Guard product. Elastic may seek leave to amend to add those third parties as defendants following discovery from floragunn regarding their identities.

OK, given discussion elsewhere in the thread, they're definitely talking about Amazon, right?

> I was able to retrieve the complaint as a PDF document, but I'm not sure how to host it somewhere that can be easily shared.

https://mega.nz perhaps?

GitHub can host PDFs.

Here's our first response:


Jochen Kressin

Those are some bold accusations. Looking forward to hearing the response from Search Guard. I wonder if there were attempts to resolve this quietly before going to the courts (and blogs).

I was working with the first Elasticsearch versions when Shay was the only developer. At that time, I was impressed by how responsive and friendly Shay was, and overall ES had a good design compared to Katta [1], which we used in our product.

ES has been my go-to search engine since, but something fishy started to happen with elastic.co in 2018. They changed the license and started to use dark patterns for downloads and product names, and this message from Shay, where he invites Search Guard users to use 'free security features' (which aren't free at all) from elastic.co, is a low blow, not only for Search Guard devs/users, but for ES users as well. If I develop a custom plugin for ES and charge enterprise users for support, how will I know they will not come after me simply because they have a similar add-on?

As one comment noted, the allegedly copied code is very common and can be found in Lucene as well.

I'm hoping this will end well, but the elastic.co brand isn't going to be the same.

(Also, elastic.co isn't immune to taking over other work as well [2]).

[1] http://katta.sourceforge.net/

[2] https://discuss.elastic.co/t/vector-modules-going-apache-xpa...

EDIT: added link to vector module issue

My thoughts exactly. I'd even say that Elastic is acting very toxic and childish, but then again, they IPO'd; dunno who's running the show now, could be anyone for all I know.

It sure is someone who's eager for profit though, that's for sure.

From a first impression, this seems entirely reasonable.

It is unfortunate when "freemium" companies hide essential features like security behind their paywall, and I don't love Elastic for doing that, but code copying is code copying.

It seems like they heard the “critical feature as paid feature” complaint loud and clear which is why they made that part of the application freely available. While it maybe shouldn’t have taken so long to realize, many companies use a similar model that involves security features (LDAP integration, lower level controls, etc) as premium.

Security controls by accessing via API or strict network configs were also an option if you were using older open source versions. It took more work - but that’s also why it makes sense to pay for a better abstraction from the vendor.

We have analyzed the claim, and it has no merit.

Out of tens of thousands of lines of code, they're bringing just a few snippets here and there, which frankly only deal with APIs (Netty, Lucene) in a way that is simply normal to do.

Shameful FUD. Looking forward to reading the official rebuttal.

Introduce yourself please.

I am not sure about the merits of the case to comment on this directly (blog post is sparse on details), but this is certainly unfortunate for the ElasticSearch community.

I want to share Arc - https://github.com/appbaseio/arc, an API gateway for ElasticSearch with security features that we have been actively developing starting this year. It's Apache 2.0 licensed, built in Go and we use it for providing security features for all of our customers. It's not as feature-rich as X-Pack / SearchGuard today, but we're happy to accept any PRs.

It's already known that Elastic made the Elasticsearch code base proprietary. So anyone who uses their distribution without a proprietary license will likely be sued.

In a mixed proprietary and open source code base, it's very difficult to figure out which parts are proprietary and which are not. Also, given that the terms and conditions and license text are written broadly, they can be changed at will.

I will not side with Search Guard or Elastic here. Use Vespa or Solr or some other software if you want to use open source. Don't touch ELK with its encumbered license.

All of the license-encumbered code lives in a separate directory, with a clear warning that it's licensed, and builds result in license-encumbered code being an entirely separate artifact from the purely Apache 2 licensed code. Besides that, anything that goes into package managers is also fully clean Apache 2 licensed code.

Literally the only way to "mix this up" is to not read anything, ignore package managers and build from source, then somehow decide to use 'x-pack' over 'elasticsearch'.

Releasing software with license-encumbered code, using specialized marketing to vote down comments critical of the company's product, justifying a copyright-infringement lawsuit similar to Oracle's: I think users can see through it.

Let's wait for the results of this case. It will make it clear that if someone wants to use ELK, they should purchase a license or use the Amazon version (as Amazon will fight the case; a small startup can't have the budget to even defend).

For a small development firm, it's better to use a really open source product like Solr, Vespa or Lucene directly with OpenJDK, not the Oracle JDK. Looking at this, I can only say Richard Stallman is a visionary and saw this coming when he defined Free Software and the accompanying licenses.

> Whether open source or proprietary, any responsible creator must protect their work.

What does that even mean in this context? It seems to just be there to have the subtext "So don't blame us for not being open source, we would be doing something like this even if we were!"

Which isn't entirely true. If the code were released under an open source license, someone else could copy it so long as they respected the license. So you wouldn't be suing someone for copying your code; you might for violating your license. Hard to say if the alleged infringer would have been happy to copy the code with an open source license, who knows.

Not saying it's "okay" to copy proprietary code (for varying definitions of okay), just challenging the implication that "this has nothing to do with it being open source or not" -- it surely does. And I can't see any point of that otherwise non sequitur statement except that implication.

If you do not enforce your copyright, any case in the future will look weak. Elastic has to sue now that they have discovered infringement or the next guy will say "they don't care that much, they let search guard get away with it".

It's just that what you describe has got very little to do with how open source works. So why are they mentioning open source?

(As an aside, I'm also not sure that's as much of an issue in enforcing copyright as it is in enforcing trademark, but I'm no lawyer. But:

> Copyright is not like trademark. Copyright has a set period of time for which it is valid and, unless you take some kind of action, you do not give up those rights… However, unlike trademarks, which do have to be defended, there is nothing that precludes you from enforcing your copyrights at a later date.


But I'm not sure how relevant this really is).

It's an issue in enforcing copyright, not trademark. Search guard isn't pretending to be Elastic. They are pretending to have copy rights to the code beyond what the open source license says they have (hence the mention of open source).

Edit: I think you changed your comment a bit, but it's not that elastic couldn't let this one slide, it's that doing so would hurt them in the future. Their code can't be worth much if they just let people violate the license left and right.

I think the commenter above was trying to say that the "you have to enforce it every time or future cases are weaker" applies to protecting trademark rights and not copyrights. (They weren't saying this is a trademark case, they were saying that because this isn't a trademark case, trademark rules don't apply.)

The "you have to enforce it" thing is about genericized trademarks: I can sell a painkiller under the name "aspirin" because everyone forgot it's a trademark and so there's no argument that there's an intent to cause market confusion with Bayer. But the copious pirated copies of Windows don't let me sell it without licensing it from Microsoft.

Attorney here! (This is not legal advice, consult a licensed attorney in your jurisdiction.)

The doctrine of laches, which is probably what you're thinking of, isn't typically available as a defense to copyright infringement suits -- see https://en.wikipedia.org/wiki/Petrella_v._Metro-Goldwyn-Maye.... There is, however, a 3-year statute of limitations on such suits.

I was trying to be careful with the wording. It's not "use it or lose it" so much as "the code can't be worth much if you let the last guy steal it". Any lawyer in a future case worth their salt would tear Elastic apart in the damages arguments.

Perhaps. But:

1. Much of the relief sought in an infringement suit is the equitable relief (i.e., the injunction), not necessarily monetary damages.

2. If Elasticsearch can prove that floragunn GmbH committed willful infringement, they can avail themselves of statutory damages of up to $150,000 per work, or, alternatively, any profits received by floragunn (and all Elasticsearch has to show is proof of revenues minus expenses). See 17 U.S.C. section 504.

3. The fact that you give free samples away or willfully ignore some infringing behavior is not, in my experience, a conclusive indicator of the overall value of a good or service. (And there is not, to my knowledge, a legal doctrine that says so.) Apple gives all of its Macbook purchasers MacOS for free, yet nobody would seriously claim that a court wouldn't award them millions -- if not billions -- if the source code were misappropriated.

To your third point, I think we're comparing apples and oranges. Apple grants an OSX license to each macbook purchaser.

The equivalent example would be Microsoft releasing WinX which is just a straight line-for-line rip-off of OSX and Apple not doing anything about it. Then when someone else starts distributing OSX and gets sued they'd say "Apple already let Microsoft devalue OSX by distributing it". The distribution might have to stop, but Apple's lawyers are going to be fighting an uphill battle for significant damages. The same would be true here for Elastic if they didn't pursue this lawsuit.

But in the case of Apple in particular, Apple doesn't go after every infringement. The internets are full of instructions on how to set up a Hackintosh, including how to circumvent the copy protection measure (there's a 64-byte string that's a three-line poem widely available on the internets), but that did not stop them from going after Psystar.

I don't have time to do the legal research at the moment, but I haven't seen this argument used successfully in any cases to mitigate damages for a copyright infringement case. It's not like an ordinary tort where the plaintiff has such a duty. Do you have any cases to cite, or are you just guessing?

It seems a special kind of doublethink to release some software as free and some as proprietary.

Software isn’t property, and applying property rights to it is fundamentally unjust. This is the point of free software.

To then turn around and also license proprietary software simultaneously means that you were doing free software for the wrong reasons entirely.

I don’t agree with this. There is a lot of value to releasing open software. I’ve learned a ton from reading other people’s code and looking over various ways to implement certain patterns. Additionally, it allows a developer to see and understand how something internally functions, which can bring clarity to the positives and trade offs the software comes with.

> Software isn’t property

It's intellectual property. So, yes you can apply property rights to it.

But you can certainly argue that freely available software is better than privately owned and controlled software.

Intellectual property is a fiction, designed by legislators for one purpose: to enable industrial production of copyrightable works.

Don’t let the fact that player piano reel duplicators worried composers and musicians into getting Congress involved cause you to lose sight of this fact. It isn’t property, we just treat it that way for purposes of business and commerce.

It’s not real; you can’t own a sequence of bytes. This is the point of copyleft, to assert non-ownership within this existing, fictional framework our lawmakers have constructed for the benefit of these industries.

Licensing part of one’s software as free software and licensing another part of one’s software as non-free means that the entity doing the licensing doesn’t understand this concept, or why software freedoms matter.

What about the case surrounding Anthony Levandowski? Should he be allowed to freely take Waymo's IP when another company offers him a substantial economic incentive to do so?

Sure, https://www.newyorker.com/tech/annals-of-technology/how-the-... makes that argument (as does https://news.ycombinator.com/item?id=20812814). The Traitorous Eight taking Shockley's IP launched the Valley. Microsoft and Apple taking Xerox's IP (so brazenly that a movie would later call them "Pirates") launched the personal computer revolution. Google taking what the court of appeals calls Oracle's intellectual property launched Android. Illegal copies of everything from Windows to IDA Pro to Photoshop launched so many of our careers.

More specifically, the free software argument is that he should be allowed to freely take and publicize Waymo's IP even when nobody is paying him to do so, that the legitimate ends of society are better reached by not protecting software copyright at all and by encouraging work to be done in the commons. Copyright produces a reward for the first company to successfully implement some idea, but then results in that company alone being able to improve - and since they have the market cornered, they have little incentive to improve until some other company catches up with them. It is far from obvious that this produces better software, better research, or better results for society than just letting anyone build on top of anyone else's work, monetizing (if necessary) on delivering whole products with a support lifecycle instead of on code itself. Waymo and Uber will both still profit from self-driving cars even if they're not the exclusive owners of the code or research behind making them work, and even if Waymo decides to stop having their own fleet of cars, it's still valuable for others to hire the original engineering team to work on specific features.

Or, let me put this another way. Is the Levandowski case justification for software patents? Why should Uber be able to use Google's algorithms and ideas without licensing them? That's intellectual property too, isn't it?

The free software argument is neither novel nor rare in the hacker community, and I'm confused to see someone who advocates it getting such a hostile and skeptical reception here.

Well, I don't believe copyright actually protects the first company to successfully implement an idea. In the instance of cars, the goal is to have self-driving cars. Some people think LIDAR is necessary, Tesla doesn't. There are often many solutions to the same problem.

In the case of Elastic, we could instead just have SaaS services where the source code is protected behind an API, like Algolia Search. I realize Elastic did FOSS partially for marketing purposes, but I don't see how Amazon lifting Elastic's IP and wrapping it behind a service is beneficial in the long run. It might be better, cheaper and faster today, but when Elastic goes out of business, Amazon likely would not put the same type of effort into future development and new features IMO. I do think this is different than Google, Oracle and Java. In that circumstance, Google wrote its own source code using Java versus extending Java and offering it as a "new" language. This is more closely akin to Uber leveraging Google Maps when it started.

A lot of HN’s electorate’s salaries depend upon not understanding the importance of software freedom.

After all, the server we are using to communicate is paid for with dollars made from keeping source code secret from the users of that same software.

Let’s not forget that this is first and foremost a commercial YC project. HN is not free software and its source is not available to its users. It is unsurprising that users here are hostile to the plain, objective fact that it is clearly and self-evidently impossible to own a sequence of bits in the same way one can own a physical object.

Minor point: the source code for this website is available from http://www.arclanguage.org/arc3.2.tar (see news.arc) and is free software under the artistic license.

Other than that, yes, agreed - despite the name "Hacker News," this forum is run by and for the subset of the hacker community that has a vested interest in monetizing software itself.

It is my understanding that HN is a fork of that code and is not released.

Oh, I definitely assume that, but I'm on the side of "you should be permitted to release / reuse source" and less so on the side of "you should be obligated to release source."

(That is, I see software copyright as a thing we should fundamentally not have just like software patents, and I see the GPL's copyleft approach as useful in a world with copyright law and I gladly write GPL software because we live in that world, but I don't think it by itself is justification to retain the leverage of copyright.)
