It's not so much about "never nil", but rather "never accidentally null". Rust's compiler prevents you from moving data into a method which then nulls it out, leaving a dangling pointer in the calling code. At best this dangling pointer will look at garbage and cause a crash or undefined behavior. At worst, it will look at other, actively-used memory and cause a security vulnerability. Rust will just refuse to compile until this problem is fixed.
Static analysis of these lifetimes allows a whole class of errors to be avoided (dangling pointers, double-frees, iterator invalidation, etc.).
Rust's Option type is handy for a lot of stuff (async APIs, concurrent code that may or may not succeed, error codes, etc.). But it's really just icing, not the main thrust of Rust's memory model.
> It's certainly not easier to use and reason about, IMO.
To offer a counter-viewpoint, I find that Option<> (and Result<>) are very easy to reason about. They tell you exactly what to expect from a function, and you don't have to guess whether you need to catch exceptions or let them propagate higher up. Everything is explicit; the only surprises are panics, which are cataclysmic anyhow.
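To make that concrete, here's a minimal sketch (the function name and the port-parsing example are just illustrative, not from any real API):

```rust
// Illustrative function: the signature alone tells the caller this can
// fail -- there is no hidden exception path to remember.
fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
    s.parse::<u16>()
}

fn main() {
    // The caller is forced to unpack the Result explicitly:
    match parse_port("8080") {
        Ok(p) => println!("listening on {}", p),
        Err(e) => println!("bad port: {}", e),
    }

    // Or collapse it to an Option when the error detail doesn't matter:
    assert!(parse_port("not-a-port").ok().is_none());
}
```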
> And it seems just as likely you'll end up crashing your program due to a bounds-check error (which may happen more often since Rust encourages indexing over references due to this very design.. at least, so I've read).
If you use iterators in Rust, you never need to worry about out-of-bounds errors. In fact, iterators skip bounds checking altogether, since you are guaranteed that the collection you are iterating over won't change under your feet (no iterator invalidation, etc.), which generates more efficient code.
If you use explicit indexing, then yes, you can have a runtime panic if you go out of bounds. But that's the nature of explicit indexing. It also has to include those safety checks, so the code will be slower.
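A small sketch of that contrast, on toy data:

```rust
// Iterator: no index arithmetic, no possible out-of-bounds access.
fn sum_iter(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Explicit lookup: `get` surfaces the bounds check as an Option
// instead of a panic (plain `values[i]` would panic when out of range).
fn lookup(values: &[i32], i: usize) -> Option<i32> {
    values.get(i).copied()
}

fn main() {
    let values = vec![10, 20, 30];
    assert_eq!(sum_iter(&values), 60);
    assert_eq!(lookup(&values, 1), Some(20));
    assert_eq!(lookup(&values, 3), None); // values[3] would panic instead
}
```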
> Rust's compiler prevents you from moving data into a method which then nulls it out
Just for clarity, we have this in Nim too, e.g.:

  type
    Foo = ref object
    Bar = object
      val: Foo not nil

  let f = Bar()   # Error, 'val' must be set

  proc foobar(f: Foo not nil): Foo not nil =
    return nil    # Error, can't return nil

  foobar(nil)     # Error, can't pass nil
> At best this dangling pointer will look at garbage and cause a crash or undefined behavior. At worst, it will look at other, actively-used memory and cause a security vulnerability.
In C/C++, yes, but this isn't so applicable to Nim, where we have GCed references and 'not nil' constraints.
> If you use iterators in Rust, you never need to worry about out of bounds errors
Well, I was not talking about iterating through a list, but rather maintaining arbitrary indexes into a mutable list. E.g., a Sprite which contains an index into a Texture array. In that scenario it's just as easy to miscalculate and crash your program via a bounds-check error as it is to crash via nil-deref.
> To offer a counter-viewpoint, I find that Option<> (and Result<>) are very easy to reason about..
It's good that Rust works for you, truly. And like I said in another post, I agree Rust's design here may be better for some domains. However, Nim's design still feels more elegant and straightforward to me. Luckily, we both get a powerful language that suits us, regardless of which one we prefer :)
Oh, sorry, I should have been clearer: I wasn't trying to disparage Nim at all (it's on my list of languages to play with). I was just clearing up some points about Rust :)
> Well I was not talking about iterating through a list, but rather maintaining arbitrary indexes to a mutable list. Eg, a Sprite which contains a index to a Texture array. In that scenario it's just as easy to miscalculate and crash your program via a bounds-checking error as it is to crash by nil-deref.
The solution here is to just use a reference instead of an arbitrary index. If you hand out references you can lean on the compiler to enforce memory safety -- the compiler won't let you access data that is no longer alive, won't let you accidentally share across thread boundaries if you don't explicitly want that, etc. And if that was a shared mutable list, it's doubly important to let the compiler help you reason about it, since shared, mutable state is the main source of data races.
This is one of those cases where leveraging the compiler allows you to write better, safer code.
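A minimal sketch of what handing out references buys you (`Texture` here is a hypothetical type standing in for real data):

```rust
// Hypothetical type standing in for real texture data.
#[derive(Debug, PartialEq)]
struct Texture { id: u32 }

// Hand out a reference instead of a raw index: the returned borrow is
// tied to `textures`, so the compiler rejects any use of it after the
// collection is mutated or dropped.
fn first_texture(textures: &[Texture]) -> Option<&Texture> {
    textures.first()
}

fn main() {
    let textures = vec![Texture { id: 7 }];
    let t = first_texture(&textures);
    assert_eq!(t.map(|t| t.id), Some(7));

    // Uncommenting the next lines would fail to compile: the vector
    // can't be mutated while the borrow `t` is still live.
    // textures.clear();
    // println!("{:?}", t);
}
```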
No worries. I also wasn't implying you were trying to disparage Nim, and I hope my post didn't come off as accusatory. Cheers!
> The solution here is to just use a reference instead of an arbitrary index.
Ah, sorry, I should have said "mutable Texture array". In that case Rust's borrow checking will 'freeze' the array, preventing Textures from ever being changed during the lifetime of your Sprites. So you're left with either Option<> or indexing as a solution, each with its own merits, but neither as... practical as nilable GCed references, IMO (again, just my opinion; others seem to find it easy enough).
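For completeness, a sketch of the index-based side of that trade-off (`Sprite` and `Texture` are hypothetical types): the array stays freely mutable, and `Vec::get` turns a stale index into an Option rather than a dangling pointer:

```rust
struct Texture { id: u32 }
struct Sprite { texture_index: usize }

// With an index, the texture array stays freely mutable; the cost is
// that every lookup is fallible and bounds-checked.
fn texture_of<'a>(sprite: &Sprite, textures: &'a [Texture]) -> Option<&'a Texture> {
    textures.get(sprite.texture_index)
}

fn main() {
    let mut textures = vec![Texture { id: 1 }, Texture { id: 2 }];
    let sprite = Sprite { texture_index: 1 };
    assert_eq!(texture_of(&sprite, &textures).map(|t| t.id), Some(2));

    // Unlike a stored reference, the index doesn't freeze the array:
    textures.push(Texture { id: 3 });

    // And a stale index yields None (or a panic with `[]`) rather than
    // reading freed memory.
    let stale = Sprite { texture_index: 99 };
    assert!(texture_of(&stale, &textures).is_none());
}
```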
Out of curiosity: How do you convert from a nilable `Foo` to a `Foo not nil`? As in, what do you do if you have a function maybe returning a `Foo` and want to pass its value to a function taking `Foo not nil`?
You prove to the compiler that the nilable var is not nil via an if statement, e.g.:

  proc foobar(f: Foo not nil) = discard

  let f = Foo()             # nilable ref
  let b: Foo not nil = f    # Error, can't prove 'f' is not nil

  if f != nil:
    let b: Foo not nil = f  # OK, 'f' is proven not nil here
    foobar(f)               # or can pass 'f' directly
Yes, this. So much. And because I'm a biologist by training and can't resist:
> Bacterias bring us health
Unless it's E. coli, Salmonella, Listeria, Clostridium botulinum (botulism), etc., etc. Some bacteria are helpful, most are neutral, and some will make you very sick or kill you.
> I think we need the bacterias and all the life on the food
There is just as much bacteria on your skin, on the doorknob, and on the basket holding your bananas to serve your "daily dose of bacteria". Seriously, bacteria are everywhere. There is good evidence that ultra-clean lifestyles can increase allergies, as you mentioned, but that is when you try to sterilize everything. It isn't when you simply cook food.
> Today I'm living on 90% fruits
Please make sure you are getting enough iron, or you could be on the road to anemia and serious health complications. :)
> When I eat dairy or something cooked my poop goes bad on the next day, it's like our body is the best doctor around.
Or, different foods have different molecular structure, which is digested and metabolized by different pathways in your stomach and intestines, leading to different aromatic by-products. Just because you don't like the smell doesn't mean there is something "wrong". I don't like the smell of manure all that much, but it is a very "good" by-product, if you want to label things good or bad.
It would remove a lot of nice search features, however. If you just index tokens without positional information, you have a much harder time performing phrase matching. If you include positional information, you can probably crack the encryption because some tokens are statistically more likely to appear next to each other than others.
If you index shingles (phrase chunks) instead, you lose out on sloppy phrases...you can only match exact phrases. I imagine you can perform a similar statistical attack too.
Hell, just getting the term dictionary would probably allow you to reverse engineer the tokens, since written language follows a very predictable power law.
Hashing also removes the ability to highlight search results, which significantly degrades search functionality for an end user.
Basically, yes, you can do search with encrypted tokens...but it will be a very poor search experience.
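As a toy sketch of why those frequency leaks matter (all data made up): rank the hashed terms by document frequency and line them up against a known word-frequency list for the language, and the most common tokens fall out almost for free:

```rust
use std::collections::HashMap;

// Sketch of the statistical attack: if token hashes preserve frequency,
// sorting hashed terms by frequency and aligning them with a known
// language frequency list recovers likely plaintexts. Toy data only.
fn guess_plaintext(hashed_freqs: &[(&str, u64)], lang_ranks: &[&str]) -> HashMap<String, String> {
    let mut sorted: Vec<_> = hashed_freqs.to_vec();
    sorted.sort_by(|a, b| b.1.cmp(&a.1)); // most frequent first
    sorted
        .iter()
        .zip(lang_ranks.iter())
        .map(|((h, _), w)| (h.to_string(), w.to_string()))
        .collect()
}

fn main() {
    // Hashed term dictionary with made-up document frequencies.
    let hashed = [("a9f3", 1200u64), ("07bc", 900), ("e511", 4000)];
    // "the" is the most common English word, then "of", "and", ...
    let english = ["the", "of", "and"];
    let guesses = guess_plaintext(&hashed, &english);
    assert_eq!(guesses.get("e511").map(String::as_str), Some("the"));
}
```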
As others have mentioned, it's entirely possible to have multiple fields with different analysis chains. Elasticsearch handles multiple languages very well. You just have to start changing the defaults, since the defaults assume a single language for all fields (namely English).
E.g. "title_german" and "title_english" can each have their own analyzer specific to the language. Or you could have a single field "title" which then uses multifields to index a "title.english" and "title.german" field.
The key is that at search time you need to use a query that understands analysis. So you should use something like the `match` query, or `multi_match` for multiple fields. These queries will analyze the input text according to the analyzer configured on the underlying field (e.g. english or german).
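As a sketch (index, type, and field names are illustrative, assuming the Elasticsearch 1.x mapping API), the multifield version of that title could look like:

```json
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "english",
          "fields": {
            "german": {
              "type": "string",
              "analyzer": "german"
            }
          }
        }
      }
    }
  }
}
```

A `multi_match` query against `title` and `title.german` would then analyze the search text once per field, using each field's own analyzer.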
Topics include: language analysis chains, pitfalls of multiple languages per doc (at index vs search time), one language per field schemas, one language per doc schemas, multi-language per field schemas, etc
Unsure if you are talking about "The Definitive Guide" from the original link, or "Exploring Elasticsearch" from the OP.
If you are talking about "The Definitive Guide", it is targeted to Elasticsearch 1.0. Most of the APIs and concepts are backwards compatible, but we wanted to target 1.0 because it brings a lot of great new features.
> Systematic hard work can get you results in academia.
This is not necessarily true. At least in life sciences, there is a big element of pure, dumb luck. Biological systems are inherently noisy, and no matter how diligent we try to be about our processes and protocols, there is always luck.
I spent nearly two years of my life performing a single protocol (an endocytic receptor internalization assay), and the line between "good results" and "wasted a week" was incredibly thin. Some things just require luck to work out, no matter how careful you are.
I left academic biology because I didn't want luck playing into my career. In academic biology you must be incredibly smart, incredibly dedicated, willing to work long hours with little pay, AND be incredibly lucky.
I looked around at the post-docs in my department (at MIT, mind you) and saw brilliant people who would never produce a top-tier article, who had spent so long in their post-doc that they had no chance of ever becoming a professor. They would probably wash out to some industry job at Merck testing cholesterol drugs after wasting 10 years of their life pursuing some fictitious dream.
Truth be told, I wasn't as brilliant as most of the people around me, so I made a judgment call and left. It was the right decision, I'm happier than ever (and actually make money too).
Not the OP, but GC issues in Elasticsearch basically boil down to memory pressure (obviously), which is usually caused by facets. Facets eat a lot of memory, especially if you are faceting high-cardinality fields - think fields like "tags" or any analyzed field. High-cardinality, analyzed strings are the easiest way to blow out the heap.
There are other reasons, but that is like 90% of GC issues. To solve it, you need to make sure your faceted fields are configured well (usually not_analyzed) and assess how much memory is available. You may be able to index and even full-text search ten billion docs on a single machine, but faceting it may just be too much to ask for a single node.
Omitting norms, disabling bloom filters on old indices, and enabling doc values are other ways to help alleviate field-data pressure.
Other GC culprits can be: too-large bulk requests, unbounded threadpool queues, or something like parent/child, scripts, or filter-cache keys eating all your memory. Also, don't go above ~30GB heaps; the JVM becomes unhappy :)
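For reference, a sketch of an ES 1.x mapping fragment tuned along those lines (field name is illustrative): `not_analyzed` keeps facet terms sane, `doc_values` moves field data off-heap, and norms are dropped since facet fields rarely need scoring:

```json
{
  "properties": {
    "tags": {
      "type": "string",
      "index": "not_analyzed",
      "doc_values": true,
      "norms": { "enabled": false }
    }
  }
}
```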
This is technically a very naive approach, since a simple rsync of the data dirs will include replicas too. If you were more diligent you could check the state files in each shard directory and only copy out the primaries.
You can just google "elasticsearch rsync" to get information, and even scripts, that will do this for you. The thing is... you REALLY need to know what you're doing when you go this route.
Also, you can try the gateway feature. Gateway is actually pretty straightforward. Restore WILL be slow though. And for many scenarios ... it is not ideal. (You don't want to take a day, or even a few, to restore after a failure.)
I think the best advice is...
Update to 1.0.
Just go to 1.0 and do snapshots... you will save yourself A LOT of headaches.
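For anyone landing here later, the 1.0 snapshot flow is roughly this (repository name and path are illustrative, assuming the shared-filesystem repository type):

```shell
# Register a shared-filesystem repository.
curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/my_backup" }
}'

# Snapshot all indices into it.
curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

# Restore later.
curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore'
```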