
It's not so much about "never nil", but rather "never accidentally null". Rust's compiler prevents you from moving data into a method which then nulls it out, leaving a dangling pointer in the calling code. At best this dangling pointer will look at garbage and cause a crash or undefined behavior. At worst, it will look at other, actively-used memory and cause a security vulnerability. Rust will just refuse to compile until this problem is fixed.

Static analysis of these lifetimes allows a whole class of errors to be avoided (dangling pointers, double-frees, iterator invalidation, etc.).
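A tiny sketch of the kind of thing the compiler rejects (my own illustration, not real project code):

  fn consume(_s: String) {
      // takes ownership; the String is dropped when this function returns
  }

  fn main() {
      let name = String::from("hello");
      consume(name);           // ownership moves into `consume`
      // println!("{}", name); // compile error: use of moved value `name`
  }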

Rust's Options are handy for a lot of stuff (async APIs, concurrent code that may/may not succeed, error codes, etc). But they are really just icing, not the main thrust of Rust's memory model.

> It's certainly not easier to use and reason about, IMO.

To offer a counter-viewpoint, I find that Option<> (and Result<>) are very easy to reason about. They tell you exactly what to expect from a function, and you don't have to guess whether you need to catch exceptions or let them propagate higher up. Everything is explicit; the only surprises are panics, which are cataclysmic anyhow.
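For example, something like this (a sketch of my own, names made up):

  use std::num::ParseIntError;

  // The signatures say it all: `parse_port` can fail with a ParseIntError,
  // `lookup_user` may simply find nothing. No hidden exceptions to guess
  // about -- the caller has to handle both cases.
  fn parse_port(s: &str) -> Result<u16, ParseIntError> {
      s.parse::<u16>()
  }

  fn lookup_user(users: &[&str], name: &str) -> Option<usize> {
      users.iter().position(|&u| u == name)
  }

  fn main() {
      match parse_port("8080") {
          Ok(port) => println!("port = {}", port),
          Err(e) => println!("bad port: {}", e),
      }

      if let Some(idx) = lookup_user(&["alice", "bob"], "bob") {
          println!("bob is at index {}", idx);
      }
  }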

> And it seems just as likely you'll end up crashing your program due to a bounds-check error (which may happen more often since Rust encourages indexing over references due to this very design.. at least, so I've read).

If you use iterators in Rust, you never need to worry about out of bounds errors. In fact, they can skip bounds checking altogether, since you are guaranteed that the collection you are iterating over won't change under your feet (no iterator invalidation, etc.), which generates more efficient code[1].

If you use explicit indexing, then yes, you can have a runtime panic if you go OOB. But that's the nature of explicit indexing. It also has to include those safety checks, so it will generate slower code.

[1] https://doc.rust-lang.org/book/iterators.html
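To make the difference concrete, a quick sketch (illustrative only):

  fn main() {
      let v = vec![10, 20, 30];

      // Iterator: no index to get wrong, and the compiler can elide bounds
      // checks because `v` cannot change while we iterate over it.
      let sum: i32 = v.iter().sum();

      // Explicit indexing: every `v[i]` is bounds-checked at runtime, and an
      // off-by-one (e.g. looping to `v.len() + 1`) panics instead of reading garbage.
      let mut sum2 = 0;
      for i in 0..v.len() {
          sum2 += v[i];
      }

      assert_eq!(sum, sum2);
  }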

-----


> Rust's compiler prevents you from moving data into a method which then nulls it out

Just for clarity, we have this in Nim too, e.g.:

  type
    Foo = ref object
    Bar = object
      val: Foo not nil
  let
    f = Bar() # Error, 'val' must be set
  
  proc foobar(f: Foo not nil): Foo not nil =
    return nil # Error, can't return nil
  
  foobar(nil) # Error, can't pass nil

> At best this dangling pointer will look at garbage and cause a crash or undefined behavior. At worst, it will look at other, actively-used memory and cause a security vulnerability.

In C/C++, yes, but this isn't so applicable to Nim where we have GCed references and 'not nil' constraints.

> If you use iterators in Rust, you never need to worry about out of bounds errors

Well I was not talking about iterating through a list, but rather maintaining arbitrary indexes into a mutable list. E.g., a Sprite which contains an index into a Texture array. In that scenario it's just as easy to miscalculate and crash your program via a bounds-checking error as it is to crash by nil-deref.

> To offer a counter-viewpoint, I find that Option<> (and Result<>) are very easy to reason about..

It's good that Rust works for you, truly. And like I said in another post, I agree Rust's design here may be better for some domains. However, Nim's design still feels more elegant and straightforward to me. Luckily, we both get a powerful language that suits us, regardless of which one we prefer :)

-----


Oh, sorry, I should have been clearer: I wasn't trying to disparage Nim at all (it's on my list of languages to play with). I was just clearing up some points about Rust :)

Edit:

> Well I was not talking about iterating through a list, but rather maintaining arbitrary indexes into a mutable list. E.g., a Sprite which contains an index into a Texture array. In that scenario it's just as easy to miscalculate and crash your program via a bounds-checking error as it is to crash by nil-deref.

The solution here is to just use a reference instead of an arbitrary index. If you hand out references you can lean on the compiler to enforce memory safety -- the compiler won't let you access data that is no longer alive, won't let you accidentally share across thread boundaries if you don't explicitly want that, etc. And if that was a shared mutable list, it's doubly important to let the compiler help you reason about it, since shared, mutable state is the main source of data races.
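A toy version of what I mean (type and field names invented for illustration):

  struct Texture { id: u32 }

  struct Sprite<'a> {
      // A real reference instead of an index we have to keep valid by hand.
      texture: &'a Texture,
  }

  fn main() {
      let textures = vec![Texture { id: 0 }, Texture { id: 1 }];
      let sprite = Sprite { texture: &textures[1] };

      // drop(textures); // uncommenting this is a compile error:
      //                 // `textures` is still borrowed by `sprite`

      println!("sprite uses texture {}", sprite.texture.id);
  }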

This is one of those cases where leveraging the compiler allows you to write better, safer code.

-----


No worries. I also wasn't implying you were trying to disparage Nim, and I hope my post didn't come off as accusatory. Cheers!

EDIT:

> The solution here is to just use a reference instead of an arbitrary index.

Ah, sorry, I should have said "mutable Texture array". In that case Rust's borrow-checking will 'freeze' the array, preventing Textures from ever being changed during the lifetime of your Sprites. So you're left with either Option<> or indexing as a solution, each with its own merits, but neither as... practical as nilable GCed references, IMO (again, just my opinion... others seem to find it easy enough).
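Roughly the situation I mean (illustrative Rust, not code from a real project):

  struct Texture { id: u32 }
  struct Sprite<'a> { texture: &'a Texture }

  fn main() {
      let mut textures = vec![Texture { id: 0 }];
      textures.push(Texture { id: 1 });      // fine: nothing borrows the Vec yet

      let sprite = Sprite { texture: &textures[0] };

      // textures.push(Texture { id: 2 });   // compile error: can't mutate `textures`
      //                                     // while `sprite` borrows into it

      println!("sprite -> texture {}", sprite.texture.id);
  }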

-----


Out of curiosity: How do you convert from a nilable `Foo` to a `Foo not nil`? As in, what do you do if you have a function maybe returning a `Foo` and want to pass its value to a function taking `Foo not nil`?

-----


You prove to the compiler that the nilable var is not-nil via an if statement. E.g.:

  proc foobar(f: Foo not nil) =
    discard

  let f = Foo() # nilable ref
  let b: Foo not nil = f # Error, can't prove 'f' is not nil

  if f != nil:
    let b: Foo not nil = f # OK, the compiler has proven 'f' is not nil here
    foobar(f) # or you can pass 'f' directly

-----


Ah, good to know. So Nim does static analysis on conditionals where Rust would use an explicit pattern match to get at the contained value.
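For comparison, the Rust version of that check would be an explicit match on the Option (my own sketch):

  struct Foo;

  fn maybe_foo(flag: bool) -> Option<Foo> {
      if flag { Some(Foo) } else { None }
  }

  fn foobar(_f: &Foo) {
      // takes a value that is guaranteed to exist -- no nil/None to check
  }

  fn main() {
      // The match forces us to name the None case; there is no way to
      // silently forget the nil check.
      match maybe_foo(true) {
          Some(f) => foobar(&f),
          None => println!("no Foo this time"),
      }
  }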

Might be a good example for http://nim-lang.org/manual.html#not-nil-annotation

-----


Yes, this. So much. And because I'm a biologist by training and can't resist:

> Bacterias bring us health

Unless it is E. coli, Salmonella, Listeria, Clostridium botulinum, etc. etc. Some bacteria are helpful, most are neutral, and some will make you very sick or kill you.

> I think we need the bacterias and all the life on the food

There is just as much bacteria on your skin, on the doorknob, and on the basket holding your bananas to serve as your "daily dose of bacteria". Seriously, bacteria are everywhere. There is good evidence that ultra-clean lifestyles can increase allergies, as you mentioned, but that is when you try to sterilize everything. It isn't when you simply cook food.

> Today I'm living on 90% fruits

Please make sure you are getting enough iron, or you could be on the road to anemia and serious health complications. :)

> When I eat dairy or something cooked my poop goes bad on the next day, it's like our body is the best doctor around.

Or, different foods have different molecular structures, which are digested and metabolized by different pathways in your stomach and intestines, leading to different aromatic by-products. Just because you don't like the smell doesn't mean there is something "wrong". I don't like the smell of manure all that much, but it is a very "good" by-product, if you want to label things good or bad.

-----


There's tons of non-virulent E. coli living in your gut right now... E. coli is its own species (genus Escherichia), a close relative of Salmonella, and the vast majority of strains are harmless.

And neutral bacteria are great when they help the good bacteria out-compete any virulent strains.

I remember being told in bacterial phys & metabolism that only 1% of bacteria are virulent, and the NIH seems to agree[1]. And that you have 10x as many bacteria on/in your body as you have cells.

[1] http://www.nlm.nih.gov/medlineplus/bacterialinfections.html

-----


It would remove a lot of nice search features, however. If you just index tokens without positional information, you have a much harder time performing phrase matching. If you include positional information, you can probably crack the encryption because some tokens are statistically more likely to appear next to each other than others.

If you index shingles (phrase chunks) instead, you lose out on sloppy phrases...you can only match exact phrases. I imagine you can perform a similar statistical attack too.

Hell, just getting the term dictionary would probably allow you to reverse engineer the tokens, since written language follows a very predictable power law.

Hashing also removes the ability to highlight search results, which significantly degrades search functionality for an end user.

Basically, yes, you can do search with encrypted tokens...but it will be a very poor search experience.

-----


As others have mentioned, it's entirely possible to have multiple fields with different analysis chains. Elasticsearch handles multiple languages very well. You just have to start changing the defaults, since the defaults assume a single language for all fields (namely English).

E.g. "title_german" and "title_english" can each have their own analyzer specific to the language. Or you could have a single field "title" which then uses multifields to index a "title.english" and "title.german" field.

The key is that at search time you need to use a query that understands analysis. So you should use something like the `match` query, or `multi_match` for multiple fields. These queries analyze the input text according to the analyzer configured on the underlying field (e.g. english or german).

There is a ton more information in The Definitive Guide book, in the chapter on languages: http://www.elasticsearch.org/guide/en/elasticsearch/guide/cu...

Topics include: language analysis chains, pitfalls of multiple languages per doc (at index vs search time), one language per field schemas, one language per doc schemas, multi-language per field schemas, etc
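For a concrete sketch of what that can look like (field, type, and analyzer names here are just illustrative; I'm showing the JSON bodies as serde_json values purely to keep them readable as code):

  // Requires the serde_json crate.
  use serde_json::json;

  fn main() {
      // One "title" field indexed two ways, with one analyzer per language.
      let mapping = json!({
          "mappings": {
              "book": {
                  "properties": {
                      "title": {
                          "type": "string",
                          "fields": {
                              "english": { "type": "string", "analyzer": "english" },
                              "german":  { "type": "string", "analyzer": "german" }
                          }
                      }
                  }
              }
          }
      });

      // multi_match analyzes the query text with each subfield's own analyzer.
      let query = json!({
          "query": {
              "multi_match": {
                  "query": "search books",
                  "fields": ["title.english", "title.german"]
              }
          }
      });

      println!("{}\n{}", mapping, query);
  }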

-----


FWIW, we didn't choose the animal. O'Reilly's design department does a voodoo incantation and chooses an animal...the authors/editors have zero input :)

I quite like the snake though, think it looks nice

-----


Unsure if you are talking about "The Definitive Guide" from the original link, or "Exploring Elasticsearch" from the OP.

If you are talking about "The Definitive Guide", it is targeted to Elasticsearch 1.0. Most of the APIs and concepts are backwards compatible, but we wanted to target 1.0 because it brings a lot of great new features.

-----


> Systematic hard work can get you results in academia.

This is not necessarily true. At least in the life sciences, there is a big element of pure, dumb luck. Biological systems are inherently noisy, and no matter how diligent we are about our processes and protocols, there is always luck.

I spent nearly two years of my life performing a single protocol (an endocytic receptor internalization assay), and the line between "good results" and "wasted a week" was incredibly thin. Some things just require luck to work out, no matter how careful you are.

I left academic biology because I didn't want luck playing into my career. In academic biology you must be incredibly smart, incredibly dedicated, willing to work long hours with little pay, AND be incredibly lucky.

I looked around at the post-docs in my department (at MIT, mind you) and saw brilliant people who would never produce a top-tier article, who had spent so long in their post-doc that they had no chance of ever becoming a professor. They would probably wash out to some industry job at Merck testing cholesterol drugs after wasting 10 years of their life pursuing some fictitious dream.

Truth be told, I wasn't as brilliant as most of the people around me, so I made a judgment call and left. It was the right decision, I'm happier than ever (and actually make money too).

-----


Not the OP, but GC issues in Elasticsearch basically boil down to memory pressure (obviously), which is usually caused by facets. Facets eat a lot of memory, especially if you are faceting high-cardinality fields - think fields like "tags" or any analyzed field. High-cardinality, analyzed strings are the easiest way to blow out the heap.

There are other reasons, but that is like 90% of GC issues. To solve it, you need to make sure your faceted fields are configured well (usually not_analyzed) and assess how much memory is available. You may be able to index and even full-text search ten billion docs on a single machine, but faceting them may just be too much to ask of a single node.

Omitting norms, disabling bloom filters on old indices, and enabling doc values are other ways to help alleviate field-data pressure.

Other GC culprits can be: overly large bulk requests, unbounded threadpool queues, or something like parent/child, scripts, or filter cache keys eating all your memory. Also, don't go above ~30 GB heaps; the JVM becomes unhappy :)

-----


The pre-Snapshot/restore method is:

- Pause indexing

- Issue a flush request

- Rsync data directories somewhere

- Resume indexing

This is technically a very naive approach, since a simple rsync of the data dirs will include replicas too. If you were more diligent you could check the state files in each shard directory and only copy out the primaries.

-----


Polyfractal is right.

You can just google "elasticsearch rsync" to get information, and even scripts, that will do this for you. The thing is... you REALLY need to know what you're doing when you go this route.

Also, you can try the gateway feature. Gateway is actually pretty straightforward. Restore WILL be slow though. And for many scenarios ... it is not ideal. (You don't want to take a day, or even a few, to restore after a failure.)

I think the best advice is...

Update to 1.0.

Just go to 1.0 and do snapshots... you will save yourself A LOT of headaches.

-----


Regarding updates, you can use the Update API for partial updates, and include a script to do things like "counter += 1" or "add value to existing array".

Internally it is still reindexing the entire document, but from your application's perspective, the Update API is a lot friendlier.

http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
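Roughly what those two flavors look like (index and field names are made up; shown as serde_json values just to illustrate the request bodies sent to the Update API):

  // Requires the serde_json crate.
  use serde_json::json;

  fn main() {
      // Scripted partial update: increment a counter on the existing doc.
      let scripted = json!({
          "script": "ctx._source.views += count",
          "params": { "count": 1 }
      });

      // Or merge a partial document into the existing one.
      let partial = json!({
          "doc": { "tags": ["elasticsearch", "search"] }
      });

      println!("{}\n{}", scripted, partial);
  }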

-----


Thanks for pointing that out, it will be really useful!

-----
