
Solr 8.6.1 - based2
http://mail-archives.apache.org/mod_mbox/www-announce/202008.mbox/%3CCAD4GwrP%2B8r8W2GDDwaFb%3DXbP%2BQGfrbkS_kOhTAdjDRbPkS2%3DAw%40mail.gmail.com%3E
======
technicolorwhat
Weirdly, I still find Solr easier to use than ES. Solr's bulk import has CSV
support out of the box, instead of converting it to JSON first and increasing
the payload size a lot. The query DSL is way easier and better documented, I
find. And it tends not to break all the time between releases. I moved a
couple of times from Solr to ES and back again. Solr comes with a tiny admin UI
that you can just use OOB to fire your queries against, instead of choosing
another frontend, configuring it, setting it up, etc. I find Solr's
experience way easier and lower ceremony overall.
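
A rough sketch of the payload point, in Python: it converts a small CSV into the action-line-plus-document-line NDJSON shape that an Elasticsearch bulk request expects and compares byte counts. The sample rows and the `products` index name are invented for illustration, and the ratio will differ for real data.

```python
import csv
import io
import json

# Made-up sample data; real exports will have different columns and sizes.
CSV_TEXT = """id,title,price
1,Blue widget,9.99
2,Red widget,12.50
3,Green widget,7.25
"""


def csv_to_bulk_ndjson(csv_text: str, index_name: str) -> str:
    """Convert CSV rows to newline-delimited JSON for a bulk index request."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One action line, then one document line, per CSV row.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


bulk_body = csv_to_bulk_ndjson(CSV_TEXT, "products")  # hypothetical index name
csv_bytes = len(CSV_TEXT.encode("utf-8"))
bulk_bytes = len(bulk_body.encode("utf-8"))
print(f"CSV payload:  {csv_bytes} bytes")
print(f"Bulk payload: {bulk_bytes} bytes ({bulk_bytes / csv_bytes:.1f}x)")
```

The growth comes from repeating every field name in every document plus the extra action line per row, whereas the CSV states the column names once in its header.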

~~~
sheeshkebab
It’s also amazing how versatile Solr is - I’ve used it as both an embedded
search lib in a tool as well as a multi-region distributed replicated cluster
in large-scale prod environments, all from pretty much the same distribution.

~~~
leetrout
Could you speak more about using solr as a primary DB? Or did I misunderstand
how you were using it?

~~~
joking
You can, but almost nobody will recommend it, and the same goes for
Elasticsearch. In most cases it works, but it has prioritized speed over
resilience, so it's better to have a source from where you can rebuild the
index.

~~~
cmckn
> it's better to have a source from where you can rebuild the index.

Having worked for several years at a company selling search, I can't emphasize
this enough. Rebuilding indices should always be straightforward, and your
data should be very accessible in some other (preliminary) form. We ran into
so many issues, and had so many panics, because the index had become the only
place some things existed. It's also way easier to tweak (read: optimize) your
schema over time when you're in the habit of rebuilding the index.
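
A minimal sketch of that habit, in Python, assuming the source of truth is a plain list of records and the "index" is a toy term-to-ids map (all names here are hypothetical, and a real setup would read from a database and write to Solr or ES):

```python
from collections import defaultdict

# Hypothetical source of truth: in practice a database table or files on disk.
SOURCE_RECORDS = [
    {"id": 1, "title": "Solr in production", "body": "Rebuild the index often"},
    {"id": 2, "title": "Schema design", "body": "Tweak the schema then rebuild"},
]


def build_index(records, fields):
    """Build a fresh term -> set-of-ids index from scratch, using only `fields`."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            for term in record[field].lower().split():
                index[term].add(record["id"])
    return dict(index)


# First "schema": index titles only.
index_v1 = build_index(SOURCE_RECORDS, fields=["title"])
# Schema change: also index the body. No migration, just rebuild from the source.
index_v2 = build_index(SOURCE_RECORDS, fields=["title", "body"])

print(sorted(index_v1.get("rebuild", set())))  # []
print(sorted(index_v2.get("rebuild", set())))  # [1, 2]
```

Because the index is always derived and never edited in place, changing the schema is just another call to `build_index`.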

------
softwaredoug
Solr is great, but it's frustratingly unopinionated about a lot of things.

Should you use SolrCloud or Solr standalone mode? Use the classic config
files? Or dynamic/managed schemas and config APIs?

You may say "to each their own" but when you want to do a simple thing, like
install a plugin, and there's about 5 ways to do it - with each way working
for a different 75% of use cases, it can be rather frustrating. Compare this
to Elasticsearch where there's one company behind everything (for better or
worse) with very strict and clear opinions about how to do something.

I'm working on a little project with my personal opinions for Solr use. Call
it Effective Solr, or Solr The Good Parts. Who knows when/if it'll get done,
and they're my somewhat informed opinions, but maybe someone would find them
useful.

It's rather early, but it never hurts to begin gathering comments on this:

[https://gist.github.com/softwaredoug/3212fa9c5a198a565a9a77b...](https://gist.github.com/softwaredoug/3212fa9c5a198a565a9a77b8d6f888ed)

------
MarsTeam
Solr is amazing and I wish there was more buzz around it

------
tekkk
Solr is one of those technologies which works but isn't really glorious to use
and is a bit stuffy with its XML configurations and Java interfaces. It's a bit
of a shame, because search engines are so popular nowadays and everybody seems
to be fixated on using Elasticsearch, which from what I've read and heard is
resource-hungry and not really cut out for simple text search.

Maybe someday somebody will create a new search engine that will be hipster
and easy to use, like Algolia but open-source. I'm a bit curious, though, how
entrenched Lucene is as the core search engine, and whether it would even make
sense to try to recreate it in another language. Probably so entrenched that,
unless somebody has money to throw around, it will stay that way far into the
future.

~~~
lukevp
Search really has 3 main levels.

1\. Core ranked inverted index data store (Lucene)

2\. API layer on top of Lucene with easier querying, indexing, data importing,
sharding, etc. (Solr/Elasticsearch)

3\. Fully hosted API / UI with an easier GUI for developers and search relevance
engineers (Algolia, Lucidworks Fusion).

The new hotness in search is currently Rust-based tooling. Search is a great
application for Rust, as it's very performance-sensitive and data-structure
heavy, and the indexes are fairly stable once built. That leads to 90th+
percentile latencies and throughput that are much better than those of
Lucene-based libraries built on top of the JVM.

For the low-level core search engine, like Lucene (inverted indexes, TF-IDF
ranking, etc.), there's Tantivy [1]
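
To make the inverted-index and TF-IDF terms above concrete, here is a minimal sketch in Python. It uses the plain textbook TF-IDF formula, which is an assumption for illustration and not the scoring that Lucene or Tantivy actually ship; the `DOCS` data and the function names are made up.

```python
import math
from collections import Counter, defaultdict

# Made-up corpus: doc id -> text.
DOCS = {
    1: "solr is a search server built on lucene",
    2: "elasticsearch is a search server built on lucene too",
    3: "tantivy is a search library written in rust",
}


def build_inverted_index(docs):
    """Map each term to {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.split()).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Score documents by summed TF-IDF over the query terms, best first."""
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Rare terms get a high weight; a term in every document gets 0.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = build_inverted_index(DOCS)
results = search(index, len(DOCS), "rust search")
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Here "search" occurs in every document, so it contributes nothing to the score, and the document containing "rust" ranks first.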

For the middle tier (ES) there's Sonic [2]

MeiliSearch [3] is a play for the hipster open-source Algolia. It's in Rust,
is MIT licensed, supports self-hosting, and has an out-of-the-box web interface.

[1] [https://github.com/tantivy-search/tantivy](https://github.com/tantivy-search/tantivy)
[2] [https://github.com/valeriansaliou/sonic](https://github.com/valeriansaliou/sonic)
[3] [https://github.com/meilisearch/MeiliSearch](https://github.com/meilisearch/MeiliSearch)

~~~
tekkk
Cool, very interesting! Thanks for the summary. I had/have a project that
currently uses Solr for quick text search, yet the integration has not been
totally painless. Also, it's kind of ridiculous how much RAM even the tiniest
instance uses.

I myself picked Solr (instead of ES) as it was recommended as a better bet
for my use case, but I did not fully research what other options there were.

Rust seems like a smart choice, as it probably is for this type of very fast
processing. I guess this is yet another reason to learn it.

~~~
gifflar
I also found a nice overview in the MeiliSearch documentation [1]. Maybe
you'll find it interesting.

[1]
[https://docs.meilisearch.com/resources/comparison_to_alterna...](https://docs.meilisearch.com/resources/comparison_to_alternatives.html#about-meilisearch)

------
gengstrand
I remember load testing both ES and Solr about 3 years ago. What was tested
was the indexing of randomly generated 500-word documents and keyword-based
search. When it came to indexing, Solr was faster but ES had better
throughput. When it came to querying, Solr was the clear winner.

[https://glennengstrand.info/software/performance/elasticsear...](https://glennengstrand.info/software/performance/elasticsearch/solr)

~~~
scaryclam
I tend towards the rule of thumb that if you want search, Solr is what you
use; if you want logging, Elasticsearch is the way to go.

------
based2
[http://mail-archives.apache.org/mod_mbox/www-announce/202008...](http://mail-archives.apache.org/mod_mbox/www-announce/202008.mbox/%3CCAD4GwrNogj2iRP5UQUf87ORedkrAtp2oR%3Dsca8RVjdOP9M5L0Q%40mail.gmail.com%3E)
Apache Lucene 8.6.1 released

------
dimitar
Does Solr have what the HFT guy calls 'catastrophic typing' in ES?

[https://thehftguy.com/2020/08/04/the-differences-between-spl...](https://thehftguy.com/2020/08/04/the-differences-between-splunk-kibana-and-graylog/)

~~~
koffiedrinker
Yes, since you have to define the type of each field. If you have a managed
schema (i.e. you don't define a schema but let Solr auto-create it), Solr will
pick the type for you. Any document with a field not matching the type will be
rejected.
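
A toy illustration of that behaviour in Python. This is not Solr's actual field-guessing code, only a sketch of the idea that the first value seen fixes a field's type and later mismatches are refused; the class and exception names are hypothetical.

```python
class TypeMismatch(Exception):
    """Raised when a document's field value does not match the recorded type."""


class GuessingSchema:
    """Toy schema that locks each field to the type of the first value it sees."""

    def __init__(self):
        self.field_types = {}

    def add(self, doc):
        # Validate the whole document before recording any new field types,
        # so a rejected document leaves the schema unchanged.
        for field, value in doc.items():
            expected = self.field_types.get(field)
            if expected is not None and type(value) is not expected:
                raise TypeMismatch(
                    f"field {field!r} is {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        for field, value in doc.items():
            self.field_types.setdefault(field, type(value))


schema = GuessingSchema()
schema.add({"id": "a1", "status": 200})       # "status" is now locked to int
try:
    schema.add({"id": "a2", "status": "OK"})  # str does not match int
    rejected = False
except TypeMismatch as err:
    rejected = True
    print(f"rejected: {err}")
```

The catch is that whichever document happens to arrive first decides the type, which is why defining field types explicitly tends to be less surprising.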

------
based2
[https://linuxsecurity.com/advisories/deblts/debian-lts-dla-2...](https://linuxsecurity.com/advisories/deblts/debian-lts-dla-2327-1-lucene-solr-security-update-22-06-48?rss)

~~~
teddyh
More canonical source:

[https://lists.debian.org/debian-lts-announce/2020/08/msg0002...](https://lists.debian.org/debian-lts-announce/2020/08/msg00025.html)

