How did you figure this out? I've done lots of Linux software build troubleshooting as a result of using Gentoo, Buildroot, and pacaur, but this doesn't ring any bells as a common issue.
Did you try spaCy's most_similar method? It's written in Cython, so it is presumably quite fast as well. Thanks for the Rust implementation though, I will most likely use this.
I’ve not much to say on the actual lib, it seems great! However, don’t feel compelled to put all your Rust code into a single lib.rs. You can split your work into several files and use ‘pub use’ and ‘mod’ in lib.rs to re-export your functions & types into a public API of your choosing.
cargo check and format time might also slightly improve!
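For anyone who hasn't seen that pattern, here is a minimal sketch of it. The names (`distance`, `vocab`, `dot_product`, `Entry`) are made up for illustration and are not taken from this library. The modules are written inline so the snippet fits in one piece; in a real crate, `mod distance;` would load `src/distance.rs` instead.

```rust
// lib.rs: private modules hold the implementation, `pub use` picks the public API.
// All names here are hypothetical, not taken from the library under discussion.

mod distance {
    /// Dot product of two equal-length slices.
    pub fn dot_product(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

mod vocab {
    /// A word paired with its embedding.
    pub struct Entry {
        pub word: String,
        pub vector: Vec<f32>,
    }
}

// Callers see `dot_product` and `Entry` at the crate root
// without knowing which module (or file) they live in.
pub use distance::dot_product;
pub use vocab::Entry;
```

The module layout can then change freely without breaking users, since only the re-exported names are part of the public API.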
Funny, I often say the opposite. Don't feel compelled to split up your lib.rs. It's really refreshing to see a nice, compact library in one or two files. Much easier to follow, especially over "type per file". Of course, there are limits, but for a small lib like this, I personally would keep it in a single, or maybe two files.
I have a fair bit of experience writing Rust code and the current status is totally deliberate. I find module file sizes of about 400-800 lines of code optimal in terms of my ability to find things vs the unnecessary complexity of having to skip around files when changing something that touches an API boundary.
This webpage uses a significant amount of CPU constantly for no apparent reason (as far as I can see, it is mostly a static webpage). What the hell? Is it mining crypto in the background?
Sorry, this page had a useEffect/setState render loop. We are running react@experimental with concurrent mode, and missed the error. Rolling out a fix now. Thanks!
These results are less accurate than Google Translate, but they are far faster to get and far less expensive to generate (see https://cloud.google.com/translate/pricing). Our goal here is speed: we want to search through many possibilities as quickly as possible.
The word vectors have been aligned in multiple languages. Using an approximate nearest neighbor search we are able to find the nearest vector to the input in multiple languages very quickly.
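To make the idea concrete, here is a rough sketch of the lookup on aligned vectors. Note that it is an exact linear scan using cosine similarity, not the approximate index the example actually relies on; an approximate search trades a little accuracy to avoid visiting every vector. The function names and the toy 2-d vectors in the usage are invented for illustration.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero length (norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact nearest neighbour by linear scan (assumption: a stand-in for
/// the approximate search). Because the vectors are aligned across
/// languages, `query` can come from English while `vocab` is Italian.
fn nearest_word<'a>(query: &[f32], vocab: &'a [(&'a str, Vec<f32>)]) -> Option<&'a str> {
    vocab
        .iter()
        .map(|(word, vector)| (*word, cosine_similarity(query, vector)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
        .map(|(word, _)| word)
}
```

With a hypothetical Italian vocabulary such as `[("salve", vec![0.9, 0.1]), ("gatto", vec![0.1, 0.9])]` and an English query vector of `[1.0, 0.0]`, `nearest_word` picks "salve", because its vector points in nearly the same direction as the query.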
To keep the example simple, we did not try to filter the data through hand-built language dictionaries. Instead, we simply drop words in other languages that also appear in the English .vec file. Words like "ciao" appear frequently enough in otherwise English sentences that the example code drops them from Italian, so they are not shown in the results:
One improvement would be to filter out any words that do not appear in a hand-curated dictionary, instead of filtering out words that already appear in English. We decided not to show how to do this because we had already introduced a few concepts, like aligned word vectors and approximate nearest neighbour searches, and wanted to keep the example as simple as possible.
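The difference between the two filters can be sketched roughly as follows. This is not the example's actual code: the function names and the word sets are assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Current behaviour as described above: drop any candidate that also
/// appears in the English vocabulary (this is what removes "ciao").
fn drop_english_overlap<'a>(candidates: &[&'a str], english: &HashSet<&str>) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|word| !english.contains(word))
        .collect()
}

/// Suggested improvement: keep only candidates found in a hand-curated
/// dictionary for the target language (hypothetical dictionary).
fn keep_curated<'a>(candidates: &[&'a str], curated: &HashSet<&str>) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|word| curated.contains(word))
        .collect()
}
```

With candidates "ciao", "salve" and "hello", the first filter loses "ciao" whenever the English .vec file happens to contain it, while the second keeps it as long as the curated Italian dictionary lists it.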
The Italian "auguri" means "best wishes"; "chiamatemi" means "call me". Neither is a plausible translation of "hello". The obvious one, "ciao", is missing.
I thought Hello was invented with the telephone. Prior to that, English greetings were good morning/evening. What do Italians and French say when they pick up the phone? Allora?
No, "bonjours" exists (it's simply the plural form of "bonjour" used as a noun), but the contexts in which it is used are very infrequent, so it's weird to find it in that list.
It seems to be a very domain-specific solution: they are trying to suggest alternative words when a customer requests a domain name that is already taken.
Like you type in “stargazer.com”, the system sees it’s already registered, and returns a “sorry sir it’s taken” page, with similar words listed as “but maybe try these words: astronomer, observatory, telescope, shooting star...”.
So it’s not serious translation, more of an inexpensive quick dictionary search. I guess it’s okay for its intended purposes.
Can something like this be done to compare/translate subsequences of the COVID genetic code against SARS and other virus genetic codes? It would be interesting to see how much overlap there is, and it would further the research into where it came from.
Bioinformaticists have been able to do that with traditional algorithms for years (dynamic programming gets you a long way, for example to compute an edit distance).
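For reference, the dynamic programming approach to edit distance looks roughly like this. It is a plain Levenshtein distance with unit costs, which is an assumption on my part; real alignment tools use scoring matrices and gap penalties rather than a flat cost of 1 per edit.

```rust
/// Levenshtein edit distance between two byte sequences, keeping only
/// two rows of the dynamic programming table in memory.
fn edit_distance(a: &[u8], b: &[u8]) -> usize {
    // prev[j] = distance between the first i bytes of `a` and the first j bytes of `b`.
    // For i = 0 that is simply j insertions.
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for (i, &ca) in a.iter().enumerate() {
        let mut curr = Vec::with_capacity(b.len() + 1);
        // Turning i + 1 bytes of `a` into an empty string takes i + 1 deletions.
        curr.push(i + 1);

        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr.push(substitute.min(delete).min(insert));
        }
        prev = curr;
    }

    prev[b.len()]
}
```

For example, `edit_distance(b"ACGT", b"AGT")` is 1, since deleting the `C` is enough. The cost is proportional to the product of the two lengths, which is part of why heuristic tools are used for whole-genome searches.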
It sounds like you're thinking of "sequence alignment", which is a pretty standard bioinformatics tool.
BLAST (Basic Local Alignment Search Tool) is one common version, and the NIH's NCBI has a variety of nice online tools here: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Note that it does take a little bit of background knowledge to interpret: some motifs are just really common, others are shared.