This project is the first that tickles my brain in the right serendipitous ways; it merges topics from my recent interests.
However, after even a VERY short perusal, I grew a giant sense of empathy for non-native English speakers. The readme is gentle enough for English speakers (i.e. 95%+ English), yet I still felt like I was muddling through, renaming tokens in my mind as I went. That quickly showed me two things.
1. It reminds me why I never seem to finish classic Russian literature… I so often get lost in the introductory parade of names, which are a cache miss against my usual set of names.
2. This is perhaps a significant cultural muscle that English speakers have never needed to exercise, since the world has largely been using English (in some capacity) for significantly longer than my lifespan. As my favorite joke goes: “What do you call someone who only knows one language? Uni-lingual… jk: American.”
PS: It seems like there could be an open registry maintained by “Americans like me” who would rather pre-process the code, translating the tokens in the docs and src… a “DefinitelyTyped-style” definitions registry would be very niche, but SUPER useful.
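To make that idea concrete, here is a minimal sketch of what a registry entry and a preprocessor pass could look like. Everything below (the map shape, the names, the sample tokens) is my own invention, not anything that exists:

```typescript
// Hypothetical registry entry: a per-project token map that a
// preprocessor could apply to identifiers in docs and src before reading.
// All names here are invented for illustration.
const tokenMap: Record<string, string> = {
  // "original identifier": "English rendering"
  "usuario": "user",
  "crearCuenta": "createAccount",
  "facturación": "billing",
};

// A trivial preprocessor pass: replace whole-word identifier matches.
function anglicize(source: string): string {
  return Object.entries(tokenMap).reduce(
    (text, [from, to]) =>
      text.replace(new RegExp(`\\b${from}\\b`, "gu"), to),
    source,
  );
}

console.log(anglicize("function crearCuenta(usuario) { ... }"));
// -> "function createAccount(user) { ... }"
```

Real identifier renaming would of course want a proper parser rather than regexes, but the registry data itself could stay this simple.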
> As my favorite joke goes: “What do you call someone who only knows one language? Uni-lingual… jk: American.”
"Another thing to keep in mind, when you get to feeling bad about being monolingual, is that the fair question is not 'how many languages do you know?' It is, 'of the languages spoken by five million people or more within a thousand miles or so of where you live, what percentage do you know?'"
> of the languages spoken by five million people or more within a thousand miles or so of where you live, what percentage do you know
By that metric you shouldn't feel bad for not speaking Russian in most of Russia, or for not knowing the most common languages in your immediate surroundings in large swaths of Africa (i.e. most Bantu languages would be excluded).
Lots of the heavily multilingual people in the world also have a lot of IRL interactions that necessitate knowing languages other than their mother tongue. Of course that’s not the only reason to learn languages, but it is both common and effective. So in terms of the expected number of languages spoken, I think it’s a good baseline.
In the case of the US, the languages that would meet that criterion in most parts of the country are English and Spanish. But there are also hierarchies among languages: people who speak the more dominant languages are less likely to speak the less dominant ones, while speakers of the less dominant languages are expected to speak the more dominant ones and suffer higher consequences if they don’t.
I prefer to view it as "what's the likelihood that, with your current knowledge of languages, you're able to communicate with any person you may presumably want to speak to in the future."
This is why it's so much easier to only speak English than it is to only speak another language.
> This is why it's so much easier to only speak English than it is to only speak another language.
It’s pretty easy to only speak German within the DACH countries. There are huge German-speaking online communities as well.
I’d wager it’s similar for several other large languages, e.g. Spanish or Chinese. On the one hand they are even larger; on the other hand they probably don’t have the same advanced dubbing industry that we have.
I am Spanish; we have a very strong dubbing industry and pretty much all movies have been dubbed since forever. In fact nowadays we usually get two dubs, one for Spain and one for LATAM, with serious online fights about which one is better xD
Oh interesting. I had heard that it’s almost never the case elsewhere that people are as attached to dubs as they are here, where people often don’t really care about the original voices but rather about the dubbers, who might even get billing for the movie.
> one for Spain and one for LATAM
What’s the difference there for someone who is only bilingual (English and German)? :D
There's a bit of variety in the vocabulary used between LATAM and Spain, and honestly even between LATAM countries there's variance.
An example that comes to mind, as a first-year Spanish student (I'm doing my best, but fact-check me because I'm very much still learning!) with a Latina wife, is "el carro," which means car and is common in some Spanish-speaking countries, while others might use "el coche." I believe this dialectal difference even exists among Latin American Spanish-speaking countries!
There are differences in pronunciation too, but that obviously doesn't apply to subtitles.
Regarding "car", there is a third option: "auto", which if I am not mistaken, is the preferred word in the Southern Cone.
But yes, it is mostly a difference of vocabulary, accent and pronunciation, and also how or if to translate the names of characters and the movies themselves.
I would also love to hear more about the cluster shapes and the cardinality of the coordinate system. I consider myself pretty well versed in data analysis, but with less expertise in NLP topics (e.g. t-SNE).
So a quick blurb like: the units on the axes in the graph are “a reduced embedding space,” designed to preserve structure while reducing the dimensionality enough that the clusters can be plotted on screen…
(I’m not even sure that’s correct, but I would have loved for you to explain the visualization choice in one sentence and then point me to t-SNE.)
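For reference, and hedging that I don’t know what this particular project actually uses, the standard t-SNE formulation (van der Maaten & Hinton, 2008) shows why the axes carry no intrinsic units: the 2D points y_i are free parameters chosen only to preserve pairwise similarity structure:

```latex
% Similarities in the original high-dimensional space (Gaussian kernel,
% with the bandwidth sigma_i set per point via the perplexity parameter):
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Similarities between the plotted 2D points (heavy-tailed Student-t kernel):
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% The layout minimizes the KL divergence between the two distributions:
C = \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

So only relative cluster structure survives; the axis values themselves, and distances between far-apart clusters, are not meaningful.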
Overall, nice project. It reminds me of a painful professional analysis lesson I have had to re-learn more than once.
> After working for NN hours on an analysis, and finally breaking through and completing it, overlooking the title and labels is the biggest footgun I have ever dealt with.
Most languages have some code generation tool requiring a compile step, but most of the specs in here change infrequently enough that you can just generate once and commit the output to version control. I personally have a use case where I modify the Meltano (ETL tool) spec at runtime and use a generated schema to validate reads and writes to the file, helping catch bugs early.
You could use this [0] package, but you would need to download the schema first into a folder, say "schemas", and then add a build step as a script in your package.json ('"compile-schemas": "json2ts -i schemas -o types"') to emit the generated types into a "types" folder.
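For a concrete sketch (the sample schema and file names below are my own invention; `json2ts` is the CLI shipped by the json-schema-to-typescript package):

```typescript
// schemas/plugin.json (a made-up sample schema placed in the "schemas" folder):
//   {
//     "title": "Plugin",
//     "type": "object",
//     "properties": {
//       "name":    { "type": "string" },
//       "pip_url": { "type": "string" }
//     },
//     "required": ["name"]
//   }
//
// After `npm run compile-schemas` (i.e. `json2ts -i schemas -o types`),
// the generated file in types/ has roughly this shape:
export interface Plugin {
  name: string;
  pip_url?: string;     // optional: not listed in "required"
  [k: string]: unknown; // emitted unless additionalProperties is false
}

// Downstream code then gets compile-time checking against the schema:
const plugin: Plugin = {
  name: "tap-postgres",
  pip_url: "pipelinewise-tap-postgres",
};
```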
> “you can solve every problem by adding or removing an abstraction layer”
Have you considered generating your YAML/JSON config with something that composes?
If you are open to it, you might be interested in Dhall [1], as it’s a config language with variables, functions and imports.
I have used it for pet projects, and I can see how it could offer some tidy encapsulation patterns for larger, more complicated production applications.
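To make “composes” concrete without reproducing actual Dhall syntax from memory, here is the same idea sketched in TypeScript: variables and functions that stamp out the final JSON, with every name below invented for illustration. Dhall gives you this inside the config language itself, plus imports and a type system:

```typescript
// Shared defaults defined once (the "variables" part).
const baseService = {
  restart: "always",
  logging: { driver: "json-file" },
};

// A function that stamps out per-environment variants (the "functions" part).
function service(name: string, port: number, env: "dev" | "prod") {
  return {
    ...baseService,
    name,
    port,
    replicas: env === "prod" ? 3 : 1,
  };
}

// Compose the final config; a build step would serialize this to the
// YAML/JSON file the application actually reads.
const config = {
  services: [service("api", 8080, "prod"), service("worker", 9090, "dev")],
};

console.log(JSON.stringify(config, null, 2));
```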
Actually, the French spelling and pronunciation are cliché [kliché] (accent aigu, an abrupt stop), but the English pronunciation is better approximated with è [klishay] (accent grave, extended).
Note that there are no words in French that end with è.
Honest question: isn’t the future of AI image generation (of all kinds) and AI-driven chat all predicated on some human signal for future training? Otherwise future training sets will include the models’ own output…
So far the human signal has been that text and images were crafted by humans, with some tooling of course.
However, as the universe of images and text corpora grows, it will embody the idiosyncratic nature of the AI generation process. And those “glitches,” as they were called higher in the thread, will potentially get fed back into the training set without some filter based on a human signal, even if it is only an AI-approximated human signal.
At first I was amused to consider what ‘POST, PUT, DELETE’ means for this API. Then I realized that the conceptual meanings are straightforward. Then I realized that democracy is a fancy authorization regime for doling out ‘write access’.
Edit: I also agree with the sentiment that a v2 should have finer-grained resolution, where the data model recognizes a bill as a collection of clauses or statements. Attribution and the intent of those clauses would be (I think) transformative for regular citizens.
Because then the internet would be able to track unintended consequences by author, which is arguably what the internet does best.
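A minimal sketch of what that finer-grained model could look like (entirely hypothetical field names, not the project’s actual schema):

```typescript
// Hypothetical v2 data model: a bill as a collection of clauses,
// each carrying attribution and stated intent, so outcomes can be
// traced back per author. All names invented for illustration.
interface Clause {
  id: string;
  text: string;
  authors: string[]; // attribution: who wrote or sponsored this clause
  intent?: string;   // stated purpose, if one was recorded
}

interface Bill {
  id: string;
  title: string;
  clauses: Clause[];
}

const example: Bill = {
  id: "hr-1234",
  title: "An Act to ...",
  clauses: [
    { id: "sec-1", text: "...", authors: ["Rep. Example"], intent: "..." },
  ],
};
```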
I love the scientific discussion here, but much of it seems to miss the idea of “millennial minimalism”: the preference tends toward plain, simple buildings so that nature can have the main stage.
Have no fear: Gen Z is the pendulum swinging back toward radical ’90s colors.
I highly recommend the accompanying YouTube demo video [1], but only if you enjoy a hacker dad singing falsetto to propel his son's toy train faster around the track.
[1]: https://www.youtube.com/watch?v=ohDB5gbtaEQ