Hacker News | new | past | comments | ask | show | jobs | submit | mooreed's comments

Post reminds me of the absolutely lovely Monty Python sketch [1] that is not only humorous but also has a lot you can learn from.

[1]: https://www.youtube.com/watch?v=ohDB5gbtaEQ


This project is the first that tickles my brain in the right serendipitous ways; it merges topics of my recent interests.

However, after a VERY short perusal, I grew a giant sense of empathy for non-native English speakers. The README is gentle enough for English speakers (it's 95%+ English), yet I still felt like I was muddling through, renaming tokens in my mind as I went. That quickly showed me two things.

1. It reminds me why I never seem to finish classic Russian literature… I so often get lost in the introductory parade of names, each one a cache miss against my usual set of names.

2. This is perhaps a significant cultural muscle that English speakers have never needed to exercise, since the world has largely been using English (in some capacity) for significantly longer than my lifespan. As my favorite joke goes in the punch line: "What do you call someone who only knows one language? Uni-lingual… jk: American."

PS: it seems like there could be an open registry maintained by "Americans like me" who would rather pre-process the code for tokens within the docs and src… a "DefinitelyTyped-style" definitions registry would be very niche, but SUPER useful.


> as my favorite joke says in the punch line “what do you call someone who only knows one language… uni-lingual… jk: American”

"Another thing to keep in mind, when you get to feeling bad about being monolingual, is that the fair question is not 'how many languages do you know?' It is, 'of the languages spoken by five million people or more within a thousand miles or so of where you live, what percentage do you know?'"

https://structuredprocrastination.com/light/biling.php


> of the languages spoken by five million people or more within a thousand miles or so of where you live, what percentage do you know

By that metric you shouldn't feel bad for not speaking Russian in most of Russia, or for not knowing the most common languages in your immediate surroundings in large swaths of Africa (i.e. most Bantu languages would be excluded).


The essay is a bit tongue in cheek, but I think that was basically the point.


> of the languages spoken by five million people or more within a thousand miles or so of where you live, what percentage do you know?

IRL interactions are just one aspect of life. Pretty important, sure, but it’s not the only important thing.


Lots of the heavily multilingual people in the world also have a lot of irl interactions that necessitate knowing languages other than their mother tongue. Of course that’s not the only reason to learn languages, but it is both common and effective. So in terms of expected number of languages spoken I think it’s a good baseline.

In the case of the US, the languages that would meet that criterion in most parts of the country would be English and Spanish. But there are also hierarchies around languages: people who speak the more dominant languages are less likely to speak the less dominant ones, while speakers of the less dominant languages are expected to speak the more dominant ones and suffer higher consequences if they don't.


And the joke about the heavily multilingual people is that they speak nine languages, none of them fluently.


I prefer to view it as "what's the likelihood that, with your current knowledge of languages, you're able to communicate with any person you may presumably want to speak to in the future."

This is why it's so much easier to only speak English than it is to only speak another language.


> This is why it's so much easier to only speak English than it is to only speak another language.

It’s pretty easy to only speak German within the DACH countries. Huge online communities as well that speak German.

I’d wager it’s similar for several other large languages, e.g. Spanish or Chinese; on the one hand they are even larger, on the other hand they probably don’t have the same advanced dubbing industry that we have.


I am Spanish; we have a very strong dubbing industry and pretty much all movies have been dubbed since forever. In fact nowadays we usually get two dubs, one for Spain and one for LATAM, with serious online fights about which one is better xD


Oh interesting; I’d heard that almost nowhere are people as attached to dubs as here, where people often don’t really care about the original voices but rather about the dubbers, who might even get movie billing.

> one for Spain and one for LATAM

What’s the difference there for someone who is only bilingual (English and German)? :D


There's a bit of variety in the vocabulary used between LATAM and Spain, and honestly even between LATAM countries there's variance.

An example that comes to mind, as a first-year Spanish student (I'm doing my best, but fact-check me because I'm very much still learning!) with a Latina wife, is "el carro," which means car and is common in some Spanish-speaking countries, while others use "el coche" -- I believe this dialectal difference even exists within Latin American Spanish-speaking countries!

There are differences in pronunciation too, but that obviously doesn't apply to subtitles.


Regarding "car", there is a third option: "auto", which if I am not mistaken, is the preferred word in the Southern Cone.

But yes, it is mostly a difference of vocabulary, accent and pronunciation, and also how or if to translate the names of characters and the movies themselves.


As a Southern Conesman, can confirm.


Nice project.

I also would love to hear more about the cluster shapes and cardinality of the coordinate system. I consider myself pretty well versed in data analysis, but with less expertise in NLP topics (e.g. t-SNE).

So a quick blurb like: the units on the axes in the graph are "a reduced embedding space," designed to preserve structure while reducing the dimensionality so that the clusters can be plotted on screen…

(I’m not even sure that’s correct, but I would have loved for you to have informed me on the one sentence visualization choice and then point me to t-SNE.)

Overall nice project - and it reminds me of a painful professional analysis lesson I have had to re-learn more than once.

> After working for NN hours on an analysis, and finally breaking through and completing it, overlooking the title and labels is the biggest footgun I have ever dealt with.


Feels like a spiritual successor to the ksuid [1] lib, which has very similar use cases; I first heard of it used in conjunction with DynamoDB.

[1]: https://github.com/segmentio/ksuid


Does anyone know of a typescript translation for each of those validation models?

Or maybe even a way to discover related statically typed definitions based on the validation rules?

It would be really nice not to have to hand-define the parts of a data model that provide little to no business value - but where you can easily "stub your toe".


Use quicktype: https://quicktype.io/


Awesome tool, thank you for sharing. I made use of it already!


Most languages have some code-generation tool requiring a compile step, but most of the specs in here change infrequently enough that you can just generate once and commit the output to version control. I personally have a use case where I modify the Meltano (ETL tool) spec at runtime and use a generated schema to validate reads and writes to the file, helping catch bugs early.


You could use this [0] package, but you would need to download the schema first into a folder, say "schemas", and then add a build step as a script in your package.json ('"compile-schemas": "json2ts -i schemas -o types"') to export the types to a "types" folder.

[0] json-schema-to-typescript
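As a sketch of what that pipeline produces (the `User` schema below is hypothetical, purely for illustration): given a small JSON Schema in the schemas folder, json2ts emits roughly a plain interface like this, which then type-checks reads and writes at compile time.

```typescript
// Hypothetical input, schemas/user.json:
// { "title": "User",
//   "type": "object",
//   "properties": { "id": { "type": "string" }, "age": { "type": "integer" } },
//   "required": ["id"],
//   "additionalProperties": false }

// `json2ts -i schemas -o types` would emit roughly this into the types folder:
export interface User {
  id: string;
  age?: number; // JSON Schema "integer" maps to TypeScript's `number`
}

// The generated type catches shape mistakes at compile time:
const ok: User = { id: "abc123" };   // fine: `age` is optional
// const bad: User = { age: 30 };    // compile error: `id` is required
```

The nice part is that the schema stays the single source of truth; the interface is a disposable build artifact you can regenerate whenever the schema changes.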


If it's a one-off you can just use http://borischerny.com/json-schema-to-typescript-browser/ or https://transform.tools/json-schema-to-typescript (they both use the same library).


I've been asking chatgpt to do it for me.


At the risk of sounding clichè/unhelpful.

> “you can solve every problem by adding or removing an abstraction layer”

Have you considered generating your YAML/JSON config with something that composes?

If you are open to it you might be interested in dhall [1] as it’s a config language with variables, functions and imports.

I have used it for pet projects and I could see how it could offer some tidy encapsulation patterns for larger, more complicated production applications.

[1]: https://dhall-lang.org/
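For a taste of what that composition looks like (a minimal sketch; the service names and record fields here are made up, not from any real project): Dhall lets you define a function once and apply it per case, so repeated YAML blocks collapse into one definition.

```dhall
-- A reusable constructor: change the shape once, every service follows.
let mkService =
      \(name : Text) ->
      \(replicas : Natural) ->
        { name = name
        , replicas = replicas
        , image = "registry.example.com/${name}:latest"
        }

-- Tools like `dhall-to-yaml` or `dhall-to-json` render this as plain config.
in  [ mkService "api" 3, mkService "worker" 1 ]
```

Because the output is ordinary YAML/JSON, the consuming application never needs to know Dhall was involved.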


Cliché is spelled with é, rather than è, which makes the word sound more like cleesh, not cleeshay.


Actually, the French spelling and pronunciation is cliché [kliché] (accent aigu, abrupt stop), but the English pronunciation is better achieved with è [klishay] (accent grave, extended).

Note that there are no words in French that end with è.


This is when I feel the need to interject and recommend the "macron" (nothing to do with the president of the same name!) [1]

[1] https://fr.wikipedia.org/wiki/Macron_(diacritique)



This is the first rule of French: there is at least one exception to every rule.

And this probably includes the first rule itself.


I'd say it's not really a French word but a transliteration of the Ancient Greek word ἡ κοινὴ


#til


Honest question: isn't the future of AI image generation (of all kinds) and AI-driven chat all predicated on some human signal for future training? Otherwise the future training sets will include the models' own output…

So far the human signal was that text and images were crafted by humans, of course with some tooling.

However, as the universe of images and text corpora grows, it will embody the idiosyncratic nature of the AI generation process. And those "glitches," as they've been called higher up in the thread, will potentially get fed back into the training set without some filter using a human signal, even if just an AI-derived human signal.

See https://nopecha.com/ (I’m not affiliated) and plenty others.


At first I was amused to consider what 'POST, PUT, DELETE' means for this API. Then I realized that the conceptual meanings are straightforward. Then I realized that democracy is a fancy authorization regime for doling out 'write access'.

(Added edit) I too agree with the sentiment that a v2 should have finer-grained resolution, where the data model would recognize a bill as a collection of clauses or statements. And attribution and intent of those clauses would be transformative (I think) for regular citizens.

Because then the internet would be able to track unintended consequences by author. This is arguably what the internet does best.


I love the scientific discussion here, but much of this seems to miss the idea of “millennial minimalism”. The preference tends towards plain/simple buildings so that nature can have the main stage.

Have no fear: Gen Z is the pendulum swinging back towards radical '90s colors.


I highly recommend the accompanying YouTube demo video [1] - but only if you enjoy a hacker dad singing falsetto to propel his son's toy train faster around the track.

[1] https://www.youtube.com/watch?v=t65X-cs55qM


Almost sounds like the Sprach Zarathustra. Almost. https://www.youtube.com/watch?v=Szdziw4tI9o


You forgot the "Also". Not having it is like calling Nietzsche's novel "Spoke Zarathustra".


My bad! Thanks for the correction. Unfortunately, can't edit now.


This video is amazing.

