
I’m building https://polyglatte.com, a language-learning website (and app) for learning with real content like YouTube videos, subtitles, and text articles. It’s ideal for intermediate and advanced learners.

Some neat features: we fully parse the text and part-of-speech tag it; we have a system that prepares you for difficult articles/videos using clips from easier ones; and the core idea scales really well to more types of content, like chat messages and images (not public yet).
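
For a sense of what the parsing/tagging step looks like, here's a minimal sketch using spaCy -- an assumption for illustration, not necessarily our actual stack (it also assumes the small French model is installed):

  import spacy

  # Hypothetical illustration of part-of-speech tagging, not Polyglatte's
  # actual pipeline. Requires: pip install spacy, then
  # python -m spacy download fr_core_news_sm
  nlp = spacy.load("fr_core_news_sm")
  for tok in nlp("Je bois un café."):
      print(tok.text, tok.lemma_, tok.pos_)  # e.g. "bois boire VERB"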

Our focus is now shifting to onboarding and marketing. A few users have figured out how to really use Polyglatte, and they love it and use it a lot, but most users leave without seeing most of the value we can provide.

It’s a self-funded two-person project and we haven’t monetized it at all yet.

I have no problem charging for it, but before that would make sense I need to figure out how to help people really understand the value we provide, and to reach more people.


Wow! This is really cool; I will definitely be using this for learning French. Thank you!

Just one question: is there a way to mark words as known by default? I’m at an intermediate level and don’t really want to tell the program that I know all the rather simple words... or does that not really matter?


Glad you're enjoying it!

Currently the best way is the "mark known" button at the bottom of articles, which marks all remaining words as known. On the word list view you can also select many words and mark them all known at once. But we're also working on integrations now to help make the onboarding experience easier and leverage existing study workflows.

The main challenge is that in Polyglatte a word is defined as (lemma [the root word], part of speech, language), and most other systems don't use part of speech, so there's often no 1-to-1 mapping when importing. We've held off on building much here so far, but it is definitely a solvable problem. Is there a specific place/format you would want to import that data from? If not, then perhaps we can add a "mark word list known" button and add some French word lists.
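
To make the mismatch concrete, here's a toy sketch (all names are made up for illustration, not our actual code) of importing a plain word list into a vocabulary keyed the Polyglatte way:

  from collections import defaultdict

  # Toy vocabulary keyed as (lemma, pos, lang) -> known?
  vocab = {
      ("savoir", "VERB", "fr"): False,
      ("savoir", "NOUN", "fr"): False,  # "le savoir" (knowledge)
      ("chat", "NOUN", "fr"): False,
  }

  def import_wordlist(words, lang):
      """Mark every part-of-speech variant of each imported word as known."""
      by_lemma = defaultdict(list)
      for key in vocab:
          lemma, pos, l = key
          if l == lang:
              by_lemma[lemma].append(key)
      for w in words:
          for key in by_lemma.get(w, []):
              vocab[key] = True  # one imported word can flip several entries

  import_wordlist(["savoir"], "fr")  # marks both the verb and the noun known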

> I’m at an intermediate level and don’t really want to tell the program that I know all the rather simple words... or does that not really matter?

That's a good question. Because of the way typical word-frequency distributions work, the problem becomes less annoying pretty quickly if you mark words as you see them. But solving this problem is also an important part of making Polyglatte fun and a smooth experience for new users. Helping people spend more time enjoying and using their languages is one of our major goals with Polyglatte -- bulk-marking words you already know isn't very fun.
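
To give a rough sense of why it fades quickly: under a Zipf-like distribution (a standard rough model for word frequencies; the numbers below are illustrative, not measured from our corpus), a few thousand of the most frequent words already cover most running text:

  # Illustrative Zipfian model over a 50,000-word vocabulary.
  weights = [1 / rank for rank in range(1, 50_001)]
  total = sum(weights)
  for known in (100, 1_000, 5_000):
      coverage = sum(weights[:known]) / total
      print(f"top {known} words cover {coverage:.0%} of tokens")
  # prints roughly 45%, 66%, and 80% under this model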

Feel free to reach out on Discord or our new community forum (https://community.polyglatte.com/) for help or for feature requests by the way -- we'd love to make Polyglatte better for you and everyone else.


I’m working on a website for intermediate learners to practice by reading and listening, and it supports Japanese. Japanese is my strongest second language, so I can answer a bit about why support for it is so uncommon.

Japanese is just really, really hard for computers to deal with. The only reason I got parsing and word segmentation to be pretty good was that I was so familiar with the language and wrote a 3,000-line post-processing function on the tokens to get reasonable results. We have a few similar post-processing steps, like one to better handle separable verbs in German, but it’s nothing compared to what we needed for Japanese.
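
To give a flavor of it, here's a toy example of the kind of rule involved (not the actual function; the POS labels are illustrative): analyzers split verb + auxiliary chains like 食べている into 食べ / て / いる, which learners think of as one word, so a post-processing pass can merge them back:

  def merge_aux_verbs(tokens):
      """tokens: (surface, pos) pairs from a morphological analyzer."""
      merged = []
      for surface, pos in tokens:
          # Fold particles/auxiliaries onto the preceding verb.
          if merged and pos in ("AUX", "PART") and merged[-1][1] == "VERB":
              prev_surface, _ = merged.pop()
              merged.append((prev_surface + surface, "VERB"))
          else:
              merged.append((surface, pos))
      return merged

  print(merge_aux_verbs([("食べ", "VERB"), ("て", "PART"), ("いる", "AUX")]))
  # -> [('食べている', 'VERB')]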

Additionally, Japanese kind of breaks our word model, despite our being aware of that and planning for it from the start, and every part of the app needs special logic to support Japanese properly.

It’s a lovely language -- honestly my favorite language I’ve spent time with -- but it’s non-trivial to handle in general with code.

Happy to answer any questions. Also, self-promo: https://polyglatte.com is my project. Happy to make improvements to better support you / the intermediate-reader use case -- just let me know.


Not wanting to sound too rude or harsh, but your tutorial appears to be very non-native Japanese. There are grammar mistakes (like ...you + verb instead of ...you ni + verb) plus weird word choices ("tango o suru" -- seriously, what does that mean?).


Thanks for the feedback! I did originally write it myself in a rush, using the English as the basis, but I have since had it rewritten by a native Japanese speaker -- I will have that done again.

> ("tango o suru" -- seriously, what does that mean?)

Thanks for pointing this out, there's actually a mismatch between the JSON fixture used to load this article into the DB and the raw text from which that JSON should have been generated: there's a missing token there. Frankly it's just good (or perhaps bad) fortune that it broke in such a way that it kind of made sense without that word there.

I will get that fixed up, along with a general rephrasing/look-over by a native speaker.


That looks good.

It would be great if you could align it with the JLPT, including the listening, which I think is the hardest part.

I'm useless at learning languages, basically foreign language dyslexic. I've found targeting the tests & applying for them in advance was the only thing that allowed me to progress.

They're a huge motivator: not just the looming-deadline effect, but also that passing makes you feel good, as it's a genuine asset as much as it is proof of progress.


Thanks!

Yeah, we have JLPT word lists available: you can set them as a focus and learn their words from clips/snippets of videos and articles -- perhaps there's deeper integration we could do there too.

I agree that the JLPT is a useful tool; I personally used its word lists as a core vocab builder when I was studying a lot of Japanese.

> I'm useless at learning languages, basically foreign language dyslexic.

Don't be too hard on yourself! If you were able to pass the JLPT at any level, I think you're doing great.

One of the motivating ideas of Polyglatte is to make language learning more fun. I'm a big believer in mass exposure and quality time spent with the language. I think for most people, most of the time, just being able to have fun with the language will give you great results.

If you're looking for new ways to improve your listening, consider watching some YouTube videos, not with the intention of understanding everything but with the intention of having a good time. For me, listening skill largely came out of nowhere: I gave my brain a bunch of stimulus (Japanese YouTube videos I'd watch for hours every day, even if I didn't really understand them), and then I woke up one day and I just understood most of it -- or at the very least, it felt like my brain was suddenly able to keep pace.

Anyway, the most important part of that system is finding videos you want to watch even if you don't fully understand them -- videos you'd want to watch even if they weren't helping you develop a skill.


Ukraine has captured a lot of Russian equipment during the recent Russian retreats.

I’ve even seen it claimed in the past few months that Russia was, at points, Ukraine’s number-one supplier of equipment, due to how much they left behind. [1]

If you’re interested in this topic and have some time, I’d recommend https://youtu.be/sNLTE75B0Os

[1]: https://www.wsj.com/articles/ukraines-new-offensive-is-fuele...


I've been wanting to make a Datalog tagging system for a few things for a while now but don't have the energy to actually do it. Essentially, the idea is to encode relationships, allowing for very specific queries like "show me pictures of a person wearing a green hat looking at another person", which is not something most tagging systems could reasonably do.

Breaking that down, that'd be something like:

  wearing(person1, hat), is_hat(hat), is_green(hat), is_person(person1), is_person(person2), looking_at(person1, person2).

I wanted to apply this to Brazilian Jiu-Jitsu videos to answer very specific queries like "matches where player 1 gets a takedown, gets swept by player 2, and player 2 wins by submission". A sufficiently well-tagged data set would let you find specific stories and sequences of events in a way that I don't think a non-computational query system could.
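
A minimal sketch of how such a query could run, assuming matches are tagged as timestamped (event, actor, target) facts -- all relation and field names here are made up:

  facts = [
      # (match_id, time, event, actor, target)
      ("m1", 10, "takedown", "p1", "p2"),
      ("m1", 55, "sweep", "p2", "p1"),
      ("m1", 90, "wins_by_submission", "p2", None),
  ]

  def find_story_matches(facts):
      """Matches where P1 gets a takedown, is later swept by P2,
      and P2 wins by submission -- a hand-rolled Datalog-style join."""
      hits = set()
      for (m1, t1, ev1, a1, _) in facts:
          for (m2, t2, ev2, a2, b2) in facts:
              for (m3, _, ev3, a3, _) in facts:
                  if (m1 == m2 == m3
                          and ev1 == "takedown"
                          and ev2 == "sweep" and b2 == a1 and t2 > t1
                          and ev3 == "wins_by_submission" and a3 == a2):
                      hits.add(m1)
      return hits

  print(find_story_matches(facts))  # -> {'m1'}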

Most of the effort and value in a system like this would be building a community of people to tag the data and tools to make that tagging easy... and perhaps a more user-friendly query UI.


You might be interested to know that medical information is described in the way you propose: SNOMED CT [0] uses a standardized set of "relationships" between "concepts".

[0] https://en.wikipedia.org/wiki/SNOMED_CT?wprov=sfla1


> There are two issues -- (1) feeding copyrighted material in to an AI model, and (2) getting copyrighted material out.

> The latter is obviously a violation of copyright, full stop.

It's not obvious to me that (2) is a violation of copyright. Unlike patent infringement, copyright infringement is not as simple to prove. My understanding is that, at least in the US, independent creation is a valid defense against copyright infringement: for example, if two people independently write the same story and can prove that they did, they can both hold copyright over that story.

An analogue to this exists without AI: when creating something that might look like copyright infringement, clean-room design (don't look at similar things) is often used to ensure that "independent creation" holds up as a defense in court. Given that, I think (1) is probably not safe to do at all if you can't prevent (2).


Get Stable Diffusion to output Mickey Mouse and see how far you can use that commercially without Disney stomping down on you hard.

Outputting copyrighted material is a violation of copyright, period. Whether that violation is enforceable depends on your means, though.


And why is Mickey Mouse not in the public domain as of 2022? Therein lies the root of all these questions. The system is not designed to benefit people, but rent-seeking.


While I agree that copyright terms are unreasonably long, it's not relevant to this specific case.


> > There are two issues -- (1) feeding copyrighted material in to an AI model, and (2) getting copyrighted material out.

> > The latter is obviously a violation of copyright, full stop.

> It's not obvious to me that (2) is a violation of copyright. Unlike patent infringement, copyright infringement is not as simple to prove. My understanding is that, at least in the US, independent creation is a valid defense against copyright infringement: for example, if two people independently write the same story and can prove that they did, they can both hold copyright over that story.

> An analogue to this exists without AI: when creating something that might look like copyright infringement, clean-room design (don't look at similar things) is often used to ensure that "independent creation" holds up as a defense in court. Given that, I think (1) is probably not safe to do at all if you can't prevent (2).

I don't think the analogue holds: the AI has a direct view of the actual code. In the most paranoid clean-room design you have two teams; one analyses the behaviour of some software and writes a specification (without viewing the source code), and the other then uses that spec to write the reimplementation.

Copilot turns that on its head: you ask it to do something, it looks up the source code for how to do it, and it gives that to you.


> For example if 2 people independently write the same story and can prove that they did, they can both hold copyright over that story.

This is the theoretical case, but I don't think I've ever seen it actually happen in practice.


Yeah, it's pretty much exclusively used incorrectly. The original meaning was that an exception to a rule proves that such a rule exists. For example, "no parking on Sunday" is an exception that implies that parking on any day except Sunday is okay.

https://en.wikipedia.org/wiki/Exception_that_proves_the_rule

It honestly doesn't bother me as much as when people incorrectly say "begs the question" when they mean "raises the question", but I think we're long past the point of ever correcting any of these.


One note about parking: normally all actions (including parking) are allowed unless they are explicitly forbidden, so the absence of any other rule or guidance implies that parking is allowed.


That’s a fairly generic statement -- have you confirmed that rule holds in every city/state/country across the globe?


I'm working on a B2C product as an indie founder and I run into this quite a lot. From my perspective, though, a lot of the "physical vs. non-physical" disconnect is also about the fact that people who haven't had direct experience creating a software product are unable to extrapolate from what they're currently seeing.

I think that's actually completely reasonable, though; people take things at face value. But it means that all the potential in the world means nothing unless you can show, via features, why that potential is a good thing and what its impacts are.

This affects the product I'm working on because, on the surface, a lot of it looks similar to other products in the space, while the real value is the integration of systems and data to fundamentally change the interactions. And until recently, we didn't have many of the "big picture ideas" on display for our users.


I’m actively working on immersion-based language-learning software at https://polyglatte.com . It’s currently completely free, with no way to pay for it, but I still feel some guilt about self-promotion, even though I get lots of positive feedback and it’s already helping people.

I find marketing very uncomfortable, so I’ve been neglecting it, but I know it’s the only way to any kind of success.


In Taipei, and in Taiwanese cities generally as far as I've seen, it's the same! The main benefit is just how convenient they are. For example, there's a place in Taipei from which you can walk to 72 different 7-11s in 10 minutes [1] -- and that's just 7-11; there are tons of FamilyMarts and some smaller stores too.

My personal favorite food to get there is the chicken breast and a protein drink: around 45g of protein for about $3.50 USD, great on the way home from the gym.

The biggest and strangest difference I've noticed between Taiwanese and Japanese konbini is that you can't buy vegetarian or vegan onigiri in Taiwan, despite there being a ton of vegetarian and vegan people here. My favorite flavors, which are vegan, are only sold in Japan.

Also, to give more context on the function of konbini in Taiwan (and probably Japan): they're for a lot more than just food. I pay my bills there at the counter, occasionally pick up packages (they act like PO boxes), and print out documents when I need a printer. So really, it's a multi-function store that's open 24/7, has pretty decent food, and is on every street.

[1]: https://international.thenewslens.com/article/68424


Just an anecdote, but I live in Taiwan (Taipei proper) and haven't noticed or been affected by any power issues today.


Agreed. I'm also in Taipei (Shilin) and equally no power issues.

The cascading outage left less than 25% of the island without power overall, and for most of those impacted it lasted less than two hours.

That said, some areas, parts of Miaoli for example, are still without power.


Yes but we won't hear from people who have lost power :)

Glad you're not affected.


Power has been restored for many of those affected, and many others were on rolling blackouts, so it's possible people in either group could be posting.

