The page doesn't say: how well does it handle very large files? Also, how does it handle editing files with various encodings? Asking with Notepad++ in mind.
I use Kate all the time and IME it doesn't. It struggles even with relatively small files. Try this:
seq -f"foo %100.100f" 1 1000 > /tmp/test1000 && kate /tmp/test1000
...and watch Kate struggle with a ~100KB file. It seems to be related to the text layout or something along those lines, i.e. it isn't because of whatever data structure it uses for text; modern PCs should be able to handle the naivest of naive text editor structures without breaking a sweat, even when editing a few MBs. The slowness is triggered whenever text is about to be displayed (e.g. when scrolling) or when lines are added/removed. Modifying a single line is fine, but if you move to, say, the middle of the file, press Enter, and hold it down, you'll see it struggle to add new lines.
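To put a rough number on the "naive structures are fine" claim, here's a quick Python sketch (nothing to do with Kate's actual internals, just an illustration) that times mid-buffer line insertions on the most naive editor buffer imaginable, a flat list of lines:

    import time

    # ~5 MB of text as a flat list of lines - the naivest possible buffer.
    lines = ["foo " + "0" * 100 for _ in range(50_000)]

    start = time.perf_counter()
    for _ in range(1_000):
        # Worst-ish case for a list: insert in the middle of the buffer.
        lines.insert(len(lines) // 2, "new line")
    elapsed = time.perf_counter() - start

    print(f"1000 mid-buffer inserts: {elapsed * 1000:.1f} ms total")

On any modern machine this finishes in milliseconds, so raw data-structure cost can't explain second-long hiccups.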
I'm only using it for editing text files and source code, so it rarely bothers me personally, but it can be annoying sometimes.
What do you consider a struggle? Here this works fast and flawlessly with Kate, VS Code, vim... Though I did not use /tmp but $HOME. I doubt that would make a difference, unless your KDE is broken and Kate struggles because of KIO when opening the file.
Choppy scrolling, taking up to a second to add a line in the middle of the text, etc. As I wrote, the issue was with scrolling and adding/removing lines - so most likely text layout stuff. The file I/O was fine.
Could it be, then, that the layout or styling for JSON is performant while the one for random text is not? But then again, what's there actually to style or lay out in random text?
Hi! This is awesome for its size and quality. I want to see a book-reading example or try it myself.
This is a tangent, but it would have been nicer if it weren't a Notion site. You could put the same page on GitHub Pages and it would be much lighter to open, navigate, and link to (e.g. for people trying to link to some audio).
Thanks for the kind words!
You can try it now on https://huggingface.co/spaces/nari-labs/Dia-1.6B
Also, we'll try to update the Demo Page to something lighter when we have time. Thanks for the feedback :))
How???? I can believe the guy in the video is AI because his lips are not perfectly synced. But the woman? Even with the continuous, silly, exaggerated movement, I have a hard time believing she's generated.
A strand of her hair fell on her shoulder, and because she was moving continuously (like crazy) it was moving too, in a perfectly believable way, and IT EVENTUALLY FELL OFF THE SHOULDER/SHIRT LIKE REAL HAIR and got mixed into the other fallen hair. How is that generated? It's too small a detail. Are there any artifacts on her side?
Edit: she has to be real. Her lip movements are definitely forced/edited, though. It has to be a video recording of her talking, with a tool/AI then modifying her lips to match the voice. If you look at her face and hand movements, her shut lips seem forced.
Nah, having used HeyGen a bit, it's extremely clearly a HeyGen generation. There's a small number of movements and expressions that it continually uses (in forward and reverse).
Edit: I mean, to be clear, it is a real person, just like the author's video is. The way HeyGen works is you record a short clip of yourself saying some stuff, and then you can generate long videos like these of you saying whatever you want. So the stuff you noticed does come from a real video of her, but it's not a real video that's lightly edited by AI; it's more like the AI has a bunch of clips it can continually mesh together, fixing up the mouth as it generates video.
If it's a bunch of clips meshed together, the AI isn't doing a good job of meshing in the silence pose. I looked at the HeyGen site; these are probably called interactive avatars. The woman's throat is moving (as well as her hands) as if she is talking, but her lips are shut. Whatever they are doing, they are not handling silence/listening very well.
14/20. Increased brightness to full, turned off the blue light filter (aka Night Light on Android), zoomed in, and scrolled the circles up and down to dissipate the afterimage in my eyes: 19/20.
I use voidtools Everything on Windows for instant file lookup. It has an HTTP server built in. Whenever a browser complains about a feature only being available via a webserver URL, not a local file, it comes in handy: open the Everything web server, enter the name of the file, and click.
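If you'd rather script against it than click around, here's a minimal sketch. I'm assuming the server is enabled (Tools > Options > HTTP Server) on its default port 80, and the query parameters below (search, json, count, path_column) match the voidtools docs as I remember them; they may differ across versions:

    import json
    import urllib.parse
    import urllib.request

    # Build a query against Everything's built-in HTTP server.
    params = urllib.parse.urlencode({
        "search": "report.pdf",  # whatever you'd type into the Everything box
        "json": 1,               # JSON results instead of the HTML page
        "path_column": 1,        # include full paths in the results
        "count": 10,             # cap the number of results
    })
    with urllib.request.urlopen(f"http://localhost/?{params}") as resp:
        data = json.load(resp)

    for item in data.get("results", []):
        print(item.get("path", ""), item.get("name", ""))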
> They recorded over 300 of these observations, including what the caller was doing at the time, what was happening in the environment and the behaviour of the caller and audience after the vocalisation.
> To reveal the meaning of each call, they used a technique from linguistics to create a cloud of utterance types, placing vocalisations that occurred in similar circumstances closer together. “We kind of established this dictionary,” says Berthlet. “We have one vocalisation and one meaning.”
This is a lot of manual effort; could the recent advancements in language models help decode animal languages more easily? I guess it would need lots of 24/7 capture of physical movement/action and sound data, and then training a model (one that already understands spoken English too, perhaps).
I definitely think you're touching on some exciting possibilities, but adding a language model at this early stage would endanger the goal of this particular research: proving that the compositionality exists in the first place. If a foundational language model were involved, it might be reading patterns into the calls regardless of whether they're really there; that is what it's designed to do, after all!
Re: "lots of work", I think you're misunderstanding the quotes a bit. They applied PCA to categorical data to generate semantic positions for each call type; in other words, they ran a prewritten mathy algorithm on a big CSV. Collecting the CSV data in the first place certainly sounds extremely hard, but that's more of a practical issue than a scientific one! Bonobos aren't known for living in easy-to-reach places ;)
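For anyone curious what "PCA on categorical data" might look like in practice, here's a toy sketch with made-up numbers (not the paper's actual data or pipeline): encode each call type by how often it occurs in each context, then project the usage profiles down to 2D so calls used in similar circumstances land near each other:

    import numpy as np
    from sklearn.decomposition import PCA

    call_types = ["peep", "yelp", "grunt", "whistle"]
    contexts = ["feeding", "travel", "grooming", "alarm", "play"]

    # Rows: call types; columns: observation counts per context (invented).
    counts = np.array([
        [12, 1,  0,  0, 3],   # peep: mostly feeding
        [ 0, 9,  1,  0, 2],   # yelp: mostly travel
        [ 2, 1, 10,  0, 4],   # grunt: mostly grooming
        [ 0, 1,  0, 14, 0],   # whistle: mostly alarm
    ], dtype=float)

    # Normalize rows to usage profiles, then project to a 2D "semantic" plane.
    profiles = counts / counts.sum(axis=1, keepdims=True)
    positions = PCA(n_components=2).fit_transform(profiles)

    for call, (x, y) in zip(call_types, positions):
        print(f"{call:>8s} -> ({x:+.2f}, {y:+.2f})")

Calls with similar context profiles end up with nearby coordinates, which is the "cloud of utterance types" idea from the article.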
Making models of the physical world is a lot of work. Can't they install cameras and record hundreds of thousands of hours of objects getting shot out of cannons, birds flying, and trees swaying in the wind? Maybe with some more nuclear power plants they could get close to approximating something like Newton's Laws of Motion (kind of close is good enough, no need to be nerdy about it).