Every continuous shot lasts no more than five to ten seconds. It's not a "give-away" as such, but it's certainly a tell. r/aivideo is chock-full of this crap.
This is more like a Claude-based skill set that orchestrates a bunch of different, separate systems. The closest equivalent to Trellis would probably be its use of Hunyuan3D to generate some of the 3D object models.
From what I can tell, it takes an image and first segments it into objects versus environment, then sends the environment to Marble 1.1 to generate a Gaussian splat and sends each isolated object to Hunyuan3D to generate a GLB model file.
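The split-and-route step described above can be sketched as follows. This is a minimal illustration of the routing logic only; the job names (`marble_gaussian_splat`, `hunyuan_glb`) are invented stand-ins, not real APIs for Marble or Hunyuan3D.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One segmented region of the input image."""
    label: str
    is_environment: bool

def route(segments):
    """Send the environment to one splat job and each object to its own GLB job."""
    env = [s for s in segments if s.is_environment]
    objs = [s for s in segments if not s.is_environment]
    jobs = []
    if env:
        # The whole environment becomes a single Gaussian-splat generation job.
        jobs.append(("marble_gaussian_splat", [s.label for s in env]))
    # Each isolated object becomes its own 3D-model generation job.
    jobs += [("hunyuan_glb", [s.label]) for s in objs]
    return jobs
```

So `route([Segment("room", True), Segment("chair", False)])` yields one splat job for the room and one GLB job for the chair.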
> Here’s what my nephew and I did when we got confused: we picked up the piece and looked at the base. Each figurine has a small chess symbol printed on the base. Chewbacca is a knight. The Stormtrooper is a pawn. Problem solved.
The real question: how is Chewbacca (Wookiees rip your arms out of their sockets when they lose) not a rook? Shouldn't the knights be... hmm, I don't know, Jedi Knights like Skywalker, Obi-Wan, etc.?
And yes, more related to the article: actuators that can manipulate the world are why it can be interesting to give LLMs access to a domain of commands that map to actions in the world (even a virtual world model, for example).
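A minimal sketch of what "a domain of commands that map to actions" could look like: a dispatch table over verbs the model is allowed to emit. All command names and the world-state shape here are invented for illustration.

```python
def make_world():
    """Toy virtual world state."""
    return {"position": [0, 0], "held": None}

def move(world, dx, dy):
    """Move the agent by an integer offset."""
    world["position"][0] += int(dx)
    world["position"][1] += int(dy)

def grab(world, item):
    """Pick up a named item."""
    world["held"] = item

# The command vocabulary exposed to the model.
COMMANDS = {"move": move, "grab": grab}

def dispatch(world, line):
    """Map one whitespace-separated command line from the LLM to an action.

    Unknown verbs return an error string the model can read and correct."""
    verb, *args = line.split()
    if verb not in COMMANDS:
        return f"error: unknown command {verb!r}"
    COMMANDS[verb](world, *args)
    return "ok"
```

The point of the closed vocabulary is that the model can only act through verbs you have defined, and malformed commands produce feedback instead of side effects.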
I was literally just about to say this. It's similar to what a lot of really good teachers do: students use them as a "rubber duck," and the teacher responds as an almost Socratic guide.
I've actually used that approach during my years teaching ESL - self-discovery often leads to the most persistent (long-term) lessons.
Smaller models might not make the best agentic coding assistants, but I have a 128 GB RAM headless machine serving llama.cpp with a number of local models; it handles various tasks on a daily basis and works great.
- Qwen3-VL:30b > A file watcher on my NAS sends new images to it; it auto-captions them, writes the text description into a hidden EXIF field in the image, and adds an entry to a Qdrant vector database for lossy searching and organization.
- Gemma3:27b > Used for personal translation work (mostly English and Chinese). Haven't had a chance to try out the Gemma4 models yet.
- Llama3.1:8b > Performs sentiment analysis on texts / comments / etc.
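The per-image step in that first pipeline (caption → EXIF field + vector-DB entry) might look roughly like this. The captioning and embedding are assumed to happen elsewhere (e.g. via Qwen3-VL and an embedding model); the Qdrant payload field names here are my own invention, not the commenter's schema.

```python
def exif_user_comment(caption: str) -> bytes:
    """Encode a caption as an EXIF UserComment value (tag 0x9286).

    The value starts with an 8-byte character-code prefix; "UNICODE\\0"
    is commonly followed by UTF-16 text."""
    return b"UNICODE\x00" + caption.encode("utf-16-be")

def qdrant_point(image_path: str, caption: str, vector):
    """Shape of one point for a Qdrant upsert (field names assumed)."""
    return {
        "id": abs(hash(image_path)) % (2 ** 63),  # Qdrant IDs: unsigned int or UUID
        "vector": list(vector),
        "payload": {"path": image_path, "caption": caption},
    }
```

A real setup would write the bytes with an EXIF library and push the point with `qdrant-client`; this just shows the data that flows between the stages.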
I run the biggest quant because it's more capable; the Spark has enough memory for two Qwen instances at 8-bit with full context length (roughly 48 GB each).
I find Gemini/Gemma have become worse at coding. They're better for non-coding tasks, but maybe not even that: in my experience, both hallucinations and instruction following have degraded.
It's been great for me. I have a secondary PC that's been running Windows 10 LTSC IoT for 5 years now. I’m still getting security updates but nothing else (that's a feature to me).
The only time I had an issue was when a DAW installer required me to upgrade to 22H2. I grabbed the enablement package directly and used the DISM tool to install it.
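For anyone stuck on the same upgrade, the DISM step looks roughly like this (run from an elevated prompt; the `.cab` filename is a placeholder for whichever enablement package you downloaded):

```
DISM /Online /Add-Package /PackagePath:C:\Downloads\22H2-enablement-package.cab
```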