Hacker News new | past | comments | ask | show | jobs | submit | ianbicking's comments login

Lots of broken links in the doc, though I guess the YAML file specifies everything: https://github.com/open-llm-initiative/open-message-format/b...

The metadata tokens is a string [1]... that doesn't seem right. Request/response tokens generally need to be separated, as they are usually priced separately.

It doesn't specify how the messages have to be arranged, if at all. But some providers force system/user/assistant/user... with user last. But strict requirements on message order seem to be going away, a sort of Postel's Law adaptation perhaps.

Gemini has a way of basically doing text completion by leaving out the role [2]. But I suppose that's out of the standard.

Parameters like top_p are very eclectic between providers, and so I suppose it makes sense to leave them out, but temperature is pretty universal.

In general this looks like a codification of a minimal OpenAI GPT API, which is reasonable. It's become the de facto standard, and provider gateways all seem to translate to and from the OpenAI API. I think it would be easier to understand if the intro made it more clear that it's really trying to specify an emergent standard and isn't proposing something new.

[1] https://github.com/open-llm-initiative/open-message-format/b...

[2] https://ai.google.dev/gemini-api/docs/text-generation?lang=r...


hey @ianbicking - thanks a lot for the feedback. I've merged a change to fix the links [1].

> The metadata tokens is a string [1]... that doesn't seem right. Request/response tokens generally need to be separated, as they are usually priced separately.

For the metadata you are right. Request and response tokens are billed separately and should be captured accordingly. I've put a PR to address that [2]

> It doesn't specify how the messages have to be arranged, if at all. But some providers force system/user/assistant/user... with user last. ...

We do assume that last message in the array to be from user. But we are not forcing it at the moment.

[1] https://github.com/open-llm-initiative/open-message-format/p...

[2] https://github.com/open-llm-initiative/open-message-format/p...


I've hit cross-LLM-compatibility errors in the past with message order, multiple system messages, and empty messages.

Multiple system messages are kind of a hack to invoke that distinct role in different positions, especially the last position. I.e., second to last message is what the user said, last message is a system message telling the LLM to REALLY FOLLOW THE INSTRUCTIONS and not get overly distracted by the user. (Though personally I usually rewrite the user message for that purpose.)

Multiple user messages in a row is likely caused by some failure in the system to produce an assistant response, like no network. You could ask the client to collapse those, but I think it's most correct to allow them. The user understands the two messages as distinct.

Multiple assistant messages, or no trailing user message, is a reasonable way to represent "please continue" without a message. These could also be collapsed, but that may or may not be accurate depending on how the messages are truncated.

This all gets even more complicated once tools are introduced.

(I also notice there's no max_tokens or stop reason. Both are pretty universal.)

These message order questions do open up a more meta question you might want to think about and decide on: is this a prescriptive spec that says how everyone _should_ behave, a descriptive spec that is roughly the outer bounds of what anyone (either user or provider) can expect... or a combination like prescriptive for the provider and descriptive for the user.

Validation suites would also make this clearer.


Yeah, I can completely see this, the goal of this was to be specifically for the messages object, and not a completions object, since in my experience, you usually send messages from front end to backend and then create the completion request with all the additional parameters when sending from backend to an LLM provider. So when just sending from an application to the server, trying to just capture the messages object seemed ideal. This was also designed to try and maximize cross compatibility, so it is not what the format "should be" instead, it is trying to be a format that everyone can adopt without disrupting current setups.

Huh, that's a different use case than I was imagining. I actually don't know why I'd want a standard API from a frontend and backend that I control.

In most applications where I make something chat-like (honestly a minority of my LLM use) I have application-specific data in the chat, and then I turn that into an LLM request only immediately before sending a completion request, using application-specific code.


Well, in the case of the front-end (like streamlit, gradio, etc) they send conversational messages in their own custom ways - this means I must develop against them each specifically, and that slows down any quick experimentation I would want to do as a developer. This is the client <> server interaction.

And then the conversational messages sent to the LLM are also somewhat unique to each provider. One improvement for simplicity purposes could be that we get a standard /chat/completions API for server <> LLM interaction and define a standard "messages" object in that API (vs the stand-alone messages object as defined in the OMF").

Perhaps that might be simpler, and easier to understand

OM


This is a very QAnon way of responding to news. Just add in "who's pulling the strings?" and you are pretty much there. You make no assertions, don't engage with the news itself at all, but just "ask questions".

Ahh yes, internet-connected mass schizophrenia thanks to abusive leaders, anonymous/artificial internet commenters breaking humans. They team up on bad ideas trying to get their justice. Buckle up for more of this.

You missed the initial step where they engaged with the news enough to process it and realise that rich people paying taxes they owed was in some way bad for their political team and needed to be undermined.

The actor list you have is so... cringe. I don't know what it is about AI startups that they seem to be pulled towards this kind of low brow overly online set of personalities.

I get the benefit of using celebrities because it's possible to tell if you actually hit the mark, whereas if you pick some random person you can't know if it's correct or even stable. But jeez... Andrew Tate in the first row? And it doesn't get better as I scroll down...

I noticed lots of small clips so I tried a longer script, and it seems to reset the scene periodically (every 7ish seconds). It seems hard to do anything serious with only small clips...?


Thanks for the feedback! The good news is that the new V2 model will allow people to create their own actors very easily, and so we won't be restricted to the list. You can try that model out here: https://studio.infinity.ai/

The rest of our website still uses the V1 model. For the V1 model, we had to explicitly onboard actors (by fine-tuning our model for each new actor). So, the V1 actor list was just made based on what users were asking for. If enough users asked for an actor, then we would fine-tune a model for that actor.

And yes, the 7s limit on v1 is also a problem. V2 right now allows for 30s, and will soon allow for over a minute.

Once V2 is done training, we will get it fully integrated into the website. This is a pre-release.


Ah, I didn't realize I had happened upon a different model. Your actor list in the new model is much more reasonable.

I do hope more AI startups recognize that they are projecting an aesthetic whether they want to or not, and try to avoid the middle school boy or edgelord aesthetic, even if that makes up your first users.

Anyway, looking at V2 and seeing the female statue makes me think about what it would be like to take all the dialog from Galatea (https://ifdb.org/viewgame?id=urxrv27t7qtu52lb) and putting it through this. [time passes :)...] trying what I think is the actual statue from the story is not a great fit, it feels too worn by time (https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...). But with another statue I get something much better: https://6ammc3n5zzf5ljnz.public.blob.vercel-storage.com/inf2...

One issue I notice in that last clip, and some other clips, is the abrupt ending... it feels like it's supposed to keep going. I don't know if that's an artifact of the input audio or what. But I would really like it if it returned to a kind of resting position, instead of the sense that it will keep going but that the clip was cut off.

On a positive note, I really like the Failure Modes section in your launch page. Knowing where the boundaries are gives a much better sense of what it can actually do.


Very creative use cases!

We are trying to better understand the model behavior at the very end of the video. We currently extend the audio a bit to mitigate other end-of-video artifacts (https://news.ycombinator.com/item?id=41468520), but this can sometimes cause uncanny behavior similar to what you are seeing.


I do find that comfort issues (i.e., nausea) make it hard to port non-VR games to VR. Movement just has to be a lot different in VR to avoid the nausea, either more focused on teleporting or changing the basic structure of the game so that the player is still or confined to a small location and the world moves around the player.

I let my kid play a pretty simple game on the Quest that had regular joystick motion and an hour later she was sick and threw up. The effects of VR can be surprisingly delayed and non-obvious.


Not everyone gets nauseous or motion sick from VR. I think it's okay if some VR games are designed around joysticks, as long as teleport mode is still an option.


I remember in the early 90's people similarly would develop motion sickness from the FPSs of the day, such as Doom and Quake.


Different games affected me differently, when I was 13 I could play Goldeneye 4 player on the N64 for hours with my friends but the Turok multiplayer focused game I could only handle 30mins before I would get nausea. I grew up playing Doom and Quake on PC without issues. Some games still make me sick, Dark souls 1 I have no issues with but Dark souls 2 is unplayable for me. Also I can play Rocket League for hours and feel perfectly fine. The source engine in VR was the worst experience for me and now when ever I see half life 2 in motion it makes me sick.

It’s quite varied and I would like to figure out what the common denominator in all of this that triggers my nausea response.


I used not to belong to this group, but shortly after my son was born, I started to get motion sickness from a few different things, including some old games (e.g. Shadow Warrior)


Perhaps due to hormonal changes?


Does the version matter? E.g., same response with 360 DS2 VS. PS4 or Xbox One? It might be consistency of framerate or FOV.


I haven’t tried on recent consoles so all my current gaming experience is on PC. I have played Rocket League at both sub 60fps and 200+fps and with narrow and wide fov and it didn’t change anything for me so I don’t know if it’s that.


I am one of those people! I got over it once during the Doom era, then a second time during the Counterstrike era. I keep trying to get used to it a third time but it hasn't worked.

I don't know whether the games changed rendering in some way, or my monitor is too big, or it's just age.


I couldn't play Wolfenstein without getting nauseous, but for some reason doom was fine. I think ut is ok to release a video game that some people can't enjoy. We don't forbid flashing lights, and that can send some people into actual seizures!


I still get motion sick after a while when watching someone else play an FPS. If I'm playing myself there is no problem. I can also get motion sick as a passenger in a car, but have no issue driving.


This is still a thing. My partner can't play Skyrim in first-person mode, only in third-person mode, because of motion sickness.


Doom on SNES was instant motion sickness for me. Thankfully I only rented it from blockbuster!


The Meta Quest store has comfort ratings for all games, indicating the likelihood of motion sickness. Children are particularly prone to motion sickness in VR, partly because their proprioception is less developed and also because they often have narrower IPDs than the headset minimum.

There are a variety of tricks that help a great deal when using joystick-based movement controls; while they don't work for everyone, they can work for a good majority of users when implemented well. There's the obvious stuff like being absolutely fanatical about controlling motion-to-photon latency and avoiding violent and uncontrolled camera movement, but there are a variety of less-obvious stuff like using vignetting to reduce the FoV during motion and making all acceleration instantaneous.

I'm occasionally prone to motion sickness when playing some flat-screen games, but I've had no issues with games like Onward and Ancient Dungeon on the Quest.


The current VR mod defaults to teleport mechanics. Most people can become tolerant to camera motion over time.


When I worked with people who had been writing code for VR for a long time a large portion of them didn't use VR much because of nausea issues. I don't think tolerance is as easy to achieve as people think, and can get worse over time instead of better.

I think some people become tolerant of a mild unpleasantness. But just as many people end up finding the entire experience unpleasant for reasons that aren't obvious to them (but I suspect are related to nausea), leading to VR's very poor retention.


Many people even get nausea from playing non-VR first person shooters, or things like too much screen shake. Nowadays most games have a slider to increase the field of view, which decreases nausea, but for a long time it simply wasn't known that this is an issue. There are some problems which people don't tend to complain about, they just avoid products which cause those issues, apparently "for reasons that aren't obvious to them".


I’d note the experience is not dissimilar to motion sickness, which despite being present in some humans, hasn’t stopped us from using cars or boats. Use of cars and boats allows people to become adapted to the experience through the amazing plasticity of the human brain.

Now, the use cases for cars and boats might be more compelling for sure, and maybe for those that experience this discomfort its benefits don’t outweigh the unpleasantness of the adaptive period, but I’d note motion sickness isn’t universal, and for smaller motions like automobiles (probably more similar to VR than boats which are an extreme), not even that common.


Figure skating is evidence that one can "train" one's vestibular system to ignore the mixed signals.

Now that I think of it, an anti-nausea VR training program would be excellent to treat motion sickness.


Is it? There are not that many figure skaters in the grand scheme of things. I’d assume there’s some element of self-selection going on along with the training.


Figure skating doesn't induce mixed signals I believe. The mixed signals from VR presumably come from the inner ear accelerometer giving different signals than the visual system.


in my experience most people need to use VR intermittently for about 2 weeks to fully adjust

i really struggled with vr initially but having used my quest 1 since pre pandemic it's fine and no longer have issues when using it for an hour or two


I don't know if you know this but kids throw up for many reasons besides VR nausea. Are you sure she didn't throw up for some other reason?


This is awesome! I have been thinking about what I'd want in an open source product after feeling unhappy trying to mess around with Bluetooth devices and overriding the assistant on an Android.

I really think an open source experience is going to be the only way this specific area will advance (wearable voice assistants). Apple/Google/Amazon are always going to very conventional in how they think about the purpose of their products, how personalized they can be, how much they can be expected to understand the user.

Looking at the Apple prompts, it's notable how uninteresting they are. There is no real theory of function, no sense of relationship or roles. They are just letting that all default to some unspecified common sense (as found in the model), handling only the surface level of these interactions. And they don't appear to bake that into the model because that wouldn't be enough, because those deeper interactions require state that seems pretty clearly to not be specified. Anyway, I'm really going off on a prompting tangent.

I think there is _really_ deep stuff people could be creating using these building blocks. The kinds of developments that are a synthesis of modified personal behavior and the tools provided. A tool this powerful is being wasted (theoretically and right now in actuality) if you don't modify behavior when using it. But that's a terrible way to make a commercial product, you can't expect people to change for you. And so they create these very bland experiences that are the projection of their current apps onto a voice or AI interface.

And they aren't wrong to take this conservative approach... it's very boring but very rational. I think this is a particularly opportune moment for people with their own very personal and specific ideas about how integrate AI into a particular part of their life to try to actually build that experience, with an authentic goal of just improving their own life. An open source stack makes that possible... including the device, because Google and Apple just won't let you use a phone that way.

So this is very exciting! My dev kit is ordered, and I await it eagerly


As an apparently excited user of this device, what do you think about the privacy concerns - both for the users themselves, but more importantly for people who interact with the users?


Just like with the prompt, I'm not thinking right now about what this is for everyone. I want something I can use myself, for myself. I will figure out what I think is appropriate in terms of privacy and social acceptability as I use it, and if I get it wrong that will be on me.

With respect to recording I'll also be thinking about what kinds of uses are responsible. The existence of a recording doesn't mean I have to use it or store it. I honestly can't recall a time when, if I had been continuously recording, I would have used that recording against anyone present. I would expect to be as respectful of the privacy of people I interact with as I am now... I don't recount what people say to me now without considering if that what they said might have been in confidence, without considering how what they said might be interpreted differently by a different audience or out of context, and without passing on my most good faith interpretation of what they said. That's a complicated rule system, but it does actually fire when I recount other people's statements.

But I'll also have to navigate how I use it, understand what things it captures that I don't want it to, and how that affects the people around me.

Also I just want to see what's possible, without pre-censoring what's appropriate before we know how any of this stuff works in practice. I'm willing to take the risk it's all a bad idea and I'll soon think of it as a dead end.


OpenRouter.ai does this pretty well. It has a pretty reasonable flow. It supports enough free models that you can do at least _something_ without a person having to pay anything (even if quality is a bit iffy). They have user-set spending limits, and each key is bound to the service. Their privacy policies seem decent.

The other option is Gemini, which has enough free credits to really be free for any small app. Unfortunately you can't use that through OpenRouter, you need your own developer account to be able to take advantage of Gemini's free offering.


If they are putting people on shift who are too tired to competently do their job, I assume many of those jobs aren't actually important. Some of the jobs are important (and when done wrong lead to these kinds of incidents), but given how widespread sleep deprivation is from people's comments here, clearly a lot of the jobs can be done very poorly without affecting operations.

That sounds like a management issue. Congress doesn't manage how ships run. The Navy makes all those choices itself.


I noticed that too. That seemed odd at first read... after all, it has a guidance system, it's not relying on exact aim. I'm assuming it's more that its guidance system can only has so much fuel at its disposal and ability to correct errors, and if it's aimed incorrectly it would exhaust its fuel before it corrects its trajectory.


Sometimes it's less work to engineer a hard problem into an easy one, than to solve the hard problem.

Most of the tech for the Minuteman I was developed in the mid-1950s.

With that level of processing, would you rather solve a 2d problem by precisely orienting the missile before launch? Or a 3d one by requiring it to orient during flight?

Keep in mind: any equipment to self-orient in-flight also needs to be carried on the missile itself, while being tolerant of launch, acceleration, and reentry forces.

Any precision machinery at the launch site has no such requirements.


This doesn't make sense to me. I would assume the engines starting by themselves would introduce enough error to throw the entire system off. Let alone natural seismic events in the ground, plus wind.

I would guess you must solve the 3D problem at least to some degree.


I'm not a rocket scientist, but I think thrust is pretty constant at that scale. That's why they start, spool up, then release from cradle.

Vs something like a Polaris SLBM that has a much more variable guidance problem

It'd be curious to see how early ICBM and SLBM guidance systems differ.


I haven't looked at submarine systems in detail, but my understanding is that the big problem is that an ICBM knows where it's starting, but the submarine travels. So submarines have super-accurate inertial navigation systems on board to determine their position.


I was thinking more for the sequence where it broaches the surface then lights its engine.

https://m.youtube.com/watch?v=h5KejRbD5s0&t=34s

That's a lot more dynamic of a launch orientation. Which way is it rotated? Is it inclined off vertical?


This is a clever idea but if it gets over the line I am very pessimistic about the supreme court allowing it.

I feel like it could go okay if it's in place for a presidential election, and that election has a majority of popular and electoral votes. But if it came into effect and the next election was ambiguous based on whether this compact stands or not, the result would be chaotic at best.

As mentioned in the article, getting congressional approval, and then letting that process go through the courts before enactment, might at least make the deployment reasonable, even if congressional approval may not be strictly required.


>This is a clever idea but if it gets over the line I am very pessimistic about the supreme court allowing it.

You could've stopped at "I'm pessimistic about the Supreme Court". It's completely dysfunctional now.

We need a Supreme Court reform if we want to keep the nation.

> the result would be chaotic at best.

As it already was in the past (Bush vs. Gore), and as I can absolutely assure it will be in 2024 regardless of anything else going on.

>As mentioned in the article, getting congressional approval, and then letting that process go through the courts before enactment, might at least make the deployment reasonable, even if congressional approval may not be strictly required.

Out of the two major parties in the US, one is pushing for National Popular Vote, and the other is famous for sticking for states rights, which is what National Popular Vote Compact is all about (not having the Congress dictate it from the top).

Surely they would support it... right?


A few more conservative states have signed onto the compact. I think most Republicans (individuals) would support it, the leadership is all realpolitik but normal folk do believe in the basic idea of democracy, especially something simple to explain like this. Republicans won the legislature in Florida the same year people voted to give felons voting rights... the electorate is more principled on these things than the parties. (I suppose that makes sense, it doesn't actually affect the electorate nearly as much as it affects elected officials.)


> We need a Supreme Court reform if we want to keep the nation.

More like you need the US house and senate to legislate and coming to stupid party deadlocks, gamesmanship, and worrying what an unpopular vote means to future electability.

The supreme court of the US gained so much power because the legislature stopped legislating.

Source: studied American civics in school


The president is not elected until the Electoral College vote. Whatever the newspaper publish before that is only an educated guess.


I LOVE tagged templates in JavaScript.

But in Python I could also imagine YET ANOTHER constant prefix, like t"", that returns a "template" object of some sort, and then you could do html(t"") or whatever. That is, it would be just like f"" but return an object and not a string so you could get at the underlying values. Like in JavaScript the ability to see the original backslash escaping would be nice, and as an improvement over JavaScript the ability to see the expressions as text would also be nice.

But the deferred evaluation seems iffy to me. Like I can see all the cool things one might do with it, but it also means you can't understand the evaluation without understanding the tag implementation.

Also I don't think deferred evaluation is enough to make this an opportunity for a "full" DSL. Something like a for loop requires introducing new variables local to the template/language, and that's really beyond what this should be, or what deferred evaluation would allow.


Cool to see you jump in, Ian.

I don't particularly mind the prefix thing. It came up in the PEP discussion, as did choice of backticks to indicate this is different. But JS template literals -> tagged template literals shows, you can get from A to B without a fundamental change.

I'm very interested though in the deferred part. I agree that there is complexity. I weigh that, though, against the complexity of existing Python HTML templating, where finding out what just happened is...harder.

I think we can get a TSX-level of DX out of this. And maybe a Lit level of composition. Agree that it is non-zero complexity.


Hey Paul!

I think JSX is an example of the somewhat crude but practical use of simple execution patterns. For instance if you have a loop you do:

    return <ol>
      {items.map((item, i) => <li key={i}>{item}</li>)}
    </ol>;
Which isn't really templating at all, but just the ability to use inline expressions and easily construct objects does get the job done.

Or in a SQL builder with JavaScript tagged templates, I do:

    exec(sql`
      SELECT * FROM a_table
      WHERE category = ${category}
        ${subcategory ? sql`AND subcategory=${subcategory}` : sql``}
    `)
That is, I nest tagged templates to handle different logic conditions and loops.

If there's deferred execution, it's done with ?: and .map() – though these very long expressions don't work nearly as well in Python. (List comprehension is in some ways better than .map()/.filter(), but not for very large expressions like in a JSX template.)


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: