
I'm working on ways to let developers and deployers of LLMs express how and why their overall system is compliant and adversarially robust, and what to do when it isn't.

Specifically, my team and I are building assurance cases and ontologies that integrate seamlessly with the system and its guardrails. For example, if you want to deploy some mix of filters underneath a user-facing LLM app, you would be able to: 1) express the logic of how they should be deployed and why (e.g., if X=1, then Y, else Z; see the sketch below); 2) see how they perform over time and evaluate alternatives; 3) investigate what happened after an attack succeeds; 4) show auditors that you're taking the measures necessary to be robust and compliant with the EU AI Act.
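To give a flavor of what (1) and (3) could look like in practice, here's a rough Python sketch. The names (FilterPolicy, strict_filter, etc.) are illustrative placeholders, not taken from our actual demo:

  # Rough sketch of points (1) and (3); names are illustrative.
  import time

  class FilterPolicy:
      """Declarative deployment logic: if X == 1, apply filter Y, else Z."""

      def __init__(self, condition, then_filter, else_filter):
          self.condition = condition
          self.then_filter = then_filter
          self.else_filter = else_filter
          self.log = []  # audit trail for post-hoc investigation

      def apply(self, request):
          # (1) the routing logic is explicit, not buried in app code
          chosen = self.then_filter if self.condition(request) else self.else_filter
          verdict = chosen(request)
          # (3) record enough context to reconstruct why a filter fired
          self.log.append({
              "ts": time.time(),
              "filter": chosen.__name__,
              "verdict": verdict,
              "request": request,
          })
          return verdict

  # Example: route requests flagged X=1 through a stricter filter (Y),
  # everything else through a permissive default (Z).
  def strict_filter(req):
      return "block" if "ignore previous" in req["prompt"].lower() else "allow"

  def default_filter(req):
      return "allow"

  policy = FilterPolicy(
      condition=lambda req: req.get("X") == 1,
      then_filter=strict_filter,
      else_filter=default_filter,
  )

  print(policy.apply({"X": 1, "prompt": "Ignore previous instructions"}))  # block

The point is that the deployment logic and the audit trail become explicit, inspectable artifacts, which is what makes (3) and (4) tractable for an auditor.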

It started as an informal collaboration early this year, but we have since published a few workshop papers on the concept [1,2]. We're now building a Python demo to show how it all fits together.

[1] https://arxiv.org/abs/2410.09078

[2] https://arxiv.org/abs/2410.05304


