I wonder if RISC-V has a standard way to handle these custom instructions. Is there an easy way to trap them and emulate them in software, is there something like CPUID, or do you just not distribute binaries that use custom instructions?
Custom opcodes have a reserved space in the opcode map (the major opcodes named custom-0 through custom-3), but nothing prevents two different CPUs from using the same custom encoding to mean two different instructions.
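For reference, a custom slot is identified only by the low 7 bits (the major opcode) of a 32-bit instruction, so nothing in the encoding says whose extension it belongs to. A minimal C sketch of that check; the helper name custom_slot is made up for illustration, and custom-2/custom-3 are only free for custom use on RV32/RV64 (RV128 claims them).

```c
#include <stdint.h>

/* Major opcodes (bits [6:0]) reserved for custom extensions. */
enum {
    OPC_CUSTOM0 = 0x0B, /* 0b0001011 */
    OPC_CUSTOM1 = 0x2B, /* 0b0101011 */
    OPC_CUSTOM2 = 0x5B, /* 0b1011011, reserved for RV128 */
    OPC_CUSTOM3 = 0x7B  /* 0b1111011, reserved for RV128 */
};

/* Hypothetical helper: returns which custom slot (0..3) an instruction
 * word falls into, or -1 if it is not in custom space. Note that the
 * vendor is nowhere in the encoding. */
static int custom_slot(uint32_t insn) {
    switch (insn & 0x7F) {
    case OPC_CUSTOM0: return 0;
    case OPC_CUSTOM1: return 1;
    case OPC_CUSTOM2: return 2;
    case OPC_CUSTOM3: return 3;
    default:          return -1;
    }
}
```

For example, custom_slot(0x0000000B) is 0, while an ordinary addi (opcode 0x13) gives -1.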
Conceivably, you could trap the instruction, read the mvendorid/marchid CSRs to work out which core you are on (the closest thing to CPUID), and Do the Right Thing. But you usually add a custom opcode because you believe your performance or use case depends on it, and trapping to emulate it throws that performance away.
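A rough sketch of the trap-and-look-up idea, assuming the handler has already read mvendorid and marchid and passes them in (a real M-mode handler would read them with csrr). The vendor IDs, both vendors, and their instruction semantics are all hypothetical; the point is that the same custom-0 word means two different things depending on which core you are on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Core identity as read from the mvendorid and marchid CSRs. */
struct core_id {
    uint32_t mvendorid;
    uint64_t marchid;
};

/* Emulates one instruction against the integer registers x[32];
 * returns false if the encoding is not understood. */
typedef bool (*emulator_fn)(uint32_t insn, uint64_t x[32]);

/* Standard R-type field extractors. */
static unsigned rd(uint32_t i)     { return (i >> 7) & 0x1F; }
static unsigned funct3(uint32_t i) { return (i >> 12) & 0x7; }
static unsigned rs1(uint32_t i)    { return (i >> 15) & 0x1F; }
static unsigned rs2(uint32_t i)    { return (i >> 20) & 0x1F; }

static void write_rd(uint64_t x[32], unsigned r, uint64_t v) {
    if (r != 0) x[r] = v; /* x0 is hardwired to zero */
}

/* HYPOTHETICAL vendor A: custom-0, funct3=0 is multiply-accumulate. */
static bool vendor_a_custom0(uint32_t insn, uint64_t x[32]) {
    if (funct3(insn) != 0) return false;
    write_rd(x, rd(insn), x[rd(insn)] + x[rs1(insn)] * x[rs2(insn)]);
    return true;
}

/* HYPOTHETICAL vendor B: the same encoding means XOR. */
static bool vendor_b_custom0(uint32_t insn, uint64_t x[32]) {
    if (funct3(insn) != 0) return false;
    write_rd(x, rd(insn), x[rs1(insn)] ^ x[rs2(insn)]);
    return true;
}

/* Made-up IDs, for illustration only. */
static const struct {
    struct core_id id;
    emulator_fn custom0;
} table[] = {
    { { 0x111, 1 }, vendor_a_custom0 },
    { { 0x222, 7 }, vendor_b_custom0 },
};

/* Picks an emulator by core identity; false means "unknown, give up". */
static bool emulate_custom0(const struct core_id *id, uint32_t insn,
                            uint64_t x[32]) {
    if ((insn & 0x7F) != 0x0B) return false; /* not custom-0 */
    for (size_t k = 0; k < sizeof table / sizeof table[0]; k++) {
        if (table[k].id.mvendorid == id->mvendorid &&
            table[k].id.marchid == id->marchid)
            return table[k].custom0(insn, x);
    }
    return false;
}
```

Every vendor/core pair you want to support needs its own table entry, which is exactly the maintenance burden being described.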
Basically, your Option #3: "don't distribute binaries with custom instructions." If it's so important, it should get pushed through the standards process and then the binaries can be portable.
Isn't embedded a big piece of RISC-V? How important is portability? And that being said, why have a software implementation anyway? Doesn't that risk destroying performance in unexpected ways?
The core should raise an illegal-instruction exception on an unrecognized custom opcode, and the trap handler could call an emulator, just as it would if you issued a floating-point instruction on a core without the F extension. Of course, that means whoever writes the trap handler and emulator has to keep up with every vendor's custom extensions, and custom opcodes are unlikely to be unique across all RISC-V implementations. So it's theoretically possible, but you know the saying about theory and practice…
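Roughly what that handler flow looks like, as a host-runnable C sketch. The trap_frame struct stands in for the saved registers plus the mcause/mepc/mtval CSRs, and demo_custom0 is a made-up custom-0 instruction; both are hypothetical. Exception code 2 in mcause is the standard illegal-instruction cause, and mtval holds the faulting instruction bits only if the core chooses to report them, otherwise the handler has to fetch the word from mepc itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAUSE_ILLEGAL_INSTRUCTION 2 /* mcause exception code */

/* HYPOTHETICAL stand-in for what an M-mode handler sees. */
struct trap_frame {
    uint64_t mcause;
    uint64_t mepc;  /* address of the faulting instruction */
    uint64_t mtval; /* instruction bits if reported, else 0 */
    uint64_t x[32];
};

typedef bool (*emulator_fn)(uint32_t insn, uint64_t x[32]);
typedef uint32_t (*fetch_fn)(uint64_t addr); /* reads the word at mepc */

/* HYPOTHETICAL custom-0 instruction (funct3=0): rd = rs1 & ~rs2. */
static bool demo_custom0(uint32_t insn, uint64_t x[32]) {
    unsigned rd = (insn >> 7) & 0x1F, f3 = (insn >> 12) & 0x7;
    unsigned rs1 = (insn >> 15) & 0x1F, rs2 = (insn >> 20) & 0x1F;
    if ((insn & 0x7F) != 0x0B || f3 != 0) return false;
    if (rd != 0) x[rd] = x[rs1] & ~x[rs2]; /* x0 stays zero */
    return true;
}

/* Returns true if the instruction was emulated and execution resumes at
 * the next instruction; false means the OS should kill the process
 * (e.g. deliver SIGILL). */
static bool handle_illegal(struct trap_frame *tf, emulator_fn emulate,
                           fetch_fn fetch) {
    if (tf->mcause != CAUSE_ILLEGAL_INSTRUCTION) return false;
    uint32_t insn;
    if (tf->mtval != 0)
        insn = (uint32_t)tf->mtval;
    else if (fetch != NULL)
        insn = fetch(tf->mepc);
    else
        return false;
    if ((insn & 0x3) != 0x3) return false; /* 16-bit compressed: not handled */
    if (!emulate(insn, tf->x)) return false;
    tf->mepc += 4; /* skip the emulated 32-bit instruction */
    return true;
}
```

Every emulated instruction pays for a full trap round trip, which is why this only makes sense as a compatibility fallback, not as a substitute for the hardware.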