I explored this topic a bit in a keynote earlier this year: https://github.com/trailofbits/publications/blob/master/pres...
I will also note that our long-term goal for Slither is to directly address some of the problems in Solidity (https://github.com/crytic/slither). It's already about two-thirds of a compiler. With a little extra push it could generate EVM bytecode, and then we could start ripping out features of the language that just aren't safe to use. It's amazing how far Ethereum has come with insecure tools but extreme testability; it makes you wonder what it would look like with both. I know Kadena is going for the clean-slate approach (and we're keeping an eye on you all), but our investment at the moment is in adjustments to the current ecosystem.
The language I encouraged people to use for smart contracts was SPARK Ada: a mature language used in many industrial projects, and one in which verification is easier. Building on what it already offered would have cost less and reduced risk; a SPARK Ada to EVM compiler is all they needed.
Also Vyper, which doesn't go as far but is already used in production: https://github.com/ethereum/vyper
The EVM itself isn't the end of the road either. Over the next couple of years, Ethereum is migrating to a more scalable, sharded version using proof of stake, and it looks like that will allow pluggable execution engines.
But then there's the question of whether any of those flaws are even reasonable to take on in your contract, given its use case within a blockchain context. If you're only ever running terminating programs, why expose yourself to the attack surface of an entire language, with its non-determinism and re-entrancy bugs, when very reasonable and expressive correct-by-construction (CbyC) languages are available?
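To make the re-entrancy point concrete, here is a toy Python model of the classic bug (my own sketch, not real EVM semantics or any specific contract): the vulnerable ledger pays out through an external callback *before* zeroing the caller's balance, so an attacker whose callback re-enters `withdraw()` can drain far more than they deposited.

```python
# Toy model of a re-entrancy bug: the external call happens before
# the state update, so the callback can re-enter withdraw().

class Ledger:
    def __init__(self, funds):
        self.funds = funds          # total funds held by the "contract"
        self.balances = {}          # per-account balances

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.funds += amount

    def withdraw(self, who, callback):
        amount = self.balances.get(who, 0)
        if amount > 0 and self.funds >= amount:
            self.funds -= amount
            callback(amount)        # external call BEFORE the state update: the bug
            self.balances[who] = 0  # too late -- the callback may have re-entered

ledger = Ledger(funds=100)
ledger.deposit("attacker", 10)

stolen = []
def attack(amount):
    stolen.append(amount)
    if ledger.funds >= amount:      # keep re-entering until the contract is drained
        ledger.withdraw("attacker", attack)

ledger.withdraw("attacker", attack)
print(sum(stolen), ledger.funds)   # the attacker deposited 10 but drained everything
```

Swapping the last two lines of `withdraw()` (update the balance, then call out) removes the bug; that ordering is exactly the "checks-effects-interactions" discipline that safer languages enforce by construction.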
If you want to compare the effectiveness of symbolic execution engines, you should compare how many unique codepaths they can reach. ETH Zurich recently performed such an analysis in their paper on VerX (https://files.sri.inf.ethz.ch/website/papers/sp20-verx.pdf), an abstract interpretation-based verifier, and found that Manticore had by far the most complete model of the EVM among the symbolic execution engines they tested; we have continued to improve its accuracy in the releases since. That said, ETH Zurich did underestimate a few of Manticore's capabilities, which we hope to describe better in an upcoming academic paper.
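For anyone unfamiliar with the metric, here is a small illustration of what "unique codepaths reached" means (my own sketch, not how VerX or Manticore work internally): record the sequence of branch outcomes each input takes through a function, and count the distinct sequences. A brute-force loop finds them all here only because the input space is tiny; a symbolic execution engine's job is to reach every one of these paths by solving branch constraints instead of enumerating inputs.

```python
# Count the unique codepaths through a toy function by recording
# which branches each input takes. The deep path requires a related
# pair of inputs (y == 2 * x), which is easy for a constraint solver
# to find but rare under random testing.

def target(x, y, trace):
    if x > 10:
        trace.append("x>10")
        if y == x * 2:
            trace.append("y==2x")   # deep path: needs correlated inputs
    else:
        trace.append("x<=10")
        if y < 0:
            trace.append("y<0")
    return trace

paths = set()
for x in range(-20, 21):
    for y in range(-50, 51):
        paths.add(tuple(target(x, y, [])))

print(len(paths))  # number of unique codepaths a complete explorer must reach
```

An engine that reaches all of these paths (including the `y == 2x` one) has a more complete model of the program's behavior than one that only covers the shallow branches, which is the comparison the VerX paper makes across EVM tools.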
If you're looking to start using Manticore, then browse through the manticore-examples repository (https://github.com/trailofbits/manticore-examples) where we have complete working solutions to CTF challenges, crackmes, logic bombs, and other programs that Manticore excels at exploring.
It would be more effective to test bug detection tools against a database of real smart contract code with precisely identified vulnerabilities and example patches. Any benchmark must contain more than simple tests: it must approximate real software with enough complexity to properly stress tools for automated bug discovery.
There is more discussion of what a good bug detection benchmark looks like in our post on the Challenge Sets from DARPA (https://blog.trailofbits.com/2016/08/01/your-tool-works-bett...) and in our similar effort for smart contracts, "Not So Smart Contracts" (https://github.com/crytic/not-so-smart-contracts).
You'll note that ETH Zurich and ChainSecurity came up with their own benchmark that meets the description above when they evaluated VerX in the paper I linked above. I'm eagerly awaiting the release of those test cases since they'll help us improve the functionality of many of our tools.