The main thing we cared about was boundary overhead, so BoltFFI takes a different approach from UniFFI, and the benchmark numbers are much better in our case. Async exports also map to native async patterns on the target side.
A few numbers from the current benchmarks, compared with UniFFI:
benchmark                     BoltFFI      UniFFI           speedup
echo_i32                      <1 ns        1,416 ns         >1000x
counter_increment (1k calls)  2,700 ns     1,580,000 ns     589x
generate_locations (10k)      62,542 ns    12,817,000 ns    205x
The benchmarks and code are in the repo, where we compare against both UniFFI and wasm-bindgen.
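For anyone wondering what "boundary overhead" means in practice, here is a rough sketch of two ways a trivial call like echo_i32 could cross an FFI boundary. This is not BoltFFI's or UniFFI's generated code, and it makes no claim about what either tool does for any given type. The names `echo_i32_direct` and `echo_i32_buffered` are made up for illustration.

```rust
// Hypothetical illustration only: neither function is output from a real
// binding generator.

/// Direct path: the i32 crosses the C ABI unchanged, so the only cost is
/// the call itself.
pub extern "C" fn echo_i32_direct(value: i32) -> i32 {
    value
}

/// Buffered path: the caller has serialized the argument into bytes.
/// The callee decodes it, then encodes the result into a freshly
/// allocated buffer that the caller must decode again.
/// Panics if `args` holds fewer than 4 bytes.
pub fn echo_i32_buffered(args: &[u8]) -> Vec<u8> {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&args[..4]); // copy the serialized argument out
    let value = i32::from_le_bytes(raw); // decode
    value.to_le_bytes().to_vec() // encode + heap allocation
}
```

The direct version does no work beyond the call, while the buffered version pays for a copy, a decode, an encode, and a heap allocation on every call. For a function that does almost nothing, those fixed costs account for nearly all of the measured time.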
repo: https://github.com/boltffi/boltffi
docs: https://www.boltffi.dev/docs/overview
benchmarks: https://github.com/boltffi/boltffi/tree/main/benchmarks