My experience is that the iPhone 6 GPU can be up to 50-70 times faster than the CPU for single-precision floating point (i.e. Swift running on the CPU versus Metal on the GPU). See http://memkite.com/blog/2014/12/18/gpgpu-performance-of-swif... for an example (a comparison with the Accelerate framework).
That's almost a "what not to do" for using Accelerate. You're making multiple passes over the data and introducing extra dummy arrays that have to be traversed as well, which blows up the load-store traffic further. You're also using vvpowf to compute a simple reciprocal, which is wildly inefficient.
I don't mean to pick on you, but it's a misleading comparison. A basic transform that only gets rid of the extra working arrays and does no other optimizations is ~5-10x faster in my quick timings:
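vvexpf(&result, &negx, &localcount)  // result = exp(negx), with negx holding -x
var one = Float(1)  // must be var, since it is passed below as &one
vDSP_vsadd(&result, 1, &one, &result, 1, vDSP_Length(localcount))  // result = 1 + exp(-x)
vDSP_svdiv(&one, &result, 1, &result, 1, vDSP_Length(localcount))  // result = 1 / (1 + exp(-x))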
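For anyone who wants to try this outside the benchmark, here is a rough self-contained sketch of the same idea: the sigmoid 1 / (1 + exp(-x)) computed in a single output buffer with no scratch arrays. The function name `sigmoid` and the use of vDSP_vneg for the negation are my own assumptions, not something taken from the linked post, and I'm going through a raw buffer pointer so the same array isn't passed twice as in-out. Treat it as an illustration, not a tuned implementation.

```swift
import Accelerate

// Hypothetical helper (name is illustrative): elementwise logistic sigmoid,
// 1 / (1 + exp(-x)), computed in place in one output buffer.
func sigmoid(_ x: [Float]) -> [Float] {
    var result = [Float](repeating: 0, count: x.count)
    var count = Int32(x.count)        // vForce takes the length as a pointer to Int32
    var one = Float(1)                // scalar operand, passed by pointer
    let n = vDSP_Length(x.count)      // vDSP takes the length as vDSP_Length

    result.withUnsafeMutableBufferPointer { buf in
        guard let p = buf.baseAddress else { return }   // nothing to do for empty input
        vDSP_vneg(x, 1, p, 1, n)          // p = -x
        vvexpf(p, p, &count)              // p = exp(-x), in place
        vDSP_vsadd(p, 1, &one, p, 1, n)   // p = 1 + exp(-x)
        vDSP_svdiv(&one, p, 1, p, 1, n)   // p = 1 / (1 + exp(-x))
    }
    return result
}
```

This still makes four passes over the data, so it is only the "get rid of the working arrays" step; fusing the passes or processing in cache-sized chunks would be a separate optimization.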