You can check how cloudimage-responsive works here 
It would be better to feed the images into photogrammetry software, build a real 3D textured model, and show it in a good web-based OBJ viewer.
As for using real 3D textured models, I think it depends on the use case.
Check a quick example of an outdoor and an indoor view: https://cdn.scaleflex.it/demo/home.html
(photo credits: Bush-Jaeger)
We still need to do some work on it (nicer arrows, preventing the jump from the last image back to the first, and a few more things), but it's already usable.
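For anyone wondering what "preventing the jump from the last image back to the first" amounts to, here is a minimal sketch of mapping a drag to a frame index, with and without wrap-around. It is not the library's actual code; the function name `frame_for_drag` and the parameters `px_per_frame` and `wrap` are made up for illustration, and it is written in Python only to keep the logic easy to read.

```python
def frame_for_drag(start_frame: int, drag_px: float, px_per_frame: float,
                   frame_count: int, wrap: bool) -> int:
    """Map a horizontal drag distance to a frame index in an image sequence.

    Hypothetical helper: none of these names come from cloudimage-responsive.
    """
    # Whole frames moved; int() truncates toward zero, so tiny drags do nothing.
    steps = int(drag_px / px_per_frame)
    target = start_frame + steps
    if wrap:
        # Full turntable: going past the last frame continues at the first.
        return target % frame_count
    # Partial sweep: stop at either end instead of jumping around.
    return max(0, min(frame_count - 1, target))


# Dragging 5 frames forward from frame 34 of 36:
print(frame_for_drag(34, 50, 10, 36, wrap=True))   # 3
print(frame_for_drag(34, 50, 10, 36, wrap=False))  # 35
```

Wrapping makes sense when the photos cover a full rotation; clamping is the safer default when the sequence is only a partial sweep, since jumping from the last image to the first would then look like a cut.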
The interesting thing is that taking photos is only the first part of the problem - the second and larger problem is that you have 36-108 photos which need editing. You can outsource background removal to an agency, but long term this is unsustainable.
I'm speaking from experience: I spent nearly 2 years trying to get a startup off the ground that intended to offer this kind of photography as a service.
Another massive part of the challenge was that nobody cared! It's interesting tech and all the stats say that 360/3D photos convert better, but they're a lot more expensive and you can't just have an intern shoot them. In the end, I spoke to about a third of the available market, often reaching out with custom mockups of products in their market - we got 2 clients from this.
That was 2013 though. I can think of 2 things that might have changed:
1. It seems quite possible that an RNN/CNN could help remove backgrounds more reliably? I'm no expert in this, perhaps someone else could comment. The particularly hard case is separating nearly identical colors - e.g. a black-and-white trainer on a white background. Solving that would certainly allow for a higher quality finish.
2. Customer interest may have changed? Anecdotally, I've not really seen a huge increase in 360/3D photography in ecommerce. In 2013, I thought Amazon was going to introduce it because of some patents filed relating to an automatic photography pipeline.
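To make the hard case in point 1 concrete, here is a toy sketch of plain color-threshold background removal, which is the kind of thing that breaks on near-identical colors. The pixel values, the `tolerance` of 30, and the function name `background_mask` are all made up for illustration; this is not any particular tool's algorithm.

```python
def background_mask(pixels, background=(255, 255, 255), tolerance=30):
    """Return True for each pixel a plain color threshold would treat as background."""
    mask = []
    for r, g, b in pixels:
        # Largest per-channel difference from the backdrop color.
        diff = max(abs(r - background[0]),
                   abs(g - background[1]),
                   abs(b - background[2]))
        mask.append(diff <= tolerance)
    return mask


# Hypothetical row of pixels: backdrop, black part of the shoe,
# white part of the shoe, backdrop.
row = [(255, 255, 255), (20, 20, 20), (245, 245, 240), (250, 250, 250)]
print(background_mask(row))  # [True, False, True, True]
```

The third pixel belongs to the shoe but gets removed along with the backdrop, because color alone cannot tell them apart. Anything that does better has to use shape or context, which is where a learned model might help.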
I'm pretty sure the video file would be smaller than all the individual images combined, and it would require only 1 network request.
Though it is more pronounced when the picture fills most of the screen. You currently have to find a border or a gap between the pictures to swipe on in order to scroll up or down.