Language models have a limit on the length of text they can process: 512 tokens for BERT, for example, and 4,096 for long-context models such as Longformer and Big Bird.
However, real-world documents, such as customer reviews, can be arbitrarily long. Classifying these reviews quickly helps a business engage with its customers promptly, for instance by flagging negative reviews to reduce churn and complaints.
The problem?
Language models are usually very large, which makes them slow at inference and difficult to deploy. They are also typically over-parameterized and stored at higher numerical precision than they need. You can drop some of the weight connections in these networks to obtain a smaller model while maintaining accuracy; this is known as sparsification (or pruning). You can then reduce the precision of the remaining weights, a step known as quantization, to shrink the model further.
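To make this concrete, here is a minimal sketch of both steps using PyTorch's stock pruning and dynamic-quantization utilities. It illustrates the general technique on a toy layer; it is not the exact recipe from the article:

```python
import torch
import torch.nn.utils.prune as prune

# A toy stand-in for one transformer feed-forward layer; any nn.Linear works.
layer = torch.nn.Linear(768, 768)

# Sparsification via unstructured magnitude pruning: zero out the 90% of
# weights with the smallest absolute values, keeping the layer shape intact.
prune.l1_unstructured(layer, name="weight", amount=0.9)
prune.remove(layer, "weight")  # bake the pruning mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2%}")  # -> ~90.00%

# Quantization: store and compute the remaining weights in int8 instead of
# float32, shrinking the layer roughly 4x on top of the sparsity savings.
model = torch.nn.Sequential(layer)
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)
```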
In my latest article, I explore how to perform text classification on long documents using sparse Hugging Face Transformers. I show that it's possible to use a 90% sparse transformer model, meaning 90 percent of the model's weights are removed and the remaining parameters are quantized, and still achieve accuracy similar to the dense model.
The sparse transformer delivers a 4.8X speedup over the dense baseline. It also yields a smaller model that is easy to deploy on commodity CPUs, so no expensive accelerator hardware is required.
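To give a flavor of what CPU inference looks like, here is a minimal sketch using Neural Magic's DeepSparse Pipeline API, which runs sparse-quantized ONNX models on commodity CPUs. The model path below is a placeholder, not the exact checkpoint from the article:

```python
# pip install deepsparse
from deepsparse import Pipeline

# Placeholder path: point this at a SparseZoo stub or a local ONNX export
# of the sparse-quantized classifier you want to serve.
classifier = Pipeline.create(
    task="text_classification",
    model_path="./sparse-quantized-review-classifier.onnx",  # hypothetical
)

reviews = [
    "The product broke after two days and support never replied.",
    "Fast shipping and exactly as described. Very happy!",
]
print(classifier(sequences=reviews))
```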
Try it yourself: https://neuralmagic.com/blog/accelerate-customer-review-clas...