This paper combines knowledge distillation and efficient attention mechanisms.
=> It works (still efficient, lower cost).
Not an unexpected result, but to their credit, they established a new benchmark to test these combinations. KD+Longformer is one of the strongest, retaining 95.9% of the performance at 50.7% of the cost.
A.1 Data Collection
Data for GONERD was obtained through Giant Oak's GONER software, which scraped web articles from public-facing online news sources as well as the U.S. Department of Justice's justice.gov domain. This webtext data was randomly sampled with an upweighted probability toward documents from justice.gov, so that justice.gov made up roughly 25% of the total GONERD dataset.
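The upweighted sampling described above could be sketched like this (a rough illustration with hypothetical names and with-replacement sampling; the actual GONER pipeline is not public):

```python
import random

def sample_corpus(doj_docs, other_docs, n_samples, doj_share=0.25, seed=0):
    """Sample documents with justice.gov docs upweighted so they make up
    roughly `doj_share` of the result (hypothetical sketch, not GONER's code).

    Each justice.gov doc gets weight doj_share / len(doj_docs) and each
    other doc gets (1 - doj_share) / len(other_docs), so the expected
    fraction of justice.gov docs in the sample is ~doj_share regardless
    of how the raw corpus is split.
    """
    rng = random.Random(seed)
    docs = doj_docs + other_docs
    weights = ([doj_share / len(doj_docs)] * len(doj_docs)
               + [(1 - doj_share) / len(other_docs)] * len(other_docs))
    # Weighted sampling with replacement; a real pipeline would likely
    # sample without replacement or shard by source first.
    return rng.choices(docs, weights=weights, k=n_samples)
```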
“Help me understand what this paper is about and estimate whether its relevance and impact to the field are likely to be low, medium, or high.
Explain jargon that may be specific to AI research, but don’t bother explaining or expanding on terms familiar to a working software developer or basic undergraduate computer science.”
2. Follow the above prompt with either the abstract or the full text of the paper.
3. Post something useful here and save others the time.
In my opinion, don't copy/paste LLM output as a comment - but I'd love to hear your own succinct, human-sounding, HN-guideline-compatible thoughts.