At a high level, ScanCode detects many more licenses and copyrights than licensecheck does, and reports more detail about each match. It is likely slower.
In more detail:
ScanCode is a Python app using a data-driven approach (as opposed to carefully crafted regexes):
- for license scan, the detection is based on a large number of license full texts (~900) and license notices/rules (~1800) rather than on regexes. It detects exactly where in a file a license text is found. Adding more license texts improves the detection.
- for copyright scan, the approach is natural language parsing (using NLTK) with POS tagging and a grammar; it has a few thousand tests.
- licenses and copyrights are detected in both text and binary files
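
To make the data-driven idea concrete, here is a minimal Python sketch. It is not ScanCode's actual algorithm or API: the tiny `REFERENCE_TEXTS` corpus and the `tokenize` and `detect` helpers are hypothetical names invented for illustration, and the matching is a naive exact token-run comparison. It only shows the principle that detection is driven by reference texts, that a match can be located by line, and that coverage grows by adding texts rather than by writing new patterns.

```python
import re

# Hypothetical, tiny stand-in for a corpus of license texts and notices.
# This is illustrative data only, not ScanCode's real rule set.
REFERENCE_TEXTS = {
    "apache-2.0-notice": "Licensed under the Apache License, Version 2.0",
    "mit-notice": "Permission is hereby granted, free of charge, "
                  "to any person obtaining a copy",
}


def tokenize(text):
    """Return (lowercased word, line number) pairs, ignoring punctuation."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[a-z0-9]+", line.lower()):
            tokens.append((word, line_no))
    return tokens


def detect(text, references=REFERENCE_TEXTS):
    """Report (key, start_line, end_line) for each reference text found
    as a contiguous run of tokens in `text`."""
    tokens = tokenize(text)
    words = [word for word, _ in tokens]
    matches = []
    for key, reference in references.items():
        ref_words = [word for word, _ in tokenize(reference)]
        n = len(ref_words)
        if n == 0:
            continue  # an empty reference would match everywhere
        for i in range(len(words) - n + 1):
            if words[i:i + n] == ref_words:
                matches.append((key, tokens[i][1], tokens[i + n - 1][1]))
    return matches


sample = (
    "# Copyright 2016 Example Corp\n"
    "# Licensed under the Apache License,\n"
    '# Version 2.0 (the "License");\n'
)
print(detect(sample))
```

Because matching works on tokens rather than raw characters, the notice is still found when it is wrapped across comment lines. Supporting another license in this sketch means adding an entry to the corpus, for example `detect(text, {**REFERENCE_TEXTS, "my-notice": "..."})`, with no new matching code.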
Debian's licensecheck (available here: https://anonscm.debian.org/cgit/collab-maint/devscripts.git/... for reference) is a Perl script using hand-crafted regex patterns to find typical copyright statements and about 50 common licenses. There are about 50 license detection tests.
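
For contrast, a regex-driven checker can be sketched as follows. This is written in Python for illustration only: licensecheck itself is a Perl script, and the patterns below are hypothetical examples, not the ones it ships. The names `LICENSE_PATTERNS`, `COPYRIGHT_PATTERN` and `check` are assumptions made for this sketch.

```python
import re

# Hypothetical hand-written patterns; licensecheck's real Perl patterns differ.
LICENSE_PATTERNS = {
    "Apache-2.0": re.compile(
        r"Licensed under the Apache License,?\s+Version 2\.0", re.IGNORECASE
    ),
    "MIT": re.compile(
        r"Permission is hereby granted, free of charge", re.IGNORECASE
    ),
}

# Matches statements such as "Copyright (c) 2016 Example Corp".
COPYRIGHT_PATTERN = re.compile(
    r"Copyright\s+(?:\(c\)\s*)?(\d{4}(?:-\d{4})?)\s+(.+)", re.IGNORECASE
)


def check(text):
    """Return (license names, [(years, holder), ...]) found in `text`."""
    licenses = [
        name for name, pattern in LICENSE_PATTERNS.items() if pattern.search(text)
    ]
    copyrights = [
        (match.group(1), match.group(2).strip())
        for match in COPYRIGHT_PATTERN.finditer(text)
    ]
    return licenses, copyrights


header = (
    "Copyright (c) 2016 Example Corp\n"
    "Licensed under the Apache License, Version 2.0\n"
)
print(check(header))
```

In this style, each new license or each new phrasing of a copyright statement needs another hand-written pattern, and the result says which pattern matched rather than which reference text was found.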