Thanks for asking! We're using a multi-source approach:
Case law: Google Scholar + CourtListener's bulk data (great coverage of federal and state appellate decisions); there's a rough fetch sketch after this list.
Statutes & regulations: Currently using Justia for state statutes, but working on scraping directly from state legislature sites. The U.S. Code comes from the Office of the Law Revision Counsel's XML releases, and federal regulations from the eCFR API.
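For the case law side, incremental ingestion is mostly paging through CourtListener's REST search. Here's a minimal sketch; the `/api/rest/v4/search/` path, the `type=o` (opinions) parameter, and the token header follow CourtListener's public docs as I remember them, so verify them against the current docs before relying on this:

```python
import requests

SEARCH_URL = "https://www.courtlistener.com/api/rest/v4/search/"

def fetch_opinions(query: str, court: str = "scotus", token: str | None = None) -> list[dict]:
    # Search CourtListener for opinions matching `query` in `court`.
    # "type": "o" restricts results to opinions (vs. dockets, oral arguments).
    headers = {"Authorization": f"Token {token}"} if token else {}
    params = {"q": query, "type": "o", "court": court}
    resp = requests.get(SEARCH_URL, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json().get("results", [])
```

For full corpus builds, the bulk data downloads mentioned above are the better fit; an API client like this is more for incremental updates and spot checks.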
On the hallucination side, we bucket failures into three distinct modes:

Fabricated citations: The case doesn't exist at all (the most mechanically checkable mode; see the sketch after this list).
Wrong citation: The case exists but doesn't say what the model claims.
Misattributed holdings: Real case, real holding, but applied incorrectly to the legal question at hand.
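The first mode can be caught automatically: extract every citation from the model's output and try to resolve it against a real case. CourtListener ships a citation-lookup endpoint that does this matching; the sketch below uses it, but the exact path (`/api/rest/v3/citation-lookup/`) and the response field (`clusters`) are from memory, so treat them as assumptions and confirm against the current docs:

```python
import requests

LOOKUP_URL = "https://www.courtlistener.com/api/rest/v3/citation-lookup/"

def flag_unresolved_citations(model_output: str, token: str | None = None) -> list[dict]:
    # The endpoint extracts citations from raw text and tries to match
    # each one to a real case cluster. Citations with no matched
    # clusters are candidate fabrications (failure mode 1 above).
    headers = {"Authorization": f"Token {token}"} if token else {}
    resp = requests.post(LOOKUP_URL, data={"text": model_output}, headers=headers, timeout=60)
    resp.raise_for_status()
    return [hit for hit in resp.json() if not hit.get("clusters")]
```

The other two modes are harder: the citation resolves to a real case, so you have to pull the opinion text and compare it against what the model claims it says, which is itself an NLP problem.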
From our internal testing, proper context engineering is the biggest lever: once we ground the model in the relevant source documents, hallucination rates drop substantially across all three modes. A minimal sketch of what that grounding looks like is below.
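Concretely, "grounding" here just means the prompt carries the retrieved statutes/opinions and instructs the model to answer only from them. A minimal sketch, assuming a retriever that returns `{"cite": ..., "text": ...}` dicts (the retrieval stack itself is out of scope):

```python
def build_grounded_prompt(question: str, sources: list[dict]) -> str:
    # `sources` is assumed retriever output shaped like
    # {"cite": "...", "text": "..."}; number each source so the
    # model can cite it as [n] in its answer.
    context = "\n\n".join(
        f"[{i + 1}] {src['cite']}\n{src['text']}" for i, src in enumerate(sources)
    )
    return (
        "Answer using ONLY the numbered sources below, citing them as [n]. "
        "If the sources do not answer the question, say so instead of guessing.\n\n"
        f"SOURCES:\n{context}\n\nQUESTION: {question}"
    )
```

The "say so instead of guessing" instruction matters about as much as the context itself: it gives the model a sanctioned escape hatch, which is exactly where fabricated citations tend to come from.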