Patents get really interesting when litigated, because roughly 40% of patents that are litigated to a decision are found to be invalid.
As an aside: patent attorney productivity is an elusive concept, or any attorney's productivity, for that matter. That's because every attorney cares about one metric above all: billable hours. Not volume of work produced, not client wins, but billable hours. Efficiency, to an attorney, means lower effort to bill more hours. Quality is not a concern, only hours. I cannot stress enough how traditional concepts of efficiency, success, or effectiveness don't apply to lawyers. It's a deeply flawed industry. And in patent work, it's difficult for clients to determine whether the lawyer is doing a good job. Concrete results are many years down the road, and the in-house attorneys supervising often aren't qualified, or aren't given enough time, to evaluate ongoing prosecution efforts across a large portfolio.
Do you know by any chance how the `embedding_v1` vectors were generated? The data field description says "Machine-learned vector embedding based on document contents and metadata, where two documents that have similar technical content have a high dot product score of their embedding vectors."
Could this be word2vec, GloVe, or something else like that? Maybe produced from the tf-idf-transformed sum of the word tokens in the title+abstract of each patent?
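For what it's worth, here is a minimal sketch of the kind of pipeline the question speculates about: tf-idf-weighted bag-of-words vectors built from each patent's title+abstract, compared by dot product (L2-normalized, so the dot product is cosine similarity). To be clear, this is a hypothetical illustration of the guessed approach, not the actual method behind `embedding_v1` — the document strings below are invented examples.

```python
# Hypothetical sketch: tf-idf bag-of-words vectors compared by dot product.
# This illustrates the commenter's guess, not Google's actual embedding_v1 pipeline.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document, tf-idf weighted
    and L2-normalized so that the dot product equals cosine similarity."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(a, b):
    """Dot product of two sparse {term: weight} vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Invented stand-ins for patent title+abstract text:
docs = [
    "lithium battery electrode coating",  # patent A: battery tech
    "battery electrode coating process",  # patent B: similar to A
    "neural network image classifier",    # patent C: unrelated
]
va, vb, vc = tfidf_vectors(docs)
```

Under this construction, `dot(va, vb)` comes out higher than `dot(va, vc)`, matching the field description's promise that technically similar documents have a high dot product. Real learned embeddings (word2vec, GloVe, or a neural model) would be dense rather than sparse, but the similarity semantics are the same.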