A Modern Compiler for the French Tax Code (arxiv.org)
188 points by wut42 on Nov 25, 2020 | 130 comments



> In France, income tax is computed from taxpayers' individual returns, using an algorithm that is authored, designed and maintained by the French Public Finances Directorate (DGFiP). This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time.

This is interesting. When it comes to legacy code we think of COBOL and FORTRAN most of the time, but there is probably a huge amount of even more exotic code out there that does its duty day in, day out.


In my country, with an admittedly quite simple tax code, the authorities used to publish each year official COBOL code that computed income tax.

They switched to Java in 2018-ish - and to Github instead of a random FTP server.

Doing your taxes has also been a no-op for many people the past decade or so. You don't do anything except read the tax report, unless you spot some mistakes or items missing in the report that was autofilled by the government, then you can go into the web form and submit the amendments necessary.


That's genius.

A buddy of mine retired rich after creating property tax administration systems. They only had a few client counties.

They'd just about roll out new system updates before the next wave of tax code revisions. Never-ending work.

Sounded like living hell. I couldn't do that kind of work and stay sane. But he seemed to enjoy it.


Recently helped my dad clear out his office - and we came across a binder with fan-folded, faded dot matrix printed copy of COBOL code he used in order to calculate taxes in the 80s - translating the code to BASIC. The original COBOL code came from the Norwegian tax office - I'm not sure if they still publish something similar.

Ed: ah, same country https://news.ycombinator.com/item?id=25210009

A little unfortunate that we threw it out.


That sounds amazing. Which country is it?


Cool - which country? Could you link to the Github repo?



How can you tell? All "Languages" I get is "Java 100.0%".


The COBOL part is from memory. I'll have a look to see if I can find a copy floating around somewhere.

Edit: the transition appears to be more like 2016, and one of the commits then refers to a file generated from COBOL: https://github.com/Skatteetaten/trekktabell/commit/5181d86c5...


I did find an awkward source that lists the 2017(?) COBOL code for the tables - the core program seems to be last updated 1993 (Norwegian only) : https://docplayer.me/38761458-Beregning-av-forskuddstrekk-in...

> IDENTIFICATION DIVISION.  PROGRAM-ID. FT7P200T. AUTHOR. PER J. RISTUN. DATE-WRITTEN. NOVEMBER 1993 ----------------------------------------------------------------  * BESKRIVELSE : PROGRAMMET BEREGNER FORSKUDDSTREKK FOR ÅR 2017 * (...)


This is the one!

If memory serves, they updated the program for each year (I guess only numerical constants and small rule changes), but they probably didn't update the source in the documentation.


I think it's Norway.

(I just googled GP's email address (it's in their profile).)


I’m guessing it’s Estonia. It’s really ahead with digital governance.


Estonian here - no it is not Estonia.

Here our income tax declaration is very simple, the tax office already knows how much salary we have earned, and there aren't many deductions, so most people either don't have to file taxes at all, or just click next a few times and get their tax returns after a week or so. For 90% of people it takes less than 5 minutes.


In Belgium, calculation of the pensions was done in a mix of BS2000 assembly language and COBOL until at least 2007, when they entered a race to switch to Java before the last of the original developers retired.


I worked for an org that tried to modernize, again, their mainframe legacy code, and failed.

I always thought it'd be easier to use an emulator and keep using the old stuff. Then over time whittle the code base down to just the business logic.


So you think they still had a 1970s mainframe? No: Siemens themselves replaced it with an x86-based emulator in the 90s (which ran on a desktop machine).


My entire industry (subset of healthcare) mostly uses a single application for 95% of our core business functions. That application is written in Visual FoxPro and is, predictably, pretty terrible. Obviously it isn't going to get better, either, but it'll be a decade or more before there's a serious competitor.


I'm not near the space any longer (in a former life, the healthcare industry was a big customer of our computer systems). But a significant portion of healthcare in the US used to use a proprietary language called MUMPS. https://en.wikipedia.org/wiki/MUMPS


Epic has been trying to move on for years, or so a recruiter told me half a decade ago. Their job posting now describes “Using leading-edge technologies and languages like JS, TS, and C#”, though I wouldn’t put a bait and switch past them.


Yeah, well, they have a thing called TS2M, referred to e.g. here... https://www.reddit.com/r/epicsystems/comments/9pmsjj/ts2m_in...


MEDITECH, maybe others, had a proprietary operating system too. (We had a special deal to sell them hardware without an OS at a time when they were normally bundled.) Eventually they moved on to NT.


MUMPS is quite popular in finance, mostly through the Caché product.

Including for new developments.


Hah I’m also working in healthcare and we also use Visual FoxPro for storing data. Luckily most of the Visual FoxPro UI is gone.


We have it right here. Hacker News is written in the Arc programming language, which was developed by Paul Graham and Robert Morris.

https://en.wikipedia.org/wiki/Arc_(programming_language)


I was asked to write a parser for _my_country_law_, plus something capable of computing diffs and putting documents together (diff + original => newest version), without any knowledge of that domain.

After seeing a sample doc I estimated it at something like 1 week of work (XD).

A month later I was crying, with something like 20-30% done.

This shit was so sensitive (insanely error-prone) and debugging was time-consuming. I think I didn't spend enough time thinking about its architecture, but on the other hand I had very little experience with this stuff.

easy & fast to test and reliably

have solid abstraction over original documents

have solid abstraction over operation (e.g Article 5's meaning is changed)

_____________

The project died because it was needed "fast" (at that time there was a very specific period of changes in the law) and we weren't getting to a viable version fast enough.
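
For readers wondering what "diff + original => newest version" could look like, here is a minimal sketch. It assumes a law is just a mapping from article number to text and that each amendment is one of three operations; the `Amendment` type, the operation names and `apply_amendments` are hypothetical illustrations, not anything from the project described above. Real amending acts are far messier, which is presumably where the weeks went.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Amendment:
    """One hypothetical change to a law, keyed by article number."""
    op: str         # "replace", "insert" or "repeal"
    article: str    # article identifier, e.g. "5"
    text: str = ""  # new wording (unused for "repeal")


def apply_amendments(original: dict[str, str],
                     amendments: list[Amendment]) -> dict[str, str]:
    """Return the consolidated text; the original mapping is left untouched."""
    consolidated = dict(original)
    for a in amendments:
        if a.op == "replace":
            if a.article not in consolidated:
                raise KeyError(f"cannot replace missing article {a.article}")
            consolidated[a.article] = a.text
        elif a.op == "insert":
            if a.article in consolidated:
                raise ValueError(f"article {a.article} already exists")
            consolidated[a.article] = a.text
        elif a.op == "repeal":
            if a.article not in consolidated:
                raise KeyError(f"cannot repeal missing article {a.article}")
            del consolidated[a.article]
        else:
            raise ValueError(f"unknown operation {a.op!r}")
    return consolidated
```

Failing loudly on a missing or duplicate article is a deliberate choice here: an amendment that silently does nothing is exactly the kind of error that is expensive to debug later.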


I feel like legislation is a field that would benefit a great deal from the progress that software engineering has made in a number of fields.

It's already evolved somewhat into a DSL. With some nudging and technical leadership, I suspect that we could move it over entirely into a format that can be readily parsed, tested, and version-controlled. The tax code is especially well-suited to this, because it's a lot of rules and math formulas.

In fact, I bet most tax software probably does something very similar to this.


Laws are supposed to be systematic invariants, and to specify them requires a comprehensive vocabulary containing the union of all the things people do in a life. This language is necessarily bound to your time and place, and it may not translate well in the future. (I feel like the problem of laws that live long enough to outlast the language in which they were defined hasn't actually been solved! Certainly we should have to reword the Constitution at least once every 200 years!)


BTW how do you think we would create a new Constitution today? What would it be made of, how would it be made, and how many copies would be made?


I heard about the startup Legalese [1] a while ago, which does exactly this.

[1]: https://legalese.com/


It would be more valuable to write many fewer, simpler policies, but transparently document the why when it comes to decisions by stakeholders. For example, the only difference between getting something from the government (say, a California Public Records request) and a court is (1) cost and (2) the court almost always has to write a defensible opinion for why something happened some way, but a government agency does not. At the end of the day you wind up with the same possibilities of results.


> the court almost always has to write a defensible opinion for why something happened some way, but a government agency does not.

That's not true. Agencies that do rulemaking have to provide justification for why they're enacting or repealing particular regulations, otherwise those regulations can be struck down as capricious. This is a major problem the Trump administration had: a lot of their attempts to rollback Obama's policies or enact new ones foundered on the ability to provide this justification.


> a lot of their attempts to rollback Obama's policies or enact new ones foundered on the ability to provide this justification

...in the court of law. Someone had to go and sue the administration. When they issued the policies, they simply provided no meaningful justification. You're proving my point.


It's not the court that does the justification, it's the administration that has to. It has to be litigated in the court because, well, no other branch of the government has the power to decide that someone is violating the law.


This sounds easy until you actually get to grips with how parliamentary procedure and law making actually works.

For example last year I was on a SOC (Standing orders committee) for a 3.5 day conference with 600+ delegates.

We had 120 motions submitted. Working out consequentials (if motion 15 passes, motions 16, 99 and 120 fall) is non-trivial.

We also had to do compositing of 20/21 motions on one topic, which were all worded slightly differently and had slightly different effects. That took 4 of us about 2/3 of a day just for that.

Another example is, say, the various legal documents for pensions: a choice of a different two-letter word can lead to years of legal arguments - the difference between CPI and RPI.
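
The mechanical part of the consequentials problem can be sketched as follows, assuming someone has already worked out by hand which motions knock out which others (that is the hard part, and it is not automated here). The function name and data layout are hypothetical.

```python
def motions_still_standing(order, passed, consequentials):
    """Walk the agenda in order and drop motions that fall as consequentials.

    order          -- motion numbers in the order they are taken
    passed         -- set of motions that would pass if put to a vote
    consequentials -- dict: motion -> set of motions that fall if it passes
    A motion that has already fallen is never voted on, so it cannot
    knock out anything else.
    """
    fallen = set()
    for motion in order:
        if motion in fallen:
            continue
        if motion in passed:
            fallen |= consequentials.get(motion, set())
    return [m for m in order if m not in fallen]
```

With the example above, `motions_still_standing([15, 16, 99, 120], {15}, {15: {16, 99, 120}})` leaves only motion 15 standing.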


The current buzzword for this is “rules as code” and many governments are exploring it. You’re right that it is particularly relevant to complicated financial laws like the tax code, and necessarily has been done in a limited way by tax software for many decades now. I am skeptical that this technology can meaningfully assist us in resolving contentious legal questions – it’s more about generating wizards that can help you navigate a 1,000 page tax statute by only showing you what’s relevant to your problem.


This is mostly true, however laws are already generally written as (legal) diffs, and many agencies issue regulations as diffs as well.

Cornell Law has been providing version-controlled releases of the federal laws for free for some time, and Lexis and their competitors have been doing the same commercially for decades.


IMHO we shouldn't define the problem in terms of reducing legislation into some computer language or schema; indeed the effort should be in describing and linking the terms into a graph with consistency validation rules and shape identities.


Do you plan on open-sourcing it, or do you have any recommendations for similar projects/research? Very excited about the computational law space.


Check out http://austlii.community/wiki/DataLex. In Australia we are fortunate enough to have AustLII publishing virtually all Australian laws and court decisions (at least from this century) for free in a somewhat consistent HTML format. DataLex is the name for the “computational law” research AustLII staff have been doing since the 1980s. It’s interesting, but the real value of AustLII’s work is in getting the courts and legislatures to allow them to collect all the raw data and publish it in a free database with full text search. Just getting to that point is a huge improvement over what’s freely available in the US and the UK.


I don't think I could legally do that


Who was the client?

I ask because my state's legislature has a staff that reviews proposed legislation and then maintains the revised official laws. If I was tasked with doing diffs, I'd first interview them, try to make their jobs easier.


Private Persona


yeah complex tax laws often have interrelated and cocontingent things to compute the final tax on.

Is this deduction before or after your AGI, what's the maximum % of AGI that it can be, does taking the deduction lower your AGI and thus the max percentage of AGI that it can be?
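
To make the circularity concrete, here is a small sketch under a made-up rule (not taken from any real tax code): the deduction is capped at `cap_rate` times the income left after the deduction itself. Because the cap depends on the answer, one simple approach is to iterate until the value stops changing; the function name and parameters are hypothetical.

```python
def capped_deduction(gross_income, requested, cap_rate, tol=1e-9, max_iter=1000):
    """Solve d = min(requested, cap_rate * (gross_income - d)) by iteration.

    Hypothetical rule for illustration only. The iteration converges when
    0 <= cap_rate < 1, because each step shrinks the error by cap_rate.
    """
    deduction = 0.0
    for _ in range(max_iter):
        updated = min(requested, cap_rate * (gross_income - deduction))
        if abs(updated - deduction) < tol:
            return updated
        deduction = updated
    raise RuntimeError("deduction did not converge; is cap_rate below 1?")
```

When the cap binds, the result agrees with the closed form `cap_rate * gross_income / (1 + cap_rate)`; when the requested amount is below the cap, it is returned unchanged.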


Ex Dee


During one of those rabbit hole journeys I discovered that Dutch Tax authorities use MPS, a DSL creation language/tool by JetBrains [1]

It makes me wonder, why haven't DSLs caught on? Their claim that it allows developers to spend more time implementing the business logic makes sense. But somehow that promise hasn't been realised. I'm curious to know from those who tried DSLs.

[1] https://www.jetbrains.com/mps/documentation/


The nearest I've been able to articulate this is lack of overlapping skillsets & departments' tendencies to hire people who look like them.

The kind of person who (a) lives in business problem land & (b) is proficient in programming language design (even guided DSL generation)... doesn't exist. At least not in hireable numbers.

And those that do are buried deep in the guts of consultancies, who can afford to pay them way more than customers can.

It's a shame, because it results in suboptimal software. And suboptimal tooling available to the devs that do work in that space.

The best solutions I've seen are products that lower the knowledge barrier to entry (at least for creating a proof of concept) + designing for non-programmers as your primary users.


Hi, OP here :) I've come to the same conclusion about programming language creation. However, for a very large organization, it makes sense to have a team in charge of language tooling (see Dropbox/Python, Facebook/Hack/Flow, Apple/Swift). Which is why I'm trying to convince the French state that they should have such a team of permanent people.


I worked in a similar kind of role recently, in a large public retail company.

The only advice I'd give would be to lean hard on finding, maturing, and then advertising end-user champions.

Cross-department / -traditional boundary products are frustratingly difficult to push top-down, as the leader of the space that "owns" the product (i.e. IT) doesn't directly see the value, because they're not the end user (business).

What mostly worked for me was being as loud as possible with open-attendance educational events, continually taking meetings from interested areas, and then mentoring developing teams.

The goal is to help them create a killer product using your product, such that (highly-placed leader on their side) talks to (your leader) in glowing terms about your product. And that usually happens because your product helped them get a win that moved an important metric to them.

Hint: Ask them about things they've always wanted to do, but couldn't because it was technically impractical. There's probably at least one diamond in there that would be "easy" with your product.

Hint2: Think more broadly about the kind of thing you're trying to do, and get your team in that area. I've worked under CFOs as often as I've worked under CTOs, because "saving money" is near and dear to the former.

(Adapt as necessary to how French government works. Good luck!)


Quite insightful! Being a part of product teams, I've noticed "platform" teams struggle and the reasons have mostly been not doing what you've pointed out above. As in, instead of working with their customers (i.e., other product teams) to identify their problems and fix them, they would push their generic platforms down their customers' throats. It invariably didn't end well.

I tend to think that platform/framework teams within large orgs should be run as a B2B SaaS, at least with that mindset.

Also, if a platform team isn't run well, it ends up being the first one on the chopping block during layoffs. Uber laid off an entire developer-platform team earlier this year. One casualty was the Screenflow team, a promising product that didn't gain wider adoption due to terrible marketing/evangelism.


One can turn it into a flywheel too.

What features should you work on next? The things your users are asking for at your touchbases.

There's a time and place for top-down, but it works best when there are few edge cases. Platform work tends to be a normal distribution with the usual number of "Oh. We never thought anyone would want to do that" tails.


Very insightful update, I'll remember it. Thanks!


Thank you for applying your skills to make the world a better place! If everyone did that, we'd all be better off.


The issue you'll bump into is attracting the same caliber of talent.

Compare the offers from Dropbox, Facebook and Apple.


I make them all the time (https://jtree.treenotation.org/designer/), and track many thousands of DSLs.

IMO, the problem is our languages are unnecessarily complicated. They are all linearly parsed BNFs, and you don't need any of that. I think things will start changing big time.

That being said, my favorite ecosystem in the traditional DSL world is ANTLR, and I'd highly recommend Terence Parr's books on the field if you are interested.


ANTLR 3.x was a game changer for me. I was able to refine my grammars so that the resulting parse tree and abstract syntax trees were the same thing. No goofy inlined tree construction pragmas, term rewriting, post parse tree walk processing.

I'm just a grammar mechanic, so I don't really grok the underlying theory or use the right word for this stuff.


I was late to the party and didn't start until ANTLR4 https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-...


Does his book actually teach you how to create a DSL using ANTLR and not just an overview? Btw the 2 books of his are almost a decade old, so are they any good now?


Yes, it walks you through it step by step. I have https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-... and at least 1 more of his, forget where it is. I would say it is ageless (at least, until something better than ANTLR comes along; ohm is promising but not sure what the latest is with that).

"The Definitive ANTLR 4 Reference" is absolutely the most understated title I've ever seen in a book. It's really more like "The Book That Will Change How you Look at Programming Languages Forever"

IMO anyway. I guess it depends on how much you've been exposed to parsers and grammars and compiler compilers already.


Thanks, I'll start reading his book, already have some experience with interpreters and compilers :)


They do? That's bloody awesome... there's something to this country I love, despite the bad weather and worse food :)

Can you share the resources you found?


JetBrains has a DSL creation language called MPS (Meta Programming System)[1]. I stumbled upon it while exploring their various offerings. I haven't played around with MPS but came across Dutch Tax system's use of MPS in their case studies, it's near the end of the page [1].

You will find decent info to get started here [2]

[1] https://www.jetbrains.com/mps/ [2] https://www.jetbrains.com/mps/concepts/domain-specific-langu...


Could you clear something up for me, do you mean DSLs generally, or DSLs for tax codes specifically? TIA


DSLs in general, say in banking systems, e-commerce apps etc.


Well, this is (one of) my areas so here goes. DSLs are a concept, not an implementation. As implemented they can vary from chained procedure calls to actual sublanguages with lexers and parsers (and I tend to consider the latter to be 'proper' DSLs, but that's just my view).

To have a 'proper' DSL I reckon you need two things: an understanding that a thing can and should be broken out into its own sublanguage, and the ability to do so. The first takes a certain kind of nous, or common sense. The latter requires knowing how to construct a parser properly and some knowledge of language design.

Knowing how to write a parser is not particularly complex, but as the industry is driven more by demand for knowing 'big data' frameworks than for stuff that is often more useful, well, that's what you get, and that includes people who try to parse XML with regular expressions (check out this classic answer <https://stackoverflow.com/questions/1732348/regex-match-open...> Edit: if you haven't seen this check it out cos it's brilliant).

I think this reflects the fundamental problem in software development of the market's not knowing what's actually needed to solve real business problems.
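
To give a feel for the "lexer and parser" end of that spectrum, here is a minimal sketch of a tiny expression sublanguage (numbers, named variables, `+ - * /` and parentheses) with a regex tokenizer and a hand-written recursive-descent parser that evaluates as it goes. The grammar and all names (`tokenize`, `Parser`, `evaluate`) are made up for illustration; a real DSL would more likely build a syntax tree and use a tool such as ANTLR. Unary minus is deliberately left out to keep it short.

```python
import re

TOKEN = re.compile(
    r"(?P<num>\d+(?:\.\d+)?)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/()])|(?P<bad>\S)"
)


def tokenize(src):
    """Split source text into (kind, text) pairs; whitespace is skipped."""
    tokens = []
    for m in TOKEN.finditer(src):
        if m.lastgroup == "bad":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append((m.lastgroup, m.group()))
    return tokens


class Parser:
    def __init__(self, tokens, env):
        self.tokens, self.pos, self.env = tokens, 0, env

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := atom (('*' | '/') atom)*
        value = self.atom()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.advance()
            rhs = self.atom()
            value = value * rhs if op == "*" else value / rhs
        return value

    def atom(self):
        # atom := number | name | '(' expr ')'
        kind, text = self.advance()
        if kind == "num":
            return float(text)
        if kind == "name":
            if text not in self.env:
                raise NameError(f"unknown variable {text!r}")
            return self.env[text]
        if (kind, text) == ("op", "("):
            value = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(src, env):
    """Parse and evaluate src, looking variables up in env."""
    parser = Parser(tokenize(src), env)
    value = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("unexpected trailing input")
    return value
```

For example, `evaluate("(income - allowance) * 0.25", {"income": 30000, "allowance": 10000})` evaluates a rule a domain expert could plausibly read and edit without touching the host language.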

++++

Edit, some reading material

https://www.amazon.co.uk/Language-Implementation-Patterns-Do...

https://www.amazon.co.uk/Definitive-ANTLR-Reference-Domain-S...

https://www.amazon.co.uk/yacc-Nutshell-Handbook-Doug-Brown/d...

They're all worth investing the time in.


Thanks a lot! This is very useful information and follow up books.


Modern computer languages are DSLs for writing DSLs, which you do by defining public domain specific classes, methods, modules, etc. A library for computing PI is a PI DSL.

At least, that's the right way to name things in code.


That's really stretching the concept of a DSL, but at an extreme it can be seen that way. What you're really describing is hierarchical structure.


The French government and parliament also use this implementation of the law (taxes, benefits, ...): https://github.com/openfisca/openfisca-france It's open source and open to contributions, so a common structure is used by multiple countries. They are listed here: https://openfisca.org/en/countries/


Here's the source code of the implementation of the tax code: https://github.com/etalab/calculette-impots-m-source-code


Hi, one of the paper authors here. This is unfortunately only part of the story. This "calculette" covers only a fraction of the tax computation; furthermore, without knowing the crazy semantics and computational rules of the M language, it's very hard to reproduce the tax computation.

As a side-note, the source code appears to have moved here: https://gitlab.adullact.net/dgfip/ir-calcul


The languages breakdown is interesting. It seems GitHub refuses to admit defeat and categorises the .m files into various other languages which use the extension (M, MATLAB, Objective-C, Mathematica and Mercury). I wonder if they use some sort of fuzzy ML solution for categorising them rather than conventional parsing.


Wonder no more! https://github.com/github/linguist/blob/7c2adbdb15d4efd25d92...

As most AI, it's regex and ifs all the way down.
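
For anyone who hasn't looked at this kind of code, "regex and ifs" for telling apart files that share the `.m` extension might look roughly like the sketch below. These patterns are hypothetical and written for this comment; they are not Linguist's actual rules. The "DGFiP M" pattern is simply guessed from the `application : ...` line quoted elsewhere in this thread.

```python
import re

# Hypothetical heuristics, tried in order; the first pattern that matches wins.
HEURISTICS = [
    ("Objective-C", re.compile(r"^\s*(?:@interface|@implementation|@end|#import)\b", re.MULTILINE)),
    ("Mercury", re.compile(r"^:-\s*(?:module|pred|func)\b", re.MULTILINE)),
    ("MATLAB", re.compile(r"^\s*(?:function\b|%)", re.MULTILINE)),
    ("DGFiP M", re.compile(r"^\s*application\s*:", re.MULTILINE)),
]


def guess_language(source):
    """Return the first language whose pattern matches, or None if none do."""
    for language, pattern in HEURISTICS:
        if pattern.search(source):
            return language
    return None
```

The ordering matters: Mercury also uses `%` comments, so it has to be checked before the deliberately vague MATLAB pattern. That fragility is the "until it doesn't" part.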


> As most AI, it's regex and ifs all the way down.

This made me laugh way more than expected, mostly because of how true it is.


These incredibly vague regexes are hilarious. But I guess if it works, it works (until it doesn't)



I can't help but wonder what awesome things happen in "iliad" and "ocean" mode after looking at some of those files:

    application : pro, batch , iliad,oceans ;


Iliad means "Informatisation de L'inspection d'Assiette et de Documentation" which roughly translates to "computerisation of tax base and documentation inspection".


Hi, author here :) You seem to be well-informed of the DGFiP jargon, do you know if news of my work has been spreading among the IT department there?


Hi! Unfortunately I have no idea, I don't work there nor have any affiliation with them.


If tax codes are so complex that authorities struggle to maintain the code that implements them, how are humans supposed to understand them well enough to follow the incentives they are designed to create?


Like every other piece of software they are probably spending 90% of their time on edge & corner cases while citizens are spending 90% of their time solidly in the simple core functionality.


Most of the time you only need a small subset, which is manageable. I once tried to increase my deductions as a normal citizen, but this went nowhere quickly, as you need to create giant loopholes to get it done as a person. Businesses, on the other hand, are the ones using the complex stuff, and they can hire an expert (an expensive doctor, in the body metaphor of the other poster) at quite some cost.


Most people don't know about the muscles or bone structures in their hands, but most people seem to know how to use their hands anyway, despite the gross complexity involved.


Most people aren’t at a gross disadvantage to those who know how to game the system or bribe authorities to make their hands work far better than others.



Translating legal texts to mathematical form is very interesting. It could decimate most legal jobs if a lawsuit can be converted to mathematical form and then 'executed' against the laws that are also in mathematical form. You get your judgement and the explanation as to how that conclusion was reached, all automatically.

It could even cause headaches if contradictions in legal judgements are detected.

It all relies on the conversions to mathematical form being done correctly though which, given that some laws can be intentionally vague, may be impossible.


Australian CSIRO is working exactly on that

https://theconversation.com/csiro-wants-our-laws-turned-into...

https://research.csiro.au/bpli/our-research/reasoning/

https://people.csiro.au/G/G/Guido-Governatori

They are using https://en.wikipedia.org/wiki/Deontic_logic and https://en.wikipedia.org/wiki/Defeasible_logic to describe laws in terms closest to how it's done in the legal community.

Can't find a link but they codified some parts of the Australian import duties laws that we went through in a workshop of theirs that I had a chance to attend.


https://theconversation.com/csiro-wants-our-laws-turned-into... claims law-as-code is a bad idea because it's "dynamic"/"always changing" and "discretionary"/"requires or open to interpretation".

The first argument is nonsensical (computers are great at changing data: in fact way faster and more accurate than humans, having the capacity for things like single source of truth, change logs, peer consensus, and dynamic versioning while preserving historic versions automatically... and also better at publishing it, analyzing it, and documenting it).

The second argument is an outright straw man fallacy. Who cares if some laws require interpretation. Just write MAY instead of WILL (like RFC language) to make it clear the judge can decide, then provide statistical information regarding past case law. Nobody is saying "fire all the judges", they're just saying make the law clear. Right now it's not clear, and it's a big problem.

Refugee? Read up to date law in your language, automatically. Business? Determine what is required by law in order to execute a transaction. Human? Determine what you are and are not allowed to do in some field (like architecture, driving, sailing, pet walking or rock collecting) without being fined.


> The second argument is an outright straw man fallacy. Who cares if some laws require interpretation. Just write MAY instead of WILL (like RFC language) to make it clear the judge can decide, then provide statistical information regarding past case law. Nobody is saying "fire all the judges", they're just saying make the law clear. Right now it's not clear, and it's a big problem.

But that is not how law works. A judge is generally expected to interpret the law because we cannot expect someone who wrote the law to have predicted all possible things, especially those that did not even exist when the law was written.

It's (often) not close to a machine-interpretable spec, but to a visual mockup, to stay in the software area.

For example, you may have a law that forbids euthanasia. Does it also extend to assisted suicide? What if the dying person can't physically trigger their own death? What if assisted dying is illegal here, but someone takes the patient to the neighboring country?

Also, I hardly believe "read up to date law in your language" is possible, there are entire legal concepts that do not exist in different jurisdictions, or literally the same expression may mean different things ("voir dire" for example).

It's good to attempt formalizing things, but I don't think this is a strawman.


The straw man was "interpreted law is uncodifiable" - essentially throwing the baby out with the bathwater. We can interpret where necessary without claiming that because interpretation is required in specific cases that the whole concept fails.


> law-as-code is a bad idea because it's "dynamic"/"always changing"

I don't think it was meant in classical terms of an SQL UPDATE. It was meant in a way that a new rule may affect the application of an existing rule. Academically speaking, it makes reasoning non-monotonic. This is exactly why https://en.wikipedia.org/wiki/Defeasible_logic is proposed.

> The second argument is an outright straw man fallacy.

I think they are trying to codify the laws in logic _without_ changes to the law.

[Edit] The journalist also failed to read up on the basics of deontic logic and defeasible reasoning: "The law says cars must drive on the left in Australia. But what if they have to cross the road to avoid hitting a child?"
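
A toy sketch of the non-monotonic point, using the journalist's own example. This is not real defeasible logic (there is no proper superiority relation or defeaters), just a priority-ordered rule table in which a more specific rule overrides a general one; the `Rule` type, the fact names and the priorities are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all hold for the rule to apply
    conclusion: str
    priority: int          # higher priority defeats lower priority


RULES = [
    Rule("general", frozenset({"driving"}), "must keep left", priority=0),
    Rule("emergency", frozenset({"driving", "child_ahead"}),
         "may cross to the right", priority=1),
]


def conclude(facts, rules=RULES):
    """Return the conclusion of the highest-priority applicable rule, if any."""
    applicable = [r for r in rules if r.conditions <= facts]
    if not applicable:
        return None
    return max(applicable, key=lambda r: r.priority).conclusion
```

Adding a fact changes the conclusion instead of merely adding to it: `conclude({"driving"})` gives "must keep left", while `conclude({"driving", "child_ahead"})` gives "may cross to the right". That withdrawal of an earlier conclusion is what makes the reasoning non-monotonic.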


Yes, in the current system it's shifting goalposts. That is a bad system.

You cannot completely successfully codify something that is based on the vagaries of wishy-washy language, nor specific legal concepts like the intent of regulations or the context of prior judgements. Therefore, improve the language: don't give up!

Imagine if latitude and longitude weren't invented because "sorta over there a few days sail beyond the cape" was too hard to quantify. This is the same ridiculous argument. It just so happens that there are also a vast number of ingratiated rent seeking and powerful people and corporations interested in the status quo: literally all of them.

I believe that as engineers and as optimists within the greater human endeavour, over time in all fields we should seek to create means of trust and means of precision: in our measurements, in our communications, in our analyses, in our references and in our collaborations.

We don't need to fire all the judges. But maybe 90% of the solicitors and standard procedural lawyers, a large part of whose job is explaining to the average citizen what exactly is the done thing in some particular area or how exactly they can expect to be treated at the hands of a system that cannot otherwise explain itself.

Also, in terms of community governance if it becomes crystal clear that a law is being abused through increased fidelity in the logging of police actions brought about by such a system, then the law can more rapidly be identified and repealed.


The last century has seen a huge increase in the quantity of law that is written out explicitly in statutes [1], rather than being worked out on a case-by-case basis according to the common law method. This is an attempt to “improve the language” as you have suggested, but it has not made the law easier to comprehend. Detailed statutes make it harder for lawyers to build up a coherent picture of the entire legal system, because there’s a much greater risk that a solid argument based on general principles collapses due to a specific statute that the lawyer has never heard of.

[1] https://www.gov.uk/government/publications/when-laws-become-...


Again, this points to poor scoping, an imprecise language and a broken system. According to your assertion it's currently impossible for even professionals to know what should be considered in scope. This should be the trivial basis of a legal case, not the difficult and dubious extended research result of paid professionals.


Yes, that is my assertion – it’s impossible to verify that you have considered and correctly interpreted all relevant law, although it should be rare for professionals to make a mistake, especially after extended research. I’m curious as to what makes you think this problem can be solved trivially? Formal verification is hard enough for algorithms over the integers.


Law needs to be rewritten to be efficient and transparent. Logically speaking, the first countries to do so should score major investments and trade bonuses from multinationals. If you have a tinpot dictatorship with US protection or an out of the way novelty country with few useful industries, you could do worse than throw down on this project.


> It could even cause headaches if contradictions in legal judgements are detected.

I remember hearing an anecdote about researchers who tried to formally specify the benefits system of some European country, and they found that the system was slightly non-deterministic, in the sense that the outcome for the citizen would depend on which order various government departments processed the various forms that the citizen used to enrol into various programmes they were eligible for.

It makes me wonder if these mathematical formalisations should support some kind of fuzzing, to check that entities doing things in slightly different orders (or earning slightly more/less) don't produce dramatic changes in outcome.
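A minimal sketch of that kind of order check, assuming three made-up benefit rules (none of them taken from a real system): it applies the rules in every possible order and reports whether the outcome changes.

```python
from itertools import permutations

# Hypothetical rules: each takes the running monthly benefit and returns the new one.
def housing_supplement(benefit):
    # Adds a flat supplement of 100.
    return benefit + 100

def means_test_cap(benefit):
    # Caps the running total at 500.
    return min(benefit, 500)

def child_bonus(benefit):
    # Increases the running total by 10%.
    return benefit * 1.1

RULES = [housing_supplement, means_test_cap, child_bonus]

def outcomes_by_order(start, rules):
    """Apply the rules in every possible order and record each final amount."""
    results = {}
    for order in permutations(rules):
        value = start
        for rule in order:
            value = rule(value)
        results[tuple(rule.__name__ for rule in order)] = round(value, 2)
    return results

def is_order_dependent(start, rules):
    """True if at least two processing orders give different outcomes."""
    return len(set(outcomes_by_order(start, rules).values())) > 1
```

With these toy rules, `is_order_dependent(450, RULES)` is true, because capping before or after the supplement gives different totals. A real checker would need the actual rules and far more inputs than one starting value.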


It's not possible to reduce all lawsuits to compact formulae. Part of the legal profession is in interpreting old laws in the modern context, for instance, which would take something close to a general AI. More broadly, reasoning about legal edge-cases needs a sophisticated understanding of the law and of the world. Also, jurors can't be replaced by software, as they must be 'peers' by definition.

I imagine it would be much easier to build a system to roughly estimate the odds of a lawsuit being successful, inputting salient features of the case and running the numbers, without deep reasoning about the particulars. I don't know if work is already being done on this but I wouldn't be surprised. A lot of money must ride on knowing which corporate law battles can be won.


Work is being done on this – here’s some in the human rights context: https://link.springer.com/article/10.1007/s10506-019-09255-y

But the outcome of high-stakes commercial litigation is much more complex than “win or lose.” Most cases settle, so the ones that result in published legal judgments which can be fed into a machine learning tool are an unrepresentative minority. Participating in this kind of litigation also costs millions and attracts public attention, leading to innumerable second-order effects (eg. public relations damage, law reform) that are hard to predict and may be more significant than whatever specific legal decision a judge makes. So being able to put a numerical probability on the legal opinion “we think the company has a good chance of winning this case” may not be that helpful.


It sounds like what you are describing are Ricardian Contracts. [0] Also worth viewing is this talk by Clay Shirky on technology tools for Government [1].

Ultimately, I don't see law being automated. As others have pointed out, most cases that go to court are all about interpretation and establishment of the circumstances (did events X, Y and Z happen), interpretation of the law (does law F mean that event X was illegal, and is event Y a mitigator or irrelevant?) and, more elusively, impact (how serious was X?).

What this doesn't mean, though, is that there aren't tools that can be used in the creation and management of law. If we have a simple dependency tree of documents and sources, we can trigger a review of downstream documents if a document is changed. We can statically check some aspects of the document, like a Legal IDE that would present itself more like an advanced spellcheck. Meta rules can ensure consistency, add in support for in-line reading (include a definition next to a paragraph even though it is defined elsewhere).
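The dependency-tree idea can be sketched roughly as follows. The document names and the `CITED_BY` graph are invented for illustration; the traversal is an ordinary breadth-first search.

```python
from collections import deque

# Hypothetical dependency graph: each document maps to the documents that cite it.
CITED_BY = {
    "definitions_act": ["tax_act", "benefits_act"],
    "tax_act": ["tax_regulation_2020"],
    "benefits_act": ["benefits_guidance"],
    "tax_regulation_2020": [],
    "benefits_guidance": [],
}

def documents_to_review(changed, cited_by):
    """Return every document downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    to_review = []
    while queue:
        current = queue.popleft()
        for dependent in cited_by.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                to_review.append(dependent)
                queue.append(dependent)
    return to_review
```

Changing `"definitions_act"` here flags all four other documents, while changing `"tax_act"` flags only `"tax_regulation_2020"`.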

Probably the closest thing you'll get to an 'automated' system would be for the 'process' that is currently done by administrators to be partly or fully automated. When charges are brought, a court date is automatically booked. When a judgement is made, a court report is generated, prison alerted etc.

Basically, everything that people in tech think should be automated, shouldn't be, and all the things technology people think shouldn't exist should be automated.

[0] https://en.wikipedia.org/wiki/Ricardian_contract

[1] https://www.youtube.com/watch?v=CEN4XNth61o


Well technically (joke)... a decimation is 10%.

I agree. Very interesting. Realistically, paperwork & bureaucracy is an ideal target for automation. That said, every input that you feed an algorithm is legally debatable... Structuring your inputs to be both legal and optimal is more storytelling than mathematics. There's enough play in these "subjective" parts that results could be anything.

Still, having a major algorithmic component in the mix changes the game.

The fact is, that while PCs have been (a) on every desk and (b) paperwork efficiency machines for the last 30 years... the total number of legal/accounting/hr/admin jobs has increased in this period.

The effects of technology/efficiency on "clerical" work are not predictable in the way they are on factories.


In countries or legal frameworks based around civil law, I think that could possibly occur. It would be impossible to do this in a formulaic way in legal areas covered by common law, although having digitized legal "assistants" that could bring up and analyze relevant precedents seems almost guaranteed (if they don't exist already?)


Civil law probably has a big edge in computer-driven legal implementation, if that's where things are headed.

I have no idea what would be in it for Common Law countries. Perhaps they'd be more driven towards using data mapping techniques to lay out networks of heterogeneous legal stuff (jurisprudence etc.).


IMO this is such an 'obvious' idea once you think about it that it will eventually happen.

The law will go from written natural language to executable computer programs (formal languages).

However, a change of this magnitude cannot happen quickly (and might even be impossible to do peacefully).

Abusing the "software is eating the world" analogy, we ain't seen nothing yet. Software is just barely opening its mouth.


There is no floating point representation for 'gross', 'egregious', or 'blatant'. At least not one that will be suitable in all cases.


Right on. Consider whether forcing someone to unlock their smartphone should constitute compelled self-incrimination. This is a problem that US courts have flip-flopped on. [0] How would an SMT solver derive an answer? It can't. You'd need a general AI.

Even ignoring the subtleties of the real world that the ̶l̶a̶w̶ edit legal system must cope with, nuts-and-bolts legal concepts like mens rea can't be mapped to algebraic expressions.

[0] https://arstechnica.com/tech-policy/2020/06/indiana-supreme-...


but the law is not the judge


Yes but you can do a what-if analysis based on the parameter $degree being equal to 'gross', 'egregious' or 'blatant'.
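As a rough sketch of that what-if analysis, assuming an invented penalty rule where a judge-assessed degree selects a multiplier (the names and numbers are hypothetical):

```python
# Hypothetical rule: the penalty multiplier depends on a judge-assessed degree.
MULTIPLIER = {"gross": 2, "egregious": 3, "blatant": 4}

def penalty(base_fine, degree):
    """Penalty owed for a given base fine and assessed degree."""
    return base_fine * MULTIPLIER[degree]

def what_if(base_fine, degrees=("gross", "egregious", "blatant")):
    """Evaluate the same case once per possible value of the degree."""
    return {degree: penalty(base_fine, degree) for degree in degrees}
```

The code doesn't decide which degree applies; it only shows what follows from each choice, which is left to the judge.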


They optimized it by removing dead code etc. So I wonder if that could be fed back into the original laws to simplify them.


I was rather thinking about fuzzing it to find gaps in tax laws.
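A toy version of that gap hunt, assuming a made-up rule set (a 20% flat tax with a 500 credit that vanishes above 30,000). For simplicity it scans every income in a range rather than sampling at random, so it is an exhaustive check, not a true fuzzer.

```python
def toy_tax(income):
    """Hypothetical rules: 20% flat tax, with a 500 credit lost entirely above 30,000."""
    tax = income * 20 // 100
    if income <= 30_000:
        tax = max(tax - 500, 0)
    return tax

def find_cliffs(tax_fn, low, high, step=1):
    """Return incomes where earning `step` more leaves the taxpayer with less net income."""
    cliffs = []
    for income in range(low, high, step):
        net_before = income - tax_fn(income)
        net_after = (income + step) - tax_fn(income + step)
        if net_after < net_before:
            cliffs.append(income)
    return cliffs
```

On this toy schedule the only cliff is at 30,000, where one extra unit of income costs the whole credit.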


In a previous job we built an advisor for banking and pensions using a mathematical optimiser solving over a model of the Danish tax rules. It would normally find enough tax savings to retire about a year earlier by finding the best savings strategies (e.g. exploiting dynamic and temporal effects between taxation on spouses retiring at different times, using the house mortgage as a savings buffer to shift pension payouts to lower tax years etc). It was quite an eye opener.


You should write a blog post or a HN story about that!


I can give a few highlights: it had an exact model of a personal economy (savings, pensions, debt, houses, cars, boats and other assets) and the tax system and a projection engine capable of predicting the future cash flows for a person.

The projections were the basis for advising the client how to best manage their finances.

The advisor model could request an optimisation of how to best allocate the assets and liabilities over time from a mixed integer programming model built with GAMS. The latter optimisation model was not exact but it could generate close-to-optimal strategies for e.g. pension savings, buying houses and how to best spend the savings after retirement. The best strategies were then fed back into the exact model, evaluated and presented to the user.
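To give a flavour of the payout-shifting idea, here is a deliberately tiny sketch. It is a brute-force search over a hypothetical two-bracket schedule, not the mixed integer programming model or the GAMS setup described above, and every number in it is invented.

```python
def tax(income):
    """Hypothetical schedule: 20% up to 40,000 and 50% above (integer arithmetic)."""
    return 20 * min(income, 40_000) // 100 + 50 * max(income - 40_000, 0) // 100

def best_payout_split(total_pension, other_income, step=1_000):
    """Try every split of a pension payout across two years and keep the cheapest.

    `other_income` is a pair (year 1, year 2) of income from other sources.
    Returns (payout taken in year 1, total tax over both years).
    """
    best = None
    for year1_payout in range(0, total_pension + 1, step):
        year2_payout = total_pension - year1_payout
        total_tax = (tax(other_income[0] + year1_payout)
                     + tax(other_income[1] + year2_payout))
        if best is None or total_tax < best[1]:
            best = (year1_payout, total_tax)
    return best
```

For someone earning 60,000 in year 1 and nothing in year 2, `best_payout_split(40_000, (60_000, 0))` puts the whole payout in the low-income year. A realistic model has far too many interacting choices for brute force, which is presumably why an optimiser was used.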

It requires a pretty complex tax system to really generate a lot of value for the clients so its value was lower in the neighbouring countries that had separated their social systems more from the tax system.


Where could one read more about this?


It was developed by a Danish startup named Financys, which was acquired by Schantz (another Danish company) and this was later acquired by Keylane from the Netherlands. Unfortunately I don’t think there is any good, detailed information about the technology available any more.


Someone did do this - finding discontinuities in the tax law - using formal methods, for the French tax law:

https://blog.merigoux.fr/en/2019/12/20/taxes-formal-proofs.h...


He's also the author of the research paper we are discussing.


Oh, oops!


Most of the French socio-fiscal system can be simulated in Python through https://github.com/openfisca/openfisca-france/.


I remember when the US tax code was going to be simplified so that an entire tax return form could be printed on a post card.


The majority of taxpayers actually don’t have very complex taxes. If you only have W-2 income and don’t itemize deductions, you just need the single-page 1040. Even small businesses aren’t complicated if they have proper bookkeeping, or if you’re just deducting mortgage interest. The piles of tax regulations really are for wealthy people and corporations with complicated accounting.


Here in NZ we got rid of almost ALL deductions. The tax system is so simple that almost all people with one job pay exactly the right tax through employer PAYE (as someone who runs a small company I do my PAYE with 1 line of spreadsheet formulae, it's easy).

This means most people don't need to file. If you want to, it's a 1-2 page web form, and, starting last year, the IRD will do it for you anyway and directly credit your refund (with interest) to your bank account without you asking.


What do you do about interest on savings accounts or capital gains on stocks?


The bank takes it out at a standard rate - when you open an account you tell them your marginal rate, if you get it wrong it's obvious to the IRD.

We have no tax on capital gains (we should, it badly distorts our markets, leaves too much money in real-estate).


Was that the 9-9-9 plan? To be honest such simple tax rules were eye-opening and refreshing but I don't think it was particularly fair.


It actually goes back to the flat tax proposals to set the top rate to 28% with very few deductions.

The idea was that you don’t have some rich paying the top rate of 39%, and some paying near zero (cough, cough, Mr. Trump) due to large discrepancies in deductions.


You'll notice they never specified how big the writing was going to be.


> This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time.

Gotta love the sense of humour.


I wonder if you could backpropagate gradients all the way through the tax calculation, so it could tell me what to change about my finances to minimize taxes.
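A rough sketch of the idea on a toy calculation. Two caveats: the tax schedule and deduction rules are invented, and the sensitivities are estimated with central finite differences instead of true backpropagation, which keeps the example dependency-free.

```python
def tax_due(wage, pension_contribution, charity):
    """Hypothetical schedule with two deductions; not any real tax code."""
    taxable = max(wage - pension_contribution - 0.5 * charity, 0.0)
    # 10% on the first 20,000 of taxable income, 30% on the rest.
    return 0.10 * min(taxable, 20_000) + 0.30 * max(taxable - 20_000, 0.0)

def sensitivities(fn, inputs, eps=1.0):
    """Central finite-difference estimate of d(tax)/d(input) for each named input."""
    grads = {}
    for name, value in inputs.items():
        up = dict(inputs, **{name: value + eps})
        down = dict(inputs, **{name: value - eps})
        grads[name] = (fn(**up) - fn(**down)) / (2 * eps)
    return grads
```

A negative sensitivity marks an input that lowers the bill when increased. Real tax rules are full of thresholds, so the gradient only describes the local neighbourhood and says nothing about jumps at a bracket edge.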



