
Ask HN: Are There Generalized Transpilers? - passer_byer
I have a client who has tens of thousands of legacy Base SAS programs for calculating Current Expected Credit Losses (CECL) models. Base SAS is a commercial, interpreted language used widely in financial services for reporting and risk analysis. New hires coming into the firm mainly know Python, and few know SAS. They would like to convert their existing SAS programs into Python 3 scripts.

One obvious approach is to recode the existing logic into Python by hand. However, this approach does not scale.

I would like to know if there are existing tools, such as a generalized transpiler that converts language X into language Y.

https://en.wikipedia.org/wiki/Source-to-source_compiler

I have a reasonable understanding of how to map common Base SAS language constructs into Python. See:

www.pythonforsasusers.com

Absent such a generalized tool, what advice do you have to help automate such an effort?
======
verdverm
You'll first have to parse the SAS code, then output Python or some
intermediate form.

Does an open source parser exist?

Golang did a C-to-Go transpile; there are some articles and maybe a talk on it
out there. A transpile likely won't be 100% perfect, so you will need tests to
verify correctness.
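
One way to set up such tests, as a rough sketch: run the original SAS program
and the converted Python script on the same input, export both results to CSV,
and diff them with a numeric tolerance. The function names, the column names
(`loan_id`, `ecl`) and the tolerance below are all made up for illustration;
nothing here comes from an existing tool.

```python
import csv
import io
import math


def values_match(a, b, rel_tol):
    """Compare two CSV cells, numerically when both parse as floats."""
    if a is None or b is None:
        return a is b
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        # Not numeric: fall back to an exact string comparison.
        return a == b


def compare_outputs(expected_csv, actual_csv, rel_tol=1e-9):
    """Return a list of human-readable differences between two CSV texts."""
    expected = list(csv.DictReader(io.StringIO(expected_csv)))
    actual = list(csv.DictReader(io.StringIO(actual_csv)))
    diffs = []
    if len(expected) != len(actual):
        diffs.append(f"row count: expected {len(expected)}, got {len(actual)}")
    for i, (exp_row, act_row) in enumerate(zip(expected, actual), start=1):
        for col, exp_val in exp_row.items():
            act_val = act_row.get(col)
            if not values_match(exp_val, act_val, rel_tol):
                diffs.append(
                    f"row {i}, column {col}: expected {exp_val!r}, got {act_val!r}"
                )
    return diffs


# Hypothetical outputs: the SAS export and two candidate Python ports.
sas_out = "loan_id,ecl\n1,100.10\n2,250.00\n"
py_ok = "loan_id,ecl\n1,100.1\n2,250\n"
py_bad = "loan_id,ecl\n1,100.1\n2,251\n"

print(compare_outputs(sas_out, py_ok))
print(compare_outputs(sas_out, py_bad))
```

An empty list means the two outputs agree within the tolerance. With thousands
of programs you would probably want one such check per program, driven by
saved reference inputs.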

I have a tool that isn't open source yet, but it could be quite useful if you
can get SAS to an AST in JSON or another data format. Hit me up if you'd be
interested in talking about that option.

~~~
passer_byer
Thanks for that. There is no open source SAS parser. There are commercial
versions, but that is why I ask about a generalized approach, so as not to
build a bespoke parser.

There are a couple of Python libraries for converting SAS datasets into
dataframes which I plan to use.

Converting the existing program logic is the main challenge.

~~~
verdverm
Maybe check out PEG parser generators. They are easier to reason about than
older approaches. They are less capable, but often sufficient.
------
eesmith
Tell them that you can't do it.

The project you describe is complicated. SAS is not a simple language. For
example, how do you plan to map GOTO into Python?
[https://support.sas.com/documentation/cdl/en/imlug/64248/HTM...](https://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#imlug_langref_sect126.htm)
.
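
To make the GOTO problem concrete: Python has no goto, so a converter has to
restructure jumps somehow. One common fallback, sketched below, is a dispatch
loop over named blocks. The labels and the SAS-like statements in the comments
are invented for illustration only, and output like this is correct but hard
to read, which is part of why the mapping is not trivial.

```python
def run(x):
    """Emulate labeled jumps with a dispatch loop over named blocks."""
    trace = []
    label = "start"
    while label is not None:
        if label == "start":
            trace.append("start")
            # illustrative source: if x > 5 then goto big;
            label = "big" if x > 5 else "small"
        elif label == "small":
            trace.append("small")
            x = x * 2
            # illustrative source: goto done;  (skips over the "big" block)
            label = "done"
        elif label == "big":
            trace.append("big")
            x = x - 5
            # illustrative source: a backward jump, i.e. a hidden loop
            label = "start" if x > 5 else "done"
        elif label == "done":
            trace.append("done")
            label = None
    return x, trace


print(run(3))
print(run(12))
```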

Here's part of a resume of someone who wrote a SAS parser as part of a job.
Between 2013 and 2016:

> Wrote a SAS parser (C++/ANTLR). SAS is a 70s language not at all written in
> conformance with modern computer language practices. Among its many
> ‘challenges’, the worst is that keywords can also be used as variable names
> (thus you can define variables IF, ELSE, THEN and have a legal SAS statement
> like “IF ELSE=IF THEN IF=ELSE ELSE THEN=IF;”) So standard compiler theory /
> a classical lex/yacc approach won’t work because lex/yacc assumes that
> reserved words aren’t used as variable/function names. I wrote a parser that
> successfully builds an Abstract Syntax Tree (from which symbol tables and
> more can be extracted). Successfully tested against a test bed of 180
> representative SAS programs taken from industry production environments.

Note that it took several years!

Your client has 10,000+ SAS programs. They likely exercise a lot of odd
corners of the language. They'll want some strong assurances that the
conversion is done correctly, as this is business logic.

Based on the way you asked the question, I'm certain that this is not
something you can easily take on. You don't seem to know much about language
parsers or enough about the SAS language.

Instead, pay someone to do it who already has those skills. Looking around
(and with absolutely no knowledge or experience in the topic) I found
[https://dullesresearch.com/](https://dullesresearch.com/) which says:

> Automatically Convert SAS to Python and Java Instantaneous, 100% accurate.

> Migrate off of SAS to Open Source / Avoid manual rewriting of code /
> Eliminate human errors

There are almost certainly others.

So my suggestion to help automate such an effort is to pay someone else to do
it for you.

Find out how much they charge, what their guarantee is. Then figure out what
your client wants. See if they are compatible. If so, charge your client +15%
for overhead, you work with the other organization, and you're done.

~~~
yesenadam
>keywords can also be used as variable names (thus you can define variables
IF, ELSE, THEN and have a legal SAS statement like “IF ELSE=IF THEN IF=ELSE
ELSE THEN=IF;”)

OK, so have a preprocessor check for variables[0] with keyword names and
change them to an unused name. Problem solved.

[0] e.g. an X or Y in "X=Y"

~~~
eesmith
If your preprocessor is clever enough to disambiguate things then you can use
the same logic directly in your parser.
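
A rough sketch of what "the same logic directly in your parser" could look
like: the tokenizer never classifies words as keywords, and the parser decides
from position alone, treating a word followed by `=` at the start of a
statement as a variable. The grammar is a toy covering only the statement
quoted above (plus the variant with a semicolon before ELSE); the class and
method names are made up, and real SAS needs far more than this.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|=|;)")


def tokenize(src):
    """Split source into words, numbers, '=' and ';' without marking keywords."""
    tokens, pos = [], 0
    src = src.strip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok.upper() != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def keyword_at(self, word, offset=0):
        """A word acts as a keyword here only if it is not being assigned to."""
        tok = self.peek(offset)
        return (
            tok is not None
            and tok.upper() == word
            and self.peek(offset + 1) != "="
        )

    def assignment(self):
        target = self.take()
        self.take("=")
        return ("assign", target, self.take())

    def statement(self):
        if self.keyword_at("IF"):
            self.take("IF")
            left = self.take()
            self.take("=")
            right = self.take()
            self.take("THEN")
            then_branch = self.assignment()
            # Accept the form quoted in the thread and "...; ELSE ...;".
            if self.peek() == ";" and self.keyword_at("ELSE", 1):
                self.take(";")
            else_branch = None
            if self.keyword_at("ELSE"):
                self.take("ELSE")
                else_branch = self.assignment()
            self.take(";")
            return ("if", ("eq", left, right), then_branch, else_branch)
        stmt = self.assignment()
        self.take(";")
        return stmt


print(Parser(tokenize("IF ELSE=IF THEN IF=ELSE ELSE THEN=IF;")).statement())
print(Parser(tokenize("IF=ELSE;")).statement())
```

One token of lookahead is enough for this toy case. Whether that holds across
the whole language is a separate question.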

