

Show HN: “Did You Mean?” for Python - jamesdutc
https://github.com/dutc/didyoumean

======
acron0
It niggles me that the library is called "Did you mean?" but the error message
is "Maybe you meant?". Why is the error message not "Did you mean X?"

------
jamesdutc
This is probably not a good idea. (dutc = Don't Use This Code)

It's horrible that it even works.

It's surprising how well it works when it does work.

It was a great opportunity to solidify some knowledge about low-level
(implementation) details.

~~~
maxerickson
Since you mention thinking there must be a better way in hook.c, you can
override the traceback printer:

[https://docs.python.org/3.4/library/sys.html#sys.excepthook](https://docs.python.org/3.4/library/sys.html#sys.excepthook)

I haven't thought through how to access the state needed to make a suggestion
though.
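
For NameErrors, at least, the state seems reachable from the traceback that the hook receives: the innermost frame carries the local and global names that were in scope. Below is a minimal sketch of that idea, not the code from the repository. `suggest_names`, `suggesting_excepthook`, and `install` are hypothetical names, and pulling the missing name out of the message text is an assumption about CPython's wording. It does not cover AttributeErrors, because older Pythons (such as the 3.4 linked above) do not attach the object to the exception.

```python
import builtins
import difflib
import re
import sys
import traceback


def suggest_names(exc, tb):
    """Return close matches for the missing name in a NameError, or []."""
    # Assumption: the message looks like "name 'foo' is not defined".
    match = re.search(r"name '(\w+)' is not defined", str(exc))
    if match is None or tb is None:
        return []
    # Walk to the innermost frame, where the failed lookup happened.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    candidates = set(frame.f_locals) | set(frame.f_globals) | set(dir(builtins))
    return difflib.get_close_matches(match.group(1), sorted(candidates))


def suggesting_excepthook(exc_type, exc, tb):
    """Print the usual traceback, then a spelling hint for NameErrors."""
    traceback.print_exception(exc_type, exc, tb)
    if issubclass(exc_type, NameError):
        matches = suggest_names(exc, tb)
        if matches:
            print("Maybe you meant: " + ", ".join(matches), file=sys.stderr)


def install():
    """Replace the default traceback printer with the suggesting one."""
    sys.excepthook = suggesting_excepthook
```

Calling `install()` once at startup would be enough. The hook only runs for uncaught exceptions, which is one reason this route is more limited than hooking the lookup itself.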

~~~
jamesdutc
Yeah, that was one of the (less-bad) approaches I tried first:

[https://gist.github.com/dutc/3f2c79048d95287be138](https://gist.github.com/dutc/3f2c79048d95287be138)

But you can see it's somewhat limited. And I was curious to actually hook an
internal Python call!

By the way, that hook.c comment was wondering whether there might be a nicer
(more nearly portable?) way to set up these assembly sections. Some of this
stuff was hacked together, but I have a couple other games I want to play with
this gimmick, and I want to make the hooking a bit better.

[https://github.com/dutc/libhook](https://github.com/dutc/libhook)

------
jamesdutc
Someone on GitHub asked "why shouldn't this be used":

[https://github.com/dutc/didyoumean/issues/1](https://github.com/dutc/didyoumean/issues/1)

Here's the answer I gave:

It's not a particularly useful feature in practice. Misspellings resulting in
AttributeErrors are generally caught pretty quickly, and this addition to
standard error reporting would mostly be useful in interactive settings where
the mistake would be obvious.

(Note that linting tools like PyFlakes already do a fairly good job of picking
up on NameErrors, which result from using variables that don't exist.)

I have two other approaches to supplement error reporting with spelling
suggestions
([https://gist.github.com/dutc/3f2c79048d95287be138](https://gist.github.com/dutc/3f2c79048d95287be138))
that are a little less ‘janky.’

Here is what makes this approach particularly offensive:

\- I implement this by hooking into a C function in the Python interpreter
itself, `PyObject_GetAttr`:
[https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...](https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155df6c6e3b59b914bacf20/src/didyoumean.c#L147)

\- I find the function, unprotect its memory page, then clobber the first few
assembly instructions with a jump:
[https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...](https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155df6c6e3b59b914bacf20/src/hook.c#L29)

\- I need to jump to an absolute address, since I don't want to (or can't?)
calculate the relative addresses. I don't believe I can do this with a `push`
and a `ret` or with a regular `call`, so I use a `jmp` instruction. The `jmp`
instruction won't take an absolute address as an immediate value, so I have to
use the `%rax` register:
[https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...](https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155df6c6e3b59b914bacf20/src/hook.c#L12)

\- Since I'm using this register, I have to save & restore its value. In the
hooking code, I save its value with a `push %rax`. In order to restore its
value, I have to patch the assembly for the hook function to stick a `pop
%rax` before any other instructions:
[https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...](https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155df6c6e3b59b914bacf20/src/hook.c#L50)

\- In order to figure out the candidates for the spelling correction, I need
to call `dir` on the object. But `dir` calls `PyObject_GetAttr` internally, and
those calls can themselves trigger exceptions. In order to avoid this
unbounded recursion, I have to implement a parallel code-path for `dir` by
creating a `safe_PyObject_Dir`:
[https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...](https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155df6c6e3b59b914bacf20/src/safe_PyObject_Dir.c)

\- In some builds of Python, the internal CFunction which provides the Python
builtin function `getattr()`, `builtin_getattr`, is compiled without explicit
calls to `PyObject_GetAttr`. In order to hook into these `getattr` calls, I
need to patch the builtin module. But because someone could have already
gotten a handle on `getattr`, I need to directly patch the function:
[https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...](https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155df6c6e3b59b914bacf20/src/didyoumean.c#L150)

This approach is probably not portable, and it's definitely not a good idea.

However, figuring out all these small problems was a great way to put some
low-level knowledge to use!
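
To make the `push` / `jmp` steps above concrete, here is a small sketch that only builds the bytes for such a jump and prints them. It never writes them into memory. It assumes x86-64 and the usual encodings for these instructions; the exact sequence in hook.c may differ, and `build_trampoline` and the example address are hypothetical.

```python
import struct

PUSH_RAX = b"\x50"        # push %rax   (save the register the jump clobbers)
MOVABS_RAX = b"\x48\xb8"  # movabs $imm64, %rax   (8 address bytes follow)
JMP_RAX = b"\xff\xe0"     # jmp *%rax   (absolute jump through the register)
POP_RAX = b"\x58"         # pop %rax    (goes at the start of the hook function)


def build_trampoline(target_address):
    """Bytes that save %rax, load an absolute address into it, and jump there."""
    # "<Q" packs the address as a little-endian unsigned 64-bit integer.
    return PUSH_RAX + MOVABS_RAX + struct.pack("<Q", target_address) + JMP_RAX


# Hypothetical address of the replacement function.
trampoline = build_trampoline(0x00007F0012345678)
print(len(trampoline), "bytes:", trampoline.hex(" "))
```

The sequence is 13 bytes long, so it overwrites that much of the original function's prologue. That is why the overwritten function cannot simply be called again from the hook without further work.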

~~~
Someone
_"In order to avoid this unbounded recursion, I have to implement a parallel
code-path for `dir` by creating a `safe_PyObject_Dir`"_

I thought this was intended to not be used. Why, then, don't you do the
(in)sane thing and set a global flag _"I am calling `dir`; ignore calls to
`PyObject_GetAttr`, please"_, and check that flag in your patch? (If you aren't
sure whether a single global flag will do, use a thread-local one, but I
think/guess that is overkill, given Python's GIL.)

And, by the way, this is how extensions in Mac OS pre Mac OS X did their
magic. It got really fun when multiple extensions tried to patch up the same
OS call, or when the OS would unpatch your patches. For some OS calls, that
would happen when the Finder launched, for others whenever an application
launched.

~~~
jamesdutc
Setting a flag is another approach. There's already an extension mechanism for
storing arbitrary per-thread state in a dictionary object, accessible via
`PyThreadState_GetDict`.

[https://hg.python.org/cpython/file/1d708436831a/Python/pysta...](https://hg.python.org/cpython/file/1d708436831a/Python/pystate.c#l361)

However, implementing the parallel code-path turned out to be easier (just a
bit of cut & paste) and a bit saner to debug. `PyObject_Dir` tries to look up
a few attributes on the object (`__dir__`, `__bases__`, &c.) which may not
exist. There's a high likelihood of multiple AttributeErrors each time
through.

Any documentation on Mac OS patches? Would be very curious to learn more about
how these worked out in practice!

~~~
Someone
That old MacOS stuff is hard to find on the Internet. One thing I did find
that gets a bit close is
[http://www.mactech.com/articles/develop/issue_16/Radcliffe_f...](http://www.mactech.com/articles/develop/issue_16/Radcliffe_final.html).
Problem is that it likely is a bit hard to follow if you do not know about the
stuff. Moreover, it doesn't really show patching itself.

I am fairly sure there is some article about patching in those archives of
Develop, but I couldn't find it.

Wikipedia has little, too.
[http://en.m.wikipedia.org/wiki/INIT](http://en.m.wikipedia.org/wiki/INIT) may
provide some starting points.

------
w-m
Didn't I see a question on Stack Overflow yesterday about how to get the
object when an AttributeError is thrown?

~~~
Noelkd
You did[1]

[1] [http://stackoverflow.com/questions/26548094/how-to-get-hold-...](http://stackoverflow.com/questions/26548094/how-to-get-hold-of-the-object-missing-an-attribute)

