
Well, my best-worst debugging story concerns a friend's work-based email account and Microsoft Outlook (in 2012). Occasionally, seemingly at random, it would fail to connect to the server to send email.

Obvious troubleshooting ensued: web traffic worked, could ping the email server, could connect with telnet and read email that way; Thunderbird worked. Created a new account; that would work for a while and then fail again.

Less obvious troubleshooting: traced the route to the server whilst running the connection; the route worked, the connection failed. Outlook logs showed attempts to connect to the correct URL, but the connection wasn't being made. Checked for malware. Reset the router; actually, I think we replaced it. Pulled out a Sysinternals tool, TCPView IIRC, watched the connections being made ... hang on, what's that IP address??

Turns out Windows was querying and getting the IP address, but somewhere it was reversing the dotted quad, and when Outlook said it was connecting to the relay.example.com server - let's say - it was instead attempting to connect to ...

I didn't track whether it was MS Windows or Outlook that was in error, I just dropped the correct address in as a line in the HOSTS file on the three affected computers. Fixed.

Very satisfying to find the way the problem arose and an easy fix; but would love to have seen internally where the error was arising and exactly why. I did find one other report that sounded like the same problem IIRC. My only idea was that an automated reverse-IP hostname like some ISPs use - like "9-8-7-6.ispnet.com" - was for some reason getting parsed in as the IP, but I wasn't about to start reverse engineering stuff to find out.

Sounds like someone forgot to call this function, and maybe most of the systems were big-endian, so it didn't matter, but one was little-endian: https://linux.die.net/man/3/ntohl
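For anyone who wants to see what a missed byte-order conversion does to a dotted quad, here is a minimal Python sketch. It is only an illustration of the endianness theory, not a claim about what Windows or Outlook actually did; the function name `misread_ip` and the address 192.0.2.10 (a documentation-range address) are made up for the example.

```python
import socket
import struct


def misread_ip(dotted: str) -> str:
    """Simulate a missed ntohl(): the four network-order bytes are read
    as a little-endian integer, then written back out in network order."""
    raw = socket.inet_aton(dotted)            # 4 bytes, network (big-endian) order
    wrong = struct.unpack("<I", raw)[0]       # misread as a little-endian u32
    return socket.inet_ntoa(struct.pack("!I", wrong))


if __name__ == "__main__":
    # Hypothetical address; the octets come out in reverse order.
    print(misread_ip("192.0.2.10"))
```

Applying the same mistake twice restores the original address, which is one reason this class of bug can hide when two components both get it wrong.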

Endianness was the first thing I thought of as well. Or maybe one component of the stack thought that IP addresses are char[4] while another thought they're u32_t, though you'd expect that to be caught by the typechecker.

Similar to the UK academic network JANET's problems with computer science department emails in the late eighties/early nineties. JANET used X.25 before transitioning to TCP/IP, with its own idiosyncratic email addressing that reversed the order of the domain name segments (relative to DNS order) in an email address. So, a University of Edinburgh CS department member with an address like 'grkvlt@cs.ed.ac.uk' was translated into user 'grkvlt' and host 'uk.ac.ed.cs' then promptly sent off to Czechoslovakia by overly-keen international mail gateways. This led to many CS mail server sub-domains gaining an initial departmental 'd', thus 'dcs', giving 'dcs.ed.ac.uk', 'dcs.gla.ac.uk' and so on...
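A rough sketch of why the reversed form was ambiguous, in Python. The helper names and the tiny `COUNTRY_CODES` set are invented for illustration; this is not how any real gateway was implemented, just the label-reversal idea from the comment above.

```python
# Tiny illustrative subset of country codes; 'cs' was Czechoslovakia.
COUNTRY_CODES = {"uk", "cs"}


def reverse_labels(host: str) -> str:
    """Flip a host name between DNS order and the reversed JANET-style order."""
    return ".".join(reversed(host.split(".")))


def is_ambiguous(host: str) -> bool:
    """True when both ends look like a country code, so a gateway cannot
    tell from the name alone which ordering it has been given."""
    labels = host.lower().split(".")
    return labels[0] in COUNTRY_CODES and labels[-1] in COUNTRY_CODES


if __name__ == "__main__":
    reversed_host = reverse_labels("cs.ed.ac.uk")
    print(reversed_host, is_ambiguous(reversed_host))
    print(reverse_labels("dcs.ed.ac.uk"), is_ambiguous(reverse_labels("dcs.ed.ac.uk")))
```

With the 'dcs' prefix, the last label of the reversed name no longer collides with a country code, which is the effect of the workaround described above.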

Might well be the parsing. Many people might use \d+.\d+.\d+.\d+ or worse.

I.e. not escaping the .
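To make the unescaped-dot point concrete, here is a small Python sketch. It uses the ISP-style host name from the story upthread as test input; the pattern names `LOOSE` and `STRICT` are just labels for this example, and nothing here is known to be what Windows or Outlook actually ran.

```python
import re

# Unescaped '.' matches any character, so '-' is accepted as a separator.
LOOSE = re.compile(r"\d+.\d+.\d+.\d+")
# Escaped '\.' only matches a literal dot.
STRICT = re.compile(r"\d+\.\d+\.\d+\.\d+")

host = "9-8-7-6.ispnet.com"

loose_hit = LOOSE.search(host)
strict_hit = STRICT.search(host)

if __name__ == "__main__":
    print(loose_hit.group() if loose_hit else None)    # text the loose pattern grabbed
    print(strict_hit.group() if strict_hit else None)  # None: no dotted quad present
```

Even the escaped pattern is only a shape check (it would accept 999.999.999.999), so a real parser such as Python's `ipaddress.ip_address` is the safer choice for validation.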
