Search: .lenght - Github (github.com)
262 points by uyhayuy on Jan 14, 2012 | 102 comments

I remember seeing a Github bot a couple weeks ago that strips out whitespace and adds a .gitignore file to a repo (I also remember this really rubbing some people the wrong way). This search indicates that it would probably be useful to have a linter bot running on Github for all the popular languages. It would find syntax errors, common misspellings, and compilation issues, and then submit pull requests to fix the issues.

I have no time to work on something like this myself, but I'm sure a lot of people would find it useful, especially if it acted as a "first defense" before deployment. Curious what other HN'ers think about this.

I wrote that bot!


Feel free to fork it to do whatever you want, that's why I made it.

Hey, some of us actually use trailing whitespace! :-)

I use it to create useful indentation guides in Komodo. If the whitespace is stripped away, the indentation guides have gaps where there's a blank line in an indented block.

Maybe Komodo could use a different method to decide where to draw the lines that didn't depend on trailing whitespace. It looks like Sublime Text 2 has a different approach for its indentation guides - maybe the Komodo guys should look at that. But in the meantime I'm using Komodo as it actually works today, so the whitespace on blank lines is important. Let me keep it, please? :-)

I wouldn't mind stripping out trailing whitespace on nonblank lines - that wouldn't affect my precious indentation guides.

But wait a minute, what about Markdown? Two spaces at the end of a line to get a <br>, right? Does the bot skip Markdown files?

Finally, for the folks who have automatic whitespace removal in their editor settings... Please be careful: With this setting, you'll be very likely to make a commit that includes both significant code changes and a mass of whitespace changes in the same commit.

Those kinds of changes should be separated: one commit for the code itself, and a separate commit for the whitespace with a comment like "Whitespace cleanup, no code changes."

This allows people who diff the revision history to diff with whitespace significant most of the time, the only exception being when reviewing a whitespace-only change.

(Edited for friendlier tone...)

From a quick look at the source, WhitespaceBot already excludes Markdown files:

    banned = ['.git', '.py', '.yaml', '.patch', '.hs', '.occ', '.md', '.markdown', '.mdown']
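
To illustrate how an extension list like this might be applied, here is a minimal Python sketch. It is not WhitespaceBot's actual code: `should_skip` and `strip_trailing_whitespace` are hypothetical names, and matching on the end of the path is an assumption about how the list is used.

```python
# Extension list quoted from the bot's source above.
banned = ['.git', '.py', '.yaml', '.patch', '.hs', '.occ', '.md', '.markdown', '.mdown']


def should_skip(path):
    """Return True if the path ends with a banned suffix (hypothetical helper)."""
    return path.lower().endswith(tuple(banned))


def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))
```

With this, `should_skip("README.md")` is true, so a Markdown file's two-space line breaks would be left alone.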

If you create a JS macro that is triggered on file open, with the contents "komodo.view.scimoz.indentationGuides = komodo.view.scimoz.SC_IV_LOOKBOTH", it should give you the indentation guides without the whitespace actually there. http://www.scintilla.org/ScintillaDoc.html#SCI_SETINDENTATIO... has some explanation of the possible values.

Is there any equivalent of the robots.txt standard for public code repositories? Being able to opt-in to certain bots might be helpful (opt-out being the default, of course).

That's a really cool suggestion and I would love to see some pseudo-standard on this.

Ah, aggressive trailing whitespace removal. That I can completely get behind. I've already got command-s bound to a custom macro that strips trailing whitespace in TextMate for myself and my co-workers; but this would be an even more inclusive solution.

Fantastic. If you use vim, you should have this in your .vimrc:

    " Remove any trailing whitespace that is in the file
    autocmd BufRead,BufWrite * if ! &bin | silent! %s/\s\+$//ge | endif

I prefer using the vim-trailing-whitespace plugin and fixing it manually: https://github.com/bronson/vim-trailing-whitespace

Or you can use my competing plugin: https://github.com/bitc/vim-bad-whitespace

which has some advantages (described in the README)

One advantage taken from the readme:

  This plugin is better than using the builtin vim 'list' command because it
  doesn't show an annoying highlight while you are typing in insert mode at the
  end of a line.

Or, if you prefer a more manual approach (i.e. for my fellow paranoid developers), simply use `list` and `listchars`:

    set list
    set listchars=trail:•

In other people's code, use `set nolist` to prevent hyperventilation.

In general, explore `help list`.

Uhm. Why is this annoying, other than the fact that it shows up in your git commits?

If you use Vim from the terminal to edit text (which I do), trailing whitespace shows up as big white blocks. It's slightly visually distracting, but it's really just an OCD thing.


The absence or presence of trailing whitespace is sometimes significant, such as in a big regex in re.VERBOSE mode in Python.

Therefore, it must be visible, and if it is visible it must be removed when it has no use.
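
A small Python sketch of one such case, assuming a verbose-mode pattern whose first line ends in an escaped space. The pattern itself is made up for illustration; the strings are built with explicit escapes so the example does not depend on invisible characters.

```python
import re

# Verbose-mode pattern: "foo", an escaped space, a line break, then "bar".
original = "foo\\ \nbar"

# What a naive trailing-whitespace stripper would leave behind:
# the backslash now escapes the line break instead of the space.
stripped = "\n".join(line.rstrip() for line in original.split("\n"))

before = re.compile(original, re.VERBOSE)
after = re.compile(stripped, re.VERBOSE)

matches_before = before.fullmatch("foo bar") is not None
matches_after = after.fullmatch("foo bar") is not None
```

Here `matches_before` is true and `matches_after` is false: stripping the line changed what the regex accepts without any visible difference in most editors.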

I prefer to have my editor strip that whenever I save a file without any manual action.

I should clarify that command-s is the save command. My macro overrides the standard behavior.

In my 10-year-old project, removing all trailing whitespace would produce a 1000+ line commit, and make many contributors' lives harder.

so, thanks, but I'll keep my whitespace.

Perhaps it would be more productive to advocate the use of local pre-commit hooks. Git makes it very easy to configure validation locally long before anything gets sent to Github.

Would be nice if Github provided better documentation and a selection of validation templates to include in new projects. This would better leverage the power of Git and its distributed nature than a bot running on Github.
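
As a rough sketch of what such a local check could look like, here is a Python script body for a pre-commit hook. The function names are hypothetical, and for simplicity it reads the working-tree copy of each staged file rather than the staged content itself.

```python
import subprocess
import sys


def lines_with_trailing_whitespace(text):
    """Return 1-based line numbers whose content ends in a space or tab."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def main():
    """Check staged files; a non-zero return value should abort the commit."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    failed = False
    for path in staged:
        try:
            with open(path, encoding="utf-8") as handle:
                bad = lines_with_trailing_whitespace(handle.read())
        except (OSError, UnicodeDecodeError):
            continue  # skip binary or unreadable files
        for number in bad:
            print(f"{path}:{number}: trailing whitespace", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

To use it, the script would be saved as an executable `.git/hooks/pre-commit` and end with `sys.exit(main())`; Git aborts the commit when the hook exits with a non-zero status.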

I'll look for more on this, but if you had something you would recommend as a tutorial, I'd appreciate it.

Didn't have a specific example in mind; there is definitely opportunity for Github to help with education and adoption.

For some reference material check out:

And a couple simple examples:


I actually just replied with a suggestion that Github should implement some built-in functionality for simple error-checking. I think it's a great idea. It would be really helpful if you got a simple little list of notifications for a commit indicating possible error points.

avoid feature creep! github's platform allows for bots to run, a bot is a perfect way to implement this instead of it being "built-in".

So basically something like Krazy does with KDE: http://ebn.kde.org/krazy/

I thought this is actually interesting but I would like to know if >4k misspellings is a lot or not. Here is one way to do it:

    JavaScript     4252    2907459    = 0.0015
    C             18981    2902857    = 0.0065
    Java           7706    2348900    = 0.0033
    Ruby          10789    1690604    = 0.0064
    C++            9458    1315552    = 0.0072
    PHP            3116    1167924    = 0.0027
    C#             1352     937647    = 0.0014
    Python         3662     737292    = 0.0050
    Ruby           1232     380484    = 0.0032
    Perl           1239     258892    = 0.0048
    Objective-C     679     238051    = 0.0029

P.S. There is something wrong with Github's language breakdown algorithm, sometimes it shows the same language twice with a different number of hits.
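
The ratios above look like the first number divided by the second. A quick Python sketch that recomputes and ranks them, assuming the columns are hits for the misspelling and hits for the correct spelling (the post does not label them); the duplicate Ruby row is kept as listed.

```python
# (language, first column, second column) copied from the list above.
rows = [
    ("JavaScript", 4252, 2907459),
    ("C", 18981, 2902857),
    ("Java", 7706, 2348900),
    ("Ruby", 10789, 1690604),
    ("C++", 9458, 1315552),
    ("PHP", 3116, 1167924),
    ("C#", 1352, 937647),
    ("Python", 3662, 737292),
    ("Ruby", 1232, 380484),
    ("Perl", 1239, 258892),
    ("Objective-C", 679, 238051),
]


def rate(misspelled, correct):
    """Ratio of the first column to the second."""
    return misspelled / correct


# Sort from lowest to highest ratio and print in the same layout.
ranked = sorted(rows, key=lambda row: rate(row[1], row[2]))
for language, misspelled, correct in ranked:
    print(f"{language:<12} {misspelled:>6} {correct:>8} = {rate(misspelled, correct):.4f}")
```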

I noticed another problem is that the highlighter will select the language name as well as the term which means <?php and (c) Copyright are shown instead of the actual mistake.

I put together a GitHub Illiteracy Index script https://github.com/jond3k/sandbox/tree/master/github-illiter... which you can play around with if you like :D

C# I can understand being so low, since it's almost always written in Visual Studio or MonoDevelop (both of which provide autocompletion). But how is JavaScript the next lowest?

Hmm, I guess it's because length is a commonly used function in Javascript, but not in other languages. In Python you do len(list) instead, so the word length is more likely to appear in comments and therefore less likely to be corrected.

Because the built-in .length property is frequently used, and will fail if misspelled.

Whereas C and C++ programs tend to have a lenght operator implemented by the programmer, and from there the error gets snowballed by IDEs and debuggers.

C# is also a compiled language, so you wouldn't be able to get anything to run with a typo like that hanging around. I find it surprising that there are so many commits making that mistake!

Being a compiled language isn't sufficient to prevent "no method" errors. It's completely possible for a compiled language to define methods at runtime or use duck typing.

It shows up in comments and variable names frequently.

Odd that compiled languages (C, C++, Java) are higher than some interpreted languages (PHP, Javascript). Of course, the search will match comments as well as code, so it may just mean they have better comments.

Also fun to search on "functino".

I can't think of anything in C or C++ that uses length off the top of my head. Size and Len, sure, but nothing that's length.

I would guess that most of the spelling errors get propagated through autocomplete. That's how most of the spelling errors in my code get there anyway.

    C#            1352     937647    = 0.0014   <- best
    JavaScript    4252    2907459    = 0.0015
    PHP           3116    1167924    = 0.0027
    Objective-C    679     238051    = 0.0029
    Ruby          1232     380484    = 0.0032
    Java          7706    2348900    = 0.0033
    Perl          1239     258892    = 0.0048
    Python        3662     737292    = 0.0050
    Ruby         10789    1690604    = 0.0064
    C            18981    2902857    = 0.0065
    C++           9458    1315552    = 0.0072   <- worst

For the record, Github's search index is wayyy out of date sometimes. The second user here is me and I deleted that user like two years ago: http://cl.ly/0y271f0T3G0X2J1L022E

Same here, I contacted support two times in two years and they said "we're working on it". Obviously, that isn't true and they just don't care about the outdated search index.

Your latter statement is unequivocally false, for what it's worth. On both counts.

Perhaps some info about the progress being made would be more helpful as elk3 has mentioned contacting support over the past 2 years.

'wtf' is a good search term when coming into contact with a new codebase; https://github.com/search?type=Code&language=JavaScript&...

  #  Language    Illiteracy
  1  C           0.02877583
  2  Perl        0.01635618
  3  Ruby        0.01560477
  4  JavaScript  0.01330989
  5  Shell       0.01235425
  6  Python      0.01046104
  7  PHP         0.00910218
  8  Java        0.00736395

(For height, length and hierarchy, averaged out)

And you thought this would end up being a PHP joke...


not an attribute of a standard object type, though.

Neither is length in many languages.

But length is in Javascript. Both String and Array have that property.

Which is why I said many and not all.

Even the search has a bug. The query is for ".lenght" but many of the highlighted results are just lenght without the dot.

Prob a reg exp so matches any char...

But then shouldn't the highlighted bit include the char in front? It's also case insensitive. I think it's trying to be clever.
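
For what it's worth, the regex theory is easy to picture in Python. This says nothing about how Github's search actually works; it only shows what an unescaped, case-insensitive dot pattern would do on a few made-up lines.

```python
import re

samples = ["items.lenght", "int lenght = 0;", "strLenght"]

unescaped = re.compile(r".lenght", re.IGNORECASE)  # "." matches any character
escaped = re.compile(r"\.lenght", re.IGNORECASE)   # "\." matches a literal dot only

loose_hits = [s for s in samples if unescaped.search(s)]
strict_hits = [s for s in samples if escaped.search(s)]
```

The unescaped pattern hits all three samples, while the escaped one hits only `items.lenght`.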

More likely is form/input validation.

A common typo, it seems. But I'm a bit confused as to why this was submitted.

In JavaScript, if you check for a non-existent property on a variable (e.g. aVar.lenght vs aVar.length) it will return "undefined". So people often rely on this behaviour to check if something is an array or not (no comment on whether this is good or bad), with:

    if (aVar.length) {
        // do things with array
    }

So a misspelling of length could be making a lot of code out there behave in an unexpected way.

The same pattern is widely used to test whether an array-like object is empty. Since a length of 0 is also "falsey", it evaluates as false when the array has no elements. A typo in this case would result in the tested array always being "empty".

That is a bad thing, as String also has a length property:

    "bar".length; // 3

In a static language this would be flagged as an error. I assume something less than ideal happens in languages such as Ruby.

I once worked at a company where a very early piece of code had a typo "properites" instead of "properties". This misspelling became institutionalized, and was used throughout the codebase because it was deemed too expensive to fix. And this was with a static language (with good IDE refactoring support)!

In ruby, and I think most dynamic languages, this type of typo is likely to raise an exception. It could hurt, but a simple test run is likely to discover it.

The way javascript (which is what is linked) handles this, as amirhhz described it, leads to silent errors which could turn out a lot worse.
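
A quick Python illustration of the difference, as a sketch rather than anything from the linked code: a misspelled attribute raises immediately, while a dictionary lookup with `.get` stands in for JavaScript's silent `undefined`. The variable names are made up.

```python
items = [1, 2, 3]

# A misspelled attribute fails loudly at the point of use.
try:
    items.lenght
    raised = False
except AttributeError:
    raised = True

# dict.get returns None for a missing key, roughly like JavaScript's undefined:
# the typo goes unnoticed and the branch is silently skipped.
record = {"length": 3}
if record.get("lenght"):
    branch_taken = True
else:
    branch_taken = False
```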

Yes, but it would raise the exception at run-time, and only when the particular path is taken.

There's actually no exception, it just returns "undefined" and the if statement fails. That's why it's such a deadly bug -- no exception, and path dependent. Combine that with the async nature of JS and it's going to be a long night tracking that one down

There are ways to raise this sort of error even in static languages: objc_msgSend comes to mind.

I'm confused as to why they couldn't simply:

  grep -R properites .

The whole software infrastructure was a scary house of cards. They were afraid that there was unknown code that might be depending on it. For example RESTful services in other departments that were not under our immediate control.

Perhaps the same reason why "referer" has not been corrected to "referrer".

Yes, it was exactly like that- lots of code had grown around the "bug", and it was not immediately obvious what other software had come to depend upon it. "Little hairs", as Joel might say.

Obviously you have never worked on a code base with 1000s of developers. If you edit almost every file then basically everyone needs to stop writing new code while the change is made. Otherwise the merges others have to do are going to be a disaster.

Honestly, do you really work on the same code base with "1000s of developers"? I find it really dubious.

Yes. It's called Microsoft.

Well, then the problems with such a process are pretty well documented[1]. For comparison, at Google global refactorings are pretty common and usually painless; there are even custom tools to support such changes (push them through code review, ensure no tests are broken, etc.).

[1] http://moishelettvin.blogspot.com/2006/11/windows-shutdown-c...

I know the Microsoft process all too painfully. RI,FI,RI,FI,RI,FI,RI,RC,RTM,Ship It,Repeat.

But as to the "usually painless" at Google. So when does that pain happen?

Can you take me through the following scenarios: change a variable name, change a base class name that lots of people extend from, file renames?

How do you go about refactoring? Do everything at once? Breaking it into pieces? Do file rename then variable and base class renames? Or smallest piece at a time?

Once the refactoring is complete how do you communicate to others the changes so when they merge the code in they don't get too messed up? Or worse undo something in the refactoring. (Also follow up is it better to do the big refactoring so there is the one big merge or a bunch of little refactorings and lots of little merges across the spectrum).

I guess the code change isn't the problem. Making a big change and getting people on the same page is much harder, especially when there are varying degrees of skill and experience on a project. And it's this stuff that is painful and leads to not wanting to do big refactorings at a lot of shops.

Hacker News has a short attention span and this probably won't be seen by many, but I'll try to answer your question nevertheless. There are several factors I'd like to mention.

1. Most importantly, the version control head is always the point of reference, and the burden of merging is on people who keep long-lived pending changes. This means that conflicts are resolved as soon as possible by a person who actually knows the context, instead of being postponed until a dreaded merge window. Ultimately, a programmer pursuing refactoring is only responsible for making sure it works on the head, and should announce the change so that others are prepared for merging.

2. There are some huge code bases at Google, but nowhere near the size of Windows. On the other hand, I'm sure that even Windows has to be separated into more or less decoupled components. When I doubted that you work on the same code with thousands of other programmers I was thinking in terms of components, not final products.

3. The cultural aspect shouldn't be disregarded. Code hygiene is encouraged at Google, and some people volunteer their 20% time to help with that. Moreover, there are some custom tools that make global refactorings much easier and safer.

Hope that was helpful.

I am not the average HNer... Thanks.

Perhaps they were using . . . something other than *nix.

EDIT: Justly downvoted <strikeout>chastised</strikeout> for attempting humor without understanding.

That command works nicely for me in Cygwin.

EDIT: Eh, apologies if this sounded like chastising -- I didn't mean to. As a developer who's been trapped in "Windows-mindset" for many years, I wanted to try to inspire other Windows devs to try to use *nix-based solutions even if their only option is Windows development. Cygwin is in a very good spot right now -- it's achieved so much acceptance that even the most hardened institutions now allow it to be installed.

No snarkier than my remark, and less snarky than it might have been for being so much smarter.

I had this problem as a junior dev when my English was weaker. The problem stems from the fact that 'height' is spelled with 'ht', but 'width' with 'th'. Since one often writes those words in conjunction, it is easy to mix the endings up. If you're then a non-native speaker and don't run spellcheck on your code, you might end up writing 'lenght' and 'heigth' quite a few times, I know I did :)

My experience is more with languages that are typically compiled and would report this error as an error fairly early on, so the coder would correct it long before checking the code in.

What's the trade-off by having "undefined" returned instead of having an error reported as soon as the code is loaded?

It prevents you from later defining a 'lenght' method and using it at runtime without a recompile.

For core methods like 'length', it seems silly to think that you'd want to redefine it. And indeed, it's usually counterproductive - that's why any experienced JavaScript dev will have coding conventions like "Don't muck with the prototypes of built-in objects."

But at the application layer, this can be really useful. Imagine you're adding a new field to a message deep in the storage system, and then you want to pass that along to a template in the rendered HTML. It's really useful to be able to do this without recompiling & restarting each individual server between the backend and the frontend, and just edit a few template files and have them automatically pick up any changes to backend data formats.

Ditto adding a new database column, if you're using an RDBMS - it's pretty handy to have your model objects instantly reflect the new field, instead of needing to manually add accessors to each of your model classes. Rails and Django are built on this principle.

Also, you have a versioning problem with statically-compiled code in a distributed system. Imagine that you add this new 'lenght' field to a backend message, and add it to the frontend, and they both compile & deploy. Now imagine that a message from an old backend hits a new frontend (it's not possible to upgrade a whole distributed system at once without downtime). What does the new frontend do with it? It needs a piece of data, but the backend had no idea that it had to provide that piece of data. The only thing it can do is return the equivalent of 'undefined'.

In C++/Java code, you usually deal with these by inventing frameworks. Google code, for example, is littered with

  if (msg.has_new_field()) {
  } else {
  }

checks. If you use a more dynamic language like Python, you can use language mechanisms to represent undefined values or fields that are defined at runtime. If you use a static language, you're stuck mimicking them with hashmaps and null.
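
In Python, the same check might look like the sketch below. The message is modeled as a plain dict, and `new_field`, `render`, and the fallback text are hypothetical; the point is only that a missing field from an old backend can be handled with an ordinary language feature.

```python
def render(msg):
    """Build output from a backend message that may predate `new_field`."""
    # dict.get returns None when an older backend did not send the field.
    value = msg.get("new_field")
    if value is not None:
        return f"new_field={value}"
    return "new_field missing, using fallback"


old_message = {"id": 1}                        # produced by an old backend
new_message = {"id": 2, "new_field": "hello"}  # produced by a new backend
```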

Whether your language is compiled is not the issue; it's how you model objects and calling methods on them. In Smalltalk and other languages that take a message-passing approach, doing a.b() sends a message "b" to object a, and the object can do anything it likes with that.

Now the normal (and optimized) route is to find the method in a's method table and then call that, but if a doesn't have that method then a second method may be called to allow this to be handled. Once you have that sort of mechanism you can make ORM libraries that dynamically examine a schema at run time and generate accessor methods only as they are needed. Decorators, proxies and many other patterns become wonderfully simple, and there are often many more opportunities for meta-programming at run time.

The downside is of course that it becomes harder to find errors when writing or compiling, but tight integration of your development environment with your runtime can help with this.
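
Python has a hook along these lines: `__getattr__` is called only when normal attribute lookup fails. A minimal sketch of the ORM idea, with a made-up `Record` class and column names standing in for a real schema:

```python
class Record:
    """Exposes row columns as attributes without declaring accessors up front."""

    def __init__(self, row):
        self._row = dict(row)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. the fallback handler.
        row = self.__dict__.get("_row", {})
        if name in row:
            return row[name]
        raise AttributeError(f"{type(self).__name__} has no column {name!r}")


user = Record({"name": "alice", "length": 5})
```

Here `user.name` and `user.length` work without any declared accessors, while the typo `user.lenght` raises `AttributeError`, but only at run time and only when that line is actually executed.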

It should be possible to build a bot that automatically generates patches and pull requests for these kinds of typos.
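
As a rough idea of the patch-generating half of such a bot, here is a Python sketch using the standard `difflib` module. The typo table and function names are illustrative, it only rewrites whole words, and it would still happily rename a deliberately misspelled variable, so a human would have to review each pull request.

```python
import difflib
import re

# Illustrative typo table; a real bot would need a much longer, vetted list.
TYPOS = {"lenght": "length", "heigth": "height"}

_pattern = re.compile(r"\b(" + "|".join(map(re.escape, TYPOS)) + r")\b")


def fix_typos(text):
    """Replace known misspellings that appear as whole words."""
    return _pattern.sub(lambda match: TYPOS[match.group(1)], text)


def make_patch(path, text):
    """Return a unified diff fixing the typos, or an empty string if none are found."""
    fixed = fix_typos(text)
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)
```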

Someone wrote a spellchecker a while ago using perl spellchecker: http://blog.holdenkarau.com/2011/08/automatic-spelling-corre...

105395 results for heigth, now beat that ;)

And this is why we have testing frameworks.

... and compiled languages. Testing won't ensure 100% code coverage.

... and nice things like https://github.com/scrooloose/syntastic for vim (or your editor of choice).

And static analysis.

Recieve. Has to be my number one pet pieve.


This reminds me of a US company I worked with that outsourced some of their service layer work to a company with heavy European influence. As a result, API methods also had the spelling of certain words eg. getColour() or getFavourites(). Good times.

In the LaTeX editor that I'm using (WinEdt), I have a custom color highlighting that marks \rigth and \heigth in red+bold+strikeout, so I don't have to wait to compile and see a strange error to spot the mistake.

It'd be great if Github would scan your code for errors like these and just let you know they exist (in case you didn't want them to, which I would assume you wouldn't for the most part).

For some reason in video game source code I see the word 'hierarchy' in comments spelt wrong a lot in every project I've been on.

Is there any context to this or are you just pointing out the humor of how common this is?

Well it is basically a list of bugs -- and a rather long one too.

Of course there are rare cases where "lenght" is a variable and that name is used in every instance but mostly, these are bugs in code that we all use.

most of them are variable names which is acceptable!

Not at all. It's irritating and confusing.

Come back to that code in a year and try to extend it, stuff will break because you start to use the correct name.


The first results are all comments and spellcheckers.

It's not, because reutrn is a syntax error, but .lenght is valid and would return `undefined`.

Yes it is, it would work fine as a variable. E.g.

    var lenght = 23;

will not cause any troubles. And many of the results returned by the search are of this kind.

Were you replying to me? I explicitly said .lenght would return undefined. As in, a typo on the .length property.

Sorry -- I figure I misunderstood you.

I just tried "heigth", it's almost as bad.
