Hacker News new | comments | ask | show | jobs | submit login
Analysing C# code on GitHub with BigQuery (mattwarren.org)
109 points by matthewwarren on Oct 13, 2017 | hide | past | web | favorite | 85 comments

1. ArgumentNullException, 2. ArgumentException and 5. ArgumentOutOfRangeException shows how badly C# needs more ability to specify/enforce contracts.

It's awesome that C# 8 will address the number 1 problem, with non nullable references but number 2 won't be addressed until records post C# 8 by the looks of things and number 5 not until exhaustive pattern matching which doesn't look to be on the cards at all AFAIK.

Seriously. Contracts. I inherited a relatively substantial C# WCF business line app with a large number of classes each with many nullable properties. The null handling was awful, but there were too many classes to easily refactor to apply data contract attributes across the board (at least given that I am the sole available resource to do so). Instead I tried selectively applying the null object pattern to some of the most common troublemakers. It's improved things but god I wish the original dev had used data contract attributes from the beginning.

Similarly, it's staggering the number of times I've seen someone wrap their entire method (or worse yet their entire program) in empty try-catch blocks just to avoid responsibly handling null and out of range exceptions. To me, it seems like the ubiquity of this antipattern indicates that there's something inadequate about the language, tooling, or developer culture. Or some combination thereof.

I'm not enough of an expert to say what the problem is or how to fix it, but in my relatively limited experience I've found requiring data contracts up front smooths over a lot of the pain points. But it comes at a cost. I've never used an equivalent in any other language, so I don't how C# and .NET compare, but it does seem like a lot of boilerplate code that can limit decoupling if misapplied.

Well this is the problem Option classes are meant to solve but it's pretty tough to make that work without breaking all the old code where any reference value can be null

Good point, yeah all the instances of 'ArgumentXXXException' points to there needing to be better language/tooling support for it.

There were the below two items [0], [1] which are runtime and tooling pieces. I've used them years ago in side projects and they were helpful but haven't kept up with them recently. They both help find or handle argument validation issues as enumerated above.

[0] https://docs.microsoft.com/en-us/dotnet/framework/debug-trac...

[1] https://www.microsoft.com/en-us/research/project/pex-and-mol...

C# developers overwhelmingly prefer putting an opening brace { on it’s own line (query used)

Putting it on it's own line seems like a waste of space to me and doesn't contribute to the readability.

If I am not mistaken, Visual Studio automatically places an opening brace on the next line. When setting up coding standards, I asked my team not to fight Visual Studio. It generally has good auto-formatting functionality, but even more important is the uniformity and consistency of the code.

Opening brace on a new line is the default, but VS has excellent granular settings in this area. I was tired of having different standards for C and C#, so at one point we switched our defaults and ran a utility to convert all our C# code. We use a new line for namespaces, classes, and methods and the same line for everything else. VS handles it flawlessly and we haven’t looked back. Thinking back, I have not seen any inconsistencies committed in that time. VS autoformat is that good, with braces and white space.

> a new line for namespaces, classes, and methods and the same line for everything else

accidentally, we arrived to the same code style for both C# and C++...

Probably accidental here, too, in that the C code was older and larger. There was no good answer for the question of why C# was different, so the change was made.

The only downside would be if I worked on someone else’s project with different standards. I don’t think VS can figure out autoformat at the project level, like, say, python mode in emacs.

Have you looked at editorconfig?

I hadn't seen it before, but that looks great. It has such a wide variety of plugins. Thanks for the tip.

Are we really going to derail this thread to argue about this? Everyone has their preferences but the most important thing is just picking a style and this is the one VS defaults to.

I used to agree, but since adopting the Allman style have found that in many cases it's very helpful in making code readable (especially if using an editor which doesn't provide any auto-format options).

That said, some languages are impacted by style choice; e.g. JavaScript. Given many of us have to develop in multiple languages, it would be great to select a style supported by all languages (better would be if all languages supported all styles).


To waste something there needs to be a finite amount of it in the first place.

Considering the large number of coding standards that mandate the placement on a new line, tossing out readability because it doesn't matter to you seems short sighted.

There is a finite amount of vertical space on one's screen, and thus a finite number of lines of code that can be displayed simultaneously without scrolling. Reducing the amount of scrolling that needs to be done to view the entire body of a method can itself be an improvement to readability.

I would argue that fitting the most lines on the screen as possible is not a noble goal. Whitespace is very important for readability.

I find it easier to visually match the start and end of the blocks that the brace pair defines. Having said that, my team uses the opening brace on same line style and that's fine too. Since I use an IDE, it doesn't matter much either way because It provides great features like block folding and auto highlighting of a matching brace.

Don't understand why we spend so much time arguing about such low stakes stuff as this.

Beauty is in the eye of the beholder. For example I used to put the opening brace on a line of its own, mostly for simmetry reasons ( `{` in the same column as the relevant `}` ).

By the way, after many years I had to change it for coding style reasons. Now when I look back to old code and I see that lonely `{` my reaction is "ewww", so I guess it is also a matter of habit.

I know this is a matter of preference and habit, but I also cringe when I see this style, makes it less readable for me as well.

It’s the Visual Studio default. It definitely improves readability for me sometimes. Having empty lines makes it easier to read.

Perhaps it makes copy/paste easier? I have a tendency to put the '[' of an array on a separate line for that reason, for example.

For me it does.

Interesting to see 'var' isn't used as much as I thought it would be.

Yeah, I thought the same.

I now realise the better test of 'var' would be to compare it to how many times it's not used. That is count the times that 'var myClass = new MyClass()' is used v 'MyClass myClass = new MyClass()'. But my regex skills aren't good enough for that and I'm not even sure if you can do it purely in a regex?

Also I imagine that 'var someNumber = 1' is used less that 'int someNumber = 1', but again, I'm not sure how to do the regex? I certainly only use 'var' when the same word is on both sides of the '=', so not in the case of int/double/decimal etc.

I still use var in that scenario, because if you need different types you have to suffix numbers with an identifier anyway:

    var i = 1; //int
    var j = 1f; //float
    var k = 1m; //decimal
    var l = ""; //string
    var m = ''; //char

There are a few more for numeric types:

  var integer = 1;
  var long = 1l;
  var float = 1f;
  var double = 1d;
  var decimal = 1m;
  var unsignedInteger = 1u;
  var unsignedLong = 1ul;

Yeah, I know you can do that, but I can never remember which suffix is which ;-)

Back when I used to work at a large enterprise the use of 'var' was banned in our coding standards.

Using Linq was heavily frowned upon. It took quite a few years before it started to be accepted..

At one point we were also strongly encouraged to place a #region around every single method! I never understood why, and luckily that one did get dropped.

What's the point of not using stable language features? I don't get it.

I've (unfortunately) had to work with this rule before. The justification was that it makes the code more readable.

I can understand banning SQL style linq and allowing only functional style, but I guess that's more of a personal preference.

I used to work in a company with a similar policy. Never got a satisfactory answer, I think there's just lots of FUD around any language feature that looks different than what people are used to.

Neither did I.

goto is a stable language feature.

While it shouldn’t be banned, it should be used appropriately. We have some areas of our project that communicate with hardware where we cannot use it because of the performance hit.

Why on earth? It is resolved by the compiler, there is absolutely no performance difference between using var and explicitly specifying the type.

undoubtedly that person was talking about LINQ and not var usage

Undoubtedly he didn't read properly then. The banned feature was var, linq was just heavily frowned upon.

Or possibly confusing "var" and "dynamic"?

Yes, LINQ not var.

Well, that on the other hand does indeed make sense, depending on the context. The only question there could be if it's a good idea to use a garbage collected language to directly[1] interact with hardware.

[1]: Well, at least directly enough for LINQ performance to be a problem.

We're not real time, but we do have some time constraints.

Yes, all this was kicked around and argued about.

Yeah, I agree. You can make some seriously shitty unreadable code using LINQ just to look smart. But that's what code reviews are there for.

I agree. That, and not using LINQ often enough doesn't resolve that problem per se. Rethinking the problem (or the way you attempt to solve it) might very well improve the code much more.

One of Kevlin Henney's talks references a really neat quote by Poul Anderson on this subject:

> I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated.

Sometimes ReSharper makes some seriously wild, unreadable suggestions for converting code to Linq.

Yeah, but IMO after Roslyn I don't see the benefit of ReSharper, definitely not for the 100 dollar price per year.

That happened right when I started a new job so I haven't touched it that much. R# used to be absolutely essential though.

Are you talking about "var"? Because that shouldn't affect your performance, this generates equal IL code:

var x = 10;

int x = 10;

Yesterday a colleague wanted to compile _and deploy_ code. Checked it out. It .. failed to compile. Sure enough, he fixed it (and checked the change in) before deploying it (test env, thank god).

The code was generating some JSON using Linq and created JSON properties: new JsonObject(LinqExpressionHere)

The name for the property was defined as


which he helpfully changed to


which then compiled, but crashed in the test environment because of - surprise - duplicate attributes..

I just did this in my code! Ported a ton of PHP to C# and forgot to add $ to the front of strings. A compiler warning would be nice for this.

> At one point we were also strongly encouraged to place a #region around every single method! I never understood why, and luckily that one did get dropped


Especially since VS has plugins (or maybe it's built-in?) that will let you fold/hide individual methods, if you really want to do that!

FYI: It's built-in.

I use it religiously now, so much quicker to program with imo as you don't need to find out the return type. Surprised it's less used than async/await though, which I think MS massively overhypes the usefulness of (and is annoyingly making a lot of classes have async behaviour by default which can result in very weird random hangs).

Although I imagine a significant number of class declarations would contain no var or async/await simply because of their nature.

Interesting how 'css/style.cs' is number 10 on the size list. Is this just a style sheet with the wrong extension? I always assumed GitHub's language detection was based on syntax rather than just extension. Does anyone know if this is the case?

I don't know about what's displayed on GitHub.com, but the dataset I queried only looks for a '.cs' extension, so there's a chance that some non C# files got in. The dataset is here https://bigquery.cloud.google.com/table/fh-bigquery:github_e...

Fortunately, most of the queries I've done are aggregations or looking for C# syntax, so there's only a few places non C# code could get in (I already filter out binary files, which I noticed earlier)

By the way, GitHub's language detection is powered by their Linguist library [1].

[1] https://github.com/github/linguist

Thanks for the link, I've always wondered how GitHub does language detection

I'm surprised to see so little var, especially when ReSharper by default scolds you for not using it.

It's always nice to see BQ-based research like this!

Yeah BQ is basically magic, it's so impressive how much data is can process in 30 secs or less.

(I know under-the-hood it consists of some very impressive algorithms/hardware/programming etc, as explained in https://cloud.google.com/blog/big-data/2016/08/in-memory-que..., but it feels like magic!)

exactly. I also performed several researches back in days like this https://medium.com/@sAbakumoff/github-cancer-180db780d99d

Thanks for that list, I've seen a few of those, but quite a lot of new ones in there, reading my way through them now!

I'm glad it was helpful. I found this list (a very helpful list, with good posts!) during $DAY_JOB. Following your comment, I've shared this more widely here: https://news.ycombinator.com/item?id=15464921

And has excellent sql support, even before standard sql support was a thing.

HN navel-gazing, BQ style:

Top 100 Users on Hacker News by H-Index | https://news.ycombinator.com/item?id=11512455 (Apr 2016, 81 comments)

and other social media: https://github.com/antontarasenko/smq

> Finally, an interesting list is the top 10 using statements that aren’t System, Microsoft or Windows namespaces:

> using System.Web.Mvc;

But... that is a System namespace.

LOL, yeah I completely missed that one!!

I've just updated the post, 'log4net' makes in into the top 10 now.

using UnityEngine;

Is the second most used 'using' that's non-Microsoft (after Nunit). Shows how important Unity is for C# take up. And presumably most people using Unity to write games are not going to be sticking their code on GitHub either so this must be a fraction of the real use.

IT's important to open source C# take-up, but I'd wager the vast majority of C# usage is closed-source shops.

c# just isn't favoured by hobbyists except for game-programming. But in business world (at least from my UK perspective) where nothing comes close to github, c# still seems as healthy as ever and that doesn't depend on unity at all.

Oh yes for sure, but that's always been there. .NET has always been used widely for business internal stuff and that's what kept it alive. But the recent C# renaissance seems to be about app building stuff like Xamarin or Unity.

I’m really confused why you think #regions are terrible. I use them generously to collapse sections, and they objectively make code more organized. I would not be as productive without them.

It always disappoints me as a C# dev is the lack of top level application products like WordPress and friends. All the top lists in Matt's post are either infrastructure or development tools.

Iguess it's the pre-compilation requirement. Might change with the new version, but I think the inertia was lost for the language long ago.

Here in the UK C# seems a lot more popular than Java, but I must admt I'm getting more and more js jobs these days, which fills me with a slight dread for the future as I really dislike programming in it.

I think Unity has helped C# become more popular for the masses again.

I am not worried about C#. It just lacks open source applications. There are tons of poster boy closed source applications :)

Hmm, yeah that's a good point. I'd not looked at the individual repos too much, but you're right, there's not many (popular) products written in C#.

However, I guess my lists are limited by what is on GitHub, Paint.NET springs to mind, but it's not open source.

Yeah, paint.net also crossed my mind. Would not even show up. Too "small".

I suppose its because people would have had to host a C# Wordpress on Windows servers instead of nice cheap Linux ones. Maybe that will change going forwards with .NET Core etc

And not just the expensive windows server but for a long time people considered windows builds of postgres / mysql not production-ready (How much of that was fud and how much truth I do not know), so typically any DB solution is Sql-Server which is another significant expense.

They still do.

We have customers running Linux instances just for Postgres.

I’ve always assumed that’s because C# used to not be cross platform (yes, there’s Mono, but that isn’t as popular as it should be)

It's not as application-heavy as a lot of other platforms, but there are definitely some great applications built on top of .NET.

If you're looking for a CMS, you'd be hard-pressed to find a better open-source CMS that runs on ASP.NET than Umbraco.

Not looking :)

I completely agree, it is disapointing. On the plus side, I'm hopeful that now that .net (dot net core at least) is OSS, cross platform and maturing things will start changing for the better.

Surprised there wasn't more Class1.cs

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact