>The Valid method takes a context (which is optional but has been useful for me in the past) and returns a map. If there is a problem with a field, its name is used as the key, and a human-readable explanation of the issue is set as the value.
I used to do this, but ever since reading Lexi Lambda's "Parse, Don't Validate," [0] I've found validators to be much more error-prone than leveraging Go's built-in type checker.
For example, imagine you wanted to defend against the user picking an illegal username. Like you want to make sure the user can't ever specify a username with angle brackets in it.
With the Validator approach, you have to remember to call the validator on 100% of code paths where the username value comes from an untrusted source.
Instead of using a validator, you can do this:
    type Username struct {
        value string
    }

    func NewUsername(username string) (Username, error) {
        // Validate the username adheres to our schema.
        ...
        return Username{username}, nil
    }
That guarantees that you can never forget to validate the username through any codepath. If you have a Username object, you know that it was validated because there was no other way to create the object.
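To make the snippet above concrete, here is a minimal sketch with the angle-bracket rule from the example filled in. The exact checks (non-empty, no `<` or `>`) and the `Greet` helper are assumptions for illustration, not a real schema.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Username has an unexported field, so code in other packages can only
// obtain a non-zero value through NewUsername.
type Username struct {
	value string
}

// NewUsername parses an untrusted string into a Username.
// The rules here (non-empty, no angle brackets) are illustrative assumptions.
func NewUsername(raw string) (Username, error) {
	if raw == "" {
		return Username{}, errors.New("username must not be empty")
	}
	if strings.ContainsAny(raw, "<>") {
		return Username{}, errors.New("username must not contain angle brackets")
	}
	return Username{value: raw}, nil
}

// String exposes the validated value for display or storage.
func (u Username) String() string { return u.value }

// Greet is a hypothetical consumer: it accepts only a Username,
// so callers cannot hand it a raw string.
func Greet(u Username) string {
	return "Hello, " + u.String()
}

func main() {
	if _, err := NewUsername("<script>"); err != nil {
		fmt.Println("rejected:", err)
	}
	u, err := NewUsername("alice")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(Greet(u))
}
```

Note that the unexported field only protects callers in other packages; code inside the same package can still write `Username{value: "..."}` directly.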
I've also seen it called primitive obsession, which is also applicable to other primitive types like using an integer in situations where an enum would be better.
Definitely used to fall for primitive obsession. It seemed so silly to wrap objects in an intermediary type.
After playing with Rust, I changed my tune. The type system forces you onto the correct path so consistently that a lot of code became boring, because you no longer had to second-guess what-if scenarios.
> Definitely used to fall for primitive obsession. It seemed so silly to wrap objects in an intermediary type.
A lot of languages certainly don't make it easy. You shouldn't have to make a Username struct/class with a string field to have a typed username. You should be able to declare a type Username which is just a string under the hood, but with different associated functions.
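For what it's worth, Go can express roughly this with a defined type. A minimal sketch follows; `IsReserved` and `Welcome` are made-up names for illustration.

```go
package main

import "fmt"

// Username is a distinct named type whose underlying type is string.
type Username string

// IsReserved is a hypothetical method; plain strings do not have it.
func (u Username) IsReserved() bool {
	return u == "admin" || u == "root"
}

// Welcome accepts a Username, not a string variable.
func Welcome(u Username) string {
	return "Welcome, " + string(u)
}

func main() {
	raw := "admin"
	// Welcome(raw) would not compile: raw has type string, not Username.
	u := Username(raw) // an explicit conversion is required
	fmt.Println(Welcome(u), u.IsReserved())
}
```

The tradeoff is that anyone can write `Username(raw)`, and an untyped string constant converts implicitly, so this gives you a distinct type with its own methods but no guarantee that validation ever ran. The struct with an unexported field is what buys that guarantee.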
We use this pattern extensively in a large Java app. As long as you establish these patterns early on in the project, the team adapts to the conventions. It's worked well for us and the lack of language support doesn't get in the way much.
Yeah, modern type systems are game changers. I've soured on Rust, but if Go had the full Ocaml type system with match statements I think it would be the perfect language.
This term is typically used to refer to things like data structures and numerical values all being passed as strings. I don't think a reasonable person would consider storing a username in a string to be "stringly typed".
It definitely is stringly typed. It's just that it's a very normalized example of it, that people don't think of as being an antipattern.
If you want to implement what Yaron Minsky described as "make illegal states unrepresentable", then you use a username type, not a string. That rules out multiple entire classes of illegal states.
If you do that, then when you compile your program, the typechecker can provide a much stronger correctness proof, for more properties. It allows you to do "static debugging" effectively, where you debug your code before it ever even runs.
I don’t get what you’re about. The root comment clearly presents a structure of a separate type. The fact that it happens to contain a single string field is completely irrelevant (what type an actual username should be, a float?). “Stringly typed” is about stringifying non-string values to save typing work and is not applicable here in the slightest.
I wasn't replying to the root comment, I was replying in the context of the subsequent three comments, specifically:
> > > Crazy that actually using your type system leads to better code.
> > There's a name for this anti-pattern: "Stringly typed"
> I don't think a reasonable person would consider storing a username in a string to be "stringly typed".
#1 was saying that the root comment shows better code using the type system.
#2 was clearly referring to the case where you don't do this as being an anti-pattern.
#3 is saying that storing a username in a string, without defining a distinct type for it, was not stringly typed. But as I pointed out, it certainly is.
If you doubt my interpretation of #3, the same commenter said this in another comment: "Is it really more 'programmer friendly' to create wrapper types for individual strings all over your codebase?"
The One True Wiki[0] says "Used to describe an implementation that needlessly relies on strings when programmer & refactor friendly options are available."
Which is exactly what's going on here. A username has a string as a payload, but that payload has restrictions (not every string will do) and methods which expect a username should get a username, not any old string.
I don't agree that this example is more "programmer friendly". Anything you want to do with the username other than null check and passing an argument is going to be based directly on the string representation. Insert into a database? String. Display in a UI? String. Compare? String comparison. Sort? String sort. Is it really more "programmer friendly" to create wrapper types for individual strings all over your codebase that need to have passthrough methods for all the common string methods? One could argue that it's worth the tradeoff but this C2 definition is far from helpful in setting a clear boundary.
Meanwhile the real world usages of this term I've seen in the past have all been things like enums as strings, lists as strings, numbers as strings, etc... Not arbitrary textual inputs from the user.
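As a rough gauge of how much passthrough this costs in Go specifically, here is a sketch covering display, comparison and sorting. Validation is left out to keep it short, and the `Less` helper is a made-up name.

```go
package main

import (
	"fmt"
	"sort"
)

// Username wraps a string; the constructor and validation are omitted here.
type Username struct {
	value string
}

// String implements fmt.Stringer, so fmt prints the wrapped value directly.
func (u Username) String() string { return u.value }

// Less is a hypothetical helper that gives ordering without exposing the field.
func (u Username) Less(other Username) bool { return u.value < other.value }

func main() {
	a, b := Username{"bob"}, Username{"alice"}

	// Structs whose fields are all comparable support == without extra code.
	fmt.Println(a == b)

	// Sorting needs one comparison function.
	names := []Username{a, b}
	sort.Slice(names, func(i, j int) bool { return names[i].Less(names[j]) })
	fmt.Println(names)
}
```

So equality comes for free and display and sorting are one small method each. Database access would need similar glue (for example via the `driver.Valuer` interface in `database/sql/driver`), which is the part of the tradeoff the comment above is pointing at.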
You inherit some code. Is that string a username or a phone number? Who knows. Someone accidentally swapped two parameter values. Now the phone number is a username and you’ve got a headache of trying to figure out what’s wrong.
By having stronger types this won’t come up as a problem. You don’t have to rely on having the best programmers in the world that never make mistakes (tm) to be on your team and instead rely on the computer making guard rails for you so you can’t screw up minor things like that.
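A small sketch of the swapped-parameter case, assuming two wrapper types and a hypothetical `Register` function (constructors omitted for brevity):

```go
package main

import "fmt"

// Username and PhoneNumber are distinct types even though both wrap a string.
type Username struct{ value string }
type PhoneNumber struct{ value string }

// Register is a hypothetical function that needs one of each.
func Register(u Username, p PhoneNumber) string {
	return fmt.Sprintf("user=%s phone=%s", u.value, p.value)
}

func main() {
	u := Username{"alice"}
	p := PhoneNumber{"+1-555-0100"}

	fmt.Println(Register(u, p))

	// Register(p, u) does not compile: a PhoneNumber cannot be used
	// where a Username is expected, and vice versa.
}
```

With `func Register(username, phone string)` the swapped call would compile and the mistake would only surface at runtime, if at all.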
I agree on the one hand but empirically I don’t think I have seen a bug where the problem was the string for X ended up being used as Y. Probably because the variable/field names do enough heavy lifting. But if your language makes it easy to wrap I say why not. It might aid readability and maybe avoid a bug.
I would probably type to the level of Url, Email, Name but not PersonProfileTwitterLink.
I’ve refactored a large js code base into ts. Found one such bug for every ~2kloc. The obvious ones are found quickly in untyped code, the problem is in rare cases where you e.g. check truthiness on something that ends up always true.
Of those bugs I wondered how much a type would help. For example is it a misunderstanding of business requirements (nosurcharge bool = iscash bool) or a “typo” / copy paste error. If the former types don’t help. The latter they might.
It definitely helps in larger applications where things are named similarly, especially if you're dealing with a massive DB schema. If you have some method to update some data where all the PKs are longs and it has the signature "update(long,long)", passing the wrong long value would be disastrous. Even if this type of error is 1:10k LOC, using wrapper classes pretty much eliminates this bug.
In our codebase, we use wrapper classes and the only time we had a defect with this is when one developer got lazy and used 3 primitive Strings in a class instead of wrapper classes. Another developer needed to update the code and populated the wrong wrapper class as they were not as familiar with that part of the codebase. Had the original developer simply used wrapper classes, the person maintaining the code wouldn't have had that confusion.
> Is it really more "programmer friendly" to create wrapper types for individual strings all over your codebase that need to have passthrough methods for all the common string methods?
That can be handled transparently in languages that have good support for strong type systems, like Rust or Haskell, using traits or type classes.
What you're saying is essentially that addressing stringly typing can only be taken so far in weakly typed languages, without becoming inconvenient.
> Meanwhile the real world usages of this term I've seen in the past have all been things like enums as strings, lists as strings, numbers as strings, etc... Not arbitrary textual inputs from the user.
The definitional question is not that interesting. The point is that the concept applies just as much to a username represented as a string as it does to any other kind of value being represented as a string.
The reason is simple, which is just that "string" is a general type that can represent anything, whereas "username" is a subset of all possible strings. If you're trying to use your type system to ensure correct code, you want to be able to type check a function signature like `f(user, company, motto)`, just to take a simple example.
I originally typed out `int` and wanted to do more, but I try to keep my comments as targeted as possible to avoid the common reply pattern of derailing a topic by commenting on the smallest and least important part of it. If I type `string`, `int`, `arrays`, `maps`, `enums`... someone will write 3 paragraphs about enums are actually an adequate usage of the type system, and everyone will focus on that instead of the overarching message.
Types stop you from making some mistakes, but they also limit your extensibility. Imagine an enum with 4 values, and you want to add a fifth because one service 10 levels deep needs the new value. How does it usually go with strongly typed languages? You go and update every service until the new value is properly propagated down to the lowest level, the one that actually needs it.
Now imagine doing the same with strings: you can validate at the lowest level, and the upper levels just pass the value along as-is. If the upper layers have conditionals based on the value, they can still limit their logic to those values.
You only need to update the parser and the places that are using it. Depending on language, the parser might update itself (Scala generally works this way). Everyone else has an already parsed value that they're just passing around. That's the point: only run your validation at the outer layer of your application.
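Here is a sketch of what that looks like in Go, assuming a made-up `Plan` enum; `forward` stands in for all the intermediate layers and `seats` for the one layer that actually branches on the value.

```go
package main

import "fmt"

// Plan is a hypothetical enum. Values start at 1 so the zero value is invalid.
type Plan int

const (
	PlanFree Plan = iota + 1
	PlanPro
	PlanTeam
	PlanEnterprise
)

// ParsePlan is the single place that knows the string representation.
func ParsePlan(s string) (Plan, error) {
	switch s {
	case "free":
		return PlanFree, nil
	case "pro":
		return PlanPro, nil
	case "team":
		return PlanTeam, nil
	case "enterprise":
		return PlanEnterprise, nil
	}
	return 0, fmt.Errorf("unknown plan %q", s)
}

// forward stands in for the intermediate layers: it passes the Plan
// along without inspecting it, so it never changes when values are added.
func forward(p Plan) Plan { return p }

// seats is the lowest layer, the only one that switches on the value.
// The numbers are arbitrary.
func seats(p Plan) int {
	switch p {
	case PlanFree:
		return 1
	case PlanPro:
		return 5
	case PlanTeam:
		return 25
	case PlanEnterprise:
		return 500
	}
	return 0
}

func main() {
	p, err := ParsePlan("team")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(seats(forward(p)))
}
```

Adding a fifth plan touches the constant list, `ParsePlan` and `seats`; `forward` is untouched. One caveat: Go does not check `switch` statements for exhaustiveness, so the compiler will not point out a `seats` that was not updated.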
This is a good design pattern, but be wary of doing validation too early. The design pattern allows you to do it as early or late as you like, but doesn't tell you when to do it. Often it's best to do it as part of parsing/validating some larger object.
See Steven Witten's "I is for Intent" [1] for some ideas about the use of unvalidated data in a UI context.
I read through that piece and strongly disagree with the premise that their insight is somehow at odds with leaning into the type system for correctness.
The legitimate insight that they have is that anchoring the state as close as possible to the user input is valuable—I think that that is a great insight with a lot of good applications.
However, there's nothing that says you can't take that user-centric state and put it in a strongly typed data structure as soon as possible, with a set of clearly defined and well-typed transitions mapping the user-centric state to the derived states.
A text file and an abstract syntax tree can both be rigorously represented using types, but one is before parsing and other is after parsing. The question is which one is more suitable for editing?
Text has more possible states than the equivalent AST, many of which are useful when you haven't typed in all the code yet. Incomplete code usually doesn't parse.
This suggests that drafts should be represented as text, not an AST.
And maybe similarly for drafts of other things? Drafts will have some representation that follows some rules, but maybe they shouldn't have to follow all the rules. You may still want to save drafts and collaborate on them even though they break some rules.
In a system that's not an editor, though, maybe it makes sense to validate early. For a command-line utility, the editor is external, provided by the environment (a shell or the editor for a shell script) so you don't need to be concerned with that.
I’ve found it hard to apply this pattern in Go since, if Username is embedded in a struct, and you forget to set it, you’ll get Username’s zero value, which may violate your constraints.
But if you then create a constructor / factory method for that struct, not setting it would trigger an error. But this is one of the problems with Go and other languages that have nil, or that have no "you have to set this" built into their type system: it relies on people's self-discipline, checked by the author, reviewer, and unit tests, and ensuring there's not a problem like the one you describe takes a lot of diligence.
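A sketch of both halves of this exchange: the forgotten field that compiles without complaint, and a constructor that refuses it. `Profile`, `NewProfile` and `IsZero` are made-up names, and treating the empty string as "unset" assumes empty usernames are never valid.

```go
package main

import (
	"errors"
	"fmt"
)

type Username struct{ value string }

// NewUsername rejects the empty string (an assumed rule), which lets the
// zero value double as a "never validated" marker.
func NewUsername(raw string) (Username, error) {
	if raw == "" {
		return Username{}, errors.New("username must not be empty")
	}
	return Username{value: raw}, nil
}

// IsZero reports whether u never went through NewUsername.
func (u Username) IsZero() bool { return u.value == "" }

// Profile is a hypothetical struct holding a Username.
type Profile struct {
	Name Username
	Bio  string
}

// NewProfile refuses an unset Username, but nothing forces callers to use it.
func NewProfile(name Username, bio string) (Profile, error) {
	if name.IsZero() {
		return Profile{}, errors.New("profile requires a username")
	}
	return Profile{Name: name, Bio: bio}, nil
}

func main() {
	// Forgetting the field compiles fine and yields the zero Username.
	forgotten := Profile{Bio: "hi"}
	fmt.Println(forgotten.Name.IsZero())

	// The constructor catches the same mistake at runtime.
	_, err := NewProfile(Username{}, "hi")
	fmt.Println(err)
}
```

The check in `NewProfile` only helps when callers go through it, which is the self-discipline point made above.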
It only relies on unit tests. The people can fail all day long and the unit tests will catch it every single time. Not special unit tests that attempt to seek out such issues, the same unit tests you are writing in languages that have a stricter type system.
If you forget to initialize a field and the tests don't notice, you didn't need the field in the first place, so it won't matter if it is left in an invalid state.
You just don't get the squiggly lines in your text editor. That's the tradeoff.
The pattern sounds nice in theory, but very cumbersome since now you have to obsessively ensure you have NewX calls everywhere or some form of "validated bool". In the end, you're just validating in a roundabout way and calling it "parsing".
I personally find being robust to errors and having clear error messages is the best option.
Don't focus so hard on getting things right, but rather dealing with things when they go wrong.
Because go doesn’t have exhaustiveness checking when initialising structs. Instead it encourages “make the zero value meaningful” which is not always possible nor desirable. I usually use a linter to catch this kind of problem https://github.com/GaijinEntertainment/go-exhaustruct
The issue is DRY often comes to wreck this sort of thing. Some devs will see "Hmm, Username is exactly the same as just a string so let's just use a string as Username is just added complexity".
I've tried it with constructs like `Data` and `ValidatedData` and it definitely works, but you do end up with duplicate fields between the two objects or worse an ever growing inheritance tree and fields unrelated to either object shared by both.
For example, consider data looking like
    Data {
        value string
    }
and ValidatedData looking like
    ValidatedData {
        value int
    }
There's a mighty temptation for some devs to want to apply DRY and zip these two things together. Unfortunately, that can get really messy on these sorts of type changes, and the question of where validation needs to happen gets muddled.
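Keeping the two types separate, with one function as the only bridge between them, might look like the sketch below. The rule that the value must be a non-negative integer is an assumption made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Data is the raw, untrusted form.
type Data struct {
	value string
}

// ValidatedData is the parsed form; note that the field type differs.
type ValidatedData struct {
	value int
}

// Parse is the only way to get from Data to ValidatedData, so the question
// of where validation happens has exactly one answer.
func Parse(d Data) (ValidatedData, error) {
	n, err := strconv.Atoi(d.value)
	if err != nil {
		return ValidatedData{}, fmt.Errorf("value %q is not an integer: %w", d.value, err)
	}
	if n < 0 {
		return ValidatedData{}, fmt.Errorf("value %d must not be negative", n)
	}
	return ValidatedData{value: n}, nil
}

func main() {
	v, err := Parse(Data{value: "42"})
	fmt.Println(v.value, err)

	_, err = Parse(Data{value: "forty-two"})
	fmt.Println(err)
}
```

Merging the two structs would force a single `value` field to be either a string or an int, at which point the type no longer says whether parsing has happened.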
Except Username is not exactly the same as string, and that's important. Username is a subset of string. If they were equivalent, we wouldn't need to parse/validate.
The often misinterpreted part of DRY is conflating "these are the same words, so they are the same", with "these are the same concept, so they are the same". A Username and a String are conceptually different.
DRY is just "Do not repeat yourself". And a LOT of devs take that literally. It's not "Do not repeat concepts" (which is what it SHOULD be but DRC isn't a fun acronym).
Unfortunately "This is the same character string" is all a DRY purist needs to start messing up the code base.
I honestly believe that "DRY" is an anti-pattern because of how often I see this exact behavior trotted out or espoused. It's a cargo cult thing to some devs.
That's why I like to tell people to always remember to stay MOIST - the Most Optimal is Implicitly the Simplest Thing.
When you add complexity to DRY out your code, you're adding a readability regression. DRY matters in very few contexts beyond readability, and simplicity and low cognitive load need to be in charge. Everything else you do code-style wise should be in service of those two things.
DRY has nothing to do with readability. The fact that it might help with it is purely coincidental.
DRY is about maintainability - if you repeat rules (behavior) around the system and someone comes along and changes it, how can you be sure it affected all the system coherently?
I've seen this in practice: we get a demand from the PO, a more recent hire goes to make the change, the use case of interest to the PO gets accepted. A week later we have a bug on production because a different code path is still relying on the old rule.
Maintainability and readability are two sides of the same coin. It's not exactly rocket science to cook up an example situation where making a change in one place is less maintainable than making it in two, because of overly DRY, overly abstracted nonsense leading to a _single_ place to change that's so far removed from where you'd expect it to be that it takes much longer and is much more fraught with risk than just having to do it twice.
Doing something twice is not anathema, that's my point, not when doing it twice is a cognitively easier and practically faster task.
In almost every case, bugs are the result of human error, and keeping cognitive load as low as possible reduces the likelihood of human error in all cases. As DRY as possible is very rarely the lowest cognitive load possible.
In my experience (~20 years) with software development I developed the belief that people will go through the path of applying patterns, techniques, architectures, good practices, first as dogma, then to rejection, ending in acceptance of the knowledge that almost all of software development patterns/best practices are mostly good heuristics, which require experience to apply correctly and know when to break or bend the rules.
DRY applied as a dogma will eventually fail, because it's not a verified mathematical proof of infallible code, it's just a practice that gives good results inside its constraints, people just don't learn the constraints until it explodes in their faces a few times.
Like any wisdom, it's hard for it to be received and understood without the rite of passage of experience.
Man I wish it was just jr devs. I cut jrs a ton of slack, they don't know any better. However, it's the seniors with the quick quips that are the biggest issue I run into. Or perhaps senior devs with jr mentalities.
DRY vs premature optimisation is the landscape most long term devs find themselves in. You can say that FP, OO and a bunch of other paradigms affect this, but eventually you need to repeat yourself. The key is to determine when this happens without spending too much time determining when this happens.
One of the major issues with a lot of the outdated concepts in programming is that we still teach them to young people. I work a side gig as an external examiner for CS students. Especially in the early years they are taught the same OOP content that I was taught some decades ago, stuff that I haven’t used (also) for some decades. Because while a lot of the concepts may work well in theory, they never work out in a world where programmers have to write code on a Thursday afternoon after a terrible week.
It’s almost always better to repeat code. It’s obviously not something that is completely black and white, even if I prefer to never really do any form of inheritance or mutability, it’s not like I wouldn’t want you to create a “base” class with “created by” “updated by” and so on for your data classes and if you have some functions that do universal stuff for you and never change, then by all means use them in different places. But for the most part, repeating code will keep your code much cleaner. Maybe not today or the next month, but five years down the line nobody is going to want to touch that shared code which is now so complicated you may as well close your business before you let anyone touch it. Again, not because the theoretical concepts that lead to this are necessarily flawed, but because they require too much “correctness” to be useful.
Academia hasn’t really caught on though. I still grade first semester students who have the whole “Animal” -> “duck”, “dog”, “cat” or whatever they use into their heads as the “correct way” to do things. Similar to how they are often taught other processes than agile, but are taught that agile is the “only” way, even though we’ve seen just how wrong that is.
I’m not sure what we can really do about it. I’ve always championed strongly opinionated dev setups where I work. Some of the things we’ve done, and are going to do, aren’t going to be great, but what we try to do is to build an environment where it’s as easy as possible for every developer to build code the most maintainable way. We want to help them get there, even when it’s 15:45 on a Thursday that has been full of shit meetings in a week that’s been full of screaming children and an angry spouse and a car that exploded. And things like DRY just aren’t useful.
Yeah, no. Not at all.
I imagine that you are taking DRY quite literally, and critiquing the most stupid use cases of it, like DRYing calls to Split with spaces to SplitBySpace.
DRY's goal is to avoid defining behaviors in duplicate, which results in having multiple points in code to change when you need to modify said behavior. Code needs to be coherent to be "good", for a number of the different quality indicators.
I'm doing a "side project" right now where I'm using a newcomer payment gateway. They certainly don't DRY stuff. Same field gets serialized with camel case and snake case in different API, and whole structures that represent the same concept are duplicate with slightly different fields. This probably means that Thursday 15.25 the dev checked-in her code happy because the reviewer never cared about DRY, and now I'm paying the price of maintaining four types of addresses in my code base.
God no. Stop the copy pasta disease! It's horrible, mindless programming.
When reviewing code, I'm astonished anything was accomplished by copy pasting so much old code (complete with bugs and comment typos).
Incidentally, OOP encourages you to copy a lot. It's just an engine for generating code bloat. Want to serialize some objects? Here's your Object serializer and your overloaded Car serialize and your overloaded Boat serializer, with only a few different fields to justify the difference!
OOP is bad. Copy pasta is bad. DRY is good. All hail DRY, forever, at any cost.
Countless man-centuries have been lost looking for the perfect abstraction to cover two (or an imagined future with two) cases which look deceptively similar, then teasing them apart again.
OOP and DRY are compatible! I’ve actually done the thing that the above commenter suggests - create a base object with created on/by so that I never have to think about it. Whether or not you actually care about that, if you implement a descendant of that object you’re going to get some stuff for free, and you’re gonna like it!
Nobody, ever, is claiming no abstractions are useful or worthwhile. The issue is DRY implies that you should always look for an abstraction to avoid repeating yourself. Trust me, that way lies madness. It should be “sometimes repeat yourself, based on enough context, consideration and experience”. But that’s not as snappy.
For what it's worth, I've always had an easier time combining WET code than untangling the knot that is too-DRY code. Too little abstraction and you might have to read some extra code to understand it. Too much abstraction and no one other than the writer, and maybe not even the writer, may ever understand it.
There's a mistake many junior devs (and sometimes mid and senior devs) make where they confuse hiding complexity with simplicity - using a string instead of a well defined domain type is a good example, there is a certain complexity of the domain expressed by the type that they don't want to think about too deeply so they replace with a string which superficially looks simpler but in fact hides all of the inherent complexity and nuance.
It causes what I call the lumpy carpet syndrome - sweeping the complexity under the carpet causes bumps to randomly appear that when squashed tend to cause other bumps to pop up rather than actually solving the problem.
Go now has generics, so I'm confident some smart fellow will apply DRY and make it a generic ValidatedData[type, validator] type struct, with a ValidatedDataFactory that applies the correct validator callback, and a ValidatorFactory that instantiates the validators based on a new validation rule DSL written in JSON or XML.
This is a variation on one of my favorite software design principles: Make illegal states unrepresentable. I first learned about it through Scott Wlaschin[1].
There is no such requirement. Common wisdom suggests that you should ensure zero values are useful, but that isn't about every random struct field – only the values you actually give others. Initialize your struct fields and you won't have to consider their zero state. They will never be zero.
It's funny seeing this beside the DRY thread. Seems programmers taking things a bit too literally is a common theme.
Then the zero value is their problem, not yours. You have no reason to be worried about that any more than you are worried about them not getting enough sleep, or eating unhealthy food. What are you doing to stop them from doing that? Nothing, of course. Not your problem.
Coq exists if you really feel you need a complete type system. But there is probably good reason why almost nobody uses it.
> Then the zero value is their problem, not yours.
Except for all those times you're the consumer of someone else's library and there's no way for them to indicate that creating a zero-valued struct is a bug.
Again, it's the philosophy of "Just do the right thing everywhere and you don’t have to worry!" Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs.
> Except for all those times you're the consumer of someone else's library and there's no way for them to indicate that creating a zero-valued struct is a bug.
Nonsense. Go has a built-in facility for documentation to communicate these things to other developers. Idiomatic Go strongly encourages you to use it. Consumers of the libraries expect it.
> Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs.
Well, sure. But, like I said, almost nobody uses Coq. The vast, vast, vast majority of projects – and I expect 100% of web projects – use languages with incomplete type systems, making what you seek impossible.
And there's probably a good reason for that. While complete type systems sound nice in theory, practice isn't so kind. Tradeoffs abound. There is no free lunch in life. Sorry.
> The vast, vast, vast majority of projects – and I expect 100% of web projects – use languages with incomplete type systems, making what you seek impossible.
…where, "what GP seeks" is…
> way for [library authors] to indicate that creating a zero-valued struct is a bug
I'd say that's a really low and practical bar, you really don't need Coq for that. Good old Python is enough, even without linters and type hints.
Of course it's very easy to create an equivalent of zero struct (object without __init__ called), but do you think it's possible to do it while not noticing that you are doing something unusual?
No, Python is not enough to "...work with a type system where designers of libraries can actually prevent you from writing bugs." Not even typed Python is going to enable that. Only a complete type system can see the types prevent you from writing those bugs. And I expect exactly nobody is writing HTTP services with a language that has a complete type system – for good reason.
> Of course it's very easy to create an equivalent of zero struct
Yes, you are quite right that you, the library consumer, can Foo.__new__(Foo) and get an object that hasn't had its members initialized just like you can in Go. But unless the library author has specifically called attention to you to initialize the value this way, that little tingling sensation should be telling you that you're doing something wrong. It is not conventional for libraries to have those semantics. Not in Python, not in Go.
No, you do. Anywhere the type system is incomplete means that the consumer can do something the library didn't intend. Rust does not have a complete type system. There was no relevance to mentioning it. But I know it is time for Rust's regularly scheduled ad break. And while you are at it, enjoy a cool, refreshing Coca-Cola.
> Go's zero-values are the problem
"Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs." has nothing to do with zero-values. It doesn't even really have anything to do with Go specifically. My, the quality of advertising has really declined around here. Used to be the Rust ads at least tried to look like they fit in.
This insane perspective of “nothing is totally perfect so any improvements over what go currently does are pointless” whenever you confront a gopher with some annoying quirk of the language is one of the worst design flaws in the golang community hivemind.
Tell us, why do you hold that perspective? It's an odd one. Nobody else in this thread holds that perspective. You even admit it is insane, yet here you are telling us about this unique perspective you hold for some reason. Are you hoping that we will declare you insane and admit you in for care? I don't quite grasp the context you are trying to work within.
Top of the hour again? Time for another Rust advertisement?
The topic at hand is about preventing library users from doing things the library author didn't intend using the type system, not "what happens if a language has zero-values". Perhaps you are not able to comprehend this because you are hungry? You're not you when you are hungry. Grab a Snickers.
Don't worry, I have tried languages without zero-values. But they have nothing to do with the discussion that was taking place before the ad break. Now back to the show, you cannot prevent library consumers from doing things you don't intend without a complete type system. Rust does not have a complete type system. It leaves holes open for library consumers to do unexpected things and as such it has no relevance here. Sorry that your client's product isn't the be all and end all.
The original claim was that with Go, following a certain pattern "[...] guarantees that you can never forget to validate the username through any codepath". Which is not true. It is not true because Go has its own billion-dollar mistake called zero values.
If you go way back there was talk about that, but the discussion had long shifted to "Sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs."
I get it: You were in such a rush to fill your marketing quotas that you didn't bother to read the entire thread. Maybe the lesson here is don't use HN as an advertising platform next time? You should have known better from the get go.
You manage to present a strawman and produce a No True Scotsman fallacy all at once in this comment thread.
Nobody is suggesting that Coq should be used, so stop bringing it up (strawman). And yes, Coq might have an even stricter and more expressive type system than Rust. But nobody is asking for a perfect type system (no true Scotsman). People are asking to be able to prevent users of your library to provide illegal values. Rust (and Haskell and Scala and Typescript and ….) lets you do this just fine whereas Golang doesn’t.
And personally I would much rather have the compiler or IDE tell me I’m doing something wrong than having to read the docs in detail to understand all the footguns.
My personal opinion is that - even though I’m very productive with Golang and I enjoy using it - Golang has a piss poor type system, even with the addition of Generics.
> People are asking to be able to prevent users of your library from providing illegal values. [...] and Typescript
Typescript, you say?
const bar: Foo = {} as Foo
Hmm. Oh, right, just don't hold it wrong. But "sometimes it's nice to work with a type system where designers of libraries can actually prevent you from writing bugs."
Your example doesn’t even satisfy the base case, let alone the general case. Get back to us when you have actually read the thread and can provide something on topic.
It might be an accident. Someone uninitiated may think that is how you are expected to initialize the value. A tool like Copilot may introduce it and go unnoticed.
But let's assume the programmer knows what they are doing and there is no code coming from any other source. When would said programmer write code that isn't deliberate? What is it about Go that you think makes them, an otherwise competent programmer, flail around haphazardly without any careful deliberation?
I always understood "parse don't validate" a bit differently. If you are doing the validation inside of a constructor, you are still doing validation instead of parsing. It is safer to do the validation in one place you know the execution will go through, of course, but not the idea I understand "parse don't validate" to mean. I understand it to mean: "write an actual parser, whatever passes the parser can be used in the rest of the program", where a parser is a set of grammar rules for example, or PEG.
I'm not a Haskell developer, so it's possible that I misunderstood the original "Parse, Don't Validate" post.
>If you are doing the validation inside of a constructor, you are still doing validation instead of parsing.
Why would that be considered validation rather than parsing?
From the original post:
>Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output.
That's the key idea to me.
A parser enforces checks on an input and produces an output. And if you define an output type that's distinct from the input type, you allow the type system to "preserve" the fact that the data passed a parser at some point in its life.
But again, I don't know Haskell, so I'm interested to know if I'm misunderstanding Lexi Lambda's post.
Parse don't validate means that if you want a function that converts an IP address string to a struct IpAddress{ address: string } you don't validate that the input string is a valid IP address then return a struct with that string inside. Instead you parse that IP into raw integers, then join those back into an IP string.
The idea is that your parsed representation and serializer are likely to produce a much smaller and more predictable set of values than may pass the validator.
As an example there was a network control plane outage in GCP because the Java frontend validated an IP address then stored it (as a string) in the database. The C++ network control plane then crashed because the IP address actually contained non-ASCII "digits" that Java with its Unicode support accepted.
If instead the address was parsed into 4 or 8 integers and was reserialized before being written to the DB this outage wouldn't have happened. The parsing was still probably more lax than it should have been, but at least the value written to the DB was valid.
In this case it was funny Unicode, but it could be as simple as 1.2.3.04 vs 1.2.3.4. By parsing then re-serializing you are going to produce the more canonical and expected form.
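A rough sketch of the parse-then-reserialize idea in Go. The IPv4 type and ParseIPv4 function are hypothetical names, and the parser is deliberately lax about leading zeros so the canonicalization is visible; it is not meant as a complete or strict IP parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IPv4 stores the parsed octets, not the original input string.
type IPv4 struct {
	octets [4]byte
}

// ParseIPv4 is a hypothetical, intentionally lenient parser: it accepts
// leading zeros, but whatever it accepts ends up as four plain bytes.
func ParseIPv4(s string) (IPv4, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 4 {
		return IPv4{}, fmt.Errorf("invalid IPv4 address %q: want 4 fields, got %d", s, len(parts))
	}
	var ip IPv4
	for i, p := range parts {
		// In base 10, strconv.ParseUint accepts only ASCII digits, and the
		// bit size of 8 rejects anything above 255.
		n, err := strconv.ParseUint(p, 10, 8)
		if err != nil {
			return IPv4{}, fmt.Errorf("invalid IPv4 address %q: field %d: %w", s, i+1, err)
		}
		ip.octets[i] = byte(n)
	}
	return ip, nil
}

// String re-serializes from the octets, so the output is always canonical.
func (ip IPv4) String() string {
	return fmt.Sprintf("%d.%d.%d.%d", ip.octets[0], ip.octets[1], ip.octets[2], ip.octets[3])
}

func main() {
	ip, err := ParseIPv4("1.2.3.04")
	fmt.Println(ip, err) // 1.2.3.4 <nil>

	// The last field is an Arabic-Indic digit four, not ASCII "4".
	_, err = ParseIPv4("1.2.3.٤")
	fmt.Println(err != nil) // true
}
```

Whatever gets written to the database is the output of String, so downstream consumers only ever see digits and dots, regardless of how lenient the parser was.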
Perhaps "normalize" or "canonicalize" is more appropriate. A parser can liberally interpret but I don't take it to imply some destructured form necessarily. There are countless scenarios where you want to be able to reproduce the exact input, and often preserving the input is the simplest solution.
But yes, usually you do want to split something into its elemental components, should it have any.
Thanks for that explanation! I hadn't appreciated that aspect of "parse, don't validate," before.
But even with that understanding and from re-reading the post, that seems to be an extra safety measure rather than the essence of the idea.
Going back to my original example of parsing a Username and verifying that it doesn't contain any illegal characters, how does a parser convert a string into a more direct representation of a username without using a string internally? Or if you're parsing an uint8 into a type that logically must be between 1 and 100, what's the internal type that you parse it into that isn't a uint8?
Personally I don't think I would have used the phrase "parse don't validate" for something like a username. It isn't clear to me what it would mean exactly. I generally only think of this principle for data that has some structure, not so much a username or a number from 1-100.
IP address would be about the minimum amount of structure. Something else would be like processing API requests. You can take the incoming JSON and fully parse it as much as possible, rather than just validate it is as expected (for example drop unknown fields)
> Or if you're parsing an uint8 into a type that logically must be between 1 and 100, what's the internal type that you parse it into that isn't a uint8?
Just for the sake of example, your internal representation might start from 0, and you just add 1 whenever you output it.
Your internal type might also not be a uint8. Eg in Python you would probably just use their default type for integers, which supports arbitrarily big numbers. (Not because you need arbitrarily big numbers, but just because that's the default.)
If you do that, people outside the package can also do Username(x) conversions instead of calling NewUsername. Making value package private means that you can only set it from outside the package using provided functionality.
If I have a user type, inferred from a Zod schema:
> { username: string; email: string }
And a function which takes that type:
> storeUser(user: User)
There is absolutely nothing that guarantees that the user object has been parsed by Zod. You can simply:
> storeUser({ username: "", email: "no" })
And Typescript will not shout at you.
The only way to comparably solve it with Typescript is to inject a symbol into the object during parsing which confirms it has been passed through the correct parser function.
Personally, I just do basic type parsing on input data (usually request data) and stricter parsing of constraints like “is this a valid username, is this a valid email” during output (usually sending to the database). What happens in between I/O doesn’t matter much in many projects (CRUD), and in the places it does you can enforce more rigidity.
Depends on what the problem definition is. If it's having as bulletproof a solution as possible, you're of course right.
But there are simpler and cheaper solutions that might be good enough.
I guess my intention was also to point out that there are mature frameworks for many languages, but somehow most people in the Go community keep reinventing the wheel, and unfortunately more often for the worse than for the better. Some years ago I wrote a Go web service. Of course I found the first two versions of OP's series. They're great to read and even greater to watch on YT, but I preferred the approach from ardanlabs (Bill Kennedy). It was for sure interesting going through all of this, but incredibly time-consuming.
Well, I use ajv and they have ways of applying format validation, so not just saying: "this is a string", but rather, "this is a string and must be a valid domain name".
Now, if your complaint is rather that you can call whatever method and pass in your bogus data, I don't see the point in arguing that. It's your code, the only person who can stop you is you.
> Now, if your complaint is rather that you can call whatever method and pass in your bogus data
This entire comment thread is a discussion about how to prevent that from being a possibility. The person I responded to threw their hat in with a Typescript solution that doesn’t achieve the goal being discussed. I was simply pointing this out.
I mean, you may end up just wanting something like,
type UsernameError struct {
name string
reason string
}
func (e *UsernameError) Error() string {
return fmt.Sprintf("invalid username %q: %s", e.name, e.reason)
}
And reason can be "username cannot be empty" or "username may not contain '<'" or something like that.
This is fine for lots of different cases, because it’s likely that your code wants to know how to handle “username is invalid”, but only humans care about why.
I have personally never seen a Go codebase where you parse error strings. I know that people keep complaining about it so it must be happening out there—but every codebase I’ve worked with either has error constants (an exported var set to some errors.New() value) or some kind of custom error type you can check. Or if it doesn’t have those things, I had no interest in parsing the errors.
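For reference, a small sketch of checking errors by type or by sentinel value rather than by message text. UsernameError follows the shape given above; ErrNotFound, lookup, and classify are hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is an example of an exported error constant (sentinel).
var ErrNotFound = errors.New("user not found")

// UsernameError is a custom error type that callers can detect.
type UsernameError struct {
	name   string
	reason string
}

func (e *UsernameError) Error() string {
	return fmt.Sprintf("invalid username %q: %s", e.name, e.reason)
}

// lookup is a hypothetical function that fails in two distinguishable ways.
func lookup(name string) error {
	if name == "" {
		return &UsernameError{name: name, reason: "username cannot be empty"}
	}
	// Wrapping with %w keeps the sentinel discoverable via errors.Is.
	return fmt.Errorf("lookup %q: %w", name, ErrNotFound)
}

// classify maps an error to a machine-readable code without ever
// inspecting the human-readable message.
func classify(err error) string {
	var ue *UsernameError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &ue):
		return "invalid_username"
	case errors.Is(err, ErrNotFound):
		return "not_found"
	default:
		return "internal"
	}
}

func main() {
	fmt.Println(classify(lookup("")))    // invalid_username
	fmt.Println(classify(lookup("bob"))) // not_found
}
```

The code returned by classify is what a frontend could key its own localized messages on, while the Error() text stays in the logs for developers and operators.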
I write mostly frontends. Sometimes the APIs I talk to give back beautiful English error messages - that I can't just show to the user, because they are using a different language most of the time. And I don't want to write logic that depends on that sentence, far too brittle.
Right—I think the “error code” here is going to be the error type, i.e., UsernameError, or some qualified version of that.
It’s not perfect, but software evolves through many imperfect stages as it gets better, and this is one such imperfect stage that your software may evolve through.
Including a human-readable version of the error is useful because the developers / operators will want to read through the logs for it. Sometimes that is where you stop, because not all errors from all backends will need to be localized.
The problem is that pattern "fails open." If anyone on the team forgets to define an untrusted string as UnvalidatedString, the data skips validation.
If you default to treating primitive types as untrusted, it's hard for someone to accidentally convert an untrusted type to a trusted type without using the correct parse method.
The dual problem would be any function which forgets to accept a ParsedString instead of a string can skip parsing.
Both cases appear to depend on there being a "checkpoint" all data must go through to cross over to the rest of the system, either at parsing or at UnvalidatedString construction.
>The dual problem would be any function which forgets to accept a ParsedString instead of a string can skip parsing.
>Both cases appear to depend on there being a "checkpoint" all data must go through to cross over to the rest of the system, either at parsing or at UnvalidatedString construction.
The difference is that if string is the trusted type, then it's easy to miss a spot and use the trusted string type for an untrusted value. The mistake will be subtle because the rest of your app uses a string type as well.
The converse is not true. If string is an untrusted type and ParsedString is a trusted type, if you miss a spot and forget to convert an untrusted string into a ParsedString, that function can't interact with any other part of your codebase that expects a ParsedString. The error would be much more visible and the damage more contained.
I think UnvalidatedString -> string also kind of misses the point of the type system in general. To parse a string into some other type, you're asserting something about the value it stores. It's not just a string with a blessing that says it's okay. It's a subset of the string type that can contain a more limited set of values than the built-in string type.
For example, parsing a string into a Username, I'm asserting things about the string (e.g., it's <10 characters long, it contains only a-z0-9). If I just use the string type, that's not an accurate representation of what's legal for a Username because the string type implies any legal string is a valid value.
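Sketching that in Go, using the rules from the comment above (fewer than 10 characters, only a-z0-9). The non-empty requirement, the regular expression, and the greet function are my own assumptions for the sake of the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// usernamePattern encodes the example rules: 1 to 9 characters, a-z0-9 only.
var usernamePattern = regexp.MustCompile(`^[a-z0-9]{1,9}$`)

// Username represents the subset of strings that are legal usernames.
type Username struct {
	value string
}

// NewUsername parses an untrusted string into a Username.
func NewUsername(s string) (Username, error) {
	if !usernamePattern.MatchString(s) {
		return Username{}, fmt.Errorf("invalid username %q: must be 1-9 characters from a-z0-9", s)
	}
	return Username{value: s}, nil
}

// String exposes the value read-only.
func (u Username) String() string { return u.value }

// greet accepts only the parsed type, so passing a raw string is a
// compile-time error rather than a forgotten validation call.
func greet(u Username) string { return "hello, " + u.String() }

func main() {
	u, err := NewUsername("alice42")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(greet(u)) // hello, alice42

	_, err = NewUsername("<script>")
	fmt.Println(err != nil) // true
}
```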
In Go, capitalized identifiers are exported, whereas lowercase identifiers are not.
In the example I gave above, clients outside of the package can instantiate Username, but they can't access its "value" member, so the only way they could get a populated Username instance is by calling NewUsername.
Now what? The username is in an unexported field and unusable? I can kind of see what it's going for, but it seems like a way just to add another layer of wrapping and indirection.
It would need a getter here. Probably good to keep it immutable, if you want guarantees that it will never be changed to something that violates the username rules.
Yeah, that's what I figured. I'm not sure if I want the tradeoff of calling .GetValue in multiple places just to save calling validate in maybe 2 or 3 places.
Not to mention I can't easily marshal/unmarshal into it, and next week a valid username becomes one that doesn't already exist in the database.
Maybe this approach appeals to people and I'm hesitant to say “that’s not how Go is supposed to be written” but for me this feels like “clever over clear”.
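On the marshal/unmarshal point, one possible approach (a sketch, not the only way) is to implement encoding.TextMarshaler and encoding.TextUnmarshaler on the wrapper, which encoding/json honors for string values. The signupRequest type, decodeSignup helper, and the validation rules are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type Username struct {
	value string
}

// NewUsername applies illustrative rules: non-empty, no angle brackets.
func NewUsername(s string) (Username, error) {
	if s == "" || strings.ContainsAny(s, "<>") {
		return Username{}, errors.New("invalid username")
	}
	return Username{value: s}, nil
}

func (u Username) String() string { return u.value }

// MarshalText lets encoding/json write the username as a JSON string.
func (u Username) MarshalText() ([]byte, error) {
	return []byte(u.value), nil
}

// UnmarshalText routes decoding through NewUsername, so a decoded value
// has passed the same checks as one built by hand.
func (u *Username) UnmarshalText(text []byte) error {
	parsed, err := NewUsername(string(text))
	if err != nil {
		return err
	}
	*u = parsed
	return nil
}

// signupRequest is a hypothetical request body.
type signupRequest struct {
	Username Username `json:"username"`
}

func decodeSignup(data []byte) (signupRequest, error) {
	var req signupRequest
	err := json.Unmarshal(data, &req)
	return req, err
}

func main() {
	req, err := decodeSignup([]byte(`{"username":"alice"}`))
	fmt.Println(req.Username, err) // alice <nil>

	_, err = decodeSignup([]byte(`{"username":"<script>"}`))
	fmt.Println(err != nil) // true
}
```

One caveat: if the "username" key is absent from the JSON, UnmarshalText is never called and the field keeps its zero value, so the zero-value concern raised elsewhere in the thread still applies.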
> Yeah, that's what I figured. I'm not sure if I want the tradeoff of calling .GetValue in multiple places just to save calling validate in maybe 2 or 3 places.
The tradeoff is not that you save calling validate, it’s that you avoid forgetting to call validate in the first place, because when you forget to validate, you get a type error.
IMO it’s a little more clear this way:
type Ticket struct {
requestor Username
assignee Username
}
It lets you write code that is a little more obvious.
I’m not sure I understand. In your example you’ve grouped related data in a struct and are validating that it matches your system’s invariants; that feels good to me.
The original example was more “wrap a simple type in an object so it’s always validated when set”, which looks beautiful when the example leaves out the needed getters and doesn’t show all the Get call sites as opposed to the 1 or 2 New call sites. All in the name of “we don’t want to set the username without validation”, but without private constructors Username{"invalid"} can be invoked and the validation circumvented, and I’m not convinced the overhead we paid was worth it.
The countless bugs I've had to deal with and all the time I've lost fixing these bugs caused by people who forgot to validate data in a certain place or didn't realize they had to do so proves to me that the overhead of calling a get on a wrapper type is totally worth it.
I value the hours wasted on diagnosing a bug far more than the extra keystrokes and couple of seconds required to avoid it in the first place.
No, you’ve achieved an illusion of that, as now you're spending those hours discovering where a developer forgot to call NewUsername and instead called Username{"broken"}. I can't see the value in this abstraction in Go.
But surely this is just another way of doing validation and not fundamentally "parsing"? If at the end you've just stored the input exactly as you got it, the only parsing you're potentially doing is in the validation step and then it gets thrown away.
Implementation-wise, yes, but the interface you're exposing is indistinguishable from that of a parser. For all your consumers know, you could be storing the username as a sequence of a 254-valued enum (one for each byte, except the angle brackets) and reconstructing the string on each "get" call. For more complex data you would certainly be storing it piecewise; the only reasons this example gets a pass are 1) because it is so low in surface area that a human can reasonably validate the implementation as bug-free without further aid from the type checker, and 2) because Go's type system is so inexpressive that you can't encode complex requirements with it anyway.
The validation is not completely thrown away, since the type indicates that the data has been validated. I understand "parsing" as applying more structure to a piece of data. Going from a String to an IP or a Username fits the definition.
I push my team to use this pattern in our (mostly Scala) codebase. We have too many instances of useless validations, because the fact that a piece of data has been "parsed"/validated is not reflected in its type using simple validation.
For example using String, a function might validate the String as a Username. Lower in the call stack, a function ends up taking this String as an arg. It has no way of knowing if it has been validated or not and has to re-validate it. If the first validation gets a Username as a result, other functions down the call stack can take a Username as an argument and know for sure it's been validated / "parsed".
One of my biggest pet peeves is when people take a Config object, which represents the configuration of an entire system, and pass it around mutably. When you do that, you're coupling everything together through the config object. I've worked on systems where you had to configure the parts in a specific order for things to work, because someone decided to write back to the config object when it was passed to them. In another case, you couldn't disable a portion of the system because it wrote data into the config object that was read by some other subsystem later. The pattern of "your configuration is one big value, which is mutable" is one of the more annoying patterns that I've seen, both in Go and in other languages.
The keyword here is a “mutable” config object, not config data objects in general. I use an immutable config dataclass liberally in one of my Python projects and I pass it around in all modules. Many functions rely on multiple values, and instead of passing all of them as function parameters (which requires their own function typings), the dataclass has all variables with typing definitions in one place. It's a pretty handy design pattern.
It's a lot of boilerplate to create something that's not actually immutable. It also makes it harder to figure out which options are available, since now you can't just look at the documentation of the type, you have to look at the whole module package to figure out what the various options are. If one of the fields is a slice or map you can just mutate that slice or map in place, so it's not really immutable. The pattern as Pike describes it has the benefit that supplying an option returns an option that reverses the effect of supplying the option so that you can use the options somewhat like Python context objects that have enter and exit semantics, but in practice I've found that to be useful in a small portion of situations.
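For anyone who hasn't seen the variant where applying an option returns the option that undoes it, here is a minimal sketch. The Server type, its verbosity field, and the Verbosity and Apply names are made up for illustration.

```go
package main

import "fmt"

// Server is a hypothetical type with a single configurable field.
type Server struct {
	verbosity int
}

// Option applies a setting to a Server and returns an Option that
// restores the previous setting.
type Option func(*Server) Option

// Verbosity sets the verbosity level and hands back its inverse.
func Verbosity(v int) Option {
	return func(s *Server) Option {
		previous := s.verbosity
		s.verbosity = v
		return Verbosity(previous)
	}
}

// Apply applies the options in order and returns the inverse of the last
// one, or nil if no options were given.
func (s *Server) Apply(opts ...Option) (undo Option) {
	for _, opt := range opts {
		undo = opt(s)
	}
	return undo
}

func main() {
	s := &Server{}

	undo := s.Apply(Verbosity(3))
	fmt.Println(s.verbosity) // 3

	// Applying the returned option restores the earlier value, which gives
	// the enter/exit feel mentioned above.
	s.Apply(undo)
	fmt.Println(s.verbosity) // 0
}
```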
It could be mutated by anything in the package that contains the type. The only thing Go can make truly immutable - as in a compiler error if you try - is a constant primitive.
Functional options have some niceties - largely their ability to evolve without breaking changes - but as the GP points out, completely break discoverability with intellisense. Having to do some dance with filtering usage to find the options available is just worse than a static config struct with zero values that are meaningful for anything not set.
The options for the thing being constructed are all separate types from the thing being constructed; the options aren’t a facet of the definition of the type they mutate.
- main constructor is easily available from the main type's docs,
- option type is easily available from the main constructor's docs,
- all option funcs are easily available from the option type's docs (because in fact these option funcs are constructors for the option type).
Excerpt from grpc godoc index:
type Server
func NewServer(opt ...ServerOption) *Server
...
...
type ServerOption
func ChainStreamInterceptor(interceptors ...StreamServerInterceptor) ServerOption
func ChainUnaryInterceptor(interceptors ...UnaryServerInterceptor) ServerOption
func ConnectionTimeout(d time.Duration) ServerOption
func Creds(c credentials.TransportCredentials) ServerOption
etc...
One more hop compared to a flat argument list, that's true. But if you only commonly use maybe 0-5 arguments out of 30-50 available, it does not look like a bad deal.
I've tended to create a Config struct for each package and then a configs.Config struct that's just made up of each package's Config. It might not be a Go best practice but I like that I can setup the entire system's configuration on startup as one entity but then I only pass in the minimally required dependencies for each package.
It also makes testing a little easier because I don't have to fake out the entire configuration for testing one package.
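Roughly, the layout could look like the sketch below. Everything is collapsed into one file so it runs as-is; in a real project DBConfig and HTTPConfig would live in their own packages, and all of the names here are hypothetical.

```go
package main

import "fmt"

// DBConfig would live in a hypothetical "db" package.
type DBConfig struct {
	DSN string
}

// HTTPConfig would live in a hypothetical "httpapi" package.
type HTTPConfig struct {
	Addr string
}

// Config plays the role of configs.Config: it only aggregates the
// per-package configs and is built once at startup.
type Config struct {
	DB   DBConfig
	HTTP HTTPConfig
}

type Store struct {
	dsn string
}

// NewStore takes only the config it needs, by value, so it cannot write
// back into the caller's copy.
func NewStore(cfg DBConfig) *Store {
	return &Store{dsn: cfg.DSN}
}

type API struct {
	addr  string
	store *Store
}

func NewAPI(cfg HTTPConfig, store *Store) *API {
	return &API{addr: cfg.Addr, store: store}
}

func main() {
	cfg := Config{
		DB:   DBConfig{DSN: "postgres://localhost/app"},
		HTTP: HTTPConfig{Addr: ":8080"},
	}
	store := NewStore(cfg.DB)
	api := NewAPI(cfg.HTTP, store)
	fmt.Println(api.addr, api.store.dsn)
}
```

A test for the store then only has to build a DBConfig, not the whole Config.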
I agree. We ran into a sev by changing the top-level config object before. You DO NOT want to modify it. The wasted man-hours are not worth it. You will never know where or how it gets used. If you make changes, it's better to derive a new object from it instead.
Update:
What's funny was, in our design the config object was kinda immutable. You had to use the WARNING_DO_NOT_USE API to make modifications. We did mutate the object and we caused a sev.
Once you've loaded it and mutated it for testing purposes or for copying from ENV vars into the config, you can then freeze it before passing it down to all your app level code.
Having this wrapper object that can be frozen and has a `get()` method to read JSON-like data makes it effectively immutable.
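A minimal sketch of what a freezable wrapper could look like in Go. This is not the API of any particular library; StaticConfig, Set, Freeze, Get, and ErrFrozen are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrFrozen is returned when Set is called after Freeze.
var ErrFrozen = errors.New("config is frozen")

// StaticConfig wraps JSON-like data and refuses writes once frozen.
type StaticConfig struct {
	mu     sync.RWMutex
	values map[string]any
	frozen bool
}

// NewStaticConfig copies seed so later changes to the caller's map do
// not leak into the config.
func NewStaticConfig(seed map[string]any) *StaticConfig {
	values := make(map[string]any, len(seed))
	for k, v := range seed {
		values[k] = v
	}
	return &StaticConfig{values: values}
}

// Set is meant for startup and tests, e.g. copying ENV vars into the config.
func (c *StaticConfig) Set(key string, value any) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.frozen {
		return ErrFrozen
	}
	c.values[key] = value
	return nil
}

// Freeze makes every later Set fail.
func (c *StaticConfig) Freeze() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.frozen = true
}

// Get reads a value. Nested maps or slices stored in the config are
// returned as-is, so they can still be mutated by the caller.
func (c *StaticConfig) Get(key string) (any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.values[key]
	return v, ok
}

func main() {
	cfg := NewStaticConfig(map[string]any{"port": 8080})
	_ = cfg.Set("env", "test") // allowed before freezing
	cfg.Freeze()

	fmt.Println(cfg.Set("port", 9090)) // config is frozen
	port, _ := cfg.Get("port")
	fmt.Println(port) // 8080
}
```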
I use a similar pattern myself. I was curious whether the OP is using some other one, for instance splitting the struct into two (im/mutable) and then passing them around, or what.
BTW kudos on zanzibar. Love the tech and the code.
It took me a long time to settle on this pattern and I admit it's tedious to copy configuration over to the server struct, but I've found that it ends up being the least verbose and most maintainable long term while making sure callers can't mutate config after the fact.
I can pass nil to NewServer to say "just the usual, please", customize everything, or surgically change a single option.
It's also useful for maintaining backwards compatibility. I'm free to refactor config on my server struct and "upgrade" deprecated config arguments inside my NewServer function.
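In code, the pattern might look roughly like this. The field names, defaults, and validation rule are assumptions for the sake of a runnable sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ServerConfig is the optional, caller-facing configuration.
type ServerConfig struct {
	Addr        string
	ReadTimeout time.Duration
}

// Server keeps its own unexported copies, so callers cannot change
// settings after construction.
type Server struct {
	addr        string
	readTimeout time.Duration
}

// NewServer starts from defaults, then validates and copies whatever the
// caller provided. A nil config means "just the usual, please".
func NewServer(cfg *ServerConfig) (*Server, error) {
	s := &Server{addr: ":8080", readTimeout: 5 * time.Second}
	if cfg == nil {
		return s, nil
	}
	if cfg.ReadTimeout < 0 {
		return nil, errors.New("ReadTimeout must not be negative")
	}
	if cfg.Addr != "" {
		s.addr = cfg.Addr
	}
	if cfg.ReadTimeout > 0 {
		s.readTimeout = cfg.ReadTimeout
	}
	return s, nil
}

func main() {
	defaults, _ := NewServer(nil)
	fmt.Println(defaults.addr, defaults.readTimeout) // :8080 5s

	custom, _ := NewServer(&ServerConfig{Addr: ":9000"})
	fmt.Println(custom.addr, custom.readTimeout) // :9000 5s
}
```

Because the values are copied, mutating the ServerConfig after NewServer returns has no effect on the server.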
I just use a struct literal, and then I have the type define a `func (t *Thing) ready() error { ... }` method and call the ready method to check that its valid. I prefer this over self-referential options, the builder pattern, supplying a secondary config object as a parameter to a constructor, etc.
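A tiny sketch of that struct-literal-plus-ready approach. The Thing fields and the specific checks are placeholders I made up.

```go
package main

import (
	"errors"
	"fmt"
)

// Thing is configured directly through its exported fields.
type Thing struct {
	Name    string
	Workers int
}

// ready checks that the struct literal was filled in sensibly.
func (t *Thing) ready() error {
	if t.Name == "" {
		return errors.New("thing: Name is required")
	}
	if t.Workers < 1 {
		return errors.New("thing: Workers must be at least 1")
	}
	return nil
}

// Run calls ready before doing any work, so callers cannot skip the check.
func (t *Thing) Run() error {
	if err := t.ready(); err != nil {
		return err
	}
	fmt.Printf("%s running with %d worker(s)\n", t.Name, t.Workers)
	return nil
}

func main() {
	good := &Thing{Name: "indexer", Workers: 2}
	fmt.Println(good.Run())

	bad := &Thing{Workers: 2}
	fmt.Println(bad.Run())
}
```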
> one of my biggest pet peeves is when people take a Config object, which represents the configuration of an entire system, and pass it around mutably.
How do you create immutable structs in Go? I didn't think you could, which makes this more a Go problem than a `passing around Config object` problem.
(One of my pet peeves, coming to Go from C, is how little strong typing there actually is. In C, I pass and return const objects everywhere I can, my enums are not just ints because the compiler can warn when I forget one in a switch statement, etc.)
I really like Mat Ryer's work, and I've applied most of the ideas in the 2018 version of this article to all of my Go projects since then.
The one weak spot for me is this aspect:
>NewServer is a big constructor that takes in all dependencies as arguments... In test cases that don’t need all of the dependencies, I pass in nil as a signal that it won’t be used.
This has always felt wrong to me, but I've never been able to figure out a better solution.
It means that a huge chunk of your code has a huge amount of unnecessary shared state.
I often end up writing HTTP handlers that only need access to a tiny amount of the shared state. Like the HTTP handler needs to check if the requesting user has access to a resource, and then it needs to call one function on the datastore.
I'd love to write tests where I only mock out those two methods, but I can't write simple tests because the handler is part of this giant glob where it has access to all of the datastore and every object the parent server has access to because it's all one giant object.
Nothing against Mat Ryer, as his pattern is the best I've found, but I still feel like there's some better solution out there.
I've become increasingly sensitive to these high afferent coupling points in the repos I work on, especially the deeper I embed into the world of bazel and how dependency management and physical design influence the code I author.
Where possible, plugins are a great strategy to lay down these code seam points that don't force all possibilities upon some body of code, because fundamentally with plugin architectures you pick and choose what you want. Plugins are opted out by default; you must explicitly opt into a plugin for it to manifest. I've been calling software that has this quality "a la carte" style.
But in general you do what you need to do to avoid "doing everything so you can do anything".
I tend to write most of my logic in packages... so a "users" package or a "comments" package (if we were building HN). These have NO http interface! They do however each have their own "main" and some sort of CLI interface: "//go:build ignore" in the comment of that file is your friend.
It means the object created by NewServer is dealing with too much. Probably has too many data types coupled to it and too much behavior.
A simple example is adding a logger. If you add it as a dependency to the constructor, the object starts doing a bit more than the initial simple implementation. It's fine to do it, but it's a shame not to figure out how to log without editing the implementation of a simple thing.
Higher order functions (a logger decorator) get there to allow composition, but even they have their drawbacks.
It's still some form of structure that you can deal with, not a mistake.
As you say, having a logger attached is one of those pragmatic and acceptable exceptions to the rule. In a perfect world we'd have the time to go to the trouble of implementing loggable types and data flows and associated higher order functions, in practice taking the compromise means getting the real business valuable work completed while still having the necessary (but usually "low priority") non-functional requirements like logging and metrics implemented.
I agree that too many arguments to the constructor may have the smell of too much coupling.
But if I really feel I can't avoid the need to pass a good amount of external context, I create a dedicated "options" struct and pass that into the constructor as a pointer. The purpose of the pointer (rather than pass by value) is if I want default arguments, I can pass nil.
I felt this way for a long time. And maybe I'm projecting my past struggles onto what you're describing. I shared my current approach in a different comment already [0]. The gist is that I use an optional config struct, whose values get validated and copied over to my server struct inside NewServer. This makes testing much easier because I can mock fewer deps.
FWIW, I really tried to make the functional option pattern work for me, as many others have suggested, but eventually abandoned it. I felt it was a little too clever and therefore difficult to read, while requiring more boilerplate than the config struct + validate and copy pattern.
The biggest problem with the "pass nil for unused dependencies" approach is that when you modify some code to actually use one of those dependencies when it didn't before, you have to go back through every test and populate it.
Automatic dependency injection doesn’t solve that, it masks it and makes the onset of pain from poor design appear much later than it would without - by which time it is harder to fix.
My experience has been the opposite - we've been maintaining large (tens of millions loc, thousands of developers) mobile app codebases with this approach for about 8 years now, and results have been much better than what was done before.
A number of bad decisions were made and later discovered over time, and it was much easier to address them with automatic dependency injection available than without. Using as "big" of a test as you can without regressing speed or flakiness gives you really solid, non-fragile tests, which makes large changes tractable to do safely.
I don't know whether certain poor designs would have been spotted sooner, but this certainly made the code easier to change and resulted in more useful and lower maintenance tests.
Basically, not all the handlers will use every dependency the server (which is the entire program in this pattern) has. Not every handler will use a database, for example.
While I may prefer a struct for this instead of separate arguments, I do agree it's useful to capture "the world" as the set of all dependencies, even if some handlers don't use them (yet).
> My handlers used to be methods hanging off a server struct, but I no longer do this. If a handler function wants a dependency, it can bloody well ask for it as an argument. No more surprise dependencies when you’re just trying to test a single handler.
For HTTP services in any language, your handlers will usually end up with a lot of business logic, logic which probably has many dependencies. I see single handlers using all of the following on a regular basis: DB, cache, blob storage, some kind of special authz thing specific to your endpoints, maybe some fancy licensing checker, a queue or two, a specialized logger, and specialized metrics client. Many of those (metrics, request/response logging) can live in middlewares most of the time, but in every code base there will be times where you need to do something custom with one or the other. As time passes, the more I wonder "why aren't these all just function parameters?"
Yes, that would be a lot of function parameters (9+ for a single handler, before even getting into the request or custom params themselves), and we all have many rules of thumb and linter rules which try to keep us from having lots of function parameters. But it's not like we're not writing code which depends on all those dependencies, instead we're just sticking them on the "server" class/struct and pretending that because the method signature is shorter, we have fewer dependencies!
As time passes, I find myself wishing more and more for code that takes all its dependencies in the function/method signature, even if there's 20 of them; at least then we wouldn't be lying about how complex the code's getting...
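As a small-scale sketch of "dependencies in the signature": the handler constructor below takes exactly the two things it needs, each as a narrow interface. The interface names, the X-User header as a stand-in for real authentication, and the fakes are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Authorizer and ResourceGetter are hypothetical single-method interfaces.
type Authorizer interface {
	CanRead(user, resource string) bool
}

type ResourceGetter interface {
	Get(id string) (string, error)
}

// handleGetResource lists every dependency in its parameters; nothing is
// reached through a shared server struct.
func handleGetResource(authz Authorizer, store ResourceGetter) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.URL.Query().Get("id")
		user := r.Header.Get("X-User") // placeholder for real authentication
		if !authz.CanRead(user, id) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		body, err := store.Get(id)
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		fmt.Fprint(w, body)
	})
}

// Fakes for testing: only the two methods the handler uses are mocked.
type allowOnly string

func (a allowOnly) CanRead(user, _ string) bool { return user == string(a) }

type mapStore map[string]string

func (m mapStore) Get(id string) (string, error) {
	v, ok := m[id]
	if !ok {
		return "", errors.New("missing")
	}
	return v, nil
}

// serve runs a single request through h and returns status and body.
func serve(h http.Handler, user, id string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/resource?id="+id, nil)
	req.Header.Set("X-User", user)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h := handleGetResource(allowOnly("alice"), mapStore{"42": "hello"})
	fmt.Println(serve(h, "alice", "42")) // 200 hello
}
```

With two dependencies this is pleasant; with twenty, the signature gets long, which is exactly the honesty about complexity being asked for above.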
And in my main.go, or where I set up my dependencies, I create each operation, passing it its specific dependencies. I love that because I can keep all the helper methods for that specific operation/handler on that specific struct as private methods.
It does get tedious when you have one operation needing another, as you might start passing these around or you extract that into its own package/service.
This is kinda missing the point; each handler needs a lot of deps to do its job, and the most obvious place to put them is in the parameters of the function. That is what I want. I do not want more indirection for aesthetics; I want clarity, even if it's brutal clarity.
Whether all the deps are in the method receiver (the parent struct) or in a struct that's a param; it's all just more indirection to hide all the "stuff" that we need cause we think it's ugly. I dream of a world where we don't do that.
You do have to instantiate that struct, and you can do it with.... a beautiful NewCreateUser(dep1, dep2, dep3, ..., dep20) *CreateUser {...}. This is essentially what he recommends with his "func newMiddleware() func(h http.Handler) http.Handler".
I'm pointing out that this is basically "passing all the deps at once" with extra steps but no functional benefit; they are at best aesthetic, at worst confusing.
I'd like a world that sacrifices a bit of aesthetics in order to erase ambiguity or confusion. So instead of putting your deps in a struct that's a param, or putting your deps in a parent closure, I'd like to put them in the function params.
Though I will admit that if I had to choose, I'd use (and have used) the closure approach most often.
It doesn't have to be 9+ separate arguments; in some languages it can be a single 'context' or 'env' object that contains just what the handler needs, something like `handleHello({ db, cache, blobStore, authz }, req, res)`. That way, if two handlers use the exact same context you can reuse it, but it's also easy enough to declare a per-handler context at the call site.
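A rough Go sketch of the per-handler "env" idea, combined with the closure approach mentioned above. The names `Greeter`, `politeGreeter`, `helloDeps`, and `callHello` are made up for illustration; a real handler would list its actual db, cache, and so on in the struct.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Greeter is a hypothetical dependency standing in for a db, cache, etc.
type Greeter interface {
	Greet(name string) string
}

type politeGreeter struct{}

func (politeGreeter) Greet(name string) string { return "hello, " + name }

// helloDeps lists exactly what handleHello needs and nothing more.
type helloDeps struct {
	Greeter Greeter
}

// handleHello receives its dependencies explicitly and returns the handler.
func handleHello(deps helloDeps) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name := r.URL.Query().Get("name")
		fmt.Fprint(w, deps.Greeter.Greet(name))
	})
}

// callHello exercises the handler in memory and returns the response body.
func callHello(name string) string {
	h := handleHello(helloDeps{Greeter: politeGreeter{}})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/hello?name="+name, nil)
	h.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(callHello("gopher")) // prints "hello, gopher"
}
```

The dependencies are still all visible at the call site; the struct only names them, so adding a twentieth one does not reorder positional arguments.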
I agree with a lot of this, I'll add my own opinions:
* I would pass a waitgroup with the app context to service structs. This way the interrupt can trigger the app shutdown via the context and the main goroutine can wait on the waitgroup before actually killing the app.
* If writing a CLI program, then testing stdout, stdin, stderr, args, env, etc. is useful. But for an http server, this is less true. I would pass structured config to the run function to let those tests be more focused.
* I disagree with parsing templates using sync.Once in a handler because I don't think handlers should do template parsing at all. I would do this when the app starts: if the template cannot be parsed, the app should not become ready to receive any requests and should rather exit with a non-zero exit code.
If you just have a context, then your app cannot kill itself and the environment has to do it. That is better than nothing, but having the app do the killing is advantageous because: A) it can die faster (and so you can e.g. do your blue-green rollout faster) and B) you can write a log to say that your app is finished shutting down all its components, which can be useful for troubleshooting if your app was mid-transaction when it was killed.
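The waitgroup-plus-context idea could look roughly like this. It is only a sketch: `worker` and `runAndShutdown` are hypothetical names, and in a real app the cancel would come from a signal handler rather than being called directly.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker is a hypothetical background service.
type worker struct {
	stopped bool
}

// start runs until ctx is cancelled, cleans up, and then signals wg.
func (w *worker) start(ctx context.Context, wg *sync.WaitGroup) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done()
		w.stopped = true // stand-in for flushing buffers, closing connections
	}()
}

// runAndShutdown starts n workers, cancels the app context, waits for all of
// them to finish, and reports how many stopped cleanly.
func runAndShutdown(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup

	workers := make([]*worker, n)
	for i := range workers {
		workers[i] = &worker{}
		workers[i].start(ctx, &wg)
	}

	cancel()  // the interrupt would trigger this in a real app
	wg.Wait() // main goroutine blocks until every worker has cleaned up

	count := 0
	for _, w := range workers {
		if w.stopped {
			count++
		}
	}
	return count
}

func main() {
	fmt.Printf("%d of 3 workers stopped cleanly\n", runAndShutdown(3))
}
```

After `wg.Wait()` returns, the app can log that shutdown is complete and exit on its own instead of waiting for the environment to kill it.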
The problem with this approach is that writing openapi by hand from scratch is an incredibly tedious process. Writing Protobufs, capnproto or any similar idl feels much more productive.
Agree 100% with all points. I love contract-first. Better for producers and provides multiple tooling options and better docs for consumers.
I was agreeing with parent that some spec formats frankly suck to write by hand and openapi yaml is IMHO one of those (as opposed to say .protos which are nice for humans to read/write).
I use an LLM as a glorified interactive autocomplete to speed up writing the yaml specs (not the code! Use deterministic generators for that!) and it works great for me, personally (n=1 anecdote).
All the advice in the article is still helpful, but it takes the "how do I make sure X is initialized when Y needs it" part completely out of the equation and reduces it from an N*M problem to an N problem, ie I only have to worry about how to initialize individual pieces, not about how to synchronize initialization between them.
I've used quite a few dependency injection libraries in various languages over the years (and implemented a couple myself) and the simplicity and versatility of fx makes it my favorite so far.
>All the advice in the article is still helpful, but it takes the "how do I make sure X is initialized when Y needs it" part completely out of the equation and reduces it from an N*M problem to an N problem, ie I only have to worry about how to initialize individual pieces, not about how to synchronize initialization between them.
I gotta say, I hate these dependency injection frameworks.
In a well designed system this should be trivial. Making sure something is initialised when you want to use it is just a matter of it being available to pass in a constructor as a parameter.
There shouldn't be any sort of "synchronisation" of initialisation needed because your code won't compile if you do something wrong. If you add a cyclic dependency you will clearly see that because you won't be able to construct things in the right order without an obvious workaround.
If you have ever topologically sorted 100 components connected in a complex graph by hand or found the right spot to insert the 101st, you'd quickly appreciate more help than a compiler check.
I’m not against DI, but I don’t find your argument convincing: having dependencies modelled directly with the simplest language constructs (variables and arguments) and validated by the compiler makes “debugging” a ton simpler than dealing with DI errors, even in a good DI framework. Having an error just means I wrote invalid code: even a junior can easily figure it out.
I don't disagree with you, I've argued against the usage of DI frameworks plenty of times on projects I was working on. Many are not well made, are overly complicated and do much more than one single thing.
Especially in Go, where you don't have destructors to help with shutdown, having common structure in place to help tear down components has always been a net benefit for me.
But when micro-services are so common, it seems like people use them (Spring) because everyone else does, not because they actually provide needed value.
It should be inserted literally right next to its first use case. Your IDE will literally point it out to you with red squigglies because the places where you've added a dependency will be missing a parameter. Go to the highest one and add it on the line above.
I've never seen a tree graph, not without lots of global mutable state to cheat around DI. Your logger is just going to be needed almost everywhere.
What do you do on shutdown? In languages with destructors, that can automatically give you a call order in reverse of the construction order, but in Go you end up manually ordering things or just not having panicless shutdowns.
Okay, it's not a tree. Because multiple objects will depend on something like a logger. But it's an acyclic graph if designed properly. Which is incredibly simple to setup and teardown.
If your loggers are needed everywhere, then you just pass them as a constructor to the objects that need them. You're literally doing this with fx anyway.
Like, a logger is probably the first thing you new up in main(). So now you can pass it down as a dependency in constructors.
For shutdown you just defer your shutdown functions. Have a basic interface where your services have a Shutdown() method and then you can push them onto a stack and pop them off during shutdown.
There's no manual ordering involved. Your initialisation is a linear top down process, your shutdown is bottom up. It can't be any simpler. If you keep code as close to usage sites then there's only 1 possible order.
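The stack-of-Shutdown() idea from this comment, sketched in Go. `Shutdowner`, `component`, `shutdownStack`, and `shutdownOrder` are illustrative names, not anything from fx or the article.

```go
package main

import (
	"fmt"
	"strings"
)

// Shutdowner is the basic interface the comment describes.
type Shutdowner interface {
	Shutdown() error
}

// component is a hypothetical service that records when it is shut down.
type component struct {
	name string
	log  *[]string
}

func (c component) Shutdown() error {
	*c.log = append(*c.log, c.name)
	return nil
}

// shutdownStack collects services in construction order.
type shutdownStack struct {
	items []Shutdowner
}

func (s *shutdownStack) Push(x Shutdowner) { s.items = append(s.items, x) }

// ShutdownAll pops services in reverse construction order and returns the
// first error it sees, while still shutting down the remaining services.
func (s *shutdownStack) ShutdownAll() error {
	var first error
	for i := len(s.items) - 1; i >= 0; i-- {
		if err := s.items[i].Shutdown(); err != nil && first == nil {
			first = err
		}
	}
	s.items = nil
	return first
}

// shutdownOrder builds three components top down and tears them down bottom up.
func shutdownOrder() string {
	var log []string
	var s shutdownStack
	for _, name := range []string{"logger", "db", "server"} {
		s.Push(component{name: name, log: &log})
	}
	if err := s.ShutdownAll(); err != nil {
		return "error: " + err.Error()
	}
	return strings.Join(log, ",")
}

func main() {
	fmt.Println(shutdownOrder()) // prints "server,db,logger"
}
```

Within a single function, `defer` gives the same last-in, first-out ordering for free; an explicit stack is mostly useful when construction is spread across several functions.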
I agree with you on all of this. fx is not doing much more for shutdown than what you describe (calling a handler pushed to a stack created during initialization). Instead of implementing this for every app, I just prefer to use a library with great documentation and tests.
Great article with lots of interesting ideas. Can't believe I didn't know about signal.NotifyContext. Finally I'll be able to actually remember how to respond to signals instead of copy-pasting that between projects.
In my newTestServer, I spin up a server with fakes for my dependencies. If I want to test a dependency error, I replace that property with a fake that will return an error. I can validate my error paths. I can validate my log entries. I can validate my metric emission. I can validate timeouts and graceful shutdowns.
After the server starts, I inspect to determine which port it is running on (default is :0 so I have to wait to see what it got bound to).
My "unit" tests can test at the handler level or the http level, making sure that I can fully test the code as the users of my system will see it, exercising all middleware or none. I can spin up N instances and run my tests in parallel.
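A minimal sketch of that kind of test server, assuming a single hypothetical `UserStore` dependency with a fake that can be told to fail. `newHandler`, `startTestServer`, and `statusFor` are invented names; the commenter's actual newTestServer is not shown in the thread.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
)

// UserStore is a hypothetical dependency of the server.
type UserStore interface {
	Name(id string) (string, error)
}

// fakeStore is a test double; a non-nil err forces the error path.
type fakeStore struct{ err error }

func (f fakeStore) Name(id string) (string, error) {
	if f.err != nil {
		return "", f.err
	}
	return "user-" + id, nil
}

func newHandler(store UserStore) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/user", func(w http.ResponseWriter, r *http.Request) {
		name, err := store.Name(r.URL.Query().Get("id"))
		if err != nil {
			http.Error(w, "lookup failed", http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, name)
	})
	return mux
}

// startTestServer listens on port 0 so the OS picks a free port, then reports
// the base URL it was actually bound to plus a function that stops the server.
func startTestServer(store UserStore) (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	srv := &http.Server{Handler: newHandler(store)}
	go srv.Serve(ln)
	return "http://" + ln.Addr().String(), func() { srv.Close() }, nil
}

// statusFor makes one real HTTP request and returns the status code.
func statusFor(fail bool) (int, error) {
	store := fakeStore{}
	if fail {
		store.err = errors.New("boom")
	}
	base, stop, err := startTestServer(store)
	if err != nil {
		return 0, err
	}
	defer stop()

	resp, err := http.Get(base + "/user?id=42")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

func main() {
	ok, err := statusFor(false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	failed, err := statusFor(true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ok, failed) // prints "200 500"
}
```

Because every instance binds its own port, several of these can run side by side in parallel tests. The standard library's `httptest.NewServer` does much the same thing if you don't need control over the listener.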
I just run Go servers under fcgi. You get orchestration and crash recovery with a very simple interface. Fcgi will launch server processes as needed, feed them events, and shut them down when there's no traffic. Performance is good, and you can run on cheap hosting.
Which hosting do you use? I use fastcgi with python on Dreamhost and it works fine, but I’m sorta worried that they’ll turn it off because it seems kind of niche and under-documented
The author goes on to explain a few scenarios where the pattern is helpful. It's not to keep main.go as small as possible, it's so that you can test parts of your main.go file properly. In my experience, if all of my logic is stuffed into `func main() {}`, then I can't actually test it. If I have a helper method (like run in this case), I can test out specific scenarios and ensure the application handles them properly. Some of the examples Mat gave were handling context cancellations properly.
There are so many situations where I have a feeling that people are solving problems that don’t exist. In code I run into at work, code and projects I see online, etc.
The “whose dreams are you making come true” really applies here, because dreams are exactly what they are.
I spent quite some time writing an automatic image resizer and optimiser for my blog. Does it matter? No! Should I have spent that time writing blog posts instead? Yes! Still I was chasing some dream.
I've never been a fan of making main.go one line. I create the logger, parse the flags, create objects from the flags, and call Run() or something. In the tests, you aren't ever going to do those things in the same way, so there is really no point in putting them in some other file.
Usually your main function can't be used by any other part of your program. You should move all component implementations to modules so they can be re-used elsewhere.
For bespoke internal services, I like to keep main.go as flat as reasonable, like a "script". Handlers can have their own files but the bulk of the control flow and moving parts should be apparent from reading the main file.
Abstracting things away from main makes it less readable and is generally pointless for bespoke services that will be deployed in exactly one configuration.
That's a nice way of putting it. When exploring a new codebase for the first time it can be very helpful to have main.go give you a high level idea about the overall structure of the program.
main() is the only place where you can't return an error. In order to keep as much of the code as idiomatic as possible you just call something like run() where you can do so.
In addition there is the testing aspect. You can't invoke main() from your tests.
The idea is to keep the untestable code as small as possible but in practice you just add a layer of indirection and all of your untestable init code is in a different castle.
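For reference, the run() pattern under discussion looks roughly like the sketch below. The `-name` flag, the greeting, and the `runToString` helper are invented for illustration; the point is only that main() shrinks to an error check while run() takes its inputs as arguments and can be called from a test. It also shows signal.NotifyContext, mentioned earlier in the thread.

```go
package main

import (
	"context"
	"flag"
	"fmt"
	"io"
	"os"
	"os/signal"
	"strings"
)

// run holds everything main would otherwise do, but returns an error and
// takes args and stdout as parameters so a test can supply its own.
func run(ctx context.Context, args []string, stdout io.Writer) error {
	// ctx is cancelled when the process receives an interrupt signal.
	ctx, stop := signal.NotifyContext(ctx, os.Interrupt)
	defer stop()

	fs := flag.NewFlagSet("app", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep flag errors out of the test output
	name := fs.String("name", "world", "who to greet (hypothetical flag)")
	if err := fs.Parse(args); err != nil {
		return fmt.Errorf("parsing flags: %w", err)
	}

	if err := ctx.Err(); err != nil {
		return err
	}
	_, err := fmt.Fprintf(stdout, "hello, %s\n", *name)
	return err
}

// runToString is a test-style helper that captures what run writes.
func runToString(args ...string) (string, error) {
	var sb strings.Builder
	err := run(context.Background(), args, &sb)
	return sb.String(), err
}

func main() {
	if err := run(context.Background(), os.Args[1:], os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Whether this really shrinks the untestable part or just moves it, as the comment above argues, depends on how much setup still ends up outside run().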
The encode example contains a bug and a lint issue. Firstly, calling w.Header().Set after w.WriteHeader is likely a bug, as the w.WriteHeader method call should occur after setting the headers.
The second issue involves passing an unused *http.Request, which will likely cause the linter to flag it.
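A sketch of an encode helper with both points addressed: headers are set before WriteHeader, and the unused *http.Request parameter is dropped. This is an illustration of the ordering rule, not the article's exact code; `encodeDemo` is a made-up helper that runs it against an in-memory recorder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// encode writes v as JSON with the given status code. Headers must be set
// before WriteHeader; changes made afterwards are not sent to the client.
func encode[T any](w http.ResponseWriter, status int, v T) error {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	if err := json.NewEncoder(w).Encode(v); err != nil {
		return fmt.Errorf("encode json: %w", err)
	}
	return nil
}

// encodeDemo runs encode against a recorder and returns what a client would
// see: status code, Content-Type header, and body.
func encodeDemo() (int, string, string, error) {
	rec := httptest.NewRecorder()
	err := encode(rec, http.StatusCreated, map[string]string{"status": "ok"})
	res := rec.Result()
	return res.StatusCode, res.Header.Get("Content-Type"), rec.Body.String(), err
}

func main() {
	code, contentType, body, err := encodeDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d %s %s", code, contentType, body)
}
```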
I'm not familiar with that package structure, unfortunately. It might be good, but I'm not sure what the reasons are for structuring the project that way.
I did write my own HTTP stuff in C (and more generally internet stuff), on linux (sometimes without a libc, namely direct syscalls), running on ARM64 and x86_64.
And I plan to move to rv64 assembly once I can get reasonably performant hardware (it is already here, but it is extremely hard to get some where I am from and given how I operate). I dunno if it will be bare metal or with a linux kernel first ('cause a minimal TCP stack is already a big thingy).
Only thing I agree on is putting all the paths in one file.
In most other programming languages I've done a lot of research on how to make it nice and clean.
Was hoping this was it for Go because I'm cleaning up a big project.
But my very basic no nonsense current setup seems better to me than this in many ways.
If anybody has another example that is a lot better (and I don't mean more complex, I don't have those ego issues), I am very interested.
But this I want to hide as best I can from my dev team; this is all wrong. It's clever in a lot of ways, but it's wrong.
It does not have unit testing at all; all these tests would be duplicated in the end-to-end test.
I also like end-to-end tests better, but why put them here? Way better to put them in Postman, for example; then you always have the most up-to-date documentation, auto-generated.
Passing the config: man, I had so many discussions with junior developers about this. Don't do that; you'll make things dependent on the config and cannot reuse them in other programs. But that was already mentioned a lot here.
There are also a lot of functions with like 10 arguments passed. If you have that many arguments, just pass a struct containing a lot of the arguments; it's always super confusing when people make functions with 12 arguments. I'm always counting them, and after 14 times counting I rewrite their function.
It's a matter of style, so keep doing it this way if you like it, but it's not my style at all; it makes no sense at all to me.
I used to do this, but ever since reading Lexi Lambda's "Parse, Don't Validate," [0] I've found validators to be much more error-prone than leveraging Go's built-in type checker.
For example, imagine you wanted to defend against the user picking an illegal username. Like you want to make sure the user can't ever specify a username with angle brackets in it.
With the Validator approach, you have to remember to call the validator on 100% of code paths where the username value comes from an untrusted source.
Instead of using a validator, you can do this:
That guarantees that you can never forget to validate the username through any codepath. If you have a Username object, you know that it was validated because there was no other way to create the object.
[0] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
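A runnable sketch of the Username constructor described above. The angle-bracket rule comes from the comment; rejecting the empty string, the `ErrInvalidUsername` name, and the `String` method are my own assumptions added for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Username can only be populated through NewUsername because its field is
// unexported.
type Username struct {
	value string
}

// ErrInvalidUsername is a hypothetical sentinel error for this sketch.
var ErrInvalidUsername = errors.New("username must be non-empty and must not contain angle brackets")

// NewUsername parses an untrusted string into a Username or returns an error.
func NewUsername(raw string) (Username, error) {
	// The non-empty check is an assumption; the comment only mentions brackets.
	if raw == "" || strings.ContainsAny(raw, "<>") {
		return Username{}, ErrInvalidUsername
	}
	return Username{value: raw}, nil
}

// String returns the validated value.
func (u Username) String() string { return u.value }

func main() {
	u, err := NewUsername("alice")
	fmt.Println(u, err) // prints "alice <nil>"

	_, err = NewUsername("<script>")
	fmt.Println(err)
}
```

One caveat in Go: the zero value `Username{}` can still be written anywhere, so code receiving a Username may want to treat an empty one as invalid.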