
My rule is, don't use a dependency to implement your core business. Is JSON parsing our core business? No, so why would we ever write -- and thereby commit to supporting for its entire lifetime -- JSON parsing code? All the code you write and support should be directly tied to what you as a business decide are your fundamental value propositions. Everything else you write is just fat waiting to be cut by someone who knows how to write a business case.

To be clear, this is about the lifetime support of code. It's very, very rare that code can be written once and never touched. But that long tail of support eats up time and money, and is almost always discounted in these conversations. I don't even care that Jackson JSON parsing has years of work behind it, when I can hack together a JSON parser in a day. I care that Jackson will continue to improve their offering without any further input, while that's not true of my version.

> don't use a dependency to implement your core business

In logic language, you're saying "If X is your core business, don't outsource X".

> Is JSON parsing our core business? No, so why would we ever write -- and thereby commit to supporting for its entire lifetime -- JSON parsing code? All the code you write and support should be directly tied to what you as a business decide are your fundamental value propositions. Everything else you write is just fat waiting to be cut by someone who knows how to write a business case.

The rest of your argument is interpreted as "If X is not your core business, don't in-house X".

These two logical implication statements are not equivalents of each other, but are converses. Casual language often conflates If, Only-If, and If-And-Only-If.

Since we're in pedanticville, these aren't converses, but inverses. The converse goes "If you don't outsource X, then X is your core business".

I agree with this statement also. By writing code to do JSON parsing, JSON parsing is now part of your business.

Thanks, you are right. A converse is logically equivalent to an inverse, so I'm only half wrong. =)
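
For anyone who wants to check the converse/inverse point mechanically, here is a small truth-table sketch in Python. The helper names are mine and purely illustrative; p stands for "X is your core business" and q for "you do not outsource X".

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q

# p = "X is your core business", q = "you do not outsource X"
def statement(p, q):
    return implies(p, q)

def converse(p, q):
    return implies(q, p)

def inverse(p, q):
    return implies(not p, not q)

def contrapositive(p, q):
    return implies(not q, not p)

def equivalent(f, g) -> bool:
    """True if f and g agree on every row of the two-variable truth table."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))

print(equivalent(converse, inverse))          # True
print(equivalent(statement, contrapositive))  # True
print(equivalent(statement, converse))        # False
```

So the converse and the inverse say the same thing as each other, but neither says the same thing as the original statement.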

Ah, yes, everyone knows the three values of Boolean logic...

true, false and null

Actually it's True, False, and FileNotFound. :o)


True, False, Null and OffByOne

I think they both follow similar thinking.

"You should spend time implementing your core business" implies that you shouldn’t spend time implementing things that aren’t in your core business; otherwise the first statement is pretty useless.

Maybe not equivalents but definitely two arguments for his core point.

core business = C

outsource = O

Object = x

x ∉ C ↔ O(x)

If the symbols don't show up:

if-and-only-if x is not in C, O(x)

If I wanted to learn more about rigorous, non-elementary logic, do you have a recommended resource? I've taken a course in intro level probability theory which covered it generally and another course that built on it lightly but nothing rigorous and I am wooed by how concise things become in a logical form.

A Tour Through Mathematical Logic. You don't have to do any proofs. If you learn Propositional Logic and First Order Logic you'll already have most of the tools to invent the rest.

I think the problem is that the individual contributor has decided to make that chunk of logic their business. This will probably not benefit the team or the organization.

What is the point you're trying to make?

Ha! I had exactly the same thought.

Well, one special edge-case would be where you only need to parse some extremely tiny subset of JSON (for example: you only need to parse dictionaries whose keys and values are positive integers, like {1:2,3:4}). Then, depending how expensive the full json parser is, it might be worth your while just writing the limited parser yourself.

Of course, you might say, inevitably feature-creep will expand the list of things your parser needs to parse, but that's not a law of physics. Sometimes in certain limited, well-defined projects, it really is true that YAGNI.
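
To make the "limited parser" idea concrete, here is a rough sketch in Python of what handling only the `{1:2,3:4}` shape might look like. The function name `parse_int_dict` is hypothetical, and I am assuming no whitespace inside the braces and strictly positive integers without leading zeros. Anything outside that shape is rejected rather than guessed at.

```python
import re

# One "key:value" pair of positive integers (no leading zeros, no whitespace).
_PAIR = r"[1-9][0-9]*:[1-9][0-9]*"
# Whole input: '{', zero or more comma-separated pairs, '}'.
_SHAPE = re.compile(r"\{(?:" + _PAIR + r"(?:," + _PAIR + r")*)?\}")

def parse_int_dict(text: str) -> dict:
    """Parse strings like '{1:2,3:4}' into {1: 2, 3: 4}; reject anything else."""
    text = text.strip()
    if not _SHAPE.fullmatch(text):
        raise ValueError(f"not a positive-integer dictionary: {text!r}")
    body = text[1:-1]
    result = {}
    if not body:
        return result
    for pair in body.split(","):
        key, value = pair.split(":")
        result[int(key)] = int(value)  # a repeated key keeps the last value
    return result

print(parse_int_dict("{1:2,3:4}"))  # {1: 2, 3: 4}
```

The strict up-front shape check is the design choice that keeps this small: the loop never has to deal with malformed input.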

Your example is more apt than intended: That's not valid json, which only allows string keys. If you use a library it'll either barf now or later when they fix it, so if you're forced to work with an API like that and can't change it, a custom parser is really the only way to go.
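
A quick way to see this, using Python's standard `json` module as one example of a strict parser (other libraries may behave differently; `is_valid_json` is just an illustrative helper):

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json("{1:2,3:4}"))      # False: keys are not strings
print(is_valid_json('{"1":2,"3":4}'))  # True
```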

Well the good libraries have option flags that will allow you to handle JSON as found in the wild.

That's not JSON, though. It's absolutely something else. Maybe a JS snippet. Maybe YAML. Definitely not JSON, though.

(Some JSON libraries do have option flags, but usually it's about whether, during deserializing into a known type, unknown fields are an error or silently ignored. Or whether C-style comments are an error or considered as whitespace.)

It may not be JSON but it’s out there in the wild. A lot.

Let’s say I’m scraping a website. I can:

1) complain to the owner that “it’s not JSON”.

2) write a parser for a syntax that has no spec, (it’s not JSON, but it sure looks an awful lot like JSON with unquoted keys.)


3) Set ALLOW_UNQUOTED_FIELD_NAMES to true in the Jackson library.

> Maybe a JS snippet.

While acceptable, also misleading: JS only does string keys, but unlike JSON it'll convert whatever it's given into strings. Not a problem most of the time, since it'll do the same conversion for both accessing and setting, but good to be aware of if you're doing something like iterating Object.keys()

I sometimes use https://jsonformatter.org/json-parser to parse and validate JSON.

You can also apply YAGNI to 'do we need our own custom parser'?

You don't know what your requirements are. The customers haven't told you yet.

If you pick a library with a straightforward interface, especially one that isn't too opinionated, you can always drop in a custom implementation later on. Frameworks, not so much (but that cuts both ways; the people who will write libraries often love writing frameworks too)

There are fully correct JSON parsers you aren't going to beat, even if you implement a subset. [1]

[1]: https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of...

No, I won't beat them. But if it is a limited subset that I can implement with twenty lines of straightforward code, that will often be cheaper.

I've been on projects where they imported xml-parsers many times bigger than the rest of the whole codebase just to send a well formatted order number.

xml and json are different beasts

A json parser can probably be implemented in an afternoon. But a conformant xml parser can take months.

There are some weird things in xml, for example, this is correct xml:

    <?xml version="1.0"?><!DOCTYPE abc[<?abc >]]<abc><abc/>]>>?>]><?x?><a/><?x <(x)>>?>

And for external entities, you need an http client in the xml parser, although it is probably better to not support that part of the standard.

And we didn’t need a conformant xml parser, which I know is huge and complex.

And what happens when your parsing needs to be expanded?

Then you reevaluate the situation. YAGNI is true here too.

YAGNI is nice, but when the PM asks you why it would take two months to accept a new JSON format from a client, and you answer, "Well, because we didn’t want to use an industry-standard, fully functional and vetted JSON parser, so we essentially wrote our own edge-case parser," we both know how that conversation will end.

And YAGNI doesn’t have anything against dependencies.

I said, then you reevaluate the situation.

When there are new requirements, you do a quick estimate of whether you should add four new lines to the existing 20 or whether it is worth switching to an external library. Four new lines on top of the 20: just add them to the core. But if you regularly have to add things, or the requirements show that this particular little parser, which was supposed to be simple and static, isn’t, then you should probably change your decision and use the library.

But you do that only then. Because chances are that with your approach you are going to drag along a large generic library that you only use a tiny fraction of. And that also has costs. In particular if your immediate impulse always is to add another library instead of writing things yourself.

That's IMHO key – solve problems once you know them, not earlier. Old idea, also core to XP.

Taking into consideration how businesses work, in a few years you are going to have a full parser on your hands.

With all the technical debt associated with it. That is the problem: basing your project on a dependency that would allow you to easily scale and add features is a huge benefit.

This is like saying you should roll your own crypto because you only need to do a very limited subset of crypto operations, so why use something like NaCl or Tink?

Encryption is a terrible edge case. If you are forced to half-ass encryption, you should seriously question the project requirements. Bad encryption can be worse than none at all. Things won't end well if data security is treated as a detail.

What do you mean by cheaper?

By using a third party library you are writing twenty lines less code, so it's cheaper in that aspect.

There are probably libraries that are faster than your twenty lines of un-optimized code, so it's cheaper as far as computing resources are considered too.

The only time it could matter is when you ship the code to the client through the wire (such as a Javascript bundle).

It’s cheaper in the sense that it is faster to write and maintain those 20 lines of code. Because someone has to evaluate the library, understand it well enough to actually call it and then make sure it stays up to date. And often there are a few lines of code to translate your data into a form that the library requires etc.

Plus for every developer to come, one call to an external library usually also means 30 pages of documentation to trawl through if they ever want to change anything, 29.9 of which is completely irrelevant to whatever your narrow use case is.

That's the real cost. The size of the code means absolutely nothing.

You can’t use a library with zero lines of code. On top of this, libraries always have development overhead outside of the code you write. E.g.: What version number should you use? Did the latest version break something? Did the old version break something on the latest compiler? Etc., etc.

And it's going to take more than an afternoon to evaluate these parsers. You have to look at the options, evaluate the API, evaluate if they're stable and supported, evaluate if they integrate well with your project, evaluate any dependencies they might have, etc. Then you need a plan to manage these dependencies long term.

If your needs can be solved adequately by strtok, then that's a far simpler and more maintainable solution that can be knocked out in an afternoon.

I agree.

> Of course, you might say, inevitably feature-creep will expand the list of things your parser needs to parse

If you've done your parser correctly, you'll be able to replace its implementation with the new dependency, with little to no need for extra refactoring in the rest of the codebase.
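
A rough illustration of what "done correctly" could mean here, sketched in Python. Everything in it is hypothetical (the `Parser` alias, `handwritten_parse`, `library_parse`, `total_quantity`): the idea is only that callers depend on a narrow signature, so the limited hand-written parser can later be swapped for a library-backed one without touching them.

```python
import json
import re
from typing import Callable, Dict

# The rest of the codebase depends only on this signature.
Parser = Callable[[str], Dict[str, int]]

# One '"key": integer' item; deliberately narrow.
_ITEM = re.compile(r'\s*"([A-Za-z_]+)"\s*:\s*(-?\d+)\s*')

def handwritten_parse(text: str) -> Dict[str, int]:
    """Limited parser: flat object, simple string keys, integer values only."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected an object")
    body = text[1:-1].strip()
    result: Dict[str, int] = {}
    if not body:
        return result
    for item in body.split(","):
        match = _ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"unsupported item: {item!r}")
        result[match.group(1)] = int(match.group(2))
    return result

def library_parse(text: str) -> Dict[str, int]:
    """Drop-in replacement backed by the standard json module."""
    return json.loads(text)

def total_quantity(order_text: str, parse: Parser = handwritten_parse) -> int:
    """Caller code: knows only the Parser signature, not the implementation."""
    return sum(parse(order_text).values())

order = '{"apples": 2, "pears": 3}'
print(total_quantity(order))                       # 5
print(total_quantity(order, parse=library_parse))  # 5
```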

I think a JSON parser is not a good example though — takes longer than a few hours / an afternoon, to write a JSON parser, add tests, fix bugs, corner cases. More like a week, or weeks, ...

... Look, a tiny json parser — Not an afternoon project: https://github.com/rafagafe/tiny-json/blob/master/tiny-json....

And a question about small JSON parsers — didn't see any afternoon projects among the answers:


I suppose a JSON parser was just an example. Made the whole answer sound weird to me though :- ) when the blog is about afternoon-projects and then a reply is about a week(s), could be month(s), long project.

Same with CSV. It looks easy, but it isn't. I've never seen anyone who writes their own CSV parser actually implement features necessary to conform to the standard like quoting and escape sequences. The end result is software that breaks when delimiters or quotes appear in user input. Honestly, I prefer xlsx spreadsheets because of that. Nobody fools themselves into implementing the parser or serializer for the format themselves. The only tiny pitfall with them is when people create spreadsheets manually in excel and write numbers as text, but parsing strings to numbers is absolutely trivial. You have to do that with CSV anyway.
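
To illustrate the quoting problem, here is a small Python comparison between the usual home-grown approach (splitting on commas) and the standard-library `csv` module. The sample data is made up for the example.

```python
import csv
import io

data = 'id,comment,total\n7,"said ""hi"", then left",3\n'

# Home-grown approach: split every line on commas.
naive = [row.split(",") for row in data.splitlines()]

# Standard-library reader: understands quoted fields and doubled quotes.
proper = list(csv.reader(io.StringIO(data)))

print(naive[1])   # ['7', '"said ""hi""', ' then left"', '3']
print(proper[1])  # ['7', 'said "hi", then left', '3']
```

The naive version silently produces four fields instead of three as soon as a delimiter shows up inside a quoted value.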

It really is easy:

$csv = str_getcsv( $input );

But PHP makes everything easy :trollface:

> I think a thing is not a good example though — takes longer than a few hours / an afternoon, to write a thing, add tests, fix bugs, corner cases. More like a week, or weeks, ...

You're making my point for me. This is exactly what I meant by the lifetime of support you're signing up for by writing lines of code. Once you write that code, you're now in the business of supporting that code. Was that a good decision for your business?

There's a fair middle ground when the dependency in itself doesn't have dependencies, and is small enough with a permissive license such that the entirety of its code can be dropped into your project. Especially for very specific functionalities. I have used such tiny xml parsers, and I'm not affected by the fact that my copy is no longer the latest version. It's not so far from copying and pasting snippets of existing code.

Great rule. I was wondering, how do you manage updating the Jackson JSON parsing package? What if you have 100 such packages and they get updated weekly with breaking changes?

If you have a hundred direct dependencies and they all break the API on a weekly basis, then you are either at a scale where you can handle that, or you are using the wrong dependencies, or you are doing something wrong.

I can understand at most 10 dependencies iterating so quickly, but only when they are your own internal dependencies, and these should definitely not break the API weekly.

* corrected spelling

For what reason are you updating your packages? Is there a severe security issue in that package or, if it works today, could you pin it to that version and wait until there is a compelling reason to update it?

Here's some reasoning - if this project was inhoused would we detect and patch it any quicker? Would we have a dev constantly assigned to it that would be pushing out patches to the rest of the team... or is it the sort of software we'd write once and then wait until a compelling reason to invest more into. Whether software is inhouse or outsourced you still retain decision making about how much time to invest in its maintenance.

> if this project was inhoused would we detect and patch it any quicker?

If it's a bespoke library, no one but you and hackers directly targeting you will test for security vulnerabilities. (Good thing you have a red team... right?) For widely-used libraries, the number of vulnerabilities isn't going to be much different from your own library, but the likelihood that they're found and exploited in your system is considerably lower.

So no, in most cases, you would not detect and patch vulnerabilities quicker, because you probably don't see them until it's too late.

> if it works today, could you pin it to that version and wait until there is a compelling reason to update it.

If you pin versions for a long time, eventually there comes a point where you have to update something because of a critical bug or security advisory, and of course since it's a critical bug or advisory, you have to update "right now", "priority 1", "all hands on deck", "the board is involved" and everything. The fix is in version 5.1.2 of the library, but you're stuck at 2.6.5, so now you have to do three major version upgrades (with all the changes to your codebase that entails) before you can even think about upgrading to the version containing the security fix. And that's still an easy case. If the library in question is a framework like Rails or React, version upgrades of that size may be a major undertaking that takes weeks or months to prepare, execute and validate. That's very much not fun when management is pressuring you to close that vulnerability.

I think it's never a good idea to sit on ancient libraries. Put a recurring task in your team backlog to update dependencies on a schedule. It's not going to result in less work spent upgrading, in all likelihood it's more work in terms of raw hours worked compared to the update-on-security-advisory strategy, but it's much more plannable and less stressful. That doesn't mean you have to upgrade to latest-greatest immediately (you always have the freedom to hold off a particular upgrade until the new major version has had some time to mature etc.), but there should be some time reserved on your schedule for doing your updates.

For instance, I have my update-all-lib-deps reminder in my calendar on the 1st of every even month. When it comes up, I put a task in my backlog with a checklist containing every application I have to check, upgrade and deploy. Go 1.15 just came out today, so that's going to be on my desk come October. Great timing, actually, we're going to be one or two point releases into the 1.15 branch at that point, so it's going to be a safe and easy upgrade.

Only update dependencies when your code requires the new version, depends on a bug fix or it fixes a security vulnerability. Otherwise, continue using the same version.

Have good test coverage to catch bugs that may originate in dependencies and subscribe to a third-party service to track vulnerabilities in your dependencies.

Then you get packages that are 5 years out of date, which eventually have a security vulnerability, and now you have the task of upgrading and working through 5 years of (potentially) breaking changes and deprecations.

It's generally easier in the long run to keep your dependencies up to date. If a package has a new breaking change each week, that's a sign you probably shouldn't be using it for production code.

When you have a hundred dependencies, who is looking at the release notes to see what security vulnerabilities are being fixed?

Github can do it for you automatically.

Update all your dependencies periodically - monthly, quarterly, whatever. Freeze dependencies in the meanwhile.

If you're in a larger corporate environment, this can also be used to create some predictable labour needs: create a seasonal updating task force so that the business gets a more transparent view of how much labour is being sunk into maintaining these, and break it down into specific dependencies if you've got one or two that you think are particularly expensive. Showing after-the-fact labour numbers from one season may motivate sane in-housing for the next season.

There's lots of opinions on this, all with good justification. My current team leaves most dependencies unlocked and depends on good automated tests to sniff out broken dependencies. If necessary we lock dependencies to a particular version or range (e.g. <2.0.0). Once tested, we freeze for distribution.

Some people just never upgrade until they need to. That's workable, though when you do need to upgrade a package you may be spending the rest of the week working out a cascade of breaking changes.

If you only upgrade when you need to, but not necessarily to the latest versions, odds are that whatever breakage is caused by the latest nodejs/npm/etc incompatibility has already been documented in issue trackers or stackoverflow

> What if you have 100 such packages and they get updated weekly with breaking changes?

The solution to that is simple, stop using node.js ;)

Only good solution in this thread :)

That's true.

Besides _lifetime support_, working on that core business feature makes us _understand_ that feature deeply.

I've seen people integrate a dependency for their core business. It helped them get started fast, but it later created a blockage that required deeper understanding to overcome.

So you're saying that I should implement my own OR mapper just because my product is using a database? And even if this is not the case, writing everything yourself means it all ends up in your own hands: no bug fixes, no patches or improvements without spending man-hours. I've worked in such a company and it was a mess, accompanied by dev leaders who were too proud of their code to allow any change.

I'm confused by your response. Is your core business mapping objects to databases? As in, that's what you get paid for? If not, my heuristic is that you should not be writing an ORM tool.

But "it's a good problem to have"!

Quite agree, every single line of code written requires lifetime support. Code adds up and reduces productivity gradually, so only write code for core business logic.
