
How many lines of code is Candy Japan? - bemmu
https://www.candyjapan.com/how-many-lines-of-code-is-candy-japan
======
Retric
This may just be me, but I find well-maintained projects tend to get sticky
around 10k lines of code. Much smaller than that and every new thing is just
that: a new thing. Much larger and it's easier for new things to kind of get
tacked on at the edges. But around 10k, the project is small enough to
understand, and most new code tends to result in some refactoring for shared
functionality. 10k lines is also fairly close to novel length.

Anyone else notice this?

~~~
wheels
Above 10k lines is where statically typed languages really begin to justify
their overhead. I work primarily in Ruby, C++ and Java, and while for smaller
projects Ruby is far more of a joy to use, for large projects (I've worked on
C++ projects of well over a million lines of code) static typing is a blessing.

Refactoring Ruby is painful in a large project, even with decent test coverage
(and a large portion of the tests basically enforce what you'd get
automatically with static typing). C++ is doable in that you'll get
(increasingly humane with clang and recent GCCs) compiler errors when you've
forgotten some detail. But refactoring Java is positively pleasant with all of
the tools that are enabled by such a consistent VM model. (It really is neat
to be able to rename a method on an object in a 100k LOC project simply by
renaming it in one place, or to change a signature with helpers along the way.)

Some of that makes those languages less fun (and quick) to work with
initially, but for projects that are likely to eventually grow quite large,
there can be an eventual pay-off.

~~~
adamnemecek
What kills me about refactoring C++ is dealing with headers. A lot of
times, something should really be a separate file (or should be merged into
another file), but it would be a pain in the ass to fix the headers.

~~~
webkike
This is a core problem with C++. Until we have packages, you might want to
consider another language. Rust is usually a good choice.

------
wwer444453
I work in a supermarket (in Japan).

Those EAN-13 codes you are printing are US-only and probably reserved. I
recommend switching to one of the in-store ranges, such as 2*, which
specifically carries no restrictions.
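Generating codes in that range only requires computing the standard EAN-13 check digit, which weights the first twelve digits 1, 3, 1, 3, ... from the left. Below is a minimal sketch; the `make_in_store_code` helper and its layout (a `20` prefix followed by a zero-padded 10-digit item number) are hypothetical choices for illustration, not anything Candy Japan actually uses.

```python
def ean13_check_digit(body: str) -> int:
    """Return the EAN-13 check digit for a 12-digit body.

    Digits are weighted 1, 3, 1, 3, ... from the left; the check digit
    raises the weighted sum to the next multiple of 10.
    """
    if len(body) != 12 or not (body.isascii() and body.isdigit()):
        raise ValueError("EAN-13 body must be exactly 12 ASCII digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return (10 - total % 10) % 10


def make_in_store_code(item_number: int) -> str:
    """Hypothetical helper: build a full 13-digit code in the 2* range.

    Layout assumed here: prefix "20" + 10-digit item number + check digit.
    """
    if not 0 <= item_number < 10**10:
        raise ValueError("item number must fit in 10 digits")
    body = "20" + f"{item_number:010d}"
    return body + str(ean13_check_digit(body))
```

For example, `ean13_check_digit("400638133393")` gives `1` (the full code is 4006381333931), and `make_in_store_code(42)` gives `"2000000000428"`.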

~~~
raverbashing
What problems might this cause? It's not like he's scanning this at a
supermarket till.

~~~
wwer444453
Nothing. It is like using someone else's public IP addresses on your internal
network. There is a perfectly valid range you can use, so why not shift to that?

------
eriknstr
My guess when confronted with the number of lines of Python code for the front
page was 5000 SLoC. I lowballed it intentionally because I was under the
suspicion that it might be lower than one might normally guess. (Don't we all
take pride in keeping our SLoC count as low as we can for any given problem?)
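For anyone who wants to make a similar guess about their own project, a rough SLoC figure can be had by counting non-blank lines that are not pure comments. This is a minimal sketch assuming Python sources and treating only `#`-prefixed lines as comments (docstrings count as code); it is not how the article's numbers were produced, since the author categorized lines by hand.

```python
from pathlib import Path


def count_sloc(text: str) -> int:
    """Count source lines: non-blank lines that are not pure '#' comments.

    Docstrings and multi-line strings are counted as code; this is a
    rough heuristic, not a full tokenizer.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def count_tree(root: str) -> int:
    """Sum SLoC over every *.py file under root (recursively)."""
    return sum(
        count_sloc(path.read_text(encoding="utf-8", errors="replace"))
        for path in Path(root).rglob("*.py")
    )
```

Usage: `count_tree("path/to/project")` returns the total for the whole tree.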

The video had some closing points, including "NoSQL sucks for reports", which
was explained briefly. OP, if you could, I would like for you to elaborate a
bit more on this point. I am not doubting that it's true for Google App
Engine, just interested to hear about the particulars.

~~~
stickfigure
I can speak to this, being a heavy GAE user and formerly a heavy MongoDB user.

For a reporting system you want an ad-hoc query language, fast in-database
aggregation, and joins.

The GAE datastore has none of the three, and MongoDB lacks joins. These are
not fatal flaws - the GAE datastore has other advantages like infinite
scalability, built-in synchronous geographic replication, failover, zero
maintenance, etc. But for analytics, we replicate a subset of data to
Postgres. It's still cheaper to have developers write occasional replication
code than to hire a DBA to maintain the database, and I never have to worry
that an ill-conceived SQL statement will create an incident. I've come to the
conclusion that using the GAE datastore as primary datastore and replicating
to specialized databases is a pretty good architecture for systems that
require reliability and low-maintenance.

MongoDB's aggregation framework is really painful because the query language
is weird and very low-level - you have to do most query planning yourself. And
without joins you hit the limits of the kinds of questions you can ask the
system very quickly.
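To make the "ad-hoc queries, aggregation, and joins" point concrete: once a subset of the datastore is replicated into a relational database, a report that joins and aggregates is a single query. This minimal sketch uses Python's built-in `sqlite3` as a stand-in for the Postgres replica; the `subscribers` and `orders` tables and their columns are hypothetical, not Candy Japan's actual schema.

```python
import sqlite3


def build_replica(conn: sqlite3.Connection) -> None:
    """Create and fill a tiny hypothetical replica of datastore entities."""
    conn.executescript(
        """
        CREATE TABLE subscribers (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            subscriber_id INTEGER REFERENCES subscribers(id),
            amount_usd REAL,
            charged_back INTEGER  -- 1 if the payment was charged back
        );
        """
    )
    conn.executemany(
        "INSERT INTO subscribers (id, country) VALUES (?, ?)",
        [(1, "US"), (2, "US"), (3, "FI")],
    )
    conn.executemany(
        "INSERT INTO orders (subscriber_id, amount_usd, charged_back) "
        "VALUES (?, ?, ?)",
        [(1, 25.0, 0), (1, 25.0, 0), (2, 25.0, 1), (3, 25.0, 0)],
    )


def revenue_by_country(conn: sqlite3.Connection) -> list:
    """Ad-hoc report: join orders to subscribers, aggregate per country."""
    return conn.execute(
        """
        SELECT s.country,
               COUNT(o.id)         AS orders,
               SUM(o.amount_usd)   AS revenue,
               SUM(o.charged_back) AS chargebacks
        FROM orders AS o
        JOIN subscribers AS s ON s.id = o.subscriber_id
        GROUP BY s.country
        ORDER BY revenue DESC
        """
    ).fetchall()
```

With the sample rows above, `revenue_by_country` on an in-memory connection returns `[("US", 3, 75.0, 1), ("FI", 1, 25.0, 0)]`. Answering a new question is just a matter of writing another query rather than planning the aggregation by hand.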

FWIW, my guess was 10k lines of code.

------
unwind
Very interesting reading, as almost always from CJ.

I was impressed by the precision in the data, i.e. being able to tell exactly
how many lines of code deal with "fraud detection". Perhaps there's a
"frauddetection.py" file, but then there has to be someone importing it and
using it, and those lines are harder to count, I'd expect. But perhaps the
integrations are small enough to deal with manually.

~~~
bemmu
The code is short enough that what I did was actually just go through every
line of code and try to assign it to a category. It did have some which were
hard, such as the import statements you mention that multiple parts might use.
But I figured being off by a few dozen lines wouldn't be that important for
seeing the bigger picture.

------
asimuvPR
@OP Why not dump the data into Postgres rather than code the schema and
transactions? I had a similar issue and Postgres was a godsend.

~~~
brianwawok
Well, he is on Google Cloud, and when he wrote it he used the best option
available there.

Google Cloud now has Cloud SQL, which would likely be much better. But then he
needs to code, test, and migrate; perhaps it's not worth it?

~~~
maxmcd
The parent is likely referring to dumping the GAE Datastore data into a
Postgres instance for reporting and other analysis. Not swapping out the
primary datastore for the app. This comment elsewhere in this thread explains
the strategy nicely:
[https://news.ycombinator.com/item?id=12613223](https://news.ycombinator.com/item?id=12613223)

~~~
brianwawok
I don't think he has the scale to require NoSQL. Cloud SQL could host his data
and his reporting in a single database.

------
ec109685
It's surprising Recurly doesn't have a better fraud story. Did you look at
using one of the "anti-fraud as a service" companies, or a different vendor
that does subscriptions?

~~~
bhelx
At the time candyjapan had their fraud problems, Recurly didn't have very good
support for these services. Since then, we've formed a partnership with Kount:

[https://recurly.com/press/recurly-teams-up-with-kount-to-hel...](https://recurly.com/press/recurly-teams-up-with-kount-to-help-subscription-based-merchants-combat-fraud/)
[https://docs.recurly.com/docs/kount](https://docs.recurly.com/docs/kount)

You get some basic protections for free but you can also integrate with your
Kount account if you want to customize.

------
yojex
Great read! My guess would have been greater than 10,000. A couple of typos
towards the end you might want to fix:

"I don't know how I could have expect it, but somehow from the start I should
have prepared for fraud. You want to at least keep an on any suspicious
activity and react quickly if you start getting many chargebacks."

'expected' and 'at least keep an eye on'

~~~
bemmu
Thanks!

------
dzdt
Coming from an enterprise background, that number is astonishingly low.

~~~
tobltobs
The Active Directory integration alone would take 5k lines.

------
late2part
Yes, but how many hundreds of microservices do you have? :-)

------
wkoszek
Do you know of similar "show-offs", where an author or business presents its
internals with real-world requirements (shipping, fraud protection) tied to
the software components? I've found this one especially interesting.

------
tmaly
I totally get the point on NoSQL. I started out with Redis and some custom
microservices that wrote to BoltDB for my food site. This started getting out
of hand, and now I am converting everything to PostgreSQL. I will still use
Redis for some fast access/caching, but for reporting, SQL is still the best
way to go once you know what your data looks like.

------
Ro93
Does he live in Japan and ship from there? Or does he live in the US and
somehow get the candy?

~~~
icebraining
He does live in Japan: _" Bemmu started Candy Japan with wife Nachi and lives
in Tokushima, Japan."_

------
gcatalfamo
Thanks to this very informative blog post I also discovered
[https://embedd.io](https://embedd.io), which lets you embed HN and Reddit
comments in any website. Thanks again!

EDIT: I don't know why I am getting downvoted. I am NOT from embedd.io (why
else would you downvote?) and just genuinely discovered the service.
HN has been truly toxic lately.

~~~
mtmail
It's off topic to this discussion. If you discovered a new interesting
website/service it's best to submit it as a new discussion.

~~~
jsnathan
I'm glad he mentioned it, because I wouldn't have noticed otherwise. I
submitted it [1].

[1]:
[https://news.ycombinator.com/item?id=12612574](https://news.ycombinator.com/item?id=12612574)

~~~
mtmail
I agree, nice website and glad it's resubmitted. I just tried to explain
possible downvotes (I didn't downvote myself).

