
Why do we need modules at all? (2011) - thomas11
http://erlang.org/pipermail/erlang-questions/2011-May/058768.html
======
thomas11
Armstrong's proposal reminds me a bit of Emacs extensions. Since Emacs Lisp
doesn't have namespaces or modules, all functions must be uniquely named which
is done by prefixing them: foo-replace. This is not that different from having
a module foo, as Armstrong notes: "managing a namespace with names like
foo.bar.baz.z is just as complex as managing a namespace with names like
foo_bar_baz_z".

But what it enabled is an Emacs community where single functions are freely
shared, for example on
[http://www.emacswiki.org/emacs/](http://www.emacswiki.org/emacs/). People
just copy them into their Emacs init file. Sometimes they modify them a little
and post them again with their own prefix. This has obvious downsides such as
lack of versioning and organization. But it provides a low barrier to entry
and creates a dynamic community.

------
inflagranti
To me this is the same question as whether we need directories in a file
system. Ideally, your file system is a flat database and files are indexed by
a vast array of automatic and manually added metadata that lets you retrieve
them easily. Microsoft tried to go in this direction with WinFS, which was
eventually cut from Vista, maybe because it wasn't practical (yet). Looking at
how people use the Internet, though, where 90% of browsing starts at Google,
this does seem a very reasonable approach for many things in the future. In
the end, why should humans do manual indexing and retrieval if the computer
can facilitate this part?

------
felixgallo
I think a lot of people are focusing on the implementation details here, which
is fun and great, but the real deep insight here is the idea of a global
registry of correct functions.

If you postulate for a minute that the (truly nontrivial) surface problems are
all solved, and concentrate only on the idea of a universally accessible group
of functions that accretes value over time -- like a stdlib that every
language on every runtime could access -- that seems like a pretty exciting
idea worth thinking about.

I had something like that idea almost two decades ago ([http://www.gossamer-
threads.com/lists/perl/porters/26139?do=...](http://www.gossamer-
threads.com/lists/perl/porters/26139?do=post_view_threaded#26139)) but at the
time it was all in fun. These days, though, that sort of thing starts looking
pretty possible, especially for the group of pure functions.

------
andrewstuart2
Because humans suck at serialized content.

7 ± 2. [1] That's the number of things our prefrontal cortex/short-term
memory can track at once. That's why we (humans) organize things into
hierarchies. That's why the best team size is around that number. Etcetera.

Heck, everything in the world on a computer is serialized into memory or onto
disk. Or addressed as some disk in a serial array of disks. Serialized as in,
"there's some data somewhere in these 2TB that tell me where in the same 2TB
the rest of the data is." Computers excel at this. Humans are terrible at
this.

I guess my point is: humans are the reason we need modules.

[1]
[http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_...](http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two)

~~~
PythonicAlpha
That is exactly what I thought, too!

It is all about handling complexity! By putting together things that belong
together, complexity is reduced. Engines and other machines are designed the
same way: things that belong together are put in the same spot.

------
shaurz
I quite like the idea. I think it would probably still make sense to have
"collections" where a bunch of related functions can be grouped together,
discovered, and worked on as a unit (this would just be an optional extra
layer on top of the global function database), although there would be no
exclusivity in collections, so a function might appear in more than one
collection, or in none.

Another idea: Unit tests could be stored as function metadata.

------
cwmma
JavaScript works similarly to this, and apps/libraries that wrap themselves
in a giant closure work almost exactly like this. The disadvantage of this
over using modules is in dependencies between functions. When you don't have
modules and you try to refactor, you get an annoying tendency for function a
in file b to break when you change function y in file z. When you have modules
you can easily tell, before changing function y, whether it is exported or
not, and if it is, see whether file z is imported in file b.

Not saying this Erlang idea isn't good or won't work, just that these are the
pitfalls besides the obvious namespacing and conflicts.

~~~
seiji
_JavaScript works similarly to this, and apps/libraries that wrap themselves
in a giant closure work almost exactly like this._

Nope. Joe's thought experiment is: what if _every function_ became available
in the global namespace? What if _every function_ got kept in a global
datastore so you: launch your REPL, run any function, and have it pulled down
and work immediately. No imports unless you need to pin a specific function to
a specific past revision.

Of course, someone would come along and say "these 30 functions only work
together when pinned to these specific revisions," so you end up pulling down
a named bundle of specific revisions, ...
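
The resolution model described here can be sketched in a few lines of Python. Note that `publish`, `resolve`, and the in-memory dicts standing in for the global datastore are all invented for illustration: unpinned lookups follow a name to its latest revision, while a pin names one immutable revision forever.

```python
import hashlib

# In-memory stand-ins for the global datastore: STORE maps content
# hashes to function source; NAMES maps a human name to its revision
# history (newest last). publish/resolve are hypothetical helpers.
STORE = {}
NAMES = {}

def publish(name, source):
    # A function's real identity is the hash of its source text.
    h = hashlib.sha256(source.encode()).hexdigest()
    STORE[h] = source
    NAMES.setdefault(name, []).append(h)
    return h

def resolve(name, pin=None):
    # Unpinned: follow the name to its latest revision (late binding).
    # Pinned: fetch one specific, immutable revision, forever.
    h = pin if pin is not None else NAMES[name][-1]
    return STORE[h]

v1 = publish("greet", 'def greet(): return "hello"')
v2 = publish("greet", 'def greet(): return "hello, world"')

assert resolve("greet") == STORE[v2]          # latest wins by default
assert resolve("greet", pin=v1) == STORE[v1]  # a pin survives updates
```

The "bundle of pinned revisions" then falls out naturally: it is just a mapping from local names to pins.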

------
Verdex
I saw Joe's strange loop talk [1] a while ago and I get the same vibe reading
his post as I did when watching the video. It sounds very cool, but I can't
shake the feeling that it only works for 85% of the code. That is to say if
you program in exactly the right way, you will be able to do everything you
want and it will work with this system, but there are ways of programming that
won't work with this system.

More specifically I feel like there are two problems. 1) It feels suspiciously
like there's a combination of halting problem and diagonalisation that shows
there are an uncountably infinite number of functions that we want to write
that can't be named (although I would want to have a better idea of how this
is supposed to work before I try to hammer out a proof). 2) I don't understand
how it's possible for any hashing scheme to encode necessary properties of a
function such that the function with necessary properties has a different hash
than an otherwise identical function without these properties. For example can
we hash these functions such that stable sort looks different than unstable
sort? Wouldn't we need dependent typing to encode all required properties? And
if that's the case couldn't I pull a Gödel and show that there's always one
more property not encodable in your system?

[1] -
[https://www.youtube.com/watch?v=lKXe3HUG2l4](https://www.youtube.com/watch?v=lKXe3HUG2l4)
[2]

[2] -
[https://news.ycombinator.com/item?id=8572920](https://news.ycombinator.com/item?id=8572920)
(thanks for the link)

~~~
nowne
There are a countably infinite number of functions. A simple proof: each
function can be represented as a string, and there are countably many strings
over a finite alphabet. You could also argue that functions are equivalent to
Turing machines, and there are countably many Turing machines.

------
derefr
A function's true name should be its content hash. (Where that content hash is
calculated after canonicalizing all the call-sites in the function into
content hash refs themselves.) This way:

\- functions are versioned by name

\- a function will "pull in" its dependencies, transitively, at compile time;
a function will never change behaviour just because a dependency has a new
version available

\- the global database can store _all_ functions ever created, without
worrying about anyone stepping on anyone else's toes

\- magical zero-install (runtime reference of a function hash that doesn't
exist -> the process blocks while it gets downloaded from the database.) This
is _safe_: presuming a currently-accepted cryptographic hash, if you ask for
a function with hash X, you'll be running known code.

\- you can still build "curation" schemes on top of this, with author
versioning, using basically Freenet's Signed Subspace Key approach (sort of
equivalent to a checkout of a git repo). The module author publishes a signed
function which returns a function when passed an identifier (this is your
"module"). Later, they publish a new function that maps identifiers to other
functions. The whole stdlib could live in the DB and be dereferenced into
cache on first run from a burned-in module-function ref.

\- function unloading can be done automatically when nothing has called into
(or is running in the context of) a function for a while. Basically, garbage
collection.

\- you can still do late binding if you want. In Erlang, "remote" (fully-
qualified) calls don't usually mean to switch semantics on version change;
they just get conflated with fully-qualified self-calls, which are explicitly
for that. In a flat function namespace, you'd probably have to make late-
binding explicit for the compiler, since it would never be assumed otherwise.
E.g. you'd call apply() with a function identifier, which would kick in the
function metadata resolution mechanism (now normally just part of the _linker_
) at runtime.

Plug: I am already working on a BEAM-compatible VM with exactly these
semantics. (Also: 1. a container-like concept of security domains, allowing
for multiple "virtual nodes" to share the same VM schedulers while keeping
isolated heaps, atom tables, etc. [E.g. you set up a container for a given
user's web requests to run under; if they crash the VM, no problem, it was
just their virtual VM.] 2. Some logic with code signing such that calling a
function written by X, where you haven't explicitly trusted X, sets up a
domain for X and runs it in there. 3. Some PNaCl-like tricks where object
files are simply code-signed binary ASTs, and final compilation happens at
load-time. But the cached compiled artifact can sit in the global database and
can be checked by the compiler, and reused, as an optimization of actually
doing compilation. Etc.) If you want to know more, please send me an email
(levi@leviaul.com).
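
The "true name" idea above can be sketched in a few lines of Python. This is a deliberately naive version with an invented `content_hash` helper; a real canonicalization pass would rewrite call sites via the AST rather than textual substitution.

```python
import hashlib

def content_hash(source, dep_hashes):
    # Canonicalize call sites: replace each dependency's human name with
    # that dependency's own content hash, then hash the result. Because
    # dependencies are hashed first, one hash pins the whole transitive
    # closure of code the function can call.
    canonical = source
    for name, h in sorted(dep_hashes.items()):
        canonical = canonical.replace(name, h)
    return hashlib.sha256(canonical.encode()).hexdigest()

# A leaf function has no dependencies:
double_v1 = content_hash("def double(x): return 2 * x", {})

# A caller's hash is computed over source with "double" rewritten to its
# hash, so any change in the dependency changes the caller's name too:
quad_src = "def quad(x): return double(double(x))"
quad_v1 = content_hash(quad_src, {"double": double_v1})

double_v2 = content_hash("def double(x): return x + x", {})
assert content_hash(quad_src, {"double": double_v2}) != quad_v1
```

This is why a function under this scheme "will never change behaviour just because a dependency has a new version available": the new dependency produces a new caller hash, i.e. a different function.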

~~~
pjc50
_a function will never change behaviour just because a dependency has a new
version available_

Presumably this only works for pure-functional languages?

~~~
endergen
I'm assuming his suggestion is no updates ever unless you update your
dependencies' versions. Many module systems, npm for example, have a culture
of fuzzy version matching, which means that when you run npm install again you
can easily pull in new versions of libraries that have been upgraded since you
last ran it. I'm a fan of strict dependencies, but some people prefer to make
it easier to stay on the latest minor patch or follow other logical upgrade
patterns.

------
protomyth
Lambda the Ultimate's discussion [http://lambda-the-
ultimate.org/node/5079](http://lambda-the-ultimate.org/node/5079) is pretty
interesting.

------
philbo
To answer the question in the title directly, I think modules are to aid
reading and discovery.

The fact that it is difficult to decide which module a function belongs in
doesn't make them pointless. People who have to read or debug your code use
them to quickly zero in on areas of likely interest.

------
al2o3cr
In my experience, telling programmers "all functions must have unique names"
means you get a half-ass module system tacked on via common prefixes. In other
words, you get "foo_bar_function1", "foo_bar_function2" etc.

------
ryanisnan
While you're talking about Erlang specifically, the concepts you bring up can
be applied to programming in general.

Why does Erlang (or any other language) have modules?

The biggest reason for me (and I think the one with the most merit) is for
clarity and usability.

Modules exist as ways of grouping units of code by the responsibilities of
that code. If you removed this hierarchy, wouldn't things become a lot more
difficult to navigate and understand as a developer?

------
brianshaler
Is the author's use of the term `module` specific to Erlang? To me, it sounds
like he's advocating for modules composed of a single function, rather than
utility-belt modules that contain many functions. As I understand it, I agree
with what the author proposes, and I feel like a subset of npm already
provides what he's talking about. The best example is probably underscore.js
versus lodash.js, which both have many functions and a wide API surface area.
What's notable is that you can cherry-pick individual lodash functions and
depend on a specific version[0]. (Admittedly, I lazily pull in the full lodash
module instead of importing only the function(s) I'm using.)

Lately, I've been moving more toward the proposed design in my Node.js
projects. It keeps individual files concise, makes code sharing trivial,
encourages stateless methods, and it makes writing tests a breeze.

[0] [https://www.npmjs.org/browse/keyword/lodash-
modularized](https://www.npmjs.org/browse/keyword/lodash-modularized)

------
Alex3917
This is basically what Urbit is doing, among other things.

~~~
balquhidder
Is Urbit a real thing, or an elaborate hoax?

~~~
reirob
It made me curious too. Found this HN post about urbit:
[https://news.ycombinator.com/item?id=6438320](https://news.ycombinator.com/item?id=6438320)

~~~
reirob
And then I found these pages related to Urbit:

[0] [http://lambda-the-ultimate.org/node/3855](http://lambda-the-
ultimate.org/node/3855) [1] [http://doc.urbit.org/](http://doc.urbit.org/) [2]
[http://www.popehat.com/2013/12/06/nock-hoon-etc-for-non-
vulc...](http://www.popehat.com/2013/12/06/nock-hoon-etc-for-non-vulcans-why-
urbit-matters/) [3] [http://moronlab.blogspot.fr/2010/01/urbit-functional-
program...](http://moronlab.blogspot.fr/2010/01/urbit-functional-programming-
from.html)

Trying to understand what it is about.

------
tel
The problem is that now you have either zero data abstraction or uncontrolled
data abstraction, without even a convention like "these functions work
together as a bundle" to save you.

That said, a nice SML module probably could work as the base abstraction here.

------
rymohr
The problem with this approach is you need to consider every existing function
name in order to define a new one.

The beauty of commonjs modules is they allow you to focus on implementation,
rather than identification. All functions can be anonymous, identified only by
their path and named at the whims of the caller.

------
endergen
Related to this would be all the cool content-addressable third-party
metadata. Services could automatically generate pre-compiles of things or
alternate optimizations. Or autocomplete data, statistics, test suites,
behavioral diffing, example code, documentation; the options are endless.

------
jbert
So, immutability and/or api contract is important here.

If I'm pulling in a function, I want it to do what I think I want. Sometimes I
want that to change (get a bug fix), but sometimes I don't (someone introduces
a bug, or makes the func more general and introduces slowdowns).

This feels like a job for a content-addressable git-like tool. How about this:

I can discover my function (via whatever means). The function is actually
named 8804ea505fda087da53b799434c377f015933707 (the sha-something of its
(normalised?) textual representation).

I then import it into my codebase as "useful_fun". My code reads like:

    useful_fun("do it", "to it")

but I have some kind of dependencies/import record which says that
"useful_fun" is actually 8804ea505fda087da53b799434c377f015933707. That means
one and only one thing across all time, the func with that hash.

So how do we handle updates? If we want a golang-like model, the developer
could run something like "update deps". This would:

\- go back to the central repository, looking for updates to
8804ea505fda087da53b799434c377f015933707. It might find 5. Local policy then
determines what happens. Could be "always choose the original authors update"
or "choose the one with the most votes" or "always ask the dev, showing
diffs".

Note that because the unique name is based on the function content, any change
to it creates a new item in the db. (Content-addressability, same way git and
other systems do it.)

\- stuff can be grouped and batched. If I pull in 10 functions tagged with the
same project ('module') and they've all been updated, I can say "and do the
same with all the others".

\- This kind of metadata allows all kinds of good stuff. I can subscribe to
alerts on the functions I've imported and get told about new versions or
security warnings. This kind of subscription information could even be used
as a popularity contest to solve the "which fork on GitHub do I want to use"
problem.

\- people can still publish modules. They now look like a git directory or
tree. A git tree is a blob which contains the hashes of the files within it. A
'module' could be a blob which specifies which (immutable) functions are in
it.

If we use normalised functions, we've now got a module representation which
allows arbitrary functions to be pulled together. At fetch time, we can
denormalise into the user's preferred coding style. At push time, we
renormalise. We aren't grouping stuff into files, so a 'project' or a 'module'
consists solely of the semantic contents, nothing to do with artificial
grouping for the file system.

Seems like an interesting future.
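
The "update deps" step with local policies can be sketched like this in Python. `DEPS`, `SUCCESSORS`, the short successor hashes, and the policy functions are all invented for illustration; only the 8804ea… hash comes from the example above.

```python
# DEPS maps a local name to the immutable content hash it is pinned to.
# SUCCESSORS maps a hash to candidate replacements published later.
DEPS = {"useful_fun": "8804ea505fda087da53b799434c377f015933707"}
SUCCESSORS = {
    "8804ea505fda087da53b799434c377f015933707": [
        {"hash": "aaa111", "author": "original", "votes": 12},
        {"hash": "bbb222", "author": "forker", "votes": 40},
    ],
}

def update_deps(deps, policy):
    # Re-point each local name according to a local policy. The old hash
    # still names its old content forever, so nothing is ever lost.
    return {name: policy(h, SUCCESSORS.get(h, [])) for name, h in deps.items()}

def original_author(current, candidates):
    # Policy: "always choose the original author's update", else stay put.
    return next((c["hash"] for c in candidates if c["author"] == "original"), current)

def most_votes(current, candidates):
    # Policy: "choose the one with the most votes", else stay put.
    return max(candidates, key=lambda c: c["votes"])["hash"] if candidates else current

assert update_deps(DEPS, original_author)["useful_fun"] == "aaa111"
assert update_deps(DEPS, most_votes)["useful_fun"] == "bbb222"
```

The "always ask the dev, showing diffs" policy would just be another such function, one that happens to prompt interactively.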

~~~
doty
I think Mr. Armstrong would approve, given his comments near the end of
[https://www.youtube.com/watch?v=lKXe3HUG2l4](https://www.youtube.com/watch?v=lKXe3HUG2l4),
where he opines that the web would be great if, instead of URLs, every
published document were just named with a hash of its content.

~~~
pyre
> every published document were just named with a hash of its content.

I see too many issues with this (for example):

\- I publish a news article. I publish a retraction/update to said article.
Now the article has a new hash. Does the old hash give you the old version of
the article, or redirect you to the new version?

\- How do we define 'document?' If we define it as the complete HTML page
served up to the browser, then changes to the design of the site would
invalidate all previous hashes. Pointing old hashes to new hashes is work,
which will not always be done (leading to the same situation we have with site
redesigns breaking old URLs).

~~~
endergen
Exactly, you should still be able to have references to persistent
identities, much like the semantics of Clojure, which distinguishes between
values and references to identities like vars/agents etc.

These URLs would be clearly marked, of course.

~~~
andrewflnr
Why not just have all URLs be mutable aliases for hashes?

------
fat0wl
Isn't this issue sort of analogous to the expansion/contraction of a language
core?

Except in this case the core is user-generated and ever-expanding.

I bet there are a lot of issues in Java's history that could predict possible
bumps in the road for such a system (since it was essentially concurrently
designed by a bunch of actors -- except in that case they were corporate
entities).

------
hyp0
Reminds me of Gmail: instead of hierarchical directories ("modules"), just
search, and use multiple tags, so an email can be in more than one directory
("metadata").

Seems especially applicable to FP (like Erlang), where code reuse is more
often of small functions.

------
moron4hire
I think what you're discussing is really just namespacing à la C++, Java, or
.NET. Especially with Java and .NET, you don't import a self-contained module
directly from individual source files. The modules are technically all
accessible at any time (or at least, the ones linked into the build, which in
the case of the Java and .NET standard libraries is quite a smorgasbord). You
just reference the class/function you want in some way: either with using
statements or with fully qualified names.

Because, really, if you start throwing everything into one store, you're going
to run into the naming conflict issue, and any attempt at addressing the
naming conflict issue is going to either look like importing modules or look
like namespaces. You either have to explicitly state what your program has
access to, or you explicitly state what function you mean when you have access
to everything. Realistically, if you give every function a unique name and
_don't_ use namespaces, then there will start to be functions called
system_event_fire() and game_gun_fire() and disasters_house_fire(), and you're
right back to having namespaces, just not in name or with a syntax that makes
things nice when you know you're dealing with specific things.

Though, it'd be nice if types weren't the only thing that could be placed into
a namespace directly in .NET. I'd like to put free functions in there. The
Math class in the System namespace only exists because of this. I'd have
preferred there to be a System.Math namespace with Cosine and Sine as members
of it. Then I could "using System.Math;" and call "Cos(angle)". Instead, I'm
stuck in a limbo of half-qualified names.

And I like it. I like it a lot more than Python, Racket, Node.js, etc. and
having to import this Thing X from that Module Y. I like the idea that linking
modules together is defined at the build level, not at the individual source
file level. These languages are supposed to be better for exploratory
programming than Java and C#, but actually, you know, doing the exploring part
is harder!

Sometimes, I really do just want to blap out the fully qualified name of a
function, in place in my code. System.Web.HttpContext.Current.User. If I'm
doing something like that, it's a hack, and I know it's a hack, and having the
fully qualified name in there, uglying it up, makes it clearer that it's a
hack. Though, I suppose I'm one of the rare people who actually do go back and
clean up my hacks.

EDIT: I thought I wrote more, weird.

The network-accessible database of every library, ever, is definitely a great
idea. I think it's where we're heading, with tools like NPM, NuGet, etc. It
seems like a natural progression to move the package manager into the compiler
(or linker, rather, but that's in the compiler in most languages now). Add in
support in an editor to have code completion lists include a search of the
package repository and you're there.

------
zo1
I don't know Erlang, so I might be missing something key here.

"I am thinking more and more that if would be nice to have _all_ functions in
a key_value database with unique names."

Yeah, sure... Sounds good, right. Until you have naming conflicts.

So then the patch is "oh, let's just add another column to make it more
unique", without realizing that you've just, in essence, created a "module" of
sorts except it's stored in some sort of giant key/value database.

And then you've come full-circle back to the dilemma the author complains of
which is that he doesn't know where to put a function that seems to belong in
two modules.

Eventually, I'd say this is a general failing of modules that could
potentially be solved by some sort of inheritance. Maybe even a tagging
mechanism if you really want to be "patch-work joe" about it.

~~~
seiji
Okay, let's try to not shit all over new ideas here. If we take what Joe-
from-2011 means instead of hallucinating him to be incompetent...

Let's re-word "all functions in a database" as "a revision control system in a
database."

So, let's make a revision control system. All contents, branches, tags are
kept in a database.

 _Sounds good, right. Until you have naming conflicts._

No, no, no. There are no naming conflicts. The names humans will use are just
pointers to the most recently updated underlying contents. The _actual_ names
are garbage hash identifiers. The _usable_ names are human names bound to
underlying contents.

So, if master is commit A and you make commit B, there is no naming conflict
on the name "master," you just re-point it to commit B.

 _a function that seems to belong in two modules._

That's the problem with explicit hierarchy and why the world now runs on
tagging-based crowdsourced folksonomies.

~~~
zo1
_" A revision control system in a database"_? A revision control system _is_ a
database. I think what you and the author are trying to get at is some sort of
"docker, but for functions" type of thing. And we all know what a mess that is
when it comes to public docker images.

 _" No, no, no. There are no naming conflicts. The names humans will use are
just pointers to the most recently updated underlying contents. The _actual_
names are garbage hash identifiers. The _usable_ names are human names bound
to underlying contents. So, if master is commit A and you make commit B, there
is no naming conflict on the name "master," you just re-point it to commit
B."_

"Master" is the actual name that is going to be conflicting, if I understand
your example.

~~~
simoncion
> "Master" is the actual name that is going to be conflicting...

Yes, but... Just like in git, "master" points to one and only one commit, but
the pointed-to commit might be different in the future. The name of each
_commit_ is a guaranteed-unique-for-the-foreseeable-future hash of the
contents of each commit. That name never, ever changes.

If you want a particular _commit_, you use its unique, non-human-friendly
name. If you just want a particular branch and don't care too much about any
particular commit, you use its collision-prone, human-friendly name. Naming of
branches and/or projects is still going to be "hard". Naming of particular
releases/versions of code is not.

------
tracker1
dibs on create_uuid_v4!!

------
the_cat_kittles
This talk about modules as a way to organize similar code makes me wonder: if
you had all the functions in a global namespace, you could probably
automatically generate some kind of organization by extracting relevant
features from each function and doing some kind of clustering. Some features
could be the function's dependencies, who depends on it, what it returns, its
signature, and maybe even NLP in the hope that people are actually using
descriptive variable names.
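
As a toy illustration of that idea in Python: extract a feature set per function (here just its dependency set; the functions, features, and threshold are all invented) and group functions whose feature sets overlap enough. A real system would use proper clustering over much richer features.

```python
# Invented flat namespace: each function carries a feature record.
# Only "deps" is used below; "returns" hints at the richer features
# (signature, callers, NLP over names) mentioned above.
FUNCS = {
    "parse_date": {"deps": {"split", "int"}, "returns": "tuple"},
    "parse_time": {"deps": {"split", "int"}, "returns": "tuple"},
    "render_html": {"deps": {"escape", "join"}, "returns": "str"},
}

def similarity(a, b):
    # Jaccard overlap of dependency sets.
    return len(a["deps"] & b["deps"]) / len(a["deps"] | b["deps"])

def cluster(funcs, threshold=0.5):
    # Greedy grouping: join the first group every member of which is
    # similar enough, otherwise start a new group.
    groups = []
    for name, feats in funcs.items():
        for g in groups:
            if all(similarity(feats, funcs[m]) >= threshold for m in g):
                g.append(name)
                break
        else:
            groups.append([name])
    return groups
```

Run on the toy data, `cluster(FUNCS)` puts the two parsers in one group and the renderer in another, recovering something module-like with no manual hierarchy.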

