
Always start with simple solution - sasa_buklijas
http://buklijas.info/blog/2018/01/01/always-start-with-simple-solution/
======
zzzeek
JSON is not a human friendly config format; it has harsh rules for quoting,
brackets and commas and most importantly has no intuitive facility for
comments. YAML should be preferred to JSON for config. However, ConfigParser
is even better, and a list of IPs need only be space-separated (e.g. hosts =
config.get(section, 'hosts').split(), quite simple), or you could even make the
_value_ of the parameter be a JSON list if you really like to type brackets
and quotes. The "simplest" approach is in fact to use the most _idiomatic_
approach that everyone recognizes, and if you're doing config files in python
that means ConfigParser.
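A minimal sketch of the space-separated approach with the standard-library `configparser`. It assumes a hypothetical `[devices]` section with `hosts` and `sleep_seconds` options; these names are illustrative and do not come from the article. Indented continuation lines let a long host list span several lines, and `str.split()` treats those newlines as whitespace too.

```python
import configparser

CONFIG_TEXT = """
[devices]
# Hosts are separated by whitespace; full-line comments are allowed.
hosts = 192.0.2.10 192.0.2.11
    192.0.2.12
sleep_seconds = 5
"""


def load_devices(text):
    """Return (list of hosts, sleep seconds) from an INI-style config string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["devices"]
    hosts = section["hosts"].split()  # splits on spaces and newlines
    sleep_seconds = section.getint("sleep_seconds")
    return hosts, sleep_seconds


hosts, sleep_seconds = load_devices(CONFIG_TEXT)
# hosts == ['192.0.2.10', '192.0.2.11', '192.0.2.12'], sleep_seconds == 5
```

For a real file, `parser.read("config.ini")` (hypothetical filename) replaces `read_string`.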

Edit: it seems they went with a newly invented language called json-cfg
that looks like a hybrid of JSON and C-style comments. That's fine, but using
an invented config format that nobody knows isn't the "simplest" approach at
all. It's the most _complicated_ solution. When people use invented idioms in
their projects for things like config and logging that have widely accepted
idiomatic solutions included in Python std lib that's an immediate red flag.

~~~
philipov
I like YAML, but it needs shell-style $ substitution tokens to allow
constructing values from previously-defined values (such as when defining a
PATH variable, or subdirectory).

To that end, I'm going down the path of writing my own extension to YAML. Can
you recommend one that already exists that accomplishes this goal? I don't see
an alternative, because I can't find something standard that actually meets
the requirements.

Shell scripting languages are platform-specific, and not hierarchically
structured, so that's what I'm trying to replace to begin with.

Ansible requires a central server and full commitment to immutable
deployments, so it's too heavy for a package manager that can support
piecemeal adoption or that could be used by researchers doing ad-hoc
development. It doesn't support Windows anyway (the Windows Subsystem for
Linux doesn't count).

Conda is nice, but even though it uses YAML for defining dependencies, it
nonetheless falls back to shell scripts for environment variables, and doesn't
offer features for environment modularization.

EDIT: Hierarchical structure is needed to be able to both express and
manipulate configs for software that expects input as XML, so ConfigParser
isn't good enough. Setting a value to a blob of JSON or XML denies the ability
to operate on that blob to do things like merge subtrees. Instead, I'm using
ruamel.yaml as my starting point, and adding post-processing after it's
converted to nested dict/lists.

~~~
zzzeek
> Ansible requires a central server and full commitment to immutable
> deployments

I don't know what that means. Ansible is like a much more structured form of
shell scripting; you can run Ansible playbooks from anywhere to do anything
and make them do anything. I don't know what an "immutable deployment" is.

Also, Ansible has nothing to do with "config"; YAML is used there as a
declarative scripting language rather than as a config file format.

~~~
philipov
Really, I just want to know how ansible provides for token substitution, so I
could reuse a value defined elsewhere, so I could define all the
subdirectories in a playbook relative to a single base directory.

YAML doesn't offer a syntax for writing "${BASEDIR}/subdir". There's gotta be
something in Ansible that makes this possible, but I can't find it.

~~~
zo1
That "thing" that you're referring to is handled by Ansible, and not by YAML.
It uses the Jinja2 templating engine. And I would assume Ansible uses some
sort of scoping order/rules in order to provide values for the template
generation.

E.g. look here in the Ansible docs:

[http://docs.ansible.com/ansible/latest/playbooks_loops.html#...](http://docs.ansible.com/ansible/latest/playbooks_loops.html#standard-loops)

It provides a construct that allows you to loop over a sub-list of YAML items,
and then lets you use the looped variable in a string substitution via the
templating language. That's the "{{ item }}" that you see there.

Edit. Typo

~~~
philipov
Thank you! This is what I've been looking for, but searches for "token",
"substitution", and "defining subdirectories" have not led to this. Calling it
a loop is confusing.

It seems the values have to be defined in some other file, so I couldn't chain
them, though.

Suppose I want to write something like...

    
    
      paths:
         BASEDIR: /somedir
         SUBDIR1: {{ paths:BASEDIR }}/subdir1
         SUBDIR2: {{ paths:SUBDIR1 }}/subdir2
    

Doesn't seem like that would work. I would end up putting all my definitions
into a separate file.
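For comparison, the standard library's `configparser` with `ExtendedInterpolation` supports almost exactly this `${section:option}` syntax, including chained definitions within one file. It only covers flat INI sections, though, not nested hierarchies. A sketch using the directory names from the example above:

```python
import configparser

CONFIG_TEXT = """
[paths]
basedir = /somedir
subdir1 = ${basedir}/subdir1
# ${section:option} names the section explicitly; plain ${option} means "this section"
subdir2 = ${paths:subdir1}/subdir2
"""

parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(CONFIG_TEXT)
subdir2 = parser["paths"]["subdir2"]  # '/somedir/subdir1/subdir2'
```

Substitution happens lazily when a value is read, so the chain resolves regardless of the order in which options are accessed.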

------
userbinator
_For example, here I needed a configuration file that will have a list of IP
addresses, which I will iterate in for loop.

This was the smallest requirement that I needed for my problem._

My instinctive solution would be even simpler --- a text file with one IP per
line, similar to a HOSTS file.
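A sketch of that hosts-file style reader. Treating `#` as a comment marker is an assumption borrowed from HOSTS files.

```python
def read_hosts(lines):
    """Return one host per non-empty line, ignoring '#' comments."""
    hosts = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop comment, then whitespace
        if entry:
            hosts.append(entry)
    return hosts


example = ["192.0.2.10\n", "\n", "# spare device\n", "192.0.2.11  # lab switch\n"]
hosts = read_hosts(example)
# hosts == ['192.0.2.10', '192.0.2.11']
```

In practice the caller would pass an opened file, e.g. `with open("ips.txt") as f: hosts = read_hosts(f)` (the filename is hypothetical).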

~~~
sasa_buklijas
You are correct, that is the simplest way.

What I did not disclose in the article is that I also wanted some way to name
the data.

I wanted this because I knew that, in addition to the IP addresses, the
configuration file would also need to store a sleep duration.

I just find it clearer if the data also has a name.

Now I use [https://github.com/pasztorpisti/json-cfg](https://github.com/pasztorpisti/json-cfg) for storing configuration as
JSON. json-cfg adds the ability to put comments in a JSON file.
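json-cfg's own API is not shown here. As a rough illustration of what "JSON with comments" involves, the sketch below strips `//` line comments that sit outside string literals and then hands the result to the standard `json` module. This is an assumed approach, not how json-cfg works internally, and the key names `ips` and `sleep_duration` are hypothetical.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False  # this character was escaped, keep going
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


CONFIG_TEXT = """
{
    // devices to connect to
    "ips": ["192.0.2.10", "192.0.2.11"],
    "sleep_duration": 5,
    "notes": "http://example.com/notes"  // "//" inside a string is kept
}
"""

config = json.loads(strip_line_comments(CONFIG_TEXT))
```

Block comments (`/* ... */`) are not handled; a real JSON-with-comments parser would need to cover them as well.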

~~~
t0mbstone
If you want to name your data, then you could have had a couple of different
text files that were named differently.

~~~
sasa_buklijas
I meant data inside the text file.

Anyway, this is more question of preference :-)

~~~
beefield
I find plain csv a bit underrated in these kind of applications. As long as
you have flat data, you can read and edit the file with practically anything
without too much trouble.
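For flat data like this, a sketch with the standard `csv` module might look like the following. The column names `ip` and `sleep_seconds` are hypothetical, and `io.StringIO` stands in for an opened file.

```python
import csv
import io

CSV_TEXT = """ip,sleep_seconds
192.0.2.10,5
192.0.2.11,10
"""


def read_devices(lines):
    """Return (ip, sleep_seconds) pairs from CSV text with a header row."""
    reader = csv.DictReader(lines)  # uses the first row as field names
    return [(row["ip"], int(row["sleep_seconds"])) for row in reader]


devices = read_devices(io.StringIO(CSV_TEXT))
# devices == [('192.0.2.10', 5), ('192.0.2.11', 10)]
```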

------
tinymollusk
Related, in today's Farnam Street email, he linked to an article examining the
psychology behind why humans prefer complex things to the simple[0].

It's especially interesting because most other psychological biases appear to
be the result of a mental short-cut. Perhaps we see something complex and
mistakenly infer the preconditions were complex, and that somehow makes it
more valuable.

An especially interesting quotation, relevant to this forum is in that
article, by a sportswriter: "Most geniuses—especially those who lead
others—prosper not by deconstructing intricate complexities but by exploiting
unrecognized simplicities."

[0] [https://www.farnamstreetblog.com/2018/01/complexity-bias/](https://www.farnamstreetblog.com/2018/01/complexity-bias/)

~~~
CM30
I wonder if ego has anything to do with it as well. Perhaps some people feel
'complexity' makes them look better, even if the simple solution would be
better in every rational way. Then again, it could just be boredom. The more
complex solution also usually has a bunch of 'novel' issues to solve, and
solving those can be a lot more enjoyable for certain individuals than doing
things the reasonable way.

~~~
tinymollusk
Yeah, this creates some sort of "moat of knowledge" that increases status if
you make something easy look difficult. Also could make someone appear to be
less replaceable ("we would have to think about something that looks hard to
replace person XYZ!").

There's also the danger of overfitting when building the mental model or
process.

------
stuaxo
The first complexity in the ConfigParser example comes from trying to parse
something like a python list.

Just accept IPs separated by spaces; users will be thankful.

The other complexity comes from loading the raw file yourself and converting from bytes.

ConfigParser can give you back strings and read the file for you.

Here is a full example:

    
    
        import configparser  # Python 3 name of the ConfigParser module
        import shlex
    
        def main():
            config = configparser.ConfigParser(allow_no_value=True)
            config.read('config_ini.ini')
            # shlex.split splits on whitespace but keeps quoted items together
            testing_1 = shlex.split(config.get('other', 'list_queue1'))
            print(testing_1)
         
        if __name__ == '__main__':
            main()
    

This is the same as a comment I made on the website, but for the benefit of
any python devs here.

------
jcoffland
> At that time, I was accessing only one device (only one IP address).

> But could see that in future (in few months to one year), I will need to do
> the same set of command on more devices.

IMHO, this statement represents one of the biggest design smells in
programming. It's usually a bad idea to write software you may need in the
future. Circumstances change. Writing code is expensive and coding in
anticipation of what you might need is often a waste of time.

 _Write code you need now, not code you may need later._

It makes sense to think ahead but if you're spending a lot of time writing
code you might need, then you're probably doing something wrong.

~~~
sasa_buklijas
> IMHO, this statement represents one of the biggest design smells in
> programming. It's usually a bad idea to write software you may need in the
> future. Circumstances change. Writing code is expensive and coding in
> anticipation of what you might need is often a waste of time.

Completely agree, as you said, "... is often a waste of time."

Unfortunately, I was not so lucky: literally two days after I had finished the
first version of the program, I needed to add two more IP addresses.

I do agree that it often is a waste of time.

------
orf
This is a pretty confused article. The author starts with a list of useful
things which he then deems 'too complicated', and ends up using a JSON-based
config file. Ok. But the things he listed are not too complex, and the
solution is a simple line-based config file of IP addresses:

    
    
       import netaddr, sys
       ips_to_connect_to = [ip for line in open(sys.argv[1]) if line.strip()
                            for ip in netaddr.IPGlob(line.strip())]
    

Those two lines handle points 1, 2 and 3 of his 'nice to haves'. You could
easily add 4 and 5 in under 10 lines. You could expand it to read from stdin
rather than a fixed file easily enough as well.

~~~
sasa_buklijas
Thanks for mentioning
[http://netaddr.readthedocs.io/en/latest/](http://netaddr.readthedocs.io/en/latest/),
I did not know about it.

~~~
orf
The built-in ipaddress module is enough for your current use case; netaddr
just adds glob handling.
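A sketch of a netaddr-free variant with the standard `ipaddress` module. Since `ipaddress` has no glob syntax, this swaps in CIDR notation (e.g. `192.0.2.0/30`) for ranges; that substitution is an assumption, not what the netaddr snippet above does. Invalid entries raise `ValueError`, which doubles as validation.

```python
import ipaddress


def expand_targets(lines):
    """Turn lines of single addresses or CIDR networks into address strings."""
    targets = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if "/" in entry:
            # Every usable host address in the network (strict: no host bits set).
            network = ipaddress.ip_network(entry)
            targets.extend(str(host) for host in network.hosts())
        else:
            # Raises ValueError for anything that is not a valid IPv4/IPv6 address.
            targets.append(str(ipaddress.ip_address(entry)))
    return targets


targets = expand_targets(["192.0.2.10\n", "\n", "192.0.2.0/30\n"])
# targets == ['192.0.2.10', '192.0.2.1', '192.0.2.2']
```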

------
sly010
Command line args would have been the simplest solution imho. They work well
for simple cases, and power users will create a bash wrapper anyway.

Hard to be wise without knowing the context, but I avoid config files as much
as possible, because:

- Where does the config file live on different machines?

- What if I want to have 2 configurations on one machine?

- What if I need to change the list dynamically?

`for ip in sys.argv[1:]` would give you the most flexibility.
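A sketch of the command-line approach with `argparse`, where `--sleep` is a hypothetical option. Setting `fromfile_prefix_chars="@"` also lets a long list live in a file with one argument per line (e.g. `script.py @ips.txt`, a hypothetical filename), which may ease the concern about very long command lines.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Run the same commands against several devices.",
        # "@ips.txt" on the command line expands to one argument per line of ips.txt
        fromfile_prefix_chars="@",
    )
    parser.add_argument("ips", nargs="+", help="IP addresses to connect to")
    parser.add_argument("--sleep", type=float, default=5.0,
                        help="seconds to wait between commands (hypothetical option)")
    return parser.parse_args(argv)


args = parse_args(["192.0.2.10", "192.0.2.11", "--sleep", "2"])
# args.ips == ['192.0.2.10', '192.0.2.11'], args.sleep == 2.0
```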

~~~
sasa_buklijas
I agree but I wanted to avoid this:

... writing 34 IP addresses as CLI parameters, which is around 373 characters,
is not a nice solution.

~~~
jcoffland
A bash script where you call the CLI program once for each IP would be a very
simple solution.

    
    
        ./script 127.0.0.1 5
        ./script 127.0.0.2 5
        . . .
        ./script 127.0.1.254 10

~~~
sasa_buklijas
I was on Windows. But OK, there are batch scripts too.

I had to call the program with all the IP addresses at once; calling it once
per IP would just not work.

Your idea is fine, but it just was not acceptable in my use case.

------
vesak
It's sad that the perfect template for all non-binary data, configuration and
otherwise, has been around for quite some time: S-expressions.

Nobody but lispers use them. Why?

~~~
xaedes
How would the example from the post look with s-expressions?

~~~
vesak

        (test_list (test_1 test_2 test_3))
    

For the JSON configuration file, I suppose?
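To show how little machinery reading that takes, here is a minimal s-expression reader in Python. It is a sketch: atoms stay plain strings, and quoted strings, escapes, and comments are not supported.

```python
def tokenize(text):
    """Split an s-expression into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and build nested Python lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token  # an atom, kept as a string


def loads(text):
    tokens = tokenize(text)
    result = parse(tokens)
    if tokens:
        raise SyntaxError("trailing input after expression")
    return result


config = loads("(test_list (test_1 test_2 test_3))")
# config == ['test_list', ['test_1', 'test_2', 'test_3']]
```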

------
jimnotgym
Is that really simpler than writing the list out in a file called config.py
like

    
    
      ips = ['192.x....',
             '192.y',
             '192.z']
    

Then in your main

    
    
      from config import ips
    

Or does distributing as an .exe preclude this?

Edit: formatting

~~~
sasa_buklijas
You are correct, that is the simplest way.

What I did not disclose in the article is that I also wanted some way to name
the data.

I wanted this because I knew that, in addition to the IP addresses, the
configuration file would also need to store a sleep duration.

I just find it clearer if the data also has a name.

Now I use [https://github.com/pasztorpisti/json-cfg](https://github.com/pasztorpisti/json-cfg) for storing configuration as
JSON. json-cfg adds the ability to put comments in a JSON file.

~~~
jimnotgym
Could you not do exactly the same but with a python dictionary rather than a
list?

~~~
sasa_buklijas
I was distributing my Python code as an EXE, so using Python code as
configuration was not possible.

That said, I think that Python code as configuration is a good solution if you
are executing the source code and only developers (not average users who do
not know what Notepad is) will edit it.
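One possible middle ground, assuming the config is written as a plain Python literal rather than code: `ast.literal_eval` parses dicts, lists, strings, and numbers without executing anything, so it should keep working inside a frozen EXE and still allows `#` comments. The key names below are hypothetical.

```python
import ast

CONFIG_TEXT = """
{
    # devices to connect to
    "ips": ["192.0.2.10", "192.0.2.11"],
    "sleep_duration": 5,  # seconds
}
"""

# literal_eval only accepts literals (dicts, lists, strings, numbers, ...),
# so function calls or imports in the file raise ValueError instead of running.
config = ast.literal_eval(CONFIG_TEXT.strip())
```

With a real file, `ast.literal_eval(f.read().strip())` on an opened `config.txt` (hypothetical name) would do the same.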

~~~
UncleEntity
I'm confused...

You're building a custom configuration file format for people who don't know
how to edit configuration files?

And, as they say, "xml is like violence..."

~~~
sasa_buklijas
I had to distribute my Python program as EXE because Windows PC where the
program needs to be executed did not have Python installed.

That is why using Python as configuration file was not possible.

I hope I have explained it well.

------
yeukhon
YAML - I am not a big fan these days. Perhaps I have OCD about formatting, but
with YAML, ordering is “up to the user”. At least INI has a “header/section”
structure, so it looks more organized. The “yes Yes True TRUE true 1 =>
True (python)” flexibility is convenient, but it can be seen as a negative if,
like me, you are OCD about it. You can enforce a style guideline in your dev
team, but for end users, it's probably worth reconsidering your strategy.

The reasons I’d consider JSON for configuration are (1) when the configuration
is really simple and short, and (2) when I don’t want an extra dependency. You
don’t want to handcraft a larger data structure and keep context-switching
between your terminal and a JSON validator.

I like INI-style configuration files. But ConfigParser’s API is horrible, and
everyone seems to like to tweak and invent their own “INI” format.

Instead, for those really need a good configuration file, I recommend TOML
[1].

For data files, either YAML or JSON is fine, but each comes with gotchas.
A trailing comma in JSON is invalid (which is probably the #1 “wtf, what’s wrong
with my json”). For YAML you need to be very careful with “do I want an int or
do I want a string.”
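The trailing-comma gotcha is easy to reproduce with the standard `json` module. The small helper below (hypothetical, for illustration) reports where parsing fails, which can save a round trip to an external validator.

```python
import json


def json_error(text):
    """Return None if text is valid JSON, else where and why parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


good = json_error('["192.0.2.10", "192.0.2.11"]')
bad = json_error('["192.0.2.10", "192.0.2.11",]')  # trailing comma
```

The exact error message differs between Python versions, but the line and column point at the offending spot.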

[1]: [https://github.com/toml-lang/toml](https://github.com/toml-lang/toml)

~~~
Kamshak
How does TOML solve the “do I want an int or do I want a string." part? I only
know YAML and this caught me a few times, would love to know how toml does it.

~~~
yeukhon
None of them do, sorry if I was being too casual. The reason I named YAML is
that in YAML you can just write

    
    
        name: bob
    

where bob is assumed to be a string by the YAML parser. This is bad if you use
Ansible because "bob" could be the name of a variable.

    
    
        bob: "I am bob"
        name: bob
        # at runtime Ansible sees 'name: "I am bob"'
    

But both JSON and TOML need the users to be more explicit. So while users
still need to be mindful of "1" vs 1, JSON and TOML don't assume as much as
YAML does.

Let me show you in Python.

    
    
        >>> import toml
        >>> import yaml
        >>> s1 = "name: bob"
        >>> yaml.safe_load(s1)
        {'name': 'bob'}
        >>> s2 = "name = bob"
        >>> toml.loads(s2)
        Traceback (most recent call last):
          ....
          File ".../python2.7/site-packages/toml.py", line 664, in _load_value
            v = int(v)
        ValueError: invalid literal for int() with base 10: 'bob'
        >>> s3 = "name = bob eve"
        >>> toml.loads(s3)
        Traceback (most recent call last):
          ....
            raise TomlDecodeError("This float doesn't have a leading digit")
    

See how the inputs with and without a space yield different exceptions? I'm
not sure whether that's an implementation detail or what the spec says,
though. But the point is that TOML doesn't assume an unquoted "bob" is a
string, which is a good thing.

------
_paulc
Actually this is pretty simple if you use ConfigParser properly:

    
    
      # test.ini
      [host1]
      ip = 1.2.3.4
      [host2]
      ip = 5.6.7.8
      [host3]
      ip = 9.10.11.12
    
      # config.py
      import configparser
    
      c = configparser.ConfigParser()
      c.read("test.ini")
      ips = [ c[host]['ip'] for host in c.sections() ]

------
bicubicmess
I prefer to state it this way: "Avoid speculative complexity".

~~~
twic
Or "do the simplest thing which could possibly work":

[https://ronjeffries.com/xprog/articles/practices/pracsimples...](https://ronjeffries.com/xprog/articles/practices/pracsimplest/)

------
scarface74
I'm really a fan of JSON config files for simple configuration. In a strongly
typed language you can parse it into a "Settings" object with one line of code
and work with the object.
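A rough Python analog of that idea, assuming a hypothetical `Settings` dataclass: `json.loads` plus keyword unpacking gives a one-line load, and missing or unexpected keys fail loudly. Dataclasses do not convert or check types, so this is looser than a typical C# deserializer.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Settings:
    ips: List[str]
    sleep_duration: float = 5.0  # default used when the key is absent


def load_settings(text: str) -> Settings:
    # Missing required keys or unknown keys raise TypeError from __init__;
    # values are not type-checked or converted.
    return Settings(**json.loads(text))


settings = load_settings('{"ips": ["192.0.2.10"], "sleep_duration": 2}')
```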

Creating the settings class, at least for C#, is just a matter of pasting the
JSON into a website (after removing all of the sensitive bits, of course).

------
majewsky
Not directly related to the article, but can someone explain why people from
Eastern Europe and Russia tend to drop most of the articles (like "a", "an",
"the") when speaking/writing English?

~~~
grzm
Likely because Russian doesn’t have articles.

> _”There are no definite or indefinite articles (such as the, a, an in
> English) in the Russian language. The sense of a noun is determined from the
> context in which it appears.”_

[https://en.wikipedia.org/wiki/Russian_grammar](https://en.wikipedia.org/wiki/Russian_grammar)

~~~
sasa_buklijas
Correct, I (author) am from Croatia, and in the Croatian language, there are
no definite or indefinite articles.

I don't even notice that I drop most of the articles. Only when I run a
grammar checker do I notice it: about 90% of the errors are missing articles.

