
Can C++ become your new scripting language? - mariuz
http://www.nu42.com/2015/05/cpp-new-scripting-language.html
======
pierrec
As the author of a program that's both scripting-centric and performance-
centric [1], I've looked at this question in detail. I needed the fastest
possible embeddable scripting language and interpreter. When I say "fastest
possible", I mean for signal processing, so my application is different from
the article at hand, which is about shell scripting rather than application-
embedded scripting.

C / TCC was great, but TCC doesn't attach much importance to final speed
optimization. I had a working prototype but ended up scrubbing it for that
reason.

C++ / Clang / LLVM has the best end speed and optimizer, and as the article
points out, the language is looking better and better. However, as a library
it's pretty massive, difficult to embed, and compilation would probably be too
slow for a REPL/JIT type of situation, though I haven't tested this with a
working prototype.

Lua / LuaJIT is what the project is currently using. Since this is HN, I guess
I don't need to say anything about how fast it is. However, my application
(and many others, I suspect) would get a large performance improvement (I
estimate 3x) if the compiler was capable of working with float32 operations
and of optimizing them into vectorized SIMD instructions. This is why I'm
currently looking at the next possibility:

Javascript. With the recent introduction of different float widths and SIMD in
major Javascript JIT compilers, this option is starting to be the fastest
possible (in my case). I'm planning to make a prototype to verify this, and
I'm not too keen on joining the JS bandwagon, but if LuaJIT development
continues to be basically halted, I'll have to get with the times. What I need
is at the bottom of an epic TODO list for LuaJIT [2] that hasn't really moved
for years. I don't know who's in a position to make a move on those LuaJIT
open sponsorships, but I really wish they would do it! I for one don't really
feel up to the task.

[1]: [http://osar.fr/protoplug](http://osar.fr/protoplug)

[2]: [http://wiki.luajit.org/Open-Sponsorships](http://wiki.luajit.org/Open-Sponsorships)

~~~
seabee
Since you're already using Lua, maybe you'll have an easier time switching to
Terra. It supports SIMD, and you only need to deal with embedding LLVM.

[http://terralang.org](http://terralang.org)

~~~
pierrec
Interesting, I looked at Terra before but hadn't noticed the SIMD. The fact
that it only supports doubles (instead of float32 which Javascript now
supports and vectorizes) makes this a bit less awesome though. SIMD becomes
more powerful when you can use smaller data types, which is why JS engines are
proud of the float32 + SIMD combination that they now have.

And ideally, you shouldn't always need to specify vector operations yourself;
the compiler should optimize your loops (for example) into SIMD instructions.
But as far as I know, very few compilers are capable of doing this now. I think
some C/C++ and Java compilers are among the only ones, though the example
given here suggests that SpiderMonkey should also do it to some extent:

[https://bugzilla.mozilla.org/show_bug.cgi?id=894105#c2](https://bugzilla.mozilla.org/show_bug.cgi?id=894105#c2)

------
jnotarstefano
It's interesting to note that High Energy Physicists all around the world will
answer "yes", in fact they even created a C++ interpreter:
[https://root.cern.ch/drupal/content/cint](https://root.cern.ch/drupal/content/cint).

~~~
cnvogel
Oh, the memories. Haven't used CINT (or rather, root) for quite some time,
though.

But if I remember correctly, then CINT wasn't actually developed by the root-
team, but rather already existed when work on root started.

~~~
cozzyd
Ironically written by someone with a last name of Goto; the jokes just write
themselves.

------
Veedrac
Considering how contrived the implementation of this is, it's no wonder it
looks terrible in two languages.

For example, this Python is both better behaved and much, much simpler:

    
    
        import sys
    
        for filename in sys.argv[1:]:
            try:
                with open(filename, "rb") as file:
                    word_count = sum(len(line.split()) for line in file)
            except IOError as e:
                print(e)
            else:
                print("{}: {}".format(filename, word_count))
    

I'd suggest you think up an example that justifies its implementation before
making such a comparison. If the aim is to make both implementations do the
same dance, not just to make them give the same results, even Haskell won't
look much different to C++.

~~~
lqdc13
Yours doesn't hash unique words though

~~~
Veedrac
The original implementation's bucketing was completely irrelevant to the goal
- it still ended up counting the number of words, duplicates included.

~~~
lqdc13
They're not counting duplicates though... Both the Perl and the C++ versions
use a hash table.

~~~
Veedrac
They use a <word → word frequency> mapping, and then sum the values. This
gives the total number of words. The hash table is totally pointless.
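
A minimal sketch of that point, assuming words are simply whitespace-separated tokens (the function names here are made up for illustration and are not from the article):

```python
from collections import defaultdict

def total_via_table(lines):
    # Build a word -> frequency mapping, then sum the frequencies.
    freq = defaultdict(int)
    for line in lines:
        for word in line.split():
            freq[word] += 1
    return sum(freq.values())

def total_direct(lines):
    # Count words directly, with no table at all.
    return sum(len(line.split()) for line in lines)

sample = ["the cat sat", "the cat ran away"]
print(total_via_table(sample), total_direct(sample))  # both totals are 7
```

Every increment in the table corresponds to exactly one word read, so summing the values always recovers the plain word count.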

~~~
draegtun
The author did have a point he wanted to put across with the hash table - _"
Obviously, a frequency distribution was not necessary for just the counts, but
I did want to highlight basic autovivification"_

ref:
[http://www.reddit.com/r/cpp/comments/369lcn/can_c_become_you...](http://www.reddit.com/r/cpp/comments/369lcn/can_c_become_your_new_scripting_language/crcklhr)

~~~
Veedrac
That doesn't feel like a good reason. The author could easily have come up
with a better task (eg. count the number of words with more than one
occurrence). This allows you to approach the problem in the optimal way in
each language.
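
For instance, a rough Python take on that alternative task might look like this (`repeated_words` is a hypothetical name, and "word" is assumed to mean a whitespace-separated token):

```python
from collections import Counter

def repeated_words(lines):
    # Count each whitespace-separated word, then keep those seen more than once.
    freq = Counter()
    for line in lines:
        freq.update(line.split())
    return sum(1 for n in freq.values() if n > 1)

print(repeated_words(["a b a", "c b d"]))  # prints 2 ("a" and "b" repeat)
```

Here the frequency table actually earns its keep, since the answer depends on per-word counts rather than just the total.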

------
lqdc13
Basically this whole thing is crazy.

First, this can be done with one line in Python.

Second, it's almost a page of C++.

Third: if done in 10 lines of Python, it is around 30% faster than the C++
version and 2x faster than the Perl version.

Here is my reference Python version that also counts the words in the same
non-code golf kind of way:

    
    
        import sys
        from collections import defaultdict
    
        def wc(path):
            d = defaultdict(int)
            with open(path) as f:
                for line in f:
                    for w in line.split():
                        d[w] += 1
            return sum(d.values())
    
        if __name__=='__main__':
            path = sys.argv[1]
            print(path, ':', wc(path))
    
    

EDIT: They are counting new lines correctly.

~~~
Veedrac
Some quick points:

* You need to loop over sys.argv[1:].

* You could use binary mode to avoid decoding (which the C++ avoids).

* You could use collections.Counter() and its `update` method, which should be both cleaner and faster on Python 3:
    
    
        from collections import Counter

        d = Counter()
        with open(path) as f:
            for line in f:
                d.update(line.split())

~~~
lqdc13
Counter is a lot slower though because it uses a heap underneath.

~~~
Veedrac
It doesn't - it directly inherits dict.

I'm wrong, though; the optimizations Counter applies only make it faster when
each `update` takes an iterable of a large number of elements. This is
unlikely to happen in our case.

You can actually bypass the Counter wrapper and use its accelerator directly:

    
    
        from _collections import _count_elements
    
        def wc(path):
            d = {}
            with open(path, "rb") as f:
                for line in f:
                    _count_elements(d, line.split())
            return sum(d.values())
    

However, this uses implementation details and is as such bad code.
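
A variant that stays on the public API, sketched here as an assumption rather than a measured improvement, is to hand `Counter` a single lazy iterable over every word in the file, so there is one large update instead of one per line:

```python
from collections import Counter
from itertools import chain

def wc(path):
    with open(path, "rb") as f:
        # One update over every word in the file, instead of one per line.
        # The generator is consumed lazily, so lines are still read one at a time.
        d = Counter(chain.from_iterable(line.split() for line in f))
    return sum(d.values())
```

Whether this is actually faster than the `defaultdict` loop would need timing on real input.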

~~~
lqdc13
Yup, I just checked. They only use a heap for `most_common`.

Nevertheless, defaultdict == win for counting things, e.g. if we actually
wanted unique words in this case.

------
lmm
No, because compilation takes longer than your typical script. And there's no
REPL either.

C++ is improving, which is to be applauded. But all of these examples seem to
be "look, C++ used to be a lot worse than Python, now it's only a little bit
worse than Python". Show me the USP, the compelling use case where only C++
will do.

~~~
sigzero
A REPL is not necessary.

~~~
poooogles
Speak for yourself. Shitty developer here, I love my REPL.

~~~
wernercd
Speak for yourself. Low-Mid range developer here and I've never seen the need
for REPL.

~~~
jaredsohn
>Speak for yourself

I think that was the case here.

------
SCHiM
I sometimes use C++ as a scripting language in MSVC++. I'll gladly take the
compilation time if it means I get to use the debugger when stepping through
my code.

~~~
lqdc13
I had to use a debugger on a regular basis in Java and C++, but never in
Python, because Python programs end up being much simpler.

Also, you can simulate large parts of the program in IPython and not only find
what the problem/bug is, but solve it right there and copy the solution back
to the original program.

------
ryanmk
I use lua for most scripting. A big reason for this is that the entire lua
environment is in lua.exe. This allows for me to distribute scripts easily
without having to include a C++ installer with them.

~~~
edem
Lua is an underrated gem in my opinion when it comes to scripting.

------
okasaki
One thing I don't get is why do people prefer functions like accumulate? Why
not:

    
    
        int sum = 0;
        for(auto&& wc : word_count) sum += wc.second;
        return sum;
    

It's shorter, more general, more familiar, possibly has better
performance[[citation needed]], and gives better errors if you mess something
up.

~~~
adrusi
It's more composable, and because it's less general, it more clearly
articulates intent and is easier to scan for bugs.

Or at least it would be in an ecosystem where functional programming is
embraced, but I don't know how well that works in practice. It's not composable
if you can't expect any other given piece of code to be written in a way
conducive to composing with accumulate.

------
jbergens
For simple things on the command line it is usually easiest to use a language
you're comfortable with, provided good libraries are also available. If you
write C++ all day you can use C++ for file munging if you find the libraries
you need (for example HTML or XML parsing).

I used to write scripts in Ruby since I liked the syntax and it was very easy.
Nowadays I've been trying JavaScript since I use it anyway in web projects and
Node/NPM has a lot of modules ready. One thing I like about scripting languages
is that I usually need to tweak the script a bit over time, and that is very
easy to do, even on different computers which may or may not have a compiler.

------
fibo
Perl also has lambdas, and for some tasks, like string manipulation, it is the
best option out there. Sometimes you think you need to learn another language,
but sometimes the smarter choice is to improve your knowledge of the tools you
already use.

------
t44
Treating C++ as a scripting language means companies have to pay more for
high-quality programmers and wait longer for simple scripts. You may want to
consider Rust and Nim as C's superiors rather than C++. Both Rust and Nim are
as fast as C++. Unlike C++ with Boost, they also have a leaner, extensible
syntax for building special-purpose DSLs. For example, Nim can prevent SQL
injection at compile time.

------
EGreg
No, because I work on public internet facing apps, I want evented programming
to be easy, and have async libraries widely available.

~~~
totony
PHP? :|

~~~
EGreg
PHP doesn't have evented programming either

~~~
aikah
PHP has something called reactphp. However, since most PHP libs and core libs
are blocking on the i/o level, it's not that useful.

------
pjmlp
While C++14 is quite an improvement over the C++98 code bases that are still
out there, maybe an ML-like language is a better option.

------
charliefg
Could it be my new scripting language? It sounds pretty avant-garde... maybe a
little too much for me. My mindset is just not there. As others have mentioned
- I also like the interactive environment that is available with traditional
scripting languages.

Ah yes, auto! I've been using that a lot since I saw Herb Sutter's talks.

------
edem
So the answer I derived from the comments is "Yes but it will do you no good."

------
jpatokal
Betteridge's law seems apt here: "Any headline that ends in a question mark
can be answered by the word no."

