
The Devil in Plain Text - ctoth
http://devblog.arnebrasseur.net/2013-04-plain-text
======
lmm
Honestly this sort of thing is a much better fit for statically typed
languages. In a modern typed language like Haskell or Scala (and presumably also
F#) it's very easy and natural to build parsers that can move back and forth
between string and text representations, and the APIs one uses are largely
generic whether one is building XML, JavaScript, SQL or something else. It's
also natural to use "tagged" datatypes to statically enforce that user input
is handled differently from internal strings (I understand Perl has a "taint"
mode that does the same thing, but presumably it has to happen at runtime,
and I'm not aware of similar functionality in Python or Ruby at all).

Try a statically typed language sometime - they're not all like Java. You
might be surprised.
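
A rough sketch of the "tagged" datatype idea, written in Python with type hints rather than Haskell or Scala. The names SafeHtml, from_user and paragraph are hypothetical, and the separation is only enforced if a static checker such as mypy is run over the code; at runtime the tag is a plain str.

```python
from html import escape
from typing import NewType

# Hypothetical tag type: at runtime it is a plain str, but a static checker
# such as mypy treats SafeHtml and str as different types.
SafeHtml = NewType("SafeHtml", str)


def from_user(text: str) -> SafeHtml:
    """Escape untrusted text; intended as the only path from str to SafeHtml."""
    return SafeHtml(escape(text))


def paragraph(body: SafeHtml) -> SafeHtml:
    """Wrap already-safe markup in a <p> element."""
    return SafeHtml("<p>" + body + "</p>")


page = paragraph(from_user("foo&<bar>"))

# paragraph("foo&<bar>") would still run, but a type checker should reject
# it, because a bare str is not a SafeHtml.
```

The check happens before the program runs, which is the point being made about static typing; Python itself will not stop a caller who ignores the checker.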

~~~
kamaal
>>Try a statically typed language sometime - they're not all like Java. You
might be surprised.

This statement shows the kind of damage Java has done to the larger
programming community.

Let me tell you the reason why people shudder to use Java for things like
text processing. To do even a simple task, such as reading from a file, doing
minimal processing (like extracting fields) and writing to another file, you
have to deal with tens of API calls, often with long names. It's extremely
difficult, if not impossible, to remember each and every API call and the
order they need to be called in. So the net result is that every time you need
to do something, no matter how simple, you go to the internet, get a
known-to-work-well template and just copy-paste it into your code.

And this is just the beginning. A sufficiently large Java application can be
nearly impossible to understand, modify or develop without an IDE. Far
too many nested API calls, far too much XML. The bulk of Java code in the
wild today is machine generated and machine modified (I mean through IDEs).

Java these days feels like assembly language programming. You have to play
with far too many APIs to do even the simplest of tasks.

No wonder people flock to a dynamic language at every chance they get.

------
demetrius
I think a blog post by Peter Bex, "Structurally fixing injection bugs", is a
better description of the problems associated with strings:
[http://www.more-magic.net/posts/structurally-fixing-injectio...](http://www.more-magic.net/posts/structurally-fixing-injection-bugs.html)

Quoting it: "In other words, you're performing string surgery on the
serialized representation of a tree structure. Just stop and think how insane
that really sounds!"
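
To make the "string surgery" point concrete, here is a small sketch using Python's standard sqlite3 module. The table, the rows and the hostile input are made up for illustration; the contrast is between splicing a value into the query text and passing it separately as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

hostile = "nobody' OR '1'='1"

# String surgery: the input is spliced into the serialized query text,
# so its quote characters change the structure of the statement.
spliced = "SELECT name FROM users WHERE name = '" + hostile + "'"
leaked = conn.execute(spliced).fetchall()

# Structural approach: the query shape is fixed and the value is passed
# separately, so it can only ever be treated as data.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

With the spliced query every row comes back, because the injected OR clause is always true; with the parameterized query nothing matches, since no user has that literal name.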

~~~
tsewlliw
That quote is great, made all the better by the reality that writing code is
doing that surgery manually.

------
Shish2k
His example of how things should work is pretty much 1:1 how the
webhelpers.html library works in Python:

    
    
        >>> from webhelpers.html import literal
        >>> p1, p2 = literal("<p>"), literal("</p>")
        >>> foo = "foo&<bar>"
        >>> type(p1)
        <class 'webhelpers.html.builder.literal'>
        >>> type(foo)
        <type 'str'>
        >>> p1 + foo + p2
        literal(u'<p>foo&amp;&lt;bar&gt;</p>')
    

And yes, having this sort of thing implemented at the language level, in a
generic way that can apply to HTML, SQL, and anything else would be wonderful.

While we're asking for ponies, I'd also like it if "costOfPieInDollars =
distanceInMiles + angleInDegrees" would be an error (unless each part was
explicitly cast to a compatible type, e.g. the base int)...

~~~
john_fushi
I'm not sure I understand your last point. Do you want compilers to
deduce the underlying type from the name of the variable, or a type system that
would let you create subtypes carrying additional contextual information
that the compiler would then use to enforce integrity?

~~~
demetrius
Well, actually Kawa Scheme has a feature called quantities [1]. (+ 1cm 2m)
evaluates to 201.0cm, while (+ 1cm 2degr) throws an exception (degr has to be
declared first, with (define-base-unit degr "Temperature")).

To tell the truth, I’ve never found any use for this feature.

[1] <https://www.gnu.org/software/kawa/Quantities.html>
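
For readers who don't use Kawa, the behaviour described above can be approximated in a few lines of Python. This is only a sketch under assumptions: the Quantity class and the UNITS table are hypothetical, the check happens at runtime rather than in a compiler, and it is not how Kawa implements quantities.

```python
from dataclasses import dataclass

# Hypothetical unit table: unit name -> (dimension, size in base units).
# Centimetres are the base unit for length, so every factor is an integer.
UNITS = {
    "cm": ("length", 1),
    "m": ("length", 100),
    "degr": ("temperature", 1),
}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        dim, factor = UNITS[self.unit]
        other_dim, other_factor = UNITS[other.unit]
        if dim != other_dim:
            raise TypeError(f"cannot add {other_dim} to {dim}")
        # Convert the right operand into the left operand's unit.
        converted = other.value * other_factor / factor
        return Quantity(self.value + converted, self.unit)


total = Quantity(1, "cm") + Quantity(2, "m")
```

Here total is 201.0 in cm, matching the (+ 1cm 2m) example, while Quantity(1, "cm") + Quantity(2, "degr") raises a TypeError.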

~~~
zokier
I'd imagine that such a type/units system would be most useful in interactive
use, like in WolframAlpha where the system has quite good "understanding" of
units and how they should be applied to formulas.

~~~
scott_s
It would be very useful in production code as well - consider the case of the
Mars Climate Orbiter:
[http://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_f...](http://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_failure)

------
VLM
Why not just use the real thing, and code in Perl? Besides, if you're doing
anything "sane" someone probably already did it and uploaded it to CPAN. For
example, if you're writing your very own XML parser you're probably doing it
wrong; just select one from CPAN and be done with it. I've done innumerable
jobs which appear complicated but boil down to "use" two (or more) weird,
apparently unrelated things from CPAN and set one equal to another, rolled up
in some initialization and error handling code.

~~~
cafard
I have to say that HTML::TreeBuilder saved our bacon a couple of years ago.
But as far as XML goes, Python's implementation of Expat is perfectly usable.

------
bostonpete
I was expecting to see this...

    
    
             *                       *
                *                 *
               )       (\___/)     (
            * /(       \ (. .)     )\ *
              # )      c\   >'    ( #
               '         )-_/      '
             \\|,    ____| |__    ,|//
               \ )  (  `  ~   )  ( /
                #\ / /| . ' .) \ /#
                | \ / )   , / \ / |
                 \,/ ;;,,;,;   \,/
                  _,#;,;;,;,
                 /,i;;;,,;#,;
                ((  %;;,;,;;,;
                 ))  ;#;,;%;;,,
               _//    ;,;; ,#;,
              /_)     #,;  //
                     //    \|_
                     \|_    |#\
                      |#\    -"  
                       -"

------
knome
Fancier string manipulation is still just string manipulation. The more
correct way to handle it would be to use something like an XML builder to
construct your HTML, which could then automatically escape all of the given
text-node data appropriately.
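
As one possible illustration of the builder approach, Python's standard xml.etree.ElementTree does this kind of escaping when it serializes text nodes. The element names and the sample input below are arbitrary; the point is that the code never concatenates markup by hand.

```python
import xml.etree.ElementTree as ET

user_text = "foo&<bar>"

# Build the document as a tree; the user text is stored as a text node,
# not spliced into markup.
body = ET.Element("div")
p = ET.SubElement(body, "p")
p.text = user_text

# Serialization is the only place where markup characters are produced,
# and the text node is escaped on the way out.
markup = ET.tostring(body, encoding="unicode")
```

The resulting markup is `<div><p>foo&amp;&lt;bar&gt;</p></div>`, with the escaping done by the serializer rather than by the caller.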

------
gbog
Don't want to be the prototypical middlebrow dismissal, but this sounds like a
bad idea to me. Or it would need to be 100% implemented in the guts of the
language. Why? Because strings need to be pickled, memcached, passed to other
machines using other languages doing other work, etc.

There was a template language called PTL in Python using things like htmltext,
and it is actually a crazy dog biting everything and everyone touching it.

------
andrewaylett
It sounds very much like the OP is looking for Yesod.

<http://www.yesodweb.com/>

