The Devil in Plain Text (arnebrasseur.net)
45 points by ctoth on May 13, 2013 | hide | past | favorite | 20 comments



Honestly, this sort of thing is a much better fit for statically typed languages. In a modern typed language like Haskell or Scala (and presumably also F#) it's very natural to build parsers that can move back and forth between string and text representations, and the APIs one uses are largely generic whether one is building XML, Javascript, SQL or something else. It's also natural to use "tagged" datatypes to statically enforce that user input is handled differently from internal strings (I understand Perl has a "taint" mode that does the same thing, but presumably it has to happen at runtime, and I'm not aware of similar functionality in Python or Ruby at all).

Try a statically typed language sometime - they're not all like Java. You might be surprised.


>>Try a statically typed language sometime - they're not all like Java. You might be surprised.

This statement shows the kind of damage Java has done to the larger programming community.

Let me tell you the reason why people shudder to use Java for things like text processing. To do as simple a task as reading from a file, doing minimal processing (like extracting fields) and writing to another file, you have to deal with tens of API calls, often with long names. It's extremely difficult, if not impossible, to remember each and every API call and the order they need to be called in. So the net result is that every time you need to do something, no matter how simple, you go to the internet, get a known-to-work-well template and copy-paste it into your code.

And this is just the beginning. A sufficiently large Java application can be totally impossible to understand, modify or develop without an IDE. Far too many nested API calls, far too much XML. The bulk of Java code in the wild today is machine generated and machine modified (I mean through IDEs).

Java these days feels like assembly language programming. You have to juggle far too many APIs to do even the simplest of tasks.

No wonder people flock to a dynamic language at every chance they get.


Ruby has a tainting mechanism, but I don't believe it's used very much.

http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html


I've grown to hate the phrase "static typing". It's almost as meaningless as "strong typing". While checking types at compile time certainly has its benefits, the issue presented here is almost entirely orthogonal to that. The problem here is that strings are handled essentially as untyped opaque blobs as far as the type system is concerned. If the strings were typed properly, then both static and dynamic type systems (if otherwise equivalent) would catch the errors.


The idea is that you wouldn't use the generic string type for everything. You'd use an SQL type for queries and a different type for user input, and there would be methods to convert between the two. This ensures at compile time that all data is properly escaped, because it's a type error to mix a user input value with an SQL value.
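
A minimal sketch of that idea in Python, so the check happens when the line runs rather than at compile time. `SQL`, `UserInput` and `quote` are hypothetical names made up for this example, not from any library, and the escaping rule (doubling single quotes) is a toy for illustration only:

```python
class UserInput:
    """Untrusted text; deliberately not a str subclass."""
    def __init__(self, raw):
        self.raw = raw


class SQL:
    """A fragment of SQL text assumed to be well-formed."""
    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        # Only SQL + SQL is defined; anything else becomes a TypeError.
        if not isinstance(other, SQL):
            return NotImplemented
        return SQL(self.text + other.text)


def quote(value):
    """The only conversion from UserInput to SQL (toy escaping rule)."""
    return SQL("'" + value.raw.replace("'", "''") + "'")


name = UserInput("Robert'); DROP TABLE students;--")
query = SQL("SELECT * FROM students WHERE name = ") + quote(name)

try:
    SQL("SELECT * FROM students WHERE name = ") + name  # forgot quote()
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

A statically typed language would reject the unquoted mix before the program runs; this sketch only rejects it when the offending line executes.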


I think a blog post by Peter Bex, "Structurally fixing injection bugs", is a better description of the problems associated with strings: http://www.more-magic.net/posts/structurally-fixing-injectio...

Quoting it: "In other words, you're performing string surgery on the serialized representation of a tree structure. Just stop and think how insane that really sounds!"
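
To make the "string surgery" concrete, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The table, column names and input string are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

user_input = "nobody' OR '1'='1"

# String surgery: the input is spliced into the serialized query, so its
# quote characters change the shape of the tree the database parses.
spliced = "SELECT name FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(spliced).fetchall()

# Parameter binding: the query text is fixed and the input is passed
# separately, so it can only ever be a single string value.
bound = conn.execute("SELECT name FROM users WHERE name = ?",
                     (user_input,)).fetchall()
```

The spliced query matches every row, while the bound query matches none, since no user is literally named `nobody' OR '1'='1`.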


That quote is great, made all the better by the reality that writing code is doing that surgery manually.


His example of how things should work is pretty much 1:1 how the webhelpers.html library works in python:

    >>> from webhelpers.html import literal
    >>> p1, p2 = literal("<p>"), literal("</p>")
    >>> foo = "foo&<bar>"
    >>> type(p1)
    <class 'webhelpers.html.builder.literal'>
    >>> type(foo)
    <type 'str'>
    >>> p1 + foo + p2
    literal(u'<p>foo&amp;&lt;bar&gt;</p>')

And yes, having this sort of thing implemented at the language level, in a generic way that can apply to HTML, SQL, and anything else, would be wonderful.

While we're asking for ponies, I'd also like it if "costOfPieInDollars = distanceInMiles + angleInDegrees" would be an error (unless each part was explicitly cast to a compatible type, eg the base int)...
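
A rough sketch of what that could look like as a runtime check in plain Python. The `Quantity` class and the unit strings are made up for this example; real unit libraries do far more (conversions, derived units):

```python
class Quantity:
    """A number tagged with a unit; only like units may be added."""
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        if not isinstance(other, Quantity) or other.unit != self.unit:
            raise TypeError("cannot add %r to a quantity in %s"
                            % (getattr(other, "unit", other), self.unit))
        return Quantity(self.value + other.value, self.unit)


distance_in_miles = Quantity(3, "miles")
angle_in_degrees = Quantity(90, "degrees")

total = distance_in_miles + Quantity(2, "miles")  # same unit: allowed

try:
    cost_of_pie = distance_in_miles + angle_in_degrees
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The explicit "cast to the base int" escape hatch is just .value:
raw_sum = distance_in_miles.value + angle_in_degrees.value
```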


The Boost.Units library for C++ does this: http://www.boost.org/doc/libs/1_53_0/doc/html/boost_units/Qu...

Some examples from the documentation:

  quantity<length> L = 2.0*meters;
  quantity<energy> E = kilograms*pow<2>(L/seconds);

  quantity<plane_angle>    theta = 0.375*radians;
  quantity<dimensionless>  sin_theta = sin(theta);
  quantity<plane_angle>    thetap = asin(sin_theta);

I figured Haskell's type system could also do this sort of thing, and it can. I found Dimensional: https://code.google.com/p/dimensional/, but I'm not sure if it's in standard use in the Haskell community. Examples: https://code.google.com/p/dimensional/wiki/IspExample


I'm not sure I understand your last point. Do you want compilers to deduce the underlying type from the name of the variable, or a type system that would let you create subtypes carrying additional contextual information that the compiler would then use to enforce integrity?


Well, actually Kawa Scheme has a feature called quantities [1]. (+ 1cm 2m) equals 201.0cm, and (+ 1cm 2degr) throws an exception (degr has to be declared first, with (define-base-unit degr "Temperature")).

To tell the truth, I’ve never found any use for this feature.

[1] https://www.gnu.org/software/kawa/Quantities.html


I'd imagine that such a type/units system would be most useful in interactive use, like in WolframAlpha, where the system has quite a good "understanding" of units and how they should be applied to formulas.


It would be very useful in production code as well - consider the case of the Mars Climate Orbiter: http://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_f...


> I'd also like it if "costOfPieInDollars = distanceInMiles + angleInDegrees" would be an error

Type tagging.


Why not just use the real thing, and code in Perl? Besides, if you're doing anything "sane" someone probably already did it and uploaded it to CPAN. For example, if you're writing your very own XML parser you're probably doin it wrong; just select one from CPAN and be done with it. I've done innumerable jobs which appear complicated but boil down to "use"-ing two (or more) weird, apparently unrelated things from CPAN and setting one equal to another, rolled up in some initialization and error handling code.


I have to say that HTML::TreeBuilder saved our bacon a couple of years ago. But as far as XML goes, Python's implementation of Expat is perfectly usable.


I was expecting to see this...

         *                       *
            *                 *
           )       (\___/)     (
        * /(       \ (. .)     )\ *
          # )      c\   >'    ( #
           '         )-_/      '
         \\|,    ____| |__    ,|//
           \ )  (  `  ~   )  ( /
            #\ / /| . ' .) \ /#
            | \ / )   , / \ / |
             \,/ ;;,,;,;   \,/
              _,#;,;;,;,
             /,i;;;,,;#,;
            ((  %;;,;,;;,;
             ))  ;#;,;%;;,,
           _//    ;,;; ,#;,
          /_)     #,;  //
                 //    \|_
                 \|_    |#\
                  |#\    -"  
                   -"


Fancier string manipulation is still just string manipulation. The more correct way to handle it would be to use something like an xml-builder to construct your html, which could then automatically escape all of the given text-node data appropriately.
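
For instance, a small sketch with Python's standard `xml.etree.ElementTree`, which serializes as XML rather than full HTML; the element names and text are just for illustration:

```python
import xml.etree.ElementTree as ET

# Build the tree directly instead of concatenating markup strings.
body = ET.Element("body")
p = ET.SubElement(body, "p")
p.text = "foo&<bar>"  # raw text-node data, no manual escaping

# Escaping happens once, when the tree is serialized.
markup = ET.tostring(body, encoding="unicode")
```

Here `markup` comes out as `<body><p>foo&amp;&lt;bar&gt;</p></body>`: the text is escaped by the serializer, and there is no string in the program that is "half markup, half data".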


Don't want to be the prototypical middlebrow dismissal, but this sounds like a bad idea to me. Or it would need to be 100% implemented in the guts of the language. Why? Because strings need to be pickled, memcached, passed to other machines using other languages doing other work, etc.

There was a template language for Python called PTL, using things like htmltext, and it is actually a crazy dog biting everything and everyone touching it.


It sounds very much like the OP is looking for Yesod.

http://www.yesodweb.com/



