Hacker News
Show HN: utt – Universal Text Transformer (github.com/queer)
127 points by notamy on March 7, 2022 | 35 comments



FWIW nu shell can be used for a lot of the same use-cases:

    $ echo "[1, 2, 3]" | nu -c 'cat | from json | to yaml'
    ---
    - 1
    - 2
    - 3
    
    $ echo '{"key": [1, 2, 3]}' | nu -c 'cat | from json | to yaml'
    ---
    key:
      - 1
      - 2
      - 3
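To show what the `from json | to yaml` step amounts to, here is a minimal Python sketch (standard library only) that reproduces the two outputs above. It is an illustration, not nu's or utt's implementation. It assumes standard JSON input, emits block-style YAML only, writes every string value (and any key that isn't a plain identifier) in JSON double-quoted form, which YAML also accepts, and renders empty lists and objects as `[]` and `{}`. The names `json_to_yaml`, `_emit` and `_key` are hypothetical.

```python
import json
import re

# Illustrative sketch only; not how nu or utt implement their converters.
# Keys that look like identifiers are written bare (e.g. `key:`); all others are quoted.
_PLAIN_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
# Bare words that some YAML parsers read as booleans or null, so they stay quoted.
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "y", "n"}


def _key(key: str) -> str:
    if _PLAIN_KEY.match(key) and key.lower() not in _RESERVED:
        return key
    return json.dumps(key)


def _emit(value, indent: int, lines: list) -> None:
    pad = " " * indent
    if isinstance(value, list) and value:
        for item in value:
            if isinstance(item, (list, dict)) and item:
                lines.append(pad + "-")  # nested block goes on the following lines
                _emit(item, indent + 2, lines)
            else:
                lines.append(pad + "- " + json.dumps(item))
    elif isinstance(value, dict) and value:
        for key, item in value.items():
            if isinstance(item, (list, dict)) and item:
                lines.append(pad + _key(key) + ":")
                _emit(item, indent + 2, lines)
            else:
                lines.append(pad + _key(key) + ": " + json.dumps(item))
    else:
        # Scalars and empty containers: JSON syntax ("x", 1, true, null, [], {}) is valid YAML.
        lines.append(pad + json.dumps(value))


def json_to_yaml(text: str) -> str:
    lines = ["---"]  # document-start marker, matching the nu output above
    _emit(json.loads(text), 0, lines)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(json_to_yaml("[1, 2, 3]"), end="")
    print(json_to_yaml('{"key": [1, 2, 3]}'), end="")
```

Quoting every string is the simple, safe choice here; a real converter would decide per value whether a plain scalar is unambiguous.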


Cool! Glad I'm not the only one who saw a need for this stuff :D


Daniel, we would like to be able to award superpoints.


Good idea, but this is a painful limitation:

> For example, utt does not process data in a streaming manner, but rather loads the entire dataset into memory before processing


It is! I’m not very happy with it and would love to change it — I’m just fortunate to have a primary workstation with an absurd amount of RAM.
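The limitation quoted above is the difference between buffering the whole input and streaming it. Here is a minimal Python sketch of the streaming alternative. It assumes JSON Lines input (one JSON object per line), a format chosen for illustration and not one utt is described as supporting. `read_records`, `pick` and `write_values` are hypothetical names. Each stage is a generator, so only one record is held in memory at a time. A single large JSON document can't be split on newlines like this; streaming one would need an incremental parser.

```python
import io
import json
import sys
from typing import Iterable, Iterator, TextIO

# Illustrative only: assumes JSON Lines input; this is not utt's design.


def read_records(stream: TextIO) -> Iterator[dict]:
    """Parse one JSON object per non-blank line, reading the stream lazily."""
    for line in stream:  # text streams iterate line by line
        line = line.strip()
        if line:
            yield json.loads(line)


def pick(records: Iterable[dict], field: str) -> Iterator[object]:
    """Project a single field out of each record as it arrives."""
    for record in records:
        yield record.get(field)


def write_values(values: Iterable[object], out: TextIO) -> int:
    """Write each value as one JSON line; return how many were written."""
    count = 0
    for value in values:
        out.write(json.dumps(value) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Demo on an in-memory stream; sys.stdin works the same way.
    src = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
    write_values(pick(read_records(src), "id"), sys.stdout)
```

Piping real data through `write_values(pick(read_records(sys.stdin), "id"), sys.stdout)` keeps memory use bounded by the longest line rather than the whole input.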


Interesting project and seems very useful. Reminds me of Pandoc; maybe you could make a diagram similar to the one [1] Pandoc uses to show its conversion possibilities.

[1] https://pandoc.org/diagram.svgz?v=20220210130556


Eagerly awaiting the arrival of the “Big Universal Text Transformer”.


As a backend dev, I approve


I like those. I cannot lie.


The spirit of the tool is great!

Not sure how long it takes these days to start up a JVM and load classes. The idea of using a Java program regularly in the shell immediately makes me worry about slowness.

Also, the eventual goal of supporting more formats means slower startup time as the feature set grows.


Reminds me of the Haskell project Pandoc ... which is also aimed at this use case.


To me it's more like a generalized jq. Didn't know Pandoc supported these formats.


I would recommend listing the supported input and output formats, or at least highlighting the main ones. Another key question I have: why use this tool instead of jq?


Cool project! Something to consider: "Transformer"[1] is already used to refer to a popular element of state-of-the-art deep neural networks, especially ones used on natural language, i.e. text. That might make this name a little confusing for people who have a foot in that world, especially since "Universal" and "Text" are also words thrown around in similar contexts.

[1]: https://en.wikipedia.org/wiki/Transformer_(machine_learning_...


Transform is also used in data mangling to mean exactly what this tool does.

Transformation is also used in enterprise IT to mean modernising.

Transformer is also a mathematics term, a kids toy and many other things.

In short, ML doesn’t have a monopoly on it.


Well, the really ironic part of it is that the machine learning models using transformers were literally conceived for converting between languages. So this would be quite the valid use case. Unfortunately, it makes the name more confusing.


Deep learning also messed up the definition of tensor, and so they are the ones who should be careful with picking names.

(That you can represent a tensor by a multidimensional array does not mean that a tensor is a multidimensional array).

https://en.wikipedia.org/wiki/Tensor
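To make the distinction concrete, here is the standard linear-algebra change-of-basis rule (summation over repeated indices; this notation is not from the thread). If a new basis is obtained from the old one through an invertible matrix A, the components of a vector v and of a (1,1)-tensor T, i.e. a linear map, change as follows:

```latex
e'_{j} = A^{i}{}_{j}\, e_{i}, \qquad
v'^{\,i} = (A^{-1})^{i}{}_{j}\, v^{j}, \qquad
T'^{\,i}{}_{j} = (A^{-1})^{i}{}_{k}\, T^{k}{}_{l}\, A^{l}{}_{j}
```

The arrays of components before and after the change differ, yet both describe the same map. A bare multidimensional array carries no such transformation rule, which is why "has a multidimensional-array representation" and "is a multidimensional array" are not the same claim.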


It doesn’t look that messed up to me. The word "vector" is used both for an element of a vector space and for its coordinates. "Tensor" has so far mostly been used only for the multilinear object itself, but using it for the numerical description is a natural extension of how "vector" is used.


Making one oversimplification means you need to be very particular? The point stands that some people that look at this tool will likely expect something very different than what it is. If this tool gets enough prominence, it won't be a problem. If it doesn't, it will be a misleadingly named software stack. Worse things have definitely happened.


The term "transformer" is used throughout mathematics and computing. Machine learning hardly has the monopoly on it. Or should, for example, Haskell rename their monad transformers library?


Good to know, thanks! I was originally considering a name more like "universal text finagler" but then I realised that there's already a well-known "UTF" (:


uft: Universal Finagler of Text, Universal Format Translator? unt: unt is not a transformer?


UTF itself is already short for "Unicode Transformation Format" and certainly predates any usage of the middle word by ML projects.


Almost every Show HN attracts detractors complaining about namespace collisions or, like the parent, offering general critiques of the name. This is fine; I just wish those comments ended up towards the tail end of the thread since, as a rule of thumb, such discussions are intellectually low-value.


It would be nice, albeit a bit tricky to implement, for downvotes to be optionally augmented with a choice from a capped list of reasons. Among them: "it would be better if..." and "I have a different opinion", in addition to "this is badthink" and "this is offensive to me".


To take this idea one step further, each of those options could be assigned a colour. Then, instead of being greyed out, the comment would take on an amalgam of the colours of the votes it received, indicating exactly what caused it to be downvoted. Stronger hues would then indicate a degree of consensus among the audience.

Of course, the same should be applied the other way, so when upvoting, one could also choose to assign a value for "this is good but could be improved", "I have a different opinion, but the above comment is worthy of consideration", "this is simply a correct take" and "this comment is pleasing to me".

And for the sake of readability, instead of colouring the whole comment, a marker could be placed before or after the unaltered text so that the text itself stays readable!

The challenge would be in deciding which colors represent what sentiment, as these days even our visible light spectrum seems to be corrupted by political/moral/tribal prejudgements. So perhaps it would do more harm than good. It may make for an interesting experiment, all the same.
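The colour-blending proposal above can be sketched in a few lines. This is a hypothetical illustration in Python: the reason labels, the RGB values in `PALETTE` and the grey `NEUTRAL` colour are placeholders (the proposal deliberately leaves the colour choice open), and "consensus" is taken here to mean the share of votes given to the most popular reason. The marker colour is the vote-weighted average of the reason colours, pulled toward grey when the audience is split.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: the proposal leaves the choice of colours open.
PALETTE: Dict[str, RGB] = {
    "could be better": (230, 160, 40),
    "different opinion": (60, 120, 220),
    "badthink": (200, 40, 40),
    "offensive": (140, 40, 160),
}
NEUTRAL: RGB = (128, 128, 128)  # placeholder colour for "no votes yet"


def marker_colour(votes: Dict[str, int], palette: Dict[str, RGB] = PALETTE) -> RGB:
    total = sum(votes.values())
    if total == 0:
        return NEUTRAL
    # Vote-weighted average of the reason colours, one channel at a time.
    mix = [sum(palette[reason][c] * n for reason, n in votes.items()) / total
           for c in range(3)]
    # Consensus = share of the most popular reason (1.0 when everyone agrees).
    consensus = max(votes.values()) / total
    # Pull the mix toward grey when votes are split, so strong hues mean agreement.
    return tuple(round(g + consensus * (m - g)) for g, m in zip(NEUTRAL, mix))
```

The same function would work for the upvote reasons with a second palette; unanimous votes produce the pure reason colour, while an even split lands close to grey.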


It's also a concept in Haskell, but I don't think anybody would claim a monopoly on the idea of transforming things.


Coreutils did initially. But there's also grep, awk and Perl.

This could be called Tiny, Huge Text Transformer: thtt

Tiny because of the functionality, and huge because of Java.


I’m thinking it should be called Cockroach something something


Thanks! Also see rq: https://github.com/dflemstr/rq


What license is this code under? I'm just curious if I might be able to use it as part of a work project in the future.


I added a licence -- MIT!


What's the purpose of the help message screenshot? Couldn't you use a code block, like in all the other examples?


I could and probably should; I was just being really lazy when I wrote it >~<


Isn't it more effort to do the screenshot than a simple copy+paste?



