Show HN: Purl – A Simple Tool for Text Processing (github.com/catatsuy)
71 points by catatsuy 31 days ago | 29 comments
Hello HN community,

I'm excited to share a new command-line tool I developed called purl, inspired by the simplicity of Perl one-liners for efficient text processing. Purl offers Perl-like regex for text manipulation, works equally well across platforms such as macOS and Linux, and is quick and easy to install. It also supports simple commands such as -replace, -filter, and -exclude, and offers optional color output to enhance readability.

Purl is a practical alternative to traditional tools like sed and grep, designed to address some of their common limitations.

For more information and to try purl yourself, visit: https://github.com/catatsuy/purl

I appreciate any feedback!




> I appreciate any feedback!

Please don't use a name which is confusingly similar to an existing product (i.e. infringes on someone else's trademark). Open source communities have enough to deal with already without having to allocate scarce volunteer hours to working out legal issues.


I was thinking of “Persistent uniform resource locator” but then realised I think you are referring to the Python library https://github.com/codeinthehole/purl (I get your argument, although I quite like this new project. It’s closer to “Perl” in my mind. But then I am nearly 50..)


I'm referring to Perl, which is another product in the same market (unixy command line applications) and has the same pronunciation.

https://www.uspto.gov/trademarks/search/likelihood-confusion

> Trademarks don’t have to be identical to be confusingly similar. Instead, they could just be similar in sound, appearance, or meaning, or could create a similar commercial impression. Here are examples of trademarks that were found to be confusingly similar.

> These trademarks are confusingly similar because they could be pronounced the same way, even though they’re spelled differently.

> Your mark -- T. Markey

> Conflicting mark -- Tee Marquee


From your phrasing here, it sounds like you think the OP is unaware of Perl, which is odd, considering they mention it in their very first sentence.


To clarify: I had read that passage, and I was definitely aware that the name could be interpreted as a reference.

Trademark conflicts in open source sometimes start that way. For example, “CouchBase” was a reference to “CouchDB”; it was originally a complementary effort, but later became a competitor. By the time the CouchDB folks realized that they should’ve been enforcing their trademark, it was too late.


How will you even talk about this?

"can you install purl on the server so we can filter these logs?"

"do you mean the one with an e or the one with the u?"

Also, based on the name, I was half guessing this was going to be a drop-in replacement for curl.


Another name conflict is with https://github.com/package-url/purl-spec - which is used for software identification (e.g., see the https://www.cisa.gov/sites/default/files/2023-10/Software-Id... report).


This was actually my first thought. This will confuse many people in the cybersec community.


Nah, that doesn't really work any more and it isn't really worth the cycles to try. Also, there are no legal issues with "Purl".


Whether or not there are "legal issues" with "Purl" being confusingly similar to "Perl" would be up to the judgement of a court, should it come to that.

Mostly, when there are other open source projects whose names conflict with your trademark you ask nicely and people change the name. That's the path I'm hoping to guide the OP down — please just have some sympathy for your fellow open source developers!

Occasionally, trademark conflicts in open source become high stakes, painful affairs. For example, within the last few years, there was the "Commons Clause" situation (where, among other issues, "Apache License Version 2 with Commons Clause" infringed upon the "Apache" trademark owned by the ASF).

This is unlikely to become a problem with a solo personal project unless it gains sustained traction. But I think it's valid feedback to offer, because if you have ambitions for your open source project you are eventually going to have to deal with trademarks.


I'm not a professional programmer, just someone who uses Linux as their daily driver.

Discoverability matters more to me than anything else when selecting a tool.

Quoting from Purl's README:

    Simple Commands: Use straightforward options like -replace, -filter, and -exclude to manage your data.
    Edit Files Easily: The -overwrite option allows you to update files directly, making changes quick and simple.

When I use tools like `find`, `xargs`, `cat`, `sed`, `awk`, and `perl`, I often struggle to remember even the simplest options. What is the option for 'ignore case', 'in-place edit', or 'info'? -i could mean anything.

This is one of the reasons I switched my shell to fish. It offers more discoverability than bash and zsh. Fish supports searching a command's option names and descriptions using <Tab><C-s> BY DEFAULT. I believe this approach is quite effective.

Thus, I prefer Purl-like syntax.


I understand what you mean (although I "grew up" on UNIX, don't find this a problem); however, it would be better if the long (i.e. readable) options were prefixed by double dash to follow usual UNIX convention.

Using single dash for single character options allows you to combine them, which is really useful (if you do remember these options of course), so `-exc` means `-e -x -c` etc.


I respect the run-time properties of Go. If you can build a better tool with better run-time properties, that is progress.

That said, I am not convinced that making a thing easy "even for beginners" is always practical. Beginners do need to ascend some learning curves.

I make heavy use of RegEx and "one-liners" (bash-grep-awk-sed, Perl). These are powerful tools. Yes, they have strengths and weaknesses but they are built with a successful philosophy. Yes, their notation is irksome to ardent Python users who deplore all things non-Python.

The success of the RegEx model is its terse power. Something more verbose and simultaneously too simple for complex use-cases will not replace RegEx. Something that can/should do the same thing on [ $OS for $OS in @OSlist ] is a thorny problem with $OS.

As for the complex use-cases, I do not expect "beginners" to enjoy RegEx when even many otherwise talented computer users have fits of scene-stealing negativity when they encounter a regex.

Others will rightly draw attention to the need for a different project name. Naming a thing effectively is important, don't underestimate this need if you truly aim for adoption.


Congratulations on the release! Thank you for making it open source.

For the old farts like me, would you consider adding in some comparisons to AWK?


Thank you for the congratulations! Currently, purl does not include features similar to AWK. We appreciate your suggestion and will consider it for future development.


I think GP was just asking for a comparison of how your tool is similar/different from awk.


Yeah, and I think he responded saying it's not similar.


The reasons outlined for not using Perl actually all apply to purl as well, except maybe "not just for one-liners".

But perl allows one-liners to emulate grep (filter and exclude) and sed (replace):

    perl -p -e 's/FROM/HELLO/' # replace
    perl -n -e 'print if /FROM/' # filter
    perl -n -e 'print if not /FROM/' # exclude


more examples (from https://softpanorama.org/Scripting/Perlorama/One-liners/tom_...)

    # add first and penultimate columns
    perl -lane 'print $F[0] + $F[-2]'

    # print just lines 15 to 17
    perl -ne 'print if 15 .. 17' *.pod

    # in-place edit of *.c files changing all foo to bar
    perl -p -i.bak -e 's/\bfoo\b/bar/g' *.c

    # command-line that prints the first 50 lines (cheaply)
    perl -pe 'exit if $. > 50' f1 f2 f3 ...

    # delete first 10 lines
    perl -i.old -ne 'print unless 1 .. 10' foo.txt

    # change all the isolated oldvar occurrences to newvar
    perl -i.old -pe 's{\boldvar\b}{newvar}g' *.[chy]

    # command-line that reverses the whole file by lines
    perl -e 'print reverse <>' file1 file2 file3 ....

    # find palindromes
    perl -lne 'print if $_ eq reverse' /usr/dict/words

    # command-line that reverse all the bytes in a file
    perl -0777e 'print scalar reverse <>' f1 f2 f3 ...

    # command-line that reverses the whole file by paragraphs
    perl -00 -e 'print reverse <>' file1 file2 file3 ....

    # increment all numbers found in these files
    perl -i.tiny -pe 's/(\d+)/ 1 + $1 /ge' file1 file2 ....

    # command-line that shows each line with its characters backwards
    perl -nle 'print scalar reverse $_' file1 file2 file3 ....

    # delete all but lines between START and END
    perl -i.old -ne 'print unless /^START$/ .. /^END$/' foo.txt

    # binary edit (careful!)
    perl -i.bak -pe 's/Mozilla/Slopoke/g' /usr/local/bin/netscape

    # look for dup words
    perl -0777 -ne 'print "$.: doubled $1\n" while /\b(\w+)\b\s+\b\1\b/gi'

    # command-line that prints the last 50 lines (expensively)
    perl -e '@lines = <>; print @lines[ $#lines-49 .. $#lines ]' f1 f2 f3 ...


There are lots of different regex implementations. The readme mentions sed behaving differently on different platforms. The readme doesn't say what regex language this tool implements. There doesn't seem to be any tests. Thus I don't know what this tool actually does.


Thank you for your feedback. The current implementation of purl uses Go's regexp package for regex operations, which ensures consistent behavior across platforms. I acknowledge the README does not specify this yet, and I plan to update it to clarify the regex implementation used.


I often find myself wanting to do a search and replace across all files in a directory tree (e.g. git repo). With sed, sd, purl etc you do that by listing files with matches and piping into xargs. But that has lots of disadvantages: you're using regexes in two programs, which might differ, you have to know three different tools for one job, etc. I love unix pipelines but for this use case I can't help feeling that the replace tool should be able to do the directory walking.


Can it only work on piped input? I tried running in the form of `./purl main.go -filter func` and the result was:

> Failed to validate input: invalid replace expression format. Use "@search@replace@"

Do you plan to add the ability for it to act on files directly?


Thank you for trying out purl! Currently, the file name needs to be placed at the end of the command like this: purl -filter func main.go. This format helps purl understand which part of the command specifies the options and which part specifies the file.

I appreciate your feedback and understand that a more flexible command structure might be easier for users. We will consider making this change in future versions to accommodate different usage preferences.
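The ordering requirement is what one would see from Go's standard `flag` package, which stops parsing at the first non-flag argument. Whether purl uses that package is an assumption here, not something stated in the thread; the sketch below only shows the package's behaviour, with `parse` as a hypothetical stand-in for a purl-like command line.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parse is a hypothetical stand-in for a purl-like command line. It returns
// the value of -filter and whatever arguments were left unparsed.
func parse(args []string) (string, []string, error) {
	set := flag.NewFlagSet("demo", flag.ContinueOnError)
	set.SetOutput(io.Discard) // keep usage text out of the demo output
	filter := set.String("filter", "", "keep lines matching this pattern")
	err := set.Parse(args)
	return *filter, set.Args(), err
}

func main() {
	// Options first: -filter is recognised and main.go is left as the file.
	filter, rest, _ := parse([]string{"-filter", "func", "main.go"})
	fmt.Printf("filter=%q rest=%q\n", filter, rest)

	// File first: parsing stops at main.go, so -filter is never seen.
	filter, rest, _ = parse([]string{"main.go", "-filter", "func"})
	fmt.Printf("filter=%q rest=%q\n", filter, rest)

	// The flag package treats one and two leading dashes the same way.
	filter, rest, _ = parse([]string{"--filter", "func", "main.go"})
	fmt.Printf("filter=%q rest=%q\n", filter, rest)
}
```

The same package accepts `--filter` as well as `-filter`, but it does not combine single-letter options, which is relevant to the double-dash discussion earlier in the thread.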


Question: can it preserve original colors and also highlight the given word? Like when you do an "apt search whatever | grep another", and it is formatted in a nice way, but grep just eats the colors...


grep doesn't eat the colours, apt detects that it is not outputting to a TTY and suppresses the colours. Try `printf 'a \e[36mb\e[0m c\n' | grep a`, the colour is preserved just fine.

Some tools have an option to force coloured output regardless, e.g. GCC's `-fdiagnostics-color` or grep's own `--colour=always`, but apt doesn't seem to have anything like that.

In theory one could have a command in the style of nohup or stdbuf which sets up a PTY to trick the command into outputting colours. So one could run `fakepty apt search whatever | grep another` ...


Looks great


How so?


Yeah, awk, sed and the other gang of terribles in syntax must evaporate asap.



