agadius's comments

agadius · 2024-10-12T14:53:48.000000Z

ChatGPT has a new canvas mode that allows for editing code. I found it very good. Less copy-paste than before.

agadius · 2024-02-03T21:22:46.000000Z

Twitter user @ghidraninja has published a video demoing a key extraction attack on a TPM-protected, BitLocker-encrypted Windows install on a Lenovo laptop. Link to tweet: https://twitter.com/ghidraninja/status/1753843667003986024?s...

agadius · 2024-02-03T00:30:15.000000Z

There’s also a yt video showing a demo that can be run after installing notcurses: https://www.youtube.com/watch?v=dcjkezf1ARY

Pretty hefty.

ilaksh · 2024-02-03T03:15:09.000000Z

Is it able to show actual videos or is that just to make the demo more interesting and inserted externally?

I would like to combine something like this with https://github.com/runvnc/tersenet or some of those ideas.

db48x · 2024-02-03T12:19:26.000000Z

It can show actual videos using the Sixel protocol. You already have terminals that support it too. Just run `xterm -ti 340`. In that terminal, run an application such as Gnuplot:

    gnuplot -e "set terminal sixelgd;set hidden3d;set view 62, 30, 1.1, 1.2;set samples 50, 50;set isosamples 51;set contour base;set cntrparam order 8;set cntrparam bspline;splot [-12:12.01] [-12:12.01] 2*sin(sqrt(x**2+y**2)) / sqrt(x**2+y**2)+x/53+y/37"
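
To give a feel for what an application writes to the terminal when it uses Sixel, here is a minimal hand-rolled sketch in Python that builds the escape sequence for a solid rectangle. The function name `sixel_block` and its parameters are made up for illustration; real programs would normally go through a library rather than assemble the bytes like this.

```python
def sixel_block(width: int, bands: int, rgb=(100, 0, 0)) -> str:
    """Return a Sixel escape sequence for a solid rectangle.

    width: pixels across; bands: number of 6-pixel-tall rows.
    rgb: colour components as percentages (0-100), as Sixel expects.
    """
    r, g, b = rgb
    out = ["\x1bPq"]                 # DCS followed by 'q': enter Sixel mode
    out.append(f"#0;2;{r};{g};{b}")  # define colour register 0 as RGB
    for _ in range(bands):
        # '~' is chr(63 + 0b111111): all six vertical pixels set.
        # '!<n><char>' repeats the following sixel character n times.
        # '-' moves down to the start of the next 6-pixel band.
        out.append(f"#0!{width}~-")
    out.append("\x1b\\")             # ST: leave Sixel mode
    return "".join(out)
```

Printing `sixel_block(120, 10)` in a Sixel-capable terminal (for example the `xterm -ti 340` mentioned above) should draw a red 120x60 rectangle; in other terminals the sequence is typically ignored or shown as garbage.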

agadius · on July 20, 2023

Where’s the source code for the binary file called “unshackle” which is used in the solution? (This one: https://github.com/Fadi002/unshackle/raw/main/unshackle)

codys · on July 20, 2023

It appears to be a PyInstaller-generated executable. Presumably we'd expect it to contain https://github.com/Fadi002/unshackle/blob/main/src/unshackle..., but I haven't confirmed that.

agadius · on July 6, 2023

Love Joplin! I love it so much, my lazy ass actually donated. I spent quite some time searching for open source alternatives that don’t have an ulterior motive. Currently using nextcloud sync and it works. Sometimes the iOS app and the Linux desktop app are out of sync, but a sync fixes that. Would love to see mTLS implemented at some point!

agadius · on July 6, 2023

If you accept running Java, Apache Tika is extremely good at parsing content (https://tika.apache.org/).

mcswell · on July 6, 2023

Tika can be used as a library in Python: https://pypi.org/project/tika/

ramraj07 · on July 7, 2023

As is customary for all of Apache, I have no clue what I’m looking at after trying to read through the links in that page for ten minutes. Like who is this tool for? When should I use this vs any other competing tools? No clue. I suppose it can read documents of any type and give it out as a dictionary? Why would I use this vs pandas?

mcswell · on July 7, 2023

I can't speak to the Apache documentation, but I once had the task of extracting plain text from many different document formats: Word, spreadsheets, PDFs, the EXIF information in JPEGs, and so on for a long list. I had written code with calls to extractor libraries for several of these formats when I came across Tika. Out went my if..then..elif..elif..elif.. code, to be replaced with a single (Python) call to Tika.

I can't answer your question about pandas, though.

mbwgh · on July 7, 2023

I second this, there is absolutely no easily discoverable entry point to the documentation. In the end if you want to get a feeling of what this is you search for "tika tutorial" and get a rough idea via (in my case) some medium article I guess.

mcswell · on July 7, 2023

There's a book called "Tika in action" which I found useful.

convivialdingo · on July 6, 2023

I second this suggestion. I tested numerous Python tools to extract text - nothing matches Tika for general extraction of just about any data format.

However - if you can expect a certain format beforehand - then Python is better since you can extract higher-quality data (tables, lists) with the appropriate tool.

saeedesmaili · on July 6, 2023

Do you have any suggestions for Python libraries (other than what's mentioned in the post)?

convivialdingo · on July 6, 2023

I've had good luck with python-docx for reading word documents (typically specifications). Tables are supported - but it's not obvious where the table comes from in the document and I had to come up with a hack way to read image captions.

PDF has been hit or miss, but pypdf has improved in the last couple of years. Depending on the document you'll sometimes get random spaces or nospacesatall.

saeedesmaili · on July 6, 2023

I tried python-docx with a bunch of docx files (downloaded from Google Docs). It returns empty strings for hyperlinks and I couldn't manage to fix this. So if there is a sentence like "This is an important link to another doc or url." and the "link" is a hyperlink, python-docx returns "This is an important to another doc or url."

icegreentea2 · on July 6, 2023

Heh, I got a bit into hacking on python-docx last year (the original author seems to be focusing on other things than python-docx now) - I have a fork/branch where I tried to more properly implement external hyperlink functionality (https://github.com/icegreentea/python-docx/pull/7)

I realize now staring at this, that I might have broken API a little. You can't do "text = paragraph.text" anymore, but you can do "text = ''.join([run.text for run in paragraph.runs])" instead.

If you're curious at all why it breaks, it's because in the OOXML spec paragraphs are made up of an ordered list of runs or hyperlinks (and hyperlinks can then contain additional runs). The master branch just implements paragraphs as an ordered list of runs (and ignores all hyperlinks).
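
The structure described above can be illustrated without python-docx at all. This is a standard-library sketch, not the fork's code or the python-docx API: `SAMPLE` is a hand-written paragraph fragment and `paragraph_text` is a hypothetical helper that walks the paragraph's children in order, descending into `w:hyperlink` elements to pick up the runs nested inside them.

```python
import xml.etree.ElementTree as ET

# WordprocessingML and relationships namespaces used in .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hand-written example paragraph: run, hyperlink (containing a run), run.
SAMPLE = f"""
<w:p xmlns:w="{W}" xmlns:r="{R}">
  <w:r><w:t xml:space="preserve">This is an important </w:t></w:r>
  <w:hyperlink r:id="rId5">
    <w:r><w:t>link</w:t></w:r>
  </w:hyperlink>
  <w:r><w:t xml:space="preserve"> to another doc or url.</w:t></w:r>
</w:p>
""".strip()


def paragraph_text(p: ET.Element, include_hyperlinks: bool = True) -> str:
    """Join the text of a paragraph's runs in document order."""
    parts = []
    for child in p:
        if child.tag == f"{{{W}}}r":
            runs = [child]                        # a plain run
        elif child.tag == f"{{{W}}}hyperlink" and include_hyperlinks:
            runs = child.findall(f"{{{W}}}r")     # runs nested in a hyperlink
        else:
            continue                              # skip anything else
        for run in runs:
            parts.extend(t.text or "" for t in run.findall(f"{{{W}}}t"))
    return "".join(parts)


p = ET.fromstring(SAMPLE)
with_links = paragraph_text(p)
without_links = paragraph_text(p, include_hyperlinks=False)
```

With `include_hyperlinks=False` the helper only looks at direct `w:r` children, so the word "link" drops out, which mirrors the missing-text symptom reported earlier in the thread.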

saeedesmaili · on July 6, 2023

This sounds amazing! Thanks for sharing it, I will try it to see if it can replace the main python-docx for me. For my use case it suffices to have the full text of each paragraph (even if it includes a hyperlink) and heading, but also to be able to have each of them separated when needed.

icegreentea2 · on July 6, 2023

Actually, I just realized that I had provided a 'one-off' hack to a similarish situation here: https://github.com/python-openxml/python-docx/issues/1123#is...

Replace the `qn("w:ins")` in the example with `qn("w:hyperlink")` and that should hopefully work?

convivialdingo · on July 6, 2023

Hey, that's fantastic. I'll definitely check that out.

jghn · on July 7, 2023

I found myself today trying to parse a TSV and substituting a few fields with a different value, then writing the new file out.

Something that Perl would excel at, although I used Python, because Perl isn't as maintainable as Python.

I was intrigued by this comment. A JVM solution would also be viable in my tech stack. Would Tika be easier than line processing compiled regexes in Python? I tried looking at the Usage examples but it wasn't clear.
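
For reference, the TSV task described above can be sketched with Python's standard `csv` module rather than regexes. The function name `substitute_fields`, the column name, and the sample values are all invented for illustration, and this assumes the file has a header row.

```python
import csv
import io


def substitute_fields(src, dst, replacements):
    """Copy TSV rows from src to dst, replacing values per column.

    replacements maps a column name to an {old_value: new_value} dict;
    values without a mapping are written through unchanged.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column, mapping in replacements.items():
            row[column] = mapping.get(row[column], row[column])
        writer.writerow(row)


# In-memory stand-ins for the input and output files.
src = io.StringIO("id\tstatus\n1\tY\n2\tN\n")
dst = io.StringIO()
substitute_fields(src, dst, {"status": {"Y": "yes", "N": "no"}})
```

With real files, `src` and `dst` would be handles opened with `newline=""`, as the `csv` documentation recommends.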

agadius · on June 25, 2023

Yeah, that website linked in the GitHub page really didn’t demonstrate the product. It showed the pricing model and that’s it (At least on mobile).

keepamovin · on June 25, 2023

We should definitely improve that. Maybe a video?