Hacker News | emilburzo's comments

Romania is missing from the list of phone number countries on signup, not sure if on purpose or not.

Is any of the code public?

Or at least the tool(s) you use?

I have the same need but it's surprisingly difficult to get it right, at least with the `camelot` or `fitz` python packages.


No public code. This has been a long-running project for me. When I last touched it, in the pre-LLM world, it had turned into a real Rube Goldberg machine. Hard to imagine anyone else putting up with it.

The pipeline starts with PDF to text (using either a Python or a Java library). The text is then turned into a "header" structure with dates and balances via configuration-driven regexes, and a "body" structure containing the transactions. The transactions themselves go through an EBNF parser to extract the date(s), narration, amount, and balance if reported. The narration text gets run against a custom merchant database for payee and categorization. It is a painful problem! The code is Clojure, so there is not much of it, and there are high-abstraction libraries like Instaparse that make it easy to use grammars as primitives. And the Rube Goldberg machine has yielded balance-validated data for me for the last several years from half a dozen financial providers.
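To make the shape of that pipeline concrete, here is a minimal Python sketch of the transaction-parsing and balance-validation steps. It swaps the EBNF grammar for a single regex, and the line layout (`date narration amount balance`), the pattern `TXN_RE`, and both function names are assumptions for illustration, not the actual Clojure code.

```python
import re
from decimal import Decimal

# Hypothetical statement layout: "2024-01-05 COFFEE SHOP -4.50 995.50"
TXN_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<narration>.+?)\s+"
    r"(?P<amount>-?\d+\.\d{2})\s+"
    r"(?P<balance>-?\d+\.\d{2})$"
)


def parse_transactions(lines):
    """Return one dict per line that matches the transaction pattern."""
    txns = []
    for line in lines:
        m = TXN_RE.match(line.strip())
        if m:  # lines that do not look like transactions are ignored
            txns.append({
                "date": m["date"],
                "narration": m["narration"],
                "amount": Decimal(m["amount"]),
                "balance": Decimal(m["balance"]),
            })
    return txns


def validate_balances(opening, txns):
    """Check that every reported balance equals the running total."""
    running = opening
    for t in txns:
        running += t["amount"]
        if running != t["balance"]:
            return False
    return True
```

Using `Decimal` rather than floats keeps the running total exact, which is what makes the balance check trustworthy.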

I have been incorporating local LLMs, running on an RTX 3090, into some other workflows I have, and I hope over the summer to see whether they can help simplify some of this workflow.


> like convert PDF bank statements into CSV transaction files

I've tried this recently and it's surprisingly difficult. Any pro-tips?

Extracting PDF tables while respecting the cell positions seems almost impossible to do in a way that works in all cases (think borderless tables, whitespace cells, etc.).


It is remarkably difficult and continues to provide a good example of the limitations of LLM based systems.

In my case, I used Perl, and exploited the fact that for a given bank, the statements are consistently formatted. Further, PDF OCR conversion responds consistently to documents with the same formatting. With this combination, it is possible to extract the characters and numbers that are associated with transactions from the document, and then to take those extracted bundles of text and transform them into lines for a CSV file.

The caveat is that it works for only that bank, that "kind" of account (usually checking, credit card, or savings), and when using that specific document OCR tool. Within those constraints it is eminently reliable but utterly non-transferable to a general case.
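A rough Python sketch of that idea (not the Perl original): one pattern per known layout, keyed by bank and account kind, with anything unknown rejected outright. The `LAYOUTS` table, the `examplebank` entry, its line format, and `statement_to_csv` are all hypothetical.

```python
import csv
import io
import re

# Hypothetical per-layout patterns, keyed by (bank, account kind).
LAYOUTS = {
    ("examplebank", "checking"): re.compile(
        r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
        r"(?P<description>.+?)\s+"
        r"(?P<amount>-?[\d,]+\.\d{2})$"
    ),
}


def statement_to_csv(lines, bank, kind):
    """Convert OCR'd text lines to CSV using the pattern for one known layout."""
    try:
        pattern = LAYOUTS[(bank, kind)]
    except KeyError:
        raise ValueError(f"no layout defined for {bank}/{kind}") from None
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    for line in lines:
        m = pattern.match(line.strip())
        if m:  # headers, footers, and other non-transaction lines are skipped
            amount = m["amount"].replace(",", "")  # drop thousands separators
            writer.writerow([m["date"], m["description"], amount])
    return out.getvalue()
```

Failing loudly on an unknown layout reflects the caveat above: a pattern tuned to one bank and one OCR tool should not be silently applied to another.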


If you use an AWS Config setup for the organization (aggregator), you'll get an Athena-SQL-queryable inventory of all your resources from all organization accounts.

So finding out which account owns a resource can be as simple as, roughly: select accountId where arn = "x"
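As a hedged sketch, the same lookup can be done from Python through the aggregator's advanced query API instead of Athena. It assumes boto3, working credentials, and an existing aggregator; the function names and the `aggregator_name` parameter are made up for the example.

```python
import json


def build_owner_query(arn: str) -> str:
    """Build an AWS Config advanced-query expression for one ARN."""
    if "'" in arn:
        raise ValueError("ARN must not contain a single quote")
    return f"SELECT accountId, resourceType WHERE arn = '{arn}'"


def parse_owner_results(results: list[str]) -> list[str]:
    """Each result row is a JSON string; collect the distinct account IDs."""
    accounts = {json.loads(row)["accountId"] for row in results}
    return sorted(accounts)


def find_owner(arn: str, aggregator_name: str) -> list[str]:
    """Query the aggregator (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("config")
    response = client.select_aggregate_resource_config(
        Expression=build_owner_query(arn),
        ConfigurationAggregatorName=aggregator_name,
    )
    # Only the first page is read, which should be enough for a single ARN.
    return parse_owner_results(response["Results"])
```

Something like `find_owner("arn:aws:s3:::my-bucket", "my-org-aggregator")` would then return the owning account ID(s), with both arguments being placeholders.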


You can also do this with Steampipe.

It might not scale well beyond tens of accounts though, depending on your query…


... how did I not know this existed.

That is exactly how we are set up, and I just spent so much time going account by account looking for a specific resource.

Thank you! I have long wondered why it didn't exist, and apparently it did...


Be aware that AWS Config is not free. https://aws.amazon.com/config/pricing/


Yeah, it's pretty nice feature-wise, but surprisingly expensive given that all it really does is run API calls in a loop and export to S3.


> To trigger this issue, the attacker must be on the local network [...]


> My EdgeRouter finally bit the dust

Do you still have it?

You might just have one with an on-board USB stick which is user-replaceable. Literally a USB stick in a USB type A port.

It seems that was the only part they cheaped out on, since it failed for multiple people I know that have models with the USB stick.

Source: my ERLite-3 wouldn't boot up; I changed the USB stick (with their downloadable OS image, of course) and it was good as new.

Bonus: quadrupled the available storage.


As one of the people who commented on that thread, it was really eye-opening to see the group dynamics involved between people who have experience in the domain vs those who don't.

It definitely made me look at online debates a lot differently, as previously I thought good points can come from anywhere (which can still happen), but it turns out experience in the domain is usually way more relevant.

I guess it's similar to that effect where if you see news about a topic you don't know, you tend to take/believe everything as-is, but if you happen to know the domain, you'll usually spot quite a few factual errors which tend to discredit most of the news.


Yes. Remember this fact.

My expertise is education. Everyone has opinions about that field. Many, many, many are just completely ignorant.

It's why I try not to weigh in too heavily on areas that I have no practical experience in.


From this[1] PR, it seems to be related to "sensitive information in synthetic URLs", as this[2] article was just introduced bundled with the advisory article.

So I assume the security incident is about those who:

> You have a synthetic monitor that contains sensitive information (like passwords or usernames) in a URL or script.

[1] https://github.com/newrelic/docs-website/pull/15307/files

[2] https://docs.newrelic.com/docs/synthetics/synthetic-monitori...


This is exactly the reason why I'm sticking with the CAT S6x series phones and willing to put up with mediocre performance/features, as far as smartphones go.

They've been the only ones that don't just turn off in really cold temperatures, even without babying them in warm pockets.

The general ruggedness is also pretty good to amazing, depending on how fragile your previous phones were. For example: it survived a ~10 meter (32ft) drop onto rocks, even though everyone was convinced it was done for.


Do you have one of the newer models with the thermal camera? I was very interested in seeing how well that works.


Yes, currently the S62 Pro. What would you be interested in knowing?

To be honest I first thought it would be a gimmick, but being able to see heat / rough temperature degrees has proven itself quite useful.

But you don't really realize it until you have it available. It's kind of like having another sense available to you.

Random examples of where it was useful:

- finding shitty chargers/electronics which were really hot while doing nothing -> wasted power

- spotting a water leak before it was visible (think cold spot in the middle of the ceiling)

- checking car tire alignment (one side hotter than the other)

- finding buried hot water pipes

- finding where cold air is leaking in during winter

- spotting damp areas

and probably others that I'm forgetting right now


Thank you for taking the time to respond to me and giving me a good breakdown!


any SSD recommendations?

I've just been using regular hard drives for my RAIDs with really long lifespans, think 7-10 years -- and even then replaced just out of caution, they still work -- and I'm not sure if this is also achievable with SSDs or what to even look out for


I've been buying Samsung drives for a few years and have had no problems. My general strategy is to stagger the buys, in order to /hopefully/ get drives from different manufacturing batches. However, the reason I'm using a RAID setup is so I don't have to stress over durability. If one or two drives do tank, oh well. Buy new ones, clone the contents from one of the parallel drives, and go on with life.

