That's pretty much the current state of knowledge.

Terms you want to check for more detailed info are 'fluid intelligence' and 'crystallized intelligence', but you basically nailed it.


I've seen 'liquid' as well as 'fluid' intelligence, but 'fluid' and 'crystallized' seem to be the terms the scientific community uses.

My use case is research papers. That means very clear text, combined with graphs of varying form and quality, and occasional formulas.

Three approaches I had the most, though not full, success with are: 1) converting to images with pdf2image, then reading with pytesseract, 2) throwing whole PDFs into pypdf, 3) experimental multimodal models.

The more predictable you can make the content, the better it will go (if you know a part is going to be pure text, just put it through pypdf; if you know it's going to be a math formula, explain the field to the model and have it read the formula back for a high-accessibility-needs audience), but it continues to be a nightmare and a bottleneck.
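
For concreteness, a minimal sketch of 1) and 2), assuming poppler and tesseract are installed and with "paper.pdf" as a placeholder path:

    # Approach 1: rasterize each page, then OCR it.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path("paper.pdf", dpi=300)  # higher DPI helps OCR
    ocr_text = "\n".join(pytesseract.image_to_string(p) for p in pages)

    # Approach 2: pull the embedded text layer directly.
    from pypdf import PdfReader

    reader = PdfReader("paper.pdf")
    raw_text = "\n".join(page.extract_text() or "" for page in reader.pages)

Approach 2 is fast and exact when there is a clean text layer; approach 1 is the fallback for scanned or mangled pages.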


Depending on how much structure you want to extract before passing the PDF contents to the next step in your pipeline, this paper[1] might be helpful in surfacing more options. It's a review/benchmark of numerous tools applied to information extraction from academic documents. I haven't been through it to evaluate the solutions they examined, but it's how I discovered GROBID, and IMO it lays out the strengths of each approach clearly.

[1] https://arxiv.org/pdf/2303.09957


I have great news I wish someone had delivered to me when I was in your shoes: try GROBID. It parses papers into objects with abstract/body/figures! It will help you out a great deal. It is designed for papers and can extract the text almost flawlessly, while also giving information on graphs for separate processing. I have several years of experience with academic text processing (including presentations) from working with an academic publisher, if I can be helpful with anything.
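
If it helps you get started: GROBID runs as a local server and exposes a REST API that returns TEI XML. A minimal sketch, assuming the default port and "paper.pdf" as a placeholder:

    import requests

    # Assumes a GROBID server running locally on its default port (8070).
    with open("paper.pdf", "rb") as f:
        resp = requests.post(
            "http://localhost:8070/api/processFulltextDocument",
            files={"input": f},
        )
    resp.raise_for_status()
    tei_xml = resp.text  # TEI XML: abstract, body, figures, references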


I have no idea how I missed it the last time I was looking around, unless it grew significantly over the last half a year or so. I'll check it out when I get back to this project, thanks.

I wish I were hiring, if that's what you're asking ;) Otherwise, if you have any ideas for processing formulas, I'd love to hear them (even just for reading them out, though any extra step towards expressing what they mean would help; ''sum divided by count' is the 'mean'/'average' value' is the simplest example I can think of). Novel ideas in technical papers are often expressed with formulas which aren't that complicated conceptually, but are critical to understanding the whole paper, and that was another piece I was having very mixed results with.


No worries. Sure, as to formulas... I suspect many of them are LaTeX. If it is possible to parse that, it could help? At sufficient picture quality, vision models can also accurately parse images of formulas back into LaTeX.

Neither will probably get you all the way to a "readable" formula system, because in my experience the readers that do this for LaTeX or plain formula text have flaws anyway (it's also slightly cultural and dependent on the field of study). Maybe the best bet is a prompt to a vision model like "read this formula out loud in a digestible, understandable, concise way"... though this may have issues with recall accuracy.
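
As a sketch of that last idea, assuming an OpenAI-style vision endpoint and a pre-cropped formula image ("formula.png" is a placeholder); swap in whichever multimodal API you're actually using:

    import base64
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    with open("formula.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Read this formula out loud in a digestible, "
                         "understandable, concise way."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)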


Check out appjsonify for research papers.


Copying my comment from another thread on the same study (https://news.ycombinator.com/item?id=40933910):

For the pure fun of breaking the narrative, I found the original article; it's here: https://bmjpublichealth.bmj.com/content/2/1/e001000

The time of day (or time since waking, per subject) when the tests were administered was not controlled for. Cognitive abilities are mediated by the wakefulness cycle (not to mention, for most people, the related digestive processes).

If '"Night owls" smarter than morning people' sounds more plausible than 'time since waking up and last meal predictive of cognitive performance' it's time to get one's identity checked. And I can't imagine 'journalists' from thrash like Sky (Guardian this time) not knowing that, which brings me to the final point: what is this link doing here?


> what is this link doing here?

People fall for the clickbait because it tells them something they want to believe. Motivated reasoning is powerful.

(And for those who don't want to believe it, they may read the article anyway because they want to debunk the claim. It's perfect clickbait, targeting potentially 100% of the audience.)



And it got duplicated; two submissions about the same subject each got similar ~20ish upvotes...


https://en.wikipedia.org/wiki/AI_effect We already have a name for it; it just needs to be properly propagated until there's no value left in calling things 'AI'.


Is there, or do you plan on creating in the foreseeable future, a post mortem of the tracking app beyond the note on your Reddit?


Very sad to see this company's greedy move: from an excellent, unique B2C app that could have had an excellent future to a B2B API. The worst thing is that they ignored the loyal users; they didn't say anything about the future of the app or care about the people who already paid.


I wouldn't blame anyone for deciding that the point of a company is to maximize shareholder value, especially here, but that's not what I mean at all.

There are a lot of similar apps, and people seem to use and like all of them; I'm just curious how it looked from the inside of this one.


The author mentions, but doesn't focus on, work being too challenging or not challenging enough. I wrote a fair bit about it here (with a slightly different focus, as the name suggests, but I go over the original research first): https://incentiveassemblage.substack.com/p/why-is-nobody-ser.... I'm not sure why challenge level gets less attention than lack of interruptions; both seem about equally demanding on the environment, including managers, and take a similar amount of work to adjust.

Either way, to save you a click: Csikszentmihalyi's research wasn't mainly about cognitive load, because we already had a fair bit of research on cognitive load. That research seems insufficient (although I do have my reservations), but the combination of task complexity and whatever additional issues are happening is a pretty solid predictor* of performance. The challenge/skill 'graph' presented can be reinterpreted with challenge/skill on the X axis and a parallel flat line above it. An even better, and empirically supported, graph can be seen in the first image of the post I linked, but it is a bit much to paint with words.

Flow research is cool, but there are simpler and more actionable tools.

*The observant reader may notice that this is because of the lack of units, but we do have physiological indicators if one desires to monitor them.


You bring up a good point. When work is very challenging, I get exhausted and need to take a break. However, I don't see that as a threat to the flow state, because I see no point in trying to keep the flow going at that point; I need a break anyway. So I don't see it as an interruption but more like having reached my limits. I have been wracking my brain over this piece of code and I don't understand what's wrong; then it's time to take a step back, take a break, do something else, and look at the problem again with a fresh mind tomorrow.


For an LLM to lie, it would need to know the truth. That's an incredible level of anthropomorphization.


Whisper doesn't, but WhisperX <https://github.com/m-bain/whisperX/> does. I am using it right now and it's perfectly serviceable.

For reference, I'm transcribing research-related podcasts, meaning speech doesn't overlap a lot; overlap would be a problem for WhisperX, from what I understand. There are also a lot of accents, which strain Whisper (though it's doing well there too) but surely help WhisperX tell speakers apart. It did have issues with figuring out the number of speakers on its own, but that wasn't a problem for my use case.
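
In case it saves anyone time, my setup looks roughly like this, following the WhisperX README (names may have shifted between versions; the diarization step needs a Hugging Face token for the pyannote models, and "podcast.mp3" and the speaker counts are placeholders):

    import whisperx

    device = "cuda"  # or "cpu"
    audio = whisperx.load_audio("podcast.mp3")

    # Transcribe, then align words to precise timestamps.
    model = whisperx.load_model("large-v2", device, compute_type="float16")
    result = model.transcribe(audio, batch_size=16)
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device)
    result = whisperx.align(result["segments"], align_model, metadata,
                            audio, device)

    # Diarize and attach speaker labels; pinning min/max speakers works
    # around the speaker-count issue I mentioned.
    diarize_model = whisperx.DiarizationPipeline(
        use_auth_token="YOUR_HF_TOKEN", device=device)
    diarize_segments = diarize_model(audio, min_speakers=2, max_speakers=2)
    result = whisperx.assign_word_speakers(diarize_segments, result)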


WhisperX does diarization, but I don't see any mention of it fulfilling my ask, which makes me think I didn't communicate it well.

Here’s an example for clarity:

1. An AI is trained on the voice of a podcast host. As a side effect, it now (presumably) has all the information it needs to replicate the voice.

2. All the past podcasts can be processed, with the AI comparing each detected voice against the known voice, which leads to highly accurate labelling of that person.

3. Probably a nice side bonus: if two people with different registers are speaking over each other, the AI could separate them out. "That's clearly person A, and the other one is clearly person C."


You can check out PicoVoice Eagle (paid product): https://picovoice.ai/docs/eagle/

You pass N PCM frames through their trainer, and once enrollment reaches a certain percentage you can extract an embedding that you can save.

Then you can identify audio against the set of enrolled speakers, and it will return a percentage match for each.
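
From memory, the Python SDK flow looks roughly like this; treat it as a sketch and check their docs for exact names (next_enroll_frames/next_test_frame are hypothetical helpers yielding 16 kHz PCM, and the access key is a placeholder):

    import pveagle

    ACCESS_KEY = "..."  # placeholder Picovoice access key

    # Enrollment: feed PCM frames until the profile reaches 100%.
    profiler = pveagle.create_profiler(access_key=ACCESS_KEY)
    percentage = 0.0
    while percentage < 100.0:
        percentage, feedback = profiler.enroll(next_enroll_frames())
    profile = profiler.export()  # the saveable speaker embedding

    # Recognition: score incoming audio against each enrolled speaker.
    recognizer = pveagle.create_recognizer(
        access_key=ACCESS_KEY, speaker_profiles=[profile])
    scores = recognizer.process(next_test_frame())  # one score per profile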


In all seriousness, if we give up on solving the problem we would like to see solved, and instead try to solve what decision makers see as their problem, then providing feed management tools to managers (as well?) would probably be an improvement.

