I'm actually part of the team developing the new PubMed. I'm very curious to know what the Hacker News community thinks of the experience: https://pubmed.gov/labs
1) The timeline graph is very cool. However I thought clicking "download csv" would give me a CSV of all the search results, which I was very excited for! I was sad to see it only downloaded a CSV detailing the number of results per year, not the results themselves.
2) Search history (found on the "advanced" search page) is something I am usually against, but on a site like PubMed, it's very nice to have. Great job implementing that in a clear and usable way. I recommend linking to it from somewhere on the basic search page, something like "View your search history", which would also help users discover the advanced page once they land on it.
I really appreciate how fast the pages load. And despite the warning banner at the top, Javascript does not appear to be required for basic search, browse, & read functionality, which I very much appreciate.
I noticed that too. What features don't work without Javascript? I admit this suggestion is swimming against the tide, but maybe those warnings could start being more specific, so people who routinely disable Javascript can make an informed choice about whether to enable it. You wouldn't want all your painstakingly written front-end code to go to waste because people mistakenly assume it's only for annoying animations, popups, and tracking.
Frequent PubMed user here. Congrats on the launch! I'm enjoying the new UI, very shallow learning curve even for someone with a long time on the old one. The updated MeSH term display is particularly nice.
This looks great: I like the search timeline, the ability to easily search for free full-text meta-analyses (a selection bias we should all be aware of), the MeSH term listing in a reasonably-sized font, and that there's schema.org/ImageObject metadata within the page, but there's no [Medical]ScholarlyArticle metadata?
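For illustration, here is a minimal sketch of the kind of MedicalScholarlyArticle JSON-LD a result page could embed in a `<script type="application/ld+json">` tag. It assumes article pages live at `https://pubmed.ncbi.nlm.nih.gov/<PMID>/`; the function names and the choice to express MeSH descriptors as plain `keywords` are my own simplifications, not anything PubMed does today.

```python
import json


def scholarly_article_jsonld(pmid: str, title: str, authors: list[str],
                             date_published: str, mesh_terms: list[str]) -> str:
    """Build a schema.org MedicalScholarlyArticle description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "MedicalScholarlyArticle",
        # Assumed URL pattern; a stable URL gives other tools a shared identifier.
        "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
        "identifier": f"PMID:{pmid}",
        "headline": title,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": date_published,
        # Simplification: MeSH descriptors as plain-text keywords.
        "keywords": mesh_terms,
    }
    return json.dumps(doc, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD so it can be dropped into an HTML <head>."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'
```

Richer markup could use DefinedTerm for MeSH descriptors and add citation links, but even this much would be picked up by structured-data extractors.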
I've worked with Google Scholar (:o) [1], Semantic Scholar (Allen Institute for AI) [2], Meta (Chan Zuckerberg Initiative) [3], Zotero, Mendeley and a number of other tools for indexing and extracting metadata and graph relations from https://schema.org/ScholarlyArticle and MedicalScholarlyArticles. Without RDFa (or Microdata, or JSON-LD) in PDF, there's a lot of parsing that has to go down in order to get a graph from the citations in the article. Each service adds value to this graph of resources. Pushing forward on publishing linked research that's reproducible (#LinkedResearch, #LinkedReproducibility) is a worthwhile investment in meta-research that we have barely yet addressed:
> A practical use case: Alice wants to publish a ScholarlyArticle [1] (in HTML with structured data, as a PDF) predicated upon Datasets [2] (as CSV, CSVW JSON-LD, XLSX (DataDownload)) with static HTML (and no special HTTP headers). [1] https://schema.org/ScholarlyArticle [2] https://schema.org/Dataset
> B wants to build a meta-analysis: to collect a number of ScholarlyArticles and Dataset DataDownloads; review study controls and data; merge, join, & concatenate Datasets if appropriate; and inductively or deductively infer a conclusion and suggestions for further studies of variance
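To make B's last step concrete, here is a sketch of one common way to combine per-study results: a fixed-effect, inverse-variance weighted pooled estimate. It assumes the merged Datasets have already been reduced to one effect size and standard error per study on a common scale. The class and function names are illustrative, and a random-effects model would be the usual choice when studies are heterogeneous.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyEstimate:
    study_id: str      # e.g. a PMID or DOI pointing back at the ScholarlyArticle
    effect: float      # effect size on a common scale (e.g. log odds ratio)
    std_error: float   # standard error of that effect size


def fixed_effect_pool(studies: list[StudyEstimate]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    if not studies:
        raise ValueError("need at least one study")
    # Each study is weighted by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / s.std_error ** 2 for s in studies]
    total = sum(weights)
    pooled = sum(w * s.effect for w, s in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)
```

Keeping `study_id` on every estimate is what lets the pooled result link back to the ScholarlyArticles and Datasets it came from.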
The Linked Open Data Cloud shows the edges, the relations, the structured data links between very many (life sciences) datasets: https://lod-cloud.net/ . https://5stardata.info/en/ lists TimBL's suggested 5-star deployment scheme for Open Data, which culminates in publishing linked open data in non-proprietary formats that use URIs to describe and link to things.
Could any of these [1][2][3][4][5] services cross-link the described resources, given a common URI identifier such as https://schema.org/identifier and/or https://schema.org/url? ORCID is a service that issues stable identifiers to researchers, which disambiguates people who have names in common but different emails. W3C Decentralized Identifiers (DIDs) solve for this need in a different way.
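A minimal sketch of what cross-linking on a shared identifier could look like: records from several services are grouped under one normalized key. The assumption here is that every service exposes a DOI in its `identifier` field (as a `https://doi.org/` URL or a `doi:` string); the record shape and service names are hypothetical.

```python
from collections import defaultdict

# Common spellings of the same DOI across services (assumed, not exhaustive).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_id(raw: str) -> str:
    """Reduce DOI URLs and 'doi:' prefixes to a bare, lowercase DOI."""
    s = raw.strip()
    for prefix in DOI_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    # DOI names are case-insensitive, so lowercase for matching.
    return s.lower()


def cross_link(records_by_service: dict[str, list[dict]]) -> dict[str, dict[str, dict]]:
    """Group each service's record under the shared normalized identifier."""
    graph: dict[str, dict[str, dict]] = defaultdict(dict)
    for service, records in records_by_service.items():
        for rec in records:
            graph[normalize_id(rec["identifier"])][service] = rec
    return dict(graph)
```

The same grouping works with ORCID iDs or DIDs as keys for people instead of articles.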
When I check an article result page with the OpenLink OSDS extension (or any of a number of other tools for extracting structured data from HTML pages (and documents!) https://github.com/CodeForAntarctica/codeforantarctica.githu... ), there could be quite a bit more data there for search engines, browser extensions, and meta-research tools.
Is this something like Elasticsearch on the backend? It's possible to store JSON-LD documents in the search index. A few years ago I threw together elasticsearchjsonld to "Generate JSON-LD @contexts from ElasticSearch JSON Mappings" for the OpenFDA FAERS data. That's not GraphQL or SPARQL, but it's something, and it's Linked Data.
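Roughly, the idea is to walk an index mapping and emit a JSON-LD `@context` that gives each field a URI and a datatype. The sketch below is not elasticsearchjsonld's actual code. It assumes an Elasticsearch 7+ style mapping (`mappings.properties`), only handles top-level fields, and uses a hypothetical type table and vocabulary URL.

```python
# Hypothetical mapping from Elasticsearch field types to XML Schema datatypes.
# "date" -> xsd:dateTime is an approximation; ES dates may be date-only.
ES_TO_XSD = {
    "keyword": "xsd:string",
    "text": "xsd:string",
    "integer": "xsd:integer",
    "long": "xsd:long",
    "float": "xsd:float",
    "double": "xsd:double",
    "boolean": "xsd:boolean",
    "date": "xsd:dateTime",
}


def context_from_mapping(mapping: dict, vocab: str) -> dict:
    """Build a flat JSON-LD @context from the top-level properties of an index mapping."""
    props = mapping["mappings"]["properties"]
    ctx: dict = {"xsd": "http://www.w3.org/2001/XMLSchema#", "@vocab": vocab}
    for field, spec in props.items():
        term = {"@id": vocab + field}
        xsd_type = ES_TO_XSD.get(spec.get("type"))
        if xsd_type:
            term["@type"] = xsd_type
        # Object fields (no "type" key) get an @id only; nesting is not handled here.
        ctx[field] = term
    return {"@context": ctx}
```

Stored alongside each indexed document, a context like this lets a plain search response be read as Linked Data.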
> We really could get more out of this data through international collaboration and through linked data (e.g. URIs for columns). See: "Open, and Linked, FDA data" https://github.com/FDA/openfda/issues/5#issuecomment-5392966... and "ENH: Adverse Event Count / 'Use' Count Heatmap" https://github.com/FDA/openfda/issues/49 . With sales/usage counts, we'd have a denominator with which we could calculate relative hazard.
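The denominator point in code: with usage counts, raw adverse-event counts become rates that can be compared between drugs. This is a sketch of a crude rate ratio with a Wald confidence interval on the log scale, assuming Poisson counts. It is not a true hazard ratio (that needs time-to-event data), and spontaneous-report data such as FAERS carries reporting biases that this ignores. The function names and inputs are illustrative.

```python
import math


def reporting_rate(event_count: int, usage_count: int) -> float:
    """Adverse-event reports per unit of use (e.g. per prescription filled)."""
    if usage_count <= 0:
        raise ValueError("usage_count must be positive")
    return event_count / usage_count


def rate_ratio_ci(events_a: int, usage_a: int, events_b: int, usage_b: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Crude rate ratio of drug A vs drug B with a log-scale Wald interval.

    Treats counts as Poisson; requires at least one event in each group.
    """
    if events_a <= 0 or events_b <= 0:
        raise ValueError("need at least one event in each group")
    rr = reporting_rate(events_a, usage_a) / reporting_rate(events_b, usage_b)
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)
```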
W3C Web Annotations handle threaded comments and highlights; reviewing the reviewers is left as an exercise for the reader.
Does Zotero still make it easy to save the bibliographic metadata for one or more ScholarlyArticles from PubMed to a collection in the cloud (and add metadata/annotations)?
Sorry to toot my own horn here.
Great job on this. This opens up many new opportunities for research.
PubMed publishes its dataset for download. It's rather large, but update files come frequently. It's amazing. I believe NIH adds the MeSH terms.
ftp://ftp.ncbi.nlm.nih.gov/pubmed/
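As a starting point for working with those files, here is a sketch that streams PubmedArticle records out of a downloaded `.xml.gz` file with `iterparse`, so a large file never has to sit in memory as a whole tree. It assumes the standard PubMed XML layout (MedlineCitation with PMID, Article/ArticleTitle, Article/AuthorList, MeshHeadingList). The fields pulled out are just the ones useful for the kind of project described below.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_citations(fileobj):
    """Stream PubmedArticle records, yielding PMID, title, authors, and MeSH descriptors."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        cit = elem.find("MedlineCitation")
        title_el = cit.find("Article/ArticleTitle")
        yield {
            "pmid": cit.findtext("PMID"),
            # itertext() keeps text inside inline markup such as <i>...</i>.
            "title": "".join(title_el.itertext()) if title_el is not None else "",
            "authors": [
                (a.findtext("LastName") or "", a.findtext("ForeName") or "")
                for a in cit.findall("Article/AuthorList/Author")
            ],
            "mesh": [d.text for d in cit.findall("MeshHeadingList/MeshHeading/DescriptorName")],
        }
        elem.clear()  # release the finished record's children


def iter_citations_from_file(path: str):
    """Read a downloaded .xml.gz baseline or update file."""
    with gzip.open(path, "rb") as f:
        yield from iter_citations(f)


def iter_citations_from_bytes(data: bytes):
    """Parse an in-memory XML payload (handy for small tests)."""
    yield from iter_citations(io.BytesIO(data))
```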
We had someone do a project with it: they downloaded the dataset and built a tool for the searches we found useful for finding collaborators (last author, working on a specific gene, paper counts, most recent).
Searching by MeSH terms across species, and searching with orthologs.
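A tool like that can be sketched as a ranking of last authors. It assumes each record has `authors` (in byline order), `mesh`, and a hypothetical `year` field. Passing a set of terms lets a gene and its orthologs in other species be searched together; the ortholog list itself would have to come from elsewhere, and the term labels in the usage line are made up.

```python
from collections import defaultdict


def find_collaborators(records: list[dict], terms: set[str]) -> list[tuple[str, dict]]:
    """Rank last authors of papers tagged with any of `terms`, by paper count then recency."""
    stats: dict[str, dict] = defaultdict(lambda: {"papers": 0, "latest_year": 0})
    for rec in records:
        if not rec["authors"] or not terms.intersection(rec["mesh"]):
            continue
        last_author = rec["authors"][-1]  # often the lab head in biomedical papers
        s = stats[last_author]
        s["papers"] += 1
        s["latest_year"] = max(s["latest_year"], rec["year"])
    return sorted(stats.items(),
                  key=lambda kv: (kv[1]["papers"], kv[1]["latest_year"]),
                  reverse=True)
```

Usage: `find_collaborators(records, {"GeneX (human)", "GeneX (mouse)"})`. Keying on name strings like this is exactly where the disambiguation problem bites.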
The dataset sometimes has a hard time disambiguating author names (I think the European dataset assigns IDs to names).
Updated Technology
The updated version of PubMed uses Solr, an open-source enterprise search system, for document indexing, and MongoDB for storage and retrieval (see Figure 4). In addition to its scalability and reliability, Solr also provides many powerful out-of-the-box search functionalities, such as wildcards (‘*’), groupings, and joins. For example, unlike the current production PubMed, the updated version does not limit the number of variants for wildcards. The MongoDB storage solution provides default data replication between different data centers, which ensures redundancy. The updated PubMed runs on a modern cloud architecture that provides scalability and a reliable backup environment.
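To show what an unrestricted wildcard looks like at the Solr level, here is a sketch that builds a request to Solr's standard `/select` handler with the usual `q`, `rows`, `start`, and `wt` parameters. The host, core name, and `title` field are hypothetical. PubMed's Solr instance isn't a public API, so this only illustrates the query shape against a generic Solr install.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str,
                    rows: int = 10, start: int = 0) -> str:
    """Build a Solr /select URL; q may contain wildcards such as 'title:cancer*'."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    # urlencode percent-encodes ':' and '*' so the query survives as one parameter.
    return f"{base.rstrip('/')}/solr/{core}/select?{urlencode(params)}"
```

For example, `solr_select_url("http://localhost:8983", "pubmed", "title:cancer*")` yields `http://localhost:8983/solr/pubmed/select?q=title%3Acancer%2A&rows=10&start=0&wt=json`.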
The updated PubMed uses the Django Web framework on the front-end, making use of the latest web technologies and standards.