Hacker News
Show HN: An extension to track your Wikipedia adventures (chromewebstore.google.com)
178 points by demegire 7 months ago | 63 comments
Wiki Journey tracks your daily Wikipedia rabbit holes in a tree format.

Available on Firefox and Chrome: https://addons.mozilla.org/en-US/firefox/addon/wiki-journey/ https://chromewebstore.google.com/detail/wiki-journey/lehenb...

It's open source, feel free to contribute! https://github.com/demegire/wiki-journey




I wrote a plugin just like this, and every day I have it present me with a quiz based on summaries of the first paragraphs of the pages I read that day.

Basically, I was reading way too much Wikipedia and not actually storing much information, so I have the extension shame me if I don't remember what I read.
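The commenter doesn't describe how the quiz is built. As a minimal sketch, assuming you already have each visited page's first paragraph (one possible source is Wikipedia's REST `page/summary` endpoint, whose `extract` field holds the lead text), a cloze-deletion quiz can blank out each page's title in its own summary. `make_cloze` and `daily_quiz` are hypothetical names.

```python
from __future__ import annotations

import random
import re


def make_cloze(title: str, first_paragraph: str) -> tuple[str, str] | None:
    """Blank out a page's title in its own lead paragraph; return (question, answer)."""
    # Word boundaries avoid blanking "tea" inside "steam"; matching is case-insensitive.
    pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
    if not pattern.search(first_paragraph):
        return None  # title never appears verbatim, so no usable question
    return pattern.sub("_____", first_paragraph), title


def daily_quiz(pages: dict[str, str], n: int = 3, seed: int | None = None) -> list[tuple[str, str]]:
    """Pick up to n cloze questions from a {title: first paragraph} mapping."""
    cards = [card for title, text in pages.items() if (card := make_cloze(title, text))]
    return random.Random(seed).sample(cards, k=min(n, len(cards)))


pages = {
    "Ouroboros": "The ouroboros is an ancient symbol depicting a serpent eating its own tail.",
    "Tea": "Tea is an aromatic beverage prepared by pouring hot water over cured leaves.",
}
for question, answer in daily_quiz(pages, seed=1):
    print(question, "->", answer)
```

Blanking the title is the crudest possible question generator; pages whose lead paragraph never repeats the title are simply skipped.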


That's genius. Have you published this as an extension? I'd love automatically-written flashcards to quiz myself on what I've read that day...



We absorb the words in front of our eyes even if we're not conscious of it. A topic that you glossed over may come up in another context and remind you of that wiki article.

It shapes who we are.

And sometimes knowledge of the existence of a topic is valuable.


I remember seeing an article about it on HN.


I would love to mess around with this if you've published it somewhere! Folks would love it as a Show HN I bet too!


Have you tried out a tangled-tree visualization? [1] I've found it to be super useful when visualizing these sorts of relationships in a compact way, and it naturally sorts the data topologically.

[1] https://observablehq.com/@nitaku/tangled-tree-visualization-...


Very cool! One small point of pedantry:

> A tree with multiple inheritance (sometimes called tangled tree) cannot be represented by using a classic tree visualization. It is technically a directed acyclic graph (DAG) with one (or more) nodes identified as root.

What is the difference between a DAG and a tangled tree? Isn't any DAG a tangled tree? I don't see immediately why a new definition is required.


I'm not entirely familiar with tangled trees, but it seems like one of the larger differences is that a tangled tree isn't necessarily acyclic. For this example, someone could navigate away from one page, but potentially be linked back to it later down the adventure.


> A tree with multiple inheritance (sometimes called tangled tree)

By the author's definition, multiple inheritance prohibits cycles. A DAG can be modeled as a tree plus extra edges to non-ancestors (forward or cross edges, in DFS terms). So I'm pretty sure tangled tree = DAG.

> For this example, someone could navigate away from one page, but potentially be linked back to it later down the adventure.

Good point, maybe "tangled tree with back edges to ancestors" is the more accurate model for what the author wants. The key point of the visualization is to highlight the deviation from a standard DAG or tree.


The author already says that:

> It is technically a directed acyclic graph (DAG)

But DAGs don't have 'roots'; they just have nodes. The concept of roots makes it a tangled tree.
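The distinction being argued here maps onto the standard depth-first-search edge classification: the part of a directed graph reachable from the chosen roots is acyclic exactly when DFS finds no back edges (edges to an ancestor still on the recursion stack), while forward and cross edges are the "extra" edges that make a DAG tangled rather than a plain tree. A minimal sketch with made-up page titles as data; `classify_edges` and `has_cycle` are illustrative names, not part of any library.

```python
from __future__ import annotations


def classify_edges(graph: dict[str, list[str]], roots: list[str]) -> dict[tuple[str, str], str]:
    """Label each edge reachable from `roots` as tree, back, forward, or cross."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the recursion stack, finished
    color: dict[str, int] = {}
    discovered: dict[str, int] = {}
    labels: dict[tuple[str, str], str] = {}
    clock = 0

    def dfs(u: str) -> None:
        nonlocal clock
        color[u] = GRAY
        discovered[u] = clock
        clock += 1
        for v in graph.get(u, []):
            if color.get(v, WHITE) == WHITE:
                labels[(u, v)] = "tree"
                dfs(v)
            elif color[v] == GRAY:
                labels[(u, v)] = "back"     # v is an ancestor still on the stack: a cycle
            elif discovered[u] < discovered[v]:
                labels[(u, v)] = "forward"  # v is an already-finished descendant of u
            else:
                labels[(u, v)] = "cross"    # v sits in an earlier, unrelated subtree
        color[u] = BLACK

    for root in roots:
        if color.get(root, WHITE) == WHITE:
            dfs(root)
    return labels


def has_cycle(graph: dict[str, list[str]], roots: list[str]) -> bool:
    """A cycle is reachable from the roots iff DFS finds at least one back edge."""
    return "back" in classify_edges(graph, roots).values()


tangled = {"Tea": ["China", "Camellia"], "China": ["Camellia"], "Camellia": []}
print(classify_edges(tangled, ["Tea"]))
print(has_cycle(tangled, ["Tea"]))
```

In this vocabulary, the "tangled tree" of the notebook has tree, forward, and cross edges but no back edges, while a browsing session that returns to an earlier page produces a back edge.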


Is the source code for the visualization available?


That's a live notebook! If you click on the cells, you can see the code that was used to create it, like a Jupyter notebook.


Ah, thanks! That wasn't obvious on mobile.


This looks really neat


I feel like there's a lot of knowledge or information that we're "leaving on the plate". For instance, the sites we visit, the files we edit, the branches and PRs we create, etc etc. All of that is related, but it feels like that context is being lost or discarded.

An example might be: I have to include new AWS resources in a deployment, so I look up information about them, find examples and read about potential problems, security information, etc. That then becomes edits in a terraform file somewhere, with a Jira ticket, my own knowledge database (Emacs org-roam files in my case, Obsidian etc for other people). Then the feature branch gets a PR to dev, and we might discuss changes in Teams (ugh) or in a meeting. All of that seems ripe to be linked together conceptually, but the computer has no way to do that.

It makes me wonder if that could be fed into the right machine learning thing to at least start tracking this sort of work. Heck, just synchronizing my Firefox bookmarks (Firefox lets you tag your bookmarks) with my org-roam instance's tags would be useful. Tagged files in my knowledge base could be automatically linked to similarly tagged bookmarks.
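Reading Firefox's bookmark tags or org-roam's database is left out here. As a sketch assuming both have already been exported into plain `{name: set of tags}` mappings (hypothetical inputs, not either tool's real format), the automatic linking is a tag-overlap join, ranked by how many tags match:

```python
from __future__ import annotations


def link_by_tags(
    notes: dict[str, set[str]],
    bookmarks: dict[str, set[str]],
    min_shared: int = 1,
) -> dict[str, list[str]]:
    """Map each note to bookmarks sharing at least `min_shared` tags, most overlap first."""
    # Case-insensitive matching, so "AWS" and "aws" count as the same tag.
    norm_bookmarks = {url: {t.lower() for t in tags} for url, tags in bookmarks.items()}
    links: dict[str, list[str]] = {}
    for note, tags in notes.items():
        note_tags = {t.lower() for t in tags}
        scored = [
            (len(note_tags & bm_tags), url)
            for url, bm_tags in norm_bookmarks.items()
            if len(note_tags & bm_tags) >= min_shared
        ]
        # Highest overlap first; ties broken by URL for a stable order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        links[note] = [url for _, url in scored]
    return links


notes = {"aws-deploy.org": {"AWS", "terraform"}}
bookmarks = {
    "https://example.com/iam": {"aws", "iam"},
    "https://example.com/tf": {"terraform", "aws"},
    "https://example.com/cats": {"cats"},
}
print(link_by_tags(notes, bookmarks))
```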


I like these pieces of my digital footprint to not be connected. There is no need to track everything.


Do you not want them connected, or do you not want the connections shared and potentially used against you?


I like them not to be collected or connected. I don't trust those collecting such data.


I typically write up all of that in my documentation somewhere. Stuff like "first thoughts are this approach might work, talked to person who had this idea, looked at this link and found this info, decided to go with this approach because of factors x, y, z". This isn't the primary user-facing documentation but a subpage or something that's helpful a couple of years down the line.

It's like a book titled "A History of [Object]" that traces what solved problems before the object, issues with old solutions, the emotional, financial, etc state of the inventor, why they chose this solution over that one, how the object was adopted and improved afterwards, other inventions spawned off the object, etc. Capturing the history of the object requires capturing the context around the object too.


My thoughts on this are to slow down and document and explore that knowledge and information. If it is really valuable, the "loss" in efficiency from slowing down will be offset by the gain in skill/utility from really grokking the stuff.

If it's not...then there's really nothing "left" on the table; if it ever turns out to be valuable, you'll probably come across it again, when needed.

I constantly get a similar feeling. I'm speeding around from task to task, just grasping enough to get the current task done so I can get to the next one and the next one...

And somehow this is value-creating? Apparently it is, but it seems almost accidental, at that rate.

I'd rather slow down and appreciate the value as it moves through me, into whatever I'm doing.

I usually get more from the process, at the same time.


It's like...if "less is more," then "more is less."

Reminds me of floating-point numbers. The bigger they get, the wider the gaps between representable values, and the tiniest (subnormal) ones lose precision too.

If you're chunking on a ton of data and tasks, you're getting less out of it. At a certain point, none of it even seems to enter your brain at all.
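For the floating-point half of the analogy, Python's `math.ulp` (Python 3.9+) gives the spacing between a double and the next larger one. It grows with magnitude, so large values can't absorb small increments, and subnormal values near zero keep far fewer significant bits:

```python
import math

# Gap to the next representable double at increasing magnitudes.
for x in (1.0, 1e8, 1e16):
    print(f"ulp({x:g}) = {math.ulp(x):g}")

# At 1e16 the gap is 2.0, so adding 1.0 rounds straight back to 1e16 (prints True).
print(1e16 + 1.0 == 1e16)

# Relative spacing: about 2.2e-16 for normal doubles, much coarser for subnormals.
print(math.ulp(1.0) / 1.0, math.ulp(1e-320) / 1e-320)
```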


Basically, this is what college should be teaching you - how to research. What good are useless facts? I don't want to walk around cluttered with a dictionary - I want to know where to look in that dictionary. Obviously in the sciences there are facts that you should know, but even with math, it's more about how to derive the formula than actually memorizing it. I mean, they're called "Research Papers", right?


Totally agree. I remember the phrase “learning how to think” being thrown around.

I also remember not being explicitly taught that.

It sort of seems like trying to find enlightenment by chopping wood and carrying water at a monastery.

If critical thinking is something that spontaneously emerges in a learning environment, maybe we shouldn’t sell it as a benefit. “Some students experience deep insight into the nature of the mind. Results not typical.”


I've been thinking of something like this since LLMs became popular. I've toyed around with some proofs of concept, but haven't had the time or motivation to work on it lately. I love the idea of tagging everything and showing connections when you're searching for things. Semantic search would also be great; something like "blue website with information about databases I read last week" would be super powerful in my opinion.

I really love the idea of digital knowledge bases, but as you said, I think we're leaving a lot on the table. I need to get back to my project of a user-owned-data knowledge base.


What kind of approach did you take? I was thinking along the lines of requiring something like rewind.ai or some program that autoscreenshots your screen at a set interval (or originally a recorded video split into several images later) and having a vision-capable model (particularly specialized in UIs) describe this set of images in order to build a dataset of images, tags, descriptions and the like.


There are also libraries like trafilatura in Python, featured here on HN some time ago, that could extract content from websites to help augment the data.


I've had similar thoughts but over time you'd just end up with a private copy of the internet. You'll still have to search for the information anyway, so I'm not sure what the benefit is. Searching your knowledge base for "the thing I did yesterday" vs "how to sync Azure to AD" seems basically equivalent to me. You're just creating yet another thing to search.


That's a good point, you'd absolutely want to get away from adding another burden to the human.

Seeing relevant bookmarks when I'm viewing a specific note in my database could be useful though. And finding pull requests related to a subject might also be useful.

So the idea would be to reduce the number of searches performed by the human. Automate and enhance rather than dump and forget.


Yeah, but your private copy would be more like "The internet: The Good Parts" (assuming you had a way to not store what you immediately dismissed as garbage; maybe only include pages with a dwell time of 15-30s or more.) That's enormously valuable (and why I've implemented it before - but in conkeror, which didn't survive the death of xulrunner - so now I use pinboard and text files and logseq, which are pretty good but a lot more work.)
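The dwell-time filter is easy to sketch. Assuming a chronological list of `(timestamp_seconds, url)` visits from some history export (a hypothetical input format), the gap until the next visit approximates time spent on a page; repeat visits accumulate, and the final visit has no known end, so it is ignored. The 15-second default follows the comment.

```python
from __future__ import annotations


def pages_worth_keeping(visits: list[tuple[float, str]], min_dwell_s: float = 15.0) -> list[str]:
    """Return URLs whose total dwell time (gap to the next visit) is at least min_dwell_s."""
    visits = sorted(visits)  # chronological order by timestamp
    dwell: dict[str, float] = {}
    for (t, url), (t_next, _) in zip(visits, visits[1:]):
        dwell[url] = dwell.get(url, 0.0) + (t_next - t)  # repeat visits add up
    # dicts keep insertion order, so results follow the order of first visit.
    return [url for url, secs in dwell.items() if secs >= min_dwell_s]


history = [(0, "a"), (5, "b"), (40, "c"), (41, "a"), (60, "d")]
print(pages_worth_keeping(history))
```

Using the gap to the next visit overestimates dwell time when a tab is left open in the background; a real implementation would probably want focus/visibility events instead.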


To whatever extent something like this can be done locally, I'd probably pay a monthly sub for it if it's good enough. But I wouldn't want any of that leaving my machine; we get tracked and profiled enough as-is imo.


Yeah, this is worth at least as much as Kagi or Copilot is to me right now.


Working on something like that, but there’s still a good amount of work to do


That's cool.

I do find it ironic, though, that Wikipedia is one of the major sites with the least user tracking, and then users decide to implement the tracking themselves.


That is funny, though this is more tracking-by-users than tracking-of-users



tracking for-the-benefit-of users, which only has to be done by the users because no services can be trusted :-)


This is cool, I love how it shows you all the branches you've followed in an actual tree diagram.

The concept reminds me of https://browser.horse/ a bit, which has the concept of "trails" that track any links you visit. Great for research projects.


Cool tool. It might be cool to make something Wikipedia-agnostic. Sometimes I manually create such a thing in Obsidian, but it's kind of tedious. It's interesting how different starting sources, read far apart in time, sometimes lead to rabbit holes that cross paths.

This reminds me of a python scraper I wrote a while back when I was learning to program - Youtube rabbithole: https://github.com/BlairCurrey/youtube-rabbithole

It basically just follows the next recommended video, recording the path along the way. It's more about tracing the YouTube algorithm than tracking your own journey.


Looking at https://github.com/demegire/wiki-journey/blob/main/firefox/c...

It seems likely that the extension could be customized for any MediaWiki instance? As an admin I'd love to be able to use it elsewhere. This looks like it could be a great tool for working with test users on stuff like information architecture, to see the path of how they found information. (I know there are better tools for that, but something that focuses tightly on wiki interactions would be useful to me.)
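How the extension itself matches pages isn't visible from the truncated link above. As an independent sketch, assuming the target wiki uses the common `/wiki/Title` short-URL layout (MediaWiki's `$wgArticlePath` setting can differ per wiki) with `index.php?title=...` as a fallback, pulling a page title out of any MediaWiki URL could look like this. `mediawiki_title` is a hypothetical helper, and filtering namespaces such as `Special:` is left out.

```python
from __future__ import annotations

from urllib.parse import parse_qs, unquote, urlparse


def mediawiki_title(url: str, article_prefix: str = "/wiki/") -> str | None:
    """Return the page title for a MediaWiki article URL, or None if it isn't one."""
    parts = urlparse(url)
    if parts.path.startswith(article_prefix):
        title = unquote(parts.path[len(article_prefix):])  # e.g. Caf%C3%A9 -> Café
    else:
        # Fallback for long-form URLs like /w/index.php?title=Main_Page&action=history.
        title = parse_qs(parts.query).get("title", [""])[0]  # parse_qs already decodes
    return title.replace("_", " ") or None  # MediaWiki stores spaces as underscores


print(mediawiki_title("https://wiki.example.org/wiki/Information_architecture"))
```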


That’s a very cool project and I wish something like this would exist for all websites.

A few years ago I did a university project where we looked into (internet) research and how information discovery and gathering could be improved. (https://www.kaimagnus.de/projects/halo)

There we had the concept of a similar looking tree. Users could then come back to their exploration and take notes, prioritize and sort.

It was only a concept back then, so it’s nice to see it in action.


Similarly, is there perhaps also a browser extension that shows a tree graph, or a directional node graph like in Obsidian, for the sequence of websites in your browser history, so you can see your whole rabbit hole on the Internet? I'm pretty sure the tech is already used by the advertising industry.


Suddenly I am reminded, for the first time in maybe 2 decades, that "surfing the internet" was once a term used specifically for this kind of rabbit-holing


This is really cool! It would be super neat if the nodes were more interconnected, forming a fully connected graph rather than just a tree.


This is tracking the user's trajectory through the site, which is necessarily a tree, not the network structure of Wikipedia itself.


How is the user journey through the site necessarily a tree? What prevents the user from creating loops in their journey?


Not an absolute statement, just that it resembles a tree more closely as you branch off clicking on hyperlinks.


Oo, yeah, that's a good point! I totally see why it was done this way now.

Though I do still think it would be cool to have a toggleable overlay or something that shows the cyclic connections!


That would be technically more "accurate", but it doesn't yield more useful information and ends up being harder to read.
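One way to get both views: record every click as a fresh tree node, so a revisited page simply reappears further down and the journey stays a tree, while separately collecting the clicks that land on an already-seen page as an optional overlay of cycle-closing edges. This is a sketch of the idea, not how Wiki Journey actually stores its data; `Node` and `Journey` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    children: list[Node] = field(default_factory=list)


class Journey:
    """Hypothetical recorder: a tree of clicks plus an overlay of revisit edges."""

    def __init__(self, start: str) -> None:
        self.root = Node(start)
        self.seen = {start}
        self.revisits: list[tuple[str, str]] = []  # (from_title, to_title) edges closing a loop

    def click(self, parent: Node, title: str) -> Node:
        if title in self.seen:
            self.revisits.append((parent.title, title))  # would form a cycle in the page graph
        self.seen.add(title)
        child = Node(title)
        parent.children.append(child)  # always a fresh node, so the structure stays a tree
        return child

    def titles(self) -> list[str]:
        """Pre-order walk of the tree, i.e. the order a tree diagram lists pages."""
        out, stack = [], [self.root]
        while stack:
            node = stack.pop()
            out.append(node.title)
            stack.extend(reversed(node.children))
        return out


j = Journey("Tea")
china = j.click(j.root, "China")
j.click(china, "Silk Road")
j.click(j.root, "Camellia")
j.click(china, "Tea")  # back to the starting page
print(j.titles(), j.revisits)
```

The tree alone stays easy to read; the `revisits` list is what a toggleable overlay would draw on top of it.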


Interesting. I’ve been using the Wikipedia iOS app (which saves history by the day) to keep track of my personal rabbit hole journeys…


This is one of those ideas where you think "why the hell didn't I think of this?"


POV: It's 4 AM and I still can't fall asleep


This is great! Will give it a try later.


Narrator: they didn't.


Have to admit I'm slightly disappointed that the FF version only shows two users still and one of them is me.


I didn't know I needed this.


A graph of Wikipedia rabbit holes


wow, this is great


Obligatory xkcd: https://xkcd.com/214/


My Wikipedia searches are like my porn searches: no one needs to know about them, least of all myself. They bring only shame and remorse.


This is fantastic. Great idea!


I'll mention that I made what could be described as an AI-generated Wikipedia alternative, where you can generate articles on anything, with text links on terms that lead to new articles generated in the context of the article path that got you there. I reckon Wiki enthusiasts won't be disappointed: https://anylearn.ai


Awesome!



