It's essentially a simple web server that sits on top of a bunch of markdown files.
The frontend renders the markdown using markdown-it and supports KaTeX for simple inline mathy things, along with the extended markdown stuff like tables etc. I've even made it so that you can drag and drop files (including images) into the edit box and it will upload them to the server and render the correct markdown syntax so they can be rendered when you look at the note.
Alongside the files, the data is also stored in a SQLite database file with some metadata, and I'm using the Full Text Search (FTS5) engine to support search which seems to work ok.
If the database gets corrupted it can just be rebuilt; it's really just there to augment the notes. If I stop developing it or want to move on, the notes are there as text files.
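A minimal sketch of what "the database can just be rebuilt" might look like, assuming the notes are `.md` files under one directory and that the local SQLite build has FTS5 enabled. The table name `notes_fts`, its columns, and the helper names are made up for illustration and are not taken from the actual tool.

```python
import sqlite3
from pathlib import Path


def rebuild_index(notes_dir: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Drop and recreate the search index from the markdown files on disk."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS notes_fts")
    # Hypothetical schema: one FTS5 row per note file.
    conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(path, body)")
    for md_file in sorted(Path(notes_dir).rglob("*.md")):
        conn.execute(
            "INSERT INTO notes_fts (path, body) VALUES (?, ?)",
            (str(md_file.relative_to(notes_dir)), md_file.read_text(encoding="utf-8")),
        )
    conn.commit()
    return conn


def search_notes(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return the paths of matching notes, best match first."""
    rows = conn.execute(
        "SELECT path FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

Because the files stay the source of truth, deleting the database and calling `rebuild_index` again loses nothing but the time it takes to re-read the notes.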
It works well enough in a mobile browser, although admittedly a bit rubbish if you need offline access.
Works well enough for me. I might open source it one day but I think I'd need to clean up the code a bit first :)
EDIT: the core of the tool was mostly inspired by this article https://golang.org/doc/articles/wiki/
Some nice features I've added over the years: bookmarklet and automatic page-screenshotting, tags (and smart auto-tagging), everything markdown supports, file upload and attach, media embed (a YouTube link becomes a player, e.g.). Oh, I can also attach email reminders and make to-do lists (with little checkboxes and everything). It started out very simple and has grown over time. SQLite is a great foundation for projects like this. Strongly recommend.
Sourcegraph is a web-based code search tool that automatically syncs and indexes many repositories from your organization's code host(s). It's intended for every developer at an organization to use for searching across all of the organization's code (and for navigating/cross-referencing with code intelligence). It's self hosted and usually there is 1 Sourcegraph instance per organization. If you love local+personal code search, I bet you and your teammates would love organization-wide code search, so give Sourcegraph a try (https://docs.sourcegraph.com/#quickstart). :)
I still wish it was easier, it's such a cool tool :) In theory it should be possible to set up inotify watches on local repositories and reindex on changes (perhaps with some throttling logic if it's too heavy), although I understand it's harder than it sounds and my use case is probably somewhat marginal. I might set it up anyway if my personal infrastructure ever settles.
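The throttling part of that idea can be sketched on its own, independent of how changes are detected. This is only an illustration under assumptions: the `ThrottledReindexer` class, the `reindex` callback, and the 30 second default are all hypothetical, and the inotify watcher that would call `on_change` is not shown.

```python
import time
from typing import Callable


class ThrottledReindexer:
    """Run `reindex` at most once per `min_interval` seconds per repository."""

    def __init__(self, reindex: Callable[[str], None], min_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._reindex = reindex
        self._min_interval = min_interval
        self._clock = clock
        self._last_run: dict[str, float] = {}
        self._dirty: set[str] = set()

    def on_change(self, repo: str) -> None:
        """Call this from the file watcher whenever something in `repo` changes."""
        self._dirty.add(repo)
        self.flush()

    def flush(self) -> None:
        """Reindex every dirty repo whose throttle window has elapsed."""
        now = self._clock()
        for repo in sorted(self._dirty):  # sorted() copies, so discarding is safe
            last = self._last_run.get(repo)
            if last is None or now - last >= self._min_interval:
                self._reindex(repo)
                self._last_run[repo] = now
                self._dirty.discard(repo)
```

A repo that changes during its throttle window stays marked dirty, so something (a timer, or the next change event) still has to call `flush()` later to pick it up.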
Hound falls short on the access control front (we wrapped our instance with a SAML proxy), but it's still a case of either you can search every piece of software for 'password' or you don't have any access at all. Having to index a specific branch instead of all of them kinda stinks too; for those two specific reasons we have been eyeing Sourcegraph, esp. as the GitLab integration matures.
I can't emphasize enough how fast hound is and how pleasurable it is having a regex based code search that doesn't make me wait.
Anyway, for now I'm at a small enough org that everyone still just sees everything, and it's been super valuable.
As far as competition with other tools, the infrastructure team at my org has their Elastic instance plugged into our GitLab, but most of the engineers agree that Hound is better: it's faster, it does regex, and it doesn't do goofy stuff like return pages of the same result from everyone's fork of the same repo.
After reading about your masterplan I would love to know your thoughts on the question presented regarding phase 2.
Will coding in the future be more like writing a novel or like knowing how to read+write? I feel the latter will eventually be true as the human-machine interface becomes more 'native'.
That advanced use case you mentioned isn't supported, but it sounds very cool. It's in the realm of things we'd like to offer someday. If anyone's interested in hacking on that (and making a PR to https://github.com/sourcegraph/sourcegraph), I'd be happy to screenshare with them and give them some pointers.
Thank you for updating the documentation to clarify the use case though!
$30/person is almost double what Stack Overflow charges, and that product can act as a frontend to search not just code but any type of documents, with voting, tagging, analytics on what confuses people the most and more.
It would be hard for me to justify even $10/person for something like Sourcegraph in my company (a Fortune 500 ecommerce brand), for the highest enterprise tier of functionality.
$30/person per month for the lowest tier? Boy, I wish I knew of companies willing to pay that. None in my experience ever have been.
My strategic advice is to get whatever's best in class, and not worry about $X0/month. Compared to what you should be spending on devs that rounds to free.
It’s amazing for how long Emacs’ Org-mode has been largely unparalleled! Apart from the revered desktop setup, there are now a bunch of mobile offerings including Organice — not quite slick, but definitely useful.
I’m sincerely rooting for more experiments in this area. I would love to be able to write by hand or speak to my memex (multi-modal interaction). Vannevar Bush’s “As We May Think” has languished uncourted for pitifully long. In some ways, this was supposed to be the first “killer app” for personal computing.
I use org-mode all day but frankly OneNote is great too!
If OneNote would save in plain text and have a cross-platform gui I would use it (even if it's resource-sucking electron)
Basically, onenote is almost there but I would love to leave it
I especially love how it automatically cites and links to whatever you copy and paste from the web. That alone is so valuable for documenting workflow and how-to write-ups.
However the combination of me using a desktop less and mobile more, plus Microsoft's attempts to turn Office into a web app have soured me to it. That and the limitations mentioned above. I'd love to be able to export to a wiki style interface, but I cringe at thinking about what that html would look like (a la Word's html export).
But I have yet to find anything that I like better. Or will consistently use as much.
After jumping into org last year, I think it's because orgmode has a solid foundation for organizing stuff that's infinitely customizable with elisp. Roam, Evernote, OneNote; they just don't have the flexibility. The lack of customizability is a feature in itself: it's easy to pick up.
On the other hand, orgmode has a fairly vibrant community that will keep improving orgmode for many years to come.
Essentially, it's a knowledge management system that makes input almost frictionless. This is then mapped into a shareable ontology graph on which algorithms can be executed. Valuable data can be extracted from here.
For example: do you need to find a team with a specialized couple of skills? Have applicants send their verified graphs and use those relations to find the best fit.
Or, alternatively, someone who's learned a trade/skill can share their dense knowledge with a community, to direct learning more effectively.
It's at a very early stage, for now purely for the fun of it. But if there's interest or suggestions (definitely some hard problems to solve) we could focus more efforts towards that.
My understanding is that they're throwing their presence away? Maybe they pivoted to enterprise, I don't know, but for at least a couple years all I've heard about them was people talking about what to use instead.
This came with a shutdown of the free users' heaven and more focus on the paying customers, which is why many people seem to be cranky about Evernote. A similar thing seems to be happening at the moment with Dropbox too, BTW.
-Coda.io (big, more scriptable player)
-Hypernote (super new player, but with a cool new take on inter-note relationships)
-Tiddlywiki (super customizable, really fast -- but also has a fair amount of footguns)
-Airtable (only played with it a few times but it's usually mentioned in the same breath as notion, I notice)
Hopefully someday we'll achieve Alan Kay's dream :)
OmniFocus is more expensive but I gladly pay to prevent my data from being analyzed & sold.
I can't really justify $10 / month for "just for fun" personal projects, and the 1200 records / base is too limited for many ideas (and also 5000 records for $10/month is on the low side as well, even if putting it as a company expense)
Yes, I know, they've got to eat and everything, and maybe cost vs income is not feasible for personal accounts.
It's one of those things I try every few years, fail and quit.
They've made massive changes over the past year. They'll even have a Linux app coming out soon!
(2) For physical documents I use a Fujitsu ScanSnap iX500 for scanning. A runtime license of ABBYY FineReader for OCR is included. The resulting PDF has embedded text which I extract using pdftotext. I wrote a Python application to search and tag these documents. It loads all the text in memory, which is perfectly fine as I have < 10,000 documents. I have used it for 5 years and it works OK.
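The load-everything-in-memory approach is simple enough to sketch. This is not the commenter's application, just an illustration assuming the pdftotext output has been saved as one `.txt` file per document in a single directory; the function names are hypothetical and tagging is left out.

```python
from pathlib import Path


def load_documents(text_dir: str) -> dict[str, str]:
    """Read every extracted .txt file into memory, keyed by file name."""
    return {
        p.name: p.read_text(encoding="utf-8", errors="replace").lower()
        for p in sorted(Path(text_dir).glob("*.txt"))
    }


def find_documents(documents: dict[str, str], *keywords: str) -> list[str]:
    """Return the names of documents containing every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [name for name, text in documents.items() if all(k in text for k in wanted)]
```

With a few thousand documents a plain substring scan like this is usually fast enough that no index is needed.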
ok carry on please, diversion over :/)
There are some reasonably good OCR tools on Linux now as well - I've been pretty happy with Tesseract. It was an absolute pain to script everything to "just work" when I press the button on my scanner though.
Recoll works very well for indexing documents for me including my OCRd scans. When that's not enough, I revert to pdfgrep.
My use case would be scanning multi-page documents with minimal effort, and saving to PDF somewhere.
I feel it shares aspects with Rubber Duck Debugging: The effort of taking something you "know" and forcing it back out through other brain-circuits (i.e. language and/or simulating a social interaction) helps to fill gaps that your brain would otherwise skip over. The act of hearing/seeing your output also causes other parts of your brain to analyze it as if it were someone else's thought.
I suspect our consciousness isn't nearly as unified as we like to believe.
Tests in middle school, I could recall writing things down, even the part of the page I wrote them in.
By college I would write TODO's down and lose them, and not be able to recall what I wrote down. Misplacing the note was more likely than forgetting the task, so I stopped writing them down.
I should try to measure this again because right now I couldn't tell you which works better.
One of the most uncomfortable things about getting older is that in your teens and 20's you spent all this time figuring out who you are, what you like, what you're good at and what you struggle with. Age, changes in health, coping mechanisms, changes in perspective all fuck around with this and you can find yourself in situations you should avoid or avoiding situations you could embrace.
It's like a weird mid-life crisis.
Hm, basically it's making a real, final decision instead of playing with a bunch of potential decisions which are all somewhat equal, but also kinda fuzzy.
I've found Freemind to work well enough for me. Search not needed as I browse the graph easily enough.
I am now dabbling with reviewing them, although not sure what that will lead to, as they are so unstructured. There are generally a few gems in there to be remembered, but mostly spur of the moment gibberish!
Edit: It also has a very basic security model (private, public, unspecified), and with that in mind, can export trees of notes as html or as outline documents (text), with or w/o indentation & numbering, which I've found very useful. And anything can be in as many places in the tree as is helpful. The export to simple html, I use to generate my 2 web sites.
(I plan to move it to Rust, and maybe sqlite, eventually, as well as add features like anki, internal code attached to entity classes for cheap internal customization/automation, etc, but have been slow lately.)
(Edit: it is currently only self-hosted by each user. Have considered doing hosting for other users, and might some day.)
A little CSS (max-width: 700px; margin: 0 auto;) on the body would go very far.
For me, the big picture is I organize everything in ways that work well for me, which I have tried to mention on the web site (in screen shots and some org ideas somewhere). Like, todos, historical things, documents, contacts I have (orgs and people), calendar + tickler file (so I don't have to think about things until the date I should start thinking about it, but I don't forget, if I check it habitually), habit reminders and other review/study material, and notes by topics organized in ways I can find things. I have a top level list/hierarchy/outline (actually a few of them, and anything can link to anything else for quick reference, depending on the convenience of the moment for lookup), or I can remember some search terms ("<x-company> main" to get a phone # for x-company). I also have standard patterns (with some support in the software for making data look like templates) for details about contacts or other things, logging journal notes, conversation notes with businesses or doctors or whatever, and it then becomes easy to refer to history. Then anything is basically available via a few to several keystrokes, to get exactly what I want. There is also text search, or queries by date. It seems like one would have to do that with any kind of mind map, org-mode, or note system: organize things and/or search for them in a way that helps oneself as the user. Maybe some pre-fabricated forms or examples of that would help someone get started though...
(some edits for clarity above, and)
Edit: Also, when navigating into one's data, one can then hit 0 or ESC to go back out the way one came, even holding down ESC to go back to the top level. I also tried to make it so the UI shows what can be done at any given time, if one reads the screen.
Is any of that relevant, or do you have something else in mind? Thanks again for the feedback.
Edit: I have removed mention of the telnet demo from the site. If there were sufficient real interest I would put it back (or consider hosting the system for others). If so, email me via the mailing list at the site, or via the address at the site footer. Thanks.
I wrote a wrapper function, sbh (search bash history) that allows me to input date strings like "2 months ago", or "last week", which narrows the search. Linux 'date' function with --date string arg is pretty powerful.
1 - https://spin.atomicobject.com/2016/05/28/log-bash-history/
2 - https://www.thegeekstuff.com/2013/05/date-command-examples/
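A rough Python sketch of the same idea, for illustration only. It is not the `sbh` function described above and does not call `date`. It assumes each history line starts with a `YYYY-MM-DD.HH:MM:SS` timestamp followed by a space, and it understands only a tiny subset of relative phrases, with months and years approximated as 30 and 365 days. All names are hypothetical.

```python
import re
from datetime import datetime, timedelta

_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}  # rough approximations


def parse_since(phrase: str, now: datetime) -> datetime:
    """Turn 'N <unit>s ago' or 'last <unit>' into a cutoff datetime."""
    text = phrase.strip().lower()
    m = re.fullmatch(r"(\d+) (day|week|month|year)s? ago", text)
    if m:
        return now - timedelta(days=int(m.group(1)) * _UNIT_DAYS[m.group(2)])
    m = re.fullmatch(r"last (day|week|month|year)", text)
    if m:
        return now - timedelta(days=_UNIT_DAYS[m.group(1)])
    raise ValueError(f"unsupported date phrase: {phrase!r}")


def search_history(lines, pattern: str, since: str, now: datetime):
    """Yield history lines newer than `since` whose command matches `pattern`."""
    cutoff = parse_since(since, now)
    regex = re.compile(pattern)
    for line in lines:
        stamp, _, command = line.partition(" ")
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if when >= cutoff and regex.search(command):
            yield line
```

Delegating the phrase parsing to GNU `date --date`, as the shell wrapper does, covers far more input formats than this hand-rolled parser.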
By the way, is there, by chance, a "note taking/indexing tool from photo"? I'd like to be able to take a photo of a title/abstract of a computer science paper with my phone, and then be able to find it by approximate date and keywords. (I use Android. Seems like something relatively easy to hack, actually, on top of Google Photos.)
In light of this, I'm biasing toward simple file formats managed by tools I write myself, and optimizing for cost in a way that I otherwise don't, since any recurring costs incurred by the system are effectively a lifelong commitment. I am relying on S3 for primary storage (so that it is accessible anywhere) but with a sync to offline backup.
So far, I've implemented a personal Zettelkasten tool (with built-in spaced repetition, so doubles as an Anki replacement) and a search engine that's based on Presto (via AWS Athena) so that I don't need to keep an Elasticsearch instance alive. I'm planning to build out other repository tools as I go.
It's been very liberating to build tools that are never meant to be used by anyone other than myself, and with the confidence that the tools don't matter too much anyway since the underlying files are stored in evergreen formats.
I want to build one big Backup. Some initial research has pointed me to something like Bacula to manage the data backup process from a machine. With the 3-2-1 rule, I know I also need my Backup itself to have at least 3 copies, in at least 2 different forms (cloud/hard disk), at least one of which is off-site from me.
As an individual, do you or anybody else know the best way to implement such a system? Should I buy one giant hard drive, use many hard drives to create a RAID array, something else?
Basically I'm working on a tiered system. Files/dirs are categorized by size (<10MB, <25GB, >25GB) and by sensitivity (public, confidential, secure), and importance is usually proportional to security. I have fortunately found that security is usually inverse to size. Anything for which it makes sense goes on GitHub/GitLab. Confidential small stuff (sans keys) is just stored in Gmail/Drive. Big, boring stuff (music, ebooks) is just kept on external hard drives.
Secure, ultra-important stuff, I don't really have a system for.
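One way to read those tiers as code, purely as a sketch. The size bands and sensitivity levels come from the description above, but the exact mapping from tier to destination is an assumption (for example, that only small public items go to GitHub/GitLab), and every name here is hypothetical.

```python
from enum import Enum

MB = 1000 ** 2
GB = 1000 ** 3


class Sensitivity(Enum):
    PUBLIC = "public"
    CONFIDENTIAL = "confidential"
    SECURE = "secure"


def size_tier(size_bytes: int) -> str:
    """Bucket a file or directory into one of the three size bands."""
    if size_bytes < 10 * MB:
        return "small"
    if size_bytes < 25 * GB:
        return "medium"
    return "large"


def destination(size_bytes: int, sensitivity: Sensitivity) -> str:
    """Pick a storage location; the rules here are one guess at the mapping."""
    if sensitivity is Sensitivity.SECURE:
        return "undecided"  # no system exists for this tier yet
    if size_tier(size_bytes) == "small":
        if sensitivity is Sensitivity.PUBLIC:
            return "github/gitlab"
        return "gmail/drive"
    return "external drive"
```

Writing the rules down like this mostly helps to spot the gaps, such as medium-sized confidential data, which the description does not cover explicitly.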
The system I'm leaning towards is to just encrypt archives and store the key/password securely, then store the archives like you would any boring data, with a local NAS and a cloud backup service of some sort, or just on drives offsite.
How did you construct your NAS? Is it a single system, or multiple hard drives/storage solutions connected to your network?
Ideally though yes I would have my own entire backup system but I frankly don't trust myself enough to do it right, so hence some redundancy in the cloud.
The NAS I am still designing actually :p
That being said, one of the reasons I chose S3 vs. other AWS services or other companies is because I expect it to be around for a very long time. (Just because I've preserved the option of migrating away doesn't mean I relish the idea.)
There are lots of tools that do the individual moving parts, but a personal aggregator of everything would be interesting. Basically, a tool that lets you become your own personal data broker—just for your own personal data.
I wrote a post on some data that I collect and have/will integrate: https://beepb00p.xyz/my-data.html#consumers
If you ask me, this is the shape of things to come.
Google has already had 2-3 services to manage your data that they have closed down. Maybe they are the ones that taught me not to trust your data with anything on the web.
Even something like Evernote is iffy, they seem like they are constantly on the verge of shutting down.
Although I do find it sad that the human race as a whole puts so little value into this type of software, and so much value into sports and politics.
Maybe I could host for others sometime if there were sufficient interest. And/or move it to sqlite.
Is it possible there is a solution that makes the data more permanent and allows multiple parties to backup the same sources, or something similar? Some sort of federation protocol maybe.
...a bit contrarian compared to the WordPress and BlogSpot frenzy at the time, but I've been happy with it.
[rames@...:~/blog/entries]$ find . -type f | wc -l
[rames@...:~/blog/entries]$ find . -type f | xargs -n1 cat | wc -c
It's been very stable over ~15 years, but I think it might be time to adopt SQLite, at least as a caching layer. ;-)
It's a set of unix-style tools that let you treat text files as databases.
It's just plain markdown and syncs to any cloud provider or a webdav share. Butt-ugly especially on iOS, but it works and there is no vendor lock-in.
$30 / month
$10,000 / lifetime
Maybe they'd do better to ease up on the tracking, especially for a "give us all your documentation" service.
I would be open to the idea of a tool which combines the entirety of my digital presence at any point in time in a single platform. Kinda like a dynamically updated list which updates itself - every time a linked account makes a comment, 'likes' a post or performs any activity that may link it back to me.
Here is a bit longer comment on that which I made earlier today: https://news.ycombinator.com/item?id=22160026
It sounds like you're saying that nobody bothered to modify it to use LocalStorage, which is a surprise.
I work in a highly regulated industry...
Any workaround would be grounds for termination. So there's no point to my comment really - just curious if anyone else is in the same boat.
Yes. Spent a few years building a knowledge base in an offline application. Now I have a new job that doesn't allow me to install software. So all my notes stay at home.
Maybe one day I'll make an export to PDF and use that at work. But I will miss the editing functionality.
On one hand I understand the need to prevent data leaks, malware, etc. On the other hand -- am I supposed to memorize literally everything? Or search everything on Stack Exchange over and over again, hoping that the explanation is there, is correct, and is up to date? Figuring out stuff and making simple notes is my strength. Memory is my weakness. This sucks.
I wonder if it would have been better to install some wiki software on my private website and build my knowledge base online. Reading unknown websites is not forbidden in my current job (the web filter apparently uses blacklists). But maybe in my next job it will be, who knows.
Every now and then I find an external link I want to share with myself, but I’ll just send myself an email with it.
Where I find all these systems break down is recall. They're designed for someone who can recall a word or phrase that was in the content. I can usually recall "It was about X" or "The document/web page/image looked like Y". But an actual word? The author's name? Not a chance.
While a more difficult problem, if the tool is to live up to the "Future" section of this page, it's got to go a long way beyond what's in the source data, to what's thought of by the user.
E.g. one software I started to use is nvALT, via: https://www.macstories.net/links/organizing-everything-with-...
But I'm nowhere near a perfect and complete solution yet...
For notes which I mutate, I just keep a personal web site, and I try to keep it as a cheatsheet, as compact as possible, so I don't need to manage it.
So: an append-only log in Quip, with a new folder for each task.
A mutable cheatsheet of super compact pages on my personal website.
Oh, and for quick snippets, Alfred.
- For notes, OneNote, though I'm always on the lookout for an alternative with decent UI and syncing, but using open file formats. Full text search simple enough with this. Code formatting isn't good but there's an addin where the free version formats it as it was copied.
- To search local files, Voidtools Everything is great. Searching instantly by filename is a real time saver.
- If I want full text search of a large base of documents, I use Likasoft Archivarius, which cost me $30 about 10 years ago and is still handy. It's the only local desktop search I've found that supports full text indexing of tons of formats like Outlook .ost, etc. and can look inside archive files.
- For backups I've continued to stick with external drives, mirrored periodically with Freefilesync. 3 copies - one as master, two mirrors ensuring one is offsite.
- Bullets with multiple indents going from 1 to 1) a. etc
- Table handling
- Usual formatting like heading levels etc
And there seem to be lots of flavours of markdown too, just to add another layer to things.
Edit: since there is a new project, here are more details from years back: http://www.linux-magazine.com/Issues/2014/160/Workspace-Pigg...
Currently using markdown files in git repos.
Input Capture - You’re going to have all-encompassing tracking and recording of all activity, but you will want configurable privacy controls over the extent to which your daily conversations and observations of the external things you encounter are recorded. Capturing input needs to be holistic and incorporate all properties of encounters and new information.
Potential sources of input:
Vision — point of view recording, see snapchat spectacles, etc as primitive examples.
Audio (voice notes and multi-party conversations) - voice calls, video, etc. and other forms of audio transmission where there is more than a single party in the interaction.
You will need to keep track of which web pages you visit and at what times.
Conversations you see on Twitter, etc.
Properties and cues must be extrapolated from the information that is captured on input. In the case of audio, transcriptions are sufficient for search and retrieval purposes; however, since video is a visual medium, it includes significantly more properties that need to be accounted for.
The aim here is to identify sufficient data points (cues) and represent them in such a way that it is easy to search across things you have encountered but of which you only recall a certain property or cue. This is because human beings tend to remember things in fragments: for instance, you might remember a certain color on a page that you visited within the last 6 months and nothing else.
So long as you are capturing sufficient input and actions then you should be able to go back to any given point in time. How and where are you going to store this information? Storing everything is going to be a large amount of data. The essence of the information and context must be preserved. If you want to wind back to an arbitrary position in time with the original context intact, you want to retain as much as you can in the most efficient manner possible, so determining which data points to retain is essential. (Once the content structure has been figured out, this will be viable).
Examples of Primary Cues:
Time - humans generally keep track of things in a linear time-based fashion.
Color - invokes emotion and is memorable.
Physical Location - the efficiency of information retrieval is highly influenced by the location at which it is originally synthesized, encountered, and stored.
Keywords - the default conventional mode. Can and should be extracted from video/imagery and audio.
Imagery - search for images based on their contents and ambience.
Potential Secondary Cue — Music - see historical associated input and actions while certain music was played.
Meta Cues — Subjects - Automated tagging of keywords/encountered content.
Any combination of these queries is possible, but ultimately the killer feature is the ability to backtrack through time to find a certain piece of information that is made available thanks to the always-on recorded nature of your interactions with the physical and digital worlds combined.
Knowing what to store, and how, + displaying it needs to be worked on further.
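To make the "any combination of cues" idea concrete, here is a small sketch of a record type and a filter over it. It assumes the cues (time, keywords, colors, location) have already been extracted from the raw input, which is the hard part and is not shown; the `Capture` fields and the `recall` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Capture:
    """One recorded encounter plus the cues extracted from it."""
    when: datetime
    source: str
    keywords: frozenset = frozenset()
    colors: frozenset = frozenset()
    location: Optional[str] = None


def recall(captures: Iterable[Capture], *, after: Optional[datetime] = None,
           before: Optional[datetime] = None, keyword: Optional[str] = None,
           color: Optional[str] = None, location: Optional[str] = None) -> list:
    """Return captures matching every cue that was supplied, oldest first."""
    hits = []
    for c in captures:
        if after is not None and c.when < after:
            continue
        if before is not None and c.when > before:
            continue
        if keyword is not None and keyword not in c.keywords:
            continue
        if color is not None and color not in c.colors:
            continue
        if location is not None and c.location != location:
            continue
        hits.append(c)
    return sorted(hits, key=lambda c: c.when)
```

For example, `recall(captures, color="orange", after=six_months_ago)` would cover the "a certain color on a page visited within the last 6 months" case, with `six_months_ago` computed by the caller.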
For now, I've settled on sphinx because it can be easily exported to dash, and tied in to an alfred workflow for search.