
Buku v3.0 – Command Line Bookmark Manager - apjana
https://github.com/jarun/Buku/releases/tag/v3.0
======
StreakyCobra
It already seems to have some nice features, but for me the dream bookmark
manager would be something really simple with two commands like:

$ bookmark add [http://..](http://..).

That will:

1\. Download a static copy of the webpage in a single HTML file, with a PDF
exported copy, that also takes care of removing ads and unrelated content from
the stored content.

2\. Run something like [http://smmry.com/](http://smmry.com/) to create a
summary of the page in a few sentences and store it.

3\. Use NLP techniques to extract the principal keywords and use them as tags

And another command like:

$ bookmark search "..."

That will:

* Not use regexp or complicated search pattern, but instead;

* Search in titles, tags, AND _page content_ smartly and interactively, and;

* Sort/filter results smartly by relevance, number of matches, frecency, or anything else useful

Storing everything in a git repository or simple file structure for easy
synchronization, bonus points for browser integrations.
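
A rough Python sketch of the "search in titles, tags, AND page content, sorted by relevance" idea above. It is purely illustrative: the field weights, the `ranked_search` name and the sample records are assumptions, not anything from Buku, and it only counts whole-word matches (no frecency, no NLP).

```python
import re

# Assumed weights: a hit in the title counts more than one in the page body.
FIELD_WEIGHTS = {"title": 3, "tags": 2, "content": 1}


def score(bookmark, query):
    """Weighted count of query-term occurrences across the bookmark fields."""
    terms = re.findall(r"\w+", query.lower())
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"\w+", bookmark.get(field, "").lower())
        total += weight * sum(words.count(term) for term in terms)
    return total


def ranked_search(bookmarks, query):
    """Return URLs of matching bookmarks, best score first."""
    scored = [(score(b, query), b) for b in bookmarks]
    hits = [(s, b) for s, b in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [b["url"] for _, b in hits]


# Hypothetical sample data.
pages = [
    {"url": "https://example.com/sync", "title": "File sync tools",
     "tags": "sync", "content": "Some people sync notes with git."},
    {"url": "https://example.com/git", "title": "Git basics",
     "tags": "git vcs", "content": "Learn git branching and merging."},
    {"url": "https://example.com/tea", "title": "Brewing tea",
     "tags": "tea", "content": "Water temperature matters."},
]

print(ranked_search(pages, "git"))
```

The page that mentions the term in its title and tags sorts ahead of the one that only mentions it in the body, and pages with no match are dropped.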

~~~
elevensies
I've been thinking along these lines, some other features I'd like:

\- ability to have certain sites run site-specific extra processing: e.g.
running youtube-dl on YouTube links

\- ability to have a list of sites to be archived periodically instead of once
only, and the option to be notified when a site updates, even if it were run
as a batch job

\- ability to ingest a PDF or ebook, identify all the URLs, snapshot all the
URLs, and present them as a list that links to the original, the cached
version, and the page location

\- would also be nice if the data could be stored in a human readable
structure in a normal filesystem, so your ability to use the data isn't
dependent on your ability to run the tool.

Overall I think it is an interesting project but the commercial potential is
limited.

EDIT: maybe the document processing and periodic check thing would make more
sense as a higher level tool that depended on the bookmarking tool -- and the
extra processing also might make more sense as a plugin type architecture.

~~~
StreakyCobra
> would also be nice if the data could be stored in a human readable structure
> in a normal filesystem, so your ability to use the data isn't dependent on
> your ability to run the tool.

This is really important yes!

> Overall I think it is an interesting project but the commercial potential is
> limited.

Indeed. This kind of tool is mostly limited to hackers, and it is not a big
enough market to create a business model I suppose.

This would have to be done for the love of open-source :-)

~~~
apjana
There's an option to filter the print output. But yes, currently the data is
stored in SQLite only.

Yes, buku doesn't intend to be a commercial utility. I use bookmarks as a
context pointer for everything I do. So I wrote buku. But it's written as a
library so other projects can use it.

------
roadbeats
he-yo. I'm one of the earliest users of del.icio.us and also pinboard, made
similar tools to buku for my Linux desktop, and added a custom tabless web
browser on top of it to make bookmarking as convenient as possible. it was a
very productive setup, but I don't use it anymore;
[https://github.com/azer/delicious-surf](https://github.com/azer/delicious-surf)

Storing bookmarks locally is definitely cool but it's still not convenient
enough for us to bookmark every page we find value in. If I'm browsing 30
pages about my upcoming trip to Patagonia, I won't bookmark most of it just
because it's not convenient enough. If I google a solution for a problem for
hours and go through tens of pages to find information, it's likely that I
won't bookmark most of the pages.

You can keep bookmarks in Chrome, Safari, Pinboard, Firefox, whatever. But
none of them are innovating in bookmarking, and they likely won't.

And this is exactly why I'm building Kozmos currently. It has a desktop and
mobile client, and will bring a completely new perspective on bookmarking. You
won't need to organize anything, it'll all be done automatically and you'll
easily find your stuff thanks to an advanced search engine. My goal is to bring
good design and good tech together, and provide everyone the most convenient
way to bookmark.

You can sign up to the private beta and get an invitation within a week, here
is the link; [http://getkozmos.com](http://getkozmos.com)

------
cytzol
I’ve started avoiding the whole pretence of “tagging” or “organisation” like I
tried to do with Pinboard. My bookmarks are now Safari .webarchive files. I
have access to them offline, I can back them up the same way I back everything
else up, and I just organise them in folders however I want. I even get
search!

A “bookmark manager” doesn’t need to be an app or a service -- it can just be
a bunch of files.

Edit: I should say that Buku looks like a good program for those who like that
way of doing things, though!

------
dredmorbius
I hate changing command options, but if you're going to do this, do it early.

Please swap the definitions of '-s' (search any) and '-S' (search all).

Rationale: virtually _every_ time I run a search, I'm interested in _the most
specific result_, and this most especially _when I have created the search
space myself_.

Having to hit the shift key for my _default_ search preference is ...
backward.

I know far too many _online_ search tools which OR rather than AND arguments
(probably because the underlying tools support OR more readily than AND
searches) ... and ... this drives me flipping bananas. Because the _more_
specific my search, the _less_ specific the result.

It's the worst possible antifeature in a knowledge management tool.
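
To make the AND-versus-OR difference concrete, here is a minimal Python sketch. It is not Buku's code; `matches`, `filter_bookmarks` and the sample list are made up for illustration, and matching is plain case-insensitive substring search.

```python
def matches(text, terms, require_all=True):
    """True if text contains all of the terms (AND) or any of them (OR)."""
    haystack = text.lower()
    hits = (term.lower() in haystack for term in terms)
    return all(hits) if require_all else any(hits)


def filter_bookmarks(records, terms, require_all=True):
    """Keep only the records (plain strings here) that match the terms."""
    return [r for r in records if matches(r, terms, require_all)]


# Hypothetical sample data.
saved = [
    "Patagonia hiking routes",
    "Patagonia flight deals",
    "Alps hiking routes",
]

terms = ["patagonia", "hiking"]
print(filter_bookmarks(saved, terms, require_all=True))   # AND: one record
print(filter_bookmarks(saved, terms, require_all=False))  # OR: all three
```

With AND, adding a second term narrows the result to the single record that contains both words. With OR, the same extra term widens the result to every record, which is the "more specific search, less specific result" effect described above.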

I'd also suggest that the capability to distinctly search specific fields be
specified:

* URL

* Title

* Tags

* Metadata (author, publisher, date).

A date-ranged search would be particularly useful.
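
A sketch of what field-specific and date-ranged search could look like on top of SQLite, using Python's standard `sqlite3` module. The table layout, column names and the `search_field` helper are assumptions for illustration only and are not Buku's actual schema or API.

```python
import sqlite3

# Hypothetical schema: one row per bookmark, ISO dates stored as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks (url TEXT, title TEXT, tags TEXT, added TEXT)"
)
conn.executemany(
    "INSERT INTO bookmarks VALUES (?, ?, ?, ?)",
    [
        ("https://example.com/a", "SQLite tips", "db,sql", "2016-03-01"),
        ("https://example.com/b", "Python tips", "python", "2017-01-15"),
        ("https://example.org/sqlite", "Release notes", "db", "2017-02-20"),
    ],
)


def search_field(conn, field, term, start=None, end=None):
    """Search a single field, optionally limited to a date range."""
    # Column names cannot be bound as parameters, so whitelist them.
    if field not in {"url", "title", "tags"}:
        raise ValueError(f"unsupported field: {field}")
    query = f"SELECT url FROM bookmarks WHERE {field} LIKE ?"
    params = [f"%{term}%"]
    if start is not None:
        query += " AND added >= ?"
        params.append(start)
    if end is not None:
        query += " AND added <= ?"
        params.append(end)
    query += " ORDER BY added"
    return [row[0] for row in conn.execute(query, params)]


print(search_field(conn, "title", "tips"))
print(search_field(conn, "url", "sqlite"))
print(search_field(conn, "title", "tips", start="2017-01-01"))
```

Storing dates as ISO 8601 text keeps the range comparison a simple string comparison. Author and publisher metadata would need extra columns, which this sketch leaves out.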

------
apjana
## What's in?

\- Edit bookmarks in EDITOR at prompt

\- Import folder names as tags from browser html

\- Append, overwrite, delete tags at prompt using >>, >, << (familiar, eh? ;))

\- Negative indices with `--print` (like `tail`)

\- Update in EDITOR along with `--immutable`

\- Request HTTP HEAD for immutable records

\- Interface revamp (title on top in bold, colour changes...)

\- Per-level colourful logs in colour mode

\- Changes in program OPTIONS

    
    
      - `-t` stands for tag search (earlier `--title`)
    
      - `-r` stands for regex search (earlier `--replace`)
    

\- Lots of new automated test cases

\- REST APIs for server-side apps

\- Document, notify behaviour when not invoked from tty

\- Fix Firefox tab-opening issues on Windows

Home: [https://github.com/jarun/Buku](https://github.com/jarun/Buku)
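
For readers unfamiliar with the `>>`, `>`, `<<` tag operators in the list above, their semantics (append, overwrite, delete) can be sketched in a few lines of Python. This is an illustration under the assumption that tags behave like a set; `apply_tag_op` is a hypothetical helper, not a function from Buku.

```python
def apply_tag_op(current, op, tags):
    """Apply one tag operator to a bookmark's tags and return a sorted list."""
    current = set(current)
    tags = set(tags)
    if op == ">>":      # append the new tags to the existing ones
        result = current | tags
    elif op == ">":     # overwrite the existing tags
        result = tags
    elif op == "<<":    # delete the given tags
        result = current - tags
    else:
        raise ValueError(f"unknown tag operator: {op}")
    return sorted(result)


existing = ["cli", "linux"]
print(apply_tag_op(existing, ">>", ["python"]))
print(apply_tag_op(existing, ">", ["python"]))
print(apply_tag_op(existing, "<<", ["cli"]))
```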

------
a3n
More useful to link to the main/code page, with its README, chock full of
"wtf"-dissolving examples and text.

[https://github.com/jarun/Buku](https://github.com/jarun/Buku)

~~~
joepvd
Just skimmed through the README, and I stored it in my list of well-documented
projects. Well done!

~~~
apjana
Thank you!

------
rawfan
I'm still glad I found pinboard.in. It works great, is a paid service (so I AM
the customer) and even archives everything I tag.

~~~
tokenizerrr
Why are you happy to be a customer of something that could easily run locally
for free?

~~~
ashark
I don't use Pinboard—bookmarks are mostly a place I send open tabs I'll never
get around to reading so I don't feel bad about closing them—they may as well
go to /dev/null.

 _But_ storing data (and backing it up, and ensuring that backups work, and
having some kind of monitoring so you don't discover one day that everything
silently broke and all your data is now gone, and buying the hardware to
support all that, and taking time to research the purchase of that hardware,
and the extra, constant, low-grade stress associated with all the above) is
never free, provided you actually care that said data survive and be
accessible/useful.

------
subbz
Bookmark managers should happen in the browser, not in the command line.

I recommend Shaarli:
[http://sebsauvage.net/wiki/doku.php?id=php:shaarli](http://sebsauvage.net/wiki/doku.php?id=php:shaarli)

~~~
apjana
Author of Buku here. In fact Buku can store bookmarks directly from the
browser. You have 2 ways to do that (including a dedicated plugin). It also
has 5 different search options with a powerful prompt to find just about any
bookmark you have stored (we have users who imported even ~40K bookmarks from
Delicious and are happy with Buku), extensive flexibility of editing and
manipulation, encryption support, multithreaded full DB refresh and a lot
more.

In addition, Buku is also developed as a library and Shaarli can use it as a
powerful python backend over REST. ;)

Yes, one of our contributors did want to add a feature to generate a full
webpage with thumbnails but we decided not to add it as it seemed simply
ornamental when you think about the raw potential of Buku.

~~~
apjana
Oh and one more thing about Buku... it is designed to simplify your workflow
beyond imagination. You search something in Shaarli, get 10 results. You want
to open results 4, 5,6,7 and 8. What do you do? Click 5 times? Not with Buku.
You enter:

    
    
        o 4-8
      

terminal bliss, yeah!
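
How a prompt might turn `4-8` into the five results to open can be sketched like this in Python. It is only an illustration: `expand_indices` is a hypothetical helper, and accepting a mix of single numbers and ranges in one argument is an assumption rather than documented Buku behaviour.

```python
def expand_indices(spec):
    """Expand a spec such as '4-8' or '2 4-6' into a list of integers."""
    indices = []
    for token in spec.split():
        if "-" in token:
            low, high = (int(part) for part in token.split("-", 1))
            if low > high:  # tolerate reversed ranges such as '8-4'
                low, high = high, low
            indices.extend(range(low, high + 1))
        else:
            indices.append(int(token))
    return indices


print(expand_indices("4-8"))
```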

------
mmjaa
Who uses Bookmarks any more? I don't.

Instead, if I like a page I want to re-visit, I simply Print it to PDF. Then,
I move all the PDFs from my Desktop every week, into their own permanent
storage location .. meaning that I have every interesting web page I've ever
read since 2000.

Trouble is, now I have a large PDF collection to manage. I get along fine with
"ls -l | grep <something>" this and "pdf2txt <blah.pdf> | grep <something>"
that .. but of course, this is not as 'clean' as if I had a Bookmark Manager
to do all my searching/grep'ing/grok'ing/etc.

~~~
dorian-graph
I still use bookmarks (Pinboard) but I like your approach. I'm slowly trying
to remove as many reliances on third parties as I can, as they're too
ephemeral. I'm guessing you went with print to PDF because saving the page from
the browser would result in broken pages? Have you found that the PDFs don't
look very good for some websites?

Alternatively, you could use a tool or extension that does full-page
screenshots and then run image optimisation on them. I do this a lot for a
local Pinterest-type inspiration store. At the moment I use
[Nimbus](https://chrome.google.com/webstore/detail/nimbus-screenshot-app/aecjogkncpbkjfobfnoaiepipllcadhe?hl=en)
but it seems like every few months the extension I'm using starts to fail with
certain websites (scrolljackers, mostly), and I switch to a different, newer
extension.

Alternatively again, but back to saving websites, surely someone's created a
nice tool that will download a page to store it as an archive that won't be
broken? Pinboard for example has archiving for an extra fee, so I wonder how
he does it at his scale.

Related to your problem grep'ing, I'm slowly working on a small idea to have a
local tagging/metadata approach for finding things.

~~~
r3bl
> Alternatively again, but back to saving websites, surely someone's created a
> nice tool that will download a page to store it as an archive that won't be
> broken?

Wallabag[0] does that. It's a self-hosted Pocket-like read-it-later service
that strips the page and saves the text of the article in a local database,
therefore allowing you full text search right from your own server. It even
adds some additional neat features like adding notes to the articles.

The only downside: damn those two currently available themes are awful.

[0] [https://wallabag.org/en](https://wallabag.org/en)

------
hhandoko
> Hence, Buku (after my son's nickname, meaning close to the heart in my
> language).

Coincidentally, `Buku` translates to `Book` in Indonesian...

~~~
apjana
Thanks for sharing!

------
orschiro
As someone using Chrome Bookmarks, can someone please explain to me in simple
words what this is?

~~~
r3bl
Same thing, using the command line and giving you full control over the
bookmarks (because they're staying on your machine).

~~~
orschiro
Thanks!

> (because they're staying on your machine).

What's the advantage over exporting my Chrome Bookmarks as HTML?

------
pacuna
Is anyone using Buku more as a read-later system instead of a bookmark manager?

------
dredmorbius
The demo which I'd really need to see to judge this is here:
[https://github.com/jarun/Buku](https://github.com/jarun/Buku)

I'm still not sure this fits my needs or workflow, though that's more useful
than the project link itself.

I'm tremendously interested in this or related tools, as I've got an exploding
research problem that nothing I've seen yet comes close to addressing, and
most of which introduces numerous additional problems. See:

[https://ello.co/dredmorbius/post/fj5rzi8zmouyrmvg8yzzva](https://ello.co/dredmorbius/post/fj5rzi8zmouyrmvg8yzzva)

Short version: I've got a library of a few thousand articles, plus another few
thousand books, plus another few thousand online references, which I've
gathered, am continuing to gather, am trying to assess, prioritise reading of,
and generate a number of outputs from, as well as use in what's likely to be a
several-decades-long research and writing project.

Online services simply don't offer sufficient longevity, even should they meet
my other requirements, which they don't.

Assigning metadata is a significant pain point. _Coming to some agreement as
to what metadata to assign is a significant pain point._

I'm coming to see librarians and library cataloguing as essential domain
knowledge and experience. In all seriousness, I suggest any project looking to
make use of categories and classification look to the US Library of Congress
Classification System: it's extant, expert, unencumbered, comprehensive,
hierarchical, extensible, has a change management process, and is applied to a
store comprising 164 million works.

[https://mammouth.cafe/@dredmorbius/56485](https://mammouth.cafe/@dredmorbius/56485)
[http://www.loc.gov/catdir/cpso/lcco/](http://www.loc.gov/catdir/cpso/lcco/)

There's also a top-level reduction to 21 distinct categories, and the
possibility of, say, coming up with a short-list of frequently-used
classifications, as well as of assigning multiple classifications to works.

The rationale for _only_ storing bookmarks is ... generally not valid. There
are two broad types of online resources, generally:

1\. Interactive or volatile pages.

2\. Static pages.

For the first, search engines, web apps, landing pages, etc., storing a static
instance isn't _tremendously_ useful (though it can be more useful than you'd
think). For the second, _a locally-stored version is almost always more useful
than the online instance._

And space for text is now beyond cheap.

I'm looking at this problem in terms of desired outputs, workflow, various
states of resources, how to (reasonably) uniquely _and_ persistently identify
a given document, managing media (images, audio, video, other interactive
elements), etc.

And yes, this starts to look rather much like Memex, for similar reasons.

