Ask HN: What projects are you working on now?
809 points by sakopov on March 21, 2020 | 1236 comments
With the quarantine being placed in effect in a lot of cities across the world, we all likely have a little bit more time to focus on personal projects or learning something new from the comfort of our homes. What are you guys up to these days?

I’m working on a tool that allows developers to record and playback interactive, guided walkthroughs of a codebase, directly from their editor. It’s called CodeTour, and it’s currently available as a VS Code extension: https://aka.ms/codetour.

I built it because I frequently find myself looking to onboard (or “reboard”) to a project, and not knowing exactly where to start. After speaking to a bunch of other developers, I didn’t seem to be alone, so it felt like this problem was deserving of some attention.

While documentation can help mitigate this problem, I wanted to explore ways that the codebase itself could become more explainable, without requiring unnecessary context switches. Almost like if every repo had a table of contents. To make it easier to produce these “code tours” I built a tour recorder, that tries to be as simple, and dare I say, fun to use as possible.

I’ve since found that this experience has value in a number of other use cases (e.g. simplifying PR reviews, doing feature hand offs, facilitating team brown bags, etc.), and I’m excited to keep getting feedback from folks as they try it out. It’s also fully OSS, so I’d love any and all contributions: https://github.com/vsls-contrib/codetour.

Huge problem, interesting solution.

It takes time to derive the high-level from the codebase. I guess, your walk-throughs would be similar to how I'd explore it myself - without the mistakes.

EDIT it's documentation, so can get stale/out of sync as the codebase evolves (not a problem for PRs). Though high-level architecture/APIs rarely change.

BTW In github, I keep expecting the text next to files/dir to be a comment explaining the high-level (instead of the most recent commit message).

Trying to codify the “tour” that a developer would have to otherwise discover themselves, is exactly what I’m trying to help with. This just seems like such an important phase of learning, that is currently way too manual.

Regarding the comment about tours becoming stale: when you record the tour, you can choose to associate it with a specific commit/tag/branch. That way, when someone plays it back, it will continue to make sense, even in the midst of minor code changes/refactorings.

I’ve also been working on making the tour editing experience as simple as possible, so that as you need to revise the tour over time, it’s not too difficult to do.

That said, any artifact that’s a derivative/complement of code (e.g. documentation, tests), represents an additional “burden” to maintain. So I’m focused on trying to keep the tours “stable” enough to support learning, and reduce the cost of editing them, to hopefully support the continued investment.

I’m semi-hopeful that there’s a nice balance here, that provides an enjoyable enough DX, coupled with the team’s motivation/benefits to retain and transfer such important knowledge. We’ll see how it goes!

Seems flawless for the important niche of specific commits.

I guess navigation uses VS Code to locate functions etc, so it's robust to superficial changes (like ctags for vim).

Keeping docs in sync is a hard problem!... \tangent: maybe a tour draft, of execution trace of a typical use, filtered to only key api calls?

I’d love to hear more about your thoughts on draft tours! Currently, you can record as many tours as you want per codebase, and so you could record one that was scoped to just key API calls, and have n-number of other ones with more detail, alternate flows.

Would that satisfy what you’re thinking? Or were you thinking about being able to record a tour, and mark certain steps as being more important than others? Any feedback here is unbelievably valuable!

I meant a way to make up-to-date drafts: base it on an automatic execution trace, like from a debugger running the code.

You get this 100% up-to-date trace for free. It's crazy verbose, so apply filtering (e.g. to only key apis) to make it manageable. Then edit that draft manually. Like an annotated, abbreviated, step-into debugger.

My thinking is that this makes it easier to get an initial draft. But maybe that basic trace is easy and natural to do manually?

Also, I'm not sure how similar this "execution trace" would be to a tour based on explaining it... I suppose you could find out, by examining your best tours to see how closely they actually follow execution order (if at all). When I try to understand a codebase, I do trace calls manually, so there's probably some similarity.
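
A rough sketch of the filtered-trace idea, in Python. It uses `sys.settrace` to record every Python function call during a sample run and keeps only the names on a hand-picked list of key APIs. All the function names here (`record_calls`, `main`, `load`, `parse`, `helper`) are made up for illustration, and nothing in this thread suggests CodeTour does this today.

```python
import sys


def record_calls(func, keep):
    """Run func() and return the names of called functions found in `keep`, in call order."""
    calls = []

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for each new Python frame.
        if event == "call" and frame.f_code.co_name in keep:
            calls.append(frame.f_code.co_name)
        return None  # no line-level tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return calls


# Hypothetical codebase: only `load` and `parse` count as "key APIs".
def helper(x):
    return x


def parse(text):
    return text.split()


def load():
    helper(1)
    return parse("a b")


def main():
    load()


draft_tour = record_calls(main, keep={"load", "parse"})
print(draft_tour)
```

The resulting list is only a starting draft: the order follows execution, which may or may not match the order in which you would explain the code.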

Ah OK cool, apologies for misunderstanding you. Now that I’ve got the core record/playback experience, I’m keen to explore ways to simplify the authoring and maintenance experience even further. Enabling a “code profiler” for recording/updating tours could definitely be really useful. Thanks for the feedback!

No worries, think your interpretation of just api calls is a good one, a bit like "tests as documentation". Could then have another tour for each module (api implementation). This approach would tend to confine the effect of changes (as in Parnas' On the Criteria To Be Used in Decomposing Systems into Modules https://www.win.tue.nl/~wstomv/edu/2ip30/references/criteria...)

Though multiplicity of tours is also complexity!

Awesome. Please create it for the JetBrains IntelliJ platform. Software tools are trending, you can charge money for it. I'd like to use VS Code but WebStorm is just so far ahead. Most people and companies (who use them) already pay for JetBrains tools.

Thanks for the feedback! In addition to CodeTour, I’m in the process of building out a couple of other tools (see below) to support better team collaboration, onboarding and knowledge retention. So I’m focusing on VS Code first, and iterating on feedback, before tackling other editors. That said, any thoughts on the usefulness of these solutions is extremely helpful, as I prioritize my backlog.

Note: Other side-projects on my journey to improve the holistic developer experience:

1. GistPad - Developer library for managing code snippets, documentation and interactive CodePen-like playgrounds. All built on top of GitHub Gists (https://aka.ms/gistpad)

2. Live Share Spaces - A virtual team room for connecting with other developers, and being able to seek and provide in-editor assistance in real-time (https://aka.ms/vsls-spaces)

This is so amazing I'm going to switch from Atom because of it! Thank you for making this!

Thanks! If you get a chance to check it out, please don’t hesitate to send any feedback my way. I need all the help I can get in order to ensure this experience is awesome :)


Going to second the request to bring this to IntelliJ. Everyone I know at my current and previous jobs would use the hell out of this, but so much of JVM development is on IntelliJ now.

Awesome idea!

Thanks! Out of curiosity: in the interim of having an IntelliJ client, do you think your team would use a browser-based client and/or integration that was built into GitHub? I’m working on this right now, and I just wanted to check with you on the applicability of that for your specific team.

Yes, no doubt.

Slightly behind now at least in terms of this plug-in.

In what terms is VS Code behind?

For my work, its PHP support is far behind IntelliJ's, and I say that as a massive fan and user of both VS Code and IntelliJ (in fact I use both in the same codebase, because VS Code does TypeScript better and I prefer its Git support and plugins).

Also, IntelliJ's support for databases and Symfony is far ahead.

Love it. This is such a good idea.

The only limitation is that it's VS Code only. But, the JSON output means I could easily write a script to produce HTML that could be hosted for other devs to view.
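
A minimal sketch of such a script, in Python. It assumes a tour file with a `title` and a list of `steps`, each having `file`, `line` and `description` fields; those field names are an assumption on my part, so check them against a real tour file before relying on this. `tour_to_html` is a made-up helper name.

```python
import html
import json


def tour_to_html(tour_json: str) -> str:
    """Render a tour (JSON text) as a small HTML fragment with one list item per step."""
    tour = json.loads(tour_json)
    items = []
    for step in tour.get("steps", []):
        # Assumed step fields: "file", "line", "description".
        location = "{}:{}".format(step.get("file", "?"), step.get("line", "?"))
        description = step.get("description", "")
        items.append(
            "<li><code>{}</code> {}</li>".format(
                html.escape(location), html.escape(description)
            )
        )
    title = html.escape(tour.get("title", "Untitled tour"))
    return "<h1>{}</h1>\n<ol>\n{}\n</ol>".format(title, "\n".join(items))


# Hypothetical tour content, for illustration only.
sample = json.dumps(
    {
        "title": "Getting started",
        "steps": [
            {"file": "src/app.py", "line": 10, "description": "Entry & setup"},
            {"file": "src/db.py", "line": 42, "description": "Connection pool"},
        ],
    }
)
print(tour_to_html(sample))
```

Escaping with `html.escape` matters here because step descriptions are free text and may contain `<`, `>` or `&`.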

Thanks! I’m keen to explore integration with GitHub in the near future (https://github.com/vsls-contrib/codetour/issues/10), in order to provide a browser-based “player”. That way, the knowledge in the tour isn’t limited to any one editor.

That said, I wanted to record the tours in a simple JSON file, specifically to enable interop with other tools. I’d love to hear any feedback you have on the experience and/or the format!

I would love to see it being available for GitHub/GitLab. Personally I use Sublime Text now.

That's a surprisingly nice approach to the problem.

Thanks! I took inspiration from the way I’ve seen some devs “document” their PRs: submit the PR, add a description, and then seed the PR with a handful of comments that call out the most relevant “markers” for reviewers to look at.

I’ve always liked that approach, and thought it could be cool if you could do the same thing for any body of code (not just PRs!), and also enable the comments to be ordered, so that the “tour” itself is entirely guided.

So awesome! Actually, working on something somewhat similar called Codeflow (https://usecodeflow.com/). Right now, we have a web app where people can create these walkthroughs/tours but wanted to ultimately create an IDE extension. Love what you've done thus far.

Oh cool! Yeah, we are definitely kindred spirits :) I wanted to start with VS Code (in order to scratch my own itch), but I’m now working on a browser “player” and looking into GitHub integration (https://github.com/vsls-contrib/codetour/issues/10).

I love seeing other folks investing in this space. Thanks for sharing!

Sounds like a great idea! I'd love to use this from the terminal with vim.

I’ve never built any tooling for Vim, but I’m really inspired by the challenge :) Thanks for the suggestion, and stay tuned!

Note: If anyone’s reading this, and has experience in Vim “extensibility”, I’d like to collaborate: https://github.com/vsls-contrib/codetour

Love the idea! Maybe a suggestion with the "bit rot" problem (getting out of sync with code). Code tours will likely cover the most important parts of the codebase, which should be well tested in large projects (and large projects are the ones that need the tours). You could link the comments to those tests, and if any of those tests are changed, you will be advised to take a look at the tour comment. These tests can be also discovered automatically, as there exist code coverage tools.
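
One way the test-linking idea could be sketched, in Python: store a hash of each linked test file when the tour is recorded, then flag any step whose tests have changed since. Everything here (`fingerprint`, `steps_needing_review`, the step names and paths) is hypothetical, and how steps get linked to tests in the first place is left open.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Content hash of a test file's text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def steps_needing_review(step_tests, recorded, current_sources):
    """Return the tour steps whose linked tests changed or disappeared.

    step_tests:      step title -> list of test file paths
    recorded:        test file path -> fingerprint taken when the tour was recorded
    current_sources: test file path -> current file text
    """
    stale = []
    for step, tests in step_tests.items():
        for path in tests:
            source = current_sources.get(path)
            if source is None or fingerprint(source) != recorded.get(path):
                stale.append(step)
                break  # one changed test is enough to flag the step
    return stale


# Hypothetical data: the auth test was edited after the tour was recorded.
recorded = {
    "tests/test_auth.py": fingerprint("assert login()"),
    "tests/test_db.py": fingerprint("assert connect()"),
}
current = {
    "tests/test_auth.py": "assert login(user)",
    "tests/test_db.py": "assert connect()",
}
links = {
    "Step 1: auth flow": ["tests/test_auth.py"],
    "Step 2: storage": ["tests/test_db.py"],
}
print(steps_needing_review(links, recorded, current))
```

Hashing whole files is crude (a whitespace change flags the step), but it works the same way for any language or test framework.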

Ah I really like this idea! Currently, a tour can be associated with a specific commit or tag, to enable it to be resilient to code changes over time (e.g. minor refactorings that don’t fundamentally change the value of the tour).

That said, there isn’t a way to automatically know when a tour should be updated, after the code has changed significantly enough. Being able to bind a tour to one or more tests is a really interesting idea, and something that I’ll try to explore this upcoming week. Thanks so much for the feedback here!

How would you associate with ruby tests vs go tests vs my random test library that doesn’t have a great framework?

Great question! I’m not currently sure :) I’ve been fairly deliberate about building CodeTour in a language agnostic way, in order to ensure it could be applied to any file type within a codebase. This is partly why I haven’t based the tour definition experience on code comments: https://github.com/vsls-contrib/codetour/issues/38.

That said, I’m trying to keep an open-mind when it comes to increasing tour resiliency, since it may require language/platform-specific solutions. I’m not sure. If you had any thoughts, I’d love to hear them!

Another potential solution is to have a CI task, that could check to see how far the code has deviated from the original commit that the tour was recorded on, and notify you when the deviation crosses some threshold. Maybe that’s a terrible idea, but something like that would have the benefit of being language-agnostic.

Lots of exploration to do here!
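
A sketch of what that CI check might look like, in Python. It sums the added and deleted lines reported by `git diff --numstat` between the commit the tour was recorded on and `HEAD`, restricted to the files the tour touches. The helper names and the threshold of 50 lines are arbitrary assumptions, not anything CodeTour ships.

```python
import subprocess


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files are reported as "-" and have no line counts
        total += int(added) + int(deleted)
    return total


def tour_is_stale(recorded_commit, files, threshold=50):
    """True if the tour's files changed by more than `threshold` lines since recorded_commit."""
    result = subprocess.run(
        ["git", "diff", "--numstat", recorded_commit, "HEAD", "--", *files],
        capture_output=True,
        text=True,
        check=True,
    )
    return changed_lines(result.stdout) > threshold
```

A CI job could call `tour_is_stale` once per tour and post a reminder when it returns `True`. Line counts are a blunt measure of "deviation", but they need no knowledge of the language being toured.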

For somebody who has to do "company" or "engineering team rescue", this is awesome and hugely important to me.

I’d love to hear more about “engineering team rescue” :) Are you referring to being suddenly dropped into a project that needs help (e.g. because it’s behind schedule), and having to quickly ramp up?

This is a great idea. You need a guided tour when you visit a museum as vast as the Louvre.

The analogy of a museum tour was exactly what inspired the name! When you want to really learn about something, it’s hard to beat the value of a guided walkthrough, that can provide you with just enough focus, while still allowing you to explore ideas on your own.

Cool. Having this for open source projects would be a great way to learn about that codebase and also programming and patterns in general.

That’s my hope! I read a lot of OSS code, and so I constantly struggle with understanding how to get started. Assuming folks find something like CodeTour helpful, I’d love to get support for advocating it and bootstrapping the ecosystem with tours for popular OSS projects.

This is pretty cool. Hope GitHub implements this idea!!

Me too! I started this discussion with Nat Friedman (GitHub CEO), and I’m excited to see what can be done here. Any feedback/interest from the community might help make this a reality :)


Looks really cool.

Why can't you create a documentation file which explains the code and how to get started. Tell me an idea more simple than this

Definitely not alone - great idea

Thanks! This HN thread has definitely provided me a major boost in confidence that this is a worthwhile problem to solve.

Awesome tool!

Very cool!

I'm working on a morse code iOS app that can optionally use Force (3D) Touch. My dad had a stroke recently and is quarantined in a care facility. He can't talk but remembers morse code like a boss. He can't lift his finger off the screen to "tap" and there were no other morse apps out there for people with physical impairment. With this he's able to communicate... I've been coding for 15 years but this was life changing when it worked!

FOSS: https://github.com/zackb/forcecode

only TestFlight right now but AppStore soon under the name "Force Code"

Congrats on the awesome project. In case you haven’t come across it, ACAT [1] might be of interest, especially if word rate becomes a limiting factor in communication for your dad, or if he’s interested in using a computer. ACAT is highly flexible around how it accepts inputs. To that end, you might be able to take what you’ve done and open it up to a broader audience by writing an extension that turns the iPhone with force touch into an input sensor, with or without the use of Morse.

[1] https://github.com/intel/acat

Thanks, and thanks for showing me ACAT. I was not aware of it but looks interesting.

Hi, so many cool ideas in this thread but this one really caught my attention. As it happens I've been investigating solutions like this under an open source basis. Unfortunately I didn't pay enough attention to "marketing" so I was forced to quit due to lack of funding. Now my hands are sort of tied by a greedy copyright clause in my contract so I'm not sure how much I personally can do as I really want my work to be in the hands of the public.

One of my ideas is exactly using morse but more as a means to demonstrate a way to make a new input method learnable via a HUD.

Apart from that I'd think you might be interested in Dasher and that you should check out the solution that Stephen Hawking is using which happens to be open source.

I've put in quite a lot of thought into this and would love to discuss this with you. If you are interested please use contact form in link below or let me know how I can contact you.


This is fascinating. Dasher looks incredible. I was wondering where this thing was going to go (if anywhere) after this initial "use case". I will definitely look into this.


Hi, thanks for sharing your website. You are doing interesting research.

Hi, thank you! Please fill in the contact form and hop into the Discord.

Consider an Apple Watch app/extension?

It could be incredibly useful to not have to pick up a phone to use, and probably would not be that difficult to port over.

If you need help, testing or advice feel free to contact me at claire at theoic dot me. I’m a bit of a Watch nut and do iOS programming for a living.

Really interesting use of force touch, not sure if the repo is meant to be public already, would suggest you add a README with a short description of the project, installation instructions and so on. Best of luck!

What a wonderful thing to read on another day in covid isolation. Very sorry to hear about your father. He must be beyond delighted with what you've made.

This is incredibly creative!! I wish you and your father the best of luck!

This actually sounds like a great way to learn it too, will be looking out for it in the App Store!!!

That's so awesome. Is he a ham?

Yes he is. Thanks!

Sorry, but what is a ham? a HAM radio operator or something else?

The 'ham' in ham radio seems like it should be an acronym or something, but it's actually just short for 'ham-fisted' and doesn't need any special treatment. :)


Yeah, IIUC, HAM dedicates its lower-fidelity bands to Morse.

Correct (a HAM radio operator)

This is so cool! Love the name too.

Thank you all so much for all the feedback. This community is amazing.



I'm making a new tool for writers. With it, you'll be able to write your essays on "layers"

The problem? Tweets are easier to read than long-form essays, as they require less time commitment. If the content is not good on a long-form article, you'll find out way too late. With this tool I'm developing:

Layer 1 is the shortest version of your essay, the 1 min read — like a tweet. The idea boiled down to the shortest version

Layer 2 is the same text from layer 1, but with extras added here and there. What's already read by you is in black ink. What's new is in blue ink. This is the 2 min read version

Layer 3 shows everything from Layer 1 and 2 in black ink, but what's new is now in blue ink. And you keep doing that until you get to the full version.
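
The black-ink/blue-ink split between two consecutive layers could be computed with a word-level diff. Here is a small Python sketch using the standard `difflib` module; `mark_new_words` and the sample sentences are made up, and the actual tool may well store layers differently (for example, by tracking insertions as the writer types).

```python
import difflib


def mark_new_words(previous: str, current: str):
    """Return (word, is_new) pairs for `current`, where is_new means 'not in the previous layer'."""
    prev_words = previous.split()
    cur_words = current.split()
    matcher = difflib.SequenceMatcher(a=prev_words, b=cur_words, autojunk=False)
    marked = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            continue  # words dropped from the previous layer have nothing to display
        is_new = tag != "equal"  # "insert" and "replace" spans are new text
        marked.extend((word, is_new) for word in cur_words[j1:j2])
    return marked


layer1 = "Cats sleep a lot"
layer2 = "Cats usually sleep a lot during the day"
for word, is_new in mark_new_words(layer1, layer2):
    print(("BLUE  " if is_new else "black ") + word)
```

A renderer would then draw the `is_new` words in blue and everything else in black.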

I can post some screenshots here of my mockups, as I'm a designer. PM me if you find this intriguing!


Edit: Since people are showing interest, here's how I see it happening — https://invis.io/GQWINO2YKU2#/410298082_1_Min_Verison

The first thing that you see is the first layer (1 min version). Go right for 3 and 5 min version!


Edit 2: since I'm seeing the upvotes and the emails, I quickly made this sign-up form for the people who want to be updated when the product is done: https://layered-ink.webflow.io/

I would put up the https://layered.ink link but the domain hasn't been propagated yet.

@Admins — please do let me know if this is not permitted so I can take it down. Apologies if so.

Can anyone vertically scan 5 paragraphs of a long-form article in 1-2 seconds and ambiently detect keywords that signal relevance? By ambient, I mean in your peripheral vision, without even knowing which word you're looking for or actively reading anything.

I acquired this "skill" about a year ago and now I use it when I suspect there may be filler text or introductions to concepts I already understand. I know it works because I uncannily land on interesting but otherwise nondescript passages. When I scroll back up I find that I did indeed skip the filler. Of course, this isn't voodoo or unprecedented. It's just funny that I've read so much that my brain basically has a regression model for various semantic characteristics that signal novelty by subject area. I suspect I also switch into different modes of scanning based on the writing style detected.

Because of all that, I actually prefer no layering at all. Reuters articles are Layer 1 compressed and I find them annoyingly curt.

Isn't that called skimming, as in "I skimmed the text"?

I was also thinking about how useful chdaniel's layers would be when it's more practical to just skim.

Maybe, but I never did this before I tripled my reading volume while attending grad school. I thought skimming entailed some degree of horizontal scanning and actively reading clusters of words. In contrast, I don't even read clusters of 2 words when I do this and I don't aim to comprehend the skipped passages. I just automatically detect that they're irrelevant with basically zero reading.

Experiences may vary, but I've always thought of skimming as something that could be done to various degrees. Some examples:

1. going through pages looking for interesting textual structure, for example dialog, numbers, capitalized words, etc.

2. scanning vertically or in a zig-zag fashion, using peripheral vision to look for interesting words

3. reading the first few words of a paragraph before deciding if it's interesting or not.

I normally use 2, then try 3 on a paragraph that passes 2's test. Only when a paragraph passes test 2 and 3, do I decide to read it with my (temporary) full attention. 1's more for when I'm searching for something in particular or when what I'm reading follows a specific format and I'm interested in a particular section with distinctive structure.

Wow. That is next-level stuff. This makes me think of how varied the human experience is, even for the most routine activities. I've heard the word "skimming" hundreds or maybe thousands of times before, but clearly it is an inaccurate abstraction of the range of things someone can mean when they say "skimming."

Here is a dictionary definition: "The action of reading something quickly so as to note only the important points." If you showed me that definition before I started skimming effectively, I would have never extrapolated the behavior you described from that word. I didn't even know people did that.

-----------Idea ----------

I could see value in an open source "Verbose Dictionary" where people from all ages and walks of life would be able to add their own definition of a word. There would be two general rules when someone makes an entry:

1. They would need to write their definition in a verbose manner. I'm thinking at least three sentences, usually more, but also not as long as a Wikipedia entry.

2. They would need to be as open and honest as possible, so that there's minimal translation loss between what they're communicating and what we take away from it. Importantly, the word needs to be described with both intellectual and emotional cues. The author would also be able to let the reader know basic facts about themselves, and there would be some mitigations against trolling or fakers.

Over time, we would converge on a more universal language to maximize our mutual understanding of what each of us truly feel and think about words, concepts, current events, people, etc.... There would also be optional tags and slider scales (i.e. 1-10) that can attach to each definition so that you can correctly communicate the breadth, depth, and magnitude of your thoughts and feelings on the word. Critically, the goal would not be to achieve groupthink and converge on the same definition. That would be contrary to the premise of the Verbose Dictionary.

Not only would I find such a dictionary highly captivating, but I also think that it can serve as a useful tool for conceptual mapping or things like artificial neural networks. What are your thoughts? Would you use the Verbose Dictionary?

> What are your thoughts? Would you use the Verbose Dictionary?

Isn't that Wikipedia?


> also not as long as a Wikipedia entry.

I don't get the issue. What would this Verbose Dictionary provide that Wikipedia doesn't? You seem to want to differentiate it for no reason.

It would have a different aim than Wikipedia. Wikipedia, is, above all, a source of knowledge. That's where I go when I want to chain-read about the German Revolution of 1918. It doesn’t have the emotional and sociological theme I’m describing.

VD on the other hand would be a real-time, K-clustered map of the human experience. If you go to the entry for Love, you’d see people from all over the world sharing what love means to them, anecdotes and all. Crowdsourced feedback would surface the best entries to the top. Political topics would solicit good-faith micro-blogs with the specific aim of humanizing each other and learning why we believe in what we do. You'd be able to go back in time to see where people's minds and hearts were at on a certain day. Algorithms and strict moderation would ensure that diverse viewpoints are shared and treated in good faith. Maybe I’m wrong, but I think if there’s a dictionary for swearwords, there’s certainly room for a dictionary of the human experience.

Check out the below study published in Nature in 2019. It makes the claim that cultural values of openness are what give rise to democracy, rather than the other way around. We need more of that openness right now. https://www.nature.com/articles/s41562-019-0769-1

The purpose of a dictionary is to define words. A word with a thousand definitions is not defined. It is the opposite of defined. If a word has several diverging meanings, perhaps it's time to create more words.

Super valid point, especially if you're doing what keenmaster does (wish I could tag him). Frankly, I've got FOMO when reading long stuff that interests me

I wrote something similar for HTML:


Fun. Quick user experience report: expanded "atoms"; the effort to find the boundaries of the insert was a thing ("where does it start? ah, next sentence.", "oh, I read this bit already, so I'm past the end"); with an unmoved mouse, I tried clicking "atoms" again to toggle the insert away (didn't). Perhaps color the insert?

Thanks for the feedback, there should be a fade-in when the text expands, and there's an option to click again to close, but IIRC it's not enabled on the demo. I think it's unnecessary, but it bothers some people to not be able to close them back up.

Fantastic Stavros, I love it. Really close concepts we've got here. I thought of exactly what you've made a few weeks ago as... some words may or may not be jargon on a certain "layer" of my essay format. So what I barely-envisioned was what you put into code here. Will bookmark it!

Yep! I was thinking of your layers idea while making this, but I wanted something that would allow people to drill down in a more targeted way (rather than generally). You're right that it usually ends up being used to define terms, but you can also use it well for elaboration.

I think the ideal solution will be some sort of middle ground, where much of the drilling down will be done by clicking on specific words or sentences, but a slider would allow you to generally expand more text in the article, for things that aren't exactly related to other sentences.

It does take a bit of doing, and drilling down to another layer should also be able to delete text, so the wording flows better.

If you want to talk about this more, feel free to send me an email, I find this problem very interesting.

I would love if it was possible to further drill down into topics (expand further in already expanded parts, up to many levels deep). I long ago hoped to one day try and write an explanation of Shor's algorithm in such a format, with many layers of drill-down available, to cater to readers with different levels of knowledge. And also with the possibility to collapse back the areas, and bookmark a specific expanded state of the text.

You already can, just put a link in the expanded text.

Is collapsing back also allowed, as I asked above? :) didn't see any clear UX indication to that effect :( Not 100% essential (I guess could try adding it as a contribution if the others were there), but how about the bookmarking aspect I also mentioned?

You can collapse back, but you can't bookmark right now. I guess you could do bookmarking with anchors, but what's the use case?

This is a super neat tool—thank you for creating and sharing!

Thanks, I'm glad you like it!

Neat. This layered approach reminds me of the Youtube series where they explain concepts such as Quantum Physics at 5 different levels of difficulty (children, highschool/uni students, phds, field experts).

I didn't know that exists — will look it up. Yep, similar concept... I believe we always learn on layers (when we actually learn).

This is great!

Two quick thoughts...

- Maybe limit the layers to 2 or 3. The example you've got in Invision has eight layers which gives me decision paralysis. Do I want the "7 minute read" or the "9 minute read"?

- Consider an onramp from Twitter. People have big audiences there, and linking out to Layered Ink might work better than those 50 tweet "tweet streams".

Thank you DCP!

1. Hmm, the concept is: the more layers, the better. This way, you can easily "pick your pace" — if you're short on time and you want easily-consumable content, you'd skim through 1-min versions. Ideally if at a certain point you say "holy shit, I love this", you'd jump to the full version. The same way you don't just skim your favourite author's new article — you just go to the full version because he's proven time and time again that his style fits you

2. Oh? Tell me more? Did I get you right: showing it to ppl who bypass the 280-char-limit by having tweet storms/threads? P.S: I love Dynasty's logo!!!

Regarding (1), I think the parent still has a point. Too many choices can be stressful to the user.

How do you expect users to interact or choose between "7 minute version" and "9 minute version"? Are they expected to keep going up the layers one-by-one until the full version is reached? Are they expected to intuitively know which one to pick right away?

Too many possible choices also delegates the responsibility of picking the number of layers to the writer, which can be an additional stress.

If possible, I'd try to figure out the most common use cases. The interactions and thought processes of users should be intuitive and dead simple to describe. Other than that, loving the idea! Keep it up. :)

Valid point, thanks for pointing it out. I'll keep thinking about it — if you come up with a solution, please PM me!

What if the options got progressively longer? Eg: if you select the 5 minutes option, only then does the 10 minutes option appear.

Nice idea. Thanks!

So you think people would open another website and then go through the "expand" rather than just do it on the website they are currently on? I don't think so to be honest.

What you propose as the system for writing makes sense and probably reflects what many people already do.

However, I am curious to know what the tool does. In other words, someone can write a “thesis statement”, and the bullet points and an outline and so on in any editor. What does the tool do?

For what it's worth, a lot of good software doesn't actually "do" many new things, other than support a specific use case with a better UX than existed previously :)

It'll be a writing tool. A text editor. There are two ways to approach it:

1. Let people brain dump — write everything they've got on their mind. Click "new layer". Cut down on words. Click "new layer". Cut down on words. Repeat ad nauseam.

Every time you write a new layer, you're "editing" your article so as to make it shorter. In other words, you're purifying the idea

2. Let people write the shortest version, then write a bit more, then more.

I'm doing way #1 because I feel like road #2 prevents people from having a brain dump... I'd lose all my ideas if I'd start with the short version

My usual approach tends to interleave the two approaches. Brain dump, then distill, then reconstitute in a more crystalline form. Repeat as necessary.

Mine as well. And the process is often lossy, so I'll keep old variants easily at hand. Eg, "this old variant had nice property X, or dealt with Y well, or suggests a direction worth exploring".

Brainstorming, well, "everything is a graph", but perhaps picture versions as columns, with each column showing all three levels, with transclusional editing between them. So you do a pass in whichever direction, create a new column, and easily grab material from previous variants, and repeat.

I'm sorry, your second paragraph is too ahead of me. Can you explain it in a different way please? I believe what you're saying is interesting

Very np, sorry... I was picturing a screen divided into columns. Each column is a stack of the various distillation levels, say most brief on top, then less brief, and so on.

Within a column, editing text in any level updates that text in every other level it appears in. That's the transclusion. Since all levels are said to have a copy of the most-brief text, if you edit that text at any level, it's updated at all levels of that column.

When you start on your next column/version, you can easily grab text from previous version(s). And you might do that in different styles. If you copy the entire column, then the old column serves as a checkpoint, and the new for continued editing. Or copy just the most-distilled level, and then work downward, to "reconstitute in a more crystalline form". Or copy just the least-distilled level, and work upward, attempting a new distillation.[1] Or do these from a previous column instead of the most recent one. And the old columns/versions are easily accessible, to browse for inspiration, or to grab stuff from.

[1] Hmm... I guess copying just the least-distilled text "forgets" what the embedded more-distilled text regions of it were previously? So one can try playing with the "brain dump" text again, without the distraction of the previous distillation choices. Then when things settle down again, one can highlight "this region is now distillation level n".

For UI, perhaps highlight-drag-n-drop to more distilled levels? And if each level say had an "X" - delete and forget this level, then the different styles of column copying unify as "fully copy a column, and then X-away the levels/region-choices you want to discard"?

So... All levels simultaneously visible, and easily edited together. And multiple versions of them easily accessible, in support of doing multiple exploratory passes, with each pass able to easily draw from past attempts. And working in either direction: towards distillation, or building out from a more distilled seed.

That was the brainstormy vision anyway. I've no idea if it would work out. And for whom - I work in both directions (back and forth), and like a visual record as external memory (so I don't have to remember things or take notes), and like to just see it all (without view switching). Someone who doesn't want those, might find multiple columns to be distracting clutter.

Recommend: Gingkoapp

https://gingkoapp.com/ Saas, desktop in sign-up beta.

Thanks. The bottom-up journaling style of use [1] looks tempting, even as someone who generally prefers graphs to trees.

PayWhatYouWant[2], but it seems a pity one can't easily play with it non-persistently, without giving them an email? I wonder if that's the right choice, funnel wise, for something with a "I can't imagine using it... oh, that's neat" dynamic.

[1] https://www.youtube.com/watch?v=sT_nlmlbhBE [2] https://gingkoapp.com/p/pricing/

This sounds fantastic and is something I've broadly thought about for a little while. A little while back someone else linked a concept of what they had done with the idea -- initially a paragraph was visible, and some of the words had a coloured box around them which could be clicked to expand on that term, all the while maintaining a flowing prose. So, I think a slightly different goal to what you describe (it was technical writing I think?), but similar enough. Unfortunately I didn't save it and have previously (frustratingly) spent at least an hour looking for it to no avail, so I'm curious to see what you come up with!

If you ever find it again, send it my way please! Indeed it would be very useful for explaining complex concepts. Definitely not for tabloids or anything like that, though it might... help?

It's essentially the reverse way of what writers do when they write the "tweet version" of their essay — plus all the steps in-between.

@ your last sentence: Check out the InVision link I've added to the OP!

I believe what you’re looking for is: https://www.telescopictext.org/ (demo is at .com)

Reminds me of https://getcoleman.com/ and this other portfolio site where each word could be expanded until it became a large essay; can’t remember its name.

FWIW, this is how news articles are classically structured. They are initially high-level, then dive more and more into detail as the article goes on.

Sure! Or how TLDRs work on Reddit. The problem with news and reddit's TLDR is... there's only layer 1 and layer max. What about the in-between?

Here's how I see what I described earlier: https://invis.io/GQWINO2YKU2#/410298082_1_Min_Verison

What do you think?

Actually, I think news articles don't have clear-cut layers, but actually gradually dive deeper.

The original motivation was layout considerations for printed newspapers with limited space. With all articles pre-written like that, an editor could arrange them on the final layout and cut them off at the end at their discretion while keeping the most relevant information given the remaining space.

That being said, I don't think this is too relevant for your idea, just an interesting anecdote.

Have you seen axios.com? They used to start off with short paragraph bites of news articles, that had a one-click go deeper to an in-depth piece.

I didn't but I just checked it out now. Interesting I'd say. They have the short-version and the max-version. The tool I'd like to make would allow you to do all the in-between versions

I like this idea! I'm building something similar at www.sivv.io. This is a community for knowledge sharing where users share / discover summaries of useful knowledge, ideas or advice that are structured into sections (e.g. background, key point, examples) that can be hidden or revealed depending on the preferences of the reader. The idea is that this forces authors to remove any 'padding' and allows readers to consume the key points as quickly as possible, in theory learning more while reading less. We are currently focusing on the topics of business, behavioural science, personal development, professional development, science & technology and wellbeing. You can sign-up to the beta version at www.sivv.io - any feedback would be much appreciated!

I've built something like this called "stretchtext" 3 years ago :)

Here is the demo (use the slider displayed after the first blockquote in the article) https://blog.fgribreau.com/2014/05/fr-stretchtext-retour-sur... or click on "[+]" to go deeper.

The code is open-source here: https://github.com/FGRibreau/stretchtext

I actually prototyped something similar a long time ago. It allowed for authoring sections that could be composed into a number of views. The only difference is that I wanted to capture the different "facets" of an article as well - so you could have a tab that was just the code, or statistics, etc.

You could possibly prototype with a coda template: https://coda.io/@michael-joseph-rosenthal/essayist

I had a similar idea for wikipedia.

Example: a short explanation in words of a mathematical problem, then the possibility to expand to a more rigorous proof, and so on. This would enforce a top-down style of writing that could benefit both audience and creators. I believe it would easily apply to a large part of knowledge sharing.

Neat implementation.

This would be really good for scientific articles. They are usually filled with (important but noisy) details that make skimming them for the important ideas difficult.

This is how pretty much all writing online should work IMO, even if the 'deeper' version is just a citation or link to respected expert work. Some combo of this tool for writers and community-annotations a la Genius.

Tell me more about the a la Genius idea? I know, love and constantly use Genius.

Here's how I see what I described earlier: https://invis.io/GQWINO2YKU2#/410298082_1_Min_Verison

What do you think?

When I read your description I immediately thought of the Minto Pyramid principle. Your tool is like a technical way of solving the same problem :)

I'd never heard of it, but I've been reading about it for the last few minutes and indeed, Layered Ink would be one of the ways to respect the pyramid principle! Thanks for pointing it out. Where did you hear about it?

Sounds really cool and reminds me of Jason Fried's writing class idea[1]. I think many people still view Twitter's brevity as stifling nuance instead of forcing users to really refine their ideas.

[1] https://m.signalvnoise.com/the-writing-class-id-like-to-teac...

Wow. I love JF and, obviously, this idea as it's pretty much on the same basis. I'd love to send it to him as soon as it's done — if anyone can help me with this, I'd be more than grateful

Indeed, there are problems with Twitter, people take stuff out of context and like to be polarized. But just as you say, the first time I interacted with Twitter I loved the fact that I had to "distill" my idea.

Kinda like Vodka — you keep distilling alcohol until you purify it (or close to it). What you're saying about "refining the idea" made me think about this.

“Oops! Something went wrong!” When trying to sign up.

Can layer 0 be sources supporting the content?

The way I'm making the tool, yes, you could do that

Works for books? Tolkien too long a read.

It'd be an immense effort to do it for a book but it surely could work

I'm working on promnesia, a browser extension to enhance web browser history.

It allows you to answer different questions about the current web page:

- have I been here before? When?

- why have I bookmarked it?

- how did I get on it? Which page has led to it?

- who sent me this link? Can I just jump to the message?

- which links on this page have I already explored?

- which posts from this blog page have I already read?


I had this exact same idea and built a very simple tool that lets me input three values: link, difficulty/rating and addendum. The uniqueness is based on url and I do keep a count of if I have been there before. I update the timestamp to last visited. I built a simple UI for that as well though looking at your project it seems much more exciting to have the full visiting history as well.

I love this idea and I would def use it! It would also be great if it worked backwards as well, e.g.: "the link I found on twitter last week"

Yep, that would be fairly straightforward to support as well!

Why do you need this information? What's the use case?

Vaguely -- being better at 'information processing'.

If you think about a typical person's browser bookmarks -- it's a mess with no hope of ever catching up, prioritizing and sorting. My project could help with that :)

A timeline?

I've just finished off (within the last hour) my version of Asteroids, which I started with enthusiasm two or three years back, then did very little after getting the basics of the game working:


Now I've finally added all the stuff I wanted to (black holes, satellites, power-ups) so it's time to pick up another project I started a long time ago and haven't really done much on: my very own version of Space Invaders:


(WARNING: this one is barely functional - e.g., no levels, no shield damage, no scoring, invader firing pattern is all kinds of wrong, invader movement isn't quite right, etc.)

Enjoy! (And feedback welcome!)

The Asteroids game!

The graphics, music, and fluid motion are awesome.

But, I cannot stand the controls. I play games like this laying down in bed at night, which automatically disqualifies this, due to its use of my phone’s orientation. Also, the modal disconnect between throttle and direction controls (one orientation and the other button press) is too much to tie to muscle memory.

If you added an “old fashioned mode” with a dynamic joy stick (appears where ever I put my thumb within one bottom quarter of the screen) for direction and throttle (touch to engage and drag a small amount in any direction to turn and go that way), that’d fix the issue. NOVA 3 for iPhone got this 100% right. Then, a small section on the other bottom corner could house a few small buttons to engage hyperspace, etc. Just let the user pick right or left handed.

Thanks, that's a great suggestion. I had tried pure touch controls before but it was a complete disaster. Never even occurred to me to anchor against a dynamic touchpoint. That's a really interesting idea. I'll have to check out how it works in NOVA 3.

Tbh, I've never been satisfied by the tilt controls: they work OK if you're stood or sat still, but if you need to move at all whilst playing it can end up going sideways. I probably haven't yet put in enough effort to get them to feel right whatever the ambient orientation of the phone.

I think you're right though: I need to offer an alternative control scheme that's touch only.

Hey, thanks for the reply. Awesome, I’ll be excited to give the new controls a try!

Really like what you did with little red enemy ships starting as 1 whole and then splitting, very cool.

Thanks, appreciate it.

I'd love to take credit but sadly not my idea. Very similar ships featured in Asteroids Deluxe back in the day: https://www.youtube.com/watch?v=GGODay8YUio. I thought it was a great idea so decided to add them to my version just because they add a bit of variety.

  All my lonely days are over

Haha, I know, right?

I licensed that from PremiumBeat. In the early 2000s a friend introduced me to Cowboy Bebop, which I've been a massive fan of since (along with its excellent soundtrack), so I wanted some music with the slightly whimsical, slightly melancholy feel some of the tracks in the show had for Game Over.

Gonna finish up work on my 3D marble madness clone now. Thanks for the motivation!

Yes! Do it! And please send me a link when it's ready to play test.

The effects are quite eye catching. Can you tell us how you built them?

Thanks, and for sure.

It's all Canvas 2D at the moment. There's no WebGL because when I started this project, three or four years back, it simply wasn't ubiquitous enough. Now I might be able to get away with WebGL (using something like Pixi.js) but the last couple of weeks I really just wanted to finish what I started rather than do a relatively substantial rewrite.

Another quirk is that I'm not using a framework (think Phaser.js, or whatever), because the two main motivations for starting this project were that:

1. I wanted to get better at JavaScript, which I'd always been weak at.

2. I guess 2015/16 was a time when the front-end world felt like it was really going nuts with new frameworks coming out all the time, so I wanted to become familiar with and see what I could do using only APIs built directly into the browser.

Getting back to the point, in terms of effects I've implemented services to keep track of and render lists of particles and "clouds" of vapour. The latter are the translucent circles you see used to render rocket exhaust, and some of the explosion effects.

In fact I just use one service for both and swap in a different renderer for particles or vapours in each service instance, since both behave substantially the same.

Since we're throwing around hundreds or thousands of particles/vapours at a time, and, well, a lot of particles throughout the duration of a single game, and we want to avoid much garbage collection to keep the frame rate steady[1], I implemented an object pool so we're not constantly creating loads of garbage.
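For anyone curious what the object pool idea looks like in practice, here is a minimal sketch. It is written in Python purely for illustration (the game itself is JavaScript), and the names `Particle`, `ParticlePool`, `acquire` and `update` are made up for this example, not taken from the game's code.

```python
class Particle:
    """A reusable particle; its fields are reset on every acquire()."""
    __slots__ = ("x", "y", "vx", "vy", "age", "alive")

    def __init__(self):
        self.x = self.y = self.vx = self.vy = 0.0
        self.age = 0.0
        self.alive = False


class ParticlePool:
    """Hypothetical fixed-size pool: particles are recycled, never recreated."""

    def __init__(self, capacity):
        # Create every particle up front; later calls only reuse them.
        self._free = [Particle() for _ in range(capacity)]
        self.live = []

    def acquire(self, x, y, vx, vy):
        if not self._free:
            return None  # pool exhausted: skip this particle instead of allocating
        p = self._free.pop()
        p.x, p.y, p.vx, p.vy = x, y, vx, vy
        p.age = 0.0
        p.alive = True
        self.live.append(p)
        return p

    def update(self, dt, max_age):
        i = 0
        while i < len(self.live):
            p = self.live[i]
            p.x += p.vx * dt
            p.y += p.vy * dt
            p.age += dt
            if p.age >= max_age:
                # Swap-remove: move the last live particle into this slot,
                # then hand the expired one back to the free list.
                p.alive = False
                last = self.live.pop()
                if last is not p:
                    self.live[i] = last
                self._free.append(p)
                # Do not advance i: the swapped-in particle still needs updating.
            else:
                i += 1
```

The point of the design is that once the pool is built, spawning and expiring particles only moves existing objects between two lists, so there is much less for the garbage collector to do mid-game.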

Particle types are defined declaratively: colour stops (including transparency), duration, expansion rate for vapours. Instances of particles also have properties like position and velocity.

Explosions are also defined declaratively with an arbitrary number of stops/events at timed intervals which define number and type of particles spawned, angular range through which they should be spawned, initial spread, velocity spread, any sound effect(s) to be played at each stop.

There's a service that manages explosions that you call with something like:

    explosion.create(explosion.playerExplosion, player.position, player.velocity);

This will create an explosion of type "playerExplosion" at the player's position and apply the player's velocity as a drift to all particles and vapours created.

So once I've defined my explosion types it's easy to use them in a variety of different scenarios throughout the game without cluttering the logic too much. The services take care of rendering and particle management so these days I don't have to worry about it too much.
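A rough illustration of what "declarative" explosion definitions could look like, sketched in Python rather than the game's JavaScript. The field names (`at`, `kind`, `count`, `angle`, `speed`) and the `spawn_due` helper are assumptions for this example; the real definitions also cover things like colour stops and sound effects, which are left out here.

```python
import math
import random

# Hypothetical definition: each stop fires `at` seconds after the explosion
# starts and spawns `count` particles of `kind` within an angular range.
PLAYER_EXPLOSION = [
    {"at": 0.0, "kind": "spark", "count": 8,
     "angle": (0.0, 2 * math.pi), "speed": (50.0, 120.0)},
    {"at": 0.2, "kind": "vapour", "count": 3,
     "angle": (0.0, 2 * math.pi), "speed": (5.0, 20.0)},
]


def spawn_due(definition, start, end, position, drift, rng):
    """Return particles for every stop whose time falls in [start, end)."""
    spawned = []
    for stop in definition:
        if start <= stop["at"] < end:
            for _ in range(stop["count"]):
                angle = rng.uniform(*stop["angle"])
                speed = rng.uniform(*stop["speed"])
                spawned.append({
                    "kind": stop["kind"],
                    "pos": position,
                    # The drift (e.g. the player's velocity) is added to every particle.
                    "vel": (drift[0] + speed * math.cos(angle),
                            drift[1] + speed * math.sin(angle)),
                })
    return spawned
```

Game logic then only needs to name an explosion type and a position; all the tuning lives in the data.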

Hopefully that goes some way to answering your question but if you'd like to know anything else, please shout.

[1] GC screwing the frame rate is a genuine issue I ran into with this, particularly on mobile devices.

Very nice!

Thank you - appreciate it!

Nice work.

Thanks - very kind of you!

super fun!

Thanks - glad you enjoyed it!

I can never get into games but within 2 seconds I was super into this game. I used to play asteroids as a kid, the version that came with Windows, but yours is way more intense.

That's great to hear - thanks. I've not played the Windows version but if this is it - https://www.youtube.com/watch?v=ym7dohuXRd8 - I think it's meant to be a pretty faithful port of the 1979 original.

nice work!

Thank you! Much appreciated!

Meta: This thread is fun to read, it's cool to skim through such a large variety of ideas and projects. I wouldn't mind seeing it as a monthly thing like the "Who is hiring?" posts. There'd probably be some overlap with Show HN, but I personally wouldn't mind if it's just once a month.

So... Maybe Ask HN next month, would you?...

I was going to write a snarky comment like: "This is what Show HN is for..."

But then I realised this thread is different. This is for projects that are In Development, and people are more likely to post things that aren't finished, so I think that's a great idea!

lobste.rs does “What are you doing this week[end]?” posts every week or weekend, which are nice for that smaller community. I think once a month would be perfect for a community the size of HN.


and upvoted

I am developing a modern wikipedia interface - a Vuejs powered modern, single page, progressive, offline capable web application for Wikipedia. I have been working on this for last several months and have working version available at https://wikipedia.thottingal.in

Source code and more details available at https://github.com/santhoshtr/wikivue

It is a fully client-side PWA using the Wikipedia web APIs, installable on desktops and mobiles and usable like a native application. It has offline support: with the help of service workers, the application even works when there is no internet, provided the content was previously viewed. It is a single-page application: the page does not reload when exploring wiki articles, presenting an immersive reading experience. It uses the modern UI framework Vuetify and adapts to all kinds of screen sizes. It presents an optimized reading experience with good typography and optimum page layout. It is multilingual by default: all language editions are in a single app, and the user can pick the language edition with the language selector.

I wanted to make this a p2p-capable application. Currently it runs on the dat protocol as well: dat://25689f3a757853a511474d38f0a6d6be2cd2b0cb161686d75fda5c1619137921 (needs Beaker Browser) or wikipedia.hashbase.io

This looks pretty clean! The enhanced readability reminds me of Wikiwand.

If you're looking for a bit of feedback, the search doesn't seem to handle fast typing well. I tried searching for "Frank Chu" and it seems that if I type it really fast, I get either no results or Franks that aren't Frank Chu. If I type slower, Frank Chu shows up.

Is there a way for it to simply follow the links when I click on them? It looks like I have to click twice to "Read Article" after clicking on a link.

here is a bug I just stumbled upon. https://wikipedia.thottingal.in/page/en/Tansu%20%C3%87iller

clicking Political Career on tree view raises exception from querySelector. probably parentheses breaking it.

That is awesome. Wikipedia definitely needs a better editor too.

Props on using dat too. We need less centralization.

I really liked it, well done!

The layout reminds me of Everipedia, but cleaner. Well done.

Log storage and search system for structured logging data in Rust.

i.e a database optimised for logs and log-like data and nothing else.

Existing solutions are too inefficient for the use case of logs (TB+/day), suffer under high field cardinality, are based on costly and unnecessary full-text-search systems that aren't well optimised for logs data or just plain and simply can't handle structured data and degrade to simply storing lines.

Design goals are super efficient/fast, extremely fast distributed regex matching backed by trigram bitmap indices, columnar storage for compression and cardinality reasons.
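To illustrate the trigram-bitmap idea mentioned above, here is a rough Python sketch; it is not the author's Rust implementation. It assumes each regex comes with a literal that every match must contain (a real optimiser would derive that from the regex syntax), and it uses plain Python ints as stand-in bitmaps. All names are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All distinct 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.rows = []
        self.bitmaps = defaultdict(int)  # trigram -> int used as a bitmap of row ids

    def add(self, line):
        row_id = len(self.rows)
        self.rows.append(line)
        for t in trigrams(line):
            self.bitmaps[t] |= 1 << row_id

    def candidates(self, literal):
        """Row ids containing every trigram of `literal` (false positives possible)."""
        bits = (1 << len(self.rows)) - 1  # start with all rows set
        for t in trigrams(literal):
            bits &= self.bitmaps.get(t, 0)  # bitwise AND narrows the candidate set
        return [i for i in range(len(self.rows)) if (bits >> i) & 1]

    def search(self, pattern, required_literal):
        """Run the real regex only on rows that survive the trigram filter."""
        rx = re.compile(pattern)
        return [self.rows[i] for i in self.candidates(required_literal)
                if rx.search(self.rows[i])]
```

The index never answers the query by itself; it only shrinks the set of rows the regex engine has to look at, which is where the savings would come from on large datasets.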

I have a prototype of the indexer, the lowest levels of the query engine, and the regex-syntax-to-trigram query optimiser. I will be adding the ingress and query frontends next and hopefully will have something to show soon.

I don't know if I am going to go OSS or not but definitely designed to be run on-premise though I could easily run it as a multi-tenant service if people are interested.

I founded my own startup in the past and have been putting off actually doing a real side-project for the last couple of years but could never get away from the itch so this is going to be my swing I think.

If this is something you find interesting hit me up, or if you are just frustrated with ELK for some reason or another let me know what you think sucks and I'll try to build something that sucks less at that.

I have a similar goal with this project, a multi-tenant log storage and retrieval system: https://github.com/notduncansmith/loghive

The idea is to shard the data by logical domain and by time segment, so that queries only apply to relatively small and efficiently-read data, and to exploit the embarrassingly-parallel nature of the problem.

Yeah splitting by time is incredibly important.

I'm implementing the segmenting based on time + a sharding key to group together records of the same domain on the same shard. Sharing a shard between domains prevents having an overly large number of segments per time interval, allowing them to be bigger and get better index density. That is an important factor in my indexer design, which amortizes the cost of the indices over a large number of rows.
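One way to picture the "time + sharding key" segmenting, as a small Python sketch. The hourly segment size, the shard count and the use of CRC32 as the hash are all assumptions for the example, not details from the actual design.

```python
import zlib


def segment_key(timestamp, domain, num_shards=4, segment_seconds=3600):
    """Map a record to (segment start time, shard number).

    Records from the same domain always land on the same shard, and several
    domains share each shard, so a time interval has at most `num_shards`
    segments however many domains exist.
    """
    bucket_start = timestamp - (timestamp % segment_seconds)
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    shard = zlib.crc32(domain.encode("utf-8")) % num_shards
    return bucket_start, shard
```

A query restricted to one domain and a time range then only has to open the segments for that shard and those time buckets.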

Storing logs in sqlite is definitely a neat way to go for smaller scale stuff, hope you make something cool out of it.

For me though I have faced this logs problem at very large scale numerous times in my career and have tried all manner of commercial and OSS solutions and have yet to be satisfied so my project is definitely geared to solving the sort of problems you have when everything else just either stops working or costs more to run than your actual app.

Not saying it won't scale down very well. I think my software on a single machine should easily handle at least 20k+ logs/second (the prototype is much faster atm but lots of features need to be added) and be able to serve queries concurrently with that on just a few cores and ~8-16GB of RAM for, say, a 200-400GB dataset.

I think the 3 deployment sizes I will optimize for are single node for demo and benchmark purposes, 3 node for realistic small deployment and 6-10 nodes for high volume logging environments like my $DAY_JOB.

I'm definitely interested if you OSS, I wrote a little about it here: https://jonathanotto.com/linear-search-benchmark

Linear search approaches fall down when you have a lot of data and you only want to select a very small portion of it.

A linear based approach can get you to about 1GB/s or so per core with Rust.

A medium-ish size startup probably produces around 200GB/day of logs if they aren't very tight on their log volume. If you only want to search the last 24 hours that is maybe ok; you can search that in ~10-20 seconds on a single machine.

However this quickly breaks down when a) your log volume is a multiple of this and/or b) you want to search more than just a few hours.

In which case you need some sort of index.

There are different approaches to indexing logs. The most common is full text search indexing using an engine like Lucene. Elasticsearch (from the ELK stack) and Solr explicitly use Lucene. Splunk uses their own indexing format but I'm pretty sure it's in a similar vein. Papertrail uses Clickhouse which probably means they are using some sort of data skipping indices and lots of linear searching.

Of these approaches Clickhouse is probably the best way to go. It combines fast linear search with distributed storage and data skipping indices that reduce the amount of data you need to scan (especially if you filter with PREWHERE clauses).

So why not go with Clickhouse? Clickhouse requires a schema. You can do various things like flatten your nested structured data into KV (not a problem if you are already using a flat system) and have a single column for all keys and the other column for values. This works but doesn't get great compression, makes filtering ineffective for the most part and you now have to operate a distributed database that requires Zookeeper for coordination.

The reason I am choosing to build my own is that logs require unique indexing characteristics. First and foremost the storage system needs to be fully schemaless. Secondly you need to retain non-word characters. The standard Lucene tokenizers in Elastic strip important punctuation that you might want to match on when searching log data. Field dimensionality can be very high so you need a system that won't buckle with metadata overhead when there are crazy numbers of unique fields, same goes for cardinality.

TLDR: For big users you must have indices in order not to search 20TB of logs for a month. Current indices suck for logs. I write custom index that is hella fast for regex.

My first thought was also Clickhouse and the problems with schema for dynamic log content.

Also, if this is open source, I will definitely be interested in checking it out/contributing etc.

Here's another thought--why not try to fix this in ClickHouse? It sounds like you are rebuilding Elastic.

I considered that but it's harder than it sounds. Clickhouse is very strongly coupled to the idea of a schema and it is also very coupled to only using indices for data skipping.

If I was to make the changes I want to Clickhouse, i.e schemaless and full indexes per/segment then it wouldn't be Clickhouse anymore.

How is it different from https://www.honeycomb.io/?

Honeycomb seems to be more of a general events database, ala Druid.

This is a more specialised system that makes stronger tradeoffs to achieve really high efficiency for logs data. Stuff like indexing a reduced alphabet and not optimising for pivots and other views that are important for generic event databases.

Additionally Honeycomb is a hosted service.

I definitely intend for this to run in your own environment, on your k8s cluster, VMs, bare metal - whatever makes sense for you. If I do run it as a hosted service it will come secondary to the primary on-premise distribution.

This is a real need right now. Humio has a fairly novel approach and it searches ~10TB data in subsecs.

Humio looks interesting but it appears to be a linear search approach. This is fine as I commented elsewhere and their numbers match what I was able to achieve with my linear based prototypes.

The reason I rejected this approach is it gets very expensive for large data volumes if you want your queries to remain responsive.

Say you want to search a 100 TB dataset (not as large as it sounds when it comes to log data...). You can do about 1GB/s/core assuming you have the data on local disks that can scan fast enough, and if each of your machines has 16 cores that is 16GB/s/machine. Let's say your query target time is 60s (pretty slow tbh, incredibly generous I would say).

The math then plays out like this. You can scan 60*16GB for each node in your cluster, i.e 960GB. You need to have that ~TB of data on disks that can read at 16GB/s which means you need it evenly spread across 8-16 very high end SSDs.

On top of that you need 100 of these machines to complete this query in 60 seconds.

Now Humio goes on to say you can store all your persistent data in a cloud bucket. Which is a good strategy and something I am employing too but if you have no indices you actually need to scan it all which means you are limited by how fast you can reasonably fetch the objects across your cluster and hence the speed of your network interfaces.

Say you are on GCP, which has a relatively fast network that seems capable of around 25Gbit/s to GCS most of the time, and you actually get peak performance all the time (pretty unrealistic but ok). Then to fully scan 100TB in 60s you would need over 500 machines. If you are able to use Humio's tags to reduce this somewhat, say by only searching for errors, and that gets you down to 10% of your total logs, that would cut the data to scan by 90% (roughly a 10x speedup). Humio sort of has quasi-indexing in this way, similar to Prometheus; however, tags don't help you when what you are looking for isn't tagged to stand out.
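The two unindexed scenarios above (local disks, then object storage over the network) can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and perfectly even data distribution, and the function name is made up.

```python
import math


def machines_needed(dataset_gb, per_machine_gb_per_s, budget_s):
    """Machines required to scan dataset_gb within budget_s seconds."""
    per_machine_gb = per_machine_gb_per_s * budget_s
    return math.ceil(dataset_gb / per_machine_gb)


DATASET_GB = 100 * 1000   # 100 TB in decimal GB (assumption)
BUDGET_S = 60             # query time budget from the comment

local_disk_gb_s = 16 * 1.0   # 16 cores at ~1 GB/s per core
network_gb_s = 25 / 8        # a 25 Gbit/s link moves 3.125 GB/s

disk_bound = machines_needed(DATASET_GB, local_disk_gb_s, BUDGET_S)
network_bound = machines_needed(DATASET_GB, network_gb_s, BUDGET_S)
# Same network-bound scan if tags cut the data down to 10% of the logs.
tagged_10pct = machines_needed(DATASET_GB / 10, network_gb_s, BUDGET_S)
```

Under these assumptions `disk_bound` comes out at 105 machines and `network_bound` at 534, in line with the "100" and "over 500" figures above; the tagged case needs 54.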

This is why indices. Yes - indices are hard, yes they can have bad worst cases if you aren't super careful. However let's consider indices for this query.

Say your logs are syslog + some extra structured data. You have things like application_name, facility, level, etc. You have 100TB of logs to search, you are looking for logs by your main application which is say 60% of your total log volume, you are looking for DNS resolution errors that contain a specific string "error resolving".

With my prototype my indices are approximately 5% the size of the ingested data. The raw ingested data is also compressed really highly, let's assume equal to Humio (it's probably higher due to file format but not important).

So the thing that jumps out here is we now only need 5TB of indices on our machines to reasonably find needles in our 100TB of data. Additionally our indices are split by field so if we know we are searching the message field we only need to load those. Indices for columns with low cardinality are extremely small, those for high cardinality much larger but capped due to various log specific optimisations I am able to use. So let's assume the message field makes up the majority of our index, say 80%. That brings us to 4TB of data we need to scan.

Now usually 60s would be super slow for a query for a system with indices like this, usually you would try to make sure that 4TB is in RAM and simply blow through it in <1s across a few machines. However for comparison's sake let's say 60s is still our query budget and we don't have completely stupid amounts of RAM.

So we need to be able to scan 4TB/60s ~= 66GB/s which given our previous machine calcs with local storage puts us at ~5 machines.
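The same back-of-envelope arithmetic for the indexed case, as a short Python sketch. The ratios are the ones quoted above; decimal units (1 TB = 1000 GB) and the variable names are assumptions.

```python
import math

RAW_GB = 100 * 1000      # 100 TB of ingested logs in decimal GB (assumption)
INDEX_RATIO = 0.05       # indices are ~5% of the ingested data
MESSAGE_SHARE = 0.80     # share of the index taken by the message field
BUDGET_S = 60            # query time budget
MACHINE_GB_S = 16        # 16 cores at ~1 GB/s per core, local storage

index_gb = RAW_GB * INDEX_RATIO              # ~5 TB of indices in total
scan_gb = index_gb * MESSAGE_SHARE           # ~4 TB actually scanned
required_gb_s = scan_gb / BUDGET_S           # ~66.7 GB/s aggregate throughput
machines = math.ceil(required_gb_s / MACHINE_GB_S)
```

With these inputs `machines` is 5, compared with roughly a hundred machines for the unindexed local-disk scan of the same dataset.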

However, we could likely do this with even less CPU, assuming our storage is fast enough: unlike an unindexed system like Humio, we aren't applying expensive search algorithms to every row. We are simply crunching a ton of bitset operations, in this case a metric ton of bitwise AND.

Anyway this is a long rant. The reason why is that many people always say "Can't you just do this fast enough with linear search" and I always have to reply "it depends on how big the haystack is". This quantifies what is too big of a haystack in a reasonable way.

Definitely interested if FOSS. Also ideally if it's possible to dynamically create & modify filter pipelines/complex searches in it.

What do you envision by filter pipelines and searches?

The query syntax I was thinking of focuses mostly on filtering and selecting data with regular expressions or equality matching on text, and equality and range queries on numbers.

i.e something like this:

index_name.filter(message ~= "(hot|cold) storage" && drives.status > 100).select(drives.name, message, drives.status)

Or something more SQL like, haven't really settled on a query language yet mostly just been working on the indexing and the lower level parts of the query machinery that work with the index.
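To make the intended semantics of that example query concrete, here is a small sketch that evaluates the same filter and select over in-memory records. It is a plain linear scan, not the indexed engine, and `get_path`, `run_query` and the sample records are hypothetical names and data used only for illustration.

```python
import re


def get_path(record, path):
    """Follow a dotted path such as 'drives.status' into nested dicts."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def run_query(records, pattern, number_path, threshold, select_paths):
    """Keep records whose message matches `pattern` and whose number exceeds `threshold`."""
    regex = re.compile(pattern)
    results = []
    for record in records:
        message = get_path(record, "message")
        number = get_path(record, number_path)
        if not isinstance(message, str) or regex.search(message) is None:
            continue
        if not isinstance(number, (int, float)) or number <= threshold:
            continue
        results.append({path: get_path(record, path) for path in select_paths})
    return results


logs = [
    {"message": "hot storage degraded", "drives": {"name": "sda", "status": 120}},
    {"message": "cold storage ok", "drives": {"name": "sdb", "status": 50}},
    {"message": "network timeout", "drives": {"name": "sdc", "status": 300}},
]

# Equivalent of: filter(message ~= "(hot|cold) storage" && drives.status > 100)
#                .select(drives.name, message, drives.status)
hits = run_query(logs, r"(hot|cold) storage", "drives.status", 100,
                 ["drives.name", "message", "drives.status"])
print(hits)
```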

I will try to answer you on keybase, hopefully it will be easier to discuss details.

Would really be interested in the project. We’re building a tool to monitor model performance and data patterns and could see use of this.

How does this differ from Grafana Loki? This is a tool designed specifically for log indexing and searching as well.

It's similar in some ways except Loki went all out in abandoning indexing of the log lines themselves, a quote from the homepage:

> It does not index the contents of the logs, but rather a set of labels for each log stream.

This system is different because not only does it index the contents of your log messages it accelerates regex queries which are great for structured messages produced by machines.

Similar to Loki though it does index data by "labels", though not quite in the same way. Instead every field of a log is treated the same (except for time, it's special). Nested structures are flattened by path, i.e. '{"message": "a", "structure": {"nest1": "b"}}' is flattened to "message" and "structure.nest1". Under the hood it also stores a column per type, so if you later send a number in a field where you previously sent text it doesn't coerce like Elasticsearch. Instead depending on your query it will either search "structure.nest1"(text) or "structure.nest1"(number), i.e. if you use a regex it will infer you are searching the textual version, if you attempt a range query it will infer number as the type.
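A minimal sketch of the flattening and per-type columns described above, assuming columns are keyed by a (path, type) pair; the function names and the in-memory layout are illustrative assumptions, not the prototype's actual storage format.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into {dotted_path: value}."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def column_key(path, value):
    """Key columns by (path, type) so text and numbers are never coerced together."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return (path, "number" if is_number else "text")


columns = {}
records = [
    {"message": "a", "structure": {"nest1": "b"}},
    {"message": "c", "structure": {"nest1": 42}},   # same path, different type
]
for row_id, record in enumerate(records):
    for path, value in flatten(record).items():
        columns.setdefault(column_key(path, value), []).append((row_id, value))

for key, values in columns.items():
    print(key, values)
```

A regex query would then read only the `("structure.nest1", "text")` column, and a range query only the `("structure.nest1", "number")` one.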

Loki is definitely an interesting system and looks like it could handle great ingest volume but I don't think it could match my targets for sub-second queries on ~10TB of logs.


Please don't be a jerk in response to someone sharing what they're working on. If people can't share that without being jumped on, this quickly becomes a shitty place for conversation. Probably you didn't mean it that way, but it's difficult to gauge intent on the internet, so the burden is on the commenter to disambiguate it.



Did you read anything I wrote?

I have used all of the existing solutions that can scale to the workloads I require. None of them are adequate, all have deficiencies either in cost, scalability, ergonomics or operability.

If you read the rest of my comments here it should be clear this is something I have thought about for some time and have a well reasoned architecture and completely different take on how this problem should be approached.

This is not about features, this is about fundamentally changing the storage architecture from layout of the log data itself, to the indices to the query engine, even the distributed system model.

Most of the ideas are stolen from battle tested systems like Druid which I have worked with extensively and other systems I respect like Pilosa, Ok Log (which was a project in a similar vein).

Have a little optimism, things can be better if we just sit down and make them so.

I bought an old trench coat from a thrift store and outfitted it with 350 programmable LEDs , an Arduino to control them, and a 24 AA battery bank for a music festival last year, it was a hit. I'm currently working on adding another 150 LEDs, fixing the power system (I burnt out the Arduino after a couple hours), and looking into adding a microphone and learning some sound programming to make the suit change colours to the beat for this year. I always wanted a living technicoloured dreamcoat.

Nice! FYI, you can use those USB phone-charging battery bricks for a nice compact 5V output in those sorts of projects. I think TSA-approved ones run up to 100Wh, and they're easy to recharge.

Most of them will cut power after a minute or so if you don't draw more than like 100mA of current, but that shouldn't be a problem with 350 LEDs. It'll save your Arduino's voltage regulator some grief, too :)

Ah that's a great idea for the Arduino itself! My board burnt out because my independent battery source for the board didn't work out so I wired it in parallel with my batteries (3x8 AA batteries in series put through a voltage adjuster). I'll be buying one!

Suggestion - it should light up when you sense someone else within 6 ft (or may be it already keeps people away)

This is super cool. If you haven't seen it already this session on wearables from Strangeloop is definitely worth a watch: https://www.youtube.com/watch?v=Fwnt2NFvBhQ

No way. Show me.

Sorry for the delay!

Nice! Only 5mos til Burningman.

Super sad, but I'd say Burningman is highly unlikely to happen this year

Did you make an account just to ask this comment? haha :D


Working on simeville, a little 2D Canvas demo that builds a town. You can try it yourself here: https://simonsarris.github.io/simeville/ (pardon the graphics, they're stand-ins right now)

Click to make buildings (above the tree line only right now) and click and drag the sun down to go to night. Drag the moon to return to day.

Gif of night sky: https://twitter.com/simonsarris/status/1235761030996901888

The point is to replace the background town that's currently on https://simonsarris.com (which is animated purely by CSS right now, including the birds) with a much more interactive and playful one. (the current site background gives you an idea of what will be built and why it currently only works behind the tree line). The time consuming part right now is making pretty graphics. I had begun with buildings made from Canvas drawing code, with procedural params and all that, but I'm switching to images because it will be much prettier in the long run.

Looking good: I should have read your comment in full before I tried it because I was sitting here wondering why I could only create buildings in the sky. Neat idea though, and it'll be awesome when it looks as good as the picture you have on your website at the moment.

Cool, would like to check it out, but it doesn't seem to work on either Firefox or Safari: "TypeError: document.getElementById(...) is null" on line 58 in main.js

Not sure why, but it is very relaxing dragging the moon up going into night ;)

I just checked your demo. Good Job

I'm working on Polar:


It's been out for about 1.2 years now and we're really starting to nail some important features.

JUST about to post a new release now.

COVID19 is having us pivot a bit in that we're going to experiment with collaborative group reading in the hopes that students can work more efficiently with their colleagues without having to leave home.

The core idea is to have a fully-integrated reading platform sort of like an integrated development environment but for non-fiction material (textbooks, research papers, documentation, plus web content).

Right now we support PDF and web content but are actively working on EPUB and improved reading of web content.

The key functionality is built around annotating your documents and taking notes and building flashcards so you can maintain a personal knowledge repository.

It's also Open Source and supports cloud sync. We have a mobile webapp now and working on porting it over to Android soon.

Love the github-style activity tracker. It seems like such a small thing, but it's been a huge help in keeping myself in the habit of making regular progress.

Agreed.. the gamification can be huge. I'd like to keep iterating on this moving forward.

Wow this looks really good. It would be good to have integrated support with Kindle devices, then I would definitely subscribe.

Can it be used without external services, like firebase?

You can use it as a standard desktop app without logging in but then you don't really have cloud sync.

Looks great!

I'm working on static analyzer for SQL: https://holistic.dev

It's a useful tool for DBA to identify issues in SQL queries automatically. Only 50 rules for now, but more than 1000 described in backlog :)

Funny, but initially this tool aimed at developers' needs.

I've made a lot of microservices, which only started the database queries. I came up with the idea of making a tool that would automatically generate all the microservices based on SQL queries. The MVP of such a tool was implemented, which reduced the developer workload by at least a quarter. In this tool, it was required to write queries in a certain way that did not make it universal and did not give a connection with the database structure.

The next step was to create a system that relies on the text description of the database schema in SQL format (DDL) and automatically understands the types that will return the SQL query. Such a tool can automatically inform the developer about possible errors on the interface between the application and database when changing the structure or the SQL queries themselves. It can also be built into the CI to provide the automatic code review at the version control system level and prevent the erroneous code from entering the repository.

But the developers did not appreciate all the advantages, as most projects are developed using ORM :(

But at the same time, DBAs expressed interest in implementing a system of the automatic search for bad requests already on the production database.

Any questions are welcome :)

That looks really cool. Any chance you can share the code? Or share more about what you did? Like what does those micro services look like and what do they do? Also what you say about type errors makes me think about static typing. Does it relate to that concept?

I too have been diving into writing raw SQL some while ago and I liked it.

Look at this, I've brought back the ability to export types :) Renderers for Flow and TypeScript are now available in beta. You can see live how the export results change when you add columns or change constraints. If there are no foreign keys using JOINs, fields can become nullable, etc. I made a simple example: https://holistic.dev/en/playground/5596577a-ad0e-40e6-a05b-e... Remove -- before ALTER and see how the result of the type export will change.

Thank you for the feedback! The service will be put into commercial operation as saas in the coming months. I do not plan to open the source code, unfortunately.

The idea is that the tool will only work with the sources of SQL queries, and I had to work hard to implement it.

The work consists of several steps.

1) Get the AST (Abstract Syntax Tree) of the database schema (DDL). At this stage, only Postgresql up to version 10 is supported. Soon I will deal with the parser from Postgresql 13. This is not a trivial task. I have to build everything from the source code of Postgresql :)

2) Building a database model. We parse all DDL-commands one by one and apply changes. For example, apply all ALTER TABLE to the table described above, add user-defined functions to the list of built-in functions, and so on. It is necessary to have a complete overview of all types of tables, indexes, and each table described in a DDL script.

3) get an AST query (DML) and build a model of the result. This is the most complex and interesting part :) The task is to get a list of field names and their types, which will be returned after the query execution. You need to consider CTE and the list of tables specified in FROM. You need to understand what function will be called and what result will be output. For example, function ROUND is described in three variants, from different arguments and with varying types of result, function ABS - in six variants. I.e., it is required to understand the types of arguments before selecting a suitable one :) In the process, implicit type casting is considered if necessary.
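Step 2 above (replaying DDL commands one by one into a schema model) can be sketched roughly as follows. The command dictionaries are a hypothetical, already-parsed shape invented for this example; they are not the output format of any real parser, and the service's internal model is not public.

```python
def apply_ddl(model, command):
    """Apply one already-parsed DDL command to the schema model in place."""
    kind = command["kind"]
    if kind == "create_table":
        model["tables"][command["table"]] = dict(command["columns"])
    elif kind == "add_column":       # ALTER TABLE ... ADD COLUMN
        model["tables"][command["table"]][command["column"]] = command["type"]
    elif kind == "drop_column":      # ALTER TABLE ... DROP COLUMN
        del model["tables"][command["table"]][command["column"]]
    elif kind == "create_function":  # user-defined functions join the built-ins
        model["functions"].add(command["name"])
    else:
        raise ValueError(f"unsupported DDL command: {kind}")
    return model


script = [
    {"kind": "create_table", "table": "users",
     "columns": {"id": "integer", "name": "text"}},
    {"kind": "add_column", "table": "users", "column": "email", "type": "text"},
    {"kind": "drop_column", "table": "users", "column": "name"},
    {"kind": "create_function", "name": "my_slugify"},
]

# Start from a model that already knows some built-in functions.
model = {"tables": {}, "functions": {"round", "abs"}}
for command in script:
    apply_ddl(model, command)

print(model["tables"]["users"])
```

The order of commands matters: the model only reflects the final schema once every ALTER in the script has been replayed.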

The same is valid for operators. An operator in Postgresql is a separate entity that you can create yourself. Postgresql 11, for example, describes 788 operators.
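Picking a function variant from the argument types can be sketched like this. The three `round` signatures mirror the variants mentioned above, but the implicit-cast table and the first-match strategy are simplifying assumptions; PostgreSQL's real resolution rules are more involved.

```python
import itertools

# Illustrative overload table: (function name, argument types) -> result type.
OVERLOADS = {
    ("round", ("double precision",)): "double precision",
    ("round", ("numeric",)): "numeric",
    ("round", ("numeric", "integer")): "numeric",
}

# HYPOTHETICAL implicit casts, tried in this order after the exact type.
IMPLICIT_CASTS = {"integer": ["double precision", "numeric"]}


def candidate_types(arg_type):
    """The type itself first, then any type it may be implicitly cast to."""
    return [arg_type] + IMPLICIT_CASTS.get(arg_type, [])


def resolve(name, arg_types):
    """Return the result type of the first overload reachable via implicit casts."""
    options = [candidate_types(t) for t in arg_types]
    for combo in itertools.product(*options):
        result = OVERLOADS.get((name, combo))
        if result is not None:
            return result
    raise LookupError(f"no overload of {name} for {arg_types}")


print(resolve("round", ("numeric", "integer")))   # exact match
print(resolve("round", ("integer", "integer")))   # first argument needs a cast
```

The same lookup shape would apply to operators, with the operator symbol in place of the function name.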

Various types of syntax are taken into account, for example - SELECT *, t1.*, id, public.t2.name, t3.* FROM t1, t2, myschema.t3 - will be parsed correctly.

But even this is not the most challenging thing :) The most exciting thing is to be able to understand two things:

A. Whether each of the fields can be NULL or not. It depends on many factors, such as how the JOIN of the table with the source column is made, whether there is a FOREIGN KEY, what type of JOIN is used, what conditions are described in the ON section, and what is written in the WHERE conditions.

B. How many records the query will return, classified into the categories NONE, ONE, ONE OR NONE, MANY, MANY OR NONE. Again, this is affected by the conditions described in JOIN and WHERE, whether there are aggregation functions, whether there is GROUP BY, and whether there are functions that return multiple records.
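A deliberately simplified sketch of the nullability part (point A): a result field is treated as nullable if the column is not declared NOT NULL, or if its table sits on the optional side of an outer join. It ignores ON and WHERE conditions and foreign keys, which the real analysis takes into account, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    table: str
    name: str
    not_null: bool  # declared NOT NULL in the DDL


def nullable_tables(base_table, joins):
    """Tables whose columns may come back as NULL after the given joins.

    `joins` is a list of (join_type, joined_table) pairs applied left to right.
    """
    seen = [base_table]
    nullable = set()
    for join_type, table in joins:
        if join_type == "left":        # right-hand table may have no match
            nullable.add(table)
        elif join_type == "right":     # everything joined so far may have no match
            nullable.update(seen)
        elif join_type == "full":      # both sides may have no match
            nullable.update(seen)
            nullable.add(table)
        seen.append(table)
    return nullable


def result_nullable(column, base_table, joins):
    """True if the field can be NULL in the query result under this simplified model."""
    return (not column.not_null) or column.table in nullable_tables(base_table, joins)


user_id = Column("users", "id", not_null=True)
order_total = Column("orders", "total", not_null=True)

# FROM users LEFT JOIN orders ...
print(result_nullable(user_id, "users", [("left", "orders")]))
print(result_nullable(order_total, "users", [("left", "orders")]))
```

Under this model a NOT NULL column stays non-nullable through an inner join but becomes nullable as soon as its table is on the outer side.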

This function, by the way, is also used in the first step - to get types for VIEW.

This was a brief description of the first part of the service ;). It can already be a service in itself. It is possible to generate types and all code of microservices, including JSON-schema and tests, based on a DDL and DML set. But as I wrote above, most people prefer to use an ORM such as Django or RoR. :( For this reason, I've removed this functionality from the playground and will take it to a separate project when I get my hands on it. It will also include various tools, such as information about all possible exceptions that may be thrown by a request, automatic creation of migrations in CI if your DDL files are in the repository, detection of unused indexes or fields, and many other exciting things :)

And the second part of the service that I plan to promote in the first place is a tool to search for bugs, architectural, and performance problems automatically. The target audience here is DBA, who have to deal with forgeries RoR-developers and their colleagues :) This part is entirely based on all information obtained in the previous stages. Part of the errors can be understood by AST (linter principle), but the most interesting rules are based on knowledge about types, understanding of NULL/ IS NOT NULL, and the number of returned records.

There are more than a thousand such rules, but I suppose there will be about 5000 of them in the next couple of years :) Also described are about 200 rules that can lead to runtime errors, but they are not needed by DBA, because their job is to find problems in valid queries :) There will be more than 1000 such rules as well since Postgresql describes more than 1700 runtime errors.

And yes, this is all about Postgresql. After I start commercially using Postgresql, I plan to do the same for Mysql and then perhaps for Clickhouse if there are no special offers of cooperation :)

This is actually something I thought of doing, but you actually did it. It would be amazing if you can find a way to open-source at least part of it.

But my application was to be able to use raw SQL queries instead of an ORM in an application. By statically analyzing the SQL queries, then you can, in a statically typed language, automatically type-check the parameters to and results of the queries. Combining that and an IDE like the Jetbrains ones, which provide intelligent SQL autocompletion/refactoring, then tbh., I think it would beat an ORM in terms of ergonomics. Some people like LINQ or various Haskell/Scala ORMs because they are type-safe. This would be totally natural to those who know SQL without the downsides of being totally unanalyzed and type-checked.

It would be like the clojure library called hugs, but with static type-checking.

This is exactly what I did in the beginning :)

But custdev showed that companies do not care about it.

They are ready to take Django/RoR/Laravel developers who know only ORM. Such developers are cheaper, they are easier to hire. Some people also think that it's faster to develop this way :) They prefer to start dealing with problems after they get on production DB.

They shift developer problems to DBA. There is research saying that the price of fixing a bug in production is on average 400 times the price of fixing it at the development stage.

Ok, the boss calls the shots :) I had to make a pivot to DBA needs. It was a bit upsetting at first, but then I realized that it was even simpler. You need to build a lot more tools for developers to make them feel comfortable.

A few words about open source. At first, I used this AST parser for PostgreSQL https://github.com/lfittl/libpg_query But it's frozen on Postgresql 10. And we don't know if there'll be any updates.

No official AST parser for MySQL.

There's a hard way to get a bison/ANTLR grammar parser.

Two weeks ago there was a parser like this for MySQL: https://github.com/stevenmiller888/ts-mysql-parser.

There's also a parser from vitess: https://github.com/vitessio/vitess/tree/master/go/vt/sqlpars...

The only problem is that the grammar for these parsers was written by hand and is not related to the official MySQL repository. For example, the vitess parser does not support the syntax of MySQL 8.0, and for MySQL 5.7 it does not support more than 40% of the syntax.

There is also such a tool https://www.jooq.org/. It supports some of its own generalized SQL syntax, which does not take into account the specifics of different databases.

Look, there is another tool that uses libpg_query - sqlc: https://news.ycombinator.com/item?id=21765689.

And the author of ts-mysql-parser also offers an analyzer: https://github.com/stevenmiller888/ts-mysql-analyzer There are only four rules at the moment...

Anyway, all hard work starts after you have the right AST parser in your hands. And if you don't spend all your time on it, it will be difficult for you to do something really interesting :(

PS: I removed the types exporting tool from the site a few days ago, along with texts explaining all the advantages of this tool to the developers :) In a couple of weeks, I will transfer it to a separate domain, as a separate project.

I am open to personal communication outside this site, all my contacts are at https://holistic.dev/en/contacts/

Thank you for the write-up, that was super interesting. If you don’t mind I have a question about the microservices. What is their goal? Serving the data from the database? If not, then there is business logic you can’t possibly generate, so what exactly do you generate?

i suggested building this at a bigco where i work last year, built a prototype using postgres and wrote the design doc.

sadly it wasn't picked up.

Congrats on product, there is a big market for it

Could you tell us a little more about your experience, if that's acceptable? Maybe in personal messages, FB/linked?

Does your roadmap include directly detecting against pg_stat_statements? Most of these tools go against the stats tables in the database directly ('SQL Doctor' from Idera comes to mind, that product is very solid)

Yes, absolutely. You can check requests from pg_stat_statements, slow query log, and any other source as you wish. If the SQL repository stores query in separate files, you can check these queries (e.g., before they are merged into the master).

It is crucial that you add the actual database schema (DDL) before checking.

This functionality will appear soon (client cabinet, API).

Please note that analysis is performed using static methods only! No connection to the database is required. In particular, statistics and other environmental parameters that may affect the query performance are not taken into the analysis. I would like to be able to enrich the data obtained by static methods with data from the database production, but at the moment, this is a very far-reaching plan.

The tool will make recommendations that are necessary, but not enough.

The actual database parameters affect the real query plan. There are a lot of such settings, and the impact changes over time.

Keeping an eye on these parameters and adjusting them is a significant layer of work that is not a priority at this stage. Excellent tools have already been made for this purpose, such as postgres.ai.

DBA work is divided into two parts - to monitor the quality of queries and database schema and configure the parameters of production servers.

My tool covers the needs of the first part and does not cover the second part.

I talked to product owners of companies producing products or services related to the second part, and they agreed to integrate our products when I am ready.

Working on a major refactor of my elixir FFI interface for the Zig programming language. Lets you write zig inline in elixir, and takes care of all of the fiddly bits around setting up a nif correctly, in such a way that you can't mess it up.


By the end of the refactor, an additional setting will be included that makes the beam vm "garbage collect" your zig for you, that is, you can lazily allocate memory in your nif, and it will be cleaned up for you afterwards. This is a major first step in making "safe nifs" which are OTP-supervised OS threads with, at least, memory resource recovery performed by a BEAM process if the thread happens to crash.

Also, why use zig over rust? Because zig is aggressively unopinionated about its memory allocation, so Zigler makes it easy to use the internal BEAM allocator, which 1) lets the BEAM be aware of allocations that are happening for observability and telemetry purposes, 2) lets your memory allocations play nice with the BEAM's fragmentation preferences and 3) leads to potentially better performance as you can make fewer trips to the kernel for allocations.

I'm working on building my own function generator!

If you're not an EE - a function generator is a pretty common piece of electrical engineering lab equipment.

I've got all the source files up on GitHub - feel free to take a look: https://github.com/cushychicken/bfunc

I've also been keeping a weekly project journal. I just posted the latest installment today if you'd like to see what I've gotten up to: http://cushychicken.github.io/bfunc-weekseven-log/

I'm already planning a second prototype and board spin with more functionality.

If you need or want a really bare-bones function generator for your home lab, get in touch! Instructions for how are in the post.

Bookmarked. I like your Tkinter code, a lot cleaner than my typical mess. One suggestion: Add a PDF of your schematic for those of us who don't have the full tools installed.

Thank you! This is the first time I've written a GUI. It was clear to me from reading a lot of examples online that you can get lost in the weeds really quickly if you don't have some sort of system for organizing everything. There are just so many variables needed, and the program flow is so much less linear than what I'm used to.

Yeah, I've been meaning to add a PDF sch for a while. Just gotten swamped setting up a home office in the last week. I'll try and post one up today!

PDF schematic is up if you'd like to take a look: https://github.com/Cushychicken/bfunc/blob/master/hardware/b...

https://www.coronawhatnow.com - a website focused on helping people and businesses impacted by coronavirus.


-Food banks (individual sites or directories)

-Financial aid

-Elderly grocery shopping hours and delivery

-Healthcare (shelter in place info, testing like Verily Baseline, etc)

This info is currently scattered across the Internet and we're aggregating it in one place. It's hosted on Github using markdown so anyone with spare time can be a massive help (https://github.com/coronawhatnow/coronawhatnow.com).

Join us at https://www.facebook.com/groups/coronawhatnow

This is a great idea.

https://host.io - an API to get domain name metadata (scraped web content, backlinks, redirects, dns data, ranking information and more). See https://host.io/google.com for an example.

We've been building out the infrastructure for a while (scrape all domains monthly, resolve all domains, process the data etc), but have only recently launched the API (see https://host.io/docs).

We'll soon be releasing a Top 10M Ranked Domain list, like the Alexa top 1M, but 10M, and based on our own ranking signals, instead of traffic data like Alexa.

If you've got any interesting use cases for our data or any feedback I'd love to hear it! ben@ipinfo.io

That's interesting, especially this, I think:

> Shared IP Address

> There are 68,258 domains hosted on (AS15169 Google LLC) including google.com. Here's a sample: [...]

You wrote:

> scrape all domains monthly, resolve all domains

What does "all domains" mean? All domains worldwide? Or some subset?

As mentioned, I think your project sounds interesting — still I'm wondering, how did you verify there's an interest for what you're building? (I'd think/guess there is, just wondering if/how you verified this, or knew beforehand)

All domains is literally all domains :)

How does one get a list of all domains?

Rapid7's project sonar forward lookup database is a good start. https://opendata.rapid7.com/

Some corps collect DNS logs from the infrastructure for domains that have been registered very recently.

UPDATE: The free tier is static at 1k and then you have to buy credits or it is 1k / month?

Currently the API is priced at $0.01 per call, with $10 of free credit when you signup.

We'll likely add monthly subscriptions in the future perhaps with a recurring monthly free tier, but we don't have them yet.

If you do run out of the free $10 credit and want more time to see if the API is going to be useful for you shoot me an email and I'll be happy to issue more credit.


Where are you getting the link data from? Common crawl? 3rd party?

Same question for ranking info, assume you're not scraping yourself and using something like D4SEO?

We're doing the scraping ourselves. Just a single page per domain currently (so it's just links on the homepage), but we'll expand to a few pages per domain in the future.

We're also doing the ranking ourselves. We bootstrapped it by doing a regression against Alexa and other top 1m lists using a bunch of our own features (like number of backlinks, some content features, some IP address related features etc). We'll be sharing more details around this soon.
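To illustrate the bootstrapping idea, here is a toy least-squares fit of a known rank against a single feature, which can then score domains that are missing from the reference list. The comment does not say which regression method or feature weights are used, so the single-feature model, the log scaling and the numbers below are all made-up assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


# SYNTHETIC training data: log10(backlinks) for domains on a reference top list,
# paired with log10(rank on that list). More backlinks -> better (lower) rank.
log_backlinks = [1.0, 2.0, 3.0, 4.0]
log_rank = [6.0, 5.0, 4.0, 3.0]

intercept, slope = fit_line(log_backlinks, log_rank)


def predict_log_rank(log_backlink_count):
    """Estimate log10(rank) for a domain that is not on the reference list."""
    return intercept + slope * log_backlink_count


print(intercept, slope, predict_log_rank(5.0))
```

A real model would use several features at once (content and IP-related ones are mentioned above) rather than backlinks alone.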

> If you've got any interesting use cases

What are some cases you know about thus far?

Cybersecurity research / investigation (example here: https://twitter.com/host/status/1215709472066170880)

Intellectual property enforcement

Malicious traffic / account creation investigation

Domain name or competitor research

Market analysis (eg. find out how many customers publicly traded hosting providers have)

Over a thousand comments? My project is gonna be lost in the conversation! But here goes anyway ...

I've been recoding my HTML5 <canvas> library from scratch[1]. Most of the work is done - I'm now onto the fun documentation bit[2].

Something I'm particularly proud of achieving in the recode? Dynamically responsive bendy images in the web browser - with added animation capability![3]

[1] Scrawl-canvas v8 code base - https://github.com/KaliedaRik/Scrawl-canvas/tree/v8-alpha [2] Scrawl-canvas v8 code documentation (incomplete) - https://github.com/KaliedaRik/Scrawl-canvas/tree/v8-alpha/do... [3] Youtube video showing dynamically responsive bendy images in the web browser - https://www.youtube.com/watch?v=LebxNhVWyOQ&t=3s


Can scrawl-canvas run in a Web Worker and render to an Offscreen Canvas? You'd lose integration with DOM events for interactivity, but you'd be able to perform image processing in a background thread and not block the main renderer thread so that UI elements are responsive.

Thank you!

The majority of the code runs in the main thread; filters use a web worker. OffscreenCanvas currently has less than 70% support (caniuse says), so most of the work is done in regular canvas elements which are created but not added to the DOM, then copied over to the DOM canvas once per requestAnimationFrame tick.
