
Why scan the table of precomputed values? It seems like the code would be both cleaner and faster like this:

  class AnimatedVariable {
    int numValues;
    std::vector<float> values;
    // ...

    float valueAt(float time) {
      // Map time in [0, 1] onto the evenly spaced samples and lerp
      // between the two neighbouring values.
      float idx = time * (numValues - 1);
      int i = std::min((int)idx, numValues - 2);  // keep i+1 in range at time == 1
      float before = values[i], after = values[i + 1];
      return lerp(before, after, idx - i);
    }
  };
I guess there's a bit of float -> int coercion going on there, but it shouldn't be too bad.


Because we are interpolating between keys which aren't evenly distributed in time. What you have there is pretty much what we do when looking up the precomputed values from the table.


I heard that result from reading Peopleware, although this (apparently well-researched) Stack Exchange answer disagrees:



This one right? http://programmers.stackexchange.com/questions/179616/a-good...


(ShareJS author here)

Sorry ShareJS is such a mess right now. I did a bunch of work to make it into a kind of OT-backed database to power derbyjs. It's very much straddling two worlds at the moment and I think it's doing a bad job of both.

It'll get cleaned up eventually.


Hi! Joseph Gentle here, author of ShareJS. You quoted me in your article.

You're right about OT - it gets crazy complicated if you implement it in a distributed fashion. But implementing it in a centralized fashion is actually not so bad. It's the perfect choice for Google Docs.

Here is my implementation of OT for plain text (only about 400 lines of JavaScript, with liberal comments): https://github.com/ottypes/text To actually use OT code like that, you need to do a little bookkeeping. It's nowhere near as bad as you suggest.
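
To give a feel for what the transform function actually does, here's a tiny sketch of the idea (my own illustrative names, not the ottypes/text code itself, which represents an operation as a list of skip/insert/delete components):

  // Transform `op` so it applies after a concurrent `other` insert.
  // `side` breaks the tie when both insert at the same position.
  type Insert = { pos: number; text: string };

  function transformInsert(op: Insert, other: Insert, side: 'left' | 'right'): Insert {
    if (other.pos < op.pos || (other.pos === op.pos && side === 'right')) {
      // The other insert landed before ours, so our position shifts right.
      return { pos: op.pos + other.text.length, text: op.text };
    }
    return op; // The other insert landed after ours; nothing to do.
  }

  function apply(doc: string, op: Insert): string {
    return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
  }

  // Two concurrent edits to "helo world" converge in either order:
  const a: Insert = { pos: 3, text: 'l' };   // fix the typo
  const b: Insert = { pos: 10, text: '!' };  // add punctuation
  const doc = 'helo world';
  const r1 = apply(apply(doc, a), transformInsert(b, a, 'right'));
  const r2 = apply(apply(doc, b), transformInsert(a, b, 'left'));
  // r1 === r2 === "hello world!"

Handling deletes and runs of components is where the rest of those 400 lines go, but the shape of the problem doesn't change.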

In this tiny source-only demo I do all the bookkeeping, and implement a working network protocol on websockets. The result? I can sync an object between the server & client all in about 150 lines of code total: https://github.com/josephg/appstate

This code has no external dependencies except for the OT type itself (ottypes/json0 in this case - although it could be changed to text or rich text or something with barely any code change).

Nice work making a cool demo - but I think you're using the wrong algorithm ;)


Yes, but that implementation deals only with plain text. The complexity seems to ramp up pretty quickly as you support more types of operations, and since extensibility is an important concern for my project, I decided to avoid OT.


If you have insert, update, and delete then you can build any other operation from those primitives, as long as you have the ability to batch a composite operation. So there's no need to extend the OT algorithm to support other kinds of primitive operation. For nested structures, addressing via IDs or linear addresses has to be taken into account, but that doesn't affect the OT transforms; it's one layer above. So it's possible (though not easy) to have an extensible OT system without exponential complexity.
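
For example (a sketch of the idea with made-up names, not any particular library's operation format), a "rename field" operation doesn't need to be a new primitive; it's just a batched delete plus insert:

  type Primitive =
    | { op: 'insert'; path: (string | number)[]; value: unknown }
    | { op: 'update'; path: (string | number)[]; value: unknown }
    | { op: 'delete'; path: (string | number)[] };

  // Rename a field by batching the existing primitives.
  function renameField(obj: Record<string, unknown>, oldKey: string, newKey: string): Primitive[] {
    return [
      { op: 'insert', path: [newKey], value: obj[oldKey] },
      { op: 'delete', path: [oldKey] },
    ];
  }

The transform functions only ever see the primitives, so the proofs don't grow with each higher-level operation you define.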

Still, avoiding OT has led to some interesting new ideas and I look forward to seeing how things progress.


Does ShareJS currently support distributed synchronization?

My impression of OT (having tried to implement it myself) is that proving that the code preserves the transform constraint is very hard (and the proof one makes is easily wrong).

I believe that the OT proof is equally hard for centralized and distributed versions of OT (I was only ever focused on centralized).

Proving that operations commute is way easier. Defining a global total order for operations is way easier.

Working with an OT base seems more difficult as a result; adding offline or new operations requires more proofs and gets harder.

I too turned towards simpler synchronization algorithms: https://github.com/espadrine/canop/blob/master/doc/paper.md


ProseMirror doesn't support distributed synchronisation so why do you ask?

Implementing OT consists of copying algorithms from academic papers; there's no need to prove anything yourself. Commutativity and global ordering are prerequisites to any OT proof.

Offline support falls naturally out of OT; it's effectively free. New operations can be defined in terms of the primitives of insert, update, and delete, given the ability to create batch operations.

Distributed OT, where there is no central server, is seriously tricky, and algorithms with proofs of correctness can be hard to come by. But centralised OT is readily implementable, albeit with considerable background reading of the necessary papers.

Personally I think it's good to see approaches other than OT being explored. There's space for more than one solution.


> ProseMirror doesn't support distributed synchronisation so why do you ask?

I wondered whether his quote from the article related to the implementation of distributed OT in ShareJS, or just centralized OT. I always assumed he had said that of centralized OT.

> Implementing OT consists of copying algorithms from academic papers

If you extend them, you need to extend the proof as well.

> New operations can be defined in terms of the primitives of insert, update, and delete

Similar to vector spaces, adding a vector to the basis that is orthogonal to all the others can give you a much larger space. While in OT, every new primitive requires an exponentially growing number of proofs, you can avoid that with a simpler algorithm.

Sure, OT has text figured out (although it took some time to fix the details). However, I am personally interested in more complex data structures such as matrices (bitmaps) and trees (vector graphics, rich text), where you want to maintain the meaning of, say, the rotation of a layer. Even with sound, which can be seen as a kind of list, using the same primitives as with text would yield strange results.

It is harder to tweak an OT algorithm to bring it closer to the intention of the user (which I do heavily in Canop for text, where such tweaking is completely independent from the proof that is needed to maintain convergence).

To me, adding collaboration to an operation-based editor should be as simple as adding an undo/redo system.

I absolutely think that OT was a phenomenal achievement, but in a couple of years, I hope it won't be the go-to choice.


> in OT, every new primitive requires an exponentially growing number of proofs, you can avoid that with a simpler algorithm.

It's not actually necessary to add any new primitives though. Most of the time, composite operations can represent higher-level operations. I'll expand on this below.

> I am personally interested in more complex data structures such as matrices and trees

Trees can be modelled in OT using only the primitives of insert, update, and delete, provided that you support linear addresses (i.e. paths). We know from Lisp that everything else can be modelled from trees, so you effectively get a general-purpose system from just three primitives. CoPowerPoint was implemented using this approach and allows arbitrary graphical editing.
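
Roughly like this (a sketch with made-up data and names - I'm not claiming this is CoPowerPoint's actual operation format):

  // A slide deck as a tree, edited purely via insert/update/delete at paths.
  const deck = {
    slides: [
      { title: 'Intro', shapes: [{ kind: 'rect', x: 10, y: 10 }] },
    ],
  };

  // Insert a new shape as the second child of the first slide's shape list.
  const insertShape = {
    op: 'insert' as const,
    path: ['slides', 0, 'shapes', 1],
    value: { kind: 'ellipse', x: 40, y: 20 },
  };

  // Update a single attribute of an existing node.
  const nudgeRect = {
    op: 'update' as const,
    path: ['slides', 0, 'shapes', 0, 'x'],
    value: 12,
  };

  // Delete an entire subtree (the whole first slide).
  const dropSlide = { op: 'delete' as const, path: ['slides', 0] };

The transform logic still only has to reason about insert, update, and delete; the paths are resolved a layer above it.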


What an amazingly entitled post. Why would outside contributors call the shots? It's Apple's project, not yours.


You misread. I don't expect outside contributors to call the shots. b00gizm referred to "people inside Apple who really want feedback and contributions from the community to improve their language" and I suggested that those inside contributors might not have the power to allow effective outside feedback and contributions.

Incidentally, if Swift is to be widely adopted outside the Apple ecosystem, it'll help hugely if it becomes a community project, not "Apple's project".


In my experience, it's more like "get your $100, then offer you another $500". I've been offered several jobs to continue working on open source software I've written at companies where that software is being used. In all of those cases I've been allowed to continue open-sourcing the changes I've made. For small projects, nobody wants to maintain their own fork.


GitHub has a stable business model which depends on its reputation as a host. As I understand it, that isn't something that could ever be said of SourceForge.

I'm not saying GitHub will be around forever, but I highly doubt they'll make the same mistake SourceForge is making now.


SF was highly reputable back in the day - why else do you think so many projects with roots back in the '90s are hosted there?

Never say never. 15 years ago nobody would've dreamt SF would have gone this way.


Agreed. As a high schooler I loved SourceForge. I would talk it up to people and I had a couple of projects that I put up there. I thought it was the best thing since sliced bread.

Then I saw that famous talk by Linus on git in 2007 (https://www.youtube.com/watch?v=4XpnKHJAok8). Since I had never managed to get SVN working properly for me, git was awesome. No server software to install. By the time I wanted to put up another project on the web, GitHub was a thing and I used that. I never looked back, and I loved how it was about the code, not how many installer downloads you had.

That for me was the main problem with SourceForge. In the end it was a game (for the devs) to get the most downloads, because that was how your projects were judged and ranked. The GitHub "game" is slightly better and there are multiple ways to play.


I don't think there were decent alternatives back then. Also, SourceForge's business model has been based on web advertising. It's a valid point that GitHub has a solid model and might not need to do shady stuff in the long run.


> I'm not saying GitHub will be around forever, but I highly doubt they'll make the same mistake SourceForge is making now.

GitHub could be sold, just like SourceForge was sold, and the new owners could behave very differently from the current owners.


It has been studied (first in 1909 by Henry Ford!). As I understand it, the results agree with the original post.

In short, you can use crunch time strategically to meet deadlines, but you always need a recovery period afterwards. If you work 60-hour weeks, in about 4 weeks your productivity will drop below what it was when you were working 40-hour weeks - despite putting in 50% more hours. Also, many people in this burnout zone will self-report that their productivity is higher than it was (and they're wrong). If you've been crunching for a month straight, you're working ineffectively and you're too tired to tell.

Here's a presentation on the topic from Dan Cook, with links to papers: http://lostgarden.com/Rules%20of%20Productivity.pdf

And here's a fantastic write-up of a recent quantitative study in the games industry. They looked at how the success of video games correlates with crunch time and overwork. I suspect that these results would also hold true amongst startups: http://www.gamasutra.com/blogs/PaulTozour/20150120/234443/Th...


This is not quite correct. Generating a random vector using three calls to rand() is the equivalent of picking a random point inside a cube. If you normalize the points, you're effectively squishing the corners of the cube in toward the center to make a sphere. Directions near the corners of the containing cube will come up more frequently than directions near the centers of its faces (the top, bottom and 'sides').

@ginko's comment is correct - you can fix the algorithm by throwing out any points that lie outside the sphere before normalizing.


As others have pointed out, the distinction between randn() and rand() is crucial here - the former gives you points from a (spherically symmetric) normal distribution, the latter gives you uniform points from the unit cube.

One of the advantages of the normal-sampling route over @ginko's rejection-based method is that in high dimensions almost all of the volume of the unit cube is situated outside of the unit sphere (the volume of the unit cube is always 1, whereas the volume of the unit sphere decreases exponentially with dimension). So the rejection method becomes exponentially slow in high dimensions, while the Gaussian method still works just fine.
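
To make the comparison concrete, here's a sketch of both approaches in 3D (function names are my own):

  // Box-Muller: a standard normal sample from two uniforms.
  function randn(): number {
    let u = 0, v = 0;
    while (u === 0) u = Math.random();
    while (v === 0) v = Math.random();
    return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  }

  // Rejection: sample the cube [-1, 1]^3, keep only points inside the unit
  // ball, then normalize. Unbiased, but the acceptance rate collapses as
  // the dimension grows.
  function randomUnitVectorRejection(): [number, number, number] {
    for (;;) {
      const x = 2 * Math.random() - 1;
      const y = 2 * Math.random() - 1;
      const z = 2 * Math.random() - 1;
      const r = Math.hypot(x, y, z);
      if (r > 0 && r <= 1) return [x / r, y / r, z / r];
    }
  }

  // Gaussian: normalize a vector of independent normals. The joint density
  // depends only on the radius, so the direction is exactly uniform, and it
  // works in any dimension with no rejection loop.
  function randomUnitVectorGaussian(): [number, number, number] {
    const x = randn(), y = randn(), z = randn();
    const r = Math.hypot(x, y, z);
    return [x / r, y / r, z / r];
  }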


Surely there would still be a bias towards the corners?


Interestingly, there isn't for Gaussians (see my comment below).

Other simple distributions tend to give biases towards the corners or axes. Perhaps the Gaussian is unique in this regard? I'm not sure.


In three dimensions (to keep the notation reasonably simple) you need a distribution with density f such that f(x)f(y)f(z) is a function of x^2 + y^2 + z^2 (i.e. the squared distance from the origin). It looks like the Gaussian (up to scaling) is the only one.


Yes, the fact that i.i.d. Gaussians give a rotationally symmetric joint distribution is a defining property of the Gaussian.


Ah I suspected as much. Can you provide proof?

I think I found one, but I'm not sure:

Without loss of generality, take f(xi) = k1 * exp(-g(xi)) [1], for some g. Then we need the joint pdf to satisfy f(x1, ..., xn) = k2 * exp(-h(R^2)), where R = (x1^2 + ... + xn^2)^(1/2) (writing it in terms of R^2 and some h(.) is w.l.o.g. too). Taking the two-dimensional case for simplicity, we get g(x1) + g(x2) = h(x1^2 + x2^2). Then, assuming the functions g and h are analytic, we end up needing g(x) = k * x^2, otherwise we get cross terms in the Taylor expansion that can't be cancelled out for all xi. Sounds good?

[1] The function f trivially needs to be symmetric, so assuming this form loses no generality.


Perhaps coincidentally, someone just asked about this at math.stackexchange: http://math.stackexchange.com/questions/1255637/joint-pdf-of...


Yes, that was me :)

I wanted to make sure I got a proof, since I didn't really find this elsewhere.


You have to assume that the one-dimensional marginals are Gaussian, for otherwise the statement is not correct.


Hmm, I'm not sure I get your point. I'm trying to prove that the joint pdf of N i.i.d. RVs is isotropic if and only if the RVs are Gaussian. If I assume the pdfs are Gaussian in the first place, the proof isn't valid?


Oh, I see. I thought you were trying to prove something different.


No, OP is correct; he spoke about an isotropic normal distribution (randn as opposed to rand).


I don't totally understand the math, but davmre is suggesting calling randn(), not rand() -- so sampling from a normal distribution instead of a uniform distribution.


The math is fairly simple. The probability density of an isotropic multivariate Gaussian has a simple form, f = a * exp(-(x1^2 + ... + xN^2)) = a * exp(-R^2) (dropping the usual factor of 1/2 in the exponent for brevity), where a is a normalizing constant and R is the norm of the position; that is, it is obviously rotationally symmetric [1].

But that pdf is also the joint pdf of N i.i.d. Gaussians, as is evident by decomposing f = a * exp(-x1^2) * ... * exp(-xN^2) [2], i.e. the joint pdf of x1, ..., xN whose marginals are f_x1 = c * exp(-x1^2), ..., f_xN = c * exp(-xN^2).

[1] Since exp(-R^2) does not depend on direction, only on distance from the origin.

[2] The fact that f(x1,...xN)=f1 * ... * fN if x1,...xN are independent follows directly from the fact that P(A & B) = P(A)*P(B) if A and B are independent events.


Great point, you are correct. It's not enough for the distribution of the random vectors to be rotationally symmetric. It must be isotropic, looking the same in every direction from the origin, which a cube does not. (Nor does any other method derived from a polyhedral solid, as I was thinking might have been possible, through something like selecting a random face from an icosahedron and then a random point on that face.)


FoundationDB. Kyle didn't bother running Jepsen against FDB because FoundationDB's internal testing was much more rigorous than Jepsen. The FoundationDB team ran it themselves and it passed with flying colors:


Sadly, FDB has been bought by Apple[1], and you can't download it anymore. I sincerely hope FoundationDB gets open-sourced or something.

[1] http://techcrunch.com/2015/03/24/apple-acquires-durable-data...


I believe the real reason was that FDB was not open sourced.

Jepsen tests cannot be used to prove that a system is safe, only that it isn't.

It looks like he very often digs into the source code in order to figure out how the system operates; that way he can find weaknesses and write tests for his testing framework that demonstrate the issue.

I wouldn't trust any company that uses Jepsen to show that their product is safe.



He has said [1] that each test takes literally months to do. I am not surprised that he picks the most popular databases, given the amount of work.

[1] https://github.com/rethinkdb/rethinkdb/issues/1493#issuecomm...


Actually, look at some of the things that @aphyr has tweeted about FoundationDB:




