Sorry ShareJS is such a mess right now. I did a bunch of work to make it into a kind of OT-backed database to power derbyjs. It's very much straddling two worlds at the moment, and I think it's doing a bad job of both.
Hi! Joseph Gentle here, author of ShareJS. You quoted me in your article.
You're right about OT - it gets crazy complicated if you implement it in a distributed fashion. But implementing it in a centralized fashion is actually not so bad. It's the perfect choice for Google Docs.
Here is my implementation of OT for plain text:
In this tiny source-only demo I do all the bookkeeping, and implement a working network protocol on websockets. The result? I can sync an object between the server & client all in about 150 lines of code total:
This code has no external dependencies except for the OT type itself (ottypes/json0 in this case - although it could be changed to text or rich text or something with barely any code change).
Nice work making a cool demo - but I think you're using the wrong algorithm ;)
Yes, but that implementation deals only with plain text. The complexity seems to ramp up pretty quickly as you support more types of operations, and since extendability is an important concern for my project, I decided to avoid OT.
If you have insert, update, and delete then you can build any other operation from those primitives as long as you have the ability to batch a composite operation. So there's no need to extend the OT algorithm to support other kinds of primitive operation. For nested structures, addressing via ids or linear addresses has to be taken into account but that doesn't affect the OT transforms, it's one layer above. So it's possible (though not easy) to have an extensible OT system without exponential complexity.
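As a rough illustration of the composite-operation idea (the names and data structures here are mine, not ShareJS's actual API), a higher-level "move" can be expressed as a batch of delete + insert primitives, and only the primitives need transform rules:

```python
# Hypothetical sketch (not ShareJS's actual API): primitive text operations
# plus a composite "move" built from them. Only the primitives need
# transform rules; a composite is transformed component-by-component.
# Greatly simplified: assumes the concurrent op doesn't overlap the
# regions the composite touches, and ignores tie-breaking.

def transform_component(c, against):
    """Shift one primitive past a concurrent primitive `against`.

    Components are dicts:
      {'op': 'insert', 'pos': int, 'text': str}
      {'op': 'delete', 'pos': int, 'len': int}
    """
    shift = 0
    if against['op'] == 'insert':
        if against['pos'] <= c['pos']:
            shift = len(against['text'])
    else:  # 'delete'
        if against['pos'] < c['pos']:
            shift = -min(against['len'], c['pos'] - against['pos'])
    shifted = dict(c)
    shifted['pos'] += shift
    return shifted

def transform(composite, against):
    # A composite op is just a list of primitives.
    return [transform_component(c, against) for c in composite]

# "Move 'abc' from position 0 to position 5" as delete + insert:
move = [{'op': 'delete', 'pos': 0, 'len': 3},
        {'op': 'insert', 'pos': 5, 'text': 'abc'}]

# A concurrent insert of 'XY' at position 2 shifts the second component:
concurrent = {'op': 'insert', 'pos': 2, 'text': 'XY'}
transformed = transform(move, concurrent)
```

The point is that the server never needs a `transform_move` function: the batch transforms as the sum of its parts, so the proof burden stays with the three primitives.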
Still, avoiding OT has led to some interesting new ideas and I look forward to seeing how things progress.
ProseMirror doesn't support distributed synchronisation so why do you ask?
Implementing OT consists of copying algorithms from academic papers, there's no need to prove anything yourself. Commutativity and global ordering are prerequisites to any OT proof.
Offline support falls naturally out of OT; it's effectively free. New operations can be defined in terms of the primitives of insert, update, and delete, given the ability to create batch operations.
Distributed OT, where there is no central server, is seriously tricky, and algorithms with proofs of correctness can be hard to come by. But centralised OT is readily implementable, albeit with considerable background reading of the necessary papers.
Personally I think it's good to see approaches other than OT being explored. There's space for more than one solution.
> ProseMirror doesn't support distributed synchronisation so why do you ask?
I wondered whether his quote from the article related to the implementation of distributed OT in ShareJS, or just centralized OT. I always assumed he had said that of centralized OT.
> Implementing OT consists of copying algorithms from academic papers
If you extend them, you need to extend the proof as well.
> New operations can be defined in terms of the primitives of insert, update, and delete
Similar to vector spaces, adding a vector to the basis that is orthogonal to all the others can give you a much larger space. While in OT every new primitive requires an exponentially growing number of proofs, you can avoid that with a simpler algorithm.
Sure, OT has text figured out (although it took some time to fix the details). However, I am personally interested in more complex data structures such as matrices (bitmaps) and trees (vector graphics, rich text), where you want to maintain the meaning of, say, the rotation of a layer. Even with sound, which can kind of be seen as a list, using the same primitives as with text would yield strange results.
It is harder to tweak an OT algorithm to bring it closer to the intention of the user (something I do heavily in Canop for text), as intention preservation is completely independent from the proof that is needed to maintain convergence.
To me, adding collaboration to an operation-based editor should be as simple as adding an undo/redo system.
I absolutely think that OT was a phenomenal achievement, but in a couple of years, I hope it won't be the go-to choice.
> in OT, every new primitive requires an exponentially growing number of proofs, you can avoid that with a simpler algorithm.
It's not actually necessary to add any new primitives, though. Most of the time, composite operations can represent higher-level operations. I'll expand on this below.
> I am personally interested in more complex data structures such as matrices and trees
Trees can be modelled in OT using only the primitives of insert, update, and delete, provided that you support linear addresses (i.e. paths). We know from Lisp that everything else can be modelled with trees, so you effectively get a general-purpose system from just three primitives. CoPowerPoint was implemented using this approach and allows arbitrary graphical editing.
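A minimal sketch of what path-addressed tree primitives might look like (illustrative names only, not CoPowerPoint's or any library's actual API):

```python
# Hypothetical sketch of path-addressed tree primitives. A path is a
# list of dict keys / list indices into a JSON-like document; the three
# primitive kinds are 'insert', 'update', and 'delete'.

def apply_op(doc, path, kind, value=None):
    """Apply an insert/update/delete primitive at `path` inside `doc`."""
    parent = doc
    for key in path[:-1]:       # walk down to the addressed node's parent
        parent = parent[key]
    last = path[-1]
    if kind == 'insert':        # insert a list element or add a dict key
        if isinstance(parent, list):
            parent.insert(last, value)
        else:
            parent[last] = value
    elif kind == 'update':      # replace the value at the path
        parent[last] = value
    elif kind == 'delete':      # remove a list element or dict key
        del parent[last]
    return doc

# A toy vector-graphics document: updating a layer's rotation keeps its
# meaning as "set rotation", rather than being flattened into text edits.
doc = {'shapes': [{'type': 'rect', 'rotation': 0}]}
apply_op(doc, ['shapes', 0, 'rotation'], 'update', 45)
apply_op(doc, ['shapes', 1], 'insert', {'type': 'circle'})
```

The transform rules still only have to reason about the three primitive kinds; the path addressing sits one layer above them.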
You misread. I don't expect outside contributors to call the shots. b00gizm referred to "people inside Apple who really want feedback and contributions from the community to improve their language" and I suggested that those inside contributors might not have the power to allow effective outside feedback and contributions.
Incidentally, if Swift is to be widely adopted outside the Apple ecosystem, it'll help hugely if it becomes a community project, not "Apple's project".
In my experience, it's more like "get your $100, then offer you another $500". I've been offered several jobs to continue working on open source software I've written at companies where that software is being used. In all of those cases I've been allowed to continue open-sourcing the changes I've made. For small projects, nobody wants to maintain their own fork.
Agreed. As a high schooler I loved SourceForge. I would talk it up to people and I had a couple of projects that I put up there. I thought it was the best thing since sliced bread.
Then I saw that famous talk by Linus on git in 2007 (https://www.youtube.com/watch?v=4XpnKHJAok8). Since I had never managed to get SVN working properly for me, git was awesome. No server software to install. By the time I wanted to put up another project on the web, GitHub was a thing and I used that. I never looked back; I loved how it was about the code, not how many installer downloads you had.
That for me was the main problem with SourceForge. In the end it was a game (for the devs) to get the most downloads, because that was how your projects were judged and ranked. The GitHub "game" is slightly better and there are multiple ways to play.
I don't think there were decent alternatives back then. Also, their business model has been based on web advertisement. It's a valid point that GitHub has a solid model and might not need to do shady stuff in the long run.
It has been studied (first in 1909 by Henry Ford!). As I understand it the results agree with the original post.
In short, you can use crunch time strategically to meet deadlines, but you always need a recovery period afterwards. If you work 60-hour weeks, within about 4 weeks your productivity will drop below what it was when you were working 40-hour weeks - despite putting in 50% more hours. Also, many people in this burnout zone will self-report that their productivity is higher than it was (and they're wrong). If you've been crunching for a month straight, you're working ineffectively and you're too tired to tell.
This is not quite correct. Generating a random vector using three calls to rand() is equivalent to picking a random point inside a cube. If you normalize the points, you're effectively squishing the corners of the cube in toward the center to make a sphere. After normalization, directions toward the corners of the containing cube will appear more frequently than directions toward the top, bottom, and 'sides'.
@ginko's comment is correct - you can fix the algorithm by throwing out any points that lie outside the sphere before normalizing.
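A small sketch of that rejection approach in Python (the function name is mine):

```python
import math
import random

def random_unit_vector_rejection():
    """Uniform direction in 3D: sample the cube [-1, 1]^3 and throw out
    any point that lies outside the unit ball before normalizing.
    The kept points are uniform in the ball, so their directions are
    uniform on the sphere."""
    while True:
        x, y, z = (random.uniform(-1.0, 1.0) for _ in range(3))
        r2 = x * x + y * y + z * z
        if 0.0 < r2 <= 1.0:     # reject the corner regions (and the origin)
            r = math.sqrt(r2)
            return (x / r, y / r, z / r)
```

In 3D the acceptance rate is the ball-to-cube volume ratio, (4/3)π / 8 ≈ 52%, so on average you loop about twice per sample.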
As others have pointed out, the distinction between randn() and rand() is crucial here - the former gives you points from a (spherically symmetric) normal distribution, the latter gives you uniform points from the unit cube.
One of the advantages of the normal-sampling route over @ginko's rejection-based method is that in high dimensions almost all of the volume of the unit cube is situated outside of the unit sphere (the volume of the unit cube is always 1, whereas the volume of the unit sphere decreases exponentially with dimension). So the rejection method becomes exponentially slow in high dimensions, while the Gaussian method still works just fine.
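For comparison, a sketch of the Gaussian approach, which works unchanged in any dimension (again, the function name is mine):

```python
import math
import random

def random_unit_vector_gauss(n):
    """Uniform direction on the (n-1)-sphere: normalize a vector of
    i.i.d. standard normals. The normal distribution is spherically
    symmetric, so no rejection step is needed and the cost stays
    linear in n even in high dimensions."""
    v = [random.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]
```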
In three dimensions (to keep the notation reasonably simple) you need a distribution with density f such that f(x) f(y) f(z) is a function of x^2 + y^2 + z^2 (i.e. of the distance from the origin). It looks like the Gaussian (up to scaling) is the only one.
Without loss of generality, take f(xi) = k1 * exp(-g(xi)) for some g. Then we need the joint pdf to satisfy f(x1, ..., xn) = k2 * exp(-h(R^2)), where R = (sum xi^2)^(1/2) (the R^2 and h(.) are w.l.o.g. too). So we get g(x1) + g(x2) = h(x1^2 + x2^2). Then, assuming the functions g and h are analytic, we end up needing g(x) = k * x^2; otherwise we get cross terms in the Taylor expansion that can't be cancelled out for all xi. Sounds good?
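Incidentally, the analyticity assumption can be weakened to mere continuity; a sketch of that step in the same notation:

```latex
g(x) + g(y) = h(x^2 + y^2)
\;\Rightarrow\; g(x) = h(x^2) - g(0) \quad (\text{set } y = 0; \text{ note } h(0) = 2g(0) \text{ from } x = y = 0)
\;\Rightarrow\; h(x^2 + y^2) = h(x^2) + h(y^2) - 2g(0).
\text{So } H(t) := h(t) - h(0) \text{ satisfies Cauchy's equation } H(s + t) = H(s) + H(t),
\text{whose continuous solutions are linear: } h(t) = kt + h(0).
\text{Hence } g(x) = kx^2 + \text{const}, \text{ and } f(x) \propto e^{-kx^2} \text{: the Gaussian.}
```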
The function f trivially needs to be symmetric, which justifies the "without loss of generality".
Hmm I'm not sure I get your point. I'm trying to prove that the joint pdf of N iid RV's is isotropic if and only if the RV's are gaussian. If I assume the pdfs are gaussian in the first place the proof isn't valid?
The math is fairly simple. The probability density of a multivariate standard gaussian has the simple form f = a * exp(-(x1^2 + ... + xN^2)) = b * exp(-R^2), where a and b are normalizing constants and R is the norm of the position vector - that is, it is obviously rotationally symmetric.
But that pdf is also the joint pdf of N i.i.d. gaussians, as is evident by decomposing f = a * exp(-x1^2) * ... * exp(-xN^2), which is the joint pdf of x1, ..., xN with fx1 = c * exp(-x1^2), ..., fxN = c * exp(-xN^2).
Since exp(-R^2) does not depend on direction but only on distance from the origin, the resulting distribution is rotationally symmetric.
The fact that f(x1, ..., xN) = f1 * ... * fN if x1, ..., xN are independent follows directly from the fact that P(A & B) = P(A) * P(B) if A and B are independent events.
Great point, you are correct. It's not enough for the distribution of the random vectors to have some symmetry. It must be isotropic (looking the same in every direction from the origin), which a cube is not. (Nor is any other method derived from a polyhedral solid, as I was thinking might have been possible, through something like selecting a random face from an icosahedron then a random point on that face.)
FoundationDB. Kyle didn't bother running Jepsen against FDB because FoundationDB's internal testing was much more rigorous than Jepsen's. The FoundationDB team ran it themselves and it passed with flying colors: