It's about time, considering that Microsoft Research has been one of the main funders of work on the Haskell compiler.
This makes it pretty "unportable", because it's the same kind of dependency as the Java VM. How can I distribute code that uses this library, with a dependency like that, asking people to download the whole of GHC?!
Unfortunately, for libraries that should be embedded in third-party code, the reality beyond C/C++ is pretty harsh. For full applications the situation is different, but for embedded libraries it matters. Despite the fact that I liked this as a solution for something I'm doing, I had to pass because of this small detail, and I'm too busy to write a parser in C++ to make it more portable in source-code form, so I had to go back to protobuf :/
I'm working on something that can pull in a lot of third-party libraries, so I need to minimize dependency side effects. So even though I like this more than protobuf, I'll have to stick with protobuf (Cap'n Proto had to be rewritten from Haskell to C++ for exactly this reason).
PS: Oh no, the Haskell inquisition downvotes (as expected).
I think you didn't read what I wrote, or more likely I'm explaining it poorly (not a native speaker, sorry). This can be a binary you run to compile and generate source code, but it can also, and often will, be embedded and used as a library. I'm guessing you're using Windows, because you said you don't have a C compiler at hand. But Windows is mostly an end-user thing, and end users probably won't care about compiling code; everywhere else a C/C++ compiler is ubiquitous.
I'm not complaining about the tool, but about its use as a library, which is something this also aims to be, and C is a better fit for that because it can be embedded in any language. Given the compiler is in Haskell, I can't access the AST, for instance; I can't embed it in my binary, but have to call an external binary instead. At least there's a runtime to embed. That may be OK for some, but I was just saying that, despite the protocol language being very good, I couldn't use it instead of protobuf because I would have a more limited API and my end program/goal would lose power and flexibility.
That's a pretty technical explanation; it could be coded in Brainf*ck for all it matters. I have nothing against the language itself; it's just that it limits the use cases of this tool (as compared with protobuf).
GHC is pretty ubiquitous these days. Any serious Linux distro will have a package, so it's one line (apt-get install ghc or similar). Even on e.g. a Mac it's no harder than installing Ruby or Python.
> Given the compiler is in Haskell, I can't access the AST, for instance; I can't embed it in my binary, but have to call an external binary instead
You could write Haskell. It's a pretty nice language.
More to the point, Haskell does have a C FFI and allows you to build a library that exposes a C interface that C programs can link against. I don't know whether the authors have done that here, but the functionality is available.
I don't think you really understand what GHC is. GHC can compile Haskell down to C or assembly, and it has an FFI that makes Haskell embeddable in C. The GHC runtime is not like the runtime for Java or other VM/interpreter-based languages: Haskell can be compiled, turned into a shared library, and distributed with your code.
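For the curious, here is a minimal sketch of what that embedding looks like from the C++ side, assuming a hypothetical Parser.hs that exports one function via foreign export ccall and is built into libparser.so. HsFFI.h, hs_init, and hs_exit are GHC's real embedding entry points; the rest of the names are made up:

```cpp
// main.cpp -- calls into a Haskell-built shared library.
// Hypothetical Haskell side (Parser.hs):
//   foreign export ccall parse_ok :: CString -> IO CInt
// Built with something like: ghc -shared -fPIC Parser.hs -o libparser.so
#include "HsFFI.h"   // ships with GHC; declares hs_init/hs_exit
#include <cstdio>

extern "C" int parse_ok(char *src);  // CInt maps to a plain C int

int main(int argc, char *argv[]) {
    hs_init(&argc, &argv);           // start the GHC runtime once per process
    char schema[] = "struct Foo { x: int32 }";
    std::printf("parse_ok = %d\n", parse_ok(schema));
    hs_exit();                       // shut the runtime down
    return 0;
}
```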
I've been toying with the idea of using something like PB, Cap'n Proto, or now Bond to define and track schema changes and centralize marshaling/serializing logic. I'm not concerned about having RPC. Does this sound like crazy talk? Does anyone else track schemas against schemaless data stores?
(I also like the idea of not having to ship JSON everywhere if I don't want to.)
A few things:
- ElasticSearch is definitely not schema-less, but it can try to generate a schema (aka "mapping") for you if you don't give it one: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/c...
- ElasticSearch has tons of ways to customize the data you get back, so, unless you really don't want the ES cluster crunching things for you, you can do a lot of the transformation server-side. You can go so far as to have your own type + mapping for e.g. a report, which sources data from another type and transforms it: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
- This covers both why the schema can't, by nature, be dynamic (so the "schema-less / dynamic schema" argument is BS in practice, IMO) and how to get data out of one index and into another (e.g. your "report" index, which does scripted transformation).
- Another idea would be to use the scripting module to write a custom "view": http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
- You can use Groovy, mvel, JS, or Python for scripts. If you combine this with how ES lets you do "site plugins", you could make a JS + CSS + HTML site which is actually served by the ES cluster, which interacts with it and generates reports or whatever all without additional infrastructure. Example: https://github.com/karmi/elasticsearch-paramedic
What am I overlooking? What information is known at runtime that isn't already available at build time? (And no, "the exact CPU/memory/etc. the code runs on" is not a valid answer. This is C# code, so there is always a runtime that handles that stuff.)
1) In some scenarios you have information at runtime that lets you generate much faster code. The canonical example is untagged protocols, where the serialized payload doesn't contain any schema information and you only get the schema at runtime. Bond supports untagged protocols (like Avro) in addition to tagged ones (like protobuf and Thrift), and the C# version generates an insanely fast deserializer for the untagged case.
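For readers unfamiliar with the distinction, here is a toy sketch of tagged vs. untagged encodings. These are illustrative formats I made up, not Bond's actual wire protocols:

```cpp
#include <cstdint>
#include <vector>

// Toy illustration only -- not Bond's real formats.
struct Point { int32_t x, y; };

// Tagged: each field carries its id, so readers can skip unknown fields
// and the schema can evolve without coordination (protobuf/Thrift style).
void write_tagged(const Point &p, std::vector<uint8_t> &out) {
    auto field = [&](uint8_t id, int32_t v) {
        out.push_back(id);                        // field id
        out.push_back(0 /* wire type: int32 */);  // type tag
        for (int i = 0; i < 4; ++i) out.push_back((v >> (8 * i)) & 0xff);
    };
    field(1, p.x);
    field(2, p.y);
}

// Untagged: just the values in schema order (Avro style). Smaller and
// faster, but the reader must be handed the writer's schema at runtime --
// which is exactly the information a runtime-generated deserializer exploits.
void write_untagged(const Point &p, std::vector<uint8_t> &out) {
    for (int32_t v : {p.x, p.y})
        for (int i = 0; i < 4; ++i) out.push_back((v >> (8 * i)) & 0xff);
}
```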
2) It allows programmatic customization. If the work is done via codegen'ed source code, then the only way for the user to do something custom is to change the code generator to emit modified code. Even if the codegen provides the ability to do that, such customizations are very hard to maintain. In Bond, serialization and deserialization are composed from more generic abstractions: parsers and transforms. These lower-level APIs are exposed to the user. As an example, imagine that you need to scrub PII from incoming messages. This is a bit like deserialization, because you need to parse the payload, and a bit like serialization, because you need to write the scrubbed data back out. In Bond you can implement such an operation from those underlying abstractions, and because the code is emitted at runtime you don't sacrifice performance.
BTW, Bond lets you do something similar in C++. The underlying meta-programming mechanism is different (compile-time template meta-programming instead of runtime JIT), but the principle is the same: serialization and deserialization are not special; they are composed from more generic abstractions.
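A rough sketch of the idea in C++ terms (generic, not Bond's actual API): if serialization is just "apply a transform to every field the parser yields", then a PII scrubber is the same traversal with a different transform, and the compiler can still inline the whole thing.

```cpp
#include <iostream>
#include <string>
#include <tuple>

// Generic sketch, not Bond's real API: a struct describes its own fields,
// and operations are written as transforms over that description.
struct User {
    std::string name;
    std::string email;
    int         age;
    auto fields() { return std::tie(name, email, age); }
};

// "Parser": visit every field, handing each one to a transform.
template <typename T, typename Transform>
void apply(T &obj, Transform &&t) {
    std::apply([&](auto &...f) { (t(f), ...); }, obj.fields());
}

// One transform serializes...
struct Print {
    void operator()(const std::string &s) { std::cout << s << '\n'; }
    void operator()(int v)                { std::cout << v << '\n'; }
};

// ...another scrubs PII in place: same traversal, different transform.
struct ScrubStrings {
    void operator()(std::string &s) { s = "<redacted>"; }
    void operator()(int &)          {}
};

int main() {
    User u{"Ada", "ada@example.com", 36};
    apply(u, ScrubStrings{});
    apply(u, Print{});   // prints two "<redacted>" lines and 36
}
```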
Ad 2): does this mean that one can also do efficient schema migration at deserialization time (rename fields, add fields with default values), or that one can deserialize to something other than the class that was generated when the schema was compiled?
2) You can do both. You can also do type safe transformations/aggregations/etc on serialized data w/o materializing any object.
I got around 300k msgs/s throughput with msgpack-d-rpc.
The current offerings (Thrift, Protobufs, Avro, etc.) tend to have similar opinions about things like schema versioning, and very different opinions about things like wire format, protocol, and performance tradeoffs. Bond is essentially a serialization framework that keeps the schema logic the same while making the wire format, protocol, etc. highly customizable and pluggable. The idea is that instead of deciding Protobufs isn't right for you, tearing it down, and starting Thrift from scratch, you just change the parts you don't like and keep the underlying schema logic the same.
In theory, this means one team can hand another team a Bond schema, and if they don't like how it's serialized, fine, just change the protocol; the schema doesn't need to change.
The way this works, roughly, is as follows. For most serialization systems, the workflow is: (1) you declare a schema, and (2) the tool generates a bunch of source files that de/serialize data, which you add to your project and compile into any program that needs to serialize or deserialize.
In Bond, you (1) declare a schema, and then (2) instead of generating source files, Bond generates a de/serializer using the metaprogramming facilities of your chosen language. So customizing your serializer is a matter of using the Bond metaprogramming APIs to change the de/serializer being generated.
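As a hedged illustration of what "pluggable protocols" can mean in this style (hypothetical names, not Bond's API): the serializer is generated against a writer concept, so swapping wire formats is just swapping the writer type.

```cpp
#include <iostream>
#include <string>

// Hypothetical writer concept: anything with these three members works.
struct JsonWriter {
    void begin()                                   { std::cout << "{"; }
    void field(const char *n, const std::string &v){ std::cout << '"' << n << "\":\"" << v << '"'; }
    void end()                                     { std::cout << "}\n"; }
};

struct DebugWriter {
    void begin()                                   {}
    void field(const char *n, const std::string &v){ std::cout << n << " = " << v << '\n'; }
    void end()                                     {}
};

// The "generated" serializer is written once against the concept; the
// wire format is chosen by the caller, not baked into generated files.
template <typename Writer>
void serialize(const std::string &name, Writer w) {
    w.begin();
    w.field("name", name);
    w.end();
}

int main() {
    serialize("bond", JsonWriter{});   // {"name":"bond"}
    serialize("bond", DebugWriter{});  // name = bond
}
```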
It is interesting to think about how it might work, though...
If you want to follow up, I encourage you to email Adam (adamsap -at- microsoft) or you can ping me and I'll loop him in (email@example.com).
I'll put it bluntly: you have no idea what you're talking about.
Bond v1 was started when Thrift was not production ready. This is Bond v3. There is no conspiracy to make Bond hard to use for technology we "don't care about." In general I'm fine with tempered speculation, but your conclusion here is just lazy, and we both know it. It contributes nothing to the conversation, and spreads FUD for no good reason. We can do better, agree?
Now, to address your comments about customization directly: pluggable protocols are one example. The metaprogramming facilities of Bond are dramatically richer than those of Thrift. A good example of these facilities: using the standard metaprogramming API and a bit of magic, we have been able to trick Bond into serializing a different serialization system's schema types. So, picture Bond inspecting some C# Thrift type (or something like it), populating the core Bond data structures with the data it finds there, and then serializing it to the wire.
This is the kind of power you get when you construct the serializer in memory using metaprogramming, and then expose that to the user. The flexibility is frankly unmatched.
> I'll put it bluntly: you have no idea what you're talking about. Bond v1 was started when Thrift was not production ready. This is Bond v3.
Let me put something bluntly. I don't care who started writing code first. Microsoft are well over half a decade late to the party. Thrift and Protobufs have been open source since, what... 2007/8?
And frankly, at least on the C++ front, there's not much to get excited about with regard to metaprogramming here. The Avro C++ code generator already produces a small set of template specialisations for each record type, and they're trivial enough to write manually for any existing classes you wish to marshal against your schema. std namespace containers are already recognised through partial template specialisations. MsgPack's implementation also does this. Other, more general metaprogramming solutions, like Boost Fusion, are already being used by many, in production, for completely bespoke marshalling.
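For anyone who hasn't seen it, the pattern being described looks roughly like this: a simplified sketch in the spirit of Avro C++'s codec_traits, with a made-up Encoder and simplified names; one specialization per record type, easy to write by hand for an existing class.

```cpp
#include <iostream>
#include <string>

// Made-up encoder standing in for the library's real one.
struct Encoder {
    void encodeInt(int v)                   { std::cout << v << ' '; }
    void encodeString(const std::string &s) { std::cout << s << ' '; }
};

template <typename T> struct codec_traits;  // primary template, no body

// An existing class you want to marshal against a schema...
struct Employee { std::string name; int id; };

// ...gets one trivial specialization -- exactly what the code generator
// would emit, and simple enough to maintain by hand.
template <> struct codec_traits<Employee> {
    static void encode(Encoder &e, const Employee &v) {
        e.encodeString(v.name);
        e.encodeInt(v.id);
    }
};

int main() {
    Encoder e;
    codec_traits<Employee>::encode(e, {"ada", 7});
}
```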
Don't get me wrong, Bond looks really nice, particularly for C# programmers, and I have respect for the work being done, but I can't get excited about it. It's kind of like someone announcing yet another JSON library or some new web framework when what the industry needs is consensus on formats and APIs. Right now there are so many serialization frameworks that the de facto standard will just continue to be schema-less JSON, and robust tools will remain largely non-existent.
I hear you on the fragmentation. I know that this doesn't help the community as an explanation, but big companies like Facebook, Google, and Microsoft really have good reasons to control fundamental pieces of their infrastructure like serialization. Case in point: Facebook forked their own Thrift project because, I presume, having it as an Apache project was too restraining.
FWIW, we plan to develop Bond in public and accept contributions from the community.
> Let me put something bluntly. I don't care who started writing code first. Microsoft are well over half a decade late to the party. Thrift and Protobufs have been open source since, what... 2007/8?
I think your technical complaints are almost all good and valid. I agree with essentially all of them. I don't want to fork Adam's sibling response here, so I'll leave it at that.
My point is that asserting that Bond was developed so that MSFT could purposefully ignore certain languages is pointedly wrong, and irresponsible considering the dearth of evidence you have to support it. And here you have 3 authoritative comments to say so. I don't understand how you can possibly disagree with this, or be upset that someone would take issue here. It's ok to be wrong.
Your characterization of Thrift is accurate, and Bond actually has some of the same architectural roots as Thrift. Those features of Thrift were ones that I wanted to preserve in Bond. But we also wanted to expand that pluggability to allow for even more flexibility than the core Thrift architecture would allow -- for example, the ability to support Avro-like "untagged" protocols within the same framework. I believe that the core innovation is in how that gets implemented. Also, we believe that performance is a feature: our testing has shown that Bond noticeably outperforms Thrift and Protocol Buffers in most cases.
There is no conspiracy or intent to "ignore languages" -- we will release additional languages as they are ready and as we can support them as first-class citizens. We also welcome community involvement.
"By design Bond is language and platform independent and is currently supported for C++, C#, and Python on Linux, OS X and Windows."
"language bindings - Thrift is supported in many languages and environments
We have support for a few more languages that we are using internally but after having a hard look at the implementations we decided that they weren't up to par for the open source release yet. I hope that we will release more soon. And needless to say, we are open to contributions from the community.
The main content has horizontal scroll on portrait monitors, and it slides under the transparent fixed div they used for navigation.