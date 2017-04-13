So one thing that I've learned about programming: Don't overthink it. How many different object types do you want to serialize? Maybe a hundred? So just write those 100 lines of boilerplate. Not a huge amount of work, even when there is the occasional modification. An additional benefit is that you're not painting yourself into a corner - for example, you can leave out that additional bit of redundant information that you hacked in as an optimization.
What's really huge for maintainability is this thing called aspect-oriented programming: Don't litter each of your classes with a line of boilerplate for this and another line for that. Make a file "serialize.cpp" that contains the serialization code for all the types, and nothing else.
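A rough sketch of what such a "serialize.cpp" could look like, assuming free overloaded functions are acceptable. The types `Vec2` and `Player`, the text format, and the `to_text` helper are all made up for illustration:

```cpp
// serialize.cpp (hypothetical): every serialization routine lives here.
#include <cassert>
#include <sstream>
#include <string>

// Plain data types; they contain no serialization code themselves.
struct Vec2 { int x; int y; };
struct Player { std::string name; Vec2 pos; };

// One overload per type: this is the per-type line of boilerplate.
void serialize(std::ostream& out, int v) { out << v << ' '; }

void serialize(std::ostream& out, const std::string& s) {
    // Length-prefixed so that strings containing spaces stay unambiguous.
    out << s.size() << ':' << s << ' ';
}

void serialize(std::ostream& out, const Vec2& v) {
    serialize(out, v.x);
    serialize(out, v.y);
}

void serialize(std::ostream& out, const Player& p) {
    serialize(out, p.name);
    serialize(out, p.pos);
}

// Convenience wrapper for callers that just want a string back.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    serialize(out, value);
    return out.str();
}
```

With these definitions, `to_text(Player{"bob", {1, 2}})` yields `3:bob 1 2 ` (with a trailing space), and the classes themselves never mention serialization.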
In other words, object-oriented is the wrong tool, for almost everything (sorry if that's old news).
Frankly, the idea that data should be human readable is complete crap, especially for things which are going to be machine processed far more than human processed. AKA use XML for your config file, not for the gigabytes of data you send over the wire/store on disk/etc, that will never be read by a human. It's funny that HTML/etc has everyone in this strange mindset that somehow it's more efficient to build human readable markup, which 99.9999% of the time is never read by a human, and has to be compressed with algorithms that further burn up tons of CPU time in order to squash the resulting data. The untold tons of carbon wasted transferring more data than necessary and/or compressing/decompressing it is insane for the possibility that someone will want to read it as it flies over the wire, rather than decoding it at the endpoint and viewing it in a debugger...
It is also a bad idea if reading a complete structure into memory is not possible, or a poor use of time. For example, if you need to read a single part of a gigabyte data set.
Also, if you happen to need to pick out a tiny bit of data from some huge tree/whatever data structure, then map it with an appropriate madvise() and simply let the kernel page in the portions you need to access. The whole dataset doesn't need to be read to pick out a couple of bytes here and there. Further, if this is a common use case you probably shouldn't be using this method anyway, but rather storing the index separately from the data in a manner which allows them to be accessed independently. This mechanism holds up for multi-TB datasets, and can be tuned by playing with the amount of physical RAM in the machine vs the speed of the disk. Then when storage class memory takes off, you don't even have to rewrite your application.
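For what it's worth, the mapping approach might look roughly like this on a POSIX system. This is only a sketch: `peek_byte` is a made-up helper name, and `MADV_RANDOM` is just one plausible hint for the "pick out a couple of bytes" case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read one byte at `offset` from the file at `path` without reading the
// whole file: map it, hint random access, and let the kernel page in only
// the page that is actually touched. POSIX only.
std::uint8_t peek_byte(const char* path, std::size_t offset) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");

    struct stat st;
    if (fstat(fd, &st) != 0 || offset >= static_cast<std::size_t>(st.st_size)) {
        close(fd);
        throw std::runtime_error("fstat failed or offset out of range");
    }
    const std::size_t length = static_cast<std::size_t>(st.st_size);

    void* base = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // The mapping stays valid after the descriptor is closed.
    if (base == MAP_FAILED) throw std::runtime_error("mmap failed");

    // Advisory only: tells the kernel that read-ahead is unlikely to help.
    madvise(base, length, MADV_RANDOM);

    const std::uint8_t value = static_cast<const std::uint8_t*>(base)[offset];
    munmap(base, length);
    return value;
}
```

A real application would keep the mapping around instead of mapping and unmapping per access; the point is only that nothing here reads the file front to back.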
Finally, for a single GB dataset, doing a bulk read is likely well under a 1 second operation on most reasonable storage mechanisms available for server use. Compared with picking off a couple hundred thousand fields from an object storage mechanism this is going to seem pretty speedy.
It's far more efficient to assume little-endian and let the weird machines do their own translations when loading the data.
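To make that concrete, here is a minimal sketch of loading a 32-bit field under that assumption. The helper names (`host_is_little_endian`, `byte_swap32`, `load_le32`) are invented for the example; the data on disk or on the wire is assumed to be little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when this machine stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Load a 32-bit little-endian field from a raw buffer.
inline std::uint32_t load_le32(const unsigned char* bytes) {
    std::uint32_t v;
    std::memcpy(&v, bytes, sizeof v);  // Plain copy: the common case.
    // Only big-endian hosts pay for a translation.
    return host_is_little_endian() ? v : byte_swap32(v);
}
```

On a little-endian host the load is a straight copy, and the swap only runs on the odd machine out.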
What does that even mean? Depends on what the output should be and how much control you need, I guess.
To comment on the binary vs text digression - computation overhead is probably not that much if the data is converted to a (non-portable) internal DOM representation and then typeset / rasterized anyway. And is bandwidth/storage overhead really an issue? I assume a short video quickly outweighs the costs for 100%, or in extreme cases 1000%, overhead compared to a compressed DOM representation. And the human factors are definitely there.
Actually, what's interesting is that both Lisp macros and C++ templates allow this to work pretty easily if you set up the boilerplate well enough. All this article is really talking about is making it easier to set up and use that boilerplate (e.g. put it in the language).
> What's really huge for maintainability is this thing called aspect-oriented programming: Don't litter each of your classes with a line of boilerplate for this and another line for that. Make a file "serialize.cpp" that contains the serialization code for all the types, and nothing else.
And that works if your project is not meant to ever be extended. But generally the point of having object oriented systems is being able to make new objects that can take the place of old ones. Of having abstraction.
And you can easily have them or change to them later. It's just another few lines of boilerplate to make a few method tables. But I don't like to have the implementation logic in the places where method dicts are wired, especially for interfaces with very little coherency, like GameModel3d: render, click, contextString, serialize... Doing that instead of aspect-oriented is a violation of the "separation of concerns" principle.
Although I agree with your idea, I've written some boilerplate code before to do something very similar. As the project grew it's become a constant source of bugs where someone has updated the code for one type and not another. At this point they're too far gone to have any hope of merging them, but had I known two years ago how much hassle it would have caused I would have refactored it into common code.
But yeah, starting with a complex abstraction for code you've never written before is very seldom a good idea.
But the article reminds me why I still feel frustrated that my programming work is divided between java and c++. There are so many things that are easy in java and hugely painful in c++, and c++ slowly adds those ideas yet it makes the language more painful to use over time. I basically want to throw c++ away and replace it with something that is more regular and doesn't expose so many complexities to the end programmer like the endless nuances of memory allocation, moving, &&, r values etc. But there's no hope of that. For the rest of my life every year or two there will be additions to c++ that I will have to learn about and use and gradually discover their own warts, yet java will still keep cruising along.
An example:
fn add(x1 x2)
{
    return <x1+x2>
}
fn <main>(count:int arguments:<const char** const>):int
{
    (<printf> "%d\n" (add 2 3))
    return 0
}

auto add(const auto x1,const auto x2)
{
    return x1+x2;
}
int main(const int count, const char** const arguments)
{
    printf("%d\n",add(2,3));
    return 0;
}
I use S-expressions for function calls.
Yes, it supports C++ (even templates!): http://docs.cython.org/en/latest/src/userguide/wrapping_CPlu...
All jokes aside I agree with you to some extent: C++ can be frustrating to work with, and there are a lot of quirks that make it really easy to blow your foot off.
These quirks are not without benefit though. Being able to control how memory is allocated is hugely important in performance sensitive applications where pointer chasing is not tolerable. The same can be said about r values/move semantics.
For example, look at C++'s concept of move semantics. The language still runs destructors regardless of whether a value has been moved. This means every movable class has to have a valid "empty" state the destructor can check to avoid double frees. That means you need move constructors which can run arbitrary code. And that means generic code that does moves has to care about exception safety.
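A small sketch of the first part of that chain, using a made-up `Handle` type and two global counters purely for illustration: the destructor runs for the moved-from object too, so it needs an "empty" state to check.

```cpp
#include <cassert>
#include <utility>

int destructor_calls = 0;  // Every destructor run, moved-from or not.
int releases = 0;          // Runs that actually released the resource.

// Hypothetical resource handle; `owns` stands in for a real resource.
struct Handle {
    bool owns = false;

    Handle() = default;
    explicit Handle(bool o) : owns(o) {}
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    Handle(Handle&& other) noexcept : owns(other.owns) {
        other.owns = false;  // Leave the source in its "empty" state.
    }

    ~Handle() {
        ++destructor_calls;
        if (owns) ++releases;  // The check that prevents a double release.
    }
};

// Moves one handle into another and lets both go out of scope.
void move_and_destroy() {
    Handle a(true);
    Handle b(std::move(a));
}
```

After one call to `move_and_destroy()`, `destructor_calls` is 2 but `releases` is 1: the destructor of `a` still ran and had to notice that there was nothing left to release.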
Instead, the language could statically avoid running the destructors of moved values. No more rvalue zoo needed to trigger a move, no more mandatory "empty" states because the destructor can assume a valid object, no more move constructors because they all reduce to memcpy, and thus no more exception safety problems and an easier job for the optimizer.
Not being able to keep track of which cases call a destructor and which ones do not sounds like a much bigger problem than the one you are talking about. It doesn't seem too hard to avoid the temptation to throw in a move constructor.
Keeping track of when a destructor gets called is not a huge problem at all. From the user perspective it's airtight: proper uninitialized value analysis in the compiler makes it impossible to screw up. From the implementation perspective all you need is the occasional flag on the stack (whose value is already calculated) for situations like `if ... { use_by_move(obj) }`, and a rule against moving things out of fields that you're otherwise not responsible for destructing (you can still swap there).
You do need to, e.g., default construct the temporary you'll be swapping with in the first place. Problematic if you're trying to construct e.g. a reference-like type with no null state.
#include <cstdio>   // FILE, fopen, fclose
#include <utility>  // std::swap
class ifstream {
    FILE* f;
    ifstream(const ifstream&) = delete;
    ifstream& operator=(const ifstream&) = delete;
public:
    ifstream(): f() {}
    explicit ifstream(const char* path): f(fopen(path, "rb")) {}
    ~ifstream() { if (f) fclose(f); }
    ifstream(ifstream&& original): f() {
        std::swap(f, original.f);
    }
    ifstream& operator=(ifstream&& original) {
        std::swap(f, original.f);
        return *this;
    }
    // ...
};
EDIT: Added deleted copy ctor/assignment ops because I'm not a savage.
[1] http://dlang.org/
Edit: After thinking on it for a few minutes. All I want is a generic function that takes a class as the generic type T, and a string as an argument, and returns me a completely filled in object of type T, or throws an error on malformed input. I really don't think I am asking for too much here.
Not only is it nicer to the person that's going to maintain your code and may not know about your smart trick, but with generic code there is always an exception that makes you need to write some code somewhere that breaks the abstraction.
What if type T contains a pointer to type U? Or a list of pointers to a type U, with some duplicates in the list? What if T contains a floating point number, and the application is compiled with -ffast-math?
What if the object contains a reference to some unknown type (a file, or socket)?
There are lots of ways of solving these problems, but each of the solutions has trade-offs which may be acceptable to you, but aren't acceptable to me and vice versa.
Well, what you need is a way to enumerate the members of a class. Everything else is already done by something like Cereal or Boost.Serialization.
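To illustrate why member enumeration is the missing piece, here is a sketch where the member lists are written by hand as tuples of pointers to members, and everything else is generic. It does not use Cereal or Boost.Serialization; `members_of`, `parse`, `Point`, `Label` and the whitespace-separated input format are all assumptions made up for the example.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>

struct Point { int x = 0; int y = 0; };
struct Label { std::string text; double weight = 0.0; };

// Hand-written member lists: the part reflection would generate for us.
inline auto members_of(const Point*) {
    return std::make_tuple(&Point::x, &Point::y);
}
inline auto members_of(const Label*) {
    return std::make_tuple(&Label::text, &Label::weight);
}

// Generic: builds a T from whitespace-separated fields, or throws.
template <typename T>
T parse(const std::string& input) {
    T object{};
    std::istringstream in(input);
    std::apply(
        [&](auto... member) {
            // Read each listed member in order (C++17 fold expression).
            ((in >> object.*member), ...);
        },
        members_of(static_cast<const T*>(nullptr)));

    std::string extra;
    // Reject both missing/ill-typed fields and trailing garbage.
    if (in.fail() || (in >> extra)) {
        throw std::invalid_argument("malformed input: " + input);
    }
    return object;
}
```

With this, `parse<Point>("3 4")` returns a `Point` with `x == 3` and `y == 4`, while `parse<Point>("3 oops")` throws `std::invalid_argument`. The two `members_of` overloads are exactly the boilerplate that a language-level way to enumerate members would remove.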
Reflection is always orders of magnitude slower.
You have got to be kidding me, right?
Adding the "$" character to C++ too? What's left that we haven't added?
Why not use only something like "reflexpr", which is more C++ish than the $ character? Why make C++ more difficult to parse and read with every release?
TBH I genuinely don't understand why we need both. Why not only have "reflexpr"?
There's also the @ sign and the backtick `.
I think that's it for ASCII.
struct Foo {
    int i;
    int inc()
    {
        return ++i;
    }
};
int Foo::*i_ptr = &Foo::i; // Pointer to member
int (Foo::*inc_ptr)() = &Foo::inc; // Pointer to method
Foo my_foo{0};
(&my_foo)->i_ptr; // Error: my_foo has no member "i_ptr"
(&my_foo)->*i_ptr; // OK
my_foo.inc_ptr(); // Error: my_foo has no member "inc_ptr"
(my_foo.*inc_ptr)(); // OK (the parentheses are required)
// This is also why you can't create references
// to methods or other class members
int (&print)(const char *) = std::puts; // OK
int (Foo::&inc_ref)() = Foo::inc; // Error
int &my_int = my_foo.i; // OK
int Foo::&i_ref = Foo::i; // Error
Let's say you have
class Foo
{
public:
    int m_bar;
    int m_baz;
};
int (Foo :: * ptr) = &Foo::m_baz; // Basically offsetof() but safely typed.
std::vector<Foo*> foos;
... // populate foos;
for(Foo* foo : foos)
{ // For each foo, set whatever member ptr points at to 10
    foo->*ptr = 10;
}
