I just noticed that there are new Ubuntu updates today, 57 of them totaling over 1 GB. One large item is libwebkitgtk-3.0-0-dbg, weighing in at 461.9 MB, and another WebKit package is around the same size (massive). Is it just me, or does this seem like a really huge library?
Exactly: -dbg packages contain only the debug symbols. This keeps the actual library package small, while still letting people install the debug symbols when they need them, without compiling from source.
I use apt-cacher-ng as a caching proxy for Ubuntu packages. I highly recommend it if you're updating multiple machines.
As other commenters have noted, your particular system's bloat is likely due to a -dbg package (with debugging symbols). It might be time for a fresh OS reinstall to purge the accumulated junk. (A new release of Linux Mint, my personal distro of choice, will probably come out about a month after Ubuntu 12.10.)
Even something as trivial as

#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello world\n");
    return 0;
}

is around 4-5% larger with debug info, and that's a single trivial C program with a single function call that uses only built-in types.
WebKit is largely C++, so consider a fairly trivial C++ program that does nothing useful:

template <typename T> struct MyFoo {
    T m_field;
    template <typename U> void add(MyFoo<U> rhs)
    {
        m_field += rhs.m_field;
    }
};

int main(int argc, char **argv)
{
    MyFoo<float> f; f.m_field = 1.5;
    MyFoo<int> i; i.m_field = 1;
    MyFoo<double> d; d.m_field = 0;
    d.add(i);
    d.add(f);
}
It shows a 10% size difference between deployment-style builds and builds with debug info.
As code size grows, the amount of information needed in debug builds gets very large, because the descriptions of C++ classes and functions are even larger than their plain C equivalents.
The size of a debug build can vary quite a lot, since there are different debug-info formats, most commonly stabs or DWARF (e.g. DWARF 2). But even these are fairly bad in terms of space efficiency (stabs is evil), mostly because there has never been a real reason to focus on it. Fairly "trivial" space optimizations aren't implemented at all, which means there is a huge amount of duplicated information. On top of that, the debug formats aren't really C++-aware, so each instantiation of a template results in a complete duplication of all the data describing it.
It would be nice if these problems could be fixed (e.g. debug info that's aware of templates would fix the biggest problem in WebKit), and the other places where information is needlessly duplicated are also fairly crappy. But there are forces pushing back against such optimizations:
Debug symbols are primarily used in _debug_ builds, while debugging. When you're debugging, you want the build to be as fast as possible, and any work the compiler and linker do to reduce debug data size adds to compilation time, which everyone would then complain about.
(Of course, without SSDs we'd probably have reached the point where writing out such large amounts of data takes longer than shrinking it and writing less data in the first place...)