Author here: In mathematics, in particular discrete mathematics, there is a genuine need to store long integers accurately and to compare them numerically. This encoding makes that possible in any database that can store UTF-8 strings and maintain a sorted index over them using plain string comparison. The person in the talk wanted to store data about finite groups, and large group orders are one such application.
Many catalogues of mathematical groups contain things like the monster group, which has 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000 elements. That won't fit in a 64 or even 128 bit integer.
It is common to want to ask a database "tell me all groups of size greater than X, with properties A, B and C". Now, if you have arbitrary-sized ints, no problem. If your database (or language) doesn't support big ints, you need to figure out how to do "bigger than X" when you are storing big numbers in some other format, probably strings.
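One common trick (a sketch of the general idea, not necessarily the exact encoding from the talk) is to prefix each decimal string with a fixed-width digit count, so that a longer number always sorts after a shorter one and plain string comparison then matches numeric comparison:

```python
def sortable_encode(n: int) -> str:
    """Encode a non-negative integer so that string order
    matches numeric order.

    The digit count goes first as a fixed-width prefix; the
    6-digit cap here (numbers up to 10**999999 - 1) is an
    arbitrary choice for illustration, plenty for group orders.
    """
    s = str(n)
    return f"{len(s):06d}:{s}"

monster = 808017424794512875886459904961710757005754368000000000000
nums = [monster, 42, 10**30]
encoded = sorted(sortable_encode(n) for n in nums)
# Decoding the sorted strings gives the numbers in numeric order.
assert [int(e.split(":")[1]) for e in encoded] == sorted(nums)
```

With an encoding like this, a "bigger than X" range query becomes an ordinary string comparison against `sortable_encode(X)`, which any sorted string index supports.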
Surely if you're storing numeric values as strings, it would be better to use something like base64 encoding to use as much of the available symbol space as possible?
If you can sort a numeric string in base 10, you can sort one in base 16, or base 64, or base 256.
I think you could use this method with other bases, although be careful with base 256 -- if your strings are stored in UTF8 or something, then base 256 isn't actually that useful.
However, the important bit (to me) is that you can use your database's sorting function to compare the numbers.
The advantage of base-10 is that it's easy to display the number :) Other bases do provide a constant-factor improvement of course.
If we're in a database context you could store the length of the number in another column (64 bits would be plenty; nobody is storing a number with 18 exabytes of digits), then compare/sort by that first and then by the number itself. Bases are irrelevant, besides the space savings of storing it directly in binary.
In fact I'd make an index on (len(N), N), then include len(X) and X in all your comparisons and range queries (wrapping for ergonomics as needed). You can use any base storage (including compact varbinary base256).
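The (len(N), N) idea above can be sketched in Python: for non-negative integers stored as plain decimal strings, comparing (length, string) tuples matches numeric order, which is exactly what a composite database index on (len(N), N) gives you. The values here are made up for illustration:

```python
# Tuple comparison: shorter strings (smaller numbers) sort first;
# ties on length fall back to ordinary string comparison, which
# matches numeric order for equal-length decimal strings.
monster = "808017424794512875886459904961710757005754368000000000000"
values = ["100", "99", monster, "12345678901234567890"]

ordered = sorted(values, key=lambda s: (len(s), s))
assert [int(v) for v in ordered] == sorted(int(v) for v in values)
```

A range query like "greater than X" then becomes `(len(X), X) < (len(N), N)` on the indexed pair, with no decoding needed.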
I'm genuinely curious, does anyone actually want/need to look at numbers larger than will fit in a 128-bit int? I understand there are applications that require use and storage of such numbers, but how often is there a real need to display them?
In discrete mathematics this happens a lot. Group orders, sizes of conjugacy classes, semigroup orders, numbers of isomorphism classes, character degrees, etc.
In principle, ArangoDB behaves similarly to MongoDB here. Both are essentially "mostly-in-memory" databases in the sense that they hold the data in memory and persist it at the same time to disk via memory-mapped files. This approach is good for performance, and if you run out of RAM you ought to shard your data.
However, MongoDB often uses a lot of memory for the actual data, since its BSON binary format stores the names of the attributes with every single document. ArangoDB detects similar shapes of documents (see https://www.arangodb.com/faq#how-do-shapes-work-in-arangodb) and thus avoids this particular problem.
I have been bitten by this using MongoDB as well. The shape recognition of ArangoDB sounds very useful. If this works well, it would alleviate a problem that NoSQL solutions so far have in comparison to classical relational databases.
I agree. It is extremely hard to select the correct technology. So, sometimes it is even more valuable to learn what has not worked (even if it looked nice from the description) than what has worked.
Yes, looks like it. The default symbol lookup can be really slow in some situations, especially in large C++ projects, which can contain many symbols that share a long common prefix thanks to C++ name mangling.
"Optimising the linked output" seems a bit vague but that's about as clear as the man page.
If level is a numeric value greater than zero, ld optimizes the output. This might take significantly longer and therefore should probably only be enabled for the final binary. At the moment this option only affects ELF shared library generation.