Hacker News | bjerun's comments

Are there any real-life applications for this?


Author here: in mathematics, particularly discrete mathematics, there is a genuine need to store long integers exactly and to compare them numerically. This encoding makes that possible in any database that can store UTF-8 strings and maintain a sorted index on them using plain string order. The person in the talk wanted to store data about finite groups, and large group orders are one such application.


Many catalogues of mathematical groups contain things like the monster group, which has 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000 elements. That won't fit in a 64-bit or even a 128-bit integer.

It is common to want to ask a database "tell me all groups of size greater than X, with properties A, B and C". If you have arbitrary-sized ints, no problem. If your database (or language) doesn't support big ints, you need to figure out how to express "bigger than X" when you are storing big numbers in some other format, probably strings.
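A minimal Python sketch of the problem and one common workaround (this is illustrative; it is not necessarily the exact encoding from the talk): plain decimal strings sort lexicographically, which disagrees with numeric order, and a fixed-width digit-count prefix fixes it.

```python
# Plain decimal strings sort lexicographically, not numerically.
nums = [9, 42, 100,
        808017424794512875886459904961710757005754368000000000]

plain = sorted(str(n) for n in nums)
# "100" sorts before "42" and "9", so the order is numerically wrong:
assert plain[0] == "100"

# One common trick: prefix each number with its digit count, zero-padded
# to a fixed width, so shorter (hence smaller) numbers sort first.
def sortable(n, width=4):  # width=4 is an arbitrary illustrative cap
    s = str(n)
    return f"{len(s):0{width}d}:{s}"

encoded = sorted(sortable(n) for n in nums)
decoded = [int(e.split(":")[1]) for e in encoded]
assert decoded == sorted(nums)
```

With this prefix, the database's ordinary string index gives numeric order for free, at the cost of capping the number of digits at 10^width - 1.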


Surely if you're storing numeric values as strings, it would be better to use something like base64 encoding to use as much of the available symbol space as possible?

If you can sort a numeric string in base 10, you can sort one in base 16, or base 64, or base 256.
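A quick sketch of why the base mostly doesn't matter: as long as values are zero-padded to equal width and the digit alphabet is in ASCII order, lexicographic sort matches numeric sort. (One caveat worth noting: the standard base64 alphabet is not in ASCII order, since '0' < 'A' < 'a' in ASCII but digits come last in base64, so naive base64 output would need a re-ordered alphabet to sort correctly.)

```python
# Zero-padded hex strings: lexicographic order == numeric order,
# because 0-9 < a-f in ASCII matches their digit values.
nums = [9, 42, 255, 4096]
width = 6  # arbitrary fixed width for this illustration
hex_strings = sorted(f"{n:0{width}x}" for n in nums)
assert [int(h, 16) for h in hex_strings] == sorted(nums)
```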


I think you could use this method with other bases, although be careful with base 256 -- if your strings are stored in UTF-8 or something, then base 256 isn't actually that useful.

However, the important bit (to me) is that you can use your database's sorting function to compare the numbers.

The advantage of base-10 is that it's easy to display the number :) Other bases do provide a constant-factor speed improvement, of course.


If we're in a database context, you could store the length of the number in another column (64 bits would be plenty; nobody is storing a number with 18 exabytes of digits), then compare/sort by that first and by the number itself second. The base is irrelevant, apart from the space savings of storing the digits directly in binary.

In fact I'd make an index on (len(N), N), then include len(X) and X in all your comparisons and range queries (wrapping for ergonomics as needed). You can use any base for storage (including compact varbinary base-256).
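The (len(N), N) ordering above can be sketched in a few lines of Python (the `key` function here is illustrative, not any particular database's API): compare by length first, then lexicographically, which matches numeric order for non-negative decimal strings without re-encoding them.

```python
# Composite sort key mirroring a DB index on (length, value):
# shorter decimal strings are numerically smaller; among equal
# lengths, lexicographic order equals numeric order.
def key(s):
    return (len(s), s)

values = [
    "9",
    "1000000000000000000000000",
    "42",
    "808017424794512875886459904961710757005754368000000000",
]
ordered = sorted(values, key=key)
assert [int(v) for v in ordered] == sorted(int(v) for v in values)
```

A range query like "greater than X" then becomes a tuple comparison, e.g. `WHERE (len, n) > (len_x, x)` in SQL terms, which a composite index can serve directly.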


I'm genuinely curious: does anyone actually want/need to look at numbers larger than will fit in a 128-bit int? I understand there are applications that require use and storage of such numbers, but how often is there a real need to display them?


In discrete mathematics this happens a lot: group orders, sizes of conjugacy classes, semigroup orders, numbers of isomorphism classes, character degrees, etc.


"It happens" doesn't imply there's any need for it to happen.


You can also check out http://db-engines.com/en/ranking/graph+dbms which gives a ranking of various technologies (graph, document, search).


In principle, ArangoDB behaves similarly to MongoDB here. Both are essentially "mostly-in-memory" databases in the sense that they hold the data in memory and persist it to disk at the same time via memory-mapped files. This approach is good for performance, and if you run out of RAM you ought to shard your data.

However, MongoDB often uses a lot of memory for the actual data, since its BSON binary format stores the names of the attributes with every single document. ArangoDB detects similar shapes of documents (see https://www.arangodb.com/faq#how-do-shapes-work-in-arangodb) and thus avoids this particular problem.
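A rough back-of-the-envelope illustration of the overhead being described (this uses plain JSON sizes, not ArangoDB's or MongoDB's actual on-disk formats): when every document repeats its attribute names, the keys can dominate storage for documents with small values, whereas storing the shape (the key set) once would leave roughly only the values per document.

```python
import json

# 1000 documents sharing one "shape" (the same attribute names).
docs = [{"first_name": "a", "last_name": "b", "email": "c"}] * 1000

# Cost when each document repeats its attribute names:
with_keys = sum(len(json.dumps(d)) for d in docs)

# Hypothetical cost if the shape were stored once and each
# document kept only its values:
values_only = sum(len(json.dumps(list(d.values()))) for d in docs)

assert values_only < with_keys
```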


I have been bitten by this using MongoDB as well. The shape recognition of ArangoDB sounds very useful. If it works well, it would alleviate a problem that NoSQL solutions have so far had in comparison to classical relational databases.


I agree. It is extremely hard to select the right technology, so sometimes it is even more valuable to learn what has not worked (even if it looked nice in the description) than what has worked.


I hope it works better than ownCloud - I never got its replication to work reliably.


Unfortunately, the link to the Green-Marl compiler "paper from CGO 2014" is broken. Does anyone know where to find it?


It should be available under the publications tab here: http://ppl.stanford.edu/main/green_marl.html

Here's a paper about compilers: http://ppl.stanford.edu/papers/a134-sujeeth.pdf

And one about DSLs: http://ppl.stanford.edu/papers/cgo14-hong.pdf



Could be. But you have to buy it, and I must say I do not like paying for research articles.




I'm using 8.8.8.8. Does this mean I might have a problem now (virus or similar)?


You wouldn't have been affected at all unless you're in Brazil or Venezuela.


That's not how I read it. It seems as though all traffic (globally) to those net blocks was rerouted to Venezuela, for a brief period.


The rerouting affected networks in that country and Brazil for 22 minutes, BGPMon said.


It has been stable for a long time now.


awk is missing. Best language ever.


    -Wl,-O1 Did you know that you also have an optimization flag for the linker ? Now you know!
Ages ago the linker on Sun used to compile templates. What is GCC using this flag for?


See this LWN article: http://lwn.net/Articles/192624/


So it is not really optimizing the code - more like optimizing the symbol table?


Yes, looks like it. The default symbol lookup can be really slow in some situations, especially in large C++ projects, which can contain many symbols that share a long common prefix thanks to C++ name mangling.


"Optimising the linked output" seems a bit vague but that's about as clear as the man page.

    If level is a numeric values greater than zero ld optimizes the output. This might take significantly longer and therefore probably should only be enabled for the final binary. At the moment this option only affects ELF shared library generation.

