Oracle Cranks Up The Cores To 32 With Sparc M7 Chip (enterprisetech.com)
36 points by mariusz79 on Aug 14, 2014 | 46 comments



I like how they're adding hardware specifically to speed up databases, to a CPU based on a RISC core; it looks like there is a trend toward more CISC-like hardware, because designers have figured that operations that would've been done in software can be done faster with dedicated hardware.

My experience with using older SPARC servers for testing various things is that they're rather disappointing, both in terms of value and performance - "more cores" seems to be their guiding principle, and while this makes for impressive benchmark results and aggregate numbers, the speed of a single thread is pretty horrible; it's only in specific multithreaded applications that all the resources on the chip can be fully saturated. Meanwhile x86 servers cost far less and can handle different workloads better because per-clock, each core is several times faster.


On the contrary, I wonder if we're likely to see a resurgence in RISC, as it comes into its own with extremely large numbers of cores. I don't think this Oracle chip is necessarily an indicator of that, but there are other signs indicating it could happen.

What you are saying is very true. Or at least it was true, 20 years ago when RISC was supposed to be the next big thing and then it wasn't. But that landscape is finally starting to change.

What's different now is that after dominating the server and desktop market through brute force, Intel hit a saturation point about 10 years ago and since then, clock speeds haven't increased much, nor have instructions per cycle, both of which were increasing so fast year-on-year through the 90s and early 2000s that nobody could keep up. Now even Intel have been forced to start scaling out horizontally by adding more cores: that's virtually all that differentiates their fastest CPUs from their slowest.

Because of this, developers are increasingly having to think more about writing concurrent code. Languages like Go, F#, Scala and Clojure are seeing a huge upsurge in popularity lately; it's not happenstance. Functional programming has been largely confined to AI and CS research since the 60s, but it's suddenly becoming popular with pragmatists for scalability.

The thing is, once your code is no longer bound by single-threaded performance, it starts making sense to just add more cores if you can, especially if you can do so without a significant transistor cost. With x86, adding cores is expensive because each one is exceptionally complex, but RISC can be simpler, use fewer transistors and operate at lower wattage. Huge numbers of cores can also be exceptionally power efficient if each core can run in a low power mode until needed.


The idea that once code is rewritten to take advantage of other cores, more wimpy cores would be better than a few powerful cores is wrong. Most threaded code today can scale to a small number of cores (2, 4, 6) but can't scale to a ridiculously high number of cores (e.g. 32) due to various bottlenecks that are hard to remove.
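As a back-of-the-envelope illustration (the 5% serial fraction below is an assumption, not a measurement of any real workload), Amdahl's law already caps what piling on cores can buy you:

    # Amdahl's law: speedup on N cores when a fraction of the work stays serial.
    def speedup(cores, serial=0.05):
        return 1.0 / (serial + (1.0 - serial) / cores)

    for n in (2, 4, 6, 32):
        print(n, round(speedup(n), 1))
    # 2 -> 1.9x, 4 -> 3.5x, 6 -> 4.8x, 32 -> only ~12.5x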

Also the simplicity of RISC cores vs X86 does not significantly affect chip cost since these days the vast majority of the space on a chip is taken up by on-chip caches (L1, L2, L3). The space taken up by instruction decoding circuits is really, really negligible next to that.


Intel hit a saturation point about 10 years ago and since then, clock speeds haven't increased much, nor have instructions per cycle

I don't think this is quite true - have you looked at the results? The difference in single-thread performance between a P4 and a Core i7 at the same clock rate is mind-blowing. We're talking like 500% over 10 years.

Yeah, single-thread performance improvement has slowed down, but it's still there.


With x86, adding cores is expensive because each one is exceptionally complex

Not all that expensive... look at the Xeon Phi for example; it's not a main CPU, but it does contain 61 (no idea why it's not 64...) Pentium-level x86 cores with 16GB of RAM: http://www.amazon.com/Intel-Xeon-Phi-7120P-Coprocessor/dp/B0...

but RISC can be simpler, use fewer transistors and operate at lower wattage

Fewer transistors in the core is offset by the need for more cache due to lower instruction density, and especially in a system with many cores, cache misses can be far more costly due to all the cores needing to be fed with code and data. Also, that Xeon Phi mentioned above has only 5B transistors with 61 cores whereas the 32-core Oracle chip has more than double that.

Huge numbers of cores can also be exceptionally power efficient if each core can run in a low power mode until needed

This is true, but the caches consume a significant amount of power too - Intel's mobile CPUs have had dynamic cache sizing for several years now, where parts of the cache can be turned off to save power.


> My experience with using older SPARC servers for testing various things is that they're rather disappointing, both in terms of value and performance...

I vaguely recall a sparc running solaris that could hot-plug cpu and memory boards (as long as you kept at least one in). I could imagine some customers at the time found that useful.


"Lots" of enterprise systems can do stuff like that. E.g. IBM has systems that can do it as well.


Only by shutting down the partition and its os instance, last I heard. Not really much better than having separate physical boxes.


No, you can offline any CPU/memory board using svcadm and do that. No need to shut down the instance.


trend toward more CISC-like hardware, because designers have figured that operations that would've been done in software can be done faster with dedicated hardware.

And since clock speeds stagnated a decade and a half ago, but Moore's Law still allows more transistors to be packed onto a chip, adding dedicated hardware for specialized operations is about all designers can do.


It's not like we reached any sort of limitation on individual transistor switching speed, though. The problem is that you can't evolve a product line by removing features. You've always got some feature or instruction that is the limiting factor for clock speed; some longest chain of gates that need to settle on the output value before the next cycle. You might be able to shorten that chain by breaking up an operation into several instructions or by reducing the number of registers or addressing modes, but that breaks compatibility. On the other hand, anything that can be added without lengthening the critical path and can be powered down when not in use is fair game.


Individual transistors are getting faster, but wires are not. Wire performance has not been keeping up with the transistors since something like 90nm (2004), hence the "telescopic metal stack".

Clock speeds are in part chosen based on optimal path length (measured in gate count). The longer the path, the less is wasted on flop overhead. The shorter the path, the faster your clock and the more flop overhead. Features or instructions are not important to this equation, because you can break a feature into multiple stages divided by flops. So the fact that frequency has stayed about the same should tell you that path length (gates per cycle) has stayed about the same, which in turn means gates have probably stayed about the same speed for many years!
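To put rough numbers on that (all three figures below are made-up assumptions, just to show the shape of the trade-off):

    # Toy model of one pipeline stage: cycle time = flop overhead + logic depth.
    gate_delay_ps = 15       # assumed per-gate delay
    gates_per_stage = 25     # assumed logic depth per stage
    flop_overhead_ps = 30    # assumed setup + clock-to-q per stage

    cycle_ps = flop_overhead_ps + gates_per_stage * gate_delay_ps  # 405 ps
    freq_ghz = 1000.0 / cycle_ps                                   # ~2.5 GHz
    overhead = flop_overhead_ps / cycle_ps                         # ~7% of the cycle lost to flops
    # Halve gates_per_stage and the clock rises, but the flop overhead fraction roughly doubles.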

Next, let's not forget about the power/heat envelope. Modern transistors are mind-numbingly fast if you juice them with 5V, but 1kW CPUs are generally frowned upon these days. We continually squeeze down the supply voltage to save power and meet the heat envelope.

Now, you might be thinking about all the extra transistors that are added to newer designs. "Forget about path length," you might say. "If we cut crap features and got rid of those extra transistors, we could afford the power cost of turning up the supply voltage, and get faster!" Alright, now you're thinking smart! But 90% of modern CPU area is spent on graphics and cache, not x87, MMX, and AVX512.

I could go on, but I cannot properly cover all the aspects in one comment...


Given Oracle's history in marketing vs technical competence, that's probably why they're adding it: people like how it sounds. And we'll probably never see any credible evaluation since they NDA all their customers.

From a real computer architecture standpoint the idea sounds pretty far fetched. (People in academia have been looking at FPGA and GPU acceleration in this domain for years. It's really a problem for which a CPU is a pretty good fit, with lots of branchy control flow etc.)


The SQL hardware sounds like a marketing trick.


[deleted]


And you won't get numbers from anyone who owns one, since Oracle expressly prohibit publishing performance figures without permission.


Microsoft do this as well with the CLR.

I decided a couple of years ago that I won't deal with companies that have this policy.


I wonder how much it would cost in licensing to run an Oracle DB on that kind of hardware.


Oracle are increasingly moving to hugely punitive licensing schemes for running their DB on non-Oracle platforms, up and down the stack; run a single vCPU guest on a VMware farm? Fuck you, license all your hypervisors. Unless you buy SPARC, that is - then you can do sub-capacity licensing.

Time will tell whether customers will prefer to get reamed for Oracle hardware to avoid some Oracle software costs, or simply switch to DBs with less restrictive pricing models.


Switch to what? We send a seven-figure sum to Oracle every year but we currently see no alternative.

SQL Server only runs on Windows, and their licensing model gets more and more Oracle-like every year (for us, "buying" SQL Server from internal IT currently isn't much cheaper than Oracle anymore). And you have to maintain your own JDBC driver.

DB2 is just the same thing as Oracle.

Postgres does not compare.

Then there's the migration itself. Firstly, your application will be "tuned for Oracle" over several years. So you'll use every proprietary Oracle feature that gives you additional performance (subpartitions, parallel query execution, optimizer hints, manual column statistics, connect by, listagg, …). Which makes porting fun. You can't assess the performance of your application on the new database until you're done porting. Likely the most performance-critical parts will have the most Oracle dependencies. You won't have any experience performance tuning the new database. And you won't have any experience operating the new database (monitoring, troubleshooting, backups, HA, failover, maintenance, …).

Take all this together and we're pretty much locked into Oracle. So I see us going with Oracle hardware rather than switching to some other database. Which has the additional benefit of getting rid of the "engineered" SAN from internal IT, which is slower than a single SSD.


> DB2 is just the same thing as Oracle.

DB2 supports sub-capacity licensing on a wide variety of virtualisation platforms, unlike Oracle. This may or may not matter to you. It does to me under a variety of circumstances.

DB2 also has much more granularity in licensing before you have to ramp up to ASE, which, again, may make a big difference to you (many equivalent features are available a significantly lower price points).

> So you'll use every proprietary Oracle feature that gives you additional performance (subpartitions, parallel query execution, optimizer hints, manual column statistics, connect by, listagg, …).

More than a few of these are trivial conversions to other DBs. CONNECT BY is hardly magic, neither's parallel query execution or listagg.

Operational concerns are probably the biggest one.

> Take all this together and we're pretty much locked into Oracle.

If you're in a business where running your company on whatever schedule Oracle chooses to offer you is acceptable, that won't be a problem.


Tuning is one thing. What if you have a lot of business logic in a few tens or hundreds of thousands of lines of PL/SQL? What will it cost you to rewrite that in PL/pgSQL or T-SQL?


I must say that my first reaction on reading the headline was: Gee, 32 cores * .75 * $40,000 - you're looking at most of a million in licensing per CPU up front, and well over $100 thousand per year in support. Oracle could give away the machines and still do very nicely on this one. (Calculations not guaranteed.)
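Spelled out (the $40k/core list price and 0.75 core factor are the assumptions above; the 22% support rate is the commonly quoted one):

    # Back-of-the-envelope Oracle EE licensing for one 32-core socket.
    cores = 32
    core_factor = 0.75            # assumed SPARC core factor
    list_price_per_core = 40000   # assumed Enterprise Edition list price
    license = cores * core_factor * list_price_per_core   # $960,000
    support = license * 0.22                               # ~$211,000 per year
    print(license, support)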


Does anybody know what the pipeline depth is? At 4GHz, it can't be short, and the cost of mispredicted branches could significantly worsen real-world results.


The S3 core has a 16-stage pipeline and it sounds like S4 is roughly the same. All the threads can cover a lot of stalls.
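A rough feel for the cost (assumed figures: a full 16-stage flush at the 4GHz mentioned above, with illustrative branch frequency and mispredict rates):

    # Toy estimate of branch misprediction cost, assuming a mispredict
    # flushes roughly the whole 16-stage pipeline.
    stages = 16
    cycle_ns = 1.0 / 4.0                 # 0.25 ns per cycle at 4 GHz
    penalty_ns = stages * cycle_ns       # ~4 ns lost per mispredict
    # Assuming 1 branch per ~6 instructions and a 5% mispredict rate:
    extra_cycles_per_insn = (1.0 / 6.0) * 0.05 * stages   # ~0.13 cycles per instruction
    # ...which is the kind of stall the 8 threads per core are there to hide.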


Fairly large, then. Hyperthreading only helps if your workload can actually use it, e.g. batch processing; if you're running a single computation that needs to finish within a certain time, it won't help you.


I would think that Oracle would be investing more in something like flash storage in their hardware rather than optimizations to processor speed. More threads don't make the disc spin faster and a lot of DB workloads are bound by going to persistent storage.


If you've spent any time with Oracle sales staff you'll know their pitch these days is that their on-chip acceleration is the only way to break IO barriers, and that you'll get a shit experience with Oracle DBs unless you deploy on Oracle hardware. It's an old-school full stack play straight from the mainframe era.

SSD, by way of contrast, is relatively commodity stuff that Oracle can't get a lock-in with.


Basically vaporware, in the sense that almost nobody will see or touch it. A huge chip - 10B transistors - on a new process node - 3D 16nm - will have such low yield that its obviously high price will be even higher. Thus only a handful of systems will be delivered, which is practically equivalent to non-existence.

The SPARC CPU division brought down Sun, and I see it still puts up a good fight inside Oracle despite the Rock cancellation :)

Edit: even if the CPU becomes realistically available (in the high-enterprise sense), just imagine what kind of RAM-to-CPU bus it would have to sit on to be able to feed the beast, especially considering that it will run DB applications, not HPC for example.


> using new process node - 3D 16nm

The Hot Chips slides explicitly mentioned 20nm. Based on a bad assumption, the author of this story decided to freely reinterpret '20nm' as 16FF. (There's a connection between the two, but that's mostly for pedants on internet forums.)


Vaporware means it won't exist. You mean it will be exclusive. This is not the same thing.


Beyond this I wouldn't trust anything about a platform whose sole future & trajectory is governed by Oracle. Not a good boat to be in.


I'd love to see the SPARCStation make a comeback with that thing.


What would you need it for that couldn't be done better on an iMac?


Apparently it does Java better?

Mostly it would be nostalgia. I was assigned to a Sparc at my first real developer job. It gave me an appreciation for exotic workstations. Plus, I love the Sun-style keyboards (I use an HHKB Pro 2 today).


One of the better write ups, since the Hot Chips presentation.


It's a shame we've got to wait so long until they publish the videos of the presentations.


“Decompression can be driven at memory bandwidth rates, so there is now downside to using compression,” says Fowler. The on-chip compression leaves the S4 cores leftover capacity to do useful work.

I assume he meant to say that there is "no downside to using compression" ?


This is a big thing that GPUs have been doing for decades. I'm honestly surprised this has not been done until now.

Scientific, and many other data processing workloads use compression because memory bandwidth is the slow part. Mostly once you have the data in memory, doing some extra instructions on it is almost free, considering how slow a memory load is in comparison these days. But if all you are doing is compressing the data, then I guess it's a definite win. It'll be interesting to see what sort of speedups this gives when you're also processing the data.
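A crude way to get a feel for the numbers on a normal CPU (zlib here is a much heavier codec than the LZ-class/hardware decompression being discussed, so treat it as a pessimistic stand-in):

    # Compare decompression throughput against raw data size; even zlib level 1
    # decompresses at hundreds of MB/s on most machines, which shows the
    # compute-vs-memory-bandwidth trade-off the parent describes.
    import time, zlib

    data = (b"some fairly repetitive database-ish payload " * 4096) * 64   # ~11 MB
    packed = zlib.compress(data, 1)

    t0 = time.perf_counter()
    for _ in range(20):
        zlib.decompress(packed)
    dt = time.perf_counter() - t0

    print("ratio: %.1fx" % (len(data) / len(packed)))
    print("decompress: %.0f MB/s" % (20 * len(data) / dt / 1e6))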


Yeah, he surely means no disadvantage.

This has been true for fast LZ e.g. LZO on normal CPUs for ages, and the only reason hardware doesn't try and transparently compress RAM is the RA in RAM.


Still don't understand why businesses buy this instead of scaling-out. Cost? Complexity?


I'm not saying that Oracle hardware or software is the solution, but "scaling-out" is incredibly difficult in transaction processing. I worked at a mid-size tech company with what I imagine was a fairly typical workload, and we spent a ton of money on database hardware because it would have been either incredibly complicated or slow to maintain data integrity across multiple machines.

I imagine that situation is fairly common.


Generally it's just that it's really difficult to do it right. Sometimes it's impossible. It's often loads more work (which can be hard to debug). Furthermore, it's frequently not even an advantage. Have a read of https://research.microsoft.com/pubs/163083/hotcbp12%20final.... Remember corporate workloads frequently have very different requirements than consumer ones.


Both, I suspect. When you consider the cost of engineer time needed to create high performance scale-out systems in a problem domain that requires consistency, scale-up can start to look fairly attractive.


Oracle is incredibly well crafted for managing a transactional database in an enterprise setting. If it's $200k more and the alternative is to take a massive risk, what would you do?

At some point the massive risk will become worth taking; this has happened for ETL and is on the way to happening for BI; there is a skills / tools gap but that will change.


The third and most important reason:

They have someone to blame when shit goes tits up. Response time may still be abysmal, but "dammit, we paid a million dollars for this!"


And one more thing: with large-volume transaction processing you also need to scale out on I/O and ETL jobs. Oracle (and other DB vendors) offer a lot of utilities that scale out well with very little development time. For example, it would be a difficult deal to write and own parsers for a gigabit-sized data file ourselves, rather than use the features these DBs directly offer to parallel-load it into a ready-to-use table within minutes.



