
The Australian Square Kilometre Array Pathfinder hits the big-data highway - dgtlmoon
https://blog.csiro.au/australian-square-kilometre-array-pathfinder-finally-hits-big-data-highway/
======
NamTaf
Fun fact: the CSIRO is the only Australian organisation to have a 2nd level
domain name, lacking the .com and just using .au. This is because they were
the first organisation to use a domain name in Australia and got in before the
regulation specified it to be .{com|gov|edu|etc}.au

~~~
zer0t3ch
That is a _surprisingly_ fun fact.

------
nrki
Actually, it generates 5.2T _B_ /sec.

"Its antennas are now churning out 5.2 terabytes of data per second"

~~~
jankey
Either way, I doubt it is "15% of the internets current data rate" as they
claim. E.g. the newish submarine cable between the US and Japan has 60 Tb
capacity. And that's just one cable.

~~~
virtuallynathan
A large chunk of that capacity will be for private use, not for the capital-I
Internet. 280Tbps is about right for interdomain Internet traffic, it lines up
with Cisco's predictions/measurements:
[http://www.cisco.com/c/en/us/solutions/collateral/service-pr...](http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html)
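
A quick sanity check of the figures quoted in this subthread, as a minimal Python sketch. It assumes decimal units (1 TB = 10^12 bytes), reads the "60 Tb" cable capacity as terabits per second, and takes the 5.2 TB/s from the article quote and the 280 Tbps estimate from the comment above. All names in the code are illustrative.

```python
BITS_PER_BYTE = 8


def terabytes_to_terabits(tb_per_s: float) -> float:
    """Convert a rate in terabytes/s to terabits/s (decimal units assumed)."""
    return tb_per_s * BITS_PER_BYTE


# Figures quoted in the thread, not independently verified.
askap_tbps = terabytes_to_terabits(5.2)  # article: 5.2 terabytes per second
internet_tbps = 280.0                    # interdomain traffic estimate from the comment
cable_tbps = 60.0                        # US-Japan cable, assumed to mean Tb/s

share_of_internet = askap_tbps / internet_tbps
share_of_cable = askap_tbps / cable_tbps

print(f"ASKAP output: {askap_tbps:.1f} Tb/s")
print(f"Share of 280 Tb/s: {share_of_internet:.1%}")
print(f"Share of one 60 Tb/s cable: {share_of_cable:.1%}")
```

Under those assumptions the article's 15% figure is consistent with the 280 Tbps estimate, while the same rate would fill roughly 70% of the single cable mentioned earlier.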

~~~
wyager
> A large chunk of that capacity will be for private use,

What private use requires that much bandwidth?

~~~
discodave
Inter region traffic of large cloud providers.

~~~
sandworm101
I'd call the inner movements of data within Google/Facebook private networks
part of the internet. When I do a Google search, or watch a YouTube video, I
know I am generating more traffic than the HTTP requests and answers from the
edge of Google's network. The inner workings of the largest private networks
often service the internet, if not by name, and therefore should be included
in its total bandwidth. But good luck measuring it.

~~~
billiam
Nah, it is mostly traffic to manage the computing activity of large
enterprises, not directly related to what we consider the Internet.

------
satysin
How do you even begin to deal with that much data per second? What do they do
with it? Storing more than a few minutes of data isn't practical, is it? Do
they process it in realtime?

~~~
autocorr
You're right on the last point: the raw data has such a high rate that they
have to process it on the fly, which has its drawbacks. The reason that these
instruments output so much data is that the process of correlating the signal
between each unique pair of antennas is O(n^2) in the number of antennas. Now
36 antennas (when complete) is not very large, that's approximately the size
of the VLA[1], but because of the very unique phased array feeds on ASKAP
mentioned in the article, each antenna has not just one receiver, but a little
camera of receivers. So if it's an 8x8 feed/receiver array, that's 64 times
the data rate of a 36 antenna system. For comparison, I've had projects on the
VLA that are about 10 GB/s, but they have only a single feed/receiver. At
ASKAP I believe they have to process/"reduce" the data on the fly, which in
the most extreme case (likely necessary for the real Square Kilometer Array)
is to not store the raw correlations between antennas at all but to convert it
to images immediately, which is lossy. Normally this reduction to images is
optimized by carefully tuning the imaging parameters and iterating to find the
best, but this can't be done if the "visibilities"/correlations have been
thrown away.

[1] Edit: saying that the Very Large Array does not have a very large number
of antennas is actually pretty ironic.
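
To make the O(n^2) scaling concrete, here is a minimal Python sketch that counts unique antenna pairs and applies the 64x factor from the hypothetical 8x8 feed array mentioned above. The 27-antenna figure for the VLA is not from this comment and is added only for comparison. The function name is illustrative, and real correlator data rates also depend on bandwidth, channel count and integration time, which this ignores.

```python
def baseline_count(n_antennas: int) -> int:
    """Number of unique antenna pairs: n * (n - 1) / 2."""
    return n_antennas * (n_antennas - 1) // 2


ASKAP_ANTENNAS = 36   # from the comment (when complete)
VLA_ANTENNAS = 27     # assumption added for comparison
BEAMS_PER_DISH = 8 * 8  # hypothetical 8x8 feed array, as in the comment

askap_pairs = baseline_count(ASKAP_ANTENNAS)
vla_pairs = baseline_count(VLA_ANTENNAS)

# Each beam is correlated separately, so the pair count is multiplied by
# the number of beams rather than squared again.
askap_products = askap_pairs * BEAMS_PER_DISH

print(f"VLA pairs:   {vla_pairs}")
print(f"ASKAP pairs: {askap_pairs}")
print(f"ASKAP pairs x beams: {askap_products}")
```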

~~~
autocorr
Whoops! I misremembered in my post above, data rates on the VLA should have
been around 20 MB/s not 10 GB/s, but I can't seem to edit my response above
now.

------
nl
For those interested in the architecture used to process this, take a look at
[http://www.slideshare.net/SparkSummit/spark-at-nasajplchris-...](http://www.slideshare.net/SparkSummit/spark-at-nasajplchris-mattmann),
especially slide 29

------
valarauca1
Still waiting for when we put a radio telescope at L3 and L2 giving mankind
a 14Tm baseline interferometry. The current 12Mm interferometry is nice, but
earth is so small.

~~~
zeristor
Radio scintillation of the Interstellar Medium becomes an issue, I daresay the
radio equivalent of "active optics" could help.

I did mention this at an outreach talk with Astroblack morphologies[0] and Tim
O'Brien[1] who pointed out that ISM scintillation was on his slides.

I did think up 100s of radio telescopes in a bird cage orbit at the distance
of Jupiter, impractical of course but Space VLBI does reward thinking big.

Isn't it worth having telescopes well out of the ecliptic plane? I seem to
recollect VLBI is about filling in the UV plane. It was a long time ago, and
it was just a second year physics project.

[0] - [http://www.artscatalyst.org/astro-black-morphologies-flow-mo...](http://www.artscatalyst.org/astro-black-morphologies-flow-motion)

[1] - [http://www.jb.man.ac.uk/~tob/](http://www.jb.man.ac.uk/~tob/)

------
dTal
Note that the actual collecting area will be 4072 square meters [1]. 12 out of
36 antennas are currently working.

That's roughly 31 Kbps per square _millimeter_.

Or 1 bit per second per 33 square _microns_.

[1]
[http://www.atnf.csiro.au/projects/askap/specs.html](http://www.atnf.csiro.au/projects/askap/specs.html)
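
The per-area numbers above can be reproduced with a short Python sketch. It assumes decimal terabytes, and it assumes the working collecting area is 12/36 of the full 4072 square meters, which appears to be how the 31 Kbps figure was reached. Names are illustrative.

```python
# Figures from the article quote and the comment above; decimal TB assumed.
DATA_RATE_BYTES_PER_S = 5.2e12
FULL_AREA_M2 = 4072.0
WORKING_FRACTION = 12 / 36  # assumption: area scales with working antennas

MM2_PER_M2 = 1_000_000
UM2_PER_MM2 = 1_000_000

bits_per_s = DATA_RATE_BYTES_PER_S * 8
working_area_mm2 = FULL_AREA_M2 * WORKING_FRACTION * MM2_PER_M2

bits_per_s_per_mm2 = bits_per_s / working_area_mm2
um2_per_bit_per_s = UM2_PER_MM2 / bits_per_s_per_mm2

print(f"{bits_per_s_per_mm2 / 1000:.1f} kbps per square millimeter")
print(f"{um2_per_bit_per_s:.1f} square microns per bit/s")
```

With the full 4072 square meters instead of one third of it, the same calculation gives roughly 10 kbps per square millimeter.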

------
dunkelsten
For Americans, that's 2.00773TB/s per Square Mile.

~~~
semi-extrinsic
Are those Metric terabytes (1000^4), or in freedom units (1024^4)?

~~~
snerbles
Should be tebibytes (TiB) for the latter, though binary prefixes don't have
much adoption outside of the software world.

[http://www.physics.nist.gov/cuu/Units/binary.html](http://www.physics.nist.gov/cuu/Units/binary.html)

~~~
wlesieutre
Or inside of the software world, for that matter.

~~~
semi-extrinsic
Huh? 1024^4 is how Windows defines a terabyte, accordingly that's what most
people think a terabyte is. OS X was the first OS to switch to 1000^4 with
Snow Leopard (2009). Different Linux system tools use either standard.

~~~
snerbles
Referring to the abbreviation of "TiB" versus "TB" for 1024^4.
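
For a sense of how far apart the two definitions are, here is a small Python sketch that converts the article's 5.2 TB/s figure to TiB/s. It assumes the article meant decimal terabytes; the helper name is illustrative.

```python
TB = 1000 ** 4   # decimal terabyte, in bytes
TIB = 1024 ** 4  # binary tebibyte, in bytes


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


rate_tib = tb_to_tib(5.2)
gap_percent = (TIB / TB - 1) * 100

print(f"5.2 TB/s = {rate_tib:.3f} TiB/s")
print(f"A TiB is {gap_percent:.2f}% larger than a TB")
```

The gap is just under 10% at this scale, so the choice of prefix matters less here than the bytes-versus-bits mix-up discussed elsewhere in the thread, which is a factor of 8.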

------
rb2k_
I wonder how much of that can be compressed for transfer and storage.

------
privong
Please note, this is for a Square Kilometer Array _pathfinder_ (the Australian
Square Kilometre Array Pathfinder, ASKAP), not the Square Kilometer Array.
Construction hasn't even begun on the latter, AFAIK.

~~~
ismail
References:

[https://en.wikipedia.org/wiki/Square_Kilometre_Array](https://en.wikipedia.org/wiki/Square_Kilometre_Array)

[https://en.wikipedia.org/wiki/Australian_Square_Kilometre_Ar...](https://en.wikipedia.org/wiki/Australian_Square_Kilometre_Array_Pathfinder)

------
ashleysmithgpu
The csiro logo they have looks very familiar:

[https://regmedia.co.uk/2012/10/08/cisco.jpg](https://regmedia.co.uk/2012/10/08/cisco.jpg)

[https://www.csiro.au/~/media/Web-team/Images/CSIRO_Logo/logo...](https://www.csiro.au/~/media/Web-team/Images/CSIRO_Logo/logo.png)

~~~
tangue
Cisco went to court... and lost. Here's an article where I've learned that
there's a stylized version of the Golden Gate in Cisco's logo and that CSIRO's
logo is the shape of Australia.

[http://www.itnews.com.au/news/csiro-beats-cisco-in-fight-ove...](http://www.itnews.com.au/news/csiro-beats-cisco-in-fight-over-logo-402571)

~~~
nickspacek
I'm embarrassed to admit that I never realized that "Cisco" was, in addition
to the company's name, short/slang for San Francisco, nor that their logo was
meant to evoke the Golden Gate Bridge.

~~~
Infernal
Seconded. Never really thought about the word "Cisco", and assumed the logo
was intended to evoke RF on an oscilloscope or some other visualization of
electronic signaling.

------
yread
5.2TB/s not Tb/s. BIG difference!

------
sctb
We've reverted the title from the submitted “New Australian Square Kilometer
Array Generates 5.2Tb/s”. One of the reasons the guidelines ask us to prefer
original titles is because it's surprisingly difficult to generate new ones
that remain accurate.

