
Nvidia Jetson TX1 Developer Kit SE - fventuri
https://www.nvidia.com/object/JetsonTX1DeveloperKitSE.html
======
dheera
They should just call it what it is: a "clearance sale", instead of this
wishy-washy "SE" nomenclature to make it sound like something new ;) What's
next, GTX 950 collector's edition?

~~~
fventuri
1\. $200 for the TX1 is a much lower entry point for GPUs than $600 for the
TX2.

2\. According to Nvidia (see the comparison here:
https://devblogs.nvidia.com/parallelforall/jetson-tx2-delivers-twice-intelligence-edge/),
the TX2 has about twice the performance of the TX1, but since the TX2 costs
about 3 times as much as the TX1, I think you get more "bang for your buck"
with this offer.
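
A quick sketch of that arithmetic, assuming the $200 and $600 prices quoted here and treating "about twice the performance" as exactly 2x (the normalized performance figures are illustrative, not benchmark results):

```python
# Prices quoted in this comment (USD).
tx1_price = 200
tx2_price = 600

# Relative performance, with the TX1 normalized to 1.0.
# The 2.0 is an assumption standing in for "about twice" from Nvidia's comparison.
tx1_perf = 1.0
tx2_perf = 2.0

# Performance per dollar for each board.
tx1_value = tx1_perf / tx1_price
tx2_value = tx2_perf / tx2_price

# How much more performance per dollar the TX1 offers.
ratio = tx1_value / tx2_value
print(f"TX1 gives about {ratio:.1f}x the performance per dollar of the TX2")
```

Under those assumptions the TX1 comes out ahead on performance per dollar, though the comparison says nothing about features that only the TX2 has.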

~~~
k_sze
Performance is not the only point of consideration.

There are probably features on the TX2 that are not found on the TX1, though I
haven't actually looked at the difference.

For instance, IIRC, the hardware-accelerated JPEG decoding is pretty limited
on the TK1, compared to the TX1/TX2. Basically you can decode and _display_
(m)jpeg really fast, but you have little control over the output pixel format.
It's a black box hidden in nvidia's special hardware. If you try to
programmatically get at the pixel data, you'll find that the pointer is
actually DMA-mapped, which means it's extremely slow to memcpy from, which
defeats the purpose of hw acceleration.

~~~
arghwhat
I don't get your claim about DMA memory being slow to memcpy from. DMA memory
isn't special, apart from being non-relocatable and contiguous within an
allocation. Things would be much worse if the copy to main memory wasn't done
with DMA.

~~~
k_sze
The explanation I got (from somebody else a while back) is this: DMA is only
fast if it's accessed in large blocks, but memcpy is not designed to work that
way.

The hw accelerated jpeg decoding on the Jetson TK1 is backed by something
called NVMM. I suspect it's a small chunk of memory that is closer to the
special hw decoder, not part of the main 2 GiB RAM. When you decode jpeg, you
get a DMA-mapped pointer to it.

When you memcpy from it, memcpy tries to copy the data one byte at a time (or
in very small chunks, if vectorized). So what happens is that in order to copy
data from a large block of DMA-mapped memory, memcpy makes many requests for
single bytes or very small chunks, but DMA ends up fetching and staging the
same large block multiple times for memcpy to read.

The same problem applies to any other operation that reads from the DMA-mapped
memory in very small chunks. There is also no way to make the DMA-mapped pixel
data directly available to a CUDA context, which is what Jetson board users
would probably care about.

I'm not usually a system programmer so my explanation may be a bit fuzzy or
inaccurate, but that's the gist of it.

~~~
arghwhat
DMA isn't something you access. Direct Memory Access means that an external
device can issue reads and writes to main system memory (you call such a
transaction a "DMA transfer"). What _you_ access is entirely normal memory.

A DMA transfer is done entirely by the external device, without any
intervention by you or the CPU (apart from potentially a cache invalidation
before/after, depending on platform). From the point of view of the CPU (and
therefore you), data just magically appears. You don't need to memcpy anything
anywhere, unless you need to make sure the external device doesn't
accidentally overwrite things (who knows, it might be stupid). You can read
and write directly to wherever it appeared as if it was any other memory.
Caching still applies, although a fresh DMA transfer will of course be a cache
miss.

There is a little bit of magic related to memory regions used for DMA
transfers, but that is invisible to the end-user. 'memcpy' does not
discriminate, and unless you're copying a single byte, memcpy will never
operate on byte-sized chunks.
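
A minimal sketch of the "read it in place, or copy it once in bulk" point. It does not touch real DMA hardware: the anonymous mapping is only a stand-in for a buffer that a device would fill, and the buffer size and byte values are made up for illustration.

```python
import mmap

FRAME_SIZE = 1 << 20  # hypothetical 1 MiB buffer; the size is an assumption

# Stand-in for memory that a device would fill by DMA. Real drivers hand out
# such buffers through their own interfaces; this is ordinary anonymous memory.
buf = mmap.mmap(-1, FRAME_SIZE)

# Simulate the device's write. With real hardware the CPU would not do this.
buf[:2] = b"\x10\x20"  # made-up data

with memoryview(buf) as view:
    # Read in place: the view aliases the buffer, only these 2 bytes are copied out.
    first_two = bytes(view[:2])

    # One bulk copy of the whole buffer, comparable to a single memcpy call.
    snapshot = bytes(view)

    # A later write to the buffer shows up through the view, not in the copy.
    view[0] = 0x00
    live_byte = view[0]
    copied_byte = snapshot[0]

buf.close()
print(first_two.hex(), live_byte, copied_byte)
```

The copy is only needed when the data has to survive the next write into the buffer; otherwise reading through the view is enough.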

Source: I do driver development for devices that DMA at extremely high rates
(we're desperately waiting for PCIe 4.0 to become normal—PCIe3.0x16's 126Gb/s
is way too slow for us).
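
For reference, the 126 Gb/s figure can be reproduced from the PCIe 3.0 link parameters (8 GT/s per lane, 128b/130b encoding). This is a rough sketch of usable line rate only; it ignores packet and protocol overhead, so real throughput is lower.

```python
# PCIe 3.0 link parameters.
GT_PER_S_PER_LANE = 8.0        # raw transfer rate per lane, in GT/s
LANES = 16                     # x16 link
ENCODING = 128 / 130           # 128b/130b line encoding efficiency

# Usable line rate in Gb/s and GB/s, before protocol overhead.
pcie3_gbps = GT_PER_S_PER_LANE * LANES * ENCODING
pcie3_gbytes = pcie3_gbps / 8

# PCIe 4.0 doubles the per-lane rate with the same encoding.
pcie4_gbps = 2 * pcie3_gbps

print(f"PCIe 3.0 x16: {pcie3_gbps:.0f} Gb/s ({pcie3_gbytes:.2f} GB/s)")
print(f"PCIe 4.0 x16: {pcie4_gbps:.0f} Gb/s")
```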

------
new299
That's actually quite nice. When I benchmarked it, for my application without
using any GPU stuff, it was a little more powerful than my current MacBook.
Arguably the 2016 MacBook is underpowered, but I still found the TX1 quite
nice.

I guess it's fortunate that it's US and Canada only, otherwise I'd probably
buy one and not use it. :)

------
awill
I really liked the X1 SoC. Nvidia did a fantastic job targeting a wide range:
The Pixel-C tablet used it, as did the Shield TV and even the Nintendo Switch.
It was far and away the most powerful mobile SoC upon release, and it was
great to have an alternative to Qualcomm (Nvidia's OpenGL drivers smoke the qcom
garbage).

It's such a shame that Nvidia is no longer targeting the mobile market with
their SoCs. I play games/emulators on my Shield TV, as well as h.265 4K HDR
content and it's great, but I'd gladly buy another Shield TV using the X2 (to
better handle GameCube/Dreamcast emulators). Unfortunately Nvidia doesn't seem
to have a followup SoC designed for the 'mobile' market. They are going after
the higher margin (less TDP sensitive) AI/computer vision market.

------
johansch
"Offer valid in the U.S. and Canada only."

------
rasjani
This is a pretty decent device. I played around with these when I was working
on automotive stuff at a previous job, and was toying with the idea of getting
one to replace my current Kodi box.

It runs QNX at least, and Linux/Wayland; no idea about possible Xorg drivers...

~~~
fla
It also runs Ubuntu.

PS: A Raspberry Pi will run Kodi perfectly :)

~~~
philjohn
Until you want to decode h.265 ... then it will fall over.

~~~
DCKing
The Jetson makes sense as many things, but it's serious overkill to be a Kodi
box. It's a nice development kit that includes CUDA, and for some people it's
even a nice and interesting desktop.

For Kodi and media purposes, however, it makes far more sense to buy either (1)
the Nvidia Shield TV, which has the same chipset and also costs $200, if you're
looking for a media player and don't care about running your own software,
or (2) the Odroid C2, which is one third the cost of this, is a much smaller
device (and probably more power efficient), will soon run mainline Linux
(nightlies out already) and plays h265 and 4k just fine!

------
andreiw
When are these things going to become SBBR compliant? I am not even going as
far as asking for an XHCI controller that doesn't require a blob to act as an
XHCI controller (although this could well be hidden by firmware), but adopting
at least UEFI (even without ACPI!) would be a really great start.

~~~
TD-Linux
The Overdrive 1000 (with an AMD A1100) has this. You can boot Linux off CD or
USB and just install it. It's pretty magical.

~~~
andreiw
Yes, along with the 2nd and now 3rd generation of ARM server solutions from
the known players...there's just no excuse for nVidia to make the TX1/TX2
software system so closed and non-compliant.

------
cheapsteak
What would someone typically do with this?

~~~
gh02t
They are [somewhat] popular for embedded computer vision and machine learning
development. Think developing algorithms for autonomous cars or similar as an
example. Stuff where you need an embedded platform, but need a powerful GPU
for computational purposes.

That combination is somewhat rare, especially in an affordable, ready-to-go
format available in small quantities that developers can use to play around.

------
Jack000
"Are you a member of the NVIDIA Developer Program in the US or Canada? If so,
you’re eligible for an exclusive developer discount on the NVIDIA® Jetson™ TX1
Developer Kit SE"

is it easy to join the developer program? Seems like not everyone gets this
discount

~~~
figgis
The signup form is right under the advertisement.

~~~
Jack000
so all you have to do is register? Is everyone who applies approved?

they make it sound like an exclusive thing by leading with that line.

~~~
ReverseCold
It's like signing up for any other online account.

