
Full Ethereum blockchain now available as a BigQuery public dataset - matt2000
https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics
======
zone411
It's not that difficult, just a bit annoying, to get the full Ethereum and
Bitcoin blockchains running locally and set up database-style access to them.
You will probably spend more time waiting for the download to finish than
actually writing code. Bitcoin is a little harder than Ethereum because of the
format changes, which might require using an older version (at least that's
what I did).

~~~
Drdrdrq
It's not trivial, that's for sure. And it is not download times that kill you,
it is recomputing that is really slow, and you need it if you want full
history. You really want to have SSDs. The export itself is quite fast though.

~~~
zone411
Yes, an SSD is needed for sure. I just called it a download because that's
when bitcoind/geth do their validation/sync.

~~~
Drdrdrq
Sure, just wanted to make it clear, since download of that 1TB (?) could be
much faster.

------
wslh
But the problem is that using BigQuery is not cheap even when it is public.

~~~
swerveonem
And you still have to implement your own "verify against actual blockchain
node" function yourself before actually trusting any query results.

~~~
fastball
Is the purpose not more for analysis than validation?

------
CobrastanJorji
I don't follow cryptocurrency news, but the example query of cost transferred
sums is interesting. What the heck happened around July 2nd?

~~~
TTPrograms
There was a Chinese decentralized exchange that (foolishly) decided they would
include ERC20 coins for trading according to which had the most transfers from
unique accounts to the exchange (I believe?). This incentivised individuals
who wanted their coins included to distribute them among as many sock puppet
accounts as possible and transfer them separately to the exchange. The
resulting transaction traffic had a non-negligible impact on network capacity.
In cases of load like that (or many individuals with strong incentives for
their transactions to get in) people pay higher fees to increase the
probability that their transaction is included in the next block.
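
A minimal sketch of why fees climb under load, assuming a simplified miner that fills each block greedily by gas price. The `Tx` type, the sender names, and all numbers are made up for illustration; real clients also respect per-account nonce ordering, which is ignored here.

```python
from typing import NamedTuple


class Tx(NamedTuple):
    sender: str     # hypothetical label, not a real address
    gas: int        # gas the transaction consumes
    gas_price: int  # price offered per unit of gas


def fill_block(pending, block_gas_limit):
    """Greedily pick the highest-paying transactions that still fit."""
    chosen, used = [], 0
    for tx in sorted(pending, key=lambda t: t.gas_price, reverse=True):
        if used + tx.gas <= block_gas_limit:
            chosen.append(tx)
            used += tx.gas
    return chosen


pending = [
    Tx("sockpuppet-1", 50_000, 2),
    Tx("sockpuppet-2", 50_000, 2),
    Tx("regular-user", 21_000, 1),
    Tx("impatient-user", 21_000, 40),
]

# Illustrative gas limit, far smaller than a real block's.
block = fill_block(pending, block_gas_limit=125_000)
included = [tx.sender for tx in block]
```

With the block nearly full of sock puppet transfers, the lowest bidder is left waiting, so anyone in a hurry has to outbid the flood.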

------
shazow
In case anyone is curious about the size of the Ethereum blockchain (which the
linked article doesn't even mention but seems to inevitably come up in
comments), I took some measurements recently:
[https://twitter.com/shazow/status/1004506114392838146](https://twitter.com/shazow/status/1004506114392838146)

tl;dr: ~65GB of bandwidth is required to download all the necessary data. The
default node indexing and denormalization of this data takes around 100GB
after compacting.

~~~
TTPrograms
For practical interaction with the network this can be drastically reduced by
using light sync, of course.

------
buhrmi
Wonder how they decide whether or not a contract is ERC20 or ERC721 compliant
to set the `is_erc20` and `is_erc721` values in the `contracts` table.

~~~
Drdrdrq
Probably by checking if contract supports the mandatory functions.
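
A rough sketch of what such a check could look like, assuming it works by scanning the contract bytecode for the 4-byte selectors of the six mandatory ERC20 functions. This is a heuristic of my own, not necessarily how the dataset fills in `is_erc20`; the helper names are hypothetical.

```python
# 4-byte selectors (first bytes of keccak256 of the signature)
# for the six mandatory ERC20 functions.
ERC20_SELECTORS = {
    "totalSupply()": "18160ddd",
    "balanceOf(address)": "70a08231",
    "transfer(address,uint256)": "a9059cbb",
    "transferFrom(address,address,uint256)": "23b872dd",
    "approve(address,uint256)": "095ea7b3",
    "allowance(address,address)": "dd62ed3e",
}


def push4_operands(bytecode_hex):
    """Return the operands of every PUSH4 instruction in the bytecode."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    found, i = set(), 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:        # PUSH1..PUSH32 carry inline data
            size = op - 0x5F
            if op == 0x63:            # PUSH4: candidate function selector
                found.add(code[i + 1:i + 1 + size].hex())
            i += 1 + size             # skip the operand bytes
        else:
            i += 1
    return found


def looks_like_erc20(bytecode_hex):
    """Heuristic: every mandatory selector appears as a PUSH4 operand."""
    return set(ERC20_SELECTORS.values()) <= push4_operands(bytecode_hex)


# Synthetic bytecode for illustration only: one PUSH4 per selector.
fake_token = "0x" + "".join("63" + s for s in ERC20_SELECTORS.values())
fake_other = "0x" + "63" + ERC20_SELECTORS["totalSupply()"]
```

A contract can expose these selectors and still misbehave, so a match only says the interface looks right, not that the token is well behaved.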

(I did some eth blockchain analytics a while ago)

------
karp773
Reads like a good April Fools' Day joke.

------
ca98am79
haha, the size of the Ethereum blockchain is making it centralized. Fewer and
fewer nodes can download and keep up with it.

~~~
Barrin92
>haha, the size of the Ethereum blockchain is making it centralized

This is to be expected. Centralization and hierarchies are the fundamental
ways to deal with complexity, which is why all non-hierarchical systems are
doomed if they grow too large. That's as true for physical or biological
systems as it is for markets and currencies, which is why the whole
decentralised crypto dream is a fool's errand.

~~~
darawk
That's why planned economies work so well, right?

~~~
Barrin92
Centrally planned economies do not work well, but the focus here lies on the
word 'central', not 'planned'. All economies are planned to a significant
degree, albeit in corporations (which internally _do plan_ rather than rely on
market mechanisms). The equivalent of cryptocurrencies in the economic sphere
would be turning every company into an army of independent individual
contractors whose only tool is the legal contract. This would lead to
unmanageable complexity and overhead that would be unacceptable and very soon
end in people naturally organizing in competent hierarchies, producing the
dreaded institutions and middlemen that cryptocurrency tries to do away with.

On this very question, read Coase's essay, _The Nature of the Firm_.

~~~
darawk
I've read Coase. The equivalent of cryptocurrencies in the economic sphere
would be exactly what cryptocurrencies are: partially decentralized. Firms
form due to the transaction costs of decentralization. Mining pools form due
to the transaction costs of decentralization.

~~~
cryptobeanbaby

>Mining pools form due to the transaction costs of decentralization.

Better explained as a result of the mining protocol being winner-takes-all:
whoever wins is granted write access authority (for guessing an increasingly
large random number). By pooling everyone's lottery tickets together and
paying the pool operator, participants agree to take a small share of each
successful guess from the pool (assuming the pool pays out participants fairly).

"decentralized" except not.

~~~
darawk
Variance is a transaction cost.
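
A back-of-the-envelope sketch of that point, assuming each block is an independent draw won with probability equal to the miner's hashrate share, and an idealized pool that pays a perfectly steady proportional share minus its fee. The share, block count, reward, and fee below are illustrative assumptions, not measurements.

```python
import math


def solo_stats(hashrate_share, blocks, reward):
    """Mean and standard deviation of solo income over `blocks` blocks.

    Blocks won follow a Binomial(blocks, hashrate_share) distribution.
    """
    mean = blocks * hashrate_share * reward
    std = reward * math.sqrt(blocks * hashrate_share * (1 - hashrate_share))
    return mean, std


def pooled_mean(hashrate_share, blocks, reward, pool_fee):
    """Expected income from an idealized pool that smooths payouts fully."""
    return blocks * hashrate_share * reward * (1 - pool_fee)


share = 1e-5             # assumed: 0.001% of network hashrate
blocks = 6 * 24 * 30     # about a month of ~10-minute blocks
reward = 12.5            # illustrative block reward

solo_mean, solo_std = solo_stats(share, blocks, reward)
pool_income = pooled_mean(share, blocks, reward, pool_fee=0.02)
p_nothing = (1 - share) ** blocks  # chance the solo miner wins no block
```

Under these assumptions the solo miner's standard deviation is several times the expected income and most months pay nothing, while the pool trades a small fee for a steady payout. The fee is the price paid to get rid of the variance.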

------
r32a_
This is why the Bitcoin blockchain is more concerned with keeping block sizes
as small and efficient as possible than with adding fancy smart contract
features that just add bloat to the blockchain.

~~~
bouncycastle
Ethereum's block sizes are quite small, about 25kb per block right now.

Actually, block sizes are not limited by bytes like in bitcoin, but by 'gas'.
This gas limit can be dynamically adjusted by miner votes, and the way the
incentives work, it keeps the blocks not too big, but also not too small.

One feature of Ethereum is that it automatically discourages mining
centralization using the 'uncle rewards' system. When the blocks increase in
size (and thus put pressure on centralization as you noted), the uncle rate
increases too, which is undesirable for miner profits. If the uncle rate gets
too high, miners' interest is to vote down the block 'gas' limit; this ensures
all the blocks can propagate around the network fairly.
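
A minimal sketch of the voting mechanics, assuming the rule as I understand it from the protocol spec: each block's gas limit may differ from its parent's by strictly less than `parent // 1024`. The function name and the numbers are illustrative, and the protocol's minimum gas limit is left out.

```python
def next_gas_limit(parent_gas_limit, miner_target):
    """Move the gas limit toward the miner's target, within the bound."""
    # The change must be strictly less than parent_gas_limit // 1024.
    max_step = parent_gas_limit // 1024 - 1
    if miner_target > parent_gas_limit:
        return min(miner_target, parent_gas_limit + max_step)
    return max(miner_target, parent_gas_limit - max_step)


def blocks_to_reach(start, target):
    """Count blocks needed if every miner votes for the same target."""
    limit, n = start, 0
    while limit != target:
        limit = next_gas_limit(limit, target)
        n += 1
    return n


voted_down = next_gas_limit(8_000_000, 4_000_000)
voted_up = next_gas_limit(8_000_000, 16_000_000)
blocks_to_halve = blocks_to_reach(8_000_000, 4_000_000)
```

Because each block can only nudge the limit by a fraction of a percent, no single miner can swing it; a sustained majority has to keep voting the same way for hundreds of blocks.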

~~~
ghthor
I didn't even know this. That's a great example of how interesting you can get
with the incentives system, makes me so excited for the future.

------
srcreigh
The public access to application data showcased here seems to strengthen my
suspicion that Ethereum will spawn an era of machine learning innovation. It
seems analogous to what happened with TCP/IP: TCP/IP lets everyone connect
1-1, while Ethereum lets everyone access and analyze entire application
datasets. Of course it is possible to provide universal access to an
application dataset with TCP/IP, but this is costly (effort, money) and
incentives don't always align; organizations of today often work to prevent
access to application datasets. The Ethereum future, on the other hand, is
exciting: organizations have an incentive to share application datasets,
making money with innovative analysis / applications of the datasets.

~~~
jameslevy
The incentive you speak of to share application datasets - is that a Layer 2
application on Ethereum where a token would be used for accessing data?
Intriguing idea but not sure I completely follow what you are describing.

