observatory=> select count(distinct(sha256_fingerprint)) from certificates;
observatory=> select count(distinct(target)) from scans;
I don't have a good way to provide direct access to the database yet, but if you're a researcher, ping me directly and we can figure something out.
For exporting, pg_dump -F c compresses the data considerably, so cost-wise you might be able to put it on S3 and publish it as a torrent.
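A minimal sketch of what that could look like, assuming the database is called observatory and using a made-up dump file name and bucket name:

  # custom-format dump; compressed by default
  pg_dump -F c -f observatory.dump observatory
  # stage it somewhere cheap to serve from, e.g. S3 (bucket name is hypothetical)
  aws s3 cp observatory.dump s3://observatory-dumps/observatory.dump
  # anyone downloading it can restore into an existing database with pg_restore
  pg_restore -d observatory observatory.dump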
- People request access and get an API key associated with a given load threshold, or don't use an API key and default to some low threshold
- Any query whose estimated cost from SQL EXPLAIN exceeds the threshold returns an error (see the sketch after this list)
- Successful requests' load costs and execution time (and possibly CPU, if that can be determined) count toward a usage rate limit
- An SQL parser that accepts only the subset of SQL you deem safe and acceptable acts as a last-resort firewall
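A rough sketch of the EXPLAIN-based gate, assuming the API layer wraps each incoming query and that thresholds are expressed in planner cost units (the 10000 figure is made up):

  -- plan the query without executing it
  EXPLAIN (FORMAT JSON) SELECT count(*) FROM certificates;
  -- the "Total Cost" field of the top-level plan node is the planner's
  -- estimate; the API layer would reject the request if it exceeds the
  -- key's threshold (say 10000 cost units) before the query ever runs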
Obviously this is a complex solution; I'm curious what people's opinions are on whether this would overall be simpler or more difficult in the long run.
I'm curious: how fast can one load data into Postgres? Is it possible to import data directly from CSV files?
Hard to answer given the number of variables involved. pg_bulkload quotes 18MB/s for parallel loading on DBT-2 (221s to load 4GB), and 12MB/s for the built-in COPY (with post-indexing, that is, importing all the data first and then enabling and building the indexes).
> Is it possible to import data directly from CSV files?
Yes, the COPY command can probably be configured to support whatever your *SV format is. There's also pg_bulkload (which should be faster but works offline).
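A minimal sketch of a CSV load with COPY, using a hypothetical table layout and file path, and building indexes after the bulk load as mentioned above:

  -- table layout and file path are placeholders
  CREATE TABLE certificates (sha256_fingerprint text, subject text, not_after timestamptz);
  -- COPY FROM a file runs server-side and needs the right privileges;
  -- from psql, \copy does the same thing client-side with an ordinary role
  COPY certificates FROM '/tmp/certs.csv' WITH (FORMAT csv, HEADER true);
  -- build indexes after the load, not before
  CREATE INDEX ON certificates (sha256_fingerprint);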
But to answer your question: yes, Postgres can load data from CSV files: http://stackoverflow.com/questions/2987433/how-to-import-csv...