Using this, in conjunction with AWS Step Functions, Lambda, and ECS, it became merely cents a month to run a headless scraper task in the cloud.
What does your workflow look like?
In short - this gives you the ability to pass the output of one lambda function to the input of another lambda function.
An example is one that I've written to regularly create a new copy of our Production RDS database in Ireland as a Staging RDS database in Oregon:
1. Cloudwatch Event starts Step Function on the 15th
2. Copy last Production snapshot from Ireland to Oregon
3. Restore this snapshot as a new RDS instance (It will fail until the snapshot is available and retry with exponential backoff - this is a step function feature)
4. In parallel:
   - Add tags to the instance (once it's available)
   - Delete the snapshot copy (when finished restoring)
   - Modify the new instance with security groups and subnets
   - In parallel:
     - Run a SQL query to anonymize all of the PII columns for GDPR compliance, as the data has now left the EU.
     - Call out to the Cloudflare API to update our DNS entry with the new RDS endpoint.
     - Delete the old Staging database instance
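For anyone who wants to see roughly what that looks like as a state machine, here is a minimal sketch that builds an Amazon States Language definition in Python. The state names, the ARN prefix, and the retry numbers are all made up for illustration, and I've assumed the inner parallel group runs after the modify step (the list above can be read either way). It is not the actual definition from the workflow described.

```python
import json

# Hypothetical ARN prefix; the real function names are not given in the thread.
ARN_PREFIX = "arn:aws:lambda:us-west-2:123456789012:function:"

# Retry rule with exponential backoff: wait 30s, then 60s, then 120s, ...
# The numbers are illustrative assumptions.
RETRY = [{
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 30,
    "MaxAttempts": 8,
    "BackoffRate": 2.0,
}]


def task(name, next_state=None):
    """Build a Task state that invokes one (hypothetical) Lambda function."""
    state = {"Type": "Task", "Resource": ARN_PREFIX + name, "Retry": RETRY}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def branch(name):
    """A Parallel branch that contains a single task."""
    return {"StartAt": name, "States": {name: task(name)}}


def build_definition():
    """Return the whole state machine as a plain dict."""
    finalize = {
        "Type": "Parallel",
        "Branches": [branch(n) for n in
                     ("AnonymizePII", "UpdateDns", "DeleteOldStaging")],
        "End": True,
    }
    modify_branch = {
        "StartAt": "ModifyInstance",
        "States": {
            "ModifyInstance": task("ModifyInstance", "Finalize"),
            "Finalize": finalize,
        },
    }
    return {
        "Comment": "Production to Staging RDS refresh (sketch)",
        "StartAt": "CopySnapshot",
        "States": {
            "CopySnapshot": task("CopySnapshot", "RestoreSnapshot"),
            "RestoreSnapshot": task("RestoreSnapshot", "PostRestore"),
            "PostRestore": {
                "Type": "Parallel",
                "Branches": [branch("AddTags"),
                             branch("DeleteSnapshotCopy"),
                             modify_branch],
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The `Retry` block is what gives you the "fail until the snapshot is available" behaviour without any polling code in the Lambda itself.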
If we did though - Fargate is a great solution for it, but you wouldn't be able to feed data back into the next step without some additional complexity. Maybe have the next step poll an SQS queue, or an S3 file, or look for a database entry, etc. for the next bit of data that it needs, and just fail until it finds it. Once the Fargate task (or whatever) has done its job and placed the result in your method of choice, the workflow can continue.
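A rough sketch of that "fail until it finds it" step, assuming the state has a retry rule attached. Everything here (`NotReadyYet`, `make_handler`, the `job_id` key, the dict standing in for S3/SQS/a table) is hypothetical; in a real Lambda the lookup would be whatever call checks your storage of choice.

```python
class NotReadyYet(Exception):
    """Raised while the upstream job has not published its result."""


def make_handler(lookup):
    """Return a Lambda-style handler.

    `lookup(job_id)` must return the result, or None if it is not there yet.
    """
    def handler(event, context=None):
        result = lookup(event["job_id"])
        if result is None:
            # Failing here lets a Retry rule on the state re-run it later.
            raise NotReadyYet(f"no result yet for job {event['job_id']}")
        # The return value becomes the input of the next state.
        return {**event, "result": result}
    return handler


if __name__ == "__main__":
    store = {}  # stand-in for S3, SQS, or a database table
    handler = make_handler(store.get)

    try:
        handler({"job_id": "scrape-42"})
    except NotReadyYet as exc:
        print("retry later:", exc)

    # The long-running task finishes and records where its output lives.
    store["scrape-42"] = "s3://example-bucket/output.json"
    print(handler({"job_id": "scrape-42"}))
```

Passing the lookup in as a function keeps the handler testable locally without any AWS credentials.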
Do you ensure the values you replace them with 'make sense' in the context of the application? i.e. are names turned into fake names?
If so, I would love to hear more about how you handle the complexities of this. If not, it's still a wonderful pipeline that I'm putting in my ideas box, thanks for sharing.
`UPDATE Users SET FirstName = 'FAKEFIRSTNAME', LastName = 'FAKELASTNAME', StreetAddress = '123 FAKE ST.', Zip = '10001', PrimaryEmail = Cast(NewId() as varchar(36)) + '@x.com', Phone = '555-555-5555'`
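If you want to try that statement without a SQL Server instance, here is a small sketch against an in-memory SQLite database. SQLite has no `NewId()`, so I register a stand-in that returns a fresh UUID, and string concatenation is `||` rather than `+`. The `anonymize` helper and the table layout are assumptions for illustration only.

```python
import sqlite3
import uuid


def anonymize(conn):
    """Overwrite PII columns; each row gets a distinct placeholder email."""
    # Stand-in for T-SQL's NewId(): a zero-argument function returning a UUID.
    conn.create_function("NewId", 0, lambda: str(uuid.uuid4()))
    conn.execute(
        "UPDATE Users SET FirstName = 'FAKEFIRSTNAME', "
        "LastName = 'FAKELASTNAME', StreetAddress = '123 FAKE ST.', "
        "Zip = '10001', PrimaryEmail = NewId() || '@x.com', "
        "Phone = '555-555-5555'"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users (FirstName, LastName, StreetAddress, "
        "Zip, PrimaryEmail, Phone)"
    )
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Ada", "Example", "1 Real Rd", "90210", "ada@example.com", "111"),
            ("Bob", "Sample", "2 Real Rd", "90211", "bob@example.com", "222"),
        ],
    )
    anonymize(conn)
    for row in conn.execute("SELECT FirstName, PrimaryEmail FROM Users"):
        print(row)
```

The per-row UUID keeps the email column unique, which matters if the application has a unique constraint on it.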
That being said, it was super simple to get started!
A bit more on my use case: currently using pdf.js for rendering HTML reports as a PDF. It’s been a pain. I’d like to essentially take the HTML of some (say, just a <table>) or all (full HTML doc) of the current page and send it off. The main gotcha I can think of is relative URLs, but request interception would resolve that.
One completely unrelated thing. On Chrome 68.0.3440.84, I noticed the large icons (particularly the Kubernetes one) looked "weird", with jagged edges that didn't make any sense. Some poking with the devtools revealed that 'backface-visibility: hidden' seems to be disabling antialiasing.
Suggest opening the following in new tabs so you can flip back and forth between them:
- As is right now: https://i.imgur.com/nYzsukI.jpg
- Nicer-looking: https://i.imgur.com/GNlvx7Z.jpg
I noticed disabling this has an effect on the animation at the top (the edges of the moving webpage slides don't have constantly-moving jaggies).
There may well be a valid reason you have this enabled, perhaps for added performance. Or perhaps React added it in for you? :P
And you can repro :) cool. Makes a lot of sense you can't see it on retina.
Interesting FF bug you hit. CSS3 GPU-accelerated animations are incredibly complex... heh, adding the rule fixed Firefox ~57, but now Chrome 68 is glitching out because the rule is there. I wonder if Google realizes yet. Ponders complexity of creating minimal testcase, versus waiting for someone else to notice :P
Because as far as I know you can already run HC with other FaaS solutions, but having this out of the box would be really nice.
(disclaimer: I work for Google Cloud)
Unfortunately Google Cloud Functions don't natively support golang yet so for my business this is a non-starter.
Don’t know if it works. Going to try this week.
EDIT: We’ve got stuff on GH: https://github.com/joelgriffith/browserless, and startup is under 100ms most of the time. Fonts and other things “just work” as well, plus there’s a slew of REST APIs for common stuff as well. Selenium webdriver support landing soon!
cheap, because we optimized for just 3 things:
pre-rendering, screenshots, or PDFs for $0.000365 per API request
curl https://service.prerender.cloud/screenshot/https://google.com/ > out.jpg
curl https://service.prerender.cloud/pdf/https://google.com/ > out.pdf
curl https://service.prerender.cloud/https://google.com/ > out.html
Under 20,000 monthly requests = $9 flat rate ($0.00045)
Under 100,000 monthly requests = $40 flat rate ($0.0004)
>= 100,000 monthly requests = variable rate @ $0.00036/req
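To make the tiers concrete, here is a quick sketch of the arithmetic. It takes the quoted numbers literally (so exactly 20,000 requests lands in the $40 tier, which is my reading of "under"); `monthly_cost` and `effective_rate` are just illustrative helper names.

```python
def monthly_cost(requests):
    """Monthly bill in USD for a request count, per the tiers quoted above."""
    if requests < 20_000:
        return 9.0
    if requests < 100_000:
        return 40.0
    return requests * 0.00036


def effective_rate(requests):
    """Average price per request at a given monthly volume."""
    return monthly_cost(requests) / requests


if __name__ == "__main__":
    for n in (5_000, 19_999, 50_000, 250_000):
        print(f"{n:>7} requests: ${monthly_cost(n):.2f} "
              f"(${effective_rate(n):.5f}/req)")
```

The per-request figures in parentheses in the price list are what you get when you use a flat tier right up to its cap; at lower volumes the effective rate is higher.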
If you do too, check out the petition over at https://www.serverless-ruby.org
https://zeit.co/blog/serverless-docker is an example of what I mean
It's still in alpha, so Google is not advertising it widely. It was announced about a week before Zeit's Serverless Docker at GCP Next.
Disclaimer: I work at Google and I've had some involvement in the tech behind this.
(it just redirects to cloud.google.com)
Google Cloud of course has "inside knowledge" and I would love to switch to them for my SaaS https://checklyhq.com, were it not that Google Cloud Functions is just offered in four (!) regions...
sorry but Rekognition rekt it for any type of computer vision on AWS.
Great infrastructure...after all I do have an AWS Solution Architect Associate certification....which means jack shit
Great move by GCP, I'm also very pleased with Firebase and its integration with cloud functions....
BUT my biggest reservation still in 2018 when it comes to serverless is the cold start up time...
I built a token-based API on AWS Lambda and registering or signing up took forever when the app was not at peak. That was 2014 tho.
Latency between Google Cloud Functions and other GCP products improved significantly in the past week, as did start up and execution time.
However we only use GCFs for background tasks and not for any web API/micro services, etc. Still too much latency for that use case.
We run each test in a new VM, running on our own private cloud (dedicated servers).
Note: we use the Selenium protocol for this, not yet Puppeteer.
I've been looking at a non-local solution. I'm using Python and this article hints that Puppeteer is not the only way to invoke this. But I don't see any documentation on the DevTools protocol.
Does anyone know if it's supported? Or any providers that do?
Then you can use PyChromeDevtools to connect to that host/port:
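Something along these lines, as a sketch. The `ChromeInterface` calls are written as I remember them from the PyChromeDevTools README, so treat them as an assumption and check the project docs; `target_list_url` and `open_page` are helper names I made up, and the host, port, and URL are placeholders.

```python
def target_list_url(host, port):
    """HTTP endpoint where a remote-debugging Chrome lists its targets."""
    return f"http://{host}:{port}/json"


def open_page(host, port, url, timeout=60):
    """Connect to a remote Chrome and load `url` over the DevTools protocol."""
    # Third-party package (pip install PyChromeDevTools), imported lazily so
    # the rest of this file works without it.
    import PyChromeDevTools

    chrome = PyChromeDevTools.ChromeInterface(host=host, port=port)
    chrome.Network.enable()
    chrome.Page.enable()
    chrome.Page.navigate(url=url)
    # Block until the page's load event fires (or the timeout passes).
    chrome.wait_event("Page.loadEventFired", timeout=timeout)
    return chrome


if __name__ == "__main__":
    # Assumes Chrome is reachable on this host/port, e.g. started with
    # --remote-debugging-port=9222.
    print(target_list_url("localhost", 9222))
    open_page("localhost", 9222, "https://example.com/")
```

Method names like `Page.navigate` map straight onto the DevTools protocol domains, so the protocol reference doubles as the documentation for what you can call.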
I’ll dig deeper soon but this is a bad start.
Do you get the impression there’s some config tuning to do or something you can’t control?
For example, a simple actor to convert HTML to PDF looks like this:
Disclaimer: I'm a co-founder of Apify