Looks interesting, and thank you for sharing this! One common issue with scraping web pages is dealing with data that is dynamically loaded. Is there a solution for this? For example, when using Scrapy, you can have Splash running in Docker via scrapy-splash (https://github.com/scrapy-plugins/scrapy-splash).
Thanks! As mentioned in another comment, there is no built-in support for this yet.
As a workaround, one could use a service like ScrapingBee (not affiliated) as a proxy that renders the page in a browser for you.
Of course, relying on a service for this is not always ideal. I am also working on a small wrapper that turns Chrome into an HTTPS proxy, which you could plug right into flyscrape. Unfortunately, it is still very experimental and not public yet. I have not yet decided whether to release it as part of flyscrape or as a separate project.
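To make the proxy workaround concrete, here is a minimal sketch in Python rather than flyscrape's own configuration. It assumes some rendering proxy is already listening at a placeholder address (`http://localhost:8080` is made up for illustration and is not a ScrapingBee or flyscrape endpoint), and the function names are hypothetical. Whether the HTML comes back rendered depends entirely on the proxy.

```python
import urllib.request

# Placeholder address of a rendering proxy; an assumption for illustration only.
PROXY_URL = "http://localhost:8080"


def make_proxy_opener(proxy_url):
    """Return an opener that sends both HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_rendered(url, proxy_url=PROXY_URL):
    """Fetch a page through the proxy and return the response body as text."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The scraper itself stays unchanged with this approach: only the transport is swapped, which is why a proxy is a convenient place to bolt on browser rendering.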
Not only can you, but in my experience it is substantially less drama and arguably less load on the target system, since the full page may make many, many other requests that a presentation layer would care about but that I don't.
The trade-offs usually fall into:
- authing to the endpoint can sometimes be weird
- it for sure makes the traffic stand out since it isn't otherwise surrounded by those extraneous requests
- it, as with all good things scraping, carries its own maintenance and monitoring burden
However, similar to those tradeoffs, it's also been my experience that a full page load offers a ton more tracking opportunities that are not present in a direct endpoint fetch. I mean, look at how many "stealth" plugins are out there, designed to mask the fact that a headless browser is headless.
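As a rough illustration of a direct endpoint fetch, here is a minimal Python sketch. The URL, the bearer-token auth, and the `{"items": [...]}` payload shape are all assumptions for the example; a real site will have its own endpoint, auth scheme, and response format.

```python
import json
import urllib.request

# Hypothetical endpoint; real sites expose different URLs and payload shapes.
API_URL = "https://example.com/api/items?page=1"


def build_request(url, token=None):
    """Build a request for a JSON endpoint, skipping the HTML page entirely."""
    headers = {"Accept": "application/json", "User-Agent": "example-scraper/0.1"}
    if token:
        # Bearer auth is only one possibility; "authing" varies per site.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def parse_items(body):
    """Extract records from the assumed payload shape {"items": [...]}."""
    return json.loads(body).get("items", [])


def fetch_items(url=API_URL, token=None):
    """One request to the data endpoint instead of a full page load."""
    with urllib.request.urlopen(build_request(url, token), timeout=10) as resp:
        return parse_items(resp.read())
```

This is a single request with none of the surrounding asset and tracking requests, which is both the appeal and the reason the traffic can stand out.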
But, having said all of that: without question the biggest risk to modern-day scraping is Cloudflare and Akamai gatekeeping. I do appreciate the arguments of "but ddos!11" and yet I would rather that only actors that are actually exhibiting bad behavior[1] be blocked, instead of everyone with a copy of Python who has set reasonable rate limits.
1 = this is setting aside that "bad behavior" can be defined as "downloading data that the site makes freely available to Chrome but not freely available to Python"
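For what a "reasonable rate limit" might look like in practice, here is a minimal sketch of a client-side limiter in Python. The class name and the two-second interval are my own assumptions, not something any particular site or tool prescribes.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None    # time at which the previous request was allowed

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Usage: call limiter.wait() before each request to the same host.
limiter = RateLimiter(min_interval=2.0)  # 2.0 seconds is an arbitrary example
```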
I didn't have enough time to dive into all the information or test out the editor. However, one question I have is: what sets ERD Lab apart from other existing solutions? What motivated you to make your own DB design tool? If a prospective customer were either weighing the pros and cons of several DB design tools or already using another tool, what would compel them to choose ERD Lab?
Maybe consider including a table highlighting the differences in features/pricing from other similar tools on the homepage (after the list of features) or a separate page entirely highlighting this information? For example, Render.com has a page specifically highlighting how their product is a better PaaS solution compared to Heroku [0]. This is just a thought.
Personally, I'm not familiar enough with DB design editors to know what I should be looking for in such a tool. Moreover, about a year ago, I was looking for a solution that solved this exact problem, so I am genuinely curious about this.
Finally, just wanted to mention that while I don't need this at the moment, I did bookmark it to consider using in the future.
Probably one of my favorite courses during my undergraduate studies at the University of Michigan. Recently, I needed to brush up on statistics and was excited when I found this entire course available online.
All of the course materials are available under the 'Materials' tab, including the labs.
Back when I took this course, they were teaching us how to use SPSS, but it seems like they have since transitioned to teaching R. Definitely worth checking out if you need to review the fundamentals of statistics and probability.
I've read a lot of comparisons between Asana and Jira, but can anyone explain the difference between Asana and Basecamp? I don't have experience with either service.
We aren't currently evaluating companies for grants, but are likely to restart a version of that program later in the year. If and when we do, founders that participate in the program starting now will be eligible!
Thanks for sharing! Do you mind me asking what admin template you used for the app? I'm searching for a decent admin template right now with a similar color scheme.
The only other company I know attempting to revolutionize email is Superhuman [0]. It will be interesting to compare these services. One thing I can say is that Basecamp will certainly be competitive if they can offer their service for less than $30 per month, which is how much Superhuman charges currently [1].
This is an awesome post. Thank you for sharing your story with the HN community.
I was browsing your GitHub and played around a little with IsoCity [0]. I really like it! Small projects like these are great and you can learn so much from them. You said it very well:
> I don't do projects to gather attention, I do cause I have fun doing them.
This brings to mind Dr. Greer and the Sirius Disclosure project [0]. The first documentary, Sirius, is available on YouTube [1] and the second documentary, Unacknowledged, can be viewed on Netflix.
To be clear, I'm not sure if I entirely believe in all of this myself. However, I don't regret watching any of it. At the very least I got some inspiration for short stories/screenplays.