Show HN: Sitebulb, a website crawler and auditor for SEOs (sitebulb.com)
120 points by hathawayp on Sept 28, 2017 | 72 comments



This looks really good. The only comment I have is that if I am purchasing a piece of desktop software, I expect there to be at least the option of an annual license.

Monthly feels more like a service than a product and I think of desktop software as a product.

But I realize desktop software pricing is evolving, so this may just be old-school thinking on my part.

Also, if I'm a Mac user (I am) I expect to be able to buy / pay through the app store. I know it sucks because Apple takes a huge cut but there is a level of trust (misplaced or not) and ease of use that comes with the app store. I am much more likely to use an app if it is in there. Just like on Linux I am much more likely to use an app if it is in a package manager.

Speaking of that, I don't know what desktop framework was used but, is there anything preventing a Linux version?


Interesting comment regarding the App Store. To be honest I'm really surprised no one has ever said this to us before. We have another product - (http://urlprofiler.com/) - for Windows and Mac that we've been selling for 3 years and no one has ever given us this feedback.

We're not wedded to a price structure, although we're rolling with monthly for now. I'm pretty sure that, through sheer weight of demand, we'll need to add yearly plans in the next few months.

There's nothing preventing a Linux version (it's built in Electron) other than demand really. We'll do it if enough people want it, but we have a bunch of other features on our roadmap that are currently a higher priority.


The app store has a few benefits. The biggest one, for paid software, is that if I format my computer or move to a new computer, my license comes over with my Apple account. I don't need to worry about remembering an account and license keys for each app.

I can't count how many apps I've paid for but I don't have installed right now because I lost a hard drive and my license keys with it.

Though I'm guessing with monthly billing it is more a username / password thing for you vs a key. (I haven't tried it yet, sorry).

The second benefit is not having to give my billing info to yet another company.

It's not that I'm unwilling to install an app from outside the app store. But I'd say I'm... complete guess here... 30% more likely to install it if it is in the app store.

And I can also say that if I have the choice between the two (same software on the app store and off), I will always do the one in the app store because of the previously listed benefits.


Thanks for the feedback, good to know.

Yeah we use a username/password so there's no issue with losing a license key.


That's not very customer-friendly... I wouldn't want to remember a username/password for your website, when the App Store can just let me re-download your app without any hassle.


Adobe Creative Cloud (along with streaming services, which have inured people at all levels to low monthly subscriptions) has made this type of monthly pricing much more normalized: good for providers, who can now point to recurring revenue; not so great for users, who usually end up paying more.


It's worth considering what your particular customers want. I make Photoshop plugins, and since the CC subscription came in, I've heard of a lot of customers jumping ship to one-off purchase products, like Affinity Photo and Paint Shop Pro. But some other customers are happy with it.

Also, while Creative Cloud is paid monthly, individual plans have a minimum term of 12 months. If you cancel, you have to pay out 50% of the contract term (unless you prepaid for the year, in which case there are no refunds).

http://www.adobe.com/legal/subscription-terms.html


In a sense, all software is a service. After all, what you buy is a license, the right to use the software; you don't buy the software itself. I suppose the US doesn't view it this way, but it definitely varies from country to country.


Technically, that's correct, although it's a conceit that was foisted upon us by the software industry, partly as an artifact of copyright law and partly to gain more control over their product than they could otherwise gain.

As futile as it is, I'd rather see the concept go away than see it further enshrined in subscription services.


The concept won’t go away, because users expect updates. The times when software actually was finished are over.

If we don’t expect software to have any updates at all, we can go back to one time fees.


Why would the practice of updating a product eliminate the concept of ownership? We can own a product and get updates as part of a warranty, or a program to purchase new upgrades.

The expectation of updating also seems to be making the software industry even more slipshod and crappy, with a 'just ship it now and fix it later' attitude.

I'd much rather see properly designed products and software infrastructure, built right the first time, and delivered once. But that would require competence and forethought.


> Why would the practice of updating a product eliminate the concept of ownership?

Regular updates = expensive to develop = monthly or yearly fees = reduced/eliminated concept of ownership.

> The expectation of updating also seems to be making the software industry even more slipshod and crappy, with a 'just ship it now and fix it later' attitude.

Possibly. Or it's the other way around: companies don't get all the money upfront and have to continue to deliver. Otherwise they get complaints, bad ratings and customer churn.

> But that would require competence and forethought.

And much bigger wallets from consumers. I think most people wouldn't pay a few hundred bucks nowadays for an operating system, or for MS Office, or Photoshop. And surely not for most software these days: apps.

Don't get me wrong. Your points are valid, but I think the market has spoken. People want software that continues to work on the newest OS. They want new features, compatibility and whatnot.

Back in the day, you bought Office 2000 and knew it'd be outdated at some point in time. Your Windows XP Professional that you bought for 200 bucks is worthless today. Try using Photoshop 5: it probably still works on your XP machine, but it can't open files from other designers.

Software 10-15 years ago was delivered once. Updates were an annoyance. But the upfront cost could still be divided by the number of months you actually used the software, and for most software, it's probably not that much different from the monthly subscription or yearly license fee you have today.
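
To put rough numbers on that (the five-year lifespan is an illustrative assumption): the $200 XP Professional license mentioned above, used for five years, works out to $200 / 60 months ≈ $3.33 a month, which is in the same ballpark as many of today's subscription fees.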


I'll add on to that... if I download desktop software, I expect an app and not a package file I need to install, especially if said package only contains an app and nothing else. Strange. :-)


I am personally fine with a monthly fee for desktop software. If I'm paying for software, it's up to the person who sells the software to decide how I should pay, and for me to decide if it's worth it.


I am definitely not fine with that. Of course it's up to the person selling the software to decide on the pricing model, but it's also up to the customers to voice their opinion.

I refuse to buy software that is subscription based. The closest to this that I find OK is PHPStorm, which has a hybrid model where if you pay for a year you get to keep the version you installed at the beginning of your subscription (or after a year of monthly payments have been made).

I don't expect companies to work on the software for free, if there is an upgrade worth upgrading I am prepared to pay for it.


How many software licences do you pay for a month? Because they add up very, very quickly.


Hey HN, Sitebulb is a desktop crawler for Windows and Mac, specifically designed for SEO consultants/agencies. My business partner and I have been building it for the last 2 years or so, and we're finally launching it today.

Its main differentiating factors:

1. Scale – it can comfortably crawl 500,000+ page websites despite being a desktop program.

2. Reporting – it does a lot of data manipulation and processing so you don't have to.

3. Visualization – it has tons of useful graphs, including the Crawl Maps, which help you visualise site structure.

Our aim was to give it the reporting capability of a SaaS crawler, with the convenience of a desktop crawler.

Looking forward to hearing your feedback on our new product. Thanks, HN community!


Some feedback:

- Why require the email confirmation before using the software? Not really necessary, is it?

- No Umlauts in project name?

- Standard/advanced settings switcher is confusing

- Crawl Maps is not linked from the "Product" dropdown

- "Recent audits" shows finished and queued ones, but not running ones (which also have no menu point)

- Super simple option to limit crawl to "internal" URLs would be nice (or did I miss it?)

- "Filtered URL Lists" is a strange navigation option, above the main selection especially

- Why no endless scrolling in tables? This is what a desktop app should do better than browsers

Nice tool!


Thanks for all the detail! Here you go:

- Email confirmation is required for the username/password, which is how free and trial licenses are controlled, and ultimately how paid licenses are doled out. So we need it for the licensing.

- No special characters at all! Except periods. Sorry!

- Agreed, we need to improve the settings switcher.

- Crawl Maps is not linked - you mean on the website right? I'll fix that.

- Running audits show on the main Dashboard, seemed kinda overkill to put it on Recent Audits as well. No?

- You can switch off 'Check external' in the Advanced Settings. Kinda 'hidden away' to keep the main settings UI cleaner (otherwise where does it end?!)

- "Filtered URL Lists" - they are there because people want them ('a big list of all the URLs') and kept missing them in our usability tests!

- Why no endless scrolling in tables? It's not easy to do because the data is written to disk, rather than stored in RAM (which is the reason it can typically crawl more pages), so it needs to go and fetch/filter/etc... every time.


All makes sense, thanks for the reply.

> Email confirmation is required for the username/password, which is how free and trial licenses are controlled, and ultimately how paid licenses are doled out. So we need it for the licensing.

If you are interested in getting more free users into the app to try it, I would suggest reworking the licensing a bit to enable usage without an email address, or at least without confirmation. Should be worth the effort, and you can still require login when switching from free/trial to paid.

> Crawl Maps is not linked - you mean on the website right? I'll fix that.

Yep, no link in the feature dropdown.

> Running audits show on the main Dashboard, seemed kinda overkill to put it on Recent Audits as well. No?

Maybe. I like structure, so was expecting it to be shown a level down from the Dashboard somewhere as well.

> You can switch off 'Check external' in the Advanced Settings. Kinda 'hidden away' to keep the main settings UI cleaner (otherwise where does it end?!)

Ok, I think I am biased because I usually use a tool that is "internal only" by default.

> - "Filtered URL Lists" - they are there because people want them ('a big list of all the URLs') and kept missing them in our usability tests!

Umm, ok. "Crawled URLs" maybe?

> Why no endless scrolling in tables? It's not easy to do because the data is written to disk, rather than stored in RAM (which is the reason it can typically crawl more pages), so it needs to go and fetch/filter/etc... every time.

If some websites can do it with a request to the server each time the next results are loaded, I am sure you can also do that with whatever local database you use ;)
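
For illustration, here's a minimal keyset-pagination sketch in TypeScript (assuming a local SQLite store accessed via better-sqlite3; the 'urls' table and its columns are made up for the example, not Sitebulb's actual schema):

    // Hypothetical schema: urls(id INTEGER PRIMARY KEY, url TEXT, status INTEGER)
    import Database from 'better-sqlite3';

    const db = new Database('audit.db', { readonly: true });

    // Keyset pagination: fetch the rows after the last id the UI has rendered,
    // so each scroll event is one cheap indexed query against disk.
    function nextPage(lastSeenId: number, pageSize = 100) {
      return db
        .prepare('SELECT id, url, status FROM urls WHERE id > ? ORDER BY id LIMIT ?')
        .all(lastSeenId, pageSize) as { id: number; url: string; status: number }[];
    }

    // The infinite-scroll handler just keeps advancing the cursor:
    let cursor = 0;
    const rows = nextPage(cursor);
    if (rows.length > 0) cursor = rows[rows.length - 1].id;

Each batch is a fresh disk read, so memory stays flat no matter how far the user scrolls.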


The problem is, for big crawls (and 500k is not large), you probably don't want to use your desktop. For example, my home ADSL is only 3.5 Mb, as we are 6k yards from the exchange.

And I would not want to get my work's 100 Mb line banned by Google. This is where services like DeepCrawl come into play: I can set up my sites to be crawled at night and look at the reports in the morning.

And another problem I found is that desktop crawlers are very resource hungry: at one small agency we had two stripped-down dedicated machines just to run crawls, as the risk of causing a crash was too high.


Yeah, for really big crawls you're probably better off sticking it on a server or AWS, as much as anything so you don't need to leave your computer on for ages.

But Sitebulb is not resource hungry in the same way as other desktop crawlers. It saves to disk instead of using RAM, so you don't experience the same limitations.

I'm not sure what you mean about Google. There is no link between Sitebulb and Google - it doesn't visit Google at all, so there is no risk of banning. Using it on your 100 Mb work line would be ideal.


Interesting that it's all a desktop app. What problem do you think this solves compared to something that runs in the cloud? Apart from the cost structure, I can't think of anything myself.


The other big thing, in comparison to cloud software, is convenience. You can set up a crawl and start it running, and see URLs being crawled, within a minute.

On cloud software that's simply not possible, due to the way that everything is scheduled.

There are a few other small things, such as being able to view Audits offline (what we call 'train mode').

The cost structure can be a big limiting factor though, especially for smaller companies. Sitebulb effectively removes all limitations around number of domains, number of projects, total number of URLs crawled etc...


> The other big thing, in comparison to cloud software, is convenience. You can set up a crawl and start it running, and see URLs being crawled, within a minute.

This depends on the implementation. If the architecture is modern and well thought out, using dynamic scaling or even AWS Lambda, the result should be available much faster on cloud software due to the ability to parallelize. You can only have so much network bandwidth / CPU power locally, and if you need to crawl hundreds of pages to get your result, it matters a lot.
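
As a rough sketch of what that parallelism looks like (plain TypeScript on Node 18+ with the built-in fetch; the concurrency cap and per-URL bookkeeping are illustrative, and a Lambda version would shard the same queue across invocations):

    // Crawl a batch of URLs with a fixed concurrency cap. Locally the cap is
    // bounded by one machine's bandwidth/CPU; in the cloud you can raise it,
    // or split the queue across many workers.
    async function crawlAll(urls: string[], concurrency = 20): Promise<Map<string, number>> {
      const results = new Map<string, number>();
      const queue = [...urls];

      async function worker(): Promise<void> {
        for (let url = queue.shift(); url !== undefined; url = queue.shift()) {
          try {
            const res = await fetch(url, { redirect: 'manual' });
            results.set(url, res.status); // record the HTTP status per URL
          } catch {
            results.set(url, 0); // 0 = network-level failure
          }
        }
      }

      // N workers drain the shared queue in parallel.
      await Promise.all(Array.from({ length: concurrency }, worker));
      return results;
    }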

Disclaimer: I'm building a SaaS tool for SEO which also involves page crawling.


> On cloud software that's simply not possible, due to the way that everything is scheduled.

As someone who works in cloud software this makes me cringe a little.

I have no doubt this is how existing cloud SEO crawlers work but with elastic scaling, web sockets, and serverless there is no reason why this has to be true.

It is not a limitation of cloud software. It is a sign of devs and/or product owners deciding that instant results are not a priority for the product.

Edit: I hear that a lot from industries that are not intimately familiar with web apps. "You can't do that on the cloud"... a typical web software engineer will not be able to do it, but there are people out there who can. They are more expensive than your typical developer, but depending on your product they may be worth it.


Sorry, maybe I misread, but I kind of read the comment as 'what separates this from other cloud products on the market?'

So I wasn't trying to argue what is and isn't possible with cloud architecture, simply what is and isn't possible with (our) cloud-based competitors.

The process is along the lines of: 'Click Start', get taken to a screen which says 'Initializing' or similar, then maybe 2-3 minutes later you'll see something start to happen. But there is little to no data on which URLs are actually being crawled.

Sitebulb, and desktop crawlers in general, have a much quicker feedback loop.


> Sitebulb, and desktop crawlers in general, have a much quicker feedback loop.

I wasn't denying that. I'm sure it does. I am confident this is way better than most (if not all) current cloud solutions.

I just think it is unfortunate because there is no technical limitation of the cloud that prevents it from being instant on the cloud as well.

The cloud can't handle spikes well (1,000 customers all unexpectedly trying to scan at once), but if the load is predictable, linear, or easily done in parallel, which I suspect it is for this use case, then it is perfectly doable with no delay on the cloud.


DeepCrawl does that, but again it entirely depends on the website type and infrastructure you have got.


The one advantage that I can think of is that this can run on websites that are in development and not accessible to something running in the cloud. A lot of enterprise websites have their dev/stage behind a VPN, and being able to run this against those without having to find out how to jump through hoops would be really nice (which it looks like this is capable of doing, since you just need to feed it a URL). On top of that you also don't have to worry about what they're doing with the data output by the program on their server.


> On top of that you also don't have to worry about what they're doing with the data output by the program on their server.

Why would you care, if the site is available on the Internet and you can't control who is browsing it?

This is a valid argument only for internal websites, which are not subject to SEO anyway.


But it can run on sites that will be public but are not yet deployed.


Yep exactly. All crawls are stored locally so there is no data issue to worry about.


> We're nobody special. We're not a cool startup that's just secured funding, we're a bootstrapped, 2 man team and we've built both our products from the ground up.

Their honesty won me over! Nice going bros.


:) thanks

You might like these as well: https://sitebulb.com/release-notes/


Ditto on the Sitebulb love. It's a well-done product and a welcome challenger to the status quo of SEO software out there.

I will say the cost gives me hesitation but you've put a lot of work into it so I understand the justification.

For the visualization, does the crawl map limit the connections? I was expecting to see more of a web with pages linked from the entire site. Can you tell us more about that?

Thanks


Price is always a difficult one, trying to get the cost/value balance right. We did some pricing sensitivity testing before launch, so I'm hopeful we've not got it too wrong.

Regarding Crawl Maps, yeah, it does have some limitations, which I've written about here - https://sitebulb.com/resources/guides/crawl-maps-faqs/

Although from your comment I think you might be thinking it is a link map, rather than a crawl map. The Crawl Map maps out how each URL/node was first found when the crawler traversed the site, so each node will only ever have one edge/link.

A link map ends up a LOT more messy, although it's on our roadmap to try and build one of these too!
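
To make the distinction concrete, here's a toy sketch (hypothetical adjacency data, not Sitebulb's internals): a breadth-first crawl records only the page on which each URL was first discovered, so the crawl map comes out as a tree with one incoming edge per node, while a link map would keep every edge.

    // Toy link graph: links[page] = pages it links to.
    const links: Record<string, string[]> = {
      '/':  ['/a', '/b'],
      '/a': ['/b', '/c'],
      '/b': ['/c'],
      '/c': ['/'],
    };

    // BFS from the start URL, recording each node's discovery parent.
    // The parent map is the crawl map: one incoming edge per node.
    function crawlMap(start: string): Map<string, string | null> {
      const parent = new Map<string, string | null>([[start, null]]);
      const queue = [start];
      while (queue.length > 0) {
        const page = queue.shift()!;
        for (const next of links[page] ?? []) {
          if (!parent.has(next)) { // only the first discovery counts
            parent.set(next, page);
            queue.push(next);
          }
        }
      }
      return parent;
    }

    // crawlMap('/') yields '/': null, '/a': '/', '/b': '/', '/c': '/a'.
    // The links '/b' -> '/c' and '/c' -> '/' exist in the link graph but
    // are dropped from the crawl map.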


"Almost everything looks like a graph. Almost nothing should be drawn as one."

It's really hard to make sense of full link graph visualisations. I'm talking about this in an upcoming conference presentation. We should share notes :)


Absolutely. In development we tried different ways to make the Crawl Map also represent link data, and they were all just unintelligible. Even the Crawl Maps on big sites are hard to get your head around, and that's with Sitebulb sampling quite heavily.

I'd love for us to come up with some sort of solution for it, I just don't know how we'd do it!

SL presentation I assume?


Yep.

Will be sharing and writing about it too - still a bit of a work-in-progress. Hopefully it all comes together nicely!

I'm leaning towards comparisons between tables of data, metrics etc rather than visualisations for much of this.


Oh - I meant to say - that quote is from this book: http://shop.oreilly.com/product/9780596514556.do


Nice, thanks. Just ping me or Gareth if you want any of our input :)


What's this doing differently or better than Screaming Frog, which is also a desktop program and provides quite robust information? SF has been one of the standard industry tools for those doing SEO for years.


The main difference from Screaming Frog (which is legitimately awesome) is the reporting. Once it has finished crawling it will do a lot of pre-processing for you and build graphs, lists of hints, etc...

I've written a more comprehensive answer to this here: https://sitebulb.com/resources/guides/how-is-sitebulb-differ...


I'm testing Sitebulb right now (trial version). The crawling is kinda slow (I'm on 100 Mbit fiber). Why did you choose to build everything from scratch instead of making an application that uses the results from other crawlers/spiders (e.g. Screaming Frog) and just produces the audit reports?

EDIT:

And after about 3 hours of crawling, this is what I got (and no way to resume it):

> Audit Stopped! The audit stopped early because: Maximum Crawl depth limit of 50 reached

> WARNING: Audit Paused! The audit is incomplete and did not finish properly.


I left the program running in the background and it resumed itself after a few minutes. I have no idea why as there was no info on the dashboard.


SF is also a lot cheaper. With VAT, you're looking at £705.60 for Sitebulb per year, vs £149.00 per year for SF. That's a really big price difference and you'd have to be really sure it's worth it.

In addition SF works on Ubuntu, which is another point in its favour.


I know right, SF is just too cheap for its own good! :p

We think it's a case of horses for courses. Sitebulb has the potential to save you a ton of time when auditing and reporting. If you don't do a lot of that then it might not be a good option for you. If you do, that's where a lot of the value lies.

There's a fully featured 2 week trial to give it a proper go, and the monthly billing means you have the option to switch it on/off as you need it.


Agreed. This seems like SF + Gephi for 7x the price.


We should put that on our homepage.


Have been using it during the beta programme - a really useful tool for doing site audits and focusing quickly on the areas that can make the most difference. Already using it on client work, and it is now a key tool for me alongside Screaming Frog and others. https://a.paddle.com/click?said=431&aaid=2812&link_id=380&ch... (my affiliate link)


Sorry, I meant to say, there's a free 14 day trial available to anyone once you download the software (no credit card required).


I keep trying to sign up, but it prompts me with "you already have an account...", when in fact I don't.


Hey I'd appreciate if you could ping me your email address to support@sitebulb.com so we can see what's going wrong with your account (or lack thereof).


Hmm, I set it to crawl our little Ghost blog (e.g. blog.example.com) and it immediately jumped to spidering our main site (e.g. www.example.com). Now, I imagine if I had properly poked at the Advanced settings I could have limited it to the initial subdomain, but I would have expected that to be the default...


It should stick to the subdomain you specify in the start URL, unless there is a redirect or something. Other subdomains won't be crawled, although it will HTTP status check links to subdomains. So possibly that is what you saw in the URL log on the crawl progress page?

If you want me to take a closer look send the subdomain over to support@sitebulb.com and I'll see what's going on.


Avast free WebShield on Win7 is giving me a FileRepMalware error and is aborting the download and connection.

I am not exactly an expert on Avast and this particular error but perhaps someone here would like to know about this.


That's frustrating, sorry. It's a 'reputation issue' that over-protective anti-virus software doles out to smaller software vendors like ourselves. Basically, they don't know if it is good or bad because we haven't had millions of installations.

i.e. it's a false positive


Sure, no worries... of course I personally knew that, but I thought it might be useful for you guys to know it's being blocked by the AV.

It's a really nice product BTW...good job.


I've been testing Sitebulb for a few months now and I'm really impressed, you've done solid work guys :) good luck with the launch


Thank you! And thanks for the beta feedback, it was super helpful.


Nice to use and useful insights. Thanks!


How does this compare to something like Scrutiny, which seems to do a similar thing?


It has much more comprehensive reporting and data visualization than the likes of Scrutiny. I have no idea of the scale limitations of Scrutiny, but I'd be very surprised if it can handle ~500,000 URLs.

Also Sitebulb is for both Windows and Mac.


why is the windows download 130mb? that seems excessive


Yeah, and that's compressed :) I'm in the process of getting it down, but there are a few things in there that don't help its size. For instance, the latest versions of Electron and Phantom are reasonably big.


How does this compete against screaming frog?


Answered this one below already. For completeness:

The main difference from Screaming Frog (which is legitimately awesome) is the reporting. Once it has finished crawling it will do a lot of pre-processing for you and build graphs, lists of hints, etc... I've written a more comprehensive answer to this here:

https://sitebulb.com/resources/guides/how-is-sitebulb-differ...


thanks, i'll check it out


Linux version?


I'm afraid not, at least not yet. It's something we'll work on if the demand appears to be there.

Right now we are focused on other features that appear to be a higher priority to our users.


I'd also be interested



