Hacker News

Looking at how many "https://url.com//feed.xml" there are in the list, I have a feeling that the scraping logic needs some work. Is it just concatenating "https://url.com/" and "/feed.xml"?



I was extracting the feed URL from the alternate link in the HEAD of each blog's website. The issue is that some people write "//example.com/feed.rss", some "feed.rss", and some "/feed.rss". So I built some pretty stupid URL-resolving logic, when I should have just used ResolveReference from Go's standard library (net/url). The latest list will fix that.
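For anyone curious, here is a minimal sketch of what that resolution could look like with net/url. The helper name resolveFeedURL and the example.com URLs are made up for illustration; this is not the scraper's actual code, only one way to handle the three href styles mentioned above.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveFeedURL is a hypothetical helper (not taken from the scraper).
// It resolves the href found in a <link rel="alternate"> tag against the
// URL of the page it was found on, following RFC 3986 reference resolution.
func resolveFeedURL(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page URL: %w", err)
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", fmt.Errorf("parse feed href: %w", err)
	}
	// ResolveReference handles protocol-relative ("//host/path"),
	// root-relative ("/path"), relative ("path"), and absolute hrefs.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Assumed example page; the hrefs are the three styles from the comment.
	page := "https://example.com/blog/"
	for _, href := range []string{"//example.com/feed.rss", "feed.rss", "/feed.rss"} {
		resolved, err := resolveFeedURL(page, href)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-24s -> %s\n", href, resolved)
	}
}
```

Because the href is resolved rather than appended, a base ending in "/" combined with "/feed.xml" should no longer produce a double slash.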


Assuming a fixed feed path is also not a good idea. On https://andinfinity.eu/ I have a header for that, and it clearly indicates that the RSS feed lives at index.html. Standard Hugo thing, I guess.



