Hacker News | renegat0x0's comments

Not a web scraper, but web crawler software. It lets you choose the crawling method (Selenium, among others) and returns data as JSON (status code, text contents, etc.).

[1] https://github.com/rumca-js/crawler-buddy


I recently tried to integrate Gmail into my app [0], and I sank too much time into it. I decided supporting Gmail is not worth it.

Gmail to SQLite describes 6 steps to get credentials working, but that was not my experience. After those 6 steps:

- Google said that my app was not published, so I published it

- Google said the app cannot be internal, because I am not a Workspace user, so it has to be an external app

- then it said I cannot use the app until it is verified

- for verification they wanted to know my domain, address, and other details

- they wanted my justification for the requested scopes

- they wanted a video explaining how the app is going to be used

- they will take some time to verify the data I provided

It all looks like a maze of settings, and asking any of my users to jump through the hoops Google requires is simply too much.

Links:

[0] https://github.com/rumca-js/Django-link-archive


The steps Google makes people jump through just for API keys are absolutely insane.

Does anybody have insight as to why it’s so bad?


Probably because if someone gets API access to your email account, it is game over. And people are careless, so some of them are going to click yes on some scammy app. And then they will blame Google for not protecting them.

Because otherwise tons of people would anonymously create API keys with extremely wide scopes for small, low-quality apps.

When those inevitably get used for nefarious purposes, Google's image suffers as a result.


Use regular old IMAP with an app password.

Don't jump through their hoops.
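A minimal sketch of the IMAP route in Python, using only the standard library. The app password is generated in the Google account's security settings and is used in place of the account password:

```python
import imaplib
import email
from email.header import decode_header

def decode_subject(raw):
    """Decode a possibly MIME-encoded Subject header into a plain str."""
    out = []
    for text, charset in decode_header(raw):
        if isinstance(text, bytes):
            out.append(text.decode(charset or "utf-8", errors="replace"))
        else:
            out.append(text)
    return "".join(out)

def fetch_recent_subjects(user, app_password, n=10):
    """Return subjects of the n most recent inbox messages over IMAP."""
    with imaplib.IMAP4_SSL("imap.gmail.com") as conn:
        conn.login(user, app_password)  # app password, not the account password
        conn.select("INBOX", readonly=True)
        _, data = conn.search(None, "ALL")
        ids = data[0].split()[-n:]
        subjects = []
        for msg_id in ids:
            _, msg_data = conn.fetch(msg_id, "(RFC822.HEADER)")
            msg = email.message_from_bytes(msg_data[0][1])
            subjects.append(decode_subject(msg["Subject"] or ""))
        return subjects
```

No OAuth consent screens, no verification queue; the trade-off is that the user has to dig the app-password page out of their account settings first.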


Every year the IMAP option ("app passwords") gets buried deeper and deeper in the settings.

Indeed. Quite a hassle to enable now. Multiple requirements including 2FA.

Recently I had some problems with parsing RSS in Python with feedparser.

I decided to take some risk and write my own version.
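A hand-rolled replacement can stay small if you only need the basics. A minimal sketch with the standard library, assuming plain RSS 2.0 (no Atom, no namespaces):

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Parse an RSS 2.0 document into a list of item dicts.

    Only the fields a crawler typically needs are kept;
    all other RSS nuance is intentionally dropped.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items
```

Unlike feedparser, `ET.fromstring` raises on malformed XML instead of limping along, so a real version needs its own error handling for feeds found in the wild.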


Please correct me if I am wrong.

What grinds my gears about the email monopoly is that you cannot easily create or integrate an email client now.

To access mail, it is no longer enough to provide a user, a password, and whatever else. There are ways to provide access without compromising security.

To access Gmail? The user needs to grant access from some bullshit console settings and such. I asked ChatGPT why Thunderbird is not required to do this, and it said Thunderbird has pregenerated keys, and big corporations can operate like that. I have not verified that. It sounded credible. So annoying.


Nice project! Good job!

Now somebody might also find interesting what I have done.

- I decided that implementing an RSS reader for the 100th time is really stupid, so naturally I wrote my own [0]

- my RSS reader is in the form of an API [1], which I use for crawling

- it can be installed via Docker. The user only has to parse JSON from the API; no need to deal with requests, browsers, or status codes

- my weapon of choice is Python. There is the Python feedparser package, but I had problems using it in parallel because of some XML shenanigans and errors

- my reader serves a crawling purpose, so I am only interested in the most basic elements, like thumbnails, so most of the RSS nuance is lost

- detects feeds from sites automatically
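Feed autodiscovery usually means scanning the page head for `<link rel="alternate">` tags pointing at RSS or Atom. A minimal sketch of that idea with the standard library (not the project's actual code):

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect feed URLs from <link rel="alternate" type=...> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # Common case only; rel can technically hold multiple tokens.
        if a.get("rel") == "alternate" and a.get("type") in FEED_TYPES:
            if a.get("href"):
                self.feeds.append(a["href"])

def find_feeds(html):
    parser = FeedLinkFinder()
    parser.feed(html)
    return parser.feeds
```

The hrefs are often relative, so a real crawler would resolve them against the page URL with `urllib.parse.urljoin`.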

Links:

[0] https://github.com/rumca-js/crawler-buddy/blob/main/src/webt...

[1] https://github.com/rumca-js/crawler-buddy



While I did not put the Internet in a box, I put "search" in a box.

https://github.com/rumca-js/Internet-Places-Database

It contains Internet links, channels, etc. Work in progress; you can find various domains, channels, and more.


Some time ago I created an RSS client. RSS feeds operate as sources of data. I have extended them to be able to parse pages and collect links.

Currently I have decided that I can add "Email" as a source, so I can read not only news but also emails in my app.


I also did a thing in my bookmarking software.

I added an "advanced" button for any link, which shows a menu where you can navigate to Google Translate, the Internet Archive, schema validation, and whois pages.

With the menu I can check links and navigate easily.

https://github.com/rumca-js/Django-link-archive/blob/main/RE...
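A sketch of how such a menu's targets might be built in Python. Every URL pattern below is an illustrative assumption, not taken from the project:

```python
from urllib.parse import quote, urlparse

def helper_links(url):
    """Build 'advanced menu' style helper URLs for a bookmarked link.

    The service URL patterns here are illustrative assumptions.
    """
    encoded = quote(url, safe="")
    host = urlparse(url).netloc
    return {
        # The Wayback Machine accepts the raw URL appended to the path.
        "internet_archive": "https://web.archive.org/web/" + url,
        "google_translate": "https://translate.google.com/translate?u=" + encoded,
        "schema_validator": "https://validator.schema.org/#url=" + encoded,
        "whois": "https://www.whois.com/whois/" + host,
    }
```

The menu then just renders one anchor per entry, so adding a new helper service is a one-line change.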


I love bookmarks. I have made an RSS app with a bookmarking mechanism.

My links about bookmarks and links are below.

https://github.com/rumca-js/Django-link-archive - Django app, RSS client, simple web crawler, under construction

https://github.com/rumca-js/RSS-Link-Database - my bookmarks repository

https://github.com/rumca-js/RSS-Link-Database-2025 - RSS links for year 2025

https://github.com/rumca-js/RSS-Link-Database-2024 - RSS links for year 2024

https://github.com/rumca-js/RSS-Link-Database-2023 - RSS links for year 2023

https://github.com/rumca-js/Internet-Places-Database - I also maintain a list of domains found on the Internet

https://rumca-js.github.io/search - search that uses links maintained in zip files
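Serving search from static zip archives can work like this sketch. The archive layout here (JSON records inside a zip) is an assumption, not the project's actual format:

```python
import io
import json
import zipfile

def search_zip(zip_bytes, query):
    """Scan JSON link records stored inside a zip and return matching entries.

    Assumes each .json member holds a list of {"title": ..., "link": ...} dicts.
    """
    results = []
    query = query.lower()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue
            for entry in json.loads(zf.read(name)):
                haystack = (entry.get("title", "") + " " + entry.get("link", "")).lower()
                if query in haystack:
                    results.append(entry)
    return results
```

Keeping the data in zips means the whole "search engine" can be hosted as static files, with the scan done client-side or in a tiny script.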

https://rumca-js.github.io/music - my music library, browsable

https://rumca-js.github.io/bookmarks - my bookmarks, browsable


Damn this is internet gold. Gonna be lost in those repos for days, man.

Awesome work!

