1) Give me a native Linux clone of Arq Backup, complete with deduplication, client-side (E2E) encryption, and a polished UI. I'm spoiled by Arq on the Mac, but my daily driver is Linux, and I've already tried Duplicity, Duplicati, rclone, etc.
It needs to support AWS S3/Glacier, GCP Nearline/Coldline, Backblaze B2, Google Drive, Dropbox and Box; Azure and OneDrive on the Microsoft side would also be nice. Text and email alerts (via the Twilio API or a local SMTP server), granular scheduling, frequent validation and dry-run/budgeting features would be awesome.
Probably a business model like Arq's - you buy a license for a major point version instead of a SaaS subscription. I'm really looking for something that takes minutes to set up with sane, fast and secure defaults.
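The deduplication half of idea 1 can be sketched in a few dozen lines. Here's a toy version, assuming content-defined chunking keyed by SHA-256; the window/mask parameters and the `ChunkStore` name are made up for illustration, and a real tool would also encrypt each chunk client-side before upload (that's the E2E part):

```python
import hashlib

MASK = 0x3FF        # boundary when the low 10 bits are zero (~1 KiB average chunk)
MIN_CHUNK = 64      # avoid degenerate tiny chunks
MAX_CHUNK = 1024    # force a cut so chunks stay bounded

def chunk(data: bytes):
    """Yield content-defined chunks: cut wherever a simple rolling
    checksum hits the boundary condition, or at MAX_CHUNK."""
    start = 0
    rolling = 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF
        size = i - start + 1
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and (rolling & MASK) == 0):
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]

class ChunkStore:
    """Digest-keyed store: identical chunks are kept exactly once."""
    def __init__(self):
        self.chunks = {}

    def put(self, blob: bytes):
        """Store `blob`; return its manifest (ordered list of chunk digests)."""
        manifest = []
        for c in chunk(blob):
            digest = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(digest, c)   # dedup happens here
            manifest.append(digest)
        return manifest

    def get(self, manifest):
        """Reassemble a blob from its manifest."""
        return b"".join(self.chunks[d] for d in manifest)
```

Because cut points depend on content rather than fixed offsets, backing up a slightly modified file re-uploads only the chunks that actually changed, which is what makes incremental cloud backup cheap.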
2) Give me a piece of software that automates the process of finding product recommendations online. For example, I really enjoy coffee. I frequently go to a subreddit relevant to the hobby and search through it to get qualified opinions. This is how I found my current bean grinder, French press, milk steamer, electric kettle, etc. This would also work well for running, watches, or other hobbies that include purchasing items.
I'm envisioning a website similar to Product Hunt or MassDrop, where users sign up and select their interests (Coffee, Running, whatever). Then you have an algorithm that uses the Reddit API to automatically map these user interests to specific subreddits, then classify, rank and sort product recommendations from each subreddit's wiki and relevant threads. One step further: once each product is ranked, use NLP to automatically classify its most common positive and negative feedback. Then present the results to the user, automating lists of product suggestions in tandem with crowdsourced reviews. Monetize the website with affiliate links, and eventually expand to Twitter.
I'd use that! If I had the time I'd work on it myself :)
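The classify-and-rank step above could start as crudely as keyword counting. A minimal sketch, assuming the comment bodies have already been fetched (e.g. via the Reddit API); the product list and sentiment lexicons here are stand-ins, and a real version would swap in proper entity recognition and a trained sentiment model:

```python
import re
from collections import Counter, defaultdict

# Hypothetical seed data; in practice these come from the subreddit wiki
# and a real sentiment lexicon.
PRODUCTS = ["Baratza Encore", "Aeropress", "Hario V60"]
POSITIVE = {"love", "great", "consistent", "recommend", "excellent"}
NEGATIVE = {"broke", "loud", "inconsistent", "avoid", "cheap"}

def rank_products(comments):
    """Return [(product, mentions, pos_signals, neg_signals)],
    sorted by how often each product is mentioned."""
    mentions = Counter()
    sentiment = defaultdict(Counter)
    for body in comments:
        lower = body.lower()
        words = set(re.findall(r"[a-z']+", lower))
        for product in PRODUCTS:
            if product.lower() in lower:
                mentions[product] += 1
                # crude proxy: sentiment words co-occurring in the comment
                sentiment[product]["pos"] += len(words & POSITIVE)
                sentiment[product]["neg"] += len(words & NEGATIVE)
    return [(p, n, sentiment[p]["pos"], sentiment[p]["neg"])
            for p, n in mentions.most_common()]
```

Even this naive tally surfaces the "most recommended grinder" signal; the NLP layer then just has to explain *why* (consistent grind, loud motor, etc.) instead of rediscovering *what*.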
3) Mailing lists! I subscribe to a bunch of cryptography, security-announce, tech newsletter and other mailing lists. Do for mailing lists what Slack did for IRC. Develop a platform for centralizing mailing lists, such that I can visit your website, sign up and subscribe to or unsubscribe from all of my mailing lists in one unified interface.
On the server side, automatically subscribe to and crawl every single mailing list you can find (lists won't need to opt in), then serve each one in the web application frontend with robust caching and load balancing. Users can browse all mailing lists on the website without logging in and search them historically. If they want lists delivered to their inbox, they sign up and choose which ones to subscribe to. The value add for users: one location for list discovery, one feed for reading list subscriptions, one interface for searching across all lists (with advanced features, naturally), and one pleasant interface for unsubscribing from any mailing list, with authentication that doesn't require email confirmation.
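The crawl-and-index core of this is mostly plumbing around standard email headers. A minimal sketch, bucketing raw messages by the standard `List-Id` header and supporting a naive historical search; the `ListArchive` class is illustrative, and a production system would sit this on a real search engine and persistent storage:

```python
import email
from email import policy
from collections import defaultdict

class ListArchive:
    """In-memory stand-in for the server-side archive."""
    def __init__(self):
        self.by_list = defaultdict(list)

    def ingest(self, raw: str):
        """Parse one raw RFC 822 message and file it under its list."""
        msg = email.message_from_string(raw, policy=policy.default)
        list_id = msg.get("List-Id", "(unknown)").strip("<>")
        body = msg.get_body(preferencelist=("plain",))
        text = body.get_content() if body else ""
        self.by_list[list_id].append({
            "subject": msg.get("Subject", ""),
            "from": msg.get("From", ""),
            "body": text,
        })
        return list_id

    def search(self, term: str):
        """Naive substring search across every archived list."""
        term = term.lower()
        hits = []
        for list_id, msgs in self.by_list.items():
            for m in msgs:
                if term in m["subject"].lower() or term in m["body"].lower():
                    hits.append((list_id, m["subject"]))
        return hits
```

`List-Id` is exactly the hook that makes the unified-unsubscribe feature plausible too, since RFC 2369's companion `List-Unsubscribe` header usually travels with it.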
Once you've got this down, start adding new features the way Slack did for IRC. These features could add productivity to mailing list discussion; for example, VCS issues, bug tracking or pull requests could be integrated to pop up in a side pane for relevant threads. Then introduce a pricing structure. I suppose the ultimate goal would be an acquisition by a company like Slack.
4) I have terabytes and terabytes of data that I need to efficiently find insights in. All the tools exist for me to, e.g., find correlations in time series, but the management and setup process is slow. Devise a way for me to rapidly test hypotheses in a framework designed specifically for this use case. On the storage side, kdb is the gold standard, but it's nosebleed expensive; if you can develop a robust alternative, you can sell it for quite a lot. On the analysis side, I need to automate the process of normalizing data from disparate sources, across batch and stream processing, and load it into a backtesting harness. I need to know quickly whether there is a link between seemingly unconnected data.
Ideally what I'd like is a way to store a massive amount of cleaned data from different formats and sources, take a slice of each one for a specific period, and performantly run a correlation "fuzzer" that rapidly brute-forces signals across seemingly unrelated data.
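The fuzzer itself is the easy part once the normalization is done. A minimal sketch, assuming the series have already been cleaned and aligned onto a common index (the hard ETL work elided here); the function name and plain-dict storage are illustrative, and a real tool would test significance and correct for the multiple comparisons this brute force implies:

```python
import numpy as np

def correlation_fuzz(series: dict, start: int, stop: int, threshold: float = 0.8):
    """Brute-force every pairwise Pearson correlation over the window
    series[name][start:stop]; return [(name_a, name_b, r)] for |r| >= threshold."""
    names = sorted(series)
    window = np.vstack([np.asarray(series[n][start:stop], dtype=float)
                        for n in names])
    r = np.corrcoef(window)          # full pairwise correlation matrix
    hits = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                hits.append((names[i], names[j], float(r[i, j])))
    return hits
```

For a few thousand series this is a single BLAS-backed matrix product per window, so "rapidly" is realistic; the expensive part of the product remains the storage engine and the normalization pipeline feeding it.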