I backed on IndieGoGo (in late 2018 I think), then helped crowdfund on StartEngine. I want so badly to speak well of these guys, and since I invested I want them to succeed, but it's been getting harder to do so.
They deserve credit for being reasonably transparent about their processes though. Their blog has been very interesting to follow. They had a ton of issues trying to work out their original design, then the pandemic hit. So in a way, timing has been bad. In the meantime they spent a lot of time on the software side, then on the fundraising side, then deck shuffling at the top, and now they're finally shipping something that looks nothing like what the Kickstarter showed. Instead of being close to the $200 price point they wanted to be at, it's currently $349, then will go to $500. They're at a point where if they don't get sales at those prices, they're not going to be able to deliver to backers; at least that's how they framed it when they discussed rollout.
They are actually shipping, though! Which didn't look like a guarantee for a while. However, the reviews have been a bit lukewarm; [this][1] being an example. They had a way to make skills and they apparently changed it in order to better accommodate their final product, so it seems like there's been a bit of fracturing of the ecosystem as a result.
In conclusion: I'd love for you to support them! I still believe this segment needs a player like Mycroft, Mycroft can be that player, and I really want to not have my investments go to waste. But I 100% would not blame you if you looked at the company's arc and said "no thanks".
Those prices are really insane tbh. No way it will take off like that, it's just a non-starter. 200 was already the upper limit of what's doable.
I think they're stretching too much to satisfy the original backers and it's commendable not giving up on them but if it's going to be similar to Alexa, Siri or Google it's just not good enough.
If it were an actual assistant I could talk to, then yes. It would be worth it.
Imagine this: I'm doing my laundry and Mycroft pipes up.
"Hey, Alice is looking for you on Telegram"
"Tell her I'll get back to her after I finish the laundry"
"Ok!"
...
"She says it's urgent, are you sure?"
"Ok call her please on speaker"
Or another scenario.
"Hey Mycroft, I'm going out to the zoo"
"Ok, make sure you bring an umbrella because it's going to rain in 2 hours"
Stuff like this. Right now assistants have zero short term memory, don't remember any of my preferences and can only understand one thing at a time. They're also not proactive at all. They don't know my life and habits and don't warn me when things are happening that I should know about. Yet most of those things are easily identified from notifications on my phone! It's not a stretch to expect this IMO. These are all things I'd expect from a real assistant. All this low-hanging-fruit "turn on the bedroom light" stuff is not worth money.
It's just that there aren't many sales to link to it other than the service price (which I'd definitely pay for!!).
I don't think these scenarios are too far-fetched with the current state of AI tbh.
PS: blaming the pandemic is a bit rich. Their project was already down the drain for years before that. First they canned the original plan and then they had this DIY Raspberry Pi add-on board, and that was in huge trouble well before the pandemic. I'm sure it made matters worse, but if they'd managed it properly it would all have been fulfilled years before corona meant anything other than beer.
I'm not sure that price is going to be a blocker. The audience may not think of price as the main thing here; they may think of openness, no vendor lock-in, privacy, and security as pretty amazing things for that price.
I myself am getting tired AF of Google Assistant and their devices. So so tired of saying "Hey Google". And I have a Lenovo device with Google on it that was decent, then Google pushed an update to it and all video calls have an echo on them now, making it entirely unusable. There are multiple threads on the internet about it too and Lenovo support says "contact Google" and Google says nothing on those threads, zilch. I don't trust Google devices anymore, they get abandoned routinely. I wish they didn't but I feel more empowered with an open device than I do with closed source, no public response Google stuff.
I'm definitely curious about trying a Mycroft now, and I think there may be others too. It may not be the masses but it might be enough to keep the project thriving.
I looked into them a few weeks ago because I am tired of the "Did you know.." and "Can I add that to your cart?" from Alexa. When I saw the device I was really disappointed. I can just never see getting one - it doesn't need a screen (cause it should be hidden) and why did they make it look like a 1960s sci-fi movie thing? IMHO it looks terrible.
The original design was great, but it just didn't work. They had too many issues trying to source hardware and decided to pivot to more off-the-shelf components.
That's the short version; here are some highlighted blog posts that document the trials and tribulations:
For what it's worth I'm intrigued by a screen. Theoretically it shouldn't be needed but if they want to have a reference device that can be used in multiple industries, it's better to have it than to not have it.
But that series of posts is why I'm still cheering for Mycroft despite everything: Clearly this stuff is hard, they've been out there trying hard, taking lumps, fighting off patent trolls, putting in the work. If they don't succeed, I'm not sure who else will pick up the reins and do any better.
Some of the comments below are part of the explanation why. It doesn't work as well as people were hoping, and it's a solution in search of a problem with limited application and, it seems, little monetization. The above article sums it up better from the big tech company's perspective.
Voice assistants which are trying to force engagement to squeeze money out of you are dying.
Most people only use the voice assistants for a few simple tasks, which is perfect for an open source project like mycroft. It is, however, very, very bad for Amazon and Google, because those tasks don't make them money. That's why they're all going so aggressive on "you asked for the time, but by the way here's a 5 minute speech on all the easily monetizable tasks I can do instead"
People like the idea of voice assistants, but by and large they don't like all the problems associated with a voice assistant run by Amazon, Google, and Microsoft.
Yup - I think this is the truth. I'm willing to spend several hundred dollars right this second for a simple voice assistant for things like weather, time, timers, unit conversion, alarms, and home assistant control (mainly lights).
I've actually pre-ordered the Mycroft Mark 2, although no chance to evaluate it yet.
I'm very interested in devices that can do this locally.
I'm not interested in Alexa/Google home anymore AT ALL - I've gone that route, they both work, but they want my dollars all the time, and it's become increasingly clear that if they can't get me making purchases through those devices - they will kill them off, or become ever more scummy in the attempt (Alexa is now including ads in the "did you know" section - "did you know" was already a fucking terrible decision to include, since it's going to marginally increase interaction at the expense of huge user dissatisfaction. But putting ads there has made me leave.)
So basically - I think if anything, we're seeing a speed run of 90s/2000s tech company boom/bust. A huge amount of money poured into the space with no real idea of how to sustainably profit, but the space itself doesn't feel like it's going anywhere.
It's really, really compelling to allow voice control in all sorts of interactions - but it needs to be very clearly working with me, and not trying to subvert my intent for profit. That might even mean it needs to fall back to something like "if this command, then that action" style usage. No more changing commands, no more bullshit ads, no more subversion of what I'm asking it to do.
It needs to obey me, not google or amazon. Otherwise it's a sales rep and not a digital assistant.
I got an echo (alexa) for free and use it for home assistant. It only works when I have an internet connection. So when my internet is out, I cannot turn my lights on/off with it. I understand why, but I too would REALLY like to just have all functionality dependencies for home automation to be local.
I use Mycroft with the home assistant vm running on Proxmox. I’m surprised how easily they integrate. And when the internet goes out I can locally control things from a laptop.
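For anyone wondering what "local control" can look like without a cloud round trip, here is a rough sketch that builds a request against Home Assistant's REST API on the LAN. It assumes a long-lived access token has been created in Home Assistant; the host address, token, and `light.bedroom` entity id below are placeholders, and `build_light_request` is a hypothetical helper name, not part of Mycroft or Home Assistant.

```python
import json
import urllib.request


def build_light_request(base_url, token, entity_id, turn_on=True):
    """Build (but do not send) a POST to Home Assistant's light service endpoint."""
    service = "turn_on" if turn_on else "turn_off"
    url = f"{base_url.rstrip('/')}/api/services/light/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token (placeholder)
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it only needs the local network, not internet access:
# req = build_light_request("http://192.168.1.10:8123", "YOUR_TOKEN", "light.bedroom")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Building the request separately from sending it keeps the sketch checkable without a running Home Assistant instance.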
I'll have to check out the specifics - but your full setup does not seem like an improvement for me. If alexa is down, I can open up the app on my phone for each smart device manufacturer and manually control things that way. They only need my local wifi network to be functional, internet access not needed.
Can a Raspberry Pi handle the server functionality, I wonder?
I worked on a project in 2016 that told me all I needed to know about the space. It was an online voice assistant and I couldn't find myself wanting to interact with it. Even though I spent a lot of time on the project, I scrapped it, because it was just lame. It looked kind of cool, but was lame.
I personally don't think there is enough cybernetica to control with voice. At some point there may be, but right now, the internet is just one giant consumption stream with a few searches and purchases now and then.
That digital daemon experience taught me that I care more about physical intelligence than verbal intelligence when it comes to my technology. I'm verbally intelligent myself, I don't need an AI who can't even speak correctly, let alone understand me, be my verbal interface to the world.
Honestly - I don't think anyone really wants an "online voice assistant". The key problem word there is "online".
There is no way that a device can meaningfully parse information from the internet and present it to you right now. At some point? Sure maybe. But it's definitely not now.
What I want, and what I was pretty clear about in my list of use cases is the ability to push a button (or run a function with parameters) with my voice.
- Current weather (optional, I don't use this a ton and it requires an upstream source, although HomeAssistant already gets that info for me): WhatWeatherIsIt(atTime?)
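To make the "run a function with parameters by voice" idea concrete, here is a minimal sketch that assumes speech-to-text has already produced a transcript. The handler names and phrase patterns are made up for illustration (loosely mirroring `WhatWeatherIsIt(atTime?)`); they are not Mycroft, Rhasspy, or HomeAssistant APIs.

```python
import re
from typing import Optional


# Hypothetical handlers: stand-ins for whatever actually does the work.
def what_weather_is_it(at_time: Optional[str] = None) -> str:
    return f"weather lookup for {at_time or 'now'}"


def set_timer(minutes: str) -> str:
    return f"timer set for {int(minutes)} minutes"


def turn_light(room: str, state: str) -> str:
    return f"{room} light turned {state}"


# Fixed phrase patterns mapped to functions; named groups become keyword arguments.
COMMANDS = [
    (re.compile(r"^what(?:'s| is) the weather(?: at (?P<at_time>.+))?$"), what_weather_is_it),
    (re.compile(r"^set a timer for (?P<minutes>\d+) minutes?$"), set_timer),
    (re.compile(r"^turn (?P<state>on|off) the (?P<room>\w+) light$"), turn_light),
]


def dispatch(transcript: str) -> str:
    """Match a transcript against the fixed patterns and call the first handler that fits."""
    text = transcript.strip().lower().rstrip("?.!")
    for pattern, handler in COMMANDS:
        match = pattern.match(text)
        if match:
            # Drop optional groups that did not match so handler defaults apply.
            kwargs = {k: v for k, v in match.groupdict().items() if v is not None}
            return handler(**kwargs)
    return "unrecognized command"
```

With this table, `dispatch("Turn off the bedroom light.")` ends up calling `turn_light(room="bedroom", state="off")`, and anything outside the table is simply rejected rather than reinterpreted.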
---
I've found basically all of the voice assistants I interact with are actually really damn good about understanding roughly those patterns (Alexa was the best, but only for the first year or so [honestly - it was actually wonderful as a beta product] - it's gone markedly downhill over the last several years as they try to cram in more detection and more features).
They just insist on trying to sell me on other parts of the experience (check out this tv show, there's a sale on, use this product, did you know this? did you know that? etc). And I don't want it.
But I'm more than happy to pay a fair sum of money for a thing that will just reliably do those commands.
It's incredibly liberating to be able to do those things with my hands busy, or my eyes closed, or while lying down.
Tack on an HDMI port or a display so I can view a recipe and I will literally give you money right now for this thing (and I have - since I've tried most commercial voice assistants).
Long term - I'll probably end up just cobbling together my own version using Rhasspy/Mycroft or another text parser/STT engine, and HomeAssistant or OpenHab if I can't get what I want commercially. I just know that I'll end up spending more money and time on it that way (which is not the end of the world, I just have other hobbies at the moment, and a very young child).
> Most people only use the voice assistants for a few simple tasks, which is perfect for an open source project like mycroft.
I'll certainly grant that... but the price point where Mycroft is, is certainly not near what I'd pay for doing those few simple tasks.
Apple is at the upper end of what I'd spend for such a device (the HomePod mini is $99) - and that's because I'm fairly invested into the Apple ecosystem and thus it can make use of the iTunes library, home automation, calendar items, etc...
If I wasn't invested in Apple, then none of the home assistants other than Amazon (because of the price point for the echo) would be particularly interesting.
I've got an echo show - because it's a very nice simple clock/weather interface (that's got Alexa behind it) too (I really liked the Ambient 7 day weather clock when it was available). I've got an echo wall clock that is paired with the echo in the kitchen - it makes timers nicely visible (a sibling of mine has an echo wall clock because it's an analog dial that doesn't have any sound with it).
The problems with Alexa of suggesting by the way ("Alexa, stop by the way" - give it a try and yes, it is routineable) are tolerable for how much I'm paying for them and the functionality that I use it for.
The Ars article on Alexa's financial crash-and-burn inside Amazon missed a lot of the reason people aren't willing to engage with Alexa as much as they could or would, if things were different. First, the privacy aspects are significant. Secondly, the value proposition is just not there - worse, Amazon has deliberately broken one of the most useful things you could do with Echo products: using them for distributed networked audio, a la Sonos: The new generation Echo Show products ELIMINATED the audio output jack, so you can't even plug the output into a stereo or speaker now!
On top of that, the Echo products are just not well built, not well thought out, and have NOT been upgraded to make them better: They update, but with NO visible benefit to the owner. One example: The Echo Show 8 cannot and will not keep its display off all night, even if you explicitly command "display off" before going to bed (yes, it does understand and temporarily obey this command!) But sometime during the night, something will wake it up, and the damn thing turns into a lighthouse in your bedroom, waking one of us up.
I'd really like to find Alexa more useful, but like most folks I know with one, it's mostly just useful as a glorified voice-controlled radio - I'd use it more to control lights and such, if I could get the damn thing to actually realize what lights are in what rooms, and that dimmer switches and smart lights can indeed share a location that should be controlled together. (Yes, this is supposed to work, but it doesn't...)
I would pay $500 to outfit the house with a central voice recognition processor that would be capable of supporting a dozen or so very secure listeners on the local LAN. Mycroft isn't that solution.
> The problems with Alexa of suggesting by the way ("Alexa, stop by the way" - give it a try and yes, it is routineable) are tolerable for how much I'm paying for them and the functionality that I use it for.
So cold comfort since it’s annoying as hell, but it slowly learns you don’t like it and will back off its frequency. Amazon unsurprisingly tracks “dissatisfaction” responses and adapts rate of things (globally and individually) so you do actually have to cuss out Alexa to change it. It’s slow because obviously it’s profitable but it does happen.
> That's why they're all going so aggressive on "you asked for the time, but by the way here's a 5 minute speech on all the easily monetizable tasks I can do instead"
This is a word-for-word description of how Siri originally functioned. "You asked for the top 5 romantic restaurants nearby; here are the top results from Google Search:"
GP isn't talking about bad fallback answers where it punts you to a search page. It's more like when you ask "Hey Alexa, what is the time" and it says "The time is 5:45 PM. By the way, did you know you can buy ribbons for the holidays on Amazon by saying...", i.e. things that are blatantly unrelated to answering your question and often trying to sell you something.
It’s annoying because I kind of understand advertising the capabilities, but for Siri, for example, I cannot find a documented list of all the commands it can “understand” and so I can’t learn how best to use it. I just have to guess and hope I get close enough.
I don't think the entire space is dying. Amazon is having problems because Alexa has little benefit to them outside of direct monetization. They wanted people to use Alexa to buy things and no one wants to shop like that. So they have these devices and all this infrastructure to run people's kitchen timers, lights and play music. People will buy Alexa devices on Prime Day, use them dozens of times a day for years, and never make a dime for Amazon.
Apple isn't necessarily in the same boat. Siri isn't particularly good, but it does all those things well. Most importantly, it keeps people on iPhones and in the Apple ecosystem, which does make money.
You already have the phone and probably have a device that works with HomeKit so why not try it out. Next, you buy some new lights. Before you know it, you're controlling most of the lights in your house, streaming Apple Music and setting kitchen timers from your Apple Watch. Next time you need a new phone, you're not even going to think about anything else because if you change you won't be able to turn on your lights anymore.
This makes no sense to me. Apple's plan works because... their lock-in is better? I can "control most of the lights in my house, stream Apple music and set kitchen timers" with Alexa, Google Assistant, Cortana and even Bixby. What is Apple's actual advantage here? How is Apple making money from this when Amazon does not?
That only tells me which one will exist longer, not that "Apple's plan works". I've genuinely seen zero people deliberately use Siri (on iPhone or Mac) over the past 5 years. Apple is certainly losing money on Siri too.
If you look at Siri in a vacuum, you're almost certainly correct that it isn't a moneymaker. But Siri isn't in a vacuum. It comes with a device with an average cost of like $1000. It may not necessarily be widely used, but the ones who do use it are highly likely to remain in the Apple ecosystem and use other products and services that are highly profitable.
Look at all those Korean novelas and shovelware on Netflix. There are cohorts of subscribers who remain highly loyal because they're into it. Netflix isn't necessarily swinging for the fences with high brow, popular content that competes with the best studios in the world. Instead, they pump out a wide variety of content that keeps the maximum number of people subscribed.
Apple is similar. They promote features - whether it's Siri, health, privacy, family sharing/controls, etc. - that will strongly appeal to some cohort and keep them on the platform. Then, they incrementally hook you into services until you're buying $1000 devices for the whole family and paying $30/mo for the services bundle. And once you're there, they have you because the switching cost involves turning your digital life upside down.
I was one of those zero people until I realized I can tell Siri "add an appointment with Blahblah next tuesday at 11", it will actually understand that and it takes less time than using the calendar interface.
I'm sure enough people find some small use for the voice commands that it's a good feature to have on the phones.
Apple's devices are smarter. This makes them cost more for the "same" hardware, but costs less for the computation.
Apple isn't trying to make money with Siri. It's using Siri to make its ecosystem of Apple Music and similar more valuable to its customers.
The limits that Apple puts on what it can do make that cloud side computation less expensive.
---
Consider that bit - less expensive. Apple doesn't run its own cloud in the way that Google, Amazon, or Microsoft do. So what does Alexa cost? It costs for AWS cloud time. That's the expense that it's running. Those skills that people use run on AWS compute time rather than a phone's local cpu and battery.
A Google search "costs" about 1 kJ of energy. Alexa has similar costs somewhere just for energy and other costs for the maintenance of the additional software and content. It costs something to maintain that joke database.
There is more processing done in the Siri local device than there is in the Alexa local device.
It's not "Siri is smarter" but rather "Apple is working on minimizing cloud costs because it is entirely a cost for them."
Siri is scripted and limited to make it locally "smarter".
With Amazon and Alexa, everything goes to AWS because that's where the entirety of Alexa's processing happens. The hardware devices in the homes are "dumb" terminals for AWS with a voice interface. This allows Amazon to make use of AWS as much as it can and do something with "surplus" computing power that it has available on AWS.
Amazon has been working on on-device voice for a while. Actually, everyone is trying to do that. Running large speech models in the cloud is expensive; considering the number of devices, they probably need more than "surplus" :)
> In iOS 15, Apple moved all Siri speech processing and personalization onto your device, making the virtual assistant more secure and faster at processing requests. This also means Siri can now handle a range of requests entirely offline.
> Once you're using iOS 15, you don't need to enable anything for Siri to work offline. The types of requests that it can handle without phoning home to Apple's servers include the following:
> - Create and disable timers and alarms.
> - Launch apps.
> - Control Apple Music and Podcasts audio playback.
> - Control system settings including accessibility features, volume, Low Power mode, Airplane mode, and so on.
I didn't disclose it as I wasn't gonna promote anything, but I work for a startup specializing in on-device voice recognition. I am 100% biased towards on-device voice processing :)
I just wanted to share my 2 cents, as it's not unique to Apple and the cloud can be costly even if you own it. Big tech has been investing in on-device for a while. Besides voice commands, Apple and Google do transcription locally too, because now you can have local speech-to-text with cloud-level accuracy, and because of all the reasons you shared: cost, privacy, latency, etc. (But again, I'm biased.)
I don't know if it's device specific but I use "Hey Siri, turn on the overhead lights and the floor lamp" for example all the time. I have a HomePod though.
Weird, I have a homepod (mini) also. In fact I hardly have any Apple stuff anymore, I only got homepods because it was the most privacy-friendly option out of the big three. I just have an iPad from work which I used to set them up. And most of my automation goes through Home Assistant anyway.
When I give a double command Siri literally tells me she can't do two things at the same time and I have to present them as two separate commands.
Perhaps it's because I have it set to UK English? Perhaps it's smarter in US English. I'll have to try that.
I think the point is that Apple produces software that doesn't directly make them money as part of their business model. Pages, Numbers, iMovie, Maps are all given away for “free” with devices. Siri is just another example of that. As you say, being able to talk to your phone is table stakes, not an advantage. But having that offering keeps sales of Apple hardware moving.
Not necessarily, because the cost increases through increased use by the customer. The idea was to sell hardware at close to cost and then monetize the customer. The likely assumption was that Alexa users would buy more, similar to how Prime customers buy more. Ideally, Alexa customers were supposed to use services like "subscribe and save" and then randomly tell Alexa to order more toilet paper or laundry detergent. Amazon wanted all the household stuff on recurring subscriptions. It would have been great. Increased revenue, better ability to bundle shipments together to cut costs, customers who don't even look at prices anymore. Instead, they sell a device for which they incur an operating cost while producing little to zero increased revenue and subscriptions.
Right, I read the Ars Technica article, I get the original idea. It turns out that monetization strategy doesn't work and the Alexa business unit is losing billions of dollars.
I'm saying, new business idea: give up on the old strategy of Alexa driving induced profit in other business units, instead charge more than the breakeven cost of building devices and running the service. This will likely be substantially more, and fewer Alexas will be sold. But the users that do actually get a lot of value from the device and service they are using will pay more for it.
The product/market fit to test is: How many customers would pay more for a device that's not trying to sell you things? Can you get a solid (but smaller) business just by charging more for the device? What features would you add to persuade the marginal user to pay more? Offline mode for privacy-conscious users? Lean in to home automation features? What about a true AI-powered personal assistant? What about per-user language training with a local model that gets refined by your voice samples, with that data not shared back to the cloud? Etc.
Simple startup-style product iteration stuff here. You had a customer hypothesis and a growth model, and it was proven unviable. So can you pivot to find a viable business?
The "voice assistant space" also includes Siri, Google Assistant, and Cortana, so it's not going anywhere.
I'd contend that it's absolutely not a solution in search of a problem; it's much more of an unsolved problem, and a big part of the "why" is
- voice recognition/assistance tech still maturing
- major players are insisting that the tech supports their walled gardens
- price points are still a problem
The last two create a conundrum: a lot of times tech prices come down by selling expensive stuff to rich people until the hardware becomes commoditized. But for a good voice assistant, you need a lot of up-front investment at scale. Unfortunately, the companies that are able to do this are also controlling the hardware that can use it, which limits its ability to spread and be useful.
This is why I think Mycroft is important to support:
1. If you can make voice assistant software open-source and plug-and-play, then it frees people up to tinker with form factors
2. Part of Mycroft's pitch to businesses is that they can make custom solutions. There are probably a thousand big businesses that might want to get into this space but don't want to rely on Amazon because they want to control the experience and not give up their data. Maybe Target wants to stick virtual assistants around their stores, or maybe a hospital wants to give tools for surgeons.
I also think there's an opportunity for voice control in home stereo, where someone decouples the speakers from everything else. It's still annoying to work with Bluetooth in 2022, and Sonos is still pricey, and another walled garden. I'd love to have a simple controller that connects a dumb speaker to Wi-Fi and lets me voice-control it to play music from a library of my choosing. That's not a thing yet, right?
> 1. If you can make voice assistant software open-source and plug-and-play, then it frees people up to tinker with form factors
For what it's worth, Google Assistant does have an open API to create new devices. It's not open source, but you can certainly experiment with your own custom form factors. There's even a tutorial:
Really? An Echo Dot is $25 on Amazon right now. Which, if you use it at all, is pretty reasonable. (To be sure, if I were using it for music to any degree, I'd probably get a model with better speakers.)
For music, I have an old phone connected to a stereo receiver. So it has voice control although I mostly pick a playlist or album manually.
That's a pretty weak orbit though if that's all I'm using it for. (I do use mine for music sometimes but it's actually connected to Apple Music.) I could switch to a different assistant tomorrow if I wanted to. I've literally never ordered anything by voice--and can't really see doing so.
It's not dying at all - it's an incredibly useful interaction style.
Those companies are failing to profit because they don't understand that a digital assistant needs to be working with me, locally, and not subverting my intent.
It just needs to be my device, and not a sales rep for google/amazon. I use voice controls all the time at home - it's astoundingly useful in all sorts of situations, and I'm not even disabled (where it's literally life changing in some cases).
A truly open platform stands a chance in the voice assistant space, as it could be adapted into forms that are useful beyond their current limited designs. Such useful forms probably are not as monetizeable as the current incarnations that invasively collect information about you and your family, so I very much doubt the big tech players will ever attempt to build these useful systems directly.
Unfortunately, Mycroft is not very open itself. Sure, most of the code is open and available, but I tried to contribute and found my PRs ignored for weeks. By the time they were finally ready to merge them, their poor response had caused me to lose interest in the project. At that time, they did not seem interested in cultivating a strong developer community around their core technology components; they were doing their thing, and they wanted the community to implement “skills”. I got the impression that the community could either get on board or stand aside and watch them work. For that reason alone, I feel fairly certain that this project will fail eventually as well, and their hardware will become yet another high-tech relic of a paperweight.
As a formerly enthusiastic kickstarter backer, I cannot recommend the Mycroft project as the basis for a product; you don’t own and can’t control the platform in any meaningful way (short of forking it). It might be a better choice than a closed platform, but not enough to make me want to put any money in it.
Why do you need to control it yourself in order for it to be valuable as an open-source project? Maybe the team has a specific vision for the product, and reading through PRs from random people online takes away from their limited resources.
Ours is a voice-driven music player for our kids. They love it. We have a YouTube Premium subscription just because of the Nest Minis we have in every room. Sometimes we ask it "What's the animal of the day?" or "Tell a story" or "What year was Abraham Lincoln born?" but mostly the Nest Hub Maxes we have are just photo slideshows, which we love, and sometimes we ask "What's the weather?"
That set of functionality alone makes them well worth the money for us.
Maybe you meant it like, "the voice assistant space isn't going to generate huge profits, and thus giant corporations will lose interest".
But even that is absurd. They will still have to do it as a loss leader. Maybe not Amazon — because they just ship us our toilet paper and protein bars and shit. They don't have an "ecosystem" (although they gave it a halfhearted try a few times).
But the chance that in 2032 people just like... don't have voice assistants? It's literally zero, barring an actual WWIII cataclysm reversion-to-barbarism event.
> doesn't work as well as people were hoping
Nothing does, until it does...
> little monetization
Yep, that might be right. But it doesn't necessarily mean the space is "dying". Just that it might not be amenable to oligopolization.
I will never have a voice assistant unless a completely open and self-hosted solution appears on the market. And with current patent landscape, that seems incredibly unlikely to happen before 2032.
OK, I can't keep reading this website any more tonight, but for fuck's sake you do realize that the submission you are commenting on is a completely open and self-hosted solution that is on the market, right?
Maybe one of the reasons behind this is that people use voice assistants, and search engines too in general, to look for information. Today, all that these products do is suggest instead of catering results. I believe this is one of the reasons people, or at least I, do not wish to use assistants. It feels that a computer is controlling my likes, dislikes and wishes while it should actually be me who controls computers.
It's certainly not a solution looking for a problem. It's a great way to deal with a number of minor daily tasks. Checking the time, setting timers, checking the weather/AQI, playing music, checking news headlines, etc.
There are a lot of things it's really not good for and people have tried them all I'm sure. But where it fails the hardest is being able to increase sales volume for Amazon or increase ad revenue for Google - the only path to monetization seems to be to force it in - and THAT is what is dying.
I always wanted a voice assistant but there's no way I'm having big tech listen in on me and my family 24/7 just to have one. Most non-tech family members I talked with about this share my opinion. THAT is why these assistants are failing.
On the other hand, Mycroft sounds like something people would actually want to use provided that it can operate locally and doesn't send any data outside the home.
It works a lot better than nothing. I use Siri and Alexa every day. If Alexa goes away, I’ll use Siri more, or find another. Siri was a little slow to catch up.
I think the story that you read simply says that it’s hard to monetize. You are inferring more than what the story says.
Voice assistants are here to stay.
I eagerly await the day when I can simply say "respond to this post" and then begin writing with my voice.
I want a voice assistant that passes my AI Turing test if you will. I want it open sourced like Mycroft too though. I don't care for having 17 speakers that start talking when they think I was talking to them. I wasn't.
My needs for a smart speaker are not really passing a Turing test. I want them to be automations. They need to get some things that an AI would do right, but there is a large step between an assistant that can do specific things and an AI that can talk about anything.
Would love to read about experiences actually using this (I mean Mycroft in general) — good, bad, or otherwise.
Also, though: why don't we have "text assistants"? Seems to me the process of deciphering spoken text is (or should be) entirely orthogonal to performing the actual task — changing the lighting, cranking up the AC/heat, arming the security perimeter, or whatever.
I think the reason is that voice recognition is hard and so far only the "BIGASS TECH!!!" corporations have been able to make it "mom or granny ready" — and they have no incentive to do that for free and let us make our own mash ups. They want to wall us into their ecosystems.
So from that standpoint, this looks pretty cool to me — even if the voice recognition isn't as good as the big three.
OTOH, to rebut my own point: I got the new Apple Watch Ultra and I noticed that I can map the side button to a "shortcut" (the Apple term for a script you create yourself to automate something) that just transcribes whatever I say, and sends it as text over SSH to any host I want. On my local LAN, the delivery time is well under 1000ms.
So that's getting pretty close to being able to use Siri as a generic voice recognizer, and then piping the input into whatever arbitrary/homebrew system I want.
To do it purely with voice though you have to be like "Hey Siri, do the funky chicken" (after naming the shortcut "do the funky chicken"). And then say the actual command phrase you want your home automation to do.
I played with Mycroft about two years ago. I had been using a couple of Google Home Minis for a while for the usual things (play Spotify, set timers, ask the weather, control lights around the house). They worked perfectly for that. At the time I decided to de-Google my life and take back my privacy so I went looking for something open source that would provide me more control of my data. I found Mycroft and played with it for a few months.
I was pretty excited about it. I bought a ReSpeaker 2.0, which is an embedded device that can run Linux and has a six microphone array. I designed a custom 3d-printed case to hold the ReSpeaker and a small speaker to make my own little "Jarvis" box (Iron-man reference).
My favorite part about the whole thing was the customization. I wrote a couple of skills to do some other things for me. For example, I could say "Where can I watch X?" and it would use an API to search for a TV show or movie to see where it was available on Netflix, Amazon Prime, Disney+, etc and let me know. It's always been annoying to go Google and try to figure out where I can watch something streaming online, but limited to only the services I currently subscribe to. I wrote another skill that tied into my couchpotato instance so I could say "Download the movie X" and it would go find it and download it. If it found multiple matches, it would read off the top few matches and let me choose the correct one. I even tied those skills together so if the first skill couldn't find a movie at one of my streaming services it would ask if I wanted to download it and I could simply say "yes". I also modified the code to use a custom text to speech API so I could configure Mycroft to use a custom voice.
It was all really cool and I had a lot of fun playing with it. The biggest problem I ran into was the wake word recognition. It worked mostly OK for me on the ReSpeaker from close range but I found as I moved away it went downhill. It was especially bad if I had my device playing music, which is possibly the most common thing I was using my Google Home mini for. I had hoped that the ReSpeaker would help with this, because it had the six microphone array and some built-in loopback hardware to try and cancel out any noise that was being generated by the ReSpeaker. So any sound output to the speakers would be looped back into the ReSpeaker and could be subtracted from the microphone's input. I found that I just couldn't get it to work well, though. I think the music was causing vibrations that were overloading the microphone array and causing it to be unable to hear me through the music. It's possible it could be improved with a better hardware design to help reduce vibration caused by the device's own speaker. Maybe it works better now, two years later. I think I had configured Mycroft to use Snowboy for wake-word recognition so I could name my Mycroft something else (Jarvis).
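For anyone curious what "subtracting the loopback from the microphone input" can look like in software, here is a minimal sketch using a normalized LMS adaptive filter in plain Python. It is an illustration only, not what the ReSpeaker hardware or Mycroft actually does; `nlms_cancel`, `power`, and the parameter values are names and numbers I made up. Note that a linear filter like this cannot undo clipping, so it would not help if the microphones really were being overloaded.

```python
import random

def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Remove an adaptive estimate of the speaker echo from the mic signal.

    mic: samples captured by the microphone (voice plus echo of the speaker)
    ref: loopback samples of what was sent to the speaker
    Returns the residual signal (ideally just the voice).
    """
    w = [0.0] * taps      # adaptive weights: current estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est  # what is left after subtracting the estimated echo
        norm = sum(b * b for b in buf) + eps
        # Nudge the weights toward whatever would have reduced this error.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "music": white noise sent to the speaker.
    ref = [rng.uniform(-1, 1) for _ in range(2000)]
    # Hypothetical room: the mic hears a delayed, attenuated copy of it.
    mic = [0.6 * (ref[n - 2] if n >= 2 else 0.0)
           + 0.3 * (ref[n - 3] if n >= 3 else 0.0)
           for n in range(len(ref))]
    out = nlms_cancel(mic, ref)
    print("echo power before:", power(mic[-500:]))
    print("echo power after: ", power(out[-500:]))
```

In this toy setup the echo path is short, linear, and noise-free, which is the easy case. A real room with a vibrating enclosure is much harder.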
One day the Mycroft installation just stopped working on my device after I hadn't touched it in a week or more and I never went back to figure out what was wrong. It's still sitting on the corner of my desk unplugged. If I could have got the wake-word recognition working reliably with music playing I think I would have used it a lot, but I wasn't able to at the time.
I just recently bought a smart watch with a built in "Alexa" app that allows you to send voice commands to your phone which get processed through the watch's official app. I'm instead using Gadgetbridge on Android to interface to the watch. Some kind hacker updated Gadgetbridge to add very basic support for my watch's microphone, allowing you to send the raw voice data to an external application. I'm hoping I'll be able to use this to revive my Mycroft instance and I'll just send voice commands to Mycroft from my watch/phone via a custom Android app/service. In theory, I'll be wearing the watch all the time anyway and having the microphone on my person and right next to my face should hopefully help with the speech-to-text and I won't have to worry about a wake word at all. I've only just barely started working on this, though.
I gave up on mycroft after a long wait and built my own with respeaker and picovoice. i have 2 of them with different wake words. imo it's way better and easier than snowboy. i dont understand why people give their data to amazon to set a timer :)
You are using picovoice as the assistant? Is it an entire solution for that? Or are you running a DIY Mycroft device with picovoice as the wake word detector? I'll have to check this out but I've been trying to stick with open source technologies where I can. I don't trust that a free tier will remain free forever, but it may be worth testing out.
Google assistant on your phone can accept text input. If you're on a relatively recent version of Android you should be able to long-press the home button, then tap the keyboard icon in the popup. Works the same as a voice prompt
A lot of assistant functionality is just getting data from the internet, which search engines already know how to present and format in a useful way.
If you need to go to a specific spot in the house to write some text that turns on a light it seems easier to just walk to an actual light switch? For general automation then I think there are some visual block-based configurators to set up triggers for smart appliances otherwise.
This is actually how Mycroft handles it, more or less.
The wakeword ("hey Mycroft") is done on-device, but everything you say after that is sent to a speech-to-text API. That text is then routed to the appropriate skill to handle. So when you're writing the skill you only worry about the content of that text
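Roughly, the flow described above could be sketched like this. Everything here is hypothetical: `fake_stt`, the `SKILLS` table, and the handler names are stand-ins for illustration, not Mycroft's actual skill API.

```python
import re

def timer_skill(match):
    """Handle a timer request using the pieces captured from the text."""
    return f"Timer set for {match.group('n')} {match.group('unit')}."

def weather_skill(match):
    """A real skill would call a weather service here."""
    return "Looking up the weather."

# Each entry pairs a pattern over the transcribed text with a handler.
SKILLS = [
    (re.compile(r"set a timer for (?P<n>\d+) (?P<unit>seconds|minutes|hours)"),
     timer_skill),
    (re.compile(r"\bweather\b"), weather_skill),
]

def fake_stt(audio):
    """Stand-in for the remote speech-to-text call.

    In this sketch the 'audio' is already a string; a real system would
    send recorded audio to an STT service and get text back.
    """
    return audio.lower().strip()

def handle_utterance(audio):
    """Transcribe what was said after the wake word and route it to a skill."""
    text = fake_stt(audio)
    for pattern, handler in SKILLS:
        match = pattern.search(text)
        if match:
            return handler(match)
    return "Sorry, I don't know how to do that."

if __name__ == "__main__":
    print(handle_utterance("Set a timer for 5 minutes"))
    print(handle_utterance("What's the weather?"))
```

The point is that the skill handlers only ever see text, so the same routing would work for typed input too.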
The recent comments on the Mycroft kickstarter [0], which was funded four years ago, indicate that the company is shipping preorders. However, only 10% of the units are going to their backers. Instead they are selling the units to new customers. If you are backer 2000, they might not fulfill your order for years to come, based on the production rate quoted there by their new CEO.
This is not a viable way to treat your original, most enthusiastic customers. They will go on forums like HN and bitterly complain, warning other potential customers not to invest in a company that clearly does not respect its users.
The pi isn't really fast enough to process the speech in real time. DeepSpeech by Mozilla was cited as an offline alternative to the Google speech API but it's difficult to set up with Mycroft and doesn't work very well (lack of data and lag - https://mycroft.ai/voice-mycroft-ai/). Because of this, Mozilla set up Common Voice (https://commonvoice.mozilla.org/en) to help build open datasets of voice recordings.
> The pi isn't really fast enough to process the speech in real time.
If you've got an iPhone... put it in to airplane mode so that it is local only. You'll note that Siri no longer works when you do this. However... open up the notes app and tap the microphone. Do some interesting text...
> Mister Smith said that he wanted a two by four and half of a pie.
If you don't have an iDevice, it transcribes this as:
> Mr. Smith said he wanted a 2 x 4 and 1/2 of a pie
That is without a network and done in real time. We can compare the relative processing capabilities of an iPhone and the RPi, but offline speech to text is feasible on a device of limited capabilities.
Yeah, but this is the closed source Apple implementation of speech to text versus Mozilla's abandoned DeepSpeech. I'm sure it's possible to get it working well on a pi but I don't have the time to create and maintain a personalised speech training set and then optimise the resultant models.
Fair 'nuff... though my point was more "even with an older model iPhone and no net connection, there is the ability to do speech to text (even with some interesting transformations of "two by four"), so it can be done locally."
> In order to provide an additional layer of privacy for our users, we proxy all STT requests through Mycroft's servers. This prevents Google's service from profiling Mycroft users or connecting voice recordings to their identities.
I didn't know the specifics of it; that has a lot more information and is an interesting read.
One of the bits in there caught my eye...
> We created a language-specific phonetic specification of the "Hey Siri" phrase. In US English, we had two variants, with different first vowels in "Siri"—one as in "serious" and the other as in "Syria." We also tried to cope with a short break between the two words, especially as the phrase is often written with a comma: "Hey, Siri." Each phonetic symbol results in three speech sound classes (beginning, middle and end) each of which has its own output from the acoustic model.
And the British version getting false positives on wake up with world politics.
The specifics of the wake up and that it's done with a ML model rather than a low power wake word chip akin to https://www.syntiant.com/post/syntiant-low-power-wake-word-s... is also interesting - and impressive that they were able to get it to be that low power.
DeepSpeech is very old software. Vosk works just fine https://github.com/alphacep/vosk-api. People even run tiny Whisper on Pi, though they have to wait ages.
Is there a good, affordable, open-ish dev platform with an array of far-field microphones? If not, would it be possible to teardown an Amazon Echo Dot and attach their microphone array to an ODROID Arm SBC? Might depend on whether the Alexa microphone array is using a dedicated audio processor chip for echo cancellation.
Why is this so damn expensive? I have plenty of good hardware available to run the compute for this thing. Just make a daemon/app architecture where I can use my phone as a microphone and run daemons on whatever hardware I need to control.
I just don't see this being worth the money. Hundreds of $ to make switching music slightly more convenient just seems like a colossal waste of money to me.
I think we're spoiled by Google and Amazon losing so much money for so long
For example, I noticed a couple years ago at the store that a regular featureless analog wall clock was more expensive than an Echo Dot
These guys need to pay for their software dev out of hardware sales and can't hope for a runaway success yet, so of course it'll cost a painfully lot more
I have used Mycroft for around 2-3 years on a Raspberry Pi with a microphone array and the quality is still nowhere near the level where I would give it to my mom or granny.
Depends on what it's used for imo. I've got the same setup and--apart from some sexism in the voice recognition--the basics work flawlessly. Once it's up and running it's trivial to set timers, check the weather, do basic unit conversions, etc...
Maybe once a month it freezes and it just needs to be restarted, anyone can do that who can get used to talking to a robot in the first place.
The more complicated stuff, I agree is not fit for people who aren't comfortable with a terminal. I also use it to control my lights and play spotify, but I'm only able to do that because I'm comfortable messing around with it and have the skills (and desire) to debug it when it breaks every other week.
It's nowhere near as polished as Alexa or similar, but it's good for the basics or for hobbyists who don't want to be spied on.
Yes. It reliably responds to the wakeword ("hey Mycroft") from men, and only responds about 50% of the time to women.
When my sister visits, it almost never responds to her for example.
And in my personal experience, I used to have a very deep voice and it always responded to me. I decided I would rather have a more androgynous voice and did some voice training to accomplish that and use a pretty neutral pitch. Now it only responds to my normal voice about 50% of the time, so I intentionally drop my voice an octave whenever I speak to it.
But somehow it's not just about pitch either. I have friends who are trans men, and speak in a deep voice but they still have trouble getting Mycroft to respond.
OK, I see. Really interesting, but I guess not surprising. We talk a lot about inherent/implicit bias these days, and that's probably an example of it.
I wonder if the Mycroft project people are aware of the issue?
Been looking for an open source tool like this for a while now - but to automate some home security stuff. All I need is good basic functionality anyways. So it's good to know it at least works.
The sexism part is horse manure though. A strong accusation on a likely small team on tight deadlines and budgets who cannot cater to everyone all at once, like Big Tech and their massive resources and teams.
I don't mean to say the devs have any ill intent. Just pointing out the reality it has trouble with feminine voices. Like veidr points out in response to my comments, implicit bias is a big issue. Probably they originally trained the wakeword model entirely on American cis men, so naturally it has trouble recognizing any voice outside of that norm. It's not an accusation, this is a very well documented pitfall of machine learning.
I'm very grateful to the Mycroft team because I love smart speakers but am not willing to sacrifice my privacy to such a degree as I would have to to use a google home or Alexa or anything like that. That does not mean I won't point out its flaws.
I installed their self-hosted Mimic 3 TTS the other day to add a voice to my home assistant on prem smart home. It sounds unbelievably good compared to the picotts crap I was using before. Pretty stoked. Now I want to try getting the voice assistant hooked up too.
I really liked the idea from an earlier post with continuous recording + Whisper for transcription + keyword based actions. The drawback is asynchronous execution of your actions, but that setup seems very flexible!
There was another post[0] on here today about Amazon losing $10B on Alexa this year. The only other big player is Google who I assume must also run the division at a big loss - at least on the hardware side of things as I've got loads of their devices dotted around the house most were given to me free or at a stupidly low cost (£20 each). Even the ones like this with a full colour screen I've only paid £49 for.
It's an interesting market that I don't think either has figured out too well. Anecdotally, I've not really seen them used for shopping or even really shopping lists and the search model doesn't seem quite as lucrative as if someone used their phone or computer to search.
The one thing they do seem to do well in is as an introduction to - and hub for - the smarthome but I struggle to see how that will make these big subsidies viable.
Which brings us back to this. A bit ahead of its time perhaps but if Amazon/Google pull back their subsidies then this kind of thing might be where we have to go. I'd be happier not using Google Home and have mostly moved to zigbee switches with home assistant to control now anyway. Maybe voice control was a bit of a flash in the pan?
Which is essentially a Raspberry Pi 4 with an LCD display.
However, not wanting to spend time and money integrating the hardware and the software and building an enclosure around it I'd say it's still a fair deal.
Amazon is merely selling devices at cost, which is why they're losing $10 billion a year on Alexa. All this is probably going to stop soon so people end up with useless devices. Hopefully hackers will be able to run Mycroft on them.
He didn't say it's a CRT, he said it's a 'CRT looking thing', which it definitely is with the thick extension behind the monitor.
Personally I do like the design but I'm quite fond of Fallout/Alien style retro displays so YMMV. As a personal assistant type thing the original vertical prototype with eyes seemed better though.
The key feature I haven't seen any of these opensource projects implement is microphone response coordination: If you have multiple microphones and speakers, which one responds?
My Google Homes are terrible at this: often one in another room responds, but at least it's only one. When I tried to run Genie (https://genie.stanford.edu/) I had multiple devices responding simultaneously. It was a disaster.
For me, this is the core feature that will enable me to swap out my corporate listening devices for an opensource, cloud-free alternative.
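One conceivable way to do this kind of coordination: every device that hears the wake word reports a score (say, detection confidence or signal level) to a coordinator on the LAN, and the coordinator lets only the best-scoring device in each short time window respond. The sketch below is just that idea in Python. `WakeReport`, `pick_responders`, and the 0.5 second window are my own assumptions, and this is not how Google Home or Genie are known to work.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WakeReport:
    device: str
    heard_at: float  # seconds, on a clock shared by all devices (assumed)
    score: float     # e.g. wake-word confidence or signal level (assumed)

def pick_responders(reports, window=0.5):
    """Return one winning device per wake-word event.

    Reports whose timestamps fall within `window` seconds of the first
    report in a group are treated as the same spoken wake word, and the
    device with the highest score in that group is the only one allowed
    to respond.
    """
    winners = []
    group = []
    for report in sorted(reports, key=lambda r: r.heard_at):
        if group and report.heard_at - group[0].heard_at > window:
            winners.append(max(group, key=lambda r: r.score).device)
            group = []
        group.append(report)
    if group:
        winners.append(max(group, key=lambda r: r.score).device)
    return winners

if __name__ == "__main__":
    reports = [
        WakeReport("kitchen", 10.00, 0.62),
        WakeReport("living-room", 10.05, 0.91),
        WakeReport("bedroom", 10.10, 0.40),
        WakeReport("kitchen", 42.00, 0.70),
    ]
    print(pick_responders(reports))
```

The hard parts in practice are the ones this skips: getting comparable scores out of different microphones, and keeping the decision fast enough that the response doesn't feel laggy.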
I hope this project makes it into people's homes, but I really doubt it. Google invests so many resources into their Assistant (a recruiter recently told me they have over 1000 vacancies). Given that Google has its own very advanced and very efficient cloud infrastructure, their own ML processors and an army of devs and AI scientists, their assistant will always be cheaper and "smarter" than a device built by a small company on top of an open source project.
I have one of these devices. I'm still a bit mystified about what I want to use it for. But, after reading these threads, I've now got a few ideas that make me very excited.
I have come to despise the Google/Alexa/Siri devices. I'll explain why.
I hate that Google Home devices always give you a direct answer when you ask a question. The reason I hate this is because my kids use it to get an answer, without any work, without any thinking, without any consideration that there might be context to the answer. If they were to research and read about the question, they would learn so much more. But, they, like all people, want a simple and compact answer. And, I'm sure Google engineers have their RSUs tied to some KPI that says "make answers as simple and compact" so it will never come out any way other than this from Google.
I hate that Google permits my kids to play the same damn song over and over again. (Cue sentimental music...). In my day, we listened to the radio and it might have been bad for my dad for five minutes and he scowled the whole way as I enjoyed some utterly awful pop song, but then that song ended and he didn't have to listen to it for a few hours. Modern radio is worse, but at least you can take a break for an hour before they play (and are paid to play) the same song over and over.
I hate the surveillance aspect of Google. I don't want to have profiles generated of my kids such that when Google revenues dip in a few years they are enticed by an offer from that shady insurance conglomerate that really wants to know whether any of them discussed depression or racism.
So, if I can use a Mycroft device to:
* Permit them to ask questions, but give them answers in a way they have to dig and think and explore, that would be really cool. I'm sure this isn't easy, but it will never happen with Google/Alexa/Siri because they only care about MONETIZING those interactions.
* Give me more control over how media is consumed. The people working at YouTube will never have a KPI for "make sure you can only play one song per hour" and Google Home will never have that KPI, so it will never happen. That will never be something they can MONETIZE. It seems like it will be a lot more challenging to get music onto my Mycroft, but I prefer to play Jazz radio and because there still are live streams, I think you could get off the YouTube/Spotify/Amazon music train anyway. I got rid of so much of my music, but you can play shared files: https://mycroft-ai.gitbook.io/mark-ii/basic-commands#jukebox
* Forget the worries about surveillance. Mycroft right now uses Google for speech to text, but it can anonymize it enough for me not to worry as much.
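The "one song per hour" rule from the second bullet is simple to express once you control the player yourself. A minimal sketch, assuming a player that asks for permission before each play; `RepeatGuard` and `try_play` are hypothetical names and not part of Mycroft.

```python
import time

class RepeatGuard:
    """Refuse to replay the same song until a cooldown has passed."""

    def __init__(self, cooldown_seconds=3600, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock       # injectable so the rule can be tested
        self.last_played = {}    # song id -> time it last started playing

    def try_play(self, song_id):
        """Return True and record the play if allowed, otherwise False."""
        now = self.clock()
        last = self.last_played.get(song_id)
        if last is not None and now - last < self.cooldown:
            return False
        self.last_played[song_id] = now
        return True

if __name__ == "__main__":
    guard = RepeatGuard()
    print(guard.try_play("that-one-song"))  # first request
    print(guard.try_play("that-one-song"))  # immediate repeat
```

A skill could call `try_play` before handing the track to whatever backend actually plays audio, and say something like "you just heard that one" when it returns False.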
Mycroft was the first thing I thought of when I read the first press releases from OpenAI about Whisper! Mycroft has historically used Google for voice recognition. Exciting to see self-hosted alternatives like Whisper coming out.
[1]: https://old.reddit.com/r/Mycroftai/comments/yitzzk/mycroft_m...