People hear or read the words Machine Learning and imagine a HAL-like technology.
It’s definitely hard to explain an idea like SVM, its applications, and how it works/what it does without a background in some linear algebra.
And it’s even harder to broach more complicated topics, like real-time temporal neural networks used in computer vision for something like anomaly detection.
From my conversations with people, because the field isn’t understood, there is almost a cognitive dissonance: they expect so much now, but at the same time deny the future application space when the conversation becomes more personal. “Why aren’t self-driving cars a thing yet...” and then “A machine couldn’t do what I do.”
It’s worrying that regulation is definitely coming to the field, while the people writing the laws and the people those writers represent have a hard time understanding what they are trying to regulate. Add in the fact that the tech itself is somewhat cheap and will only get less expensive. You only have to look as far as the Deep Fakes fiasco on Reddit: the guy superimposing celebrities onto porn is a self-proclaimed amateur and was able to accomplish quite a bit at home with Keras.
I do want to mention that a lot of the papers I’ve read recently on real-time anomaly detection rely on automated labeling rather than human labelers.
It’s only important to have human-readable labels if a human has to interpret them directly.
I don't actually think this is the case. The basic idea is that you can represent data as points in n-dimensional space and draw decision boundaries in that space. I think most people should at least be able to understand this geometrically for n=2/3 and then accept that it possibly extrapolates to n > 3.
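That geometric intuition can even be sketched in a few lines of plain Python. This is a toy nearest-centroid classifier, not a real SVM, but the implied decision boundary is the same kind of object: a hyperplane separating two regions of the space.

```python
# Toy illustration of the geometric idea: data points live in n-dimensional
# space, and a classifier draws a boundary between regions of that space.
# (Nearest-centroid here, not an actual SVM; the boundary it implies is the
# hyperplane equidistant from the two class centroids.)

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(x, c0, c1):
    """Label a point by which class centroid it is closer to.
    The set of points equidistant from c0 and c1 is a hyperplane:
    the decision boundary."""
    d0 = sum((xi - ci) ** 2 for xi, ci in zip(x, c0))
    d1 = sum((xi - ci) ** 2 for xi, ci in zip(x, c1))
    return 0 if d0 <= d1 else 1

class0 = [[0.0, 0.0], [0.5, 0.3], [0.2, 0.8]]   # cluster near the origin
class1 = [[3.0, 3.0], [3.5, 2.7], [2.8, 3.4]]   # cluster up and to the right

c0, c1 = centroid(class0), centroid(class1)
print(classify([0.1, 0.2], c0, c1))  # 0
print(classify([3.2, 3.1], c0, c1))  # 1
```

Nothing here depends on the points being 2-D: the same code works unchanged for n > 3, which is the "accept that it extrapolates" step.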
Most people can't reliably read a bus timetable, calculate a 10% tip, or multiply 537 by 12. A concept like SVM is absolute voodoo to the overwhelming majority of the population and always will be, no matter how you try to explain it.
I admit it's a failure of imagination on my part, but in what context would a human never need to interpret the result? It seems to me that for anything with human consequences, there's going to still be a human in the loop somewhere.
I also want to add that there are more traditional avenues by which unlabeled data can be used.
Unsupervised learning, where you read in arbitrary data and cluster it for some sort of probability-based system.
Another good thing to look at is a system with “some” labeling, like semi-supervised learning.
Any situation where you can do binary classification doesn’t need true labeling, as long as you can evaluate attributes of some object, complex or not.
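To make the unsupervised idea concrete, here is a tiny 1-D k-means in plain Python: no labels anywhere, the algorithm just groups values by proximity. (Purely illustrative; in practice you'd reach for a library implementation.)

```python
import random

def kmeans_1d(data, k=2, iters=20, seed=0):
    """Tiny 1-D k-means: group unlabeled values into k clusters by proximity."""
    random.seed(seed)
    centers = random.sample(data, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: each center moves to the mean of its cluster
        # (empty clusters keep their old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Raw unlabeled values with two obvious groups; no human told it that.
data = [1.0, 1.2, 0.8, 1.1, 9.8, 10.1, 10.0, 9.9]
print(kmeans_1d(data))  # roughly [1.025, 9.95]
```

The cluster assignments themselves can then serve as machine-generated labels for a downstream system, which is the sense in which human-readable labels become optional.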
> It’s definitely hard to explain an idea like SVM, its applications, and how it works/what it does without a background in some linear algebra.
I agree with your first point—there is a general lack of understanding about ML/AI, even among knowledgeable laypeople. But on your second point, I think this illustrates the tendency for technical folks (e.g. ML engineers) to overemphasize the importance of specific algorithms (e.g. SVMs, CNNs) and implementations or frameworks (e.g. TensorFlow). These are the things that are important to us, so we try to convey that when connecting with others. Even when I’m intentionally trying to simplify things to share enthusiasm and understanding with a non-technical audience, it is easy to slip into unintentionally alienating statements like, “machine learning is just matrix multiplication and gradient descent done on a GPU”. It might be a subconscious way of justifying our hard-won knowledge and its value.
But, I think it’s possible to have demystifying conversations that help people build a genuine understanding of ML/AI. And it’s also possible to do so in a way that instills a sense of fascination and respect for the field and the work, skill, and resources it involves to do well.
The themes of probability, statistics, and linear algebra can be honored and elucidated by discussing their core relevance:
Probability—What does a statement like “there is a 70% probability that this image is of a Fuji apple” mean? How does that differ from a statement about an event in the future like “there is a 70% probability that it will rain in London tomorrow?”. How do those probabilities change depending on factors that we can measure (conditional probability—the heart of statistics and ML)? What is an expected value and how does it relate to the ideas of risk and optimal decisions?
Statistics—What is a statistical model and what does it mean to “build” one? How much data do we need to collect for this building process, and in what format and subject to what assumptions and methodology? What are the inputs and outputs of a model that are relevant to my problem? What is “ground truth”, and how do we get enough examples of it with enough confidence? How do I know my model will actually work in the real world (generalization), and how bad is it if the model is wrong (will my users suffer an injustice or die, or just eat the wrong flavor of apple)? What are sampling bias and statistical bias, and how do they relate (or not) to bias in AI systems? What is a distribution? An anomaly? What are clusters, and how do we define whether two things are similar or not?
Linear algebra—How do we store “unstructured” data like an image or document so that a machine can work with it? How can we use math and computers to (efficiently) transform the data from input to output? What does it mean for a machine to “learn”? What is a tensor and why is it flowing? Wait, you want how much money to spend on graphics cards?
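The linear-algebra point above can be demonstrated in a couple of lines. Assuming NumPy, a tiny grayscale "image" is just a matrix of numbers, and transforming it is plain matrix arithmetic (the halving map here is an arbitrary illustrative choice):

```python
import numpy as np

# A 2x2 grayscale "image" is just a matrix of pixel intensities (0-255).
image = np.array([[0, 128],
                  [64, 255]], dtype=np.float64)

# Machines don't see apples or documents; they see arrays like this.
# "Transforming" the data is linear algebra: flatten to a vector and
# apply a linear map (here, a simple brightness halving).
v = image.reshape(-1)   # shape (4,): the image as a point in R^4
W = 0.5 * np.eye(4)     # a linear transformation: scale every pixel by 0.5
out = W @ v

print(out.tolist())  # [0.0, 64.0, 32.0, 127.5]
```

A "tensor" in the ML sense is just this idea generalized to more axes (height x width x color channels x batch), which is also why the arithmetic maps so well onto graphics cards.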
I get variants of the above (non-trivial) questions from interested but largely uninitiated stakeholders in government and business quite often. Approaching these conversations in a “big picture” way that respects people’s intelligence and curiosity is tremendously more rewarding and productive than getting down into eyeglaze-inducing technical/architectural rabbit holes.
If I'm a delivery driver roaming all about town with my camera on my dashboard which is tagging street signs and speed limits and other data, do I get a residual if that data is used to make billions of dollars? Do I even get paid .02/mile to collect it, or am I forced in order to have the gig in the first place?
A morally straightforward way to make a more equitable future would be to acknowledge how important it was to collect and classify all of this data, and to provide the humans doing that work with some part of the insane returns to scale that will be (and already have been) achieved by these organizations. Sure, you contributed a really, really small amount to the overall algorithm, but that small amount helps generate billions or trillions in profit, so maybe the compensation scales too, and keeps scaling over time after training is done.
I hope we see efforts to do this kind of thing.
Waymo released a tagged data set for self-driving:
The FT article is a little better:
"Its data, collected using cameras and sensors on Waymo vehicles in a variety of environments and road conditions, include 1,000 high-resolution driving scenes that have been “painstakingly labelled” to indicate the presence of 12m objects such as pedestrians, cyclists and signage."
Via Reddit user jaxbotme:
“I have concerns about the license on this, which I [voiced on twitter](https://twitter.com/johnluu/status/1164227419919642624) and got a reply from one of Waymo's comms people.
Overall, it's free as in freeware but not free/open the way you might expect a research dataset to be. Not only are you prohibited from redistributing it (not too uncommon, but still limiting), but researchers cannot publish trained models or their weights, partially or in whole, and researchers cannot use the models on a physical car apparently. There are strong guards here to prevent competition, but they also get in the way of researchers and I feel like it's not really honest to call this an 'open dataset'. If anyone has counter examples where this is actually standard practice, let me know. But I was disappointed in this move for a number of reasons. For one, it would literally be illegal for me to provide this data to someone on a metered or slow connection who cannot download the dataset themselves.”
There aren't many reasons an engineer would want to jump into the political fray besides altruism. It's like choosing to join the largest, most corrupt, bureaucratic, change-averse, and valueless company that ever existed. For lawyers, that probably sounds like a dream come true.
My perspective as an engineer: Yes, there are huge important problems I would like to help solve. No, I don't think I'm willing or capable of solving that class of problems in a system like that.
- Terrible consultants, middlemen, and lifers making progress too hard or impossible.
- Constantly changing or unclear requirements
- Making penny-wise, pound-foolish decisions for political reasons or budgeting idiosyncrasies
- Resistance from lawmakers, since it would give more government influence to people from states that lean towards the other party (put more bluntly, Republicans don't want to give a bunch of liberal or left-leaning people from California, Washington, and New York more influence)
- The government only seeing the value in surveillance/military software that people are likely to refuse to work on. Not sponsoring software that actually directly helps people domestically
What I think we need instead is something like the US Digital Service, along with some other government programs, rolled into a Department of Technology that will hopefully have a completely different culture, payscale, etc. compared to the rest of the government. And maybe even something like jury duty, but for software people, engineers, scientists, etc. so it's more representative of the general public and not the old-school government culture.
Can you elaborate on this? I hear lots of criticism around inept and ill-intentioned consultants and “govvies” alike, but I think a lot of it is a bit unfairly harsh. There are certainly some adverse incentives (chiefly that it’s easy enough to coast along with minimal effort in these jobs if you’re inclined), but I’d say the slight majority of individuals in this space are qualified, capable, and motivated to improve the world via good work—technical or otherwise. Slowly but surely, projects get done and positive change is made.
> Not sponsoring software that actually directly helps people domestically
This really isn’t true. Every US agency has technology initiatives, some more effective than others. See the work that the Department of Veterans Affairs does, as one example. The US government also spends ~$750B on grants annually, much of which goes to fundamental research as well as tech projects. If you really want to dig in, there’s an API to access data on every item of spending (e.g. contract awards, grants): https://www.usaspending.gov/#/
I can mostly only echo hearsay, however I have been tangentially involved in government-focused projects. The big issue I see with the government getting involved in tech is that they don't know what they really want, they don't know how to filter out bullshit/upsells, and they'll frequently overpay for things they don't need and underpay for things that would be very helpful.
I'm not one of those people who wants to defund the government or privatize everything. However, I do think the government incentive structure needs to be redesigned and become less cushy while simultaneously becoming more lucrative, accompanied by a huge culture change. The government isn't inherently inefficient; it's only like that because we let it be. One has only to look at other countries (see: Estonia, China) to see how much more effective we could be here.
Regarding actual spending, I want to measure that by its effectiveness, not by the total amount spent. The government could blow all its money on tech initiatives and EMR upgrades and have nothing to show for it, and I wouldn't call that helping people domestically. To address your examples, a lot of those grants are for defense contractors (which only indirectly helps people, at best, and is itself a pretty corrupt old-boys industry). Others are for things like direct academic research though, which I do think is decently well run in general, but isn't really a direct government relationship to tech.
These are good points (as were your original ones). Thanks for the thoughtful reply—I’m just trying to push back slightly on the “government/contractors are all incompetent” narrative where I see it, since I don’t think that’s the root of the problem (the DMV area is the most highly educated region of the US, as one counterpoint). Discussion around incentive structures is a lot more productive.
I didn’t give very specific examples with grants, but I just wanted to draw attention to that as an often overlooked and major source of government spending. The federal grants process does have multiple stages of project performance monitoring/evaluation, though that process itself could be made more efficient. Also, sometimes grants do directly support software projects with public benefits. Just as a single example, Stanford DeepDive (now Snorkel) is a system for knowledge base construction that has been used to fight human trafficking, among other things: http://deepdive.stanford.edu/showcase/apps
Do we need AI engineers passing laws or training courses for politicians?
Can the AI be treated like a black box and the regulation still be effective?
I think in some cases the answer would be yes. A simple thought experiment demonstrates this.
Consider regulation for AI in relation to job losses. Does one need to know the details of AI tech to develop regulation effectively? I would argue not.
The economic impacts can be analysed. At some tipping point, job losses lead to a cycle of: reduced demand -> reduced revenue in companies -> reductions in jobs -> reduced demand, and so on.
To regulate this, one would have to model the economic impacts as best as one could: understand what jobs are most likely to be lost, and in which industries; understand the key things to monitor (i.e. the replacement job rate, assuming jobs are created in other industries); and pass regulation to mitigate some of the risks or make sure the number of job losses never hits the tipping point. None of these items leads me to believe the person passing the regulation requires knowledge of AI at an engineer's level.
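The tipping-point argument can be made concrete with a toy simulation. Every number below is made up purely for illustration (this is not an economic model): jobs are lost to automation each round, some fraction are recreated elsewhere, and the loop either stabilizes or spirals depending on that replacement rate.

```python
def simulate(jobs, replacement_rate, automation_loss=0.05, steps=10):
    """Toy feedback loop with entirely illustrative parameters.
    The demand -> revenue -> jobs cycle is collapsed into a single
    per-round factor: jobs lost to automation, partially replaced."""
    for _ in range(steps):
        lost = jobs * automation_loss        # jobs automated away this round
        replaced = lost * replacement_rate   # jobs recreated in other industries
        jobs = jobs - lost + replaced
        # Fewer jobs -> less demand -> less revenue -> fewer jobs next round.
    return jobs

# Full replacement: the job base is stable. Low replacement: it decays
# round after round, which is the tipping-point dynamic a regulator
# would want to monitor.
print(round(simulate(1000.0, replacement_rate=1.0)))  # 1000
print(round(simulate(1000.0, replacement_rate=0.2)))  # 665
```

The point of the sketch is that the quantity a regulator needs to watch (the replacement rate) is an economic observable, not a property of any particular AI architecture.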
There is the Conant-Ashby theorem, or good regulator theorem, which states: “Every good regulator of a system must be a model of that system.”
In this instance the regulation is to deal with economic impacts, so an economic model is what's needed; while that model is affected by the tech, the tech itself does not need to be modelled. Thus my conclusion.
This poses an interesting question: could all eventualities be catered for in the regulation? There are other theories that could help us answer this question.
Happy to hear different or contradictory views.
You don't need to know the ins and outs of the mathematics to understand what encryption does, but you should be able to reason about how data is encrypted and what is necessary to decrypt, and what law provides you to request decryption. You sure as hell should also be able to grasp that, like the atomic bomb, once out in the world it's there to stay and you can't really prevent it from being used.
As for economic impact - I find that measure does not really get ahead of the problem. In the case of machine learning if you're regulating economic impact you're decades late. The first-movers already have the data sets, the training has occurred and now you're living in their world. I think it's important to understand a bit more upstream so that you can get ahead of these kinds of issues.
Furthermore, a baseline understanding of these algorithms, their potential biases, and the impacts of misclassification is important to ensure we don't compromise society through their misuse, or unintentionally shift scales in a terrible way. The earlier the better; economic impacts are probably a lagging indicator, or even worse, considered a huge boon for the areas that have companies doing this work while using labor from elsewhere to train their models.
Let's play some word games:
* Doctors : Disease, Ebola, Health Care costs
* Accountants: Corporate Taxes, Tax Loopholes, Economic Stats
* Programmers: Encryption, Websites, "Internet is Tubes", Infrastructure as a Service
* Army Generals: War against ISIS. War against Taliban. War against (any other enemy here).
* Civil Engineers: Bridge design and safety, Urban Smart Growth, Gentrification, Water Rights management across rivers.
* Energy Engineers: Power Plant construction, Nuclear Safety, Solar vs Coal investments.
* Farmers: Patent law applied to GMOs and ownership of genetics, legal insecticides and whether or not they are damaging the local ecology, legal limits to water consumption across various states and localities
Believe it or not: US politics involves more than just programmer issues. We care about programmer issues because Hacker News attracts programmers. But as soon as we leave the subject of programming, we (as a community) become blubbering idiots in the face of hospital issues, child-care laws, foreign policy, and tax fraud.
The entire point of a Congressman is to specialize in the ability to talk to a wide variety of experts. Congressmen are "experts at politics": the ability to make a decision WITHOUT being an expert in a subject.
Our job, as experts in our field, is to form a cohesive argument that Congresspeople understand. Well, that's the point of lobbying: hiring specialists who know how to talk to Congressmen, so that our interests are communicated more effectively.
And let's be frank: there are plenty of lobbyists who are actually representing technology's point of view at the moment. Google, Facebook, Microsoft, IBM, and other big-tech firms all ensure that Congress is keenly aware of the issues affecting our field.
Other fields are not quite as lucky as ours. I don't think we have any right to really complain about the Programmer's role in modern US politics.
Congressmen are experts at getting people to vote for them. The problem is that the people who step up to the challenge of winning this popularity contest aren't necessarily most qualified to be calling the shots.
I'd like to see more STEM workers in general running for office. They'd certainly pique my interest more than yet another lawyer.
All joking aside, I think there is a certain way of thinking that comes from a technical education that could be used more in government.
The Chinese government seems a lot more rational, and it's not just because they have authoritarian control. Our government could be far more rational too.
But that's not the important part of government. The important part of government is:
1. Identifying parties who might be affected by a decision.
2. Informing / talking with said parties.
3. Deciding if said parties have conflicts of interests / biases to consider.
4. Making a decision afterwards.
Expertise in a particular subject is... kind of not needed. There are always more experts out there. The only "expertise" needed is again: the politics. Being able to identify conflicts of interest and account for them.
Sure, a CEO doesn't need to be an engineer because the CEO doesn't do any engineering, but he understands the fundamental forces that shape the success of the engineering projects and he respects the process.
Like Steve Jobs?
> CEO with an engineering background
I mean, Lisa Su and Jensen Huang are great. But I'd argue that Elon Musk is terrible as a CEO.
I mean, you're asking the very basic Bill Gates (Programmer background) vs Steve Jobs (non-programmer) question. Obviously there's a difference, but I wouldn't necessarily say that the lack of engineering background hurt Steve Jobs.
I'm not entirely sure if an engineering background is necessary to be a good or bad CEO. The most important thing is being able to lead your workers. For some, it means being a technical leader, understanding the code and product.
For others, like Steve Jobs, it means being a visionary and making the right business choices. Another great non-technical CEO would be Warren Buffett: a great leader and visionary, able to predict the future of the American economy and place investments in the right places.
There are lots of ways to lead and inspire others. We programmers worship technical leaders, because that's our strength. But I'm aware of many other styles of leadership, and I respect those too (Steve Jobs, Warren Buffett).
The gap is certainly closing, but it's nowhere near where it needs to be to replace the need for humans. Maybe we'll get there soon, but it won't happen without some new approaches IMO.
Then again, I'm no longer active in the research aspect of the field, so it's hard to gauge how close we might be.
Unsupervised ML is different. From what I can tell (I work in ML and have extensive biology experience), human brains use a combination of approaches that blend unsupervised, supervised, and one-shot learning, and it's not clear that a true "learning AI" requires more technology than what we have today. I think that's an open question in the field, but nobody is really addressing it head-on.
Do you think a bird uses its wings any differently than an airplane's to fly?
Also while we are on the subject, is there a general intelligence AI you would like to introduce me to?
In a sense babies do this: they are cute and smile to engage parents and other adults so those adults engage with them, which increases their rate of learning.
The original idea makes much more sense in the context of the story; it's basically my head-canon now.