Philosophy

Dot product morality

The first time I felt confused about morality was as a child. I was about six, and saw a D&D-style role-playing magazine; on the cover, two groups were preparing to fight, one dressed as barbarians, the other as soldiers or something¹. I asked my brother “Which are the goodies and which are the baddies?”, and I couldn’t understand him when he told me neither of them were.

Boolean.

When I was 14 or so, in the middle of a Catholic secondary school, I discovered neopaganism; instead of the Bible and the Ten Commandments, I started following the Wiccan Rede (if it doesn’t hurt anyone, do what you like). Initially I still suffered from the hubris of black-and-white thinking, even though I’d witnessed others falling into that trap and thought poorly of them for it, but eventually my exposure to alternative religious and spiritual ideas made me recognise that morality is shades of grey.

Float.

Because of the nature of the UK education system, between school and university I spent two years doing A-levels, and one of the subjects I studied was philosophy. Between repeated failures to prove god exists, we covered ethics — specifically emotivism, AKA the hurrah/boo theory, which claims there are no objective morals, and that claims about them are merely expressions of emotional attitudes. The standard response at this point is to insist that “murder is wrong” is objective, at which point someone demonstrates that plenty of people disagree with you about what counts as murder (abortion, execution, deaths in war, death by dangerous driving, meat, that sort of thing). I don’t think I understood it at that age, any more than I understood my brother saying “neither” when I was six; it’s hard to be sure after so much time.

Then I encountered complicated people. People who could be incredibly moral on one axis, and monsters on another. I can’t remember the exact example that showed it first, but I have plenty to choose from now — on a national scale, the British Empire did a great deal to end slavery, yet acted in appalling ways to many of the people under its rule; on an individual scale, you can find scandals for Gandhi and Churchill, not just obvious modern examples of formerly-liked celebrities like Kevin Spacey and Rolf Harris. In all cases, saying someone is “evil” or “not evil”, or even “0.5 on the 0–1 evil axis”, is misleading — you can trust Churchill 100% to run 1940 UK while simultaneously refusing to trust him (0% trust) to care about anyone who wasn’t a white Protestant, though obviously your percentages might be different.

I’ve been interested in artificial intelligence and artificial neural networks for longer than I’ve been able to follow the maths. When you, as a natural neural network, try to measure something, you do so with a high-dimensional vector space of inputs (well, many such spaces, layered so that the outputs of one layer become the inputs of the next), and that includes morality.

When you ask how moral someone else is, how moral some behaviour is, what you’re doing is essentially a dot-product of your moral code with their moral code. You may or may not filter that down into a single “good/bad” boolean afterwards — that’s easy for a neural network, and makes no difference.
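The dot-product framing can be sketched in a few lines of Python. The axes and all the numbers below are made up purely for illustration — no claim is intended about how a real moral code decomposes:

```python
import math

def moral_similarity(yours, theirs):
    """Normalised dot product (cosine similarity) of two moral-axis vectors."""
    dot = sum(a * b for a, b in zip(yours, theirs))
    norm = math.sqrt(sum(a * a for a in yours)) * math.sqrt(sum(b * b for b in theirs))
    return dot / norm

# Hypothetical axes: [wartime leadership, honesty, care for the out-group]
my_code = [0.6, 0.9, 0.9]
their_code = [1.0, 0.7, 0.1]

score = moral_similarity(my_code, their_code)  # lands somewhere in [-1, 1]
verdict = score > 0.5  # the optional collapse to a single good/bad boolean
```

Judging per-axis — trusting someone 100% on one axis and 0% on another — is just inspecting the components instead of collapsing them into one number.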

¹ I can’t remember exactly, but it doesn’t matter.

Standard
Philosophy, Psychology

A life’s work

There are 2.5 billion seconds in a lifetime and (as of December 2018) 7.7 billion humans on the planet.

If you fight evil one-on-one, if you refuse to pick your battles, if only 1% of humans are sociopaths, you’ve got 21 waking seconds per opponent — and you’ll be fighting your whole life, from infancy to your dying breath and from when you wake to when you sleep, with no holiday, no weekends, no retirement.

Conversely, if you are a product designer, and five million people use your stuff once per day, every second you save them saves a waking lifetime of waiting per year. If you can relieve a hundred thousand people of just 5 minutes anxiety each day (say, about social media notifications), you’re saving six and a half waking lifetimes of anxiety every year.
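The arithmetic above is easy to check; a quick sketch, assuming 16 waking hours in every 24 (which matches the 21-second figure):

```python
LIFETIME_SECONDS = 2.5e9           # total seconds in a lifetime
WAKING_FRACTION = 16 / 24          # assumption: 16 waking hours per day
waking_lifetime = LIFETIME_SECONDS * WAKING_FRACTION

humans = 7.7e9                     # December 2018 population
sociopaths = 0.01 * humans         # the hypothetical 1%
seconds_per_opponent = waking_lifetime / sociopaths           # ≈ 21.6

# Product designer: five million users, one second saved per day
saved_per_year = 5e6 * 1 * 365
lifetimes_saved = saved_per_year / waking_lifetime            # ≈ 1.1

# 100,000 people spared 5 minutes of anxiety per day
anxiety_saved_per_year = 1e5 * 5 * 60 * 365
anxiety_lifetimes = anxiety_saved_per_year / waking_lifetime  # ≈ 6.6
```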

When people complained about the cost of the Apollo programs, someone said Americans spent more on haircuts in the same time. How many Apollo programs of joy are wasted tapping on small red dots or waiting for them?

Minds, Philosophy, Psychology

One person’s nit is another’s central pillar

If one person believes something is absolutely incontrovertibly true, then my first (and demonstrably unhelpful) reaction is that even the slightest demonstration of error should demolish the argument.

I know this doesn’t work.

People don’t make Boolean-logical arguments, they go with gut feelings that act much like Bayesian-logical inferences. If someone says something is incontrovertible, the incontrovertibility isn’t their central pillar — when I treated it as one, I totally failed to change their minds.

Steel man your arguments. Go for your opponent’s strongest point, but make sure it’s what your opponent is treating as their strongest point, for if you make the mistake I have made, you will fail.

If your Bayesian prior is 99.9%, you might reasonably (in common use of the words) say the evidence is incontrovertible; someone who hears “incontrovertible” and points out a minor edge case isn’t going to shift your posterior odds by much, are they?

They do? Are we thinking of the same things here? I don’t mean things where absolute truth is possible (e.g. maths, although I’ve had someone argue with me about that in a remarkably foolish way too), I mean observations about reality, which are necessarily flawed. Flawed, and sometimes circular.

Concrete example, although I apologise to any religious people in advance if I accidentally nut-pick. Imagine a Bible-literalist Christian called Chris (who thinks only 144,000 will survive the apocalypse, and no I’m not saying Chris is a Jehovah’s Witness, they’re just an example of 144k beliefs) arguing with Atheist Ann, specifically about “can God make a rock so heavy that God cannot move it?”:

P(A) = 0.999 (Bayesian prior: how certain Chris’s belief in God is)
P(B) = 1.0 (Observation: the argument has been made and Ann has not been struck down)
P(B|A) = 0.99979 (Probability that God has not struck down Ann for blasphemy, given that God exists — In the Bible, God has sometimes struck down non-believers, so let’s say about 21 million deaths of the 100 billion humans that have ever lived to cover the flood, noting that most were not in the 144k)

P(A|B) = P(B|A)P(A)/P(B) = 0.99979×0.999/1.0 = 0.99879021

Almost unchanged.

It gets worse; the phrase “I can’t believe what I’m hearing!” means P(B) is less than 1.0. If P(B) is less than 1.0 but all the rest is the same:

P(B) = 0.9 → P(A|B) = P(B|A)P(A)/P(B) = 0.99979×0.999/0.9 = 1.1097669

Oh no, it went up! Also, a probability error: probability can never exceed 1.0! P>1.0 would be a problem if I were discussing real probabilities — if this were a maths test, it would fail (P(B|A) should be reduced correspondingly) — but people demonstrably don’t update their whole internal model at the same time: if we did, cognitive dissonance would be impossible. Depending on the level of the thinking (I suspect direct processing in synapses won’t do this, but that deliberative conscious thought can), we can sometimes fall into this trap, which neatly explains another observation: some people take the mere existence of people who disagree with them as a reason to believe even more strongly.
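Both updates can be reproduced directly (the numbers are the ones from the example above, not real statistics):

```python
p_a = 0.999            # prior: Chris's confidence that God exists
p_b_given_a = 0.99979  # God exists, yet Ann goes un-smitten

# Case 1: the observation is certain, P(B) = 1.0
posterior = p_b_given_a * p_a / 1.0   # 0.99879021 — almost unchanged

# Case 2: "I can't believe what I'm hearing!" makes P(B) < 1.0
broken_posterior = p_b_given_a * p_a / 0.9  # ≈ 1.11, not a valid probability
```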

Philosophy

Mathematical Universe v. Boltzmann Brains

I’m a fan of the Mathematical Universe idea. Or rather, I was. I think I came up with the idea independently of (and before) Max Tegmark, based on an old LiveJournal blog post of mine dated “2007-01-12” (from context, I think that’s YYYY-MM-DD, not YYYY-DD-MM).

Here’s what I wrote then, including typos and poor rhetorical choices:

Ouch, my mind hurts. I've been thinking about The Nature of Reality again. This time, what I have is the idea that from the point of view of current science, the universe can be described as a giant equation: each particle obeys the laws of physics, which are just mathematical formula. Add to this that an mathematical system can exist before anyone defines it (9*10 was still equal to 90 before anybody could count that high), and you get reality existing because its underlying definitions do not contradict each-other.

This would mean that there are a lot of very simple, for lack of a better word, "universes" along the lines of the one containing only Bob and Sarah, where Sarah is three times the age of Bob now, and will be twice his age in 5 years' time. But it would also mean that there are an infinite number of universes which are, from the point of view of an external observer looking at the behaviour of those within them, completely indistinguishable from this one; this would be caused by, amongst other things, the gravitational constant being represented by an irrational number, and the difference between the different universes' gravitational constants varies by all possible fractions (in the everyday sense) of one divided by Graham's number.

Our universe contains representations of many more simple ones (I've described a simple one just now, and you get hundreds of others "universes" of this type in the mathematics books you had at school); you cannot, as an outside observer, interfere with such universes, because all you end up with is another universe. The original still exists, and the example Sarah is still 15. In this sense of existence, the Stargate universe is real because it follows fundamental rules which do not contradict themselves. These rules are of course not the rules the characters within it talk about, but the rules of the Canadian TV industry. There may be another universe where the rules the characters talk about do apply, but I'm not enough of a Stargate nerd to know if they are consistent in that way.

The point of this last little diversion, is that there could be (and almost certainly is) a universe much more complex than this one, which contains us as a component. The question, which I am grossly unqualified to contemplate but tried anyway (hence my mind hurting), is what is the most complex equation possible? (Apart from "God" in certain senses of that word). All I feel certain of at the moment, is that it would "simultaneously" (if you can use that word for something outside of time but containing it) contain every possible afterlife for every possible subset of people.

Tomorrow I will be in Cambridge.

Since writing that, I found out about Boltzmann brains. Boltzmann brains are a problem because, if they exist at all, then it is (probably) overwhelmingly likely that you are one; and if you are one, then it’s overwhelmingly likely that you’re wrong about everything leading up to the belief that they exist — so any belief in them has to be irrational even if it’s also correct.

Boltzmann brains appear spontaneously in systems which are in thermal equilibrium for long enough (“long enough” being around 10^10^50 years for one to form from quantum fluctuations), but if you have all possible universes then you have an infinite number of universes where Boltzmann brains are the most common form of brain — therefore, all the problems that apply to Boltzmann brains must also apply to the Mathematical Universe.

AI, Minds, Philosophy, Politics

A.I. safety with Democracy?

Common path of discussion:

Alice: A.I. can already be dangerous, even though it’s currently narrow intelligence only. How do we make it safe before it’s general intelligence?

Bob: Democracy!

Alice: That’s a sentence fragment, not an answer. What do you mean?

Bob: Vote for what you want the A.I. to do 🙂

Alice: But people ask for what they think they want instead of what they really want — this leads to misaligned incentives/paperclip optimisers, or pathological focus on universal instrumental goals like money or power.

Bob: Then let’s give the A.I. to everyone, so we’re all equal and anyone who tells their A.I. to do something daft can be countered by everyone else.

Alice: But that assumes the machines operate at the same speed we do. If we assume that an A.G.I. can be made by duplicating a human brain’s connectome in silicon — mapping synapses to transistors — then even with no more Moore’s Law, an A.G.I. would out-pace our thoughts by the same margin that a pack of wolves outpaces continental drift (while occupying the volume of a few dozen grains of sand).

Because we’re much too slow to respond to threats ourselves, any helpful A.G.I. working to stop a harmful A.G.I. would have to know what to do before we told it; yet if we knew how to make them work like that, then we wouldn’t need to, as all A.G.I. would stop themselves from doing anything harmful in the first place.

Bob: Balance of powers, just like governments — no single A.G.I. can get too big, because all the other A.G.I. want the same limited resource.

Alice: Keep reading that educational webcomic. Even in the human case (and we can’t trust our intuition about the nature of an arbitrary A.G.I.), separation of powers only works if you can guarantee that those who seek power don’t collude. Since humans do collude, an A.G.I. (even one which seeks power only as an instrumental goal for some other cause) can be expected to collude with other similar A.G.I. (“A.G.I.s”? How do you pluralise an initialism?)


There’s probably something that should follow this, but I don’t know what, as real conversations usually go stale well before my final Alice response (and even that might have been too harsh and conversation-stopping; I’d like to dig deeper and find out what happens next).

I still think we ultimately want “do what I meant, not what I said”, but at the very least that’s really hard to specify, and at worst I’m starting to worry that some (too many?) people may be unable to cope with the possibility that some of the things they want are incoherent or self-contradictory.

Whatever the solution, I suspect that politics and economics both have a lot of lessons available to help the development of safe A.I. — both limited A.I. that currently exists and also potential future tech such as human-level general A.I. (perhaps even super-intelligence, but don’t count on that).

AI, Philosophy

Unfortunate personhood tests for A.I.

What if the only way to tell if a particular A.I. design is or is not a person, is to subject it to all the types of experience — both good and harrowing — that we know impact the behaviour of the only example of personhood we all agree on, and seeing if it changes in the same way we change?

Is it moral to create a digital hell for a thousand, if that’s the only way to prevent carbon chauvinism/anti-silicon discrimination for a billion?

AI, Futurology, Philosophy, Psychology, Science

How would you know whether an A.I. was a person or not?

I did an A-level in Philosophy. (For non-UK people: A-levels are a two-year course taken after high school and before university.)

I did it for fun rather than good grades — I had enough good grades to get into university, and when the other A-levels required my focus, I was fine putting zero further effort into the Philosophy course. (Something which was very clear when my final results came in).

What I didn’t expect at the time was that the rapid development of artificial intelligence in my lifetime would make it absolutely vital for humanity to develop a concrete and testable understanding of what counts as a mind, as consciousness, as self-awareness, and as the capability to suffer. Yes, we already have that problem in the form of animal suffering and whether meat can ever be ethical, but that existing problem exists only for our consciences: the animals can’t take over the world and treat us the way we treat them. An artificial mind, by contrast, would be almost totally pointless if it were as limited as an animal, and the general aim is quite a lot higher than that.

Some fear that we will replace ourselves with machines which may be very effective at what they do, but don’t have anything “that it’s like to be”. One of my fears is that we’ll make machines that do “have something that it’s like to be”, but who suffer greatly because humanity fails to recognise their personhood. (A paperclip optimiser doesn’t need to hate us to kill us, but I’m more interested in the sort of mind that can feel what we can feel).

I don’t have a good description of what I mean by any of the normal words. Personhood, consciousness, self awareness, suffering… they all seem to skirt around the core idea, but to the extent that they’re correct, they’re not clearly testable; and to the extent that they’re testable, they’re not clearly correct. A little like the maths-vs.-physics dichotomy.

Consciousness? Versus what, subconscious decision making? Isn’t this distinction merely system 1 vs. system 2 thinking? Even then, the word doesn’t tell us what it means to have it objectively, only subjectively. In some ways, some forms of A.I. look like system 1 (fast, but error-prone, based on heuristics), while other forms look like system 2 (slow, careful, deliberatively weighing all the options).

Self-awareness? What do we even mean by that? It’s absolutely trivial to make an A.I. aware of its own internal states — indeed necessary for anything more than a perceptron. Do we mean a mirror test? (Or a non-visual equivalent for non-visual entities, including both blind people and smell-focused animals such as dogs.) That at least can be tested.

Capability to suffer? What does that even mean in an objective sense? Is suffering equal to negative reinforcement? If you have only positive reinforcement, is the absence of reward itself a form of suffering?

Introspection? As I understand it, the human psychology of this is that we don’t really introspect, we use system 2 thinking to confabulate justifications for what system 1 thinking made us feel.

Qualia? Sure, but what is one of these as an objective, measurable, detectable state within a neural network, be it artificial or natural?

Empathy or mirror neurons? I can’t decide how I feel about this one. At first glance, if one mind can feel the same as another mind, that seems like it should capture the general ill-defined concept I’m after… but then I realised I don’t see why that would follow, and had the temporarily disturbing mental image of an A.I. which can perfectly mimic the behaviour corresponding to the emotional state of someone it’s observing, without actually feeling anything itself.

And then the disturbance went away, as I realised this is obviously trivially possible, because even a video recording fits that definition… or, hey, a mirror. A video recording somehow feels like it’s fine; it isn’t “smart” enough to be imitating, merely accurately reproducing. (Now that I think about it, is there an equivalent issue with the mirror test?)

So, no, mirror neurons are not enough to be… to have the qualia of being consciously aware, or whatever you want to call it.

I’m still not closer to having answers, but sometimes it’s good to write down the questions.
