AI, Minds, Philosophy, Politics

A.I. safety with Democracy?

Common path of discussion:

Alice: A.I. can already be dangerous, even though it’s currently narrow intelligence only. How do we make it safe before it’s general intelligence?

Bob: Democracy!

Alice: That’s a sentence fragment, not an answer. What do you mean?

Bob: Vote for what you want the A.I. to do 🙂

Alice: But people ask for what they think they want instead of what they really want — this leads to misaligned incentives/paperclip optimisers, or pathological focus on universal instrumental goals like money or power.

Bob: Then let’s give the A.I. to everyone, so we’re all equal and anyone who tells their A.I. to do something daft can be countered by everyone else.

Alice: But that assumes the machines operate at the same speed we do. If an A.G.I. can be made by duplicating a human brain’s connectome in silicon — mapping synapses to transistors — then even with no further Moore’s Law, an A.G.I. would be out-pacing our thoughts by the same margin a pack of wolves outpaces continental drift, while occupying the volume of a few dozen grains of sand (rough numbers in the sketch after the dialogue).

Because we’re much too slow to respond to threats ourselves, any helpful A.G.I. working to stop a harmful A.G.I. would have to know what to do before we told it; yet if we knew how to make them work like that, we wouldn’t need the defender at all, as every A.G.I. would stop itself from doing anything harmful in the first place.

Bob: Balance of powers, just like governments — no single A.G.I. can get too big, because all the other A.G.I. want the same limited resource.

Alice: Keep reading that educational webcomic. Even in the human case (and we can’t trust our intuition about the nature of an arbitrary A.G.I.), separation of powers only works if you can guarantee that those who seek power don’t collude. As humans collude, an A.G.I. (even one which seeks power only as an instrumental goal for some other cause) can be expected to collude with other similar A.G.I. (“A.G.I.s”? How do you pluralise an initialism?)
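(A rough back-of-the-envelope for Alice’s speed claim, outside the dialogue. Every constant here is an assumed round number for illustration, not a sourced figure; the two ratios don’t match exactly, but the flavour of the gap is the point.)

```python
# Assumed round numbers only: roughly how big is the speed gap?
neuron_firing_hz = 200                     # assumed peak firing rate of a biological neuron
transistor_switch_hz = 2e9                 # assumed switching rate of commodity silicon
wolf_speed_m_per_s = 15                    # assumed sprint speed of a wolf
drift_m_per_s = 0.03 / (365 * 24 * 3600)   # assumed continental drift of ~3 cm per year

print(f"silicon vs. brain: ~{transistor_switch_hz / neuron_firing_hz:.0e}x")  # ~1e+07
print(f"wolves vs. drift:  ~{wolf_speed_m_per_s / drift_m_per_s:.0e}x")       # ~2e+10
```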


There’s probably something that should follow this conversation, but I don’t know what, as real conversations usually go stale well before my final Alice response (and even that might have been too harsh and conversation-stopping; I’d like to dig deeper and find out what happens next).

I still think we ultimately want “do what I meant, not what I said”, but at the very least that’s really hard to specify, and at worst I’m starting to worry that some (too many?) people may be unable to cope with the possibility that some of the things they want are incoherent or self-contradictory.

Whatever the solution, I suspect that politics and economics both have a lot of lessons available to help the development of safe A.I. — both limited A.I. that currently exists and also potential future tech such as human-level general A.I. (perhaps even super-intelligence, but don’t count on that).

AI, Philosophy

Unfortunate personhood tests for A.I.

What if the only way to tell whether a particular A.I. design is or is not a person is to subject it to all the types of experience — both good and harrowing — that we know impact the behaviour of the only example of personhood we all agree on, and see if it changes in the same way we change?

Is it moral to create a digital hell for a thousand, if that’s the only way to prevent carbon chauvinism/anti-silicon discrimination for a billion?

Futurology, Science, AI, Philosophy, Psychology

How would you know whether an A.I. was a person or not?

I did an A-level in Philosophy. (For non-UK people, A-levels are a 2-year course that happens after high school and before university.)

I did it for fun rather than good grades — I had enough good grades to get into university, and when the other A-levels required my focus, I was fine putting zero further effort into the Philosophy course. (Something which was very clear when my final results came in).

What I didn’t expect at the time was that the rapid development of artificial intelligence in my lifetime would make it absolutely vital that humanity develops a concrete and testable understanding of what counts as a mind, as consciousness, as self-awareness, and as a capability to suffer. Yes, we already have that problem in the form of animal suffering and whether meat can ever be ethical, but the problem that already exists, exists only for our consciences — the animals can’t take over the world and treat us the way we treat them. An artificial mind, by contrast, would be almost totally pointless if it were as limited as an animal, and the general aim is quite a lot higher than that.

Some fear that we will replace ourselves with machines which may be very effective at what they do, but don’t have anything “that it’s like to be”. One of my fears is that we’ll make machines that do “have something that it’s like to be”, but who suffer greatly because humanity fails to recognise their personhood. (A paperclip optimiser doesn’t need to hate us to kill us, but I’m more interested in the sort of mind that can feel what we can feel).

I don’t have a good description of what I mean by any of the normal words. Personhood, consciousness, self awareness, suffering… they all seem to skirt around the core idea, but to the extent that they’re correct, they’re not clearly testable; and to the extent that they’re testable, they’re not clearly correct. A little like the maths-vs.-physics dichotomy.

Consciousness? Versus what, subconscious decision making? Isn’t this distinction merely system 1 vs. system 2 thinking? Even then, the word doesn’t tell us what it means to have it objectively, only subjectively. In some ways, some forms of A.I. look like system 1 — fast but error-prone, based on heuristics; while other forms look like system 2 — slow and careful, deliberately weighing all the options.

Self-awareness? What do we even mean by that? It’s absolutely trivial to make an A.I. aware of its own internal states; it’s even necessary for anything more than a perceptron. Do we mean a mirror test? (Or a non-visual equivalent for non-visual entities, including both blind people and smell-focused animals such as dogs.) That, at least, can be tested.

Capability to suffer? What does that even mean in an objective sense? Is suffering equal to negative reinforcement? If you have only positive reinforcement, is the absence of reward itself a form of suffering?
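(A minimal sketch of why the sign of a reward number may not settle this: for a simple one-step chooser, shifting every reward by a constant changes nothing about which action it prefers, so “negative reinforcement” and “mere absence of reward” can describe the same behaviour. The scenario and numbers below are my own toy assumptions.)

```python
# Toy example: two reward schemes for the same three actions, differing only by a constant shift.
rewards_with_punishment = {"flee": -1.0, "wait": 0.0, "eat": +1.0}
rewards_positive_only = {k: v + 1.0 for k, v in rewards_with_punishment.items()}

# A reward-maximising chooser ranks the actions identically under both schemes,
# so the presence of negative numbers alone doesn't pick out "suffering".
rank = lambda rewards: sorted(rewards, key=rewards.get, reverse=True)
print(rank(rewards_with_punishment))  # ['eat', 'wait', 'flee']
print(rank(rewards_positive_only))    # ['eat', 'wait', 'flee']
```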

Introspection? As I understand it, the human psychology here is that we don’t really introspect; we use system 2 thinking to confabulate justifications for what system 1 thinking made us feel.

Qualia? Sure, but what is one of these as an objective, measurable, detectable state within a neural network, be it artificial or natural?

Empathy or mirror neurons? I can’t decide how I feel about this one. At first glance, if one mind can feel the same as another mind, that seems like it should capture the general, ill-defined concept I’m after… but then I realised I don’t see why that would follow, and had the temporarily disturbing mental image of an A.I. which can perfectly mimic the behaviour corresponding to the emotional state of someone it’s observing, without actually feeling anything itself.

And then the disturbance went away as I realised this is obviously, trivially possible, because even a video recording fits that definition… or, hey, a mirror. A video recording somehow feels like it’s fine: it isn’t “smart” enough to be imitating, merely accurately reproducing. (Now that I think about it, is there an equivalent issue with the mirror test?)

So, no, mirror neurons are not enough to be… to have the qualia of being consciously aware, or whatever you want to call it.

I’m still not closer to having answers, but sometimes it’s good to write down the questions.

AI, Philosophy

Nietzsche, Facebook, and A.I.

“If you stare into The Facebook, The Facebook stares back at you.”

I think this fits the reality of digital surveillance much better than it fits the idea Nietzsche was trying to convey when he wrote the original.

Facebook and Google look at you with an unblinking eye; they look at all of us they can reach, even those without accounts; two billion people on Facebook, their every keystroke recorded, even the ones they delete; every message analysed, even those never sent; every photo processed, even those kept private; on Google Maps, every step taken or turn missed, every place where you stop, becomes an update for the map.

We’re lucky that A.I. isn’t as smart as a human, because if it were, such incomprehensible breadth and depth of experience would make Sherlock look like an illiterate child raised by wild animals. Even without hypothesising new technologies that a machine intelligence may or may not invent, even just a machine that does exactly what it’s told by its owner… this dataset alone ought to worry anyone who fears the thumb of a totalitarian micro-managing their life.

Philosophy

Normalised, n-dimensional, utility monster

From https://en.wikipedia.org/wiki/Utility_monster:

Utilitarian theory is embarrassed by the possibility of utility monsters who get enormously greater sums of utility from any sacrifice of others than these others lose … the theory seems to require that we all be sacrificed in the monster’s maw, in order to increase total utility.

How would the problem be affected if all sentient beings had their utility functions normalised into the same range, say -1 to +1, before comparisons were made?

Example 1: 51% (this is not a Brexit metaphor) of a group gained maximum possible normalised utility, +1, from something that caused the other 49% maximum possible normalised anti-utility, -1. Is that ethical? Really? My mind keeps saying “in that case look for another solution”, and so I have to force myself to remember that this is a thought experiment where there is no alternative to do-or-do-not… I think it has to be ethical if there really is no alternative.

Example 2: Some event causes 1% to experience +1 normalised utility while the other 99% experience -0.01 normalised utility each (totalling -0.99). This is the reverse of the plot of Doctor Who: The Beast Below. Again, my mind wants an alternative, but I think it’s valid: “shut up and multiply” is correct here.
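(Putting both examples into numbers. A population of 100 and a straight sum are my own assumptions here, just to make the percentages concrete.)

```python
# Example 1: 51 people at +1, 49 people at -1.
example_1 = 51 * (+1.0) + 49 * (-1.0)
print(example_1)              # 2.0 -> net positive, so the summing rule says "do it"

# Example 2: 1 person at +1, 99 people at -0.01 each (the -0.99 total from above).
example_2 = 1 * (+1.0) + 99 * (-0.01)
print(round(example_2, 2))    # 0.01 -> barely net positive, same verdict
```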


Even if that worked, it’s not sufficient.

If you consider utility to be a space, where each sentient being is their own axis, how do you maximise the vector representing total utility? If I understand correctly, there’s no natural > or < operator for vectors of even two dimensions, only a partial ordering. Unless you perform some function that collapses all utilities together, you cannot have Utilitarianism for more than one single sentient being within a set of interacting sentient beings — that function, even if it’s just “sum” or “average”, is your “ethics”: Utilitarianism is no more than “how to not be stupid”.
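(A two-person toy case of that last point; the numbers are arbitrary and the collapsing functions are just the obvious candidates. Componentwise comparison leaves the two outcomes incomparable, and different collapsing functions rank them in opposite directions, which is exactly the sense in which the chosen function carries the ethics.)

```python
# Utility vectors for two sentient beings (person A, person B).
u = (3.0, 0.0)   # great for A, nothing for B
v = (1.0, 1.0)   # modest for both

# Componentwise (Pareto) comparison is only a partial order: neither vector dominates.
pareto_geq = lambda a, b: all(x >= y for x, y in zip(a, b))
print(pareto_geq(u, v), pareto_geq(v, u))   # False False -> incomparable

# Each collapsing function induces its own total order, and here they disagree:
print(sum(u) > sum(v))    # True  -> "sum" ethics prefers u
print(min(u) > min(v))    # False -> "maximin" ethics prefers v
```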
