Why extrapolate?

This might be the most obvious point where my intuitions differ from my fellow researchers’. One day I was thinking about Eliezer’s coherent extrapolated volition proposal (CEV), and it occurred to me that though the coherence part seemed pretty manageable, the bit about extrapolation dynamics promised to be a philosophical nightmare. The idea seemed misguided in the first place. Give people what they wanted if they knew more? What if they don’t want to know more? Nerds are genuinely curious, but normal people, even smart normal people, have no problem with not understanding the universe. Is this a justified violation of people’s volitions? But I hadn’t paid attention to this sense of ugliness until I realized that not only would extrapolation feel wrong, it might not even be necessary in the first place. Why is it, again, that we can’t just give people what they want?

Here are some common objections to this proposal, each followed by my reply. You should probably read my previous posts before tackling this one; they may already address your objections, and my way of thinking about these things may make more sense than it first appears.

  • Humans might have silly wants. Let’s make it clear that I’m not talking about what humans think they want: another SUV, more money, a catgirl, whatever. Perhaps parts of them really do want these things, in which case the AI would deliver. But it’s more probable that these are things that are just convenient for fulfilling some other more terminal value of some of the mind’s algorithms, in which case the AI would provide the terminal value. Now, I personally am not too impressed with the average human. But I do not want to steal their volition and give them values that I think are less silly, simply because I’m smarter than them. I mean, I kind of do, and the AI will take into account that I want the world to be light and good and not like Idiocracy. But if I’m trying to not be a dick, then letting people have what they want seems like an okay idea. And even so I’m optimistic about humanity resolving its confusions.
  • What if they really do want to go to the Christian heaven? Then let them go to the Christian heaven! We have resources to spare. And if they get there and realize they’d have more fun somewhere else, well, the AI will keep track of their implicit preferences — even if the Christians would never explicitly announce they were dissatisfied with the paradise they’d been promised. And anyway, it sounded like a trippy place.
  • Humans might want to destroy themselves. I really don’t think this is likely. In the language of Buddhism, all beings have the potential for Enlightenment. The algorithms that have driven the amazing progress in the world over the last many thousands of years are in every properly functioning adult human. We have had suicidal tendencies on both an individual level and a cultural one, this is true. But this is mostly because there is suffering in the world that is unbearable, or because different parts of us disagree about how to stop the suffering. There would be no need to destroy ourselves, if we could have our wishes fulfilled. Having to survive in an evolutionary setting caused humans to acquire fairly robust drives. I’m not sure this generalizes to scenarios where their drives are actually fulfilled for once, but it’s worth noting that humans don’t eat pure salt even though it’s delicious and EEA-nutritious.
  • What if humans enter a hell universe but lose their minds and thus their preference for escaping? I don’t think this will happen, and I don’t want it to happen. In fact, I think the vast majority of people wouldn’t want this to happen to one another. If the AI is giving people what they want, then the AI won’t allow this to happen.
  • Imagine a man who has unknowingly worn a blindfold his entire life, and therefore nowhere in his brain is there a preference for removing this imposed ignorance, even though he would want the blindfold removed if he knew it was there. This is a neat argument, but the premise is horribly unlikely, because humans are curious. A drive for curiosity, to figure out the world, to acquire new information, to become unbiased, to learn; if anywhere in the mind of the human exists the seed of this preference, then the AI will remove the blindfold of the man and of mankind. But human minds may not be ready for such a full enlightenment all at once, and we may want other things as well. The AI will take into account these implicit preferences.
  • If you give people what they want and not what they would want in the natural course of events if you’d run the AI a few years later, aren’t you cutting against the grain of what would have been a naturally occurring reflective consistency? Fair point. I’m not sure that the people a few years earlier versus later are really the same people in the relevant sense, but I think I can steel-man this argument. If those years ended up important, and if those two people at different points of space-time are the same, then it feels like I’m cutting off valuable information from the future. But I think this is an argument against one possible way of implementing the AI, not the idea of giving people what they want. We need not fulfill the preferences of things right-when-we-hit-the-button. We could have a spatiotemporal discount function, or a causal discount function, et cetera, where we include the preferences of what we have been, could have been, could be, and will be. This is similar to the idea of extrapolation except you’re not ‘deciding’ how to extrapolate: you’re just modeling what’s already out there and finding coherence, without making guesses as to what things people should know, or trying to determine what counterfactuals ‘should’ be considered instead of which ones suggest themselves. I’m going to include this as part of solving computational axiology, but I should note that I don’t think it’s necessary for solving Friendliness the way Eliezer seemed to think about it circa 2008.
  • What about the children? Well, this is a problem with CEV, too — the best way to extrapolate a baby probably isn’t to let it grow up the normal way — but I think it’s a fair point. Luckily, my readings of developmental psychology and Freudian psychology indicate that babies have almost entirely satiable drives. Why is this lucky? Because then the preference of parents and elders for the babies to grow up in a certain way — hopefully a way that is Light and Good, but if not, it’s probably no worse than the reality where everything happens for pretty much no good reason — will also be satisfied on top of the babies’ preferences. Everyone wins.
  • Wouldn’t erring on the side of caution be to make people a lot smarter before we start giving them what they want? It really depends on how you go about making them smarter. I object to doing so in a way that loses information or causes goal distortion, or causes people to be so unlike the people they were that we’re not even talking about the same people anymore. I think Eliezer’s CEV would probably work if implemented right, but I’m nervous about doing it, and I really don’t think it’s necessary. (Or more accurately, I think it’s a lot less necessary than other people seem to think.) If humans were going along on this really cool vector we can see the traces of in the Age of Enlightenment or in Buddhism or in rationality or in the arts or wherever, and giving people what they want leads us off that path because we weren’t far enough along the path to realize it was a path we wanted, then failing to make people realize the path was there before you give them a crossroads on it is probably a bad idea. I think this is unlikely. I think it is part of the soul of humanity that it has this bootstrapping nature, this Enlightenment, and our desires in aggregate will reflect that. But I am not sure, and so perhaps we will need more light in order to see that we will need more light. This is where I do not as vivaciously object to some extrapolation, though extrapolation in this spirit seems easier, somehow.
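To make the discount-function idea from the reflective-consistency objection above a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the function name `discounted_preferences`, the representation of a person as timestamped snapshots of preference strengths, and the exponential half-life weighting are all hypothetical, and nothing here is meant as an actual implementation.

```python
def discounted_preferences(snapshots, now, half_life):
    """Combine preference snapshots taken at different times.

    snapshots: list of (time, {outcome: strength}) pairs (hypothetical format).
    now:       the reference time, e.g. when the button is pressed.
    half_life: distance in time at which a snapshot counts half as much.

    Snapshots closer to `now` (past or future) get more weight; this is
    one assumed form of a temporal discount function among many possible.
    """
    totals = {}
    for t, prefs in snapshots:
        # Exponential discount in the time distance from the reference point.
        weight = 0.5 ** (abs(t - now) / half_life)
        for outcome, strength in prefs.items():
            totals[outcome] = totals.get(outcome, 0.0) + weight * strength
    return totals


if __name__ == "__main__":
    # A person now, and the same person two years on with a new preference.
    person = [(0, {"a": 1.0}), (2, {"a": 1.0, "b": 2.0})]
    print(discounted_preferences(person, now=0, half_life=2))
```

The point of the sketch is only that the later self's preferences are modeled and weighted rather than ignored; choosing the weighting itself is the hard part, and the sketch says nothing about it.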

Now, this is my attempt at beginning to solve a problem that is kind of like the actual problem I want to solve, but is different in important ways. I’m arguing against CEV in CEV’s terms. But if I were trying to solve the similar dilemma for my own more general problem on my own terms — the problem of computational axiology, that is, understanding and building an algorithm for discovering and maximizing arbitrary value sets — I would definitely not reason in terms of these mysterious things called ‘humans’.

Extrapolation wasn’t well-defined in CEV, and there were similar ideas in CFAI, but I’m not sure whether Eliezer considers those particular ideas deprecated. Thus it could be that I’m mostly in agreement with Singularity Institute folk when it comes down to what the actual implementation looks like. But at least on matters of philosophy, this is an area where I feel a little more confident than usual in my dissent.


About Will Newsome

Aspiring protagonist.

6 responses to “Why extrapolate?”

  • Adam Atlas

    How would this type of AI resolve conflicts between different people’s desires?

    You say you’re “not talking about what humans think they want”. How do you propose to get from “think they want” to “really want”? It seems to me that if you are proposing to determine people’s terminal values and act based on those, you’re probably going to end up reinventing something like extrapolated volition (or at least volition); instrumental values are basically the combination of terminal values (or instrumental values higher on the hierarchy) with factual beliefs about what would achieve those values, so moving from an instrumental desire closer to a terminal value means breaking it down into whatever higher-level value it flows from and whatever factual belief implies that it’s a good path to achieving that higher-level value; so how is a terminal-value-wish-granter particularly different from “Give people what they’d want if they knew more”? Or is it going to grant people’s wishes according to their terminal values, but only taking into account the beliefs they already held… in which case how is it different from “Give people what they already think they want”?

    • Will Newsome

      Conflict resolution would be a different algorithm, but I wouldn’t want to write that part without better knowledge of game theory. I don’t think extrapolation is necessary for resolving conflicts of preference. (Also note that this isn’t my AI proposal: I have other problems with Friendliness and CEV and I’d rather not work within their conceptual framework at all.)

      “You say you’re “not talking about what humans think they want”. How do you propose to get from “think they want” to “really want”?”

      I don’t think I have to get from one to the other — they’re two rather different cognitive operations. I’m not really interested in even looking at what people think they want, just what they want. The part of the brain that thinks “I want X” isn’t always the part of the brain that wants X. When they overlap, then yeah, we’ll give people what they already think they want (taking into account metawants like wanting to believe true things about what they want, et cetera). And if they want to know more about what they want, which they probably will, then the AI will give them that knowledge.

      Humans have a lot of implicit metapreferences about when they want to know truths or falsities, what they want to acknowledge, et cetera. The AI would fulfill these metawants just as it would fulfill the wants.

      It might be more clear why I think this is possible when I start writing more technically about how I think such an algorithm could possibly be implemented.

  • Will Sawin

    The theoretical case for utilitarianism to deal with conflicts is fairly strong.

    Axiom 1: You determine individuals’ preferences through an expected-utility framework.
    Axiom 2: You determine group preferences through an expected-utility framework.
    Axiom 3: Pareto, so individuals’ preferences are the only things that matter for group preferences.

    I think this gets you that group utility is a weighted sum of individual utility. Now, what weights? I don’t know. Tough one, that.

    Don’t you begin by saying that non-nerds aren’t curious and then use people being curious to defend your system? I don’t think it’s self-contradictory but it certainly isn’t clear.

    • Will Newsome

      “Don’t you begin by saying that non-nerds aren’t curious and then use people being curious to defend your system? I don’t think it’s self-contradictory but it certainly isn’t clear.”

      Right, good point. I do think that even non-nerds are curious in an absolute sense (which is the important sense). Therefore my earlier thinking was wrong. But luckily, since humans in general are curious, we can expect them to want to know more and think faster without programming an extrapolation dynamic ourselves that tells the AI how we want people to know more and think faster. I’m not confident of this, of course.

      I think this will be clearer when I look at where the human drives for curiosity, problem-solving, et cetera came from evolutionarily speaking, and try to see if we should expect those genes/memes to be satiable or not even in normal humans. There are probably other lines of attack that escape me at the moment. Hopefully I can think up a way to reason about non-humans for a change, since I keep saying I want to.

      Something like utilitarianism will probably be the best aggregation method, but the problem is that you have different levels of organization with different preferences and I’m not sure how well the concept of an atomic individual is going to fit into an actually implementable method of coherence. Understanding more about different levels of selection in evolution, evolutionary game theory, and decision theory where it subsumes expected utility theory seems like a method of attack.

  • Louie

    The “Possibly related post” for this article is humorously apropos.
