This might be the most obvious point where my intuitions differ from my fellow researchers’. One day I was thinking about Eliezer’s coherent extrapolated volition proposal (CEV), and it occurred to me that though the coherence part seemed pretty manageable, the bit about extrapolation dynamics promised to be a philosophical nightmare. The idea seemed misguided in the first place. Give people what they would want if they knew more? What if they don’t want to know more? Nerds are genuinely curious, but normal people, even smart normal people, have no problem with not understanding the universe. Is this a justified violation of people’s volitions? But I hadn’t paid attention to this sense of ugliness until I realized that not only would extrapolation feel wrong, it might not even be necessary in the first place. Why is it, again, that we can’t just give people what they want?
Here are some common objections to this proposal in bold, followed by my replies. You should probably read my previous posts before tackling this one; they may already address some of your objections, and I may be thinking about things in a way that makes more sense than it first appears.
- **Humans might have silly wants.** Let’s make it clear that I’m not talking about what humans think they want: another SUV, more money, a catgirl, whatever. Perhaps parts of them really do want these things, in which case the AI would deliver. But it’s more probable that these are things that are merely convenient for fulfilling some other, more terminal value of some of the mind’s algorithms, in which case the AI would provide the terminal value. Now, I personally am not too impressed with the average human. But I do not want to steal their volition and give them values that I think are less silly, simply because I’m smarter than them. I mean, I kind of do, and the AI will take into account that I want the world to be light and good and not like Idiocracy. But if I’m trying to not be a dick, then letting people have what they want seems like an okay idea. Even so, I’m optimistic about humanity resolving its confusions.
- **What if they really do want to go to the Christian heaven?** Then let them go to the Christian heaven! We have resources to spare. And if they get there and realize they’d have more fun somewhere else, well, the AI will keep track of their implicit preferences — even if the Christians would never explicitly announce they were dissatisfied with the paradise they’d been promised. And anyway, it sounded like a trippy place.
- **Humans might want to destroy themselves.** I really don’t think this is likely. In the language of Buddhism, all beings have the potential for Enlightenment. The algorithms that have driven the amazing progress in the world over the last many thousands of years are in every properly functioning adult human. We have had suicidal tendencies on both an individual level and a cultural one, this is true. But this is mostly because there is suffering in the world that is unbearable, or because different parts of us disagree about how to stop the suffering. There would be no need to destroy ourselves if we could have our wishes fulfilled. Having to survive in an evolutionary setting caused humans to acquire fairly robust drives. I’m not sure this generalizes to scenarios where those drives are actually fulfilled for once, but it’s worth noting that humans don’t eat pure salt even though it’s delicious and EEA-nutritious.
- **What if humans enter a hell universe but lose their minds and thus their preference for escaping?** I don’t think this will happen, and I don’t want it to happen. In fact, I think the vast majority of people wouldn’t want this to happen to one another. If the AI is giving people what they want, then the AI won’t allow this to happen.
- **Imagine a man who has unknowingly worn a blindfold his entire life, and therefore nowhere in his brain is there a preference for removing this imposed ignorance, even though he would want the blindfold removed if he knew it was there.** This is a neat argument, but the premise is horribly unlikely, because humans are curious. Humans have a drive for curiosity: to figure out the world, to acquire new information, to become unbiased, to learn. If anywhere in the mind of a human there exists the seed of this preference, then the AI will remove the blindfold, from the man and from mankind. But human minds may not be ready for such a full enlightenment all at once, and we may want other things as well. The AI will take these implicit preferences into account.
- **If you give people what they want and not what they would want in the natural course of events if you’d run the AI a few years later, aren’t you cutting against the grain of what would have been a naturally occurring reflective consistency?** Fair point. I’m not sure that the people a few years earlier versus later are really the same people in the relevant sense, but I think I can steel-man this argument. If those years ended up important, and if those two people at different points of space-time are the same, then it feels like I’m cutting off valuable information from the future. But I think this is an argument against one possible way of implementing the AI, not against the idea of giving people what they want. We need not fulfill the preferences of things right-when-we-hit-the-button. We could have a spatiotemporal discount function, or a causal discount function, et cetera, where we include the preferences of what we have been, could have been, could be, and will be. This is similar to the idea of extrapolation except you’re not ‘deciding’ how to extrapolate: you’re just modeling what’s already out there and finding coherence, without making guesses as to what things people should know, or trying to determine which counterfactuals ‘should’ be considered instead of which ones suggest themselves. I’m going to include this as part of solving computational axiology, but I should note that I don’t think it’s necessary for solving Friendliness the way Eliezer seemed to think about it circa 2008.
- **What about the children?** Well, this is a problem with CEV, too — the best way to extrapolate a baby probably isn’t to let it grow up the normal way — but I think it’s a fair point. Luckily, my readings of developmental psychology and Freudian psychology indicate that babies have almost entirely satiable drives. Why is this lucky? Because then the preference of parents and elders for the babies to grow up in a certain way — hopefully a way that is Light and Good, but if not, it’s probably no worse than the reality where everything happens for pretty much no good reason — will also be satisfied on top of the babies’ preferences. Everyone wins.
- **Wouldn’t erring on the side of caution be to make people a lot smarter before we start giving them what they want?** It really depends on how you go about making them smarter. I object to doing so in a way that loses information or causes goal distortion, or causes people to be so unlike the people they were that we’re not even talking about the same people anymore. I think Eliezer’s CEV would probably work if implemented right, but I’m nervous about doing it, and I really don’t think it’s necessary. (Or more accurately, I think it’s a lot less necessary than other people seem to think.) Suppose humanity were travelling along this really cool vector, the one whose traces we can see in the Age of Enlightenment or in Buddhism or in rationality or in the arts or wherever, and suppose giving people what they want would lead us off that path because we weren’t yet far enough along to realize it was a path we wanted. Then handing people a crossroads before they can see the path is probably a bad idea. I think this is unlikely. I think it is part of the soul of humanity that it has this bootstrapping nature, this Enlightenment, and our desires in aggregate will reflect that. But I am not sure, and so perhaps we will need more light in order to see that we will need more light. This is where I do not object as strenuously to some extrapolation, though extrapolation in this spirit seems easier, somehow.
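The spatiotemporal discounting idea from a couple of objections back can be made slightly more concrete. The following is purely an illustrative sketch; every symbol is my own invention and none of it comes from CEV or any existing proposal:

```latex
% Illustrative sketch only. Assumed notation:
%   u_{i,t}   : how well preference-bearer i's preferences are
%               satisfied at time t
%   w_i       : a weight on bearer i
%   t_0       : the moment we "hit the button"
%   d(t, t_0) : some spatiotemporal or causal distance from t_0
%   \gamma    : a discount factor in (0, 1]
U \;=\; \sum_{i} w_i \sum_{t} \gamma^{\,d(t,\,t_0)}\, u_{i,t}
```

Setting $\gamma = 1$ weights past, present, and future preferences equally; $\gamma < 1$ privileges preferences near the button-press. The point of the sketch is that nothing here ‘decides’ how to extrapolate anyone: the sum only aggregates preferences that are (or were, or will be) actually out there.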
Now, this is my attempt at beginning to solve a problem that is kind of like the actual problem I want to solve, but is different in important ways. I’m arguing against CEV in CEV’s terms. But if I were trying to solve the similar dilemma for my own more general problem on my own terms — the problem of computational axiology, that is, understanding and building an algorithm for discovering and maximizing arbitrary value sets — I would definitely not reason in terms of these mysterious things called ‘humans’.
Extrapolation was never well-defined in CEV, and there were similar ideas in CFAI, but I’m not sure whether Eliezer considers those particular ideas deprecated. Thus it could be that I’m mostly in agreement with Singularity Institute folk when it comes down to what the actual implementation looks like. But at least on matters of philosophy, this is an area where I feel a little more confident than usual in my dissent.