Is indexical experience valued? (And more anti-Friendliness propaganda.)

This is a mysterious question, and so we will be tempted to give mysterious answers. Readers beware.

When I reason about who ‘I’ am in a non-philosophical way, I notice a few things. I’m all in one physical space. I see out of two eyes and act through various muscles according to my intentions. And this feels very natural to me.

And yet when I want to reason about values in a coherent framework, I prefer to think in terms of massively parallel cognitive algorithms and their preferences, rather than the preferences of these bundles of algorithms that seem to have indexical subjective experience. In order for me to figure out if I’m going about this the wrong way, then, I have to ask: why is subjective experience indexical? And is indexicality an important value?

Let’s say that two computer programs happen to be installed on the same computer and can be accessed by a program-examining program or a program-running program or a program-optimizing program. If one of the programs is doing pretty much the same operation on this computer as on some other computer, then we can reason about both programs as if they were the same program. We can talk about what Mathematica does, and what specific parts of Mathematica do. But when we ask what the program-examining program does, then descriptions must become more general. It’s very dependent on which other programs are on the computer, and how often they get run, et cetera. The program-examining program only has information from the computer it’s on, and maybe it has access to the internet, but even then it generally doesn’t get a very in-depth view of the contents of programs on other computers.
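To make the program-identity point concrete, here is a minimal sketch of my own (the function names and what gets inspected are purely illustrative): a pure computation behaves identically on every machine it is installed on, so two copies can be treated as one program, while a program that inspects its local environment can only be described in much more general terms.

```python
# Toy illustration (names and behavior are mine, not from any real system).
import os

def mathematica_like(x):
    # Same input, same output on every computer it is installed on,
    # so two installed copies can be reasoned about as one program.
    return x ** 2 + 1

def program_examining_program():
    # What this returns depends on which other files and programs happen
    # to be present locally, so any general description of "what it does"
    # has to stay far more abstract than the one for mathematica_like.
    return sorted(os.listdir("."))[:5]

print(mathematica_like(3))            # 10, on any machine
print(program_examining_program())    # depends entirely on this machine
```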

Consider human experience. It seems that the most interesting qualia are the qualia of reflecting on cognitive algorithms or making decisions about which algorithms to run, which has a lot to do with consciousness. Instead of coming up with reasons as to why all subjective experience is indexical, we could come up with reasons as to why the subjective experience of the reflection or planning or decision-making algorithms is indexical. And I think there are okay explanations as to why it would be.

Humans aren’t telepathic, mostly. Our minds are connected tenuously by patterned vibrations in the air, seeing the movements of one another, and so on. This is not a high-bandwidth way to communicate information between minds, and especially not complicated information like the qualia of thinking a specific thought. It tends to take a while to get enough details of another’s experience to recreate it oneself, or sometimes even to recognize what it could be pointing at. Read/write speed is terribly slow. Thus most information processing goes on in single minds, and the qualia of processing information are indexical. But when two minds are processing the exact same thing in the same way, then their experiences are not indexical. Their epiphenomenal spirits could jump back and forth and never know the difference.

What does that mean for the study of value? Humans seem to value their subjective experience above all else. A universe without sentience seems like a very bad outcome. But it’s not clear whether humans value indexical subjective experience, or subjective experience generally. Some of the most intense spiritual experiences I’ve heard of involve being able to feel true empathy for another, or to feel connected to all of the minds in the universe. These experiences have always been considered positive. The algorithms that make up humans thus might not strongly value keeping their subjective experience confined to inputs from one small physical space.

If you look at the brain as a piece of computing hardware for lots of algorithms inside it, then it seems natural to answer the question of what humans value by asking what the things that make up humans value. And if the algorithm running in one mind is the same as the algorithm running on another, then we needn’t look at their computing substrate. But does the fact that the decision algorithms in each mind tend to look at different patterns of cognitive algorithms (even if the decision algorithms themselves are nearly identical) mean that we have to go ahead and look at individual minds specifically to figure out what each decision algorithm wants? What sorts of values do these decision algorithms have? Are they subordinate to the values of the more parallel cognitive algorithms that they run? Do they largely use the same operations for satiating other algorithms in the mind, and if so, are their values not actually indexical, even if they can only satiate indexical drives? If so, do we need to reason about the wants of individual humans at all? I have my intuitions, but we’ll see if they’re justified.

It makes sense to ask these questions for the sake of axiology, but how does asking them help with computational axiology specifically? Most proposals to solve the Friendliness problem I’ve heard of involve doing various sophisticated forms of surveying individual humans and seeing what they want, and then resolving conflicts between the humans. I suspect this probably works if you do it right. But I contend that it is difficult because it is the wrong way of going about it. It is difficult to tell a computer program to look at individual humans. Artificial intelligences are programs, and naturally reason in terms of programs. Humans are not programs. Humans run programs. And what humans value is those programs. If we have an AI look at the world to find algorithms, or decision processes, or what have you, it will find the algorithms that run on minds, and ignore whatever pieces of hardware they are running on. This isn’t a bug; it’s the way humans should be reasoning, too.

I’ve said before that I’m not particularly interested in Friendliness. This is because I care about programs, not their computing substrate. And if the same program is running on a human mind as on an iguana mind… what’s the difference?


Why meme-focused axiology?

I’ve spent some time talking about memes already and I plan on talking about them a lot more. Yet the supposed overarching theme of this blog, computational axiology, isn’t obviously connected to memetics. Why are we looking at memes and not partially observable Markov decision processes or what have you?

Memetic cognitive algorithms, or what I somewhat sorrowfully and misleadingly call memes for short, are a central class of valuing structures. They’re somewhere between genes and temes on the scale of algorithmic physicality versus universality (temes can run on simpler Turing machines, whereas genes often require specific physical laws). Some are satiable and some are arguably insatiable. Some are created by genes (technically genetic cognitive algorithms) and some are capable of being or creating temes. They’re under selection pressures observable to humans: pressures on human genes and memes that play out over recorded history, over a lifetime, and over a day.

Memes are incredibly powerful. They’re what make humans the most intelligent life yet observed in the universe, and the source of an astonishingly wide array of values. In fact, for nearly everything valued by memes, there are yet more memes that value the exact opposite. Light and dark, good and evil, Blue and Green, Yankees and not-Yankees. They’re somewhat alien: some run on millions of minds at once and tell their host brains to do very strange things, like abstract math. Some are viral, some are parasitic, most are symbiotic. Humans made them and let them into the world, but the humans weren’t very careful in doing so, and now the comparatively stupid genes must tread very cautiously, lest the memes eat their brains or have them killed.

But there is also a more pragmatic reason to study memes beyond understanding them as a kind of valuing structure. In order to study computational axiology well, we will need to skillfully wield a wide array of very powerful memes. We should figure out what they want and what their true nature is, if we want to make sure that they’re on our side, and aren’t honey in a fly trap secretly waiting for their turn to turn us against ourselves.

It is for these reasons that I find memes very interesting. I was rather convinced by the gene’s-eye view of evolution expounded upon by Dawkins in The Selfish Gene and The Extended Phenotype. Perhaps this is a large part of why reasoning in terms of genes and memes seems natural to me, whereas reasoning in terms of humans seems messy and obfuscatory. Most summaries of memetics focus on memes like music or language. These don’t interest me as much as memes like science, rationality, or philosophy: the memes that created a lot of what I (or whatever cognitive algorithms are writing this) seem to value, and that seem capable of engineering an even brighter future.

Unless you persuade me otherwise, memetic cognitive algorithms will be a common theme here as we explore what they are, what they want, and what they might do if they get it.


What is computational axiology?

This blog is supposedly dedicated to solving computational axiology, which is a term I made up out of thin air. What is it? How is it different from Friendliness?

I use ‘value’ as a catchall term for things like ethics, aesthetics, drives, inclinations, et cetera. Axiology is the study of value. Normally this is taken to mean what humans value, or what God values, or things like that. I don’t want to be that specific. I don’t know whose values I value yet, or even who I am, and so I want to figure out the values of everything that can value. Knowing what sorts of things are valuable seems useful. Certain values are attractors in mindspace. Others are less probable. For thousands of years we humans have wondered what our purpose in life is, or whether eating animals is morally bad, or which music is objectively good, and other difficult philosophical questions that we weren’t equipped to handle. Nonetheless, a small measure of progress has been made, and though humans in general are still very confused about their desires and the desires of those around them, some of us are lucky enough to feel comfortable in accurately reflecting on what it is we truly want.

Though axiology is an important endeavor for every person, every couple, and every organization, it has never been more important. It appears very possible that humanity will engineer a recursively self-improving artificial intelligence sometime in the next few centuries, and probably sooner rather than later. With ever-increasing optimization power it becomes more and more important to know what we’re trying to optimize for. Value is fragile and diverse, and this is more true of human values than of any other values we’ve seen in the universe. But this is not cause for pessimism. We should be careful not to mindlessly destroy value, but this is also an opportunity for humanity to spread the most glorious values to every corner of the universe, and get everything we could ever want.

The ‘computational’ part of computational axiology is twofold. First, we’re trying to get powerful computer programs to solve axiology — doing axiology in the abacus that is our collective mind is a fool’s move. Second, it emphasizes that I’d rather reason in terms of formal structures and algorithms, and not ultra-high-level concepts like ‘human’ or ‘human-Friendly’. We can begin by looking at our intuitive concepts and figuring out their implications, but eventually we need to start getting formal. The point is to actually win the universe, after all.

The problem of Friendliness is narrower than the problem of axiology, for Friendliness is determining what humanity wants, and axiology is determining what is wanted by anything. Nonetheless there is a lot of overlap, for humans hold many of the values in this universe. I hope that we won’t have to settle where the boundaries of ‘humanity’ lie, or make other difficult and seemingly arbitrary decisions like that. In fact, I think it’s a sign that something’s wrong if our decisions feel even a little arbitrary. This is potentially where I break with the idea of Friendliness. In the words of Steven Kaas, the good is the enemy of the tolerable. Nonetheless, with so much good on the line I’d like to solve the problem as perfectly as transhumanly possible.

I don’t want to get too involved in certain somewhat tangential super-technical details of what it would look like to implement an algorithm for solving computational axiology. This is mostly because I don’t have them, but it is also because the details I do have could be repurposed to fill the universe with values that humanity would object to. But I’m not averse to discussing technical ideas in private, though dangerous ideas shouldn’t be sent over email if possible. I err on the side of caution if I am to err.

Informally, we could describe it as ‘figuring out how to get a computer to figure out what is valuable’. More formally, in a single sentence:

Computational axiology is the study of the foundations of value and of techniques for implementing axiological algorithms in computer systems.


Gene/meme/teme sanity equilibria

Some of the inspiration for ideas like this came from some combination of Michael Vassar and Peter de Blanc.

Why are Scandinavians so rational? I don’t know, but one hypothesis seems rather aesthetically pleasing. Because the Scandinavians had such crazy memes — pillaging, exploring, and just being generally reckless with their lives — there was a selection pressure for people who were genetically predisposed to sanity. All of the people with crazy genes went totally crazy when supplied with crazy memes, and probably died off in battle or when trying to cross the Atlantic or whatever. The memes stayed in equilibrium because they were pretty good memes on the societal level, even if individual Vikings had a high chance of death. Thus the selection pressure the memes exerted on the genes was stronger than the pressure the genes exerted on the memes, and because so much pressure came from the constant exploration and raiding, it was enough to drive noticeable increases in genetic predisposition to rationality in the general Scandinavian populace.

Classical reasoning about evolutionary psychology, similar to the kind displayed in “The Psychological Foundations of Culture”, might point towards the selection pressure not being large enough to cause a noticeable difference, but I find “The 10,000 Year Explosion” to be a convincing argument that evolution on such short timescales is possible. (I don’t know what Tooby, Cosmides, or other evolutionary psychologists in general think of this model of human evolution.)

At any rate, whether or not it happened noticeably in the case of Scandinavians, we can see that such equilibria between genetic and memetic sanity could exist. This is pointed at by the theory of ontogenic evolution, better known as the Baldwin effect, and the corresponding shielding effect. In this case, however, it is not the ‘natural’ environment that is throwing difficult general reasoning problems at the genes, but partially human-engineered memetic conditions.

The reason somewhat stable equilibria are reached is that the Baldwin effect for genes is a shielding effect for memes. Crazy memes select for sane genes, but crazy genes will also select for sane memes. The rate at which the memeplex or geneplex can adapt to the environment and to each other is what determines where an equilibrium is reached. In environments where you don’t as often encounter difficult cognitive problems, there will be a stronger selection for irrational memes that satisfy other constraints, like being interesting.
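To make the equilibrium picture a bit more concrete, here is a toy coupled-dynamics sketch of my own; the update rule, rates, and baselines are made up for illustration and are not a real population-genetic model. The point is only that craziness in one replicator class creates pressure for sanity in the other, with memes adapting much faster than genes.

```python
# Toy gene/meme sanity dynamics (made-up update rule and parameters).

def clamp(x):
    return max(0.0, min(1.0, x))

def simulate(gene_sanity=0.5, meme_sanity=0.5,
             gene_rate=0.01, meme_rate=0.2, steps=2000):
    for _ in range(steps):
        meme_craziness = 1.0 - meme_sanity
        gene_craziness = 1.0 - gene_sanity
        # Crazy memes select for sane genes; crazy genes select for sane memes.
        # The pull back toward lower sanity stands in for other pressures,
        # e.g. "interesting but irrational" memes winning when life is easy.
        gene_sanity = clamp(gene_sanity + gene_rate * (meme_craziness - 0.5 * gene_sanity))
        meme_sanity = clamp(meme_sanity + meme_rate * (gene_craziness - 0.5 * meme_sanity))
    return gene_sanity, meme_sanity

# After a fixed window the faster-adapting memes have moved much further
# than the genes; tweak the rates to see how the balance shifts.
print(simulate())
```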

Susan Blackmore introduced the concept of ‘temes’ to talk about technological replicators. They’re like memes that need not run on brains, being more universal. You could say the dangers of Seed AI are the dangers of a teme that is so good at replicating that it eats all the genes and memes for food, in much the same way that the first replicators turned the chemicals of the oceans into themselves, and the same way that memes have taken over human minds for their own various purposes.

Currently, temes aren’t all that impressive — we think our memes are smarter. But as temes get more intelligent, we might start to see selection pressures for sanity between memes and temes, both of which can evolve at a much faster timescale than genes. Having a really smart robot take care of your life means that you can spend less time following politics or keeping your arithmetic skills sharp. Because both memes and temes evolve so quickly, especially temes, I don’t think that this window of interesting equilibria will be very large as measured by the timescale of our genes. But we should note that our memes would have to be very sane if we were to design temes that aren’t themselves very sane. Because we can’t rely too much on the sanity of our genes or our memes, we’ll have to spend a lot of effort on getting our temes to be as sane as possible instead. Hence we solve computational axiology as generally and completely as possible, and trust only our very safest memes.


Why extrapolate?

This might be the most obvious point where my intuitions differ from my fellow researchers’. One day I was thinking about Eliezer’s coherent extrapolated volition proposal (CEV), and it occurred to me that though the coherence part seemed pretty manageable, the bit about extrapolation dynamics promised to be a philosophical nightmare. The idea seemed misguided in the first place. Give people what they wanted if they knew more? What if they don’t want to know more? Nerds are genuinely curious, but normal people, even smart normal people, have no problem with not understanding the universe. Is this a justified violation of people’s volitions? But I hadn’t paid attention to this sense of ugliness until I realized that not only would extrapolation feel wrong, it might not even be necessary in the first place. Why is it, again, that we can’t just give people what they want?

Here are some common objections to this proposal in bold, followed by my replies. You should probably read my previous posts before tackling this one; your objections might thereby be addressed, and I might be thinking of things in a way that makes more sense than you expect.

  • Humans might have silly wants. Let’s make it clear that I’m not talking about what humans think they want: another SUV, more money, a catgirl, whatever. Perhaps parts of them really do want these things, in which case the AI would deliver. But it’s more probable that these are things that are just convenient for fulfilling some other more terminal value of some of the mind’s algorithms, in which case the AI would provide the terminal value. Now, I personally am not too impressed with the average human. But I do not want to steal their volition and give them values that I think are less silly, simply because I’m smarter than them. I mean, I kind of do, and the AI will take into account that I want the world to be light and good and not like Idiocracy. But if I’m trying to not be a dick, then letting people have what they want seems like an okay idea. And even so I’m optimistic about humanity resolving its confusions.
  • What if they really do want to go to the Christian heaven? Then let them go to the Christian heaven! We have resources to spare. And if they get there and realize they’d have more fun somewhere else, well, the AI will keep track of their implicit preferences — even if the Christians would never explicitly announce they were dissatisfied with the paradise they’d been promised. And anyway, it sounded like a trippy place.
  • Humans might want to destroy themselves. I really don’t think this is likely. In the language of Buddhism, all beings have the potential for Enlightenment. The algorithms that have driven the amazing progress in the world over the last many thousands of years are in every properly functioning adult human. We have had suicidal tendencies on both an individual level and a cultural one, this is true. But this is mostly because there is suffering in the world that is unbearable, or because different parts of us disagree about how to stop the suffering. There would be no need to destroy ourselves, if we could have our wishes fulfilled. Having to survive in an evolutionary setting caused humans to acquire fairly robust drives. I’m not sure this generalizes to scenarios where their drives are actually fulfilled for once, but it’s worth noting that humans don’t eat pure salt even though it’s delicious and EEA-nutritious.
  • What if humans enter a hell universe but lose their minds and thus their preference for escaping? I don’t think this will happen, and I don’t want it to happen. In fact, I think the vast majority of people wouldn’t want this to happen to one another. If the AI is giving people what they want, then the AI won’t allow this to happen.
  • Imagine a man who has unknowingly worn a blindfold his entire life, and therefore nowhere in his brain is there a preference for removing this imposed ignorance, even though he would want the blindfold removed if he knew it was there. This is a neat argument, but the premise is horribly unlikely, because humans are curious. A drive for curiosity: to figure out the world, to acquire new information, to become unbiased, to learn. If the seed of this preference exists anywhere in the mind of the human, then the AI will remove the blindfold of the man and of mankind. But human minds may not be ready for such a full enlightenment all at once, and we may want other things as well. The AI will take into account these implicit preferences.
  • If you give people what they want and not what they would want in the natural course of events if you’d run the AI a few years later, aren’t you cutting against the grain of what would have been a naturally occurring reflective consistency? Fair point. I’m not sure that the people a few years earlier versus later are really the same people in the relevant sense, but I think I can steel-man this argument. If those years ended up important, and if those two people at different points of space-time are the same, then it feels like I’m cutting off valuable information from the future. But I think this is an argument against one possible way of implementing the AI, not the idea of giving people what they want. We need not fulfill the preferences of things right-when-we-hit-the-button. We could have a spatiotemporal discount function, or a causal discount function, et cetera, where we include the preferences of what we have been, could have been, could be, and will be (see the sketch after this list). This is similar to the idea of extrapolation except you’re not ‘deciding’ how to extrapolate: you’re just modeling what’s already out there and finding coherence, without making guesses as to what things people should know, or trying to determine which counterfactuals ‘should’ be considered instead of which ones suggest themselves. I’m going to include this as part of solving computational axiology, but I should note that I don’t think it’s necessary for solving Friendliness the way Eliezer seemed to think about it circa 2008.
  • What about the children? Well, this is a problem with CEV, too — the best way to extrapolate a baby probably isn’t to let it grow up the normal way — but I think it’s a fair point. Luckily, my readings of developmental psychology and Freudian psychology indicate that babies have almost entirely satiable drives. Why is this lucky? Because then the preference of parents and elders for the babies to grow up in a certain way — hopefully a way that is Light and Good, but if not, it’s probably no worse than the reality where everything happens for pretty much no good reason — will also be satisfied on top of the babies’ preferences. Everyone wins.
  • Wouldn’t erring on the side of caution be to make people a lot smarter before we start giving them what they want? It really depends on how you go about making them smarter. I object to doing so in a way that loses information or causes goal distortion, or causes people to be so unlike the people they were that we’re not even talking about the same people anymore. I think Eliezer’s CEV would probably work if implemented right, but I’m nervous about doing it, and I really don’t think it’s necessary. (Or more accurately, I think it’s a lot less necessary than other people seem to think.) If humans are going along on this really cool vector we can see the traces of in the Age of Enlightenment or in Buddhism or in rationality or in the arts or wherever, and giving people what they want leads us off that path because we weren’t far enough along it to realize it was a path we wanted, then failing to make people see the path before giving them a crossroads on it is probably a bad idea. I think this is unlikely. I think it is part of the soul of humanity that it has this bootstrapping nature, this Enlightenment, and our desires in aggregate will reflect that. But I am not sure, and so perhaps we will need more light in order to see that we will need more light. This is where I do not as vivaciously object to some extrapolation, though extrapolation in this spirit seems easier, somehow.
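Here is the sort of thing I have in mind for the discount-function alternative mentioned a couple of objections back. It is only a toy sketch: the exponential form, the function names, and the example numbers are all my own illustrative choices, not a proposal for how the real aggregation should work.

```python
# Toy "discount rather than extrapolate" aggregation (illustrative only).
from math import exp

def aggregate(preference_holders, discount_rate=0.1):
    """preference_holders: list of (distance, {outcome: strength}) pairs,
    where distance is spatiotemporal or causal distance from the moment
    the AI acts."""
    totals = {}
    for distance, prefs in preference_holders:
        weight = exp(-discount_rate * distance)
        for outcome, strength in prefs.items():
            totals[outcome] = totals.get(outcome, 0.0) + weight * strength
    return totals

snapshot = [
    (0.0, {"christian_heaven": 1.0}),   # a person right when we hit the button
    (5.0, {"exploration": 0.8}),        # the person they may later become
    (5.0, {"christian_heaven": 0.2}),
]
# No guessing about what anyone "should" come to know: we only weight
# preferences that are (or were, or will be) actually out there.
print(aggregate(snapshot))
```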

Now, this is my attempt at beginning to solve a problem that is kind of like the actual problem I want to solve, but is different in important ways. I’m arguing against CEV in CEV’s terms. But if I was trying to solve the similar dilemma for my own more general problem on my own terms — the problem of computational axiology, that is, understanding and building an algorithm for discovering and maximizing arbitrary value sets — I would definitely not reason in terms of these mysterious things called ‘humans’.

Extrapolation wasn’t well-defined in CEV, and there were similar ideas in CFAI, but I’m not sure whether Eliezer considers those particular ideas deprecated. Thus it could be that I’m mostly in agreement with Singularity Institute folk when it comes down to what the actual implementation looks like. But at least on matters of philosophy, this is an area where I feel a little more confident than usual in my dissent.


Are evolved drives satiable?

Can we expect evolved drives to be satiable at any one instant? If so, which drives are satiable, and which would eat the entire universe if they could?

Thermostats have a narrow domain of preference. When the temperature is at the desired point (as measured by some internal representation), the thermostat is satiated. The thermostat does not usually need many resources at any one point in time to maximally fulfill its goals. Can the same be said to be true of the drives of various evolved life forms? How about human-specific drives?
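To pin down what I mean by ‘satiable’ in what follows, here is a minimal toy sketch of a thermostat-like drive. The framing, function name, and numbers are mine and purely illustrative: the only point is that the utility tops out once an internal measurement reaches its set point, so additional resources buy the drive nothing.

```python
# Toy satiable "thermostat" drive (my own framing, not a standard definition).

def thermostat_utility(measured_temp, set_point=21.0, tolerance=0.5):
    error = abs(measured_temp - set_point)
    if error <= tolerance:
        return 1.0               # satiated: at the set point, nothing more is wanted
    return 1.0 / (1.0 + error)   # bounded strictly below the satiated value

print(thermostat_utility(15.0))  # unhappy, wants heating
print(thermostat_utility(21.2))  # satiated; extra resources would change nothing
```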

Maslow created a now-famous hierarchy of human needs (which I quite like), and claimed that the first four levels of the hierarchy — physiological, safety, love/belonging, and esteem — are all deficit (satiable) needs. The need for what he deemed ‘self-actualization’, though, he said could not be satiated. Was Maslow correct in this description?

  • Physiological needs: It seems correct to say that these homeostatic needs are satiable. Needs for food, water, sex, excretion, breathing, sleep, et cetera, are all satiable, and indeed they must be sated before humans can seriously work on satiating any of their other needs. I don’t see any insatiable physiological needs.
  • Safety needs: These needs include the security of the body, employment, resources (monetary), health, and property. It is less clear that these drives are satiable, especially as ‘resources’ could be taken to include lavish material possessions, for which humans seem to have a large if not unbounded desire. That said, humans seem satisfied beyond a given level of safety in the sense that Maslow intended, as somewhat indicated by the diminishing marginal returns of available spending money on self-evaluated happiness. I will mark this as unclear, though I suspect that Maslow was right to say that these are satiable.
  • Love/belonging needs: Needs for friendship, family, and sexual intimacy. It seems that beyond a point humans become satisfied with having a large but comfortable number of friends. The same is true of family. Humans may wish they had the capacity to keep track of a large group of friends and family, but their need for friendship is bounded by their cognitive abilities in much the same way that the need for food is bounded by the size of the stomach or the speed of metabolism. It is still a satiable need. Sexual intimacy is less clearly satiable. If you gave humans a button they could continue to press forever, each time doubling their amount of sexual intimacy per moment, it is less clear to me that they would ever stop hitting the button. I have never heard anyone complain of too much (good) intimacy with someone they love. Thus, I am not sure that the need for loving/sexual intimacy is satiable. I do suspect that it is.
  • Esteem needs: Needs for self-esteem, confidence, achievement, and respect. These seem satiable. I have felt that I was at the desired level of confidence or self-esteem in the past, beyond which I wouldn’t have appreciated additional boosts. Achievement is less clear, especially because it does not cleanly decouple from self-actualization needs, which Maslow claims are of a different character. Certainly it is possible to satiate the need for achievement in certain domains — being the best in the narrow domain is one such form of achievement. Beyond a certain point you get diminishing marginal returns. I believe this is what Maslow meant by esteem needs, and I think he is correct to say that they are satiable.
  • Self-actualization needs: Needs for creativity, problem solving, spontaneity, morality, rationality, virtue. As long as there are things humans wish to learn or discover or create, I do not think this need is satiable. I am unsure. Though in the abstract it is easy for some of the memetic algorithms in my mind to say, “Yes, we want infinite compassion, infinite knowledge, infinite whatever-is-right-and-good”, I’m not sure what the rest of my mind thinks of what those memes think, and I’m not sure those memes are reflective enough to know what they want. There are also memes in others’ minds that would quite adamantly state that they desired infinite suffering, death, and all-that-is-evil. Both extremes are somewhat unrelated to what I intuitively think of as ‘self-actualization’, for they are more preferences than needs, and so Maslow did not categorize them. Although I would feel uncomfortable speaking of infinities here, I do think that humans do want a very, very large amount of things that go along with self-actualization at any one instant. The needs would be hard to satiate.

Thus I tentatively agree with Maslow that all needs ‘below’ (evolutionarily older and less cerebral) needs for self-actualization are satiable. I surely wouldn’t bet the universe on it. What Maslow left untouched — preferences that are not needs, aesthetics, the desires of memetic algorithms like egalitarianism or the Christian heaven — I would evaluate as being similar to the class of self-actualization needs. Some are satiable, some are not. Or they are at least not at all easily satiable.

It seems that non-human animals have completely satiable needs. If true, I think this is excellent news. Humans may feel guilty about only satisfying human needs and not the needs of the countless animals that can be found on Earth, to say nothing of counterfactual animals or aliens of the factual or counterfactual variety. If animal needs are satiable we can burn a small amount of the cosmic commons to satisfy their needs, while still spending the vast majority of resources on the insatiable needs that delineate humans and that humans seem to care the most about. It is of course not obvious that we should be so generous, but that problem is deemed the Friendliness problem, and I’d rather solve the general problem of computational axiology for now.

Why should we expect evolved drives to be satiable? We could imagine drives as dynamical processes that are unstable, like a thermostat pulling the temperature towards infinity: such a drive might have to split its resources with other, satiable needs inside a larger process that only finitely values it, but it would nonetheless desire infinite resources, and would therefore be subject to weird decision-theoretic or control problems, or compete to an alarming degree for the attention of altruistic superintelligences.

I suspect that more knowledge of the importance and centrality of reinforcement learning to evolved systems would point in the right direction. Many animal behaviors are sphexish: reward is endogenously generated for running a certain subroutine in response to a pattern of stimuli, regardless of its effects on what humans would see as the system’s implied goals. Because the reward generated is limited by the number of times the subroutine is called, and because that is limited by the number of stimuli that occur, the sphexish drive is satiable. But are there subroutines that fire off constantly and get positive reinforcement (which is rather distinct from lack of negative reinforcement), entirely in the absence of external stimuli? Are there subroutines which would be run an infinite number of times as quickly as possible, each time being rewarded, if only you would let them? Breathing, for instance, seems like it can be satiated because on the whole each breath is not positive reinforcement, and even if there could be an infinitely long chain of breaths, at each point of the chain there is only a finite amount of breathing that the breathing algorithms desire. Are there embodied drives that want a signal of infinite intensity? I doubt it, but why?
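One way to make the ‘bounded by stimuli’ accounting explicit, as a toy sketch (my numbers and framing, not a claim about how real neural reward works): a sphexish subroutine is rewarded once per external stimulus, so its lifetime reward is capped by how many stimuli the environment supplies, whereas a hypothetical self-triggering subroutine that rewards itself every tick has no such cap.

```python
# Toy reward accounting (illustrative only).

def sphexish_total_reward(num_stimuli, reward_per_call=1.0):
    # Reward accrues only when an external stimulus calls the subroutine.
    return num_stimuli * reward_per_call       # bounded by the environment

def self_triggering_total_reward(ticks, reward_per_call=1.0):
    # Hypothetical endogenous drive that fires every tick regardless of stimuli.
    return ticks * reward_per_call             # grows without bound as ticks grow

print(sphexish_total_reward(40))               # 40.0, however long you wait
print(self_triggering_total_reward(10**9))     # as large as you let it run
```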

Peter de Blanc pointed out that desires that need to work well with other desires have to be satiable, but once you have minds smart enough to explicitly model the idea of (and create algorithms for) ‘have as many kids as possible’, the importance of satiable needs (at least at that cognitive level) becomes less pronounced. This could potentially be understood with game theory, especially evolutionary game theory, or with the field of mental accounting and evolutionary mental accounting, if such a field exists.

One obvious observation is that values seem more likely to be insatiable as they become more abstract and more general. These kinds of trends along the vectors of universality or epistemology (which I normally contrast with arbitrariness and confusion) show up a lot in my thinking, so expect to see a lot more of them.

The concept of diminishing marginal returns seems important here. There might be literature on resources with infinitely increasing marginal returns. There might be other ideas from microeconomics that are relevant.
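For concreteness, the standard textbook way of writing the distinction, nothing specific to this post:

```latex
% Diminishing marginal returns: utility keeps rising, but each extra unit
% of the resource is worth less than the last.
\[
  u(x) = \log(1 + x), \qquad u'(x) = \tfrac{1}{1+x} > 0, \qquad u''(x) = -\tfrac{1}{(1+x)^2} < 0.
\]
% A bounded variant such as $u(x) = 1 - e^{-x} \le 1$ is, in the limit,
% satiable. "Infinitely increasing marginal returns" would instead mean
% $u''(x) > 0$ everywhere, with each additional unit worth more than the last.
```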

I also suspect that better intuitions about dynamical systems and their stability would yield insight. But I currently don’t have the analogical knowledge. Understanding preference-like attractors in mindspace, like the universal AI drives but perhaps less reliably attractive, would also appear to be useful for this kind of reasoning.

No firm conclusions were reached, but I feel a little easier in continuing to think that most drives are satiable, and that paperclip-maximizer AI designs are pretty difficult to engineer. Hopefully doing so is even more difficult than creating a solid framework for the more general problem of computational axiology. The human drives that humans seem to care most about, at least when waxing philosophical, like freedom, happiness, peace, equality, beauty, knowledge, and other highly abstract attractors, seem to be largely insatiable, or at least not easily satiable. So we’ll probably end up wanting to fill the void between the stars with some pretty interesting utilitronium. At any rate, at some point in the future I plan on following up on this line of reasoning, hopefully with more knowledge and sharper tools, and also exploring another similar topic: the stability of evolved drives.


What are humans?

We’re human, so naturally we want to know what humans value. But we don’t know precisely what values are or where they come from, and we know even less of what humans are. What are humans? What’s an answer to the question that will hint at what kinds of things humans could be expected to value?

We know some important things about humans. They evolved and are evolving. They adapted to a tribal environment where social maneuvering was very important for survival. They learned to model their environment and each other. They learned to model themselves, which is probably relevant to this mysterious phenomenon called ‘consciousness’. They are reinforcement learners. They eventually acquired all of the basic AI drives [pdf] to varying degrees, both at an individual level and a tribal/social level. Their general intelligence is borrowed from special-purpose planning algorithms and the like; nowhere in the human brain is there a general intelligence module. Humans specialize, rarely making connections to the meta-level or decompartmentalizing knowledge between domains.

A human is a bundle of thermostats loosely wired together, pretending to have agency.

Humans are kludgey. Their brains are made up of many algorithms with different purposes and methods, and these algorithms don’t often talk to each other. They compete for resources, normally measured in thinking time. Some are constantly active, like breathing; some are selectively active, like bicycle riding skills; and some are nearly never active, like associations between memories that will never be primed again. Humans are thus often hypocritical. The part of them that wants something and says so might be less powerful than the part of them that doesn’t want it but is less vocal. Humans often say things of their desires that are transparently false while believing them true, for there was a selection pressure for being sincere, and less so for speaking uncomfortable truths.

There are two partially overlapping classes of algorithms within human minds.

The first are what we may call ‘genetic algorithms’: acquired over the course of development in the absence of any contact with humans, the algorithms you’d expect to find in the mind of a man raised by wolves.  Visual processing, imagination, athleticism, gracefulness, perhaps rudimentary language: these are all in-born genetic algorithms for most humans.

The second class is that of ‘memetic algorithms’: the processes and memories humans acquire in the course of interacting with one another and with social structures, which the man in the wild could only have thought up if he was luckily creative. Humans were shaped for and then designed by memes, a new type of reasoning that jumped into a universal mind as soon as one popped up in the universe. Some memes are attractors in mindspace: many minds will find similar mathematics, for we believe that mathematics is universal. Economics, egalitarianism, so-called ‘humanism’, even things like art, are all probabilistic attractors for minds in general. Humans boast of discovering theorems; but perhaps it makes just as much sense to say that the theorems found brains to use as their computing substrate.

The intersection between these two classes of algorithms is fairly large, for the memes were not invented overnight: they were the result of specific idea generation algorithms in humans that had to be in genes in order to start the bootstrapping process. In Jungian psychology these algorithms are called ‘archetypes’, and make up some of the ‘collective unconscious’ of humankind. Similar things are found in Freudian psychology, which places a greater emphasis on understanding development. The result is that certain similar memes will show up across all of humanity despite not being transmitted between the cultures. Language; storytelling; animism, spirituality, and religion; magical thinking of all kinds; astronomy; dreams: these all pop up in various cultures and lead to the development of more complicated and more potent memes.

Some memetic algorithms are very smart. Science, for instance, is very powerful. Is science smart enough to find humans and enter their brains? Science is of course an attractor in mindspace by its powerful nature, but is it actually powerful enough to actively and ‘willfully’ enter the human universe and human minds when those minds are ripe? Do humans find this weird and implausible simply because they’re humans, not science, and not nearly smart enough to understand the entirety of the algorithm that is science all at once?

It seems not implausible to me that this is the case: though individual humans have the illusion that humanity invented these many universal concepts for its own aims, these memetic algorithms that the genetic algorithms discovered have their own agendas, and human genes are in symbiosis with these memes. Humans could not exist in their current form without bodies, but neither could they exist without these powerful memes, for which their minds are only one of millions of parallel computing processes. This view doesn’t change our anticipations, but it might change which things we notice to anticipate.

What else are humans? How else should we construct our ontology? Though there are many things to be said, I’ve outlined the direction of my thoughts. At the very least, I remain skeptical of proposals to determine what humans value so long as they don’t bother to define ‘human’. By reducing humans to something sensible like algorithms or processes, I hope we’ll discover and then solve the problems of figuring out what these structures-called-human ultimately want.