Thoughts and directions on Human Compatible (Russell, 2019)

I just finished reading Human Compatible (2019) by Stuart Russell. Unless I have compelling reasons not to, I usually like to read books that have aged well. Given that I’m quite invested in AI safety, and Stuart Russell is no doubt distilling decades of experience into 300 pages, this felt important enough for me to study.

I would be highly grateful for anybody’s comments, criticisms, discussion, or thoughts, publicly or in private, especially references to further readings that refute or challenge my interpretations.

I feel more grounded in my understanding of AI and safety after reading this book. I’d like to highlight some takeaways, along with some questions and exercises that I’m now thinking about and exploring. I also have a clearer idea of how many known unknowns there are for me to learn. This is more of a “this is where I’m at, and where I’m going” than a book review.


According to Russell, the most important step in reaching human-level AI is learning hierarchy and abstraction, or chunking (p. 89-90).

This refers to an AI being able to induce and over-hypothesize meta or higher-level ideas. In some ways all I’m describing is “learning” or “generalization”, but I believe Russell is being more specific: this is about understanding parts, wholes, and the ‘emergent properties’ of assembly. A machine, just like a human, has to be able to put pieces together in a certain order and at different resolutions. The way we function, we understand eyes, noses, faces, and people.

  • If artificial intelligence is optimization, it is search in a hypothesis space. Inductive hierarchical abstractions help over “dumb” brute force.
  • Imagine searching for a solution that is a 10-digit binary number between $0$ and $2^{10}-1$. A program can treat each of the 1024 numbers as a “hypothesis”. But problems are more meaningful than that: you have more to go on. Imagine forming an over-hypothesis, from looking at similar instances of this problem, that the last bit of the binary number is always 1. You have now reduced the search space from $2^{10}$ to $2^9$ hypotheses. Now imagine another over-hypothesis that the Hamming weight (the sum of the individual ten bits) of your binary number is 8. The search space is reduced to ${10 \choose 8} = 45$. The ability to intelligently prune or compress a search space requires knowledge at a higher abstraction or chunk than the current hypothesis space. For example, a program here has to understand numbers at the binary-bit level, addition, and Hamming weight.
  • This is a major simplification: even if the problems we care about were kind enough to be discrete, the space would be $googolplex^{googolplex}$. But real problems are usually continuous (uncountable), and almost ineffable. Imagine solving an intractable problem like “cure cancer”.
  • In effect, over-hypotheses discretize your space. I view this abstract hierarchical reasoning as perceiving in higher dimensions (dimensions not necessarily orthogonal or linearly independent).
  • Another analogy is object oriented programming (OOP), which offers abstraction to manage complexity and move many layers quicker than machine code.
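The pruning in the bullets above can be made concrete with a toy script (my own sketch, not from the book; all function names are mine). Brute force enumerates all 1024 candidates; each over-hypothesis is a predicate that cuts the space down before any real search happens.

```python
# Toy illustration: over-hypotheses prune a brute-force hypothesis space.

def candidates():
    """The 'dumb' hypothesis space: all 10-bit numbers, 2**10 = 1024 of them."""
    return range(2 ** 10)

def last_bit_is_one(n):
    """Over-hypothesis 1: the last bit is always 1 (halves the space)."""
    return n & 1 == 1

def hamming_weight_is_eight(n):
    """Over-hypothesis 2: exactly 8 of the 10 bits are set."""
    return bin(n).count("1") == 8

space = list(candidates())                                   # 1024 hypotheses
pruned = [n for n in space if last_bit_is_one(n)]            # 512 remain
pruned = [n for n in pruned if hamming_weight_is_eight(n)]   # 36 remain

print(len(space), len(pruned))
```

Note that the two constraints combined leave only $\binom{9}{7} = 36$ candidates (the last bit is fixed at 1, so 7 of the remaining 9 bits must be set), even less than either constraint alone.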

Further study: I plan to dive into Goodman’s philosophy book on induction [4], Dietterich’s papers on explanation based learning [5] [6], Universal AI [1] [2], complexity theory [3], Hierarchical RL [15].

Deep learning won’t be the sole base of AGI. Russell believes symbolic reasoning is needed and cites other thinkers’ arguments (Appendix D, p. 295).

  • Deep learning critics seem to minimize DL to “input-to-output mapping” or “perception layers” [7], but they also acknowledge its importance, especially in advances in computer vision that do show some semblance of hierarchical induction.
  • Russell believes software (algorithms), not hardware (compute), is the bottleneck among the pillars of AI advancement (data, compute, algorithms). I’ve had similar thoughts, but it surprised me that Russell took this stance definitively. While there is heavy emphasis on the semiconductor industry as a key player in AI competition, Russell makes some strong arguments that extrapolating the laws of physics, and even quantum computing, is insufficient if problems are approached blindly.
  • Russell is bearish on deep learning. He advocates for symbolic or hybrid symbolic approaches, which is consistent with his research in the ’80s and ’90s.
  • The counter-argument is that, so far, the limits of deep learning haven’t been reached. In language models, power laws are smooth and consistent (if you keep adding water, you keep getting predictable results), and there is no limit in sight yet, even at 175B parameters (see GPT-3) [8].
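What “smooth power laws” means in practice can be shown with a few lines of synthetic data (my own illustration with made-up constants, not figures from the GPT-3 paper): if loss follows $L(N) = c \cdot N^{-\alpha}$, the relationship is a straight line in log-log space, so the trend can be fit and extrapolated.

```python
import math

# Synthetic losses following an exact power law L(N) = c * N**(-alpha).
# c and alpha are illustrative made-up values, not measured constants.
c, alpha = 10.0, 0.076
params = [1e6, 1e7, 1e8, 1e9]          # hypothetical model sizes
losses = [c * n ** (-alpha) for n in params]

# In log-log space the power law is linear: log L = log c - alpha * log N,
# so the slope between the endpoints recovers -alpha.
xs = [math.log(n) for n in params]
ys = [math.log(l) for l in losses]
slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])

print(round(-slope, 3))  # recovers alpha
```

The smoothness of such fits across many orders of magnitude is exactly what makes the scaling-law extrapolation argument persuasive.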

Further study: I plan to go more into the literature, but there seems to be somewhat of a consensus that neuro-symbolic deep learning (DL + symbolic reasoning) [9] (object-oriented deep learning) and causal deep learning [10] (such as deep disentanglement/representation learning as causality) may be the most promising wave. Gary Marcus and work from DeepMind also support this.

For the alignment problem of learning human preferences, Russell believes uncertainty from humans will be passed on to uncertainty in machines.

Human preferences and preference change are slippery. Research on rationality has shown how counterintuitive our preferences can be. One solution is learning meta-preferences, or preferences about preference change.

A framework for inferring human preferences is inverse reinforcement learning (IRL, p. 191), which learns rewards from behavior, as opposed to RL, which learns behavior from rewards. I admit I have much to read and learn on this takeaway still.
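The reward-from-behavior direction can be sketched in a deliberately tiny toy (all names and numbers here are mine, not from the book): given observed choices, score candidate reward functions by how well they explain the behavior under a Boltzmann-rational choice model, and keep the best-scoring one.

```python
import math

# Tiny inverse-RL toy: three actions, a few observed human choices,
# and two hypothetical candidate reward functions. We pick the candidate
# that assigns the highest likelihood to the observed behavior, assuming
# the human chooses action a with probability proportional to exp(R(a)).

actions = ["apple", "cake", "salad"]
observed = ["cake", "cake", "apple", "cake"]   # made-up demonstration data

candidate_rewards = {
    "health-seeker": {"apple": 1.0, "cake": -1.0, "salad": 1.5},
    "sweet-tooth":   {"apple": 0.5, "cake": 2.0,  "salad": -0.5},
}

def log_likelihood(reward):
    """Log-probability of the observed choices under a Boltzmann model."""
    z = sum(math.exp(reward[a]) for a in actions)   # normalizing constant
    return sum(math.log(math.exp(reward[a]) / z) for a in observed)

best = max(candidate_rewards,
           key=lambda name: log_likelihood(candidate_rewards[name]))
print(best)  # the cake-heavy behavior is best explained by "sweet-tooth"
```

Real IRL works over trajectories in sequential environments rather than one-shot choices, but the inversion is the same: behavior is evidence about the reward, not the other way around.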

Further study: I plan to dive more into IRL and model mis-specification [12], and am currently working on studying safe AI proposals from the Alignment Forum [11].

Half-baked Questions

Is intelligence universal and orthogonal? Is it just search in a hypothesis space?

  • Universality in computation means that among different models, there is a general model that can do everything any other model can do (a superset). One universal computer to rule them all. Does being more intelligent (IQ, g-factor, whatever intelligence means) mean you can do everything a less intelligent being can do? What about the other way around? I’ve heard the idea that a 100-IQ person can do what a 140-IQ person does, just slower. The assumption is that, given infinite time, the subspace of ‘computation’ each person can reach is the same. But doesn’t the constraint of time mean a 140-IQ person, if both work every waking hour, is doing something a 100-IQ person cannot: more?
  • The orthogonality thesis describes your stated goal and intelligence as orthogonal dimensions, so intelligence alone implies nothing about things like ‘morality’ [13].
  • Can you define intelligence as search in a hypothesis space? This is only one model of intelligence.

Further study: Dive into Chollet’s Abstract Reasoning Challenge paper [14].

What math/stat/theory/other advances from 20-30 years ago could be the basis of the future?

  • There seems to be a trend where consumer tech is X years behind AI advancements, which might be another 20-30 years behind advances in mathematics and statistics. At any point in time, a new advancement might lack one piece, and in the future that piece gets discovered. Independent study into past advances, and a look at what pieces exist and are missing might be interesting.

Further study: research from AI Impacts, independent study and analysis into papers starting from the 1950’s.

What projects can I implement and study?

Further study:

  • Self-Driving, autonomy, and SLAM
    • Self-driving seems to be the “baby-step” AGI. Going into some depth about the current state seems worthwhile.
  • Classical and “Good Old-Fashioned AI”, especially inductive logic.
  • Utilitarian philosophy, the philosophy of science, the philosophy of mathematics
    • AI researchers are philosophers.
  • Decision theory/game theory
  • Semi-conductors, GPU and TPU computing
  • Forecasting and rationality

This is a long list of future directions I intend to learn more about. I’ll check back in with another post in a few weeks on where I sit.



Kevin Chow
Fledgling Computer Scientist