The Alignment Problem


A more technical/philosophical read; what follows are notes from the book for reference


Three areas of neural network research (in order of least to most controlled):

  • Unsupervised learning: A heap of uncategorized data; the model is trained to find correlations within it
  • Supervised learning: Categorized data; the model is trained to make predictions on new examples
  • Reinforcement learning: An environment of rewards and punishments; the model is trained to maximize reward and minimize punishment.
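The reinforcement setting can be sketched with a toy two-armed bandit: the agent never sees labels, only rewards, and learns to favor the better arm. (This example, including the epsilon-greedy strategy and the payout probabilities, is my own illustration, not from the book.)

```python
import random

random.seed(1)

# Toy reinforcement learning: two arms with hidden payout probabilities.
true_payout = {"left": 0.3, "right": 0.7}   # hidden from the agent
value = {"left": 0.0, "right": 0.0}         # agent's running reward estimates
counts = {"left": 0, "right": 0}

for step in range(2000):
    # Epsilon-greedy: mostly exploit the best-known arm, sometimes explore.
    if random.random() < 0.1:
        arm = random.choice(["left", "right"])
    else:
        arm = max(value, key=value.get)
    reward = 1 if random.random() < true_payout[arm] else 0
    counts[arm] += 1
    value[arm] += (reward - value[arm]) / counts[arm]  # incremental mean

print(value)  # the estimate for "right" ends up higher
```

The agent is never told which arm is "correct"; the reward signal alone shapes its behavior, which is exactly why reward design matters so much for alignment.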

How do we create general models that capture the nuances of human ethics? This is the Alignment Problem. Social and civic problems are becoming more technical; technical problems are becoming more social and civic. Computer scientists are becoming philosophers.

I. Representation

  • Models will always carry biases and connotations as long as the society they reflect continues to have biases.
  • As such, models need to be used descriptively rather than prescriptively. Otherwise, a reinforcement loop perpetuating societal biases will occur (e.g. new models train on biased data that older models helped exaggerate).
  • Models offer strikingly accurate and up-to-date snapshots of society. They can be (and are) used in social studies for their quantitative data

II. Fairness

  • Fairness through blindness does not work. Making a model "gender blind" or "race blind" only perpetuates biases through correlated features (redundant encoding)
  • A calibrated model cannot have equal false positive and false negative rates across groups when events in said groups occur at differing base rates
    • What should fairness be defined as, then? This question needs a more specific domain to be answered (e.g. a lending system vs. a parole prediction system)
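Redundant encoding can be demonstrated with synthetic data. In this sketch (my own illustration; the zip-code proxy and the 90% correlation are assumptions, not from the book), group membership is dropped from the features, yet a correlated proxy recovers it almost perfectly, so a "group-blind" model is not blind at all.

```python
import random

random.seed(0)

# Synthetic applicants: group membership is excluded from the features,
# but zip code (a hypothetical proxy) is highly correlated with it.
def make_applicant():
    group = random.choice(["A", "B"])
    if group == "A":
        zipcode = 1 if random.random() < 0.9 else 2  # 90% of A in zip 1
    else:
        zipcode = 2 if random.random() < 0.9 else 1  # 90% of B in zip 2
    return group, zipcode

data = [make_applicant() for _ in range(10_000)]

# A "group-blind" rule that sees only the zip code still recovers
# group membership about 90% of the time.
guesses = ["A" if z == 1 else "B" for _, z in data]
accuracy = sum(g == guess for (g, _), guess in zip(data, guesses)) / len(data)
print(f"group recovered from zip code alone: {accuracy:.0%}")
```

Any feature correlated with the protected attribute re-encodes it, which is why simply deleting the attribute does not produce fairness.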
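The calibration point above can be made concrete with a small arithmetic check. Chouldechova's identity relates prevalence p, positive predictive value PPV, and false negative rate FNR to the false positive rate; holding PPV (calibration) and FNR fixed while base rates differ forces the FPRs apart. (The specific numbers below are my own illustrative choices.)

```python
# Chouldechova (2017) identity: for any classifier,
#   FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
def implied_fpr(p, ppv, fnr):
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

# Two groups with different base rates but identical PPV (calibration)
# and identical FNR: the false positive rates cannot match.
ppv, fnr = 0.7, 0.2
fpr_a = implied_fpr(0.4, ppv, fnr)  # base rate 40%
fpr_b = implied_fpr(0.1, ppv, fnr)  # base rate 10%
print(round(fpr_a, 3), round(fpr_b, 3))  # → 0.229 0.038
```

No tuning of the classifier can reconcile these: with unequal base rates, calibration and equal error rates are mathematically incompatible, so fairness must be defined per domain.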

"A machine-learning model, trained by data, 'is by definition a tool to predict the future, given that it looks like the past.... That's why it's fundamentally the wrong tool for a lot of domains, where you're trying to design interventions and mechanisms to change the world.'"

III. Transparency