Can We Distinguish Machine Learning from Human Learning?

  • 2019-10-08 15:37:03
  • Vicki Bier, Paul B. Kantor, Gary Lupyan, Xiaojin Zhu


What makes a task relatively more or less difficult for a machine compared to a human? Much AI/ML research has focused on expanding the range of tasks that machines can do, with a focus on whether machines can beat humans. Allowing for differences in scale, we can seek interesting (anomalous) pairs of tasks T, T'. We define interesting in this way: The "harder to learn" relation is reversed when comparing human intelligence (HI) to AI. While humans seem to be able to understand problems by formulating rules, ML using neural networks does not rely on constructing rules. We discuss a novel approach where the challenge is to "perform well under rules that have been created by human beings." We suggest that this provides a rigorous and precise pathway for understanding the difference between the two kinds of learning. Specifically, we suggest a large and extensible class of learning tasks, formulated as learning under rules. With these tasks, both the AI and HI will be studied with rigor and precision. The immediate goal is to find interesting ground-truth rule pairs. In the long term, the goal will be to understand, in a generalizable way, what distinguishes interesting pairs from ordinary pairs, and to define saliency behind interesting pairs. This may open new ways of thinking about AI, and provide unexpected insights into human learning.


University of Wisconsin, Madison
October 10, 2019

1 Introduction

There is enormous interest in, and confidence regarding, Machine Learning. The situation is reminiscent of Archimedes’s observation about the power of a lever: “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world” (Figure 1). Enormous computing power is used to show that, for example, a computer can teach itself to play Go, and become better than human experts (Silver et al. (2017)). Similarly, an algorithm has learned to play computer games, having been informed only whether it has won or lost, and the allowed set of moves (Mnih et al. (2015)).

Figure 1: The claim attributed to Archimedes, NYU (no date)

More generally, there is hope that machine learning can both augment human science and be a true partner in discovery. We propose a path for exploring one aspect of that hope. Specifically, we posit that the progress of science hinges on the discovery or formulation of “rules.” A rule is a compact statement that supports or justifies calculations, or laboratory procedures, which, in turn, can be validated by the world. With this in mind, we have sought an abstract task of rule discovery, which does not improperly advantage either a Machine Learner (ML) or a Human Learner (HL). Specifically, can we identify classes of problems for which humans can learn the rules better than machines, and vice versa?

2 Statement of the Problem

We note that a “fair” study of how human learning aligns, or does not align, with machine learning requires first that a task be posed to humans and machines in a sufficiently comparable way. We also need to deal with the fact that even the best learning programs today may require millions of examples to be able to tell a cat from a dog, while young children seem to learn the difference much more efficiently. Even more to the point here, computers may be presented “training examples” of cats as a set of 2-dimensional images. Humans experience cats as objects moving continuously through time in 3-dimensional space, and may actively interact with them.

We say that rule A is harder to learn than rule B if on average it takes more training episodes to learn rule A than rule B. Whether the number itself is measured in tens or millions is not the issue. What is interesting, and may provide a pathway to better understanding differences between human and machine learning, are pairs of tasks, let us call them (A,B), such that task A is harder than B for a machine but easier than B for a human, or vice versa. The idea is illustrated in Figure 2. The relation of “interesting pair” is ordinal and does not depend at all on the units of measurement for either scale.

In the example of Figure 2, there are three classes of rules, A, B, and C. All of the examples in Class A cross all of the examples in Class B. When such a pattern can be found, it will provide a foundation for understanding what distinguishes the rules in those two classes. While a class such as Class C might have some internal crossings, these will not provide good information about the reasons for the crossing.

Figure 2: Suppose that a number of rules in some family A have been studied, and a number of rules in family B. They are ordered in difficulty for humans along one line, and in difficulty for ML along another line. The two symbols representing the same rule are then joined. If the lines joining the symbols for two rules, A and B, must cross then we say that they form an interesting pair. See text.
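The ordinal definition of an interesting pair can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `interesting_pairs`, the rule labels, and all difficulty numbers are hypothetical, and difficulty is taken to be the mean number of training episodes needed by each kind of learner.

```python
from itertools import combinations

def interesting_pairs(human_difficulty, machine_difficulty):
    """Return rule pairs whose difficulty order is reversed between learners.

    Both arguments map a rule label to a difficulty score (for example,
    mean episodes to learn). Only the ordering within each mapping matters,
    so the two scales may differ by orders of magnitude.
    """
    pairs = []
    for a, b in combinations(sorted(human_difficulty), 2):
        h = human_difficulty[a] - human_difficulty[b]
        m = machine_difficulty[a] - machine_difficulty[b]
        if h * m < 0:  # strictly opposite order; ties are not counted
            pairs.append((a, b))
    return pairs

# Hypothetical numbers: class A is easy for humans and hard for the machine,
# class B is the other way around.
human = {"A1": 12, "A2": 15, "B1": 40, "B2": 55}
machine = {"A1": 7_500_000, "A2": 9_000_000, "B1": 200_000, "B2": 350_000}

result = interesting_pairs(human, machine)
print(result)
```

With these made-up scores every rule in class A crosses every rule in class B, while the pairs inside each class do not cross, which is the pattern described for Classes A and B in Figure 2.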

There are several operational challenges in making this notion precise. First, for both humans and machines, the difficulty of learning a rule may depend quite strongly on the specific training sequences. The training sequences may be such as to cause one (or both) types of learning to proceed quickly. Or they may make learning difficult (that is, slower) for either or both types of learners. Thus research must either be able to identify these order effects, or must average over a sufficiently large set of training sequences, for both ML and HL. Since a computer can be told to forget everything it has experienced, this is relatively easy for the ML arm of the research. For the HL arm, by contrast, different participants must be employed for each training sequence, to provide a comparable tabula rasa.

A second concern is intrinsic randomness. Almost all contemporary ML approaches involve some stochasticity, so we ought to average over multiple runs. Of course, there are individual human learner differences as well, and we again must average over them. This suggests that such research may require hundreds or more human participants. Fortunately, such studies have become possible and accepted using Mechanical Turk techniques. To decide which type of learner is better at a task, a non-parametric test such as a Wilcoxon rank test can be applied.
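As a rough sketch of such a rank-based comparison, the following computes the Mann-Whitney U statistic (the two-sample Wilcoxon rank-sum test) with a normal approximation. The choice of the rank-sum variant is an assumption, since the text does not say which Wilcoxon test is intended. The function name `rank_sum_test` and both samples are hypothetical, and the approximation omits the tie correction and is crude for samples this small.

```python
import math

def rank_sum_test(x, y):
    """Mann-Whitney U with a two-sided normal-approximation p-value.

    U counts the pairs (a, b), a from x and b from y, with a < b
    (ties count one half). No tie correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return u, p

# Hypothetical episodes-to-learn for two independent samples
# (for example, separate participants or separate ML runs).
sample_1 = [12, 15, 11, 14, 13, 16]
sample_2 = [25, 22, 30, 27, 24, 28]

u, p = rank_sum_test(sample_1, sample_2)
print(u, p)
```

Because the test uses only ranks, it is indifferent to the units of measurement, which matches the ordinal spirit of the comparison.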

There is a third concern. We are interested not only in the relative learnability of specific classes of rules but also, perhaps even more importantly, in whether transfer from one learning task to another is the same or different for HL and ML. In order to explore this, we will have to develop concrete measures of transferability. Conceptually, the problem is the same as the one represented in Figure 2. However, in this case the black dots will represent transfer pairs. Thus the symbol A1 would represent the amount of transfer from one specific rule, say R1, to another, R2.

3 Background to the Problem

Every aspect of this task has an enormous and relevant literature. In this brief summary we can point to only a few of the publications that provide a framework for the task posed here.

3.1 Human Learning

Many of the classic domains that were once viewed as pinnacles of human intelligence (chess, logical reasoning) have been conquered by relatively simple algorithms. Conversely, tasks that are easy for humans and other animals — such as flexible locomotion and rapid and robust visual categorization — are all at the cutting edge of modern artificial intelligence research. This is known as Moravec’s paradox.

Consider, for example, that a simple electronic calculator can do arithmetic orders of magnitude faster than a brain made up of 100 billion neurons. It is not just that a calculator is so much faster than a person. When applying simple algorithms, people make mistakes that computers never make. For example, a large minority of people are able to correctly define what makes a number even or odd, but nevertheless systematically misclassify numbers like 798 as odd. People who know that a triangle is a three-sided polygon nevertheless claim that equilateral triangles are “better” triangles than scalene triangles, and frequently misclassify the latter type of triangle as not a triangle at all (Lupyan (2013)).

Cognitive science and neuroscience help us understand what is going on here. What makes computers fast and accurate is their ability to perform simple computations with high precision. Biological computation is in comparison much slower, and — critically — much noisier. This means that long serial computations (even as ‘simple’ as binary addition) cannot be achieved with high precision. Biological neural networks compensate through the use of massively parallel and distributed computation, an observation presciently made by von Neumann more than half a century ago (reprinted as von Neumann (2012)).

This parallel and distributed architecture is ill-suited for carrying out the kinds of computations (arithmetic, logic) that are trivial for electronic circuits. The reason people misclassify 798 as an odd number is not that they are inattentive or careless. Rather, applying abstract rules (even very simple ones like ‘MOD 2’) requires representing only the parts of the input that matter: here, whether a number is evenly divisible by 2. This requires projecting the original representation (which contains information about the number, its magnitude, the color of the font that comprises it, its location in space, etc.) to a space with a discrete decision boundary (even/odd). In this process, a number like 798 is closer to the odd/even boundary than a “more even” number like 400; occasionally, 798 ends up on the wrong side, as reported by Lupyan (2015, 2013).

The same similarity-based processing that makes it difficult for people to apply an abstract rule quickly and robustly is ideal for learning similarity-based representations and discovering (even very subtle) covariance structures present in the input (Rogers and McClelland (2004); Rumelhart et al. (1986)). Even six-month-old infants can learn what cats have in common that distinguishes them from dogs (Quinn and Eimas (1996)). It is not a coincidence that research attempts to build categorization algorithms that approximate human categorization began to succeed only when the underlying architecture moved from rule-based ‘expert’ systems to distributed architectures (such as artificial neural networks) that rely on gradually learning from multiple examples. It is also not a coincidence that we do not use these architectures for doing arithmetic or logic.

Figure 3: Three Bongard Problems

Despite being slow and error-prone when applying even simple rules, people are capable of remarkable acts of abstraction in tasks requiring extracting rules from examples. Classic examples of this problem domain are Bongard problems (Bongard (1967)), three of which are shown in Figure 3. In each problem, people are presented with 12 shapes and must formulate a rule that distinguishes the shapes on the left from the shapes on the right. The “rule” formulating the distinction in Figure 3A is that all shapes on the left and none on the right have a narrowing in the middle. Despite the geometric simplicity of the distinction (which can be obtained through computing the polarity of the second derivative), this problem is relatively difficult for people: only 36% of the subjects succeed. The problem shown in Figure 3B is easier: 76% correctly induce that the rule is threes vs. fours. Compared to the previous problem, this one seems to require much more abstraction: the solver needs to abstract over the different instantiations of ‘threeness’: three line segments; a triangle; three polygons; three notches; etc. The easiest of all is the problem shown in Figure 3C, which was solved by over 95% of our participants. The rule (triangles vs. circles) may seem trivial, but is exceedingly difficult to derive if one does not already know about “circles” and “triangles”: the “circle” images that are shown have no features in common with one another. This challenge was first pointed out by Mikhail Bongard, the original creator of the problems (Bongard (1967), see also Linhares (2000)). What enables people to solve this problem so easily is that they come to the task having previously learned a set of higher-level categories (e.g., circle, triangle, three). They are then able to flexibly deploy them as hypotheses in a top-down way (Lupyan and Clark (2015)). Many of these units may have been learned in the course of learning one’s native language (Majid et al. (2018); Lupyan and Clark (2015)). It is not a coincidence that the relative ease of the problems in Figures 3B-C compared to Figure 3A is strongly correlated with knowing words like “three,” “four,” “triangle,” and “circle.” If this analysis is correct, then the key to closing the gap between the human and machine extraction of rules may lie in understanding what units people use when extracting rules from examples, and what makes a unit especially useful.

3.2 Machine Learning

When choosing a representative machine learning paradigm for a study such as is contemplated here, the main consideration is to closely match its human learner counterpart. As such, we propose to study reinforcement learning agents: like humans, they play with possible actions, receive rewards or penalties, and update their policies. In addition, reinforcement learning is well-studied in machine learning and has previously achieved impressive abilities in playing games like Go (Silver et al. (2017)).

To specify the machine learning task, we define the rule game learning problem in terms of a Markov Decision Process M=(S,A,T,R,γ):

  • The state space S represents the game board as well as historical plays, both successful and unsuccessful.

  • The action space A represents possible moves.

  • The state transition probability T specifies how the game is updated to a new state s' upon playing move a in state s: P(s' | s, a) := T(s', s, a).

  • The reward function R(s,a) is 1 if the move a is accepted, -1 if not. The cumulative reward therefore penalizes long sequences of wrong moves.

  • γ is a standard discounting factor in reinforcement learning to combine long-term rewards.
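The components listed above can be made concrete with a small environment sketch. Everything here is a hypothetical illustration rather than the authors' implementation: the class name `RuleGame`, the board encoding, and the rule signature are assumptions, and the transition is deterministic for simplicity. The state consists of the board plus the accepted and rejected move histories, an action is a (position, bucket) pair, and the reward is 1 for an accepted move and -1 otherwise.

```python
from typing import Callable, List, Optional, Tuple

Board = Tuple[Optional[str], ...]   # color at each position, or None if empty
Action = Tuple[int, str]            # (position, "L" or "R")
Rule = Callable[[Board, List[Action], Action], bool]

class RuleGame:
    """Minimal deterministic rule game: state = board + move histories."""

    def __init__(self, board: Board, rule: Rule):
        self.board = board
        self.rule = rule
        self.accepted: List[Action] = []   # successful moves so far
        self.rejected: List[Action] = []   # unsuccessful attempts so far

    def step(self, action: Action) -> Tuple[int, bool]:
        """Apply one move and return (reward, episode_done)."""
        pos, _bucket = action
        ok = self.board[pos] is not None and self.rule(
            self.board, self.accepted, action)
        if ok:
            cells = list(self.board)
            cells[pos] = None               # the block leaves the board
            self.board = tuple(cells)
            self.accepted.append(action)
        else:
            self.rejected.append(action)
        done = all(c is None for c in self.board)
        return (1 if ok else -1), done

def left_to_right_any_bucket(board, accepted, action):
    """Example rule: the leftmost remaining block must be removed next."""
    pos, _bucket = action
    leftmost = next(i for i, c in enumerate(board) if c is not None)
    return pos == leftmost

game = RuleGame((None, "red", None, "blue"), left_to_right_any_bucket)
r1, d1 = game.step((3, "L"))   # wrong order: rejected
r2, d2 = game.step((1, "R"))   # leftmost block: accepted
r3, d3 = game.step((3, "L"))   # last block: accepted, episode ends
```

A learning agent would interact with `step` only through the returned reward and termination flag, never through the rule function itself.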

It is important to point out a subtlety in evaluation. Since our overall task is “rule learning,” a natural goal might be for the machine to exactly identify the rule. While this is certainly reasonable, it is nonetheless restrictive for several reasons. First, most reinforcement learning agents represent their policies (why they choose an action under a given circumstance, or state) implicitly: as a Q-table, or value function approximation, or a policy neural network. It is difficult to extract the “rule” in a human readable form. Second, even if we could do that, we would then need a quality measure to compare the machine’s rule vs. the ground-truth rule (this applies to human learners, too). This can be difficult to do. Finally, it is possible that the human learners say one thing (their purported rule) but do another (playing the game in a way that does not match their purported rule), which creates complications in comparing to machines.

Therefore, rather than attempting to extract rule knowledge from the machine, we will measure machine performance by how well machines can learn and play games with different rules. More precisely, we will consider discounted reward, average per-round success rate, and speed at which these measures asymptote. Note that these measures can be equally applied to human learners, which we plan to do (except that for human learners, we will also ask them to state their purported learned rule at the end). Eventually, the relative difficulty of two rules to a reinforcement learning agent can be measured by the agent’s learning curves for the two rules: what levels of performance the curves reach, and how fast they get there.
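The measures named above can be computed directly from a log of per-move rewards. The sketch below is illustrative only: the function names are hypothetical, and `episodes_to_criterion` is one possible proxy for "speed at which these measures asymptote" (an assumption, not a definition taken from the text).

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the moves of one episode."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def success_rate(rewards):
    """Fraction of moves in an episode that were accepted (reward > 0)."""
    return sum(1 for r in rewards if r > 0) / len(rewards)

def episodes_to_criterion(rates, threshold, window):
    """1-based index of the episode that completes the first run of
    `window` consecutive episodes with rate >= threshold, or None."""
    run = 0
    for i, rate in enumerate(rates, start=1):
        run = run + 1 if rate >= threshold else 0
        if run == window:
            return i
    return None

episode = [-1, -1, 1, 1]                 # two rejected moves, then two accepted
ret = discounted_return(episode, 0.5)    # -1 - 0.5 + 0.25 + 0.125
rate = success_rate(episode)
reached = episodes_to_criterion([0.4, 0.6, 0.9, 1.0, 1.0], 0.9, 2)
print(ret, rate, reached)
```

The same three functions apply unchanged to logs from human participants, which is what makes the comparison between the two kinds of learners possible.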

3.3 Translation to Real Problems

To understand the importance of rule learning in applications of artificial intelligence, it is useful to present a bit of history. The earliest applications of artificial intelligence were typically rule-based, e.g., for medical diagnosis (a very early example being Shortliffe et al. (1975); for a more nearly modern view see Lim et al. (1993)), or electronic checklists to support automated diagnosis of equipment problems (Fung (1989)). That is, the rules were explicitly programmed by humans, rather than being learned endogenously by machines. These applications, while useful at improving efficiency and reducing tedium, could never significantly outperform the best human experts. They could allow novices to achieve near-expert performance, and improve the consistency and accuracy of human experts, but since they were dependent on “handcrafted knowledge,” they were inherently limited by human capabilities. In other words, rule-based AI could be “faster and less error-prone,” and have “a higher degree of precision,” but represent at best an incremental improvement over human capability. More recently, there has been an explosion of AI capabilities, due to the adoption of machine learning and statistical pattern recognition. This has resulted in truly spectacular achievements, such as the development of a computer program (AlphaGo) that has far outstripped the world’s best Go master in a remarkably short period of time. Thus, it would appear that we are now in an era where artificial intelligence has surpassed human intelligence, although the “rules learned” by machines are embedded in a truly opaque forest of linking parameters.

However, artificial intelligence still has some dramatic limitations. In particular, it works best in highly constrained environments (e.g., games such as Go or chess), where it is clear which types of “moves” or rules are permitted (even if the software needs to learn the rules on its own by observation (Mnih et al. (2015))). In less structured environments, it often performs poorly (or at least non-intuitively), making “mistakes” or misinterpretations that in some cases would have been obviously (or hilariously) wrong to even the most naive human subject; see for example Goodfellow et al. (2017), Krakovna (2018).

As a result, many applications of AI are still quite “small.” Even though deep neural networks are now capable of recognizing objects in a complex visual field, they are still of limited reliability in the real world. Thus, for example, in development of software for identifying skin cancer, “if an image had a ruler in it, the algorithm was more likely to call a tumor malignant” (Patel (2017)). Visual recognition can automate the review of vast quantities of visual data; even if the process is error-prone, it can still be useful by flagging potential targets for human review.

However, the task is limited by the binary (or near-binary) nature of the response variable. AI is typically not used to recognize every item in a complex visual field, only to flag those that meet specified criteria. When the classification process is more open-ended, image recognition can still yield surprising errors. In some cases, these errors are not too bad, and might also be made by humans (e.g., incorrectly classifying a comforter as a pillow, or a dog as a cat or wolf). In other cases, however, the errors are more serious; for example, erroneously classifying a turtle as a rifle (or the reverse) could have significant adverse consequences (Molnar (2018), Athalye and Sutskever (2017)). Moreover, even for a single item, it is not difficult to fool an algorithm; an interactive example is given by Papernot and Frosst (2019).

Therefore, humans are still needed for higher-level tasks—e.g., making decisions of what to do about objects after they have been recognized by machine learning. This is especially true in situations where the decisions have high stakes (e.g., deciding on a medical treatment, rather than a chat bot deciding on which product to recommend to a customer). Thus, suitability for machine learning (Brynjolfsson et al. (2018)) is judged to be low if a task requires “complex, abstract reasoning,” while computers are more suitable for routine repetitive tasks, where efficiency is prized and the cost of errors may be low. Machine learning can also be vulnerable to adversarial attacks such as fraud (Levin et al. (2019)).

The proposed research—identifying which types of rules (or changes in rules) are more easily learned by humans, and which are more easily learned by computers—could pave the way for more complete human-assisted AI (or AI-assisted human decision making), in which computers can take over more complex functions, but in a gradual manner, consistent with a thorough understanding of their capabilities. As stated by Polson and Scott (2018), AI can yield “different and better jobs, new conveniences, freedom from drudgery, safer workplaces, better health care, fewer language barriers, new tools for learning and decision-making that will help us all be smarter, better people.”

4 Game as a Learning Task

As an illustration, one might consider a learning task in which colored blocks are placed in any one of L=20 positions along a line, as in Figure 4. A move is to take a block and place it in a bucket at one end of the line or the other. Although there are 20 places in the line, on any given episode of play somewhere between five and ten colored blocks are placed randomly in some of the positions, with no more than one block in a position, as in Figure 5.

Figure 4: An example display might have some number, in this case, 20, of positions along a line. There are “buckets” at either end of the line. A “rule” specifies the order in which objects are to be moved from the display. It further specifies, when an object may be moved, the bucket into which the object, at that move, is to be placed.
Figure 5: This is an example initial configuration in which colored objects are placed in some of the positions. The initial configuration for each episode of play is to be generated randomly.

One player, Alice, formulates a rule (see some examples in Exhibit 1) and observes while a second player, Bob, tries to play the game. An episode ends in success when all of the blocks have been removed in accordance with the rule, and each has been placed in a bucket allowed by the rule.

To be concrete, some possible rules are shown in Exhibit 1.

Exhibit 1

  1. Remove items from left to right, placing each object in any bucket.

  2. Remove items from left to right, placing each in the nearest bucket.

  3. Remove all blue blocks into the left bucket and all red blocks into the right bucket; all other blocks can go in either bucket.

  4. Remove blocks from the outside in, starting at the left end. Place each in the farthest bucket.

  5. Place any block in any bucket, except that if there is a red block in the seventh position, reading from the left, it must be the third item removed, and must go into the right bucket.

  6. Place a first block in either bucket, and thereafter remove blocks in any order, placing them alternately into left and right buckets.
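To show how rules like those in Exhibit 1 could be checked mechanically, here is a sketch of rules 2 and 3 as predicates over a board and a proposed move. The board encoding (a list of color names or None, positions counted from 0), the bucket labels "L" and "R", and the function names are all assumptions made for illustration. The example board holds only three blocks for brevity, fewer than the five to ten described in the text.

```python
def rule_2(board, action):
    """Rule 2: remove left to right, each block into the nearest bucket."""
    pos, bucket = action
    remaining = [i for i, c in enumerate(board) if c is not None]
    if not remaining or pos != remaining[0]:
        return False                      # not the leftmost remaining block
    dist_left = pos
    dist_right = len(board) - 1 - pos
    if dist_left == dist_right:
        return True                       # assumption: a tie allows either bucket
    return bucket == ("L" if dist_left < dist_right else "R")

def rule_3(board, action):
    """Rule 3: blue goes left, red goes right, other colors go anywhere."""
    pos, bucket = action
    color = board[pos]
    if color is None:
        return False                      # nothing to remove at this position
    if color == "blue":
        return bucket == "L"
    if color == "red":
        return bucket == "R"
    return True

board = [None] * 20
board[2], board[11], board[17] = "red", "blue", "green"

checks = [
    rule_2(board, (2, "L")),    # leftmost block, nearest bucket is left
    rule_2(board, (2, "R")),    # leftmost block, wrong bucket
    rule_2(board, (11, "R")),   # not the leftmost block
    rule_3(board, (11, "L")),   # blue into the left bucket
    rule_3(board, (2, "L")),    # red must go right
    rule_3(board, (17, "R")),   # green may go anywhere
]
print(checks)
```

Rule 3 depends only on color, while rule 2 depends only on position and order, so the same move on the red block at position 2 is accepted by one rule and rejected by the other.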

5 How Many Rules are There?

The size of the rule space is enormous. For example, the rule “remove objects, from left to right, and place them all on the left” is one of L! possible orders. In addition, if there are C colors, and allowing for interaction between position and color, there are 2^(CL) possible rules for each order. So, for this particular instance, with, say, only three colors [our example actually has four], C=3, and L=20, there are 20! × 2^60 ≈ 2.8×10^36 possible rules.
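The arithmetic behind this count is easy to reproduce with exact integers. The variable names below are illustrative only.

```python
import math

L = 20   # positions along the line
C = 3    # colors, as in the worked instance in the text

orders = math.factorial(L)                 # L! removal orders
color_position_rules = 2 ** (C * L)        # 2^(CL) rules per order
total = orders * color_position_rules

print(orders)                  # 2432902008176640000
print(color_position_rules)    # 1152921504606846976
print(f"{float(total):.1e}")   # 2.8e+36
```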

In theory, this should be an advantage for machine learning, since a computer could search a much larger fraction of the space of possible rules than any human could. In practice, however, humans may have good intuition for certain types of rules, if based on preexisting concepts (e.g., from natural language).

6 What is a Simple Rule?

Because the problem that we pose (about rule learning) could easily be said to encompass all of knowledge, both human and otherwise, it may be helpful to compress the space of examples to be considered. This may necessarily be somewhat arbitrary. There are several ways to limit the space of possible rules. One is to somehow limit the expressive power of the language used to encode the rules. If that is the approach, there are still many design decisions to be made. Since a rule will not be “interesting” unless it can be created and learned by humans, the gold standard for these decisions is going to be what is learned from human learners participating in the proposed “game” (see discussion in Section 4 above).

For example, one could say that the rule should be “not too big,” and one way to measure size could be by the number of bits in the rule, together with the number of bits in the code book (Rissanen (1989)). However, it is easy to create examples (as we illustrate below) that can be described in a relatively small string, but require enormous exploration to be determined.

The situation becomes even more difficult if we replace the notion of “learning the rule” with some variant of being Probabilistically Approximately Correct. In this case, rules that encompass rare exceptions may be considered well learned even by learners who do not learn those exceptions. In addition, constraints might be proposed about what kinds of information about the display, and about previous moves, can enter into the rules to be considered.

A rule will be an unambiguous statement that makes it possible, at any point during an episode, to determine whether a block can be moved at that point, and whether it has been moved to the correct bucket. One way of limiting the set of allowed rules is to require that what is “allowed” not depend on unsuccessful prior attempts. This would eliminate rules such as “you must try two moves that are not allowed, before you will be permitted to make a move that is allowed.”

Even when some types of rules are excluded, there are still many possible rules, as discussed in Section 5. As an extreme example, the rule could contain a precise description of one possible starting position, with blocks of colors c1, c2, …, ck occupying positions p1, p2, …, pk; let us call this configuration 𝐃𝟏. A rule, let us call it 𝐑𝟏, might say “if 𝐃𝟏 is presented, place everything on the right; otherwise, place everything anywhere.” While this rule appears simple, discovering it would require an enormous search. Even if given the hint that “there is only one starting configuration for which you are not free to do whatever you like,” the expected time to discover this particular configuration would be half the number of possible configurations. (Footnote 1: With bad luck, one might even accidentally do the allowed thing for this configuration, and would find that out only after trying another solution for every configuration, until one is told that a move is not allowed.) Thus it seems reasonable to eliminate rules with such a strong dependence on the initial configuration. For this particular game, with K positions filled, there are 2^K × (L choose K) possible initial configurations. If there were two special configurations, the “learning process” would take 50% longer, in expectation.

Of course, rules of that form could also be disallowed.
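To give a sense of scale for the search over initial configurations discussed above, the sketch below evaluates the count 2^K × (L choose K) for the range of K used in the game. The factor 2^K is taken from the text as written (it corresponds to two possible colors per block); the function name `initial_configurations` is hypothetical.

```python
import math

def initial_configurations(L, K):
    """Number of initial configurations with K filled positions,
    using the count 2^K * (L choose K) stated in the text."""
    return 2 ** K * math.comb(L, K)

L = 20
counts = {K: initial_configurations(L, K) for K in range(5, 11)}

for K, n in counts.items():
    # Half the count is the expected search length quoted in the text
    # for a single special configuration.
    print(K, n, n // 2)
```

Even at the small end (K=5) there are 496128 configurations, so a rule keyed to one of them is effectively undiscoverable by trial and error within a study of realistic length.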

Another promising approach is to develop a specific language for the rules, and then place a limit on the number of terms from that language that can be present in an allowed rule. For example, one could permit a rule to use information about both position and color, provided that the rule can be expressed in a “limited number of bytes.” Of course, such a constraint would depend not only on the rule itself, but also on the cleverness of the team specifying the code book and the notation, so it may be difficult to establish whether a given rule “can be expressed in less than N bytes.”

A second issue is to limit the kind of “scratch tape” or “auxiliary registers” that may be used in the process of learning a rule; for example, whether the machine-learning algorithm is allowed to track only the history of successful moves in a given episode, or also any unsuccessful move attempts. (Footnote 2: Of course, both human and machine learners will have to remember previous episodes, in order to find a rule.) Here, the comparison between machine and human learners is complicated by the fact that humans will remember (some fraction of) both successful and unsuccessful moves, but may remember them imperfectly, or even erroneously.

Any particular framework for machine learning, along the lines of the above, will of course limit the aspects of history that can be used in rule learning. For example, the present move for an object of a given color could be restricted to depend only on the most recent correct move of an object having the same color. Thus, “blues must be dropped alternately left and right, when they are dropped” would be allowed under this type of rule, but “reds must be dropped cycling around and counting the Fibonacci numbers for the positions (modulo 4)” would not be allowed.
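A history restriction of this kind amounts to keeping one small piece of memory per color. The sketch below illustrates the “blues must alternate” rule under that restriction; the function name `play`, the move encoding as (color, bucket) pairs, and the example sequence are hypothetical.

```python
def play(moves, constrained_color="blue"):
    """Check a sequence of (color, bucket) moves against the rule
    'blocks of the constrained color must alternate buckets'.

    The only memory used is the bucket of the most recent accepted move
    for each color, which is the history restriction described in the text.
    """
    last_bucket = {}
    outcomes = []
    for color, bucket in moves:
        ok = True
        if color == constrained_color:
            previous = last_bucket.get(color)
            ok = previous is None or previous != bucket
        if ok:
            last_bucket[color] = bucket   # rejected moves leave memory unchanged
        outcomes.append(ok)
    return outcomes

moves = [("blue", "L"), ("red", "L"), ("blue", "L"), ("blue", "R"), ("blue", "L")]
outcomes = play(moves)
print(outcomes)   # [True, True, False, True, True]
```

A rule that counts Fibonacci numbers, by contrast, would need to remember how many accepted moves of that color have occurred, which does not fit in this one-value-per-color memory.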

With regard to the treatment of past history in rule learning, there are a few key observations. First, one may find that human subjects are capable of both generating and learning rules requiring a more complex treatment of history than we might initially assume. Whether this happens naturally, or can be elicited with suitable instruction, is an open question.

Second, we do not yet know how such a study will apply to real-world rule induction situations. Such rules might apply to tactical issues, such as diagnosing the problem with a portable generator. On the other hand, rules may also be sought for strategic issues, such as identifying methods used by an adversary such as a fraudster. One must, for this kind of translation of research results, explore what kinds of rules have been proposed in the literature on these issues. While the rules will almost surely be related to the adversary’s historical behavior, they will probably not contain complex mathematical concepts.

Finally, scientific discovery is also a kind of “rule-learning,” where the rules are the Laws of Nature. For example, the historian of science Peter Galison (Galison et al. (1997)) has given primacy not to theories (as in the work of Thomas Kuhn (Kuhn (1962))), but to experiment and observation. (Footnote 3: Note that at least one of the present authors holds that “Kuhn is not a Kuhnian”, as explained in the post-script to the second version of his influential book, Kuhn (1970).)

It is well known that for physics, at least, a first-order model (with respect to time) is clearly not adequate. For example, the distance that an object falls depends quadratically on the time. If we imagine steps in the game as a time variable, such a quadratic dependence cannot be imposed if the rule must depend only on the most recent previous event. For gravity, the new increment (in this case, of distance) is not given by a static rule, but must change after each step.
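The falling-object example can be checked numerically. The sketch below assumes one game step equals one second and g = 9.8 m/s^2; the function names are hypothetical. It shows that the distance increment differs at every step, so no fixed increment can reproduce the motion, whereas a rule that looks at the two most recent positions, d[n+1] = 2*d[n] - d[n-1] + g, reproduces d[n] = g*n^2/2 exactly.

```python
G = 9.8  # m/s^2; assumed value, with one game step taken as one second


def fall_distance(n: int, g: float = G) -> float:
    """Closed-form distance fallen after n steps: d_n = g * n^2 / 2."""
    return 0.5 * g * n ** 2


def increments(steps: int, g: float = G) -> list:
    """Distance added at each step; these grow as g * (2n - 1) / 2."""
    return [fall_distance(n, g) - fall_distance(n - 1, g)
            for n in range(1, steps + 1)]


def second_order_rollout(steps: int, g: float = G) -> list:
    """Generate distances from a rule that uses the two latest positions."""
    d = [fall_distance(0, g), fall_distance(1, g)]
    while len(d) <= steps:
        d.append(2 * d[-1] - d[-2] + g)
    return d[: steps + 1]


if __name__ == "__main__":
    print(increments(4))
    print(second_order_rollout(4))
    print([fall_distance(n) for n in range(5)])
```

The design point is that widening the allowed history from one previous event to two is enough for this particular law; other laws may need more.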

There are examples of physical rules that depend only on the relation between a velocity and the current state of the system (and not on some “wall clock”; see, for example, Carter (2003)). As these examples suggest, any restriction on the types of rules allowed in research on rule learning will limit the types of situations to which the findings can be applied.
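As a hedged illustration of a rule with no “wall clock,” consider a system whose velocity is a function of its current state alone, here dx/dt = -k*x stepped with a simple Euler update. This example, the constants, and the function names are assumptions chosen for illustration and are not taken from Carter (2003). Because the update never sees the step index, a trajectory restarted from any intermediate state continues exactly as the original did.

```python
def velocity(x: float, k: float = 0.5) -> float:
    """Rate of change depends on the current state only (no time argument)."""
    return -k * x


def step(x: float, h: float = 0.1) -> float:
    """One Euler update: next state from current state alone."""
    return x + h * velocity(x)


def rollout(x0: float, n: int) -> list:
    """Apply the update n times, recording every state."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs


if __name__ == "__main__":
    full = rollout(1.0, 10)
    # Restarting from the state reached at step 4 retraces the same path.
    resumed = rollout(full[4], 6)
    print(resumed == full[4:])
```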

7 Discussion

What might be gained by the investigations sketched here? There are potential implications for both psychology and computer science. If one can find a “comprehensible” (to humans) distinction between rule pairs that are “interesting” and those that are “uninteresting,” that will suggest new lines of research.

  • For psychology: can we train people to do better on the classes which are, by comparison, ML-easy and HL-hard?

  • For computer science: can one extend learning methods to make some of the classes that are ML-hard become “easier”?

  • For real-world applications: this research may lead to better harmonization of human and machine capabilities, so that the two can jointly solve complex problems in a manner consistent with their respective capabilities.

In particular, expanding on the third point, a deeper understanding of the differences between human and machine learning might make it possible to “triage” problems that lack known rules of procedure, and direct such problems to humans or machines, according to which has a better chance of inferring or inducing the correct rule in time for the solution to be useful. While the successes of Machine Learning are impressive, its consumption of time and energy is a significant factor in potential applications (García-Martín et al. (2019)). Machine Learning has shown substantial advances on problems for which deterministic “oracles” exist (such as video games or board games). In these cases the learner is told some part of the rules (such as what moves are allowed), and the remainder is provided by an oracle. Problems of image classification appear deterministic to the machine learner, but of course the human labeling of “ground truth” almost certainly contains errors.

The field of generalized language understanding remains very challenging, and the largest ongoing project (Lenat (1995)) has seemed to pursue an ever-retreating horizon. The more recent refocus on specific (multiple-choice) tasks seems to promise a path to solving (at present) the eighth-grade New York State Regents examination in science. However, this success is apparently limited to chains of reasoning about synonyms and relations, and cannot (yet) deal with information presented in visual diagrams (Boyle (2019)).

The line of research proposed here would concentrate specifically on the most visible difference between the way that humans seem to “understand” and the way in which machines do. Humans, in both everyday and scientific problems, reduce complex realities to a set of powerful and concise rules. In the scientific realm, the rules are often mathematical. In the more human realm they may be folkloric, as in “a stitch in time saves nine,” or “there is more than one way to skin a cat.” Skilled craftspeople know many such rules, and solve novel problems every day by rethinking what they know and formulating a useful (if temporary) rule for the situation at hand.


References

  • Athalye and Sutskever [2017] Anish Athalye and Ilya Sutskever. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.
  • Bongard [1967] M. M. Bongard. Pattern Recognition. Hayden Book Co., Spartan Books, Rochelle Park, NJ, 1967. ISBN 978-0-8104-9165-6.
  • Boyle [2019] Alan Boyle. Allen Institute’s Aristo AI system finally passes an eighth-grade science test, September 2019.
  • Brynjolfsson et al. [2018] Erik Brynjolfsson, Tom Mitchell, and Daniel Rock. What can machines learn, and what does it mean for occupations and the economy? AEA Papers and Proceedings, 108:43–47, 2018.
  • Carter [2003] W. Craig Carter. Ordinary differential equations from physical models, 2003.
  • Fung [1989] Francis Cheong Yiu Fung. Framework for building rule-based machine diagnostic expert systems. Knowledge-Based Systems, 2(4):228–238, 1989.
  • Galison et al. [1997] Peter Galison et al. Image and logic: A material culture of microphysics. University of Chicago Press, 1997.
  • García-Martín et al. [2019] Eva García-Martín, Crefeda Faviola Rodrigues, Graham Riley, and Håkan Grahn. Estimation of energy consumption in machine learning. Journal of Parallel and Distributed Computing, 134:75–88, December 2019. ISSN 0743-7315. doi: 10.1016/j.jpdc.2019.07.007.
  • Goodfellow et al. [2017] Ian Goodfellow, Nicolas Papernot, Sandy Huang, Rocky Duan, Pieter Abbeel, and Jack Clark. Attacking Machine Learning with Adversarial Examples, February 2017.
  • Krakovna [2018] Victoria Krakovna. Specification gaming examples in AI - master list : Sheet1, April 2018.
  • Kuhn [1962] Thomas S Kuhn. The structure of scientific revolutions. University of Chicago Press, 1962.
  • Kuhn [1970] Thomas S. Kuhn. The Structure of Scientific Revolutions: Second Edition, Enlarged. In Otto Neurath, editor, International Encyclopedia of Unified Science, volume II, no. 2, pages xii+174. University of Chicago Press, 2nd, enlarged edition, 1970.
  • Lenat [1995] Douglas B Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995.
  • Levin et al. [2019] Owen Levin, Zihang Meng, Vikas Singh, and Xiaojin Zhu. Fooling computer vision into inferring the wrong body mass index. arXiv, 2019.
  • Lim et al. [1993] I Lim, R.K. Walkup, and M.W. Vannier. Rule based artificial intelligence expert system for determination of upper extremity impairment rating. Computer Methods and Programs in Biomedicine, 39(3-4):203–211, 1993.
  • Linhares [2000] Alexandre Linhares. A glimpse at the metaphysics of Bongard problems. Artificial Intelligence, 121(1):251–270, 2000.
  • Lupyan [2013] G. Lupyan. The difficulties of executing simple algorithms: Why brains make mistakes computers don’t. Cognition, 129(3):615–636, December 2013. ISSN 0010-0277. doi: 10.1016/j.cognition.2013.08.015.
  • Lupyan [2015] G. Lupyan. The paradox of the universal triangle: Concepts, language, and prototypes. Quarterly Journal of Experimental Psychology, 2015. doi: 10.1080/17470218.2015.1130730.
  • Lupyan and Clark [2015] G. Lupyan and A. Clark. Words and the World: Predictive coding and the language-perception-cognition interface. Current Directions in Psychological Science, 24(4):279–284, 2015. doi: 10.1177/0963721415570732.
  • Majid et al. [2018] A. Majid, Seán G. Roberts, Ludy Cilissen, Karen Emmorey, Brenda Nicodemus, Lucinda O’Grady, Bencie Woll, Barbara LeLan, Hilário de Sousa, Brian L. Cansler, Shakila Shayan, Connie de Vos, Gunter Senft, N. J. Enfield, Rogayah A. Razak, Sebastian Fedden, Sylvia Tufvesson, Mark Dingemanse, Ozge Ozturk, Penelope Brown, Clair Hill, Olivier Le Guen, Vincent Hirtzel, Rik van Gijn, Mark A. Sicoli, and Stephen C. Levinson. Differential coding of perception in the world’s languages. Proceedings of the National Academy of Sciences, 115(45):11369–11376, November 2018. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1720419115.
  • Mnih et al. [2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529–533, 2015.
  • Molnar [2018] Christoph Molnar. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2018.
  • NYU [no date] NYU. Quotations about Archimedes’ Lever, no date.
  • Papernot and Frosst [2019] Nicolas Papernot and Nicholas Frosst. How to know when machine learning does not know, May 2019.
  • Patel [2017] Neel V. Patel. Why doctors aren’t afraid of better, more efficient AI diagnosing cancer. Daily Beast, 2017.
  • Polson and Scott [2018] Nick Polson and James Scott. AIQ: How People and Machines Are Smarter Together. St. Martin’s Publishing Group, May 2018. ISBN 978-1-250-18215-9.
  • Quinn and Eimas [1996] P.C. Quinn and Peter D. Eimas. Perceptual Cues That Permit Categorical Differentiation of Animal Species by Infants. Journal of Experimental Child Psychology, 63(1):189–211, October 1996. doi: 10.1006/jecp.1996.0047.
  • Rissanen [1989] Jorma Rissanen. Stochastic complexity in statistical inquiry. World Scientific, 1989.
  • Rogers and McClelland [2004] T.T. Rogers and J.L. McClelland. Semantic Cognition: A Parallel Distributed Processing Approach. Bradford Book, Cambridge, MA, 2004.
  • Rumelhart et al. [1986] D.E. Rumelhart, J.L. McClelland, and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volumes 1 and 2. MIT Press, Cambridge, MA, 1986.
  • Shortliffe et al. [1975] Edward H Shortliffe, Randall Davis, Stanton G Axline, Bruce G Buchanan, C Cordell Green, and Stanley N Cohen. Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the mycin system. Computers and biomedical research, 8(4):303–320, 1975.
  • Silver et al. [2017] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
  • von Neumann [2012] John von Neumann. The Computer and the Brain. Yale University Press, New Haven, Conn.; London, 3rd edition, August 2012. ISBN 978-0-300-18111-1.