The IQ Controversy and the Philosophy of Education
In R.S. Cohen et al. (eds.),
Although my two colleagues on this panel are bona fide philosophers who clearly have a bona fide interest in the ongoing controversy over IQ and the educability of children, I nonetheless feel constrained to say a word in defense of our topic as a viable subject for discussion in a forum devoted to the Philosophy of Science. Perhaps this is because, as the representative of the discipline that created this mess, I fear you might associate me with it. You might feel that bad science is a bad subject for study by philosophers of science, and furthermore that anyone associated with such a discipline would be the last person to have anything intelligent to say about philosophy.
While it is true that we do not need a philosophy of bad science, a kind of
At issue, from one point of view, is a set of statistics called heritability estimates, which are supposed to tell us about the relative innateness, fixedness, determinedness of a child’s potential intelligence. From another point of view, at issue is a set of social policies referred to as streaming or tracking, which have to do with the allocation of resources to children within schools, and the optimal forms of grouping for education. The reason we have a full-blown controversy is that these two major issues intersect. Or rather they are regarded as intersecting by nearly everyone: psychologists, educators, and the readers of magazines. It is assumed that knowing whether intelligence is innately determined will tell us how children ought to be grouped in school. Sometimes the even more outrageous converse is assumed, that knowing how we wish to organize our schools will tell us what we must believe about the nature and development of intelligence. But let us first look at the issue of heritability of IQ, and return later to the social-political context of the educational issues.
It has been pointed out, by my predecessors on this panel as well as by others, that we have no scientific reason for believing that the heritability of IQ is as high as Cyril Burt, Arthur Jensen, and Hans Eysenck have claimed. I think this is true. The twin studies are inadequately controlled and are biased, etc. On the other hand, I do not think we have any basis for believing that IQ has low heritability. It would be astonishing, given the nature of the IQ test, if IQ were not heritable. The important point to be added to the criticisms you have just heard, without in any way denying their force, is that IQ heritability tells us nothing whatever about the educability of children. Nothing whatever.
Heritability is a measure of the amount of variance in some trait which is accounted for, in a particular environment, by genotypic variance in a population. Heritability depends upon the environment: for a given trait and a given population of genotypes, the heritability will be high in some environments and low in other environments. If the particular environment has a differential effect upon the trait in individual members of the population, heritability will be low. If the environment does not have much differential effect upon individuals with respect to this trait, the trait will be highly heritable. Heritability is just as much a measure of the environment, or the range of environments in which a tested population has developed, as it is of anything else.
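The dependence of heritability on the environment can be illustrated with a small simulation. This is a minimal sketch, not a model of any real data: it assumes a purely additive toy model in which each score is a genotypic value plus an independent environmental effect, and the function name `heritability` and all numbers are hypothetical choices made for illustration.

```python
import random
import statistics

def heritability(genetic_sd, environment_sd, n=20000, seed=0):
    """Estimate heritability as Var(genotype) / Var(phenotype).

    Assumes a toy additive model, phenotype = genotype + environment,
    with independent normal components (an illustrative assumption).
    """
    rng = random.Random(seed)
    genotypes = [rng.gauss(0.0, genetic_sd) for _ in range(n)]
    phenotypes = [g + rng.gauss(0.0, environment_sd) for g in genotypes]
    return statistics.pvariance(genotypes) / statistics.pvariance(phenotypes)

# The same simulated genotypes (same seed, same genetic_sd) are placed in
# two different ranges of environments.
uniform_env = heritability(genetic_sd=10.0, environment_sd=3.0)
varied_env = heritability(genetic_sd=10.0, environment_sd=15.0)

print(f"environment with little differential effect: h2 = {uniform_env:.2f}")
print(f"environment with large differential effect:  h2 = {varied_env:.2f}")
```

Nothing about the genotypes changes between the two calls; only the spread of environmental effects does, and the estimate moves with it.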
Heritability also depends upon the population. A given trait may be found to have different heritabilities for different strains, breeds, races, or gene pools, even under the same range of environmental conditions. It is a measure that applies only within a gene pool, and has nothing to do with explaining between-group variance. An attempt to use heritability estimates in accounting for racial differences in any trait is, put simply, a hoax.
Finally, heritability depends upon the trait. This is a more significant point than merely that some traits are highly heritable while others are not. A trait is simply some measurable characteristic in which individuals differ. What is measured, and the way in which it is measured, determine the heritability of a given population in a given environment.
Indictment of the schools
Each of these three points has been ignored by the majority of disputants on both sides of the IQ controversy. The first point alone should have ruled heritability studies out of court for any decision regarding educability. If IQ and school achievement have high heritability among present-day Americans, this may be considered an indictment of American schools. Our schools, and our society in general, are not making much of a difference in the kind of skills tested by either achievement or IQ tests. Children who go to school know more than those who do not, but the quality of particular schools, and of other supposedly educational experiences in the lives of children, does not make much difference among their test scores (Jencks, 1972; Kaye, 1973a). Both IQ and school achievement can be best predicted by knowing who a child’s parents are. But this tells us nothing about either educability or heritability in another sort of environment. There is every reason to assume that in some other society the quality of schooling would make a difference, and thus heritability would be lower.
As my colleagues have pointed out, the parental variables that predict IQ are by no means all genetic variables. In all of the IQ heritability studies, chiefly by correlation among consanguineous relatives and between adopted-out twins, the common prenatal and similar postnatal environments grossly inflate heritability estimates beyond the true genotypic variance. But even if this were not the case, if we had perfect measures of true heritability, the heritability would be a function of the relative ineffectualness of a given environment.
The second point is that we can estimate (from a sample) heritability only for a particular population or gene pool. This, too, has been largely ignored in the controversy. A number of authors, including Jensen (1972), have been at great pains to argue that there is every reason to assume IQ is equally heritable among blacks as among whites. But that is off the point. Neither the heritability within one gene pool nor within the other gene pool has anything to do with the mean difference in IQ between the two populations. It is perfectly possible, indeed quite common, for a trait to be highly heritable in two or more groups, that is for the within-group variance to be accounted for genetically, while the between-group variance is entirely due to environmental differences. This is true whenever we compare two cornfields that have been irrigated differently. It would also be true if we divided a group of children in half alphabetically, and fed to one half of them the answers to IQ test items. The heritability of IQ in the A-M group would still be the same as that in the N-Z group and would probably be high, yet one group would be one or more standard deviations superior to the other, entirely for experiential reasons.
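The alphabetical example can be sketched numerically. The following is an illustration under stated assumptions only: both halves are drawn from the same genotype distribution, being "fed the answers" is modeled as a constant 15-point shift applied (arbitrarily) to the N-Z half, and the helper names and all parameters are hypothetical.

```python
import random
import statistics

def simulate_group(n, environment_shift, seed):
    """Return (genotypes, scores) for one group under a toy additive model."""
    rng = random.Random(seed)
    genotypes = [rng.gauss(100.0, 12.0) for _ in range(n)]
    # Individual noise within the group, plus a shift shared by every member.
    scores = [g + rng.gauss(0.0, 5.0) + environment_shift for g in genotypes]
    return genotypes, scores

def within_group_heritability(genotypes, scores):
    """Share of the group's score variance accounted for by genotypic variance."""
    return statistics.pvariance(genotypes) / statistics.pvariance(scores)

# Same genotype distribution in both halves; only the shared shift differs.
genes_am, scores_am = simulate_group(5000, environment_shift=0.0, seed=1)
genes_nz, scores_nz = simulate_group(5000, environment_shift=15.0, seed=2)

h2_am = within_group_heritability(genes_am, scores_am)
h2_nz = within_group_heritability(genes_nz, scores_nz)
score_gap = statistics.fmean(scores_nz) - statistics.fmean(scores_am)
genotype_gap = statistics.fmean(genes_nz) - statistics.fmean(genes_am)

print(f"within-group heritability: A-M = {h2_am:.2f}, N-Z = {h2_nz:.2f}")
print(f"mean score gap = {score_gap:.1f}, mean genotype gap = {genotype_gap:.1f}")
```

A shift shared by every member of a group leaves that group's variance untouched, so the within-group ratio cannot register it, however large the gap between the groups.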
There is no contradiction between my first and second points. Our society could be, and apparently is, ineffectual at making much of a difference among children within gene pools, within social classes and racial groups, while at the same time it is very effective at discriminating between the major groups.
Third, heritability depends upon the measured trait. We must constantly remind each other that we are talking about IQ, not intelligence. Intelligence can be defined in many ways and measured in many more, and each way of measuring it must be considered a different (though not independent) trait. The trait whose heritability has been so exhaustingly debated is IQ, the score on a certain type of test. Even if we were concerned with a single gene pool and an inexorable environment, we would still have to examine the nature of the IQ tests in order to understand and interpret the heritability of this trait.
Elsewhere I have tried to describe the systematic biases built into IQ tests, their historical development and current dangers (Kaye, 1973b). This has also been dealt with in detail by others (e.g. Cronbach, 1975). There are a few points worth reiterating briefly. Most important is that the tests discriminate between all and only those groups between which the test makers wanted to discriminate. IQ tests do not discriminate between boys and girls because, after girls were found to be consistently superior on the 1916 Stanford-Binet, certain items were eliminated from the test until girls scored no higher than boys. The same sort of fiddling is possible, and has been successfully done, to eliminate racial differences in IQ scores. But the resulting tests were not adopted since they failed to predict school achievement (Eells et al., 1951). The gross differences in achievement between black and white children had to be accounted for by corresponding differences on a supposedly objective test.
Another important way in which the tests were fiddled with was by eliminating items and subtests that contributed to instability of IQs over time. Thus what was originally an hypothesis, that each individual’s growth in ‘mental age’ was a constant function of his chronological age, became self-fulfilling. The test was constructed so that IQs would be relatively constant. This meant eliminating items which tested skills and knowledge upon whose development the environment made a difference. Inevitably this would create a trait with relatively high heritability. If differences in experience made a difference in the acquisition of some skill, within the population on which IQ tests were standardized, then that skill was dropped from the battery of skills included in the test. Thus the genotypic variation within the population was allowed to play a greater role than the environmental variation. However, this stability-increasing and heritability-increasing procedure was only applied within the standardization sample. It did not have the effect of eliminating items for which the experience of most white children prepared them better than did the experience of most black children. In other words, precisely the situation that I said was possible, in hypothetical examples of cornfields or alphabetically grouped children, did indeed prevail in the construction and standardization of IQ tests.
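The effect of such item elimination on heritability can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not a reconstruction of how any actual test was built: each item score is modeled as genotype plus a weighted experiential effect plus noise, the item pool and its weights are hypothetical, and items are dropped by their known sensitivity to experience rather than by observed instability over time.

```python
import random
import statistics

N_CHILDREN = 4000
rng = random.Random(0)
# Independent genotypic and experiential values for each simulated child.
genotype = [rng.gauss(0.0, 1.0) for _ in range(N_CHILDREN)]
experience = [rng.gauss(0.0, 1.0) for _ in range(N_CHILDREN)]

# Hypothetical item pool: one weight per item, saying how strongly differences
# in experience affect performance on that item (0.0 = not at all).
ITEM_POOL = [0.0, 0.2, 0.4, 0.8, 1.2, 1.6]

def genotypic_share(experience_weights, seed=1):
    """Share of total-score variance accounted for by genotypic variance."""
    noise = random.Random(seed)
    totals = []
    for g, e in zip(genotype, experience):
        # Total score = sum over items of (genotype + weight * experience + noise).
        totals.append(sum(g + w * e + noise.gauss(0.0, 0.5)
                          for w in experience_weights))
    genetic_part = [len(experience_weights) * g for g in genotype]
    return statistics.pvariance(genetic_part) / statistics.pvariance(totals)

full_test = genotypic_share(ITEM_POOL)
# Drop the items on which experience makes the most difference.
trimmed_test = genotypic_share([w for w in ITEM_POOL if w <= 0.4])

print(f"all items:                        h2 = {full_test:.2f}")
print(f"experience-sensitive items dropped: h2 = {trimmed_test:.2f}")
```

The children are the same in both computations; only the composition of the test changes, and with it the heritability of the resulting score.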
There is a basic difference between tests of the IQ or ‘aptitude’ type and tests of achievement. In the latter type, items are included just because the test makers or some policy-making body regards them as important items of knowledge, whether for driving a car, programming a computer, or going on to a more advanced subject. In the IQ-type tests, on the other hand, items may have no face validity at all; their inclusion on the test is due to the fact that they contribute to total scores that meet the desired criteria of normal distribution and stability. There are similarities between the two types of test; in fact, achievement tests are often nationally standardized so that they acquire some of the worst features of ‘aptitude’ tests. But in principle an achievement test is potentially capable of telling us something about the strengths and weaknesses of individuals in different subjects. IQ scores can never provide such information (though the individual child’s response to the IQ testing situation is often used creatively for a clinical diagnosis).
Effects of the IQ tests on research
What effects has the existence of IQ tests had upon research in education and developmental psychology? First, it has perpetuated certain myths about the nature of learning and intelligence. I mentioned the fact that what was initially an hypothesis, that ‘mental age’ bears a constant relation to chronological age, quickly became an assumption that could no longer be tested. The tests were constructed in such a way that most children’s IQs typically were fairly constant. In fact the notion of ‘mental age’ lost its meaning. Since the 1940s, IQs have not been quotients at all, but simply scores read out of a table in the test manual. The table is constructed so as to yield nearly-perfect distributions at each age. Unfortunately the smoothness of these curves, and of various forms of stability curves for IQ, tends to be interpreted as proving that there is something fixed, determined, and innate about human intelligence. The almost universal misrepresentation and misunderstanding of heritability compounds the problem. And as perhaps the worst consequence of all, the entrenched establishment of psychometricians and the test publishing industry (over 200 million standardized tests per year, or four per child in the
A second sort of effect has to do with the notion of ‘upping’ IQs. At some point in history, educators bought the myth that if children were educable, it ought to be possible to raise their IQ scores. This would mean, essentially, that IQ tests were nothing other than achievement tests. They do test achievement in the sense that the items require information acquired from experience, and as I have said, a group of children whose experiences have been more closely oriented to the types of knowledge required on the IQ test will indeed ‘achieve’ higher scores than children with different kinds of experience. Nonetheless it remains true that an IQ test is the worst possible kind of test by which to measure achievement, whether one is concerned about the acquisition of important skills or the more general question whether children are educable. IQ tests are explicitly designed to be unresponsive to school experiences of varying quality. Therefore the studies that have successfully boosted the IQ scores of Headstart or other children are all the more remarkable; the vast number of studies that found negligible or only temporary effects on IQ are simply meaningless. Well-meaning educators who accept the assignment of trying to boost IQs are taking on an Augean task. It is a Sisyphean task as well, for boosting the IQs of some children automatically lowers the IQs of an equal number of children as soon as the tests are restandardized.
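The arithmetic behind the restandardization point can be sketched as follows. This is an illustration only: it assumes IQs are assigned by a simple linear rescaling of raw scores to mean 100 and standard deviation 15 (a simplification of the lookup tables in test manuals), and the raw scores and the 10-point boost are hypothetical.

```python
import statistics

def standardize(raw_scores, mean=100.0, sd=15.0):
    """Rescale raw scores so the standardization sample has the given mean and SD."""
    mu = statistics.fmean(raw_scores)
    sigma = statistics.pstdev(raw_scores)
    return [mean + sd * (x - mu) / sigma for x in raw_scores]

# Hypothetical raw scores for eight children.
raw_before = [20, 25, 30, 35, 40, 45, 50, 55]
# An intervention raises the raw scores of the first four children by 10 points.
raw_after = [x + 10 if i < 4 else x for i, x in enumerate(raw_before)]

iq_before = standardize(raw_before)
iq_after = standardize(raw_after)  # the test is restandardized on the new scores
changes = [after - before for before, after in zip(iq_before, iq_after)]

print("IQ changes, boosted children:", [round(c, 1) for c in changes[:4]])
print("IQ changes, other children:  ", [round(c, 1) for c in changes[4:]])
print("sum of all changes:", round(sum(changes), 6))
```

Because the mean is pinned at 100 after every restandardization, the changes necessarily sum to zero: whatever the boosted children gain is taken, on average, from children whose raw performance did not change at all.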
Finally, there is tracking. IQ tests together with other tests constructed in much the same way are the principal means of dividing schoolchildren into tracks or streams. The justification for this widespread practice (beginning in the first grade in 20% of American school systems and by the eighth grade in 80%; Findley and Bryan, 1970) is that children learn more in homogeneous groups. Yet there is no adequate research indicating that any children learn more in homogeneous groups; there is some evidence that poorly-achieving children do worse; and there are clear indications that self-esteem and motivation suffer in the ‘lower’ streams. The popular belief that tracking defends poorly-achieving children from invidious comparisons with their betters has never received the slightest empirical support. One thing we do know is that segregating high-IQ from low-IQ children makes the prediction of school achievement by IQ a self-fulfilling prophecy. There is even some possibility that this practice aids and abets the stability of IQ scores, thus contributing to the perpetuation of the myth, but the evidence on this is not yet clear (Jencks, 1972).
A number of current educational innovations involve homogeneous grouping of pupils in one way or another. Chief among these is mastery learning, the attractive approach that allocates time and resources where they are most needed to bring the maximum number of pupils to some criterion, instead of moving the whole class along from one unit to the next while individual children fall further and further behind (Bloom, 1974). In grouping of this type, however, the tests used are necessarily achievement tests. Unlike tracking, the mastery tests and mastery groups are done independently for each school subject, so that a child can spend more time in arithmetic and less time in reading. And there is constant retesting, rather than an irreversible decision being made in the ninth, or the fifth, or even the first grade.
These mundane matters seem to have taken us far from the cleanliness of philosophy. Yet there is no clear boundary at which we could have stopped. Our purest conceptions about human intelligence and development are inseparable from the research paradigms and instruments we use. These in turn are inseparable from the applied questions that guide the research, and the policy questions are inseparable from the society’s values and myths, which finally reduce to the purest conceptions about human development. Some of the critics of tracking, IQ testing, and heritability research have suggested that there is something un-American about them, that they conflict with our egalitarian ideals. I personally see these practices and the research as very American, ideologically speaking; as the resolution of two conflicting ideals. We are said to believe, on the one hand, in equal opportunity and self-determination. Yet we clearly believe in the supremacy and inheritance of property. We need a device by which to guarantee the succession of children to their parents’ status and earning power, while making it look as though they are succeeding by merit. That has been effectively accomplished by IQ testing, by tracking, and by the research purporting to investigate intelligence. This is the context in which science exists. Against this background, and yet in the abstract, please: tell us what distinguishes good science from bad.
Bloom, B.: 1974, ‘Time and Learning,’ American Psychologist 29, 682-88.
Cronbach, L.J.: 1975, ‘Five Decades of Public Controversy over Mental Testing,’ American Psychologist 30, 1-14.
Eells, K., Davis, A., Havighurst, R., Herrick, R., and Cronbach, L.: 1951, Intelligence and Cultural Differences,
Findley, W.G., and
Jencks, C., Smith, M., Acland, H., Bane, M.J., Cohen, D., Gintis, H., Heyns, B., and Michelson, S.: 1972, Inequality: A Reassessment of the Effect of Family and Schooling in
Jensen, A.R.: 1972, Genetics and Education, Harper and Row,
Kaye, K.: 1973a, ‘Some Clarity on Inequality,’ School Review 81, 634-41.
Kaye, K.: 1973b, ‘I.Q.: Conceptual Deterrent to Revolution in Education,’ Elementary School Journal 74, 9-23.