Theorising Social Work Research
What works as evidence for practice? The methodological repertoire in an applied discipline
27th April 2000, Cardiff
The limits of positivism revisited
Professor David Smith
Department of Applied Social Science, Lancaster University
In 1987 the British Journal of Social Work printed an article of mine called 'The limits of positivism in social work research'. The article was mainly taken up with a critique of the work of Brian Sheldon, as the leading advocate over the previous ten years of what would now be called 'evidence-based practice'. It argued that Sheldon's traditional version of positivism, and his rejection of other research approaches, were epistemologically and methodologically limited and limiting, since if we were to take Sheldon's advice a number of other useful research approaches would be lost to the social work community, whether of practitioners or of researchers. It also suggested that Sheldon was wrong to argue that social workers were unique among comparable professional groups in neglecting the evidence of evaluative research, since much of the evaluation literature would be incomprehensible if it were the case that teachers, for example, attended as a matter of course to evaluations of educational practices and modified their own practice accordingly. The article argued for attention to processes as well as to outcomes, on the grounds that measuring and counting outcomes was of little use unless one knew what had produced them (a rather naïve version of the 'realistic evaluation' more recently advocated by Ray Pawson and Nick Tilley), and it was probably this part of the argument that received most attention and gave the article whatever influence it had. 
Now that positivist outcome-oriented evaluation has made a dramatic reappearance in the guise of advocacy of 'evidence-based practice', and, in the field of criminal justice social work at least, in managerial and political demands that practice should be based on 'what works', it may be useful to take the arguments of the 1987 paper a stage further, and to look critically at just what it would mean at the start of the 21st century to take seriously the expectation that practice should be evidence-based (and that there is something wrong - and in need of managerial correction - with any practice that cannot demonstrate that it has this quality).
On the face of it, it is very hard to argue with the proposition that practice in social work should be 'evidence-based'. The same demand has recently been stressed in relation to medicine, and most of us are likely to find that reassuring. What else could practice be based on? Intuition, gut conviction, habit, whim, obsession, mania? But in the language of politicians and of many social work - and, in England and Wales, probation - managers, the demand that practice should be based on evidence reveals an over-simplified and over-certain view of what evidence does or might consist of, and of how it should be interpreted and used. In trying to justify this claim and suggest a more nuanced, more modest, but also more helpful approach to evidence in the field of social work I shall move from the general to the particular, arguing first that the demand for evidence-based practice often rests on a misconception about the nature of the social sciences, and then drawing on bits of personal experience and the work of other evaluation researchers to support the argument that knowing what counts as evidence, what it is evidence of, and how we should use it rationally is more complicated and also more interesting and creative than managers and politicians would like to believe.
One way of beginning to look at the question of the nature of social science knowledge is to note that social work seems recently - and perhaps for the first time - to have embraced a most uncritical version of positivism just as the most closely related academic disciplines are trying to get away from it. For these purposes, positivism means the assumption that social science should proceed on the model of the natural sciences, and that the more it resembles them the better - more rigorous, more valid, more useful and so on - it will be. It is intelligible that social work should suddenly embrace evidence as a source of practice, because there is truth in the charge that it has done without it for too long (not that this makes it unique among the helping professions), and it is now being told that its very existence is at risk if it does not mend its ways. But there is no need for the social work professional community to adopt a view of evidence which encourages exactly the misconceptions about what it means and how to use it which are dear to the hearts of bureaucrats and politicians.
The philosopher Alasdair MacIntyre puts the matter thus: 'What managerial expertise requires for its vindication is a justified conception of social science as providing a stock of law-like generalisations with strong predictive power' (After Virtue, p. 90). This is, according to MacIntyre, exactly the conventional image of social science over the past 200 years - the period of positivist domination. But MacIntyre argues that this is to misunderstand the nature of the social sciences and the kinds of generalisation they can produce. In the social sciences, a theory can survive, and still be found useful, despite plenty of instances in which its predictions are not confirmed; this is not the case in the natural sciences. An example given by MacIntyre is Oscar Newman's theory of defensible space. Based on impressive and extensive research, this predicts, among other things, that crime rates will rise with the height of buildings up to a height of 13 storeys, and then level off. This is a risky prediction, and positivist criminologists were not slow to test it, find disconfirming cases, and claim that the theory was wrong. But it has survived, and now routinely informs decisions in architecture and town planning.
This suggests that the logic of theory in the social sciences is different from that of theory in the natural sciences, and, according to MacIntyre, this is inevitable because the social world is ineradicably unpredictable; Machiavelli knew this, and called the element of chance, of unpredictability, fortuna. Those positivist social scientists who, like their bureaucratic counterparts, want to remove all sources of uncertainty are yearning for a God-like omniscience (God knows everything that will ever have happened) - and of course they keep failing (MacIntyre's main examples are economics and demography). Empirical social science, which is just as old as the natural sciences - the Greeks did both - relies on induction from research to produce its generalisations, and these take the form not of universal laws but of statements which begin with something like 'Characteristically and for the most part...', not 'If x is the case then, given that certain conditions hold, y will always follow'. This is so because social science generalisations are rooted in the form of human life, and the practice of social science reveals that its ancestry and tradition are different from those of the natural sciences. Positivists - or some of them - forget this when they aspire to total control of all that is unpredictable; and the same is true of some managers and politicians.
One source of unpredictability for social work is the context, changing with time and space, in which it is practised. This is crucial for the main alternative to the positivist tradition - evaluation within the philosophical tradition of realism. Realist evaluators (like Pawson and Tilley) are right to stress that context (and not outcomes alone) is crucial in the evaluation of any social programme. So are the 'mechanisms' which generate change - the choices and capacities which are made available to participants - and their operation is always contingent on context: 'subjects will only act upon the resources and choices offered by a program if they are in conducive settings' (Realistic Evaluation, p. 216). Understanding the contexts that are needed for the mechanisms for change to work is essential for understanding how outcomes are produced. Pawson and Tilley talk of context-mechanism-outcome configurations, which are propositions stating what it is about programmes that works for some people in some circumstances. The same programme will work in different ways in different circumstances, and sometimes it will not work at all. So rather than trying to replicate programmes which seem to work in the hope that they will work everywhere and always, we should try to generalise about programmes by developing middle range theories about context-mechanism-outcome patterns which will allow us to interpret differences and similarities among groups of programmes. This is the realist alternative to the aspirations of the experimental method of positivism, which, hypnotised by method to the point where theory is forgotten, has rarely managed to tell us anything helpful about the questions that matter: what is it about this programme that works for whom in what specifiable conditions and given what contextual features?
This is because positivist ways of thinking about evaluation ignore contexts and (despite some claims to the contrary) generally also ignore processes, or mechanisms in Pawson and Tilley's terms. This decontextualised preoccupation with outcomes inevitably means that most of the results of positivist research are non-significant and inconclusive, because the theories being tested depend crucially on the specific context in which they are implemented. The philosopher Russell Keat, in a book about something quite different, indicates the ground which a realist philosophy might occupy in trying to understand how outcomes are produced, and some of the problems in connecting processes with outcomes. I quoted this passage in the 1987 article, and still do not know of any clearer statement of the problem. Keat is referring to the relationship between the truth or falsehood of psychoanalytic theory and the success or failure of psychoanalytic practice:
the failure of therapeutic techniques is compatible with the truth of this theory, whilst the success of those techniques may provide little support for it. I have argued that this is primarily due to the fact that in deriving predictions about therapeutic outcomes from psychoanalytic theory, a number of auxiliary statements must typically be assumed, whose own truth or falsity may display various degrees of independence from the explanatory claims made within this theory. Such auxiliaries may usefully be said to comprise a 'theory of technique': that is, an attempt to specify and explain the effects upon the patient of various elements of the therapeutic process. Thus even in those cases where predicted therapeutic success is achieved, it is possible that neither psychoanalytic theory nor its associated theory of technique are significantly supported, since it may be that this success is better explained by another theory of technique (The Politics of Social Theory, p. 159).
So the relationship between outcomes and the theory on which the programme is based - and some kind of theory or theories necessarily lie behind any social work intervention - is nothing like as straightforward as the managerial culture which demands single right answers requires it to be.
The positivist programme of theory falsification is thus not only ill-founded philosophically, since it misunderstands the nature of social science generalisations, but also usually unhelpful to practitioners and policy-makers. John Braithwaite, writing about criminological positivism, has suggested that what is important is to develop a range of theories that are sometimes useful. These will often be theories which, positivists say, explain less variance than others across sets of decontextualised cases - that is, which have less predictive power. Braithwaite suggests that a useful way of thinking about theory is to see it as metaphor. Practitioners concerned with a particular problem in a local context can then scan lists of theories to see which supplies a helpful or interesting metaphor. 'In the world of problem solving that matters, it is contextualised usefulness that counts, not decontextualised statistical power' ('Beyond positivism', p. 386). In social work too, what is likely to be helpful is to use theory to generate interesting hypotheses about what might work in a particular local context, not to search for a universal 'one best way' of responding to a given problem.
There are other reasons why what appears to be evidence is less straightforward to interpret and use in practice than we are being encouraged to believe. One is that a great deal of evaluative research is not very good. Positivism must again take some of the blame for this: as editor of the British Journal of Social Work for four years I read more papers than was good for me, some by quite distinguished social work academics, that showed a preoccupation with statistical testing combined with very little understanding of what statistical tests are for and in what circumstances they are useful. I'm no statistician, but even I could recognise nonsense in some of these papers. The preoccupation with scientific method meant that tests were used on data for which they were sometimes literally meaningless. As an aside on statistics, whose application to social data is one of the main achievements of positivism, one thing I learned as editor was that this is not the exact and settled science that I had taken it to be. People who clearly knew what they were talking about often disagreed radically about which statistical procedure should be used when, and for what purpose. Having been around for a long time, and looking scientific, does not, apparently, put statistical analysis any further beyond argument than any other approach in the social sciences.
Another indication of the quality of much evaluative research comes from Andrew Underdown's review of evaluated programmes in probation services in England and Wales (published as Strategies for Effective Offender Supervision). This is, of course, preceded by a management summary, but if managers were to go on and read the body of the report, described by the Chief Inspector of Probation as the most important he has ever written an introduction to, they would find that of the 267 replies on programmes received in the initial trawl, 210 were judged to have been evaluated in some way, 33 had produced enough evaluation material for analysis, and eleven were identified as having some value as indications of good evaluative practice. Of these, the clearest model for imitation was Raynor and Vanstone's evaluation of the STOP (Straight Thinking on Probation) project in Mid Glamorgan: a careful and rigorous piece of external evaluation covering process as well as outcomes, which examined implementation issues, levels of compliance and completion, and attitudinal change as well as reconvictions after one and two years, comparing these with the reconvictions of several similar groups of offenders and against a statistical predictor. This was an evaluation of practice which was already evidence-based, since the STOP programme was designed as an adaptation to local conditions of the Reasoning and Rehabilitation programme developed by Robert Ross and his colleagues in Canada, and the conditions for its successful implementation and maintenance seem to have been near to ideal. So it is as well to remember that while the one-year reconviction rates were promising for those who completed the programme (though not for those who did not), the two-year rates were much less so (though they were still better in terms of offence seriousness than for the comparison groups).
The researchers concluded that the falling-off in performance was attributable to the absence of relevant reinforcement and support after people had left the programme - that is, for continued good results the programme would have needed to be supplemented by opportunities for participants to build on what they had learned and the problems they had solved; it should have been part of a broader network of resources integrated with the programme's aims and methods. I will come back later to the importance of strategies that are contextual, integrated and multi-modal, and that necessarily draw on more than one strand of theory.
The reason why there are so few adequate evaluations of practice, and therefore (relatively) so little evidence on which to base practice, is that evaluation is difficult. My own work (partly published, partly in progress) on two projects for persistent juvenile offenders in Scotland has been a strong reminder of this. Collecting process data rich enough to allow confident conclusions about which aspects of a programme are associated with success or failure requires close, time-consuming observation and analysis of what is observed. It also requires the ability to chart changes over time, and to incorporate the understandings and theories of both staff and participants. For quantitative data, you need systems of collection which are reliable and consistent, and which give access to data sources that are reliable and complete. A particularly sharp reminder has concerned the limitations of reconviction data as a measure of change. In both projects we have had access to details of the number and nature of charges faced by the young people in the twelve months before they started at the projects. This material on charges stops being collected when the young people reach the age of sixteen, after which the only source for reconvictions is the Scottish Criminal Records Office (SCRO). While of course the young people may not have committed some of the offences with which they were charged, it is still the case that the SCRO record gives a much attenuated account of the volume and rate of offending, and that the time lag between charge and conviction, or at least the appearance of the conviction on the official record, is often very long. It has also proved far from straightforward to find a suitable comparison group.
We have, after a good deal of effort on the part of Scottish Office colleagues and two time-wasting false starts, identified a comparison group which should be adequate for evaluative purposes, but the data on it will of course come from the SCRO record and share its limitations. George Mair and others have discussed the limitations and problems of using reconviction rates, as follows:
What should be counted (court appearances, number of convictions, all offences)?
Reconviction is not reoffending
What should the follow-up period be?
When should you start counting (date of court order, date of implementation, date of completion)?
How to deal with time lag between offence and conviction, and consequent false positives and negatives?
How to interpret results (e.g. does reconviction mean failure)?
Can you assess the impact of variations in police and prosecution practice?
- but I'm inclined to think that the health warning which customarily accompanies research results using reconvictions as the main outcome measure could do with being strengthened. It is probably rare for a persistent offender to avoid convictions altogether, at least in the long run (though there is plenty of anecdotal evidence that it can happen over a year or two), but there is no certain fit between recorded convictions and the amount of crime actually committed. Thus some of the hardest-looking data available in social work evaluation become noticeably softer when you look at them closely.
Another area in which there is currently a strong demand for evidence is cost-effectiveness. Since this is one element of the evaluation of the two Scottish Office projects, I have been trying to read the relevant literature conscientiously, get my head round the maths, and understand the assumptions about what would have happened without the intervention which are characteristically built into such analyses in the field of criminal justice (it is possible that fewer assumptions are required in other fields, such as the evaluation of health services, in which cost-benefit evaluation seems to be more firmly established). Most writers in this area advocate comprehensiveness, but the more comprehensive the evaluation gets, the more complex it is likely to become, and the more assumptions need to be built into the analysis. For instance, there have been various attempts to assess the cost of a 'typical' crime (itself a difficult concept): some have tried to measure only criminal justice system costs, while others have tried to assess the costs to the victim, to insurance companies, to employers, and so on, because once you start down the road of inclusiveness the possibilities multiply. Even studies which consider only criminal justice system costs typically have to make assumptions about the marginal cost saving of each offence prevented, and, still more fundamentally, seem almost universally to treat all criminal justice costs as net social costs, whereas one could easily argue that the creation of jobs, and therefore wealth, and the avoidance of unemployment among criminal justice personnel count as overall social and economic benefits (and the private sector in criminal justice is of course a notable current case of economic success).
Finally, it is worth noting that in a recent Rand Corporation report on the effectiveness of early intervention with children the researchers decided that there were only two studies which provided sufficiently high quality and long term data to use in their effort to assess cost savings (their preferred term). We could all wish that cost-benefit or cost-savings analysis were an exact science, as those who seek to control the social world require it to be; but it is not. This is not to say that nothing sensible can be said about these issues, but what is said will generally be tentative and qualified, and the assumptions behind the conclusions should be explicit (it is reasonable to say, for instance, that someone who has been free of convictions for two years at the age of eighteen is less likely to get into a criminal career as an adult than someone who has 20 convictions over the same period, but the unpredictability of social life means that this assumption will sometimes be false).
I have spoken of the nature of generalisations in social science, of the importance of context and processes in making sense of outcomes, of the inevitability, as I see it, of theoretical pluralism, and of various more mundane and technical matters which have a bearing on the production and interpretation of evidence. Given that the status of any evidence is therefore qualified and ambiguous, how should practitioners and policy-makers use it? The following panel adapts John Braithwaite's argument for what he calls 'contextual integrated strategies' in tackling crime problems (from a talk to the American Society of Criminology, most of whose members' recent work suggests that they paid him very little attention). It is also influenced by my own and colleagues' experiences of the Scottish projects for juvenile offenders, from which one of the strongest lessons has been that context matters, that it makes little sense to try to understand a special project without reference to the local environment which sustains it (or fails to do so), and that to rely on a single theory, and therefore on a single form of intervention, is unlikely to produce good results. 
It argues for theoretical pluralism and tolerance within limits set by positivist achievements in identifying nonsense and making it manifest; it asserts the importance of context and the realist stress on what it is that has made the difference (in either direction); it hints at the problems this implies for a simple conception of replication; it suggests that much evidence will only emerge over a longer time span than has usually been available (but that when it comes, the evidence will be more stable, reliable and useful); it argues for the integration of different approaches rather than exclusive reliance on one (since we are not, say, industrial chemists); and it suggests that the stress in the literature on 'programme integrity' will mislead if it is taken to mean a determination to change nothing even when the context changes. The reason that programmes with high 'integrity' (a technical term with moral overtones) tend to do better than those without integrity is (I think) that the former are generally better resourced and more serious in their pursuit of their aims; proper resourcing and seriousness are virtues in any programme, but they should not entail refusal to change.
Theories that are wrong much or even most of the time can still be useful
Put positivism in its place. The best positivist research can tell you what not to do (things that never work anywhere), but don't expect law-like universals
Pay attention to context in time and space: will what worked then and there work here and now?
What was it that made the difference?
Use evidence to develop long-term integrated policy packages and conduct historical, qualitative and quantitative evaluations
Don't expect anything always to work on its own. Prefer integrated strategies that are dynamically responsive to environmental change to static approaches based on a single type of theory
If evaluation researchers were to think along those lines, and become modest enough to encourage practitioners and policy-makers to do so too, the quality of the available evidence would improve, and so would the practice based on it.