Systematic Literature Reviews and the Discipline of Not Picking Favorites

Every literature review reflects choices. You choose which databases to search. You choose which keywords to use. You choose which papers to include and which to exclude. The difference between a systematic literature review and a narrative review is not that one makes choices and the other does not. Both make choices. The difference is that a systematic review documents every choice in advance, applies it consistently, and makes the whole process auditable by someone who was not there.

A narrative review is expert-driven. An experienced researcher reads through the literature, selects the papers they find most useful, and synthesizes what they found. The result can be genuinely valuable, especially when the expert has deep knowledge of a field. But the selection process is opaque and non-reproducible. Two experts reviewing the same literature can reach different conclusions, and there is no mechanism to tell which one is right. The selection bias is invisible. Papers that confirm the expert's existing view are somewhat more likely to appear than papers that challenge it, not always through bad faith, but through the normal operation of human memory and relevance judgments.

A systematic literature review tries to replace that opacity with a documented protocol. The research question is defined before the search begins. The search strings are specified and documented. The databases to be searched are listed. Inclusion and exclusion criteria are written down before any screening happens, covering things like publication type, language, date range, and topic relevance. Papers are screened in stages, first by title and abstract, then by full text, and the screening decisions are documented. The result is a source set that a different researcher could reproduce, or at least closely approximate, by following the same protocol.
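One way to see what "documented in advance" means is to write the protocol down as data and let the screening stages read from it. The following is only a minimal sketch under stated assumptions: every name in it (Protocol, Paper, run_screening, the field names) is hypothetical, and the keyword checks are a crude stand-in for the human relevance judgments a real reviewer would make and record at each stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """Choices fixed before the search begins (all field names are hypothetical)."""
    question: str
    databases: tuple
    search_strings: tuple
    languages: frozenset
    year_range: tuple            # (first_year, last_year), inclusive
    publication_types: frozenset
    abstract_terms: frozenset    # at least one must appear in title or abstract
    full_text_terms: frozenset   # all must appear in the full text


@dataclass(frozen=True)
class Paper:
    title: str
    abstract: str
    year: int
    language: str
    publication_type: str
    full_text: str = ""


def screen_title_abstract(paper, protocol):
    """Stage 1: metadata criteria plus a topic check on title and abstract."""
    first, last = protocol.year_range
    if not first <= paper.year <= last:
        return False, "outside date range"
    if paper.language not in protocol.languages:
        return False, "language not covered"
    if paper.publication_type not in protocol.publication_types:
        return False, "publication type not covered"
    text = f"{paper.title} {paper.abstract}".lower()
    if not any(term in text for term in protocol.abstract_terms):
        return False, "no topic term in title or abstract"
    return True, "passed"


def screen_full_text(paper, protocol):
    """Stage 2: keyword stand-in for a human full-text relevance judgment."""
    text = paper.full_text.lower()
    missing = sorted(t for t in protocol.full_text_terms if t not in text)
    if missing:
        return False, "full text lacks: " + ", ".join(missing)
    return True, "passed"


def run_screening(papers, protocol):
    """Apply both stages in order and log every decision with its reason."""
    included, log = [], []
    stages = (("title_abstract", screen_title_abstract),
              ("full_text", screen_full_text))
    for paper in papers:
        for stage, check in stages:
            keep, reason = check(paper, protocol)
            log.append({"title": paper.title, "stage": stage,
                        "include": keep, "reason": reason})
            if not keep:
                break
        else:
            # No stage rejected the paper, so it enters the source set.
            included.append(paper)
    return included, log
```

The point of the design is the log rather than the filter. Because the protocol is frozen and every exclusion carries a stated reason, a second researcher can rerun the same criteria and see exactly where their source set diverges.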

PRISMA, which stands for Preferred Reporting Items for Systematic Reviews and Meta-Analyses, is a widely used framework for reporting these kinds of reviews. It is used extensively in health sciences, particularly medicine, and IS and management researchers increasingly use it as a template as systematic reviewing has become more common in those fields. I hedge slightly on PRISMA's place in IS because my study-hub notes on this topic (day2.html, Topic 11) do not mention it explicitly. They cite Templier and Paré (2015) as the main IS-specific reference for SLR protocols and guidelines, and that is the source my comps preparation identified as primary. Templier and Paré describe four SLR types and provide nineteen guidelines across six steps: formulating the problem, searching the literature, screening for inclusion, assessing quality, extracting data, and analyzing and synthesizing data.

Webster and Watson (2002) made an earlier argument about literature reviews in IS that is still relevant: a good review is concept-centric, not author-centric. The failure mode of a traditional review is what my notes call "the author parade," where the review simply walks through a sequence of papers in chronological order, summarizing each one in turn. That structure tells the reader what each author argued but does not synthesize across them. It does not tell you what the field as a whole believes, where the disagreements are, or what questions remain open. A concept-centric review organizes around the ideas themselves, uses individual papers as evidence for or against positions, and produces an integrated picture of what is known and what is not.

The quality problem in IS systematic reviews is real and under-discussed. The method constrains source selection more than it constrains analysis. Following a documented search protocol gives you a defensible source set, but what you do with those sources (how you code them, what categories you impose, how you handle contradictory findings) still requires interpretive judgment. I have read IS systematic reviews that searched fifteen databases with carefully documented inclusion criteria and then coded the included papers in ways that seemed to confirm the authors' prior expectations. In those reviews, the method's transparency stops at the screening stage. The synthesis quality depends on intellectual honesty and methodological care that no protocol can fully guarantee.

The boundary definition problem is also underestimated. Deciding what counts as relevant is harder than it sounds. If my review asks about "AI adoption in healthcare organizations," does a paper about AI diagnostic tools in radiology count? What about a paper about electronic health record adoption that does not mention AI explicitly but discusses predictive analytics? What about a paper about AI in pharmaceutical research with no organizational component? Each boundary decision seems small, but they accumulate. Two researchers starting from the same question can end up with source sets that overlap only partially, and the synthesized pictures they build can look quite different. A good SLR documents boundary decisions explicitly so readers can evaluate them.
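The partial-overlap point can be made concrete with a little set arithmetic. This is a rough sketch rather than an established SLR procedure: the function name and the paper identifiers are hypothetical, and Jaccard similarity is just one convenient way to put a number on how much two source sets share.

```python
def source_set_overlap(set_a, set_b):
    """Compare two reviewers' included papers, given as iterables of paper IDs."""
    a, b = set(set_a), set(set_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard similarity: shared papers divided by all papers either included.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical IDs: reviewer A counts the radiology paper as in scope,
# reviewer B counts the EHR and pharma papers instead.
reviewer_a = {"radiology_ai", "hospital_ai_1", "hospital_ai_2"}
reviewer_b = {"hospital_ai_1", "hospital_ai_2", "ehr_analytics", "pharma_ai"}
report = source_set_overlap(reviewer_a, reviewer_b)
print(report["jaccard"])  # 0.4
```

Listing the papers that only one reviewer included is more useful than the single number, because those are the papers where a boundary decision was actually made and should be documented.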

Contrast this with how industry analyst firms like Gartner approach their research. Gartner's reports and briefings, accessible through the Gartner newsroom, are produced quickly by expert analysts who synthesize primary research, practitioner interviews, and vendor briefings. The output is influential and often more current than academic reviews. But the methodology is not published. You cannot reproduce a Gartner Magic Quadrant or a Hype Cycle position from the description in the report. You have to trust the analysts' judgment. This is not necessarily bad, because expert judgment has real value, but it is structurally the same problem as the narrative review, which is the problem systematic reviews were designed to address. The academic SLR takes longer precisely because it is trying to make the judgment process auditable.

My notes from Topic 11 of my comps preparation also flag a newer concern: AI-assisted systematic reviews. Tools like large language models can help with search, screening, and summarization tasks. But Susarla et al. (2023) argue that AI tools create risks of hallucination, fabricated references, shallow synthesis, and poor contextualization when used without careful human supervision. The AI can make a review faster. It cannot make it better unless a human remains responsible for fact-checking, theoretical grounding, and scholarly judgment at every stage. Using AI output as if it were paper-backed synthesis is exactly the kind of shortcut that undermines what systematic reviewing is supposed to achieve.

I think about this when I'm planning my own literature reviews. The honest version of a systematic review is not just following the steps. It is being willing to include papers that challenge your argument, to report findings that complicate your story, and to write the limitations section with real specificity about what your search protocol might have missed.