When “Probably” Is Not Good Enough in Clinical Research Document Review
There is a real difference in clinical research between a response that feels plausible and one you would be comfortable relying on during study conduct, monitoring, or an audit. That difference is a big part of why I pay close attention to AI.
Two terms are useful here, even if they sound more technical than they need to:
Deterministic means the same question, asked against the same documents, under the same rules, should give you the same result. The system is bounded by the study documents you gave it.
Probabilistic means the system is generating the answer it thinks is most likely based on patterns learned from very large amounts of text. In simple terms, it predicts the next word (strictly, the next token, which may be a word or a piece of one), then the next, and so on. The study documents you give it influence those predictions, but they do not force the system to stay strictly within the wording or boundaries of the documents. And because each step typically involves choosing among several likely continuations, the same question can produce slightly different wording, or even somewhat different answers, across runs.
Those are not perfect technical labels for every product on the market, but I think they are a helpful way to understand an important difference. Deterministic does not mean perfect. It just means constrained. Probabilistic does not mean random. It means the tool is predicting rather than simply locating and returning cited source material.
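To make the "same result every time" versus "can vary across runs" contrast concrete, here is a toy sketch. It is not how any particular product works. The candidate words and their probabilities are invented for illustration, and the two helper functions are hypothetical names of my own. One always picks the most likely next word, the other draws a word in proportion to its probability.

```python
import random

# Toy next-word probabilities. The words and numbers are invented
# for illustration and do not come from any real model.
NEXT_WORD_PROBS = {
    "within": 0.5,
    "before": 0.3,
    "after": 0.2,
}


def pick_most_likely(probs):
    """Repeatable choice: always return the highest-probability word."""
    return max(probs, key=probs.get)


def sample_word(probs, rng):
    """Sampled choice: draw one word in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random()  # unseeded, so sampled output can differ between runs

repeatable_runs = [pick_most_likely(NEXT_WORD_PROBS) for _ in range(5)]
sampled_runs = [sample_word(NEXT_WORD_PROBS, rng) for _ in range(5)]

print("always most likely:", repeatable_runs)
print("sampled:           ", sampled_runs)
```

The first list is identical on every run. The second usually is not, even though nothing about the question or the probabilities changed.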
By AI here, I mostly mean large language models (LLMs). That is the technology behind tools like ChatGPT, Claude, Gemini, and Copilot. I think these tools are impressive. I use them often, and I think anyone in clinical research who ignores them is making a mistake, similar to people who dismissed the internet early on. But I also think it helps to be clear about what they are actually doing.
One way to think about it is that using a deterministic tool is like going back to the study file and pulling the exact page. Using a probabilistic tool is closer to asking a very knowledgeable person for the likely answer based on everything they have seen before. Both can be useful. But they are not the same kind of output, and in clinical research that difference matters.
Clinical research documents are not just content to be summarized. A protocol, an amendment, an informed consent form, an investigator's brochure, a pharmacy manual, a lab manual, or sponsor guidance can drive real decisions. Sometimes the issue comes down to one sentence. Sometimes it is a visit window, a dosing qualifier, or whether a later amendment changed what the base protocol said. Sometimes the answer sits across several documents, and those documents do not always line up cleanly.
Anyone who has worked in clinical research has seen this. A coordinator is checking whether a procedure can happen outside the target day but still be inside the visit window. A CRA is trying to confirm whether something really should be documented as a protocol deviation under the current amendment. A sponsor is reviewing whether patient-facing language changed when the protocol changed. Those are not abstract questions. They affect study conduct, documentation, monitoring, and sometimes patient conversations.
In that setting, a polished answer is not enough. In clinical research, the question usually is not just, “Can you answer this?” It is, “Can you show me exactly where you got it?” You need to know what the document says, where it says it, and whether another document in the study file says something different.
That is where probabilistic AI can get into trouble. Its strength is flexibility. It is good at summarizing, brainstorming, translating dense text into plain English, and helping you get started. But that same flexibility can become a weakness when the work depends on exact language. A probabilistic model will often produce an answer even when the underlying support is incomplete, mixed across sources, or a little too loosely paraphrased. In a casual setting, that may not matter. In clinical research, it can matter very much.
A more deterministic, source-bound approach is narrower, but I think that is often exactly what you want here. It is bounded by the record. It looks across the documents you loaded, finds the relevant passages, and stays close to the wording that is actually there. If the answer is supported, it should show you where. If the answer is not clearly supported, it should stop there instead of filling the gap with something that merely sounds right. It is better to say, “I can’t support that from these documents,” than to give a confident answer that is only probably right.
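A minimal sketch of that "show where, or stop" behavior might look like the following. Everything in it is an assumption for illustration: the Passage fields, the sample passages, and the plain keyword match, which stands in for whatever search a real tool would use. The point is only the shape of the output, which is either cited passages or an explicit statement that the record does not support an answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    document: str
    page: int
    section: str
    text: str


# Invented passages for illustration; not taken from any real study.
PASSAGES = [
    Passage("Protocol v2.0", 41, "6.2",
            "Visit 3 occurs on Day 29 with a window of plus or minus 3 days."),
    Passage("Lab Manual v1.1", 12, "3.4",
            "Serum samples must be frozen within 2 hours of collection."),
]

NO_SUPPORT = "I can't support that from these documents."


def find_support(terms, passages):
    """Return passages whose text contains every search term (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    return [p for p in passages if all(t in p.text.lower() for t in wanted)]


def answer(terms, passages):
    """Return cited passages if any match, otherwise say so and stop."""
    hits = find_support(terms, passages)
    if not hits:
        return NO_SUPPORT
    return "\n".join(
        f'{p.document}, p. {p.page}, section {p.section}: "{p.text}"'
        for p in hits
    )


print(answer(["visit 3", "window"], PASSAGES))
print(answer(["fasting"], PASSAGES))
```

The same terms against the same passages always return the same text, and a question the passages do not cover gets the stop message instead of a guess.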
Some people hear that and think it sounds limited. I hear it as disciplined. In clinical research, I would usually rather have a tool show me the limit of the record than have it smooth over that limit and hand me a more confident answer than the documents actually support.
That does mean the result is only as good as the documents. If the wrong version was loaded, if a key amendment is missing, or if the study file itself is incomplete, the answer will reflect that. But for this kind of work, I think that is the better failure mode. A gap in the record shows up as a gap in the answer, where you can see it and fix it, instead of being covered by a polished answer without a clean line back to the source.
Privacy is a major factor too. Many sponsors, CROs, and sites are still cautious, and rightly so, about what can be uploaded into general-purpose AI systems, especially when it is unclear who may have access to that information or how it may be retained. But that is a whole other conversation (and probably a good future blog post).
The bigger operational issue, in my view, is that study questions rarely live inside one clean piece of text. A simple question may touch the protocol, a later amendment, the ICF, a visit schedule, a site-facing document, and sponsor guidance. Those sources may agree, but often they do not. When they do not, I do not want a system that blends them into one tidy answer and hides the tension. I want the relevant passages in front of me so I can see what each document says.
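Here is a rough sketch of what "show the tension instead of blending it" could look like. The documents, topics, and values are invented, and the assumption that statements have already been pulled out and labeled by topic is a simplification of mine. The sketch groups what each document says about a topic and flags the topics where the documents do not match.

```python
from collections import defaultdict

# (document, topic, statement) triples, all invented for illustration.
STATEMENTS = [
    ("Protocol v1.0", "visit 3 window", "Day 29 +/- 3 days"),
    ("Protocol Amendment 2", "visit 3 window", "Day 29 +/- 5 days"),
    ("Schedule of Assessments", "visit 3 window", "Day 29 +/- 3 days"),
    ("Protocol v1.0", "fasting before labs", "8 hours"),
    ("Lab Manual", "fasting before labs", "8 hours"),
]


def group_by_topic(statements):
    """Map each topic to {statement: [documents that make it]}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for document, topic, value in statements:
        grouped[topic][value].append(document)
    return grouped


def report(statements):
    """List every distinct statement per topic, with its source documents."""
    lines = []
    for topic, by_value in group_by_topic(statements).items():
        status = "AGREE" if len(by_value) == 1 else "DISAGREE"
        lines.append(f"{topic}: {status}")
        for value, documents in by_value.items():
            lines.append(f"  {value!r} <- {', '.join(documents)}")
    return lines


for line in report(STATEMENTS):
    print(line)
```

Nothing here decides which document wins. It only keeps each version of the statement attached to its source so a person can make that call.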
That is the distinction that matters most to me. A probabilistic system tends to have an answer to everything. A more deterministic system tends to show the supporting text, or stop short of claiming more than it can support. In clinical research, that is the difference between a response that sounds good and a response you can verify, cite, and stand behind later.
DocCite takes that more deterministic approach by design. It is built around cited proof: reviewing your clinical research documents locally, searching across them, and showing the supporting passages with the document name, page number, and section. It keeps document versions distinct. It does not quietly collapse differences between documents into one blended paragraph. If the support is strong enough, it can give you a short answer. If not, it stays at the passage level so you can read the source or sources yourself.
I think that is a better fit for our kind of work. When I am reviewing study documents, I do not need a tool to sound smart. I need it to stay close to the source, make verification easy, and show me exactly where the answer came from. And if two documents disagree, I want the disagreement shown clearly instead of smoothed over.
None of this is an argument against AI. I do think AI has an important place in clinical research. It can help with drafting, training, summarization, and early-stage thinking. It can save time and improve clarity when used appropriately. But there should be different tools for different situations.
When I need help brainstorming, probabilistic AI can be very useful. When I need to review clinical research documents and stand behind what I found, I want exact passages, clear citations, local privacy, and a system that does not hide uncertainty when the documents do not fully agree. That is what DocCite is built to do, and it is why I think a more deterministic, source-bound approach is the better tool for this job.