Healthcare Generative AI Practitioners Prioritize Industry-Specific and Task-Specific Models
John Snow Labs’ first Generative AI in Healthcare survey reveals trends, challenges, and best practices among healthcare and life sciences practitioners.
Note: TDWI's editors carefully choose vendor-issued press releases about new or interesting research and services. We have edited and/or condensed this release to highlight key study results or service features but make no claims as to the accuracy of the vendor's statements.
John Snow Labs, an AI for healthcare company, released the findings of its inaugural Generative AI in Healthcare survey. Conducted by Gradient Flow, the research explores the trends, tools, and behaviors around generative artificial intelligence use among healthcare and life sciences practitioners. Findings showed a significant increase in generative AI budgets across the board, with one-fifth of all technical leaders reporting budget growth of more than 300%, reflecting strong advocacy and investment.
The survey highlights key priorities of practitioners unique to the healthcare industry. A strong preference for healthcare-specific models was a key criterion when evaluating large language models (LLMs). Requiring models to be tuned specifically for healthcare (4.03 mean response) ranked higher in importance than reproducibility (3.91), legal and reputational risk (3.89), explainability and transparency (3.83), and cost (3.80). Accuracy is the top priority when evaluating LLMs, and lack of accuracy is considered the top risk in generative AI projects.
Another key finding is a strong preference for small, task-specific language models. Unlike general-purpose LLMs, these targeted models are optimized for specific use cases. Survey results reflected this, with 36% of respondents using task-specific language models tuned for healthcare. Open-source LLMs (24%) and open-source task-specific models (21%) follow. Proprietary LLMs are less commonly used, whether through a SaaS API (18%) or on premises (7%).
In terms of how models are tested and improved, the survey highlights one practice that addresses both the accuracy and compliance concerns of the healthcare industry: human-in-the-loop workflows. This was by far the most common step taken to test and improve LLMs (55%), followed by supervised fine-tuning (32%) and interpretability tools and techniques (25%). A human-in-the-loop approach enables data scientists and domain experts to collaborate on training, testing, and fine-tuning models to their exact needs, improving them over time with feedback.
The survey also explores the large amount of work that remains in applying responsible AI principles to healthcare generative AI projects. Lack of accuracy (3.78) and legal and reputational risk (3.62) were reported as the most concerning roadblocks. Worse, a majority of generative AI projects have not yet been tested for any of the LLM requirements cited. Among projects that have been tested, fairness (32%), explainability (27%), private data leakage (27%), hallucinations (26%), and bias (26%) were the most commonly tested requirements. This suggests that no single aspect of responsible AI is being tested by more than a third of organizations.
You can read the report here. No registration is required.