-
LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning
Authors:
Jiajie Li,
Garrett Skinner,
Gene Yang,
Brian R Quaranto,
Steven D Schwaitzberg,
Peter C W Kim,
Jinjun Xiong
Abstract:
Multimodal large language models (LLMs) have achieved notable success across various domains, while research in the medical field has largely focused on unimodal images. Meanwhile, current general-domain multimodal models for videos still lack the capability to understand and engage in conversations about surgical videos. One major contributing factor is the absence of datasets in the surgical field. In this paper, we create a new dataset, Surg-QA, consisting of 102,000 surgical video-instruction pairs, the largest of its kind so far. To build such a dataset, we propose a novel two-stage question-answer generation pipeline that uses an LLM to learn surgical knowledge in a structured manner from publicly available surgical lecture videos. The pipeline breaks down the generation process into two stages to significantly reduce the task complexity, allowing us to use a more affordable, locally deployed open-source LLM rather than premium paid LLM services. It also mitigates the risk of LLM hallucinations during question-answer generation, thereby enhancing the overall quality of the generated data. We further train LLaVA-Surg, a novel vision-language conversational assistant capable of answering open-ended questions about surgical videos, on this Surg-QA dataset, and conduct comprehensive evaluations on zero-shot surgical video question-answering tasks. We show that LLaVA-Surg significantly outperforms all previous general-domain models, demonstrating exceptional multimodal conversational skills in answering open-ended questions about surgical videos. We will release our code, model, and the instruction-tuning dataset.
Submitted 15 August, 2024;
originally announced August 2024.
-
One-shot skill assessment in high-stakes domains with limited data via meta learning
Authors:
Erim Yanik,
Steven Schwaitzberg,
Gene Yang,
Xavier Intes,
Jack Norfleet,
Matthew Hackett,
Suvranu De
Abstract:
Deep Learning (DL) has achieved robust competency assessment in various high-stakes fields. However, the applicability of DL models is often hampered by their substantial data requirements and confinement to specific training domains. This prevents them from transitioning to new tasks where data is scarce. Therefore, domain adaptation emerges as a critical element for the practical implementation of DL in real-world scenarios. Herein, we introduce A-VBANet, a novel meta-learning model capable of delivering domain-agnostic skill assessment via one-shot learning. Our methodology has been tested by assessing surgical skills on five laparoscopic and robotic simulators and on real-life laparoscopic cholecystectomy. Our model successfully adapted with accuracies up to 99.5% in one-shot and 99.9% in few-shot settings for simulated tasks and 89.7% for laparoscopic cholecystectomy. This study marks the first instance of a domain-agnostic methodology for skill assessment in critical fields, setting a precedent for the broad application of DL across diverse real-life domains with limited data.
Submitted 19 April, 2024; v1 submitted 15 December, 2022;
originally announced January 2023.
-
Decreasing the Surgical Errors by Neurostimulation of Primary Motor Cortex and the Associated Brain Activation via Neuroimaging
Authors:
Yuanyuan Gao,
Lora Cavuoto,
Anirban Dutta,
Uwe Kruger,
Pingkun Yan,
Arun Nemani,
Jack E. Norfleet,
Basiel A. Makled,
Jessica Silvestri,
Steven Schwaitzberg,
Xavier Intes,
Suvranu De
Abstract:
Acquisition of fine motor skills is a time-consuming process as it requires frequent repetitions. Transcranial electrical stimulation is a promising means of enhancing simple motor skill development via neuromodulatory mechanisms. Here, we report that non-invasive neurostimulation facilitates the learning of complex fine bimanual motor skills associated with a surgical task. During the training of 17 medical students on the Fundamentals of Laparoscopic Surgery (FLS) pattern cutting task over a period of 12 days, we observed that transcranial direct current stimulation (tDCS) decreased the error level and the variability in performance, compared to the Sham group. By concurrently monitoring the cortical activations of the subjects via functional near-infrared spectroscopy (fNIRS), our study showed that cortical activation was significantly stimulated by tDCS. The lowered performance error and the increased brain activation were retained one month post-training. This work supports the use of tDCS to enhance performance accuracy in fine bimanual motor tasks.
Submitted 8 October, 2020;
originally announced October 2020.