Addressing Socio-technical Limitations of LLMs for Medical and Social Computing

Unifying Expertise, Transforming Research: AI, Law, Medicine


Our Vision

Large Language Models (LLMs), like those used in ChatGPT and virtual assistants, are cutting-edge
artificial intelligence algorithms trained on massive amounts of text data. They can generate human-like
text as well as creative content, translate across languages, and answer questions in an informative way.
However, they have known technical limitations, such as biases, privacy leaks, poor reasoning and lack of
explainability, which raise concerns about their use in critical domains such as healthcare and law.

Our vision addresses the socio-technical limitations of LLMs that challenge their responsible and
trustworthy use, particularly in the context of medical and legal use cases. Our goal is two-fold:

  • First, to create an extensive evaluation benchmark (including
    suitable novel criteria, metrics and tasks) for assessing the limitations of LLMs in real-world settings,
    enabling our standards and policy partners to implement responsible regulations, and our industry and
    third-sector partners to robustly assess their systems. To achieve this synergy, we will run co-creation
    and evaluation workshops throughout the project to create a co-production feedback loop with our
    stakeholders.
  • Second, to devise novel mitigation solutions based on
    new machine learning methodology, informed by expertise in law, ethics and healthcare via co-creation
    with domain experts, that can be incorporated into products and services. This methodology includes the
    development of modules for temporal reasoning and situational awareness in long-form text, dialogue and
    multi-modal data, as well as alignment with human preferences, bias reduction and privacy preservation.
    Partners

    BSI, Mishcon de Reya, Trilateral

    Work Stream 1: Co-production and Criteria for Responsible Research and Innovation (RRI) with LLMs

    Background & challenges: Regulatory frameworks, guidance and proposed legislation have been introduced to address the privacy, accuracy and explainability of automated systems (EU AI Act, UK Government’s National AI Strategy, UK Algorithmic Transparency, UK Government National Data Strategy, UK White Paper: A Pro-innovation Approach to AI Regulation, UK ICO guidance on explaining AI). However, it remains unclear how domain experts can translate this regulation into requirements for the use of LLM technology.

    Foci & implementation: We will review and synthesise regulatory expectations and principles of ethical and responsible innovation into a set of requirement criteria for the safe and responsible use of AI within medicine and the law. Such criteria are expected to include, for example, privacy preservation, inclusivity, factuality, situational awareness and explicability. The full set of criteria will be defined through interactions within our multi-disciplinary team, partners and relevant stakeholders, guided by the RRI AREA framework and informed by ethics literature that goes beyond it. We will examine how the criteria translate to practical needs in measuring safety and effectiveness in law and healthcare, especially through scoping workshops with partners. A co-creation ‘by design’ approach will be key at this stage to shape the lifecycle of LLM-based legal and medical AI.

    Specifically, the methods in this Work Stream will involve: (a) surveying regulatory frameworks, the literature on ethical challenges and safety issues in the application of technology to legal practice and medical diagnostics, and suitable analysis and design frameworks; (b) consultations, interviews and studies of work practices through observation and discussions with practitioners; (c) scoping workshops with partners and all co-Is, led by the RAi and domain experts, to establish requirement criteria that translate frameworks into practice in the context of legal and medical use cases — these workshops will also help further refine the use cases; and (d) a series of scenario-based co-creation workshops where use case prototypes will be progressively evolved and evaluated by stakeholders using benchmarks developed in Work Stream 2.

    Outputs: We will produce requirement criteria and associated publications for RRI with LLMs in real-world applications within the identified high-stakes medical and legal use cases. This Work Stream will drive a co-production feedback loop with stakeholders.