International multi-stakeholder consensus statement on clinical trial integrity

Objective: To prepare a set of statements on randomised clinical trial (RCT) integrity through an international multi-stakeholder consensus. Methods: The consensus was developed via multi-country multidisciplinary stakeholder group composition and engagement; evidence synthesis of 55 systematic reviews concerning RCT integrity; an anonymised two-round modified Delphi survey with consensus threshold based on the average percentage of majority opinions; and a final consensus development meeting. Prospective registrations: https://osf.io/bhncy, https://osf.io/3ursn. Results: There were 30 stakeholders representing 15 countries from five continents including trialists, ethicists, methodologists, statisticians, consumer representatives, industry representatives, systematic reviewers, funding body panel members, regulatory experts, authors, journal editors, peer reviewers and advisors for resolving integrity concerns. The Delphi survey response rate was 86.7% (26/30 stakeholders). There were 111 statements (73 stakeholder-provided, 46 systematic review-generated, 8 supported by both) in the initial long list, with eight additional statements provided during the consensus rounds. Through consensus the final set consolidated 81 statements (49 stakeholder-provided, 41 systematic review-generated, 9 supported by both). The entire RCT life cycle was covered by the set of statements including general aspects (n = 6), design and approval (n = 11), conduct and monitoring (n = 19), reporting of protocols and findings (n = 20), post-publication concerns (n = 12) and future research and development (n = 13). Conclusion: Implementation of this multi-stakeholder consensus statement is expected to enhance RCT integrity.


Introduction
The essence of the multiple concepts and terms related to research integrity [1-5] boils down to responsible research conduct through compliance with ethics and professional standards [1]. A working definition of science integrity clarifies the crucial role of 'ensuring transparency at all stages of design, execution, and reporting' [3]. Existing integrity initiatives [6-8] provide general statements about how to promote responsible research conduct.
In health effectiveness research, as randomised clinical trials (RCTs) and their systematic reviews are at the highest level of the evidence validity hierarchy, preserving RCT integrity is a priority [9-11]. The high rates of questionable research practices in integrity surveys [11,12] and the growing number of allegations of data fabrication in retractions [13] have shaken practitioner and public confidence. Not all such cases are due to deliberate misconduct [14]. RCT integrity, however, is under threat from a mix of unintentional errors, faulty methodology, lack of awareness of research ethics, poor writing skills, pressure to publish, etc. [1,10,15-17]. To our knowledge, apart from the International Council on Harmonisation of technical requirements for registration of pharmaceuticals [18], the research integrity initiatives [6-8] are not specific to RCTs. This makes it difficult for clinical academic institutions, research funding bodies and publishing organisations to target RCTs for improving their integrity standards. Thus, there is an urgent need for RCT community alignment in this area [19].
To address the need for an updated and specific set of integrity statements relating to responsible research conduct for RCTs, we undertook an international multi-stakeholder consensus focussing on the transparency required at the various stages of their planning, execution, and reporting.

(a) Establishment of the international multi-stakeholder group
In August 2021, 6 months ahead of the proposed consensus meeting, an international stakeholder group was carefully composed by selecting members based on their knowledge and experience to encompass all the critical aspects of the RCT research lifecycle. Our approach used snowballing that stopped searching for new participants once all relevant aspects of the RCT lifecycle were saturated [25]. Snowballing sought the input of the initially approached potential members for identifying further members until the entire RCT lifecycle was covered. A clinical trial was defined as a study design that randomly assigns human participants to one or more interventions and follows them up for critical outcomes to determine the effect of the interventions [9]. Stakeholders were representatives from relevant professional societies; allied health professions; patient, public and consumer representatives; trialists, statisticians and methodologists; members and reviewers of ethics, data monitoring and funding committees; peer reviewers and biomedical journal editors. They were contacted via direct email (see the list of stakeholders and their roles in Table 1). We ensured that none of the participants had any RCT papers subject to an active expression of concern or retraction. All stakeholders explicitly declared their conflicts of interest using the International Committee of Medical Journal Editors (ICMJE) uniform disclosure form (Appendix S1). One non-voting member (DM), without any RCT experience, was invited to the group to advise on consensus methods and language. Two members of the group were selected as co-convenors (KSK and YK), charged with the responsibility to oversee the snowball sampling and to ensure that all participants developed ownership of the consensus scope and content, engaging them in discussions, constructive debates and resolution of disagreements. Following acceptance of the invitation, online or phone interviews were held with the stakeholders to inform them about the project objectives, and to ask them for their input to the integrity statements.

(b) Umbrella review for generating evidence-based statements
For the creation of the initial long list of statements, we conducted a review of systematic reviews on RCT research integrity. The prospectively registered umbrella review (https://osf.io/3ursn) was carried out with a comprehensive search strategy covering major electronic databases (PubMed, Scopus, Cochrane Library and Google Scholar) from inception to November 2021 to capture peer-reviewed and grey literature. The review's search and selection strategy, data extraction, methods for assessing methodological quality and synthesis of findings have been reported [19]. Building on the collated findings, a core group of four stakeholders (AB, PC, MF and KSK) drafted clear, precise and actionable statements. The statement drafting process was initially piloted using seven of the included reviews. The deliberations at this stage helped to clarify the distinction between review findings and the resulting statements. Each member of the core stakeholder group first independently drafted statements, aiming for one action or recommendation per statement, and then finalised them through discussion.

(c) Modified Delphi survey
The statements provided by stakeholders were added to those generated from the umbrella review without editing. Together they created the long list for the modified Delphi consensus survey among 30 stakeholders with voting rights, deployed through a web-based survey tool (www.surveymonkey.com). A 7-point scale was provided to assess the level of agreement with the content of each statement. The scale was anchored between 'strongly agree' and 'strongly disagree', with 'agree', 'somewhat agree', 'neither agree nor disagree', 'somewhat disagree' and 'disagree' included as the scaled options for responses. The same scale was used in both survey rounds, administered on 30th January and 9th February 2022. The sum of the 'strongly agree' and 'agree' responses was used to compute an agreement rate for the approval of each individual statement. The responses of the individual stakeholders were kept anonymous throughout the whole process.
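The agreement rate described above can be sketched in a few lines of Python. This is a minimal illustration, not the survey tool's or the authors' actual computation: the numeric coding of the scale (1 = 'strongly agree' through 7 = 'strongly disagree'), the function name `agreement_rate` and the example responses are all assumptions made for the sketch.

```python
# Assumed coding of the 7-point scale (not stated in the text):
# 1 = 'strongly agree', 2 = 'agree', ..., 7 = 'strongly disagree'.
STRONGLY_AGREE, AGREE = 1, 2


def agreement_rate(responses):
    """Percentage of respondents answering 'strongly agree' or 'agree'."""
    if not responses:
        raise ValueError("at least one response is required")
    agreeing = sum(1 for r in responses if r in (STRONGLY_AGREE, AGREE))
    return 100.0 * agreeing / len(responses)


# Illustrative data only: 26 respondents, 24 of whom chose
# 'strongly agree' or 'agree' for a hypothetical statement.
example = [1] * 15 + [2] * 9 + [3] * 1 + [6] * 1
rate = agreement_rate(example)
print(f"{rate:.1f}%")  # prints 92.3%
```

With 26 respondents per round, each rate is a multiple of 1/26, which is why values such as 92.3% (24/26) and 69.2% (18/26) recur in the results.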
We used an objective method to determine the threshold or cut-off for approval of the statements, the average percent of majority opinions (APMO) [24]. For this computation, a statement was considered as agreed if the majority (> 50%) of stakeholders responded 'strongly agree' or 'agree' on the 7-point scale. A statement was considered as disagreed if the majority (> 50%) of stakeholders responded 'disagree' or 'strongly disagree' on the 7-point scale. The APMO consensus threshold was calculated as: sum of majority agreed and majority disagreed statements / total number of responses received × 100%. Statements above the APMO threshold were considered as having reached consensus. For individual statements that reached consensus in each round, we computed the strength of the agreement among stakeholders using the interquartile range (IQR) [23]. IQR was the difference between the first and third quartiles of the stakeholders' responses on the 7-point scale. It was interpreted as follows: IQR 0 (> 50% stakeholders gave the same responses) indicated very good strength of agreement; IQR 1 (> 50% stakeholders' range of responses was ≤ 2 points of the scale) indicated good strength of agreement; IQR ≥ 2 (> 50% stakeholders' range of responses was > 2 points of the scale) indicated poor strength of agreement. As a sensitivity analysis, we used an arbitrary approval threshold of 70%. Results were analysed using Stata v16 software on February 6th and 18th, 2022 (StataCorp. 2019, College Station, TX: StataCorp LLC).
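A minimal Python sketch of the two statistics may help make the definitions concrete. It rests on several assumptions that are not in the text: the scale is coded 1 ('strongly agree') to 7 ('strongly disagree'); the APMO numerator is read as the number of responses belonging to a majority agreement or majority disagreement, which is one common reading of the formula; quartiles use Python's 'inclusive' method, which may differ from the percentile definition applied in Stata; and fractional IQR values between 0 and 2 are grouped with 'good'. The function names and the example ratings are hypothetical.

```python
from statistics import quantiles

AGREE_CODES = {1, 2}     # assumed coding: 'strongly agree', 'agree'
DISAGREE_CODES = {6, 7}  # assumed coding: 'disagree', 'strongly disagree'


def apmo_threshold(ratings):
    """APMO (%) for a dict mapping statement id -> list of 1..7 responses.

    Assumed reading: responses forming a majority (> 50%) agreement or
    disagreement on a statement are summed, then divided by all responses.
    """
    majority_opinions = 0
    total = 0
    for responses in ratings.values():
        n = len(responses)
        total += n
        agree = sum(r in AGREE_CODES for r in responses)
        disagree = sum(r in DISAGREE_CODES for r in responses)
        if agree > n / 2:
            majority_opinions += agree
        elif disagree > n / 2:
            majority_opinions += disagree
    return 100.0 * majority_opinions / total


def strength_of_agreement(responses):
    """Classify strength of agreement from the IQR of the responses."""
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return "very good"
    if iqr < 2:  # assumption: fractional IQRs below 2 count as 'good'
        return "good"
    return "poor"


# Illustrative ratings for three hypothetical statements, six respondents each.
example = {
    "s1": [1, 1, 1, 2, 2, 3],  # majority agree (5 of 6)
    "s2": [6, 7, 7, 7, 4, 2],  # majority disagree (4 of 6)
    "s3": [1, 2, 3, 4, 5, 6],  # no majority either way
}
print(apmo_threshold(example))               # 9 of 18 responses -> 50.0
print(strength_of_agreement(example["s1"]))  # good
print(strength_of_agreement(example["s3"]))  # poor
```

A statement would then be taken as having reached consensus when its own agreement rate exceeds the computed threshold, so the cut-off adapts to how polarised the panel was in each round rather than being fixed in advance.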
Statements not having reached consensus in the first round using the APMO threshold were merged with new statements provided by stakeholders and subjected to the second round of the modified Delphi survey. The statements deemed to have failed to reach consensus because of lack of clarity in language had their wording improved. The statements containing similar information were merged to avoid duplication. The first-round agreement rate was provided in the second survey round along with the references to the reviews supporting the statements generated via evidence synthesis. The minor rewording, statement merger and statistical approach in the second round were the same as those used in the first round. The statements that failed to reach consensus were taken for voting to the final consensus development meeting.
To consolidate the provisional statement set, a core group of stakeholders (AB, KSK, MNN, PC, MF) evaluated the statements that had reached consensus for exact or inexact duplications and clarity of meaning. Where the duplication was virtually exact, a single statement was created, making only minor wording changes to clarify or enhance the intended meaning. No major wording changes were introduced to any of the statements that had met the consensus threshold. The statements without consensus were revised in the same manner with a view to improving the clarity of their meaning and to assist in subsequent voting. Thus, an original statement may have been subjected to minor rewording or merger with other statements several times through the different consensus rounds. The list of statements resulting from the above process, both those having reached consensus and those not having done so, was tabulated and circulated to all the participants with the agreement ratings and the underpinning references to reviews for the consensus development meeting.

(d) Consensus development meeting
All stakeholders were invited to the meeting, which was attended by 24 participants in person, 6 participants virtually for the entire day, and DM in person as an advisor. The provisional statement set tabulated above was shared with the participants together with an initial draft of this manuscript. At the meeting, held in Cairo, Egypt, on 22nd February 2022, statements that were classified as not having reached consensus in the two-round Delphi survey were individually discussed. Stakeholders decided on the agreement rate to be used as the threshold for exclusion and voted anonymously using an electronic system (Zoom meeting software) to select statements for the final set. The breakdown of statements into the various stages of the RCT research lifecycle was agreed with the stakeholder group. This included the subheadings general, design and approval, conduct and monitoring, reporting of protocols and findings, post-publication concerns and future research and development. In tabulation of the final set, the strength of evidence assessed via a modified AMSTAR-2 score [26] was provided for the statements underpinned by systematic reviews.

Patient and public involvement
One patient representative was a stakeholder (NM) in the consensus group to provide input as a trial participant. Three stakeholders (NM, ABC, KSK) had prior experience in patient, public and consumer involvement in RCTs [27,28] (Fig. 1). In addition, three systematic reviews included in the evidence synthesis addressed RCT integrity issues related to patient, public and consumer involvement [29-31]. This manuscript has been prepared in accordance with the GRIPP-2 guideline (Appendix S2) [32].

Results
There were 30 stakeholders (Table 1) with voting rights from 15 countries in 5 continents including trialists, ethicists, methodologists, statisticians, consumer representative, industry representative, systematic reviewers, funding body panel members, regulatory experts, authors, journal editors, peer reviewers and advisors for resolving integrity concerns. Their combined wide and appropriate expertise, based on self-assessment, ranged broadly to include all aspects of the RCT research lifecycle from protocol development to knowledge transfer (Fig. 1). Taking into account all past relevant professional experience, not just posting at the time of undertaking the work, the geographic coverage included 22 countries and 6 continents (Fig. 2).
The initial long list of 111 statements (73 stakeholder-provided, 46 generated via evidence synthesis [19] and 8 supported by both) was submitted to consensus via the modified Delphi survey (Fig. 3). The first survey round had 26 out of 30 (86.7%) respondents and 64 statements were rated above the 76.5% APMO threshold for consensus. Among these, the strength of the agreement among stakeholders was good or very good in all the statements (Table 2). The remaining 47 statements along with the 7 new stakeholder-provided statements were subjected to revisions. After merging exact and inexact duplicates, 40 statements were submitted to the second survey round, where there were 26 out of 30 (86.7%) respondents and 24 statements were rated above the 68.4% APMO threshold for consensus. Among these, the strength of the agreement among stakeholders was good in 18 (75%) statements (Table 2). The 64 statements agreed in the first modified Delphi survey round were merged, removing exact and inexact duplications, to take forward 54 along with the 24 agreed statements from the second round to the consensus development meeting. The remaining 16 statements that lacked consensus after the second round were also taken forward. Sensitivity analysis for the consensus threshold deploying the predefined arbitrary 70% cut-off showed that the APMO threshold was more conservative in the first round, permitting more statements to be re-examined (Table 2).
Fig. 1 Expertise and experience of the voting members of the stakeholder group in the international multi-stakeholder consensus statement on clinical trial integrity
There was one new stakeholder-provided statement, taking the total presented to 95 at this final stage. At the outset, the stakeholder group confirmed that statements below the 50% agreement threshold were to be excluded. Following discussion, merging and voting in the consensus development meeting, the final shortlist contained 81 statements (49 stakeholder-provided, 41 systematic review-generated, 9 supported by both). Of the total, 32 (39.5%) were unique evidence-based statements. Of the 41 statements underpinned by evidence synthesis [19], two were based on at least one high-moderate quality systematic review [29,33]. As shown in Table 3, the entire RCT lifecycle was covered with statements concerning general aspects (n = 6), design and approval (n = 11), conduct and monitoring (n = 19), reporting of protocols and findings (n = 20), post-publication concerns (n = 12) and future research and development (n = 13).
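The statement counts reported in this section can be cross-checked with simple arithmetic. The short Python sketch below only re-adds the numbers given in the text; the variable names are illustrative, and the reading that statements 'supported by both' are counted within both the stakeholder-provided and the review-generated groups is an assumption, albeit one that reproduces the reported totals.

```python
# Counts as reported in the Results; names are illustrative only.
initial_long_list = 111
round1_consensus = 64
round1_remaining = initial_long_list - round1_consensus      # 47
new_after_round1 = 7
pool_before_merging = round1_remaining + new_after_round1    # 54
round2_submitted = 40                                        # after merging duplicates
round2_consensus = 24
round2_no_consensus = round2_submitted - round2_consensus    # 16
round1_after_merging = 54                                    # 64 merged down to 54
new_at_meeting = 1
presented_at_meeting = (round1_after_merging + round2_consensus
                        + round2_no_consensus + new_at_meeting)
print(presented_at_meeting)  # 95


def combined_total(stakeholder, review, both):
    """Total statements, assuming 'both' is included in each group count."""
    return stakeholder + review - both


print(combined_total(73, 46, 8))  # 111, the initial long list
print(combined_total(49, 41, 9))  # 81, the final set

unique_evidence_based = 41 - 9    # review-generated only
share = round(100 * unique_evidence_based / 81, 1)
print(unique_evidence_based, share)  # 32 39.5

# Lifecycle breakdown of the final set (Table 3 subheadings).
breakdown = [6, 11, 19, 20, 12, 13]
print(sum(breakdown))  # 81
```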

Main findings
Our international multi-stakeholder consensus provides the first specific integrity statement for promoting and protecting RCT integrity. It was developed in a robust and comprehensive manner, covering the entire RCT lifecycle. The general statements on RCT integrity emphasise the need for global harmonisation and action. The statements relating to RCT design, approval, conduct and monitoring make clear that integrity needs embedding throughout the research lifecycle. The responsibilities of the publishing community are covered in statements concerning manuscript submission, peer review, reporting and complaints. Further statements highlight the need for continuing research and development to advance responsible research conduct in RCTs. Drafted in simple and clear language, the set of statements needs implementation by the clinical trialist community and related institutions to take forward the health research integrity agenda.

Limitations and strengths
There are several issues to consider in the weaknesses and strengths of this consensus development study. Defining research integrity to determine the statement scope was not straightforward. Although there is no agreed definition [3,4], it is important to recognise that there is no controversy. To confidently use research results, society expects that the highest ethics standards and professionalism are deployed to conduct and report research [1]. Defining integrity narrowly, focusing on post-submission or post-publication dishonesty assessments, fails to recognise that the whole research journey needs addressing [34]. Our work is subject to other limitations, including the possibility that the consensus group, which may be seen as having been derived from convenience sampling with snowballing, was affected by selection bias that could lead to particular results, or did not include all perspectives despite an extensive effort to capture the widest possible range (Fig. 1). Our stakeholder group sample size was, however, larger than the median of 22 experts included in previous reporting guideline development groups [35]. Snowballing is a non-probability sampling technique where existing panel members select future members, unlike random sampling methods that select members from curated lists. Those experts who consider themselves excluded will have the opportunity to enrich our work through their comments via correspondence following publication [36]. The surveys and voting were, by nature of the consensus, opinion-based. Not every stakeholder endorsed every statement (see percentages of agreement in Table 3). For example, despite the high level of overall support (92.3% approval with good level of agreement among stakeholders in the first round), there was a strong individual objection to the role of the data monitoring committee in providing oversight for data integrity (Table 3, statement 26). In another example, where two statistics experts disagreed over the interpretation of the underlying evidence [37,38] used to formulate the statement concerning statistical significance (Table 3, statement 33), the overall level of support just crossed the threshold for consensus (69.2% approval in the second round). For implementing this statement, examples of valid analytic strategies in the presence of multiple outcomes reported in the published literature can be helpful [39-41]. The use of the umbrella review [19] added breadth and objectivity [42]. For example, the statement concerning the input of professional medical writers arose from a systematic review (Table 3, statement 40) [19]. It did not emerge from the input of any stakeholder. If a reader suspects a conflict of interest, we provide all the disclosures of stakeholders' interests (Appendix S1). Another criticism may be that the stakeholders were too lenient, inclined to promote integrity softly, instead of creating challenges for researchers, committees, publishers, etc. through hard-to-implement recommendations. By explicitly reporting the agreement levels and openly sharing the consensus data, we intended to maximise transparency for readers. The consensus statement would, no doubt, need updating and revisions in the future.
Fig. 3 Flowchart of the development process for the international multi-stakeholder consensus statement on clinical trial integrity
Our strength is that we captured integrity issues across the RCT lifecycle, advancing on previous general statements [2,3]. Using established, scientifically based consensus techniques [20-24], we developed a specific statement that is comprehensive, methodologically replicable and transparently reported (see appendices concerning author contributions, disclosure statements and data sharing). The umbrella review [19] contributed a high proportion of statements to those provided by stakeholders, who had a wide and appropriate range of expertise and experience including consumer representation [43]. It is important to note that stakeholders themselves were not authors of RCTs with active expressions of concern or retractions related to integrity. We appreciate that the location of the final consensus meeting, Cairo, may bring Egyptian research under focus. In this regard it is important to factually examine the retraction landscape. The current distribution of numbers of retracted clinical studies in the Retraction Watch Database [44] shows that USA, Japan and China rank at the top, not Egypt (Fig. 4). The consensus statement is useable by any interested party as it gives general guidance applicable in the RCT research discipline. As an explanatory example, just because BJOG has the word British in its name and the journal has a historical and physical base inside British territory, this does not mean that its published articles only pertain to or have implications for British women or British obstetrics-gynaecology practice. Therefore, we do not anticipate that the meeting location will affect the generalizability of our consensus statement. The lay member of the stakeholder group (NM) had experience of representing patients and public in research [27], assisting trialists in design and conduct, serving as member of oversight committees and scoring RCT grant applications for funding.
Surveys were anonymised with objectively determined statement approval thresholds and subjected to sensitivity analysis. Several statistics are available in the literature to determine the degree of consensus among respondents within a panel, including stipulated number of rounds, subjective analysis, APMO, mode, mean/median rating and others [23]. Our chosen statistics, APMO and the predefined arbitrary threshold, are among the most commonly used [23]. Additionally, we used IQR to quantify the strength of agreement among the stakeholders.
Table 2 Statements reaching consensus according to the different approval thresholds for agreement in the multi-stakeholder international consensus concerning clinical trial integrity
APMO: Average percent of majority opinions; IQR: Interquartile range
a In this computation, a statement was considered as agreed if the majority (> 50%) of stakeholders responded 'strongly agree' or 'agree' on the 7-point scale. A statement was considered as disagreed if the majority (> 50%) of stakeholders responded 'disagree' or 'strongly disagree' on the 7-point scale. The APMO approval threshold was calculated as: sum of majority agreed and majority disagreed statements / total number of responses received × 100%. APMO approval thresholds were 76.4% in Delphi first round and 68.4% in Delphi second round
b Interquartile range (IQR) of the responses in the 7-point scale. In this computation, IQR 0 (> 50% stakeholders gave the same responses) indicated very good strength of agreement; IQR 1 (> 50% stakeholders' range of responses was ≤ 2 points of the scale) indicated good strength of agreement; IQR ≥ 2 (> 50% stakeholders gave responses > 2 points of the scale) indicated poor strength of agreement
c Predefined arbitrary approval threshold was > 70%


Interpretation of the findings
Our statement provides the agreed set of values and concepts concerning integrity of RCTs. For guiding behaviour, each stakeholder organisation would need to prepare manuals with specifications of the conduct that must be adhered to when participating in and carrying out RCTs [45]. Thus, the principles summarised in our work serve as a basis for creating implementation plans, manuals, standards and policies at stakeholder institutions and organisations to help inculcate integrity in RCTs. Researchers, institutions, agencies and publishers have integrated and interconnected roles in maintaining RCT integrity. Collaboration and harmonisation are essential in dealing with the complexities and barriers. An example of an attempt to create such a standard operating procedure document already exists [46], which will need updating in light of our consensus statements. It is necessary to invest in the clinical research infrastructure required to support trustworthy RCTs. Protecting and promoting RCT integrity requires a multifaceted approach, e.g. a combination of continuing education in best research practice in clinical trials targeting a range of audiences, improved governance and audit, automation of integrity checks in manuscripts of RCTs, and editor and peer-reviewer training in methodology. (Un)intentional errors can be reduced but cannot be completely eliminated. Admission of mistakes without the risk of persecution is a key aspect of continuous improvement [47]. To improve RCT credibility in health research, strategies to reduce the probability of errors are urgently required [48], something that our statement emphasises. As far as trial oversight is concerned, the statement suggests that ethics committees, in addition to their traditional protocol appraisal and approval function before a trial can begin, should be given a role in monitoring the conduct of the trial. Deliberations of the trial oversight committees should be formally documented and, in the future, may need to be made publicly available during the course of the trial to match the growing transparency demands. On completion of the trial, chairs of ethics and oversight committees may provide certificates of authenticity to the authors for submission with their trials' manuscripts.
The statement recognises biomedical journals as key stakeholders in RCT integrity, as is obvious from the proportion of editors and peer reviewers represented on our consensus group. It was recognised that the majority of journals' instructions to authors lacked sufficient detail to guide trialists to report their trial findings with integrity [49]. This was specifically highlighted to be the case for the information related to reporting of ethics approval, sources of funding, potential conflicts of interest, trial registration and statistical analysis plans [49-53]. In this regard, it is also foreseeable that journals in the future will develop and implement automated checks for RCT integrity just as they have done for the detection of plagiarism [54,55].
When an allegation of possible scientific misconduct is made, journals have an obligation to investigate in an unbiased manner with an explicit policy about managing conflicts of interest of their editors, peer reviewers and advisors. Our statement advises authors to actively engage with the journal investigation process and submit their de-identified raw data to be examined if required. As a matter of good practice with respect to promoting transparency, authors can voluntarily submit their data electronically to a repository at the same time as submission of the trial manuscript. There is no logical reason to wait for this to be made a mandatory requirement rather than being proactive; mandatory deposition is no doubt the natural next step in the development of the ICMJE data sharing statement [56]. Hopefully, it will help in reducing the risk of complaints.
The reported prevalence of scientific misconduct is 2-14% [57]. During an investigation misconduct may appear obvious, for example when repeated duplications of observations (copying and pasting of rows and columns) or a formula to generate false data in a spreadsheet raise suspicion. However, in every case, a careful investigation of the raw data is required before arriving at a decision about flagging an RCT as being fraudulent. If tools for detecting misconduct perform poorly, this would lead to false positive findings [58]. Wrongful accusations will damage science and healthcare [47,59]. Accurately detecting misconduct should therefore be a focus of future research to support peer review and evaluation of post-publication concerns. Education in good research ethics, governance and monitoring may currently be more effective in generating trustworthy randomised evidence [60,61].

Conclusion
Implementation of this international multi-stakeholder consensus will contribute to the enhancement of clinical trial integrity.

Fig. 2
Fig. 2 Geographical distribution of the stakeholder group in the international multi-stakeholder consensus statement on clinical trial integrity: (A) according to posting at the time of the consensus; and (B) according to relevant professional experience (only data of voting members reported)

Fig. 4
Fig. 4 The number of retracted clinical studies per country based on Retraction Watch Database (http://retractiondatabase.org, data extracted on February 2nd, 2023)
Khan and for the Cairo Consensus Group on Research Integrity, Middle East Fertility Society Journal (2024) 29:20

Table 3
Statements concerning clinical trial integrity from a multi-stakeholder international consensus (n = 81)

Table 3 (continued)
a Agreement (%) for the Delphi rounds is the percentage of the sum of the 'strongly agree' and 'agree' responses provided on the 7-point scale for the approval of each individual statement by the stakeholders. Agreement (%) for the consensus meeting is the percentage of votes cast in favour out of the total votes
b List of references is provided in Appendix S3; SPS: Statement provided by stakeholders
c Median agreement (%) is shown for several merged statements
d Strength of agreement among stakeholders poor (see 'Methods' and Table 2 for details)
e The agreement percentage (78.9%, the median of 88.5, 84.6, 73.08 and 61.54%) represents data for a merged statement containing four statements, two approved in the first round (related to prospective registration, 88.5 and 84.6%) and the other two approved in the second round (related to the policy, 73.08 and 61.54% in the first round; they passed the approval threshold in the second round with 80.77 and 69.23%). The strength of agreement among stakeholders for those statements approved in the second round was poor in the first round and good/poor in the second round (see 'Methods' and Table 2 for details)
f Systematic review classified as 'high' to 'moderate' quality according to modified AMSTAR-2. Núñez-Núñez M, Maes-Carballo M, Mignini LE, Chien PF, Khalaf Y, Fawzy M, et al. Research integrity in randomised clinical trials: a scoping umbrella review. IJGO. 2023. https://doi.org/10.1002/ijgo.14762
g n/a means not applicable; statement was provided by a stakeholder after the first or the second Delphi rounds
h The agreement percentage (76.9%, the median of 84.6 and 69.2%) represents data for a merged statement containing two statements, one approved in the first round (related to standard reporting guidelines, 84.6%) and the other approved in the second round (related to specific extensions, 69.2% in the first round; it passed the approval threshold in the second round with 69.2%). The strength of agreement among stakeholders for the specific extensions statement was good in the first round and poor in the second round (see 'Methods' and Table 2 for details)