Development & validation of a format for reporting endoscopic colonic biopsies
For correspondence: Dr Kishore Khatri, Department of Pathology, Dr S N Medical College, Jodhpur 342 001, Rajasthan, India e-mail: dr_kishore_khatri2002@yahoo.co.in
Abstract
Background & objectives
Non-neoplastic diseases make up a considerable part of the daily workload of gastroenterologists and endoscopists. As there are only a few endoscopic findings in the literature to suggest the large array of colonic diseases, endoscopic biopsy is essential to reach a definitive diagnosis. This calls for a checklist or similar format containing all the important histological features to be seen in a colonic biopsy, which is currently lacking in the published literature. Hence, this study aimed to develop a format for reporting endoscopic colonic biopsies, the first of its kind to the best of our knowledge, particularly for non-neoplastic colonic diseases, using modified kappa statistics.
Methods
Seventy-one questions were included in this format after searching various databases using multiple search phrases. These questions were reviewed by experts, and changes were made accordingly. The finalized questionnaire was then shared with 20 subject matter experts. Their feedback was utilized to determine the Content Validity Index (CVI), calculated at both the item level (I-CVI) and the overall scale level (S-CVI), along with the modified kappa coefficient. For studies involving more than six experts, an I-CVI of 0.78 and an S-CVI/average of 0.9 were considered acceptable benchmarks.
Results
Fourteen of the 20 experts responded. The mean I-CVI for relevance across all items was 0.933, the S-CVI/Average (based on proportion data) across all experts was 0.94, and the mean I-CVI (0.928) was well above the 0.78 cut-off.
Interpretation & conclusions
The scores indicated strong agreement among experts on the various histological features to be seen in an endoscopic colonic biopsy. These findings clearly indicate that the format met the content validity criteria, and hence histological sections of endoscopic colonic biopsies can be read using this format.
Keywords
Endoscopic colonic biopsy
non-neoplastic colonic diseases
questionnaire
survey
validation
Gastrointestinal diseases are common in the general population, especially in developing countries; therefore, endoscopic biopsy is essential to reach a definitive diagnosis1,2. Consequently, endoscopic colonic biopsies form a significant part of a gastropathologist's workload3. Colonoscopic biopsies not only aid in distinguishing inflammatory, infectious, and neoplastic conditions but also provide essential information for disease classification, staging, and therapeutic decision-making4. Early and accurate histopathological interpretation is crucial, as many gastrointestinal disorders present with overlapping clinical and endoscopic features5.
At the outset, a gastropathologist may tend to look only for those features that suggest the common diseases encountered in routine practice. Hence, uncommon histological features are often ignored, especially if they are subtle.
For this purpose we tried to find a comprehensive checklist or algorithm, but none was available in the literature. Hence, after searching many articles and revisiting the histological features, we undertook to develop a checklist/format that enlists the features to be looked for in every colonic biopsy, regardless of the differential diagnosis provided by the endoscopist.
This study outlines the process of developing the format and evaluating its content validity6-8. The content validity index is a statistical measure indicating how much agreement exists among the subject matter experts7. Typically, it is computed by having experts rate the relevance of each item6, followed by calculation of the Content Validity Index (CVI) and the Modified Kappa Statistic (MKS), incorporating the probability of chance agreement as outlined by Polit et al6,7.
Materials & Methods
This study was conducted at the department of Pathology, Dr SN Medical College, Jodhpur, Rajasthan, India from September 2023 to December 2024.
Literature review to identify relevant questions
Two pathologists (KK and BN) with a special interest in gastropathology conducted a literature search across multiple databases, including PubMed, Cochrane Library, EMBASE, Google Scholar, and the Stanford Library. Based on the compiled diagnostic information, they developed the initial format by identifying relevant questions. The keywords and search terms employed included ‘Colon Biopsy Interpretation’, ‘Algorithm for Colon biopsy reporting’, ‘Inflammatory Bowel Disease’, ‘Ulcerative Colitis’, ‘Crohn’s Disease’, ‘Crohn’s vs. Tuberculosis’, ‘Tubercular Colitis’, ‘Granulomas in colon’, ‘Infectious Granulomas in Colon’, ‘Dysplasia colon’, ‘Histologic Activity index in IBD’, ‘Ischemic Colitis’, ‘GVHD in Colon’, etc. A few references were searched manually. Around 15 articles were found, including original articles, reviews, meta-analyses, and case reports.
Identification of experts
Informed consent was taken telephonically from all experts before sending them the Google form. Their participation was voluntary. Six of the approached participants did not respond, as they believed they did not have sufficient expertise.
Defining questions for checklist
Initially, 27 questions (items) were framed; these were expanded to 71 questions after various rounds of suggestions from experts and further literature search, to make the format more extensive and comprehensive. Further suggestions from the participating researcher colleagues included in this study resulted in adding the same multiple-choice answer options to every question, to make it easy to respond. The final questionnaire was a Google form, and the responses were recorded as grades/scores on a 5-point Likert scale for relevance. This was repeated for all 71 items. One point was assigned for strongly disagree, 2 for disagree, 3 for neutral (don't know/may or may not be included), 4 for agree, and 5 for strongly agree. Further review of the questionnaire resulted in creating sections and grouping related points under the relevant section. The questionnaire was then sent to junior and middle-level faculty members of the Pathology Department of Dr S N Medical College, Jodhpur, to look for spelling mistakes, repeated questions, questions requiring reframing, etc.; the suggestions were accepted and the necessary changes were made. The final questionnaire took four months (March 2024 to June 2024) to be ready for circulation to the experts.
Questionnaire validation
The final version of the questionnaire (supplementary material) was sent to 20 histopathologists across India who either had long experience in the subject or belonged to institutes of national importance. The experts assessed and graded the questionnaire (Google Forms shared by e-mail or WhatsApp) and returned their responses. The expected response rate was 50 per cent; responses were received from 14 of the 20 experts (70%).
Content validation
Content validity assesses whether the items in the format sufficiently represent the domain of interest. In developing this format, the two-step methodology described by Armstrong et al9 was utilized: in the first stage, sections were synthesized, followed by the questions within each section; the items were then evaluated by experts, and their validity was quantitatively determined using the CVI and MKS.
For statistical calculation, Likert response points 4 and 5 (agree and strongly agree) were considered agreement and given 1 point, while points 1, 2, and 3 (strongly disagree, disagree, and neutral) were considered non-agreement and given 0 points. The data were analyzed using a Microsoft Excel spreadsheet, Mac version 16.36 (Microsoft Corporation, Redmond, Washington, United States).
The various statistical calculations, with their formulae, were done as follows: (i) Experts in agreement: determined by summing the relevance ratings assigned to each item; (ii) Universal agreement (UA): an item received a score of 1 if there was complete agreement among all experts, and 0 if even one expert disagreed; (iii) I-CVI (item-level content validity index): the fraction of experts who concurred on an item, calculated by dividing the number of experts in agreement by the total number of experts; (iv) S-CVI/Ave (scale-level CVI, based on I-CVI): the average of the I-CVI scores across all items, i.e., the sum of I-CVI values divided by the number of items; (v) S-CVI/Ave (by the proportional relevance method): the sum of the proportional relevance scores divided by the total number of experts; (vi) S-CVI/UA (scale-level CVI based on the universal agreement method): the sum of UA scores divided by the number of items. In summary, when the number of experts is at least nine, the I-CVI for each item, as well as the S-CVI calculated either by averaging the I-CVIs across items (S-CVI/Ave) or by universal agreement among experts (S-CVI/UA), should exceed the threshold of 0.78. This cut-off is widely accepted in the literature as indicating adequate content validity; values below this level suggest insufficient expert agreement on the relevance of the items10,11.
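The indices defined above can be sketched in a few lines of Python. The ratings matrix below is hypothetical and for illustration only (the study's actual per-item data appear in the Table); each entry is the dichotomized Likert score described earlier (1 for agree/strongly agree, 0 otherwise):

```python
# Content validity indices from dichotomized expert ratings.
# ratings[i][j] = 1 if expert j rated item i as relevant (Likert 4 or 5), else 0.
# These three items and five experts are hypothetical, for illustration only.
ratings = [
    [1, 1, 1, 1, 1],  # item 1: all 5 experts agree
    [1, 1, 1, 1, 0],  # item 2: 4 of 5 agree
    [1, 1, 1, 0, 0],  # item 3: 3 of 5 agree
]
n_experts = len(ratings[0])

# I-CVI: fraction of experts agreeing on each item.
i_cvi = [sum(item) / n_experts for item in ratings]

# UA score: 1 only if every expert agreed on the item.
ua = [1 if sum(item) == n_experts else 0 for item in ratings]

# S-CVI/Ave: mean of the I-CVIs across all items.
s_cvi_ave = sum(i_cvi) / len(i_cvi)

# S-CVI/UA: fraction of items with universal agreement.
s_cvi_ua = sum(ua) / len(ua)

print(i_cvi)               # [1.0, 0.8, 0.6]
print(round(s_cvi_ave, 3)) # 0.8
print(round(s_cvi_ua, 3))  # 0.333
```

Note how a single dissenting expert drops an item's UA score to 0, which is why S-CVI/UA falls so much faster than S-CVI/Ave as the panel grows.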
To evaluate expert consensus on each question in all sections, the MKS was utilized.
Kappa was calculated using the following formula:
κ = (I-CVI − Pc) / (1 − Pc)
where Pc is the probability of chance agreement on relevance, calculated by the formula6,7:
Pc = [N! / (A! × (N − A)!)] × 0.5^N
where N = number of experts and A = number of experts in agreement.
Kappa values were interpreted on a scale ranging from 0 to 1: values between 0 and 0.2 indicated no agreement, 0.21 to 0.39 minimal agreement, and 0.4 to 0.59 weak agreement; moderate agreement was reflected by values from 0.6 to 0.79, strong agreement by values from 0.8 to 0.9, and values above 0.9 suggested almost perfect agreement, with a value of 1 indicating perfect agreement among experts.
In this study, a κ value >0.8 was expected in order to consider agreement among experts strong regarding the relevance of the various questions/histological features to be looked for in an endoscopic colonic biopsy.
The probability of chance agreement is expected to be low, to avoid inflation of the I-CVI by random consensus. In our study, the Pc values were below 0.006, indicating a negligible likelihood of experts agreeing purely by chance. This very low Pc contributed to modified kappa values (κ) greater than 0.80, reflecting excellent agreement beyond chance and confirming the robustness of the content validity.
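As a check on the formulae, the sketch below (helper names are our own) computes Pc and the modified κ for the three agreement levels observed in this study (12/14, 13/14, and 14/14 experts). The κ values in the Table appear to have been computed from I-CVI and Pc rounded to a few decimal places, so exact fractions give marginally different values in the third decimal:

```python
from math import comb

def chance_agreement(n_experts: int, n_agree: int) -> float:
    """Pc = [N! / (A!(N-A)!)] * 0.5**N: the binomial probability that
    exactly A of N experts would rate an item relevant purely by chance."""
    return comb(n_experts, n_agree) * 0.5 ** n_experts

def modified_kappa(i_cvi: float, pc: float) -> float:
    """Modified kappa: item-level agreement corrected for chance agreement."""
    return (i_cvi - pc) / (1 - pc)

n = 14  # experts who responded in this study
for a in (12, 13, 14):
    pc = chance_agreement(n, a)
    k = modified_kappa(a / n, pc)
    print(f"{a}/{n} agree: Pc = {pc:.6f}, kappa = {k:.4f}")
```

For N = 14 and A = 13, Pc = 14/2^14 ≈ 0.00085, and for A = 12, Pc = 91/2^14 ≈ 0.0056, matching the values in the Table; when all 14 experts agree, κ is exactly 1.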
Results
The responding experts were from both government and private teaching medical institutes, as well as consultants in private hospitals and institutes of national importance such as the All India Institute of Medical Sciences and the Rajiv Gandhi Cancer Institute. All respondents had wide experience, ranging from 10 to 50 yr, with at least 10 yr of experience in reporting endoscopic gastrointestinal (GI) biopsies. The calculated statistics for every question are given in the Table.
| Question number | Experts in agreement | UA (universal agreement score) | I-CVI | Probability of chance agreement (Pc) | Modified kappa statistic | Inference |
|---|---|---|---|---|---|---|
| 1 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 2 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 3 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 4 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 5 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 6 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 7 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 8 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 9 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 10 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 11 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 12 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 13 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 14 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 15 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 16 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 17 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 18 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 19 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 20 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 21 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 22 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 23 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 24 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 25 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 26 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 27 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 28 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 29 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 30 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 31 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 32 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 33 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 34 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 35 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 36 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 37 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 38 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 39 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 40 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 41 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 42 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 43 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 44 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 45 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 46 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 47 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 48 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 49 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 50 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 51 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 52 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 53 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 54 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 55 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 56 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 57 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 58 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 59 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 60 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 61 | 13/14 | 0 | 0.93 | 0.00085 | 0.92994 | Almost perfect |
| 62 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 63 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 64 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 65 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 66 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 67 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 68 | 12/14 | 0 | 0.86 | 0.0056 | 0.859212 | Strong agreement |
| 69 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 70 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
| 71 | 14/14 | 1 | 1.00 | 0.000061 | 1 | Perfect agreement |
The S-CVI/Ave derived from the I-CVIs, obtained by averaging the I-CVI scores for all items, was 0.933, above the cut-off of 0.78, indicating strong agreement. The S-CVI/Ave calculated from proportion data, representing the average of relevance proportions across all experts, was 0.94, showing strong overall agreement among experts.
The S-CVI/UA, the overall average of the universal agreement scores across the entire set of items, was 31/71 = 0.4366.
The mean I-CVI (0.928) was well above 0.78, further indicating strong agreement among experts on the various histological features to be seen in an endoscopic colonic biopsy.
Discussion
This questionnaire was developed in-house, and to the best of our knowledge no such comprehensive format for reporting endoscopic colonic biopsies is so far available in the literature. We hold the copyright for this format. For certain histologic features (questions), there was no universal agreement among all experts. This is explained by the fact that these features are non-specific and do not indicate a particular diagnosis; the authors nevertheless believe that reporting such features at least points to pathogenic mechanisms that may explain the patient's symptoms.

The S-CVI/UA, which reflects universal agreement (UA) scores across all items, was 0.4366, lower than expected. This can be explained by the way it is calculated (sum of UA scores/number of items): our format contained 71 items, and the large denominator probably resulted in a low value. The use of the S-CVI/UA is often criticized as overly conservative; even a single expert's disagreement can result in a low index, especially when the number of experts is high, which can underestimate the actual content validity of the scale12. Agreement on some of the items is also subjective, which could have lowered the S-CVI/UA. For example, for an item such as the number of tissue sections examined, some experts may not prefer to mention it in the report; they did not give it a score of 1 or 2 (disagree or strongly disagree), but rather scored it 3. This item was accordingly scored 0 in the universal agreement score. However, this does not affect the basic message, as the item was still considered relevant by the majority, and the overall content validity indices (I-CVI, S-CVI/Ave, and modified kappa) remained above the acceptable threshold.

There is no established consensus regarding the ideal size of an expert panel. The Delphi literature indicates that panel sizes can vary widely, ranging from a small number of participants to several hundred.
However, in the context of a relatively homogeneous group, a sample size of 10 to 15 experts is generally considered adequate for generating reliable input6,7,10-13.
Overall, this is an extensive and comprehensive format for reporting endoscopic colonic biopsies, covering the histological features of all common as well as uncommon non-neoplastic, inflammatory disorders of the colon. The content of this format has been validated statistically. Hence, the authors believe that the histological features identified with the help of this format, interpreted in conjunction with clinical and endoscopic features, will help avoid missing any diagnosis and aid in reaching a conclusive one.
Declaration
This format received copyright in the name of the corresponding author (KK) from India (L-129706/2023) and a patent from the UK (No. 6397536-2024).
Financial support & sponsorship
None.
Conflicts of Interest
None.
Use of Artificial Intelligence (AI)-Assisted Technology for manuscript preparation
The authors confirm that there was no use of AI-assisted technology for assisting in the writing of the manuscript and no images were manipulated using AI.
References
- Endoscopy in inflammatory bowel disease: Indications, surveillance, and use in clinical practice. Clin Gastroenterol Hepatol. 2005;3:11-24.
- Endoscopy in inflammatory bowel disease when and why. World J Gastrointest Endosc. 2012;4:201-11.
- Gastrointestinal pathology: a continuing challenge. Arch Pathol Lab Med. 2010;134:812-4.
- Endoscopic activity in inflammatory bowel disease: clinical significance and application in practice. Clin Endosc. 2022;55:480-88.
- The content validity index: are you sure you know what’s being reported? Critique and recommendations. Res Nurs Health. 2006;29:489-97.
- Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health. 2007;30:459-67.
- Pathology of inflammatory bowel diseases (IBD): variability with time and treatment. Colorectal Dis. 2001;3:2-12.
- Instrument review: getting the most from a panel of experts. Applied Nursing Research. 1992;5:194-7.
- Content validity of self-report measurement instruments: an illustration from the development of the brain tumor module of the M.D. Anderson symptom inventory. Oncol Nurs Forum. 2005;32:669-76.
- Challenges in content validity of health-related questionnaires: a methodological review. J Nurs Meas. 2021;29:345-360.
- Hsu CC, Sandford BA. The Delphi technique: making sense of consensus. Practical Assessment, Research, and Evaluation 2007; 12 : 10. Available from: https://doi.org/10.7275/pdz9-th90, accessed on August 20, 2025.
