The interpretability of deep neural networks has become a subject of great interest within the medical and healthcare domain. This attention stems from concerns regarding transparency, legal and ethical considerations, and the medical significance of predictions generated by these deep neural networks in clinical decision support systems. To address this matter, our study examines the application of four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Using transfer learning with a multi-label-multi-class chest radiography dataset, we aim to interpret predictions pertaining to specific pathology classes. Our analysis encompasses both single-label and multi-label predictions, providing a comprehensive and unbiased assessment through quantitative and qualitative investigations, which are compared against human expert annotation. Notably, Grad-CAM demonstrates the most favorable performance in quantitative evaluation, while the LIME heatmap score segmentation visualization exhibits the highest level of medical significance. Our research underscores both the outcomes and the challenges faced in the holistic approach adopted for assessing these interpretability methods and suggests that a multimodal-based approach, incorporating diverse sources of information beyond chest radiography images, could offer additional insights for enhancing interpretability in the medical domain.