publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2026
- EACL. Persona Prompting as a Lens on LLM Social Reasoning. Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov, and 3 more authors. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Mar 2026
For socially sensitive tasks like hate speech detection, the quality of explanations from Large Language Models (LLMs) is crucial for factors like user trust and model alignment. While persona prompting (PP) is increasingly used as a way to steer models towards user-specific generation, its effect on model rationales remains underexplored. We investigate how LLM-generated rationales vary when conditioned on different simulated demographic personas. Using datasets annotated with word-level rationales, we measure agreement with human annotations from different demographic groups, and assess the impact of PP on model bias and human alignment. Our evaluation across three LLMs reveals three key findings: (1) PP improves classification on the most subjective task (hate speech) but degrades rationale quality. (2) Simulated personas fail to align with their real-world demographic counterparts, and high inter-persona agreement shows that models are resistant to significant steering. (3) Models exhibit consistent demographic biases and a strong tendency to over-flag content as harmful, regardless of PP. Our findings reveal a critical trade-off: while PP can improve classification in socially sensitive tasks, it often comes at the cost of rationale quality and fails to mitigate underlying biases, urging caution in its application.
@inproceedings{yang-etal-2026-persona,
  title = {Persona Prompting as a Lens on {LLM} Social Reasoning},
  author = {Yang, Jing and Hechtbauer, Moritz and Khalilov, Elisabeth and Brinkmann, Evelyn Luise and Schmitt, Vera and Feldhus, Nils},
  editor = {Demberg, Vera and Inui, Kentaro and Marquez, Llu{\'i}s},
  booktitle = {Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)},
  month = mar,
  year = {2026},
  address = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2026.eacl-long.52/},
  doi = {10.18653/v1/2026.eacl-long.52},
  pages = {1152--1170},
}
2025
- Self-Rationalization in the Wild: A Large-scale Out-of-Distribution Evaluation on NLI-related tasks. Jing Yang, Max Glockner, Anderson Rocha, and 1 more author. Transactions of the Association for Computational Linguistics, Mar 2025
@article{yang2025self,
  title = {Self-Rationalization in the Wild: A Large-scale Out-of-Distribution Evaluation on NLI-related tasks},
  author = {Yang, Jing and Glockner, Max and Rocha, Anderson and Gurevych, Iryna},
  journal = {Transactions of the Association for Computational Linguistics},
  volume = {13},
  pages = {314--342},
  year = {2025},
}
- FEVER. Exploring Semantic Filtering Heuristics For Efficient Claim Verification. Max Upravitelev, Premtim Sahitaj, Arthur Hilbert, and 6 more authors. In Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER), Jul 2025
Given the limited computational and financial resources of news agencies, real-life usage of fact-checking systems requires fast response times. For this reason, our submission to the FEVER-8 claim verification shared task focuses on optimizing the efficiency of such pipelines built around subtasks such as evidence retrieval and veracity prediction. We propose the Semantic Filtering for Efficient Fact Checking (SFEFC) strategy, which is inspired by the FEVER-8 baseline and designed with the goal of reducing the number of LLM calls and other computationally expensive subroutines. Furthermore, we explore the reuse of cosine similarities initially calculated within a dense retrieval step to retrieve the top 10 most relevant evidence sentence sets. We use these sets for semantic filtering methods based on similarity scores and create filters for the particularly hard classification labels “Not Enough Information” and “Conflicting Evidence/Cherrypicking” by identifying thresholds for potentially relevant information and the semantic variance within these sets. Compared to the parallelized FEVER-8 baseline, which takes 33.88 seconds on average to process a claim according to the FEVER-8 shared task leaderboard, our non-parallelized system remains competitive with regard to AVeriTeC retrieval scores while reducing the runtime to 7.01 seconds, achieving the fastest average runtime per claim.
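The filtering idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' SFEFC implementation: the function name `prefilter`, both threshold values, and the mapping of low maximum similarity to "Not Enough Information" and of high variance to "Conflicting Evidence/Cherrypicking" are assumptions made for illustration only. The sketch only shows how similarity scores already computed during retrieval might be reused to skip an LLM call.

```python
from statistics import pvariance

# Hypothetical thresholds; the abstract does not state the values used.
MIN_RELEVANCE = 0.35
MAX_VARIANCE = 0.02


def prefilter(similarities, min_relevance=MIN_RELEVANCE, max_variance=MAX_VARIANCE):
    """Return a label if the cached scores suffice, or None to defer to the LLM.

    `similarities` holds the cosine similarities between a claim and its
    top-k retrieved evidence sets, reused from the dense retrieval step.
    """
    # No evidence, or nothing similar enough: assume too little information.
    if not similarities or max(similarities) < min_relevance:
        return "Not Enough Information"
    # Widely spread scores: assume (for this sketch) that evidence conflicts.
    if pvariance(similarities) > max_variance:
        return "Conflicting Evidence/Cherrypicking"
    # Otherwise hand the claim to the more expensive veracity predictor.
    return None
```

Calling `prefilter` before the veracity predictor means only the claims that return `None` incur an LLM call, which is the kind of saving the abstract attributes to semantic filtering.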
@inproceedings{upravitelev-etal-2025-exploring,
  title = {Exploring Semantic Filtering Heuristics For Efficient Claim Verification},
  author = {Upravitelev, Max and Sahitaj, Premtim and Hilbert, Arthur and Solopova, Veronika and Yang, Jing and Feldhus, Nils and Anikina, Tatiana and Ostermann, Simon and Schmitt, Vera},
  editor = {Akhtar, Mubashara and Aly, Rami and Christodoulopoulos, Christos and Cocarascu, Oana and Guo, Zhijiang and Mittal, Arpit and Schlichtkrull, Michael and Thorne, James and Vlachos, Andreas},
  booktitle = {Proceedings of the Eighth Fact Extraction and {VER}ification Workshop ({FEVER})},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.fever-1.17/},
  doi = {10.18653/v1/2025.fever-1.17},
  pages = {229--237},
}
- SDP. Comparing LLMs and BERT-based Classifiers for Resource-Sensitive Claim Verification in Social Media. Max Upravitelev, Nicolau Duran-Silva, Christian Woerle, and 5 more authors. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), Jul 2025
The overwhelming volume of content being published at any given moment poses a significant challenge for the design of automated fact-checking (AFC) systems on social media and calls for particular attention to efficiency. As in other fields, systems built upon LLMs have achieved good results on different AFC benchmarks. However, the application of LLMs is accompanied by high resource requirements. The energy consumption of LLMs poses a significant challenge from an ecological perspective, while remaining a bottleneck in latency-sensitive scenarios like AFC within social media. Therefore, we propose a system built upon fine-tuned smaller BERT-based models. When evaluated on the ClimateCheck dataset against decoder-only LLMs, our best fine-tuned model outperforms Phi 4 and approaches Qwen3 14B in reasoning mode, while significantly reducing runtime per claim. Our findings demonstrate that small encoder-only models fine-tuned for specific tasks can still provide a substantive alternative to large decoder-only LLMs, especially in efficiency-sensitive settings.
@inproceedings{upravitelev-etal-2025-comparing,
  title = {Comparing {LLM}s and {BERT}-based Classifiers for Resource-Sensitive Claim Verification in Social Media},
  author = {Upravitelev, Max and Duran-Silva, Nicolau and Woerle, Christian and Guarino, Giuseppe and Mohtaj, Salar and Yang, Jing and Solopova, Veronika and Schmitt, Vera},
  editor = {Ghosal, Tirthankar and Mayr, Philipp and Singh, Amanpreet and Naik, Aakanksha and Rehm, Georg and Freitag, Dayne and Li, Dan and Schimmler, Sonja and {De Waard}, Anita},
  booktitle = {Proceedings of the Fifth Workshop on Scholarly Document Processing ({SDP} 2025)},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.sdp-1.26/},
  doi = {10.18653/v1/2025.sdp-1.26},
  pages = {281--287},
}
- CLEF. dfkinit2b at CheckThat! 2025: Leveraging LLMs and Ensemble of Methods for Multilingual Claim Normalization. Tatiana Anikina, Ivan Vykopal, Sebastian Kula, and 6 more authors. In Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), Sep 2025
The rapid spread of misinformation on social media across languages presents a major challenge for fact-checking efforts. Social media posts are often noisy, informal, and unstructured, with irrelevant content, making it difficult to extract concise, verifiable claims. To address this, the CLEF 2025 CheckThat! Shared Task on Multilingual Claim Extraction and Normalization focuses on transforming social media posts into normalized claims: short, clear, and check-worthy statements that capture the essence of potentially misleading content. In this paper, we investigate several approaches to this task, including parameter-efficient fine-tuning, prompting large language models (LLMs), and an ensemble of methods. We evaluate our approaches in two settings: the monolingual setting, where we are provided with training and validation data, and the zero-shot setting, where no training data is available for the target language. Our approaches achieved first place in 6 out of 13 languages in the monolingual setting and ranked second or third in the remaining languages. In the zero-shot setting, we achieved the highest performance across all seven languages, demonstrating strong generalization to unseen languages.
@inproceedings{anikina-etal-2025-dfkinit2b,
  title = {{dfkinit2b} at {CheckThat}! 2025: Leveraging {LLM}s and Ensemble of Methods for Multilingual Claim Normalization},
  author = {Anikina, Tatiana and Vykopal, Ivan and Kula, Sebastian and Chikkala, Ravi Kiran and Skachkova, Natalia and Yang, Jing and Solopova, Veronika and Schmitt, Vera and Ostermann, Simon},
  booktitle = {Working Notes of the Conference and Labs of the Evaluation Forum ({CLEF} 2025)},
  year = {2025},
  month = sep,
  address = {Madrid, Spain},
  publisher = {CEUR-WS.org},
  pages = {804--825},
  url = {https://ceur-ws.org/Vol-4038/paper_62.pdf},
}
2024
- WIFS. Take It Easy: Label-Adaptive Self-Rationalization for Fact Verification and Explanation Generation. Jing Yang and Anderson Rocha. In 2024 IEEE International Workshop on Information Forensics and Security (WIFS), Sep 2024
@inproceedings{yang2024take,
  title = {Take It Easy: Label-Adaptive Self-Rationalization for Fact Verification and Explanation Generation},
  author = {Yang, Jing and Rocha, Anderson},
  booktitle = {2024 IEEE International Workshop on Information Forensics and Security (WIFS)},
  pages = {1--6},
  year = {2024},
  organization = {IEEE},
}
2023
- The age of synthetic realities: Challenges and opportunities. João Phillipe Cardenuto, Jing Yang, Rafael Padilha, and 8 more authors. APSIPA Transactions on Signal and Information Processing, Sep 2023
@article{cardenuto2023age,
  title = {The age of synthetic realities: Challenges and opportunities},
  author = {Cardenuto, Jo{\~a}o Phillipe and Yang, Jing and Padilha, Rafael and Wan, Renjie and Moreira, Daniel and Li, Haoliang and Wang, Shiqi and Andal{\'o}, Fernanda and Marcel, S{\'e}bastien and Rocha, Anderson and others},
  journal = {APSIPA Transactions on Signal and Information Processing},
  volume = {12},
  number = {1},
  year = {2023},
  publisher = {Now Publishers, Inc.}
}
2022
- ICASSP. Explainable Fact-checking through Question Answering. Jing Yang, Didier Vega-Oliveros, Taís Seibt, and 1 more author. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Sep 2022
Misleading or false information has been creating chaos in some places around the world. To mitigate this issue, many researchers have proposed automated fact-checking methods to fight the spread of fake news. However, most methods cannot explain the reasoning behind their decisions, failing to build trust between machines and the humans using such technology. Trust is essential for fact-checking to be applied in the real world. Here, we address fact-checking explainability through question answering. In particular, we propose generating questions and answers from claims and answering the same questions from evidence. We also propose an answer comparison model with an attention mechanism attached to each question. Leveraging question answering as a proxy, we break down automated fact-checking into several steps; this separation aids models’ explainability, as it allows for more detailed analysis of their decision-making processes. Experimental results show that the proposed model can achieve state-of-the-art performance while providing reasonable explainability.
@inproceedings{yang2021explainable,
  title = {Explainable Fact-checking through Question Answering},
  author = {Yang, Jing and Vega-Oliveros, Didier and Seibt, Ta{\'\i}s and Rocha, Anderson},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2022},
  organization = {IEEE},
}
- WIFS. Few-shot Learning for Multi-modal Social Media Event Filtering. José Nascimento, João Phillipe Cardenuto, Jing Yang, and 1 more author. In 2022 IEEE International Workshop on Information Forensics and Security (WIFS), Sep 2022
@inproceedings{nascimento2022few,
  title = {Few-shot Learning for Multi-modal Social Media Event Filtering},
  author = {Nascimento, Jos{\'e} and Cardenuto, Jo{\~a}o Phillipe and Yang, Jing and Rocha, Anderson},
  booktitle = {2022 IEEE International Workshop on Information Forensics and Security (WIFS)},
  pages = {1--6},
  year = {2022},
  organization = {IEEE}
}
2021
- WIFS. Scalable Fact-checking with Human-in-the-Loop. Jing Yang, Didier Vega-Oliveros, Tais Seibt, and 1 more author. In 2021 IEEE International Workshop on Information Forensics and Security (WIFS), Sep 2021
Researchers have been investigating automated solutions for fact-checking on various fronts. However, current approaches often overlook the fact that the volume of information released every day keeps growing, and that much of it overlaps. To accelerate fact-checking, we bridge this gap by proposing a new pipeline that groups similar messages and summarizes them into aggregated claims. Specifically, we first clean a set of social media posts (e.g., tweets) and build a graph of all posts based on their semantics; then, we apply two clustering methods to group the messages for further claim summarization. We evaluate the summaries both quantitatively with ROUGE scores and qualitatively with human evaluation. We also generate a graph of summaries to verify that there is no significant overlap among them. The pipeline reduced 28,818 original messages to 700 summary claims, showing the potential to speed up the fact-checking process by organizing and selecting representative claims from massive, disorganized, and redundant messages.
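The grouping step described in this abstract (build a similarity graph over posts, then cluster it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's pipeline: bag-of-words cosine similarity stands in for whatever semantic representation the authors used, connected components stand in for their two (unnamed) clustering methods, and the function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_posts(posts, threshold=0.5):
    """Group posts whose similarity graph connects them (union-find)."""
    # Assumption: word counts approximate the "semantics" of each post.
    vectors = [Counter(p.lower().split()) for p in posts]
    parent = list(range(len(posts)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (merge components) for every sufficiently similar pair.
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, post in enumerate(posts):
        groups.setdefault(find(i), []).append(post)
    return list(groups.values())
```

Each returned group would then be a candidate for summarization into a single aggregated claim; the summarization step itself is outside the scope of this sketch.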
@inproceedings{yang2021scalable,
  title = {Scalable Fact-checking with Human-in-the-Loop},
  author = {Yang, Jing and Vega-Oliveros, Didier and Seibt, Tais and Rocha, Anderson},
  booktitle = {2021 IEEE International Workshop on Information Forensics and Security (WIFS)},
  year = {2021},
  doi = {10.1109/WIFS53200.2021.9648388},
  organization = {IEEE},
}
- A Inteligência Artificial e os desafios da Ciência Forense Digital no século XXI. Rafael Padilha, Antônio Theóphilo, Fernanda A Andaló, and 6 more authors. Estudos Avançados, Sep 2021
@article{padilha2021inteligencia,
  title = {A Intelig{\^e}ncia Artificial e os desafios da Ci{\^e}ncia Forense Digital no s{\'e}culo XXI},
  author = {Padilha, Rafael and The{\'o}philo, Ant{\^o}nio and Andal{\'o}, Fernanda A and Vega-Oliveros, Didier A and Cardenuto, Jo{\~a}o P and Bertocco, Gabriel and Nascimento, Jos{\'e} and Yang, Jing and Rocha, Anderson},
  journal = {Estudos Avan{\c{c}}ados},
  volume = {35},
  pages = {113--138},
  year = {2021},
  publisher = {SciELO Brasil}
}
2019
- Source identification of 3D printed objects based on inherent equipment distortion. Fei Peng, Jing Yang, Zi-Xing Lin, and 1 more author. Computers & Security, Sep 2019
The widespread use of 3D printers introduces tremendous challenges for the regulation of illegal products. In the current situation, since it is impossible to completely prohibit users from using 3D printers to manufacture illegal products, source identification of 3D printed products is a possible alternative for regulators to trace the offenders. In this paper, a source identification scheme for 3D printed objects based on inherent equipment distortion is proposed. By investigating the 3D printing process, an equipment distortion model is constructed, and then the inherent equipment distortion is analyzed. Furthermore, in order to exhibit the inherent equipment distortion, a uniform mark is designed and the inherent equipment distortion is extracted. With the features of the inherent equipment distortion of the 3D printers, an SVM classifier is employed for the source identification of the 3D printed objects. Experimental results and analysis show that the scheme obtains an average identification accuracy of 91.1% with 3D printed objects from 9 printers, and the analysis also indicates that it achieves satisfactory robustness and reliability.
@article{peng2019source,
  title = {Source identification of 3D printed objects based on inherent equipment distortion},
  author = {Peng, Fei and Yang, Jing and Lin, Zi-Xing and Long, Min},
  journal = {Computers \& Security},
  doi = {10.1016/j.cose.2018.12.015},
  volume = {82},
  pages = {173--183},
  year = {2019},
  publisher = {Elsevier}
}
2018
- 3-D printed object authentication based on printing noise and digital signature. Fei Peng, Jing Yang, and Min Long. IEEE Transactions on Reliability, Sep 2018
With the development of 3-D printing and reverse engineering, the protection of intellectual property of 3-D printed objects is becoming a prominent problem. In order to authenticate 3-D printed objects, an authentication scheme based on printing noise and digital signature is proposed. First, the noises introduced in the 3-D printing and observation are investigated. Thereafter, a special authentication mark is designed for extracting the printing noise. Based on this, a 3-D printed object authentication framework is built, composed of two processes: registration and verification. In registration, the printing noise of the authentication mark is extracted and signed with a digital signature. In verification, the signature is verified and then the printing noise of the authentication mark is extracted. After that, the extracted printing noise is matched with the one acquired in registration. Experimental results and analysis show that the proposed scheme can reliably accomplish the authentication of the 3-D printed object with high precision and that it can achieve high security and good robustness.
@article{peng20183,
  title = {3-D printed object authentication based on printing noise and digital signature},
  author = {Peng, Fei and Yang, Jing and Long, Min},
  journal = {IEEE Transactions on Reliability},
  doi = {10.1109/TR.2018.2869303},
  volume = {68},
  number = {1},
  pages = {342--353},
  year = {2018},
  publisher = {IEEE}
}