Episodes

  • ArXiv NLP research for Thursday, June 13, 2024.

    00:20: Chain-of-Thought (CoT) prompting strategies for medical error detection and correction

    01:31: CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature

    02:52: RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL

    04:01: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

    05:24: Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models

    06:38: Investigating the translation capabilities of Large Language Models trained on parallel data only

    07:56: LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

    09:09: DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation

    11:20: Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

    12:46: Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations

    13:53: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't

    14:47: ReadCtrl: Personalizing text generation with readability-controlled instruction learning

    16:32: Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models

    17:49: Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs

    19:18: End-to-end Streaming model for Low-Latency Speech Anonymization

    20:22: Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

    22:25: On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

    23:33: Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models

    24:35: Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

    25:47: AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

    27:15: Transformers meet Neural Algorithmic Reasoners

    28:32: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space

    30:02: Learning from Natural Language Explanations for Generalizable Entity Matching

    31:14: ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models

    32:29: DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

    33:43: Improving Autoregressive Training with Dynamic Oracles

  • ArXiv NLP research for Thursday, June 13, 2024.

    00:20: Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning

    01:53: Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models

    03:26: Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory

    04:33: Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

    06:05: DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage

    07:26: Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

    08:41: ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions

    10:07: An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants

    11:42: Plan, Generate and Complicate: Improving Low-resource Dialogue State Tracking via Easy-to-Difficult Zero-shot Data Augmentation

    12:42: No perspective, no perception!! Perspective-aware Healthcare Answer Summarization

    14:28: Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

    16:02: An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    17:21: Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors

    18:48: Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

    19:52: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation

    21:12: Multi-Agent Software Development through Cross-Team Collaboration

    22:55: LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

    24:14: Bayesian Statistical Modeling with Predictors from LLMs

    25:39: ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models

    27:28: Language Models are Crossword Solvers

    28:32: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

    29:51: CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts

    31:29: Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?

    32:59: 3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection

    34:08: Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

    35:42: SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models


  • ArXiv NLP research for Wednesday, June 12, 2024.

    00:19: VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    02:05: BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain

    03:15: Designing a Dashboard for Transparency and Control of Conversational AI

    04:46: Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection

    05:51: Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

    06:53: Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations

    07:52: Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

    08:55: DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning

    10:20: Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA

    11:35: Large Language Model Unlearning via Embedding-Corrupted Prompts

    13:17: Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation

    14:46: Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

    16:02: LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

    17:18: Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation

    18:37: It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

    20:02: Adversarial Evasion Attack Efficiency against Large Language Models

    21:06: Learning Job Title Representation from Job Description Aggregation Network

    21:59: Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

    23:35: AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection

    24:38: Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

    25:56: Multimodal Table Understanding

    27:20: CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems

    28:51: Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling

    30:36: Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

    31:57: Semi-Supervised Spoken Language Glossification

    33:16: Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction

    34:37: A Dialogue Game for Eliciting Balanced Collaboration

    35:23: Transformer-based Model for ASR N-Best Rescoring and Rewriting

    36:16: SumHiS: Extractive Summarization Exploiting Hidden Structure

    36:53: Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling

    38:08: Leveraging Large Language Models for Web Scraping

    39:51: M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

    41:15: Is Programming by Example solved by LLMs?

    42:29: Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques

    43:42: Towards Unsupervised Speech Recognition Without Pronunciation Models

    44:50: cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers

    45:57: Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

    47:02: Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm

    48:12: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

    49:56: TasTe: Teaching Large Language Models to Translate through Self-Reflection

    51:28: OLMES: A Standard for Language Model Evaluations

    52:47: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

  • ArXiv NLP research for Tuesday, June 11, 2024.

    00:20: Scientific Computing with Large Language Models

    01:08: Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication

    02:19: Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning

    03:51: Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models

    05:26: Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

    07:03: Joint Learning of Context and Feedback Embeddings in Spoken Dialogue

    07:57: BertaQA: How Much Do Language Models Know About Local Culture?

    09:17: MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting

    10:20: CTC-based Non-autoregressive Textless Speech-to-Speech Translation

    11:21: Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities

    13:27: GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews

    14:40: BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction

    16:32: When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

    18:01: Limited Out-of-Context Knowledge Reasoning in Large Language Models

    19:36: MINERS: Multilingual Language Models as Semantic Retrievers

    20:42: Learning Domain-Invariant Features for Out-of-Context News Detection

    22:03: Textual Similarity as a Key Metric in Machine Translation Quality Estimation

    23:02: On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations

    24:31: Multimodal Belief Prediction

    25:29: Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing

    26:56: Paraphrasing in Affirmative Terms Improves Negation Understanding

    27:37: CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization

    29:38: TextGrad: Automatic "Differentiation" via Text

    31:35: Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices

    32:35: THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report

    33:51: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

    35:22: Simple and Effective Masked Diffusion Language Models

    36:35: Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

  • ArXiv NLP research for Tuesday, June 11, 2024.

    00:20: A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

    01:41: Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges

    02:32: A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation

    04:08: Evolving Subnetwork Training for Large Language Models

    05:31: Missingness-resilient Video-enhanced Multimodal Disfluency Detection

    06:37: Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models

    08:14: Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

    09:33: Delving into ChatGPT usage in academic writing through excess vocabulary

    10:53: Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model

    12:12: CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

    13:26: Effectively Compress KV Heads for LLM

    15:00: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

    16:54: Reading Miscue Detection in Primary School through Automatic Speech Recognition

    18:09: HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

    20:01: DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs

    21:15: Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

    22:35: Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

    24:42: Translating speech with just images

    25:35: Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement

    26:51: Teaching Language Models to Self-Improve by Learning from Language Feedback

    28:25: Merging Improves Self-Critique Against Jailbreak Attacks

    29:18: Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models

    30:11: Improving Autoformalization using Type Checking

    31:37: Improving Commonsense Bias Classification by Mitigating the Influence of Demographic Terms

    33:19: Decipherment-Aware Multilingual Learning in Jointly Trained Language Models

    34:20: DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

    35:20: On the Hallucination in Simultaneous Machine Translation

    36:07: MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs

    37:42: Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

  • ArXiv NLP research for Monday, June 10, 2024.

    00:19: Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research

    00:59: HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

    02:29: The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

    03:24: MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

    04:51: A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications

    05:49: Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text

    07:10: Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval

    09:08: Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

    10:35: Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation

    11:26: Verifiable Generation with Subsentence-Level Fine-Grained Citations

    12:36: Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems

    13:55: Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German

    15:28: Can I understand what I create? Self-Knowledge Evaluation of Large Language Models

    16:28: Language Models Resist Alignment

    17:58: LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

    19:27: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning

    20:27: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection

    21:37: MaskLID: Code-Switching Language Identification through Iterative Masking

    22:49: Multi-Prompting Decoder Helps Better Language Understanding

    24:22: Tx-LLM: A Large Language Model for Therapeutics

    26:21: Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

    27:43: A Parameter-efficient Language Extension Framework for Multilingual ASR

    29:06: MedExQA: Medical Question Answering Benchmark with Multiple Explanations

    30:36: Sustained Vowels for Pre- vs Post-Treatment COPD Classification

    31:49: MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows

    33:40: Symmetric Dot-Product Attention for Efficient Training of BERT Language Models

    35:00: Annotation alignment: Comparing LLM and human annotations of conversational safety

    36:07: mHuBERT-147: A Compact Multilingual HuBERT Model

    37:27: Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue

    39:00: INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition

    40:06: Meta Learning Text-to-Speech Synthesis in over 7000 Languages

    40:59: Controlling Emotion in Text-to-Speech with Natural Language Prompts

    41:55: Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

    43:29: Multimodal Contextualized Semantic Parsing from Speech

    44:25: Interpretability of Language Models via Task Spaces

    45:45: Evaluating the Retrieval Component in LLM-Based Question Answering Systems

    46:52: Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies

    48:08: Can Language Models Serve as Text-Based World Simulators?

  • ArXiv NLP research for Sunday, June 09, 2024.

    00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

    01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

    03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

    05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

    06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models

    08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

    09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

    11:20: QGEval: A Benchmark for Question Generation Evaluation

    12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

    13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

    14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

    18:14: Hidden Holes: topological aspects of language models

    19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

    20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models

    22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

    23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain

    26:27: Are Large Language Models Actually Good at Text Style Transfer?

    27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator

    28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction

    30:12: Why Don't Prompt-Based Fairness Metrics Correlate?

    31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

    33:12: Semisupervised Neural Proto-Language Reconstruction

    34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization

    35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training

    36:07: ThaiCoref: Thai Coreference Resolution Dataset

  • ArXiv NLP research for Saturday, June 08, 2024.

    00:19: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention

    01:44: Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

    02:30: Flexible and Adaptable Summarization via Expertise Separation

    04:18: Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization

    06:07: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

    07:23: Venn Diagram Prompting : Accelerating Comprehension with Scaffolding Effect

    08:45: VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

    10:19: Planning Like Human: A Dual-process Framework for Dialogue Planning

    11:48: Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

    12:57: Recent advancements in computational morphology : A comprehensive survey

    14:01: MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

    15:41: Design of reliable technology valuation model with calibrated machine learning of patent indicators

    17:08: Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition

    18:59: Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

    20:25: Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities

    21:47: ThatiAR: Subjectivity Detection in Arabic News Sentences

    23:07: Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts

    24:49: Creativity Has Left the Chat: The Price of Debiasing Language Models

    25:57: CERET: Cost-Effective Extrinsic Refinement for Text Generation

    27:05: GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?

    28:07: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

    29:03: ATLAS: Improving Lay Summarisation with Attribute-based Control

  • ArXiv NLP research for Friday, June 07, 2024.

    00:19: Key-Element-Informed sLLM Tuning for Document Summarization

    01:22: Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models

    02:42: Large Language Model-guided Document Selection

    04:13: More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play

    05:24: DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization

    06:43: MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources

    08:01: Mixture-of-Agents Enhances Large Language Model Capabilities

    09:09: AICoderEval: Improving AI Domain Code Generation of Large Language Models

    11:00: CRAG -- Comprehensive RAG Benchmark

    13:04: CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models

    14:52: Think out Loud: Emotion Deducing Explanation in Dialogues

    16:43: WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    18:46: SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

    19:58: BERTs are Generative In-Context Learners

    20:43: Annotating FrameNet via Structure-Conditioned Language Generation

    21:49: Revisiting Catastrophic Forgetting in Large Language Model Tuning

    22:43: FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

    24:33: Do Language Models Exhibit Human-like Structural Priming Effects?

    25:27: Uncertainty Aware Learning for Language Model Alignment

    26:50: The Russian Legislative Corpus

    27:24: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering

    28:53: HateDebias: On the Diversity and Variability of Hate Speech Debiasing

    30:29: A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

    32:00: Sexism Detection on a Data Diet

    33:18: XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

    34:21: Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models

    35:32: LLM-based speaker diarization correction: A generalizable approach

    36:52: TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

    38:10: BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

    39:10: Quantifying Geospatial in the Common Crawl Corpus

    40:14: MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter

    41:47: Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences

    43:19: Compositional Generalization with Grounded Language Models

    44:26: Scenarios and Approaches for Situated Natural Language Explanations

    46:04: Are Large Language Models More Empathetic than Humans?

    47:38: SUMIE: A Synthetic Benchmark for Incremental Entity Summarization

    48:52: Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

    50:33: An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

  • ArXiv NLP research for Thursday, June 06, 2024.

    00:20: The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

    02:17: Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models

    03:39: Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

    04:36: Intention and Face in Dialog

    05:48: Uncovering Limitations of Large Language Models in Information Seeking from Tables

    07:15: Are We Done with MMLU?

    08:41: Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts

    09:53: Do Language Models Understand Morality? Towards a Robust Detection of Moral Content

    11:47: Every Answer Matters: Evaluating Commonsense with Probabilistic Measures

    12:49: Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness

    14:26: Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness

    15:35: Confabulation: The Surprising Value of Large Language Model Hallucinations

    16:42: DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning

    18:25: Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model

    19:32: ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    20:50: mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans

    22:21: What Do Language Models Learn in Context? The Structured Task Hypothesis

    23:38: Rethinking LLM and Linguistic Steganalysis: An Efficient Detection of Strongly Concealed Stego

    24:58: BEADs: Bias Evaluation Across Domains

    26:41: FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages

    28:03: Benchmark Data Contamination of Large Language Models: A Survey

    29:02: Transformers need glasses! Information over-squashing in language tasks

    30:26: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

    31:58: Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People

    33:44: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

    35:19: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

    36:41: PaCE: Parsimonious Concept Engineering for Large Language Models

  • ArXiv NLP research for Thursday, June 06, 2024.

    00:20: Efficient Knowledge Infusion via KG-LLM Alignment

    01:25: NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

    02:34: Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure

    03:30: XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags

    04:59: End-to-End Trainable Soft Retriever for Low-resource Relation Extraction

    06:07: Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning

    07:37: Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

    08:52: ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

    10:29: Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies

    11:39: Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

    12:56: Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

    14:18: Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As

    16:24: Recovering document annotations for sentence-level bitext

    17:40: BLSP-Emo: Towards Empathetic Large Speech-Language Models

    19:01: Decoder-only Streaming Transformer for Simultaneous Translation

    20:28: Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

    21:53: Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

    23:06: How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

    24:13: HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew

    25:19: ArMeme: Propagandistic Content in Arabic Memes

    26:26: Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art

    27:11: UltraMedical: Building Specialized Generalists in Biomedicine

    28:43: Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

    30:02: A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential

    31:29: On The Persona-based Summarization of Domain-Specific Documents

    33:14: Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing

    34:28: American Sign Language Handshapes Reflect Pressures for Communicative Efficiency

  • ArXiv NLP research for Wednesday, June 05, 2024.

    00:19: Improving In-Context Learning with Prediction Feedback for Sentiment Analysis

    01:24: MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge

    03:01: Text Injection for Neural Contextual Biasing

    04:16: 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

    06:03: Adversarial Moment-Matching Distillation of Large Language Models

    07:05: Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

    08:48: Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese

    09:56: Evaluation of data inconsistency for multi-modal sentiment analysis

    10:55: BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

    12:11: Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models

    13:16: From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation

    14:20: StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

    15:42: RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization

    17:00: Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

    18:14: Cryptocurrency Frauds for Dummies: How ChatGPT introduces us to fraud?

    19:48: FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models

    20:59: Space Decomposition for Sentence Embedding

    22:00: Towards Real-world Scenario: Imbalanced New Intent Discovery

    23:40: Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

    25:20: CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs

    27:03: StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

    28:10: Missci: Reconstructing Fallacies in Misrepresented Science

    29:43: ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction

    30:47: Linking Named Entities in Diderot's Encyclopédie to Wikidata

    32:06: Error-preserving Automatic Speech Recognition of Young English Learners' Language

    33:37: Document-level Claim Extraction and Decontextualisation for Fact-Checking

    34:45: The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches

    36:09: LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback

    37:39: IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

    39:46: Automating Turkish Educational Quiz Generation Using Large Language Models

    41:34: Cycles of Thought: Measuring LLM Confidence through Stable Explanations

    42:57: Are language models rational? The case of coherence norms and belief revision

    43:58: What is the Best Way for ChatGPT to Translate Poetry?

    45:20: Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types

    46:14: MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization

    47:09: BIPED: Pedagogically Informed Tutoring System for ESL Education

    48:24: Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends

    50:00: Wings: Learning Multimodal LLMs without Text-only Forgetting

  • ArXiv NLP research for Tuesday, June 04, 2024.

    00:20: Description Boosting for Zero-Shot Entity and Relation Classification

    01:44: Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

    03:09: Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor

    04:30: Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation

    05:41: mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models

    06:53: Technical Language Processing for Telecommunications Specifications

    08:09: On Affine Homotopy between Language Encoders

    09:25: Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering

    10:32: Probing the Category of Verbal Aspect in Transformer Language Models

    11:58: Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection

    13:03: LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing

    14:33: Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

    15:51: On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

    17:30: Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data

    19:08: The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding

    20:07: Representations as Language: An Information-Theoretic Framework for Interpretability

    21:32: Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

    22:46: Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion

    24:21: Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition

    25:37: Deterministic Reversible Data Augmentation for Neural Machine Translation

    26:39: CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks

    28:14: Scalable MatMul-free Language Modeling

    30:03: SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

    31:37: Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

    33:10: TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

  • ArXiv NLP research for Tuesday, June 04, 2024.

    00:20: Conditional Language Learning with Context

    01:13: Zyda: A 1.3T Dataset for Open Language Modeling

    02:32: RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

    03:50: Personalized Topic Selection Model for Topic-Grounded Dialogue

    05:20: Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

    06:58: Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

    08:03: Why Would You Suggest That? Human Trust in Language Model Responses

    09:10: Multimodal Reasoning with Multimodal Knowledge Graph

    10:30: QROA: A Black-Box Query-Response Optimization Attack on LLMs

    11:55: Analyzing Social Biases in Japanese Large Language Models

    12:52: I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering

    13:47: PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

    15:16: Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks

    16:38: LongSSM: On the Length Extension of State-space Models in Language Modelling

    17:30: Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data

    18:40: MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset

    20:19: UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models

    22:03: Diver: Large Language Model Decoding with Span-Level Mutual Information Verification

    23:12: SimulTron: On-Device Simultaneous Speech to Speech Translation

    24:28: The current status of large language models in summarizing radiology report impressions

    26:10: Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models

    27:17: Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

    28:46: A multilingual dataset for offensive language and hate speech detection for hausa, yoruba and igbo languages

    29:40: FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

    31:17: Self-Modifying State Modeling for Simultaneous Machine Translation

  • ArXiv NLP research for Monday, June 03, 2024.

    00:19: Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost

    01:38: Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

    03:06: Selectively Answering Visual Questions

    04:11: Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

    05:36: Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings

    06:51: SemCoder: Training Code Language Models with Comprehensive Semantics

    08:39: Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

    10:26: Combining Qualitative and Computational Approaches for Literary Analysis of Finnish Novels

    11:45: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

    13:26: Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs

    14:34: MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing

    15:48: Guiding ChatGPT to Generate Salient Domain Summaries

    17:51: Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling

    19:30: TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine

    21:38: Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph

    22:51: Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization

    24:08: Are AI-Generated Text Detectors Robust to Adversarial Perturbations?

    25:42: Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression

    26:35: Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition

    28:01: Demonstration Augmentation for Zero-shot In-context Learning

    29:31: EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

    31:05: Towards Scalable Automated Alignment of LLMs: A Survey

    32:19: EduNLP: Towards a Unified and Modularized Library for Educational Resources

    33:44: Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

    35:07: Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

    36:36: When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

    37:58: CodeR: Issue Resolving with Multi-Agent and Task Graphs

    38:54: Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

    40:10: FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs

    41:27: Probing Language Models for Pre-training Data Detection

    42:45: R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

    44:32: Privacy in LLM-based Recommendation: Recent Advances and Future Directions

    45:23: Linguistic Analysis, Description, and Typological Exploration with Categorial Grammar (TheBench Guide)

    46:52: D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

    48:52: Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

    50:07: Sparsity-Accelerated Training for Large Language Models

    51:36: Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study

    53:34: Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

    54:42: LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation

    55:55: Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach

    57:10: Understanding Token Probability Encoding in Output Embeddings

  • ArXiv NLP research for Sunday, June 02, 2024.

    00:19: Prompt Framework for Role-playing: Generation and Evaluation

    01:05: Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models

    02:18: Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

    03:54: Presence or Absence: Are Unknown Word Usages in Dictionaries?

    05:09: Topic Modeling for Short Texts with Large Language Models

    06:09: How well do distributed representations convey contextual lexical semantics: a Thesis Proposal

    07:05: Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

    08:27: Automatic Instruction Evolving for Large Language Models

    09:25: Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

    10:26: Developing an efficient corpus using Ensemble Data cleaning approach

    11:51: BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

    13:15: FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models

    14:51: The Power of Summary-Source Alignments

    16:11: Formality Style Transfer in Persian

    17:39: Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

    19:08: YODAS: Youtube-Oriented Dataset for Audio and Speech

    20:13: MEDIQ: Question-Asking LLMs for Adaptive and Reliable Medical Reasoning

    22:15: A Survey of Useful LLM Evaluation

    23:31: Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

    25:07: Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification

    27:18: Using RL to Identify Divisive Perspectives Improves LLMs Abilities to Identify Communities on Social Media

  • ArXiv NLP research for Saturday, June 01, 2024.

    00:19: Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning

    01:41: CASE: Curricular Data Pre-training for Building Generative and Discriminative Assistive Psychology Expert Models

    03:25: Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios

    05:03: RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis

    07:09: The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

    09:02: Gender Bias Detection in Court Decisions: A Brazilian Case Study

    10:41: Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization

    11:54: A Survey on Large Language Models for Code Generation

    13:43: Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming

    14:46: SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing

    15:43: LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    17:24: LLMs Could Autonomously Learn Without External Supervision

  • ArXiv NLP research summaries for May 31, 2024.

    00:20: FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores

    01:37: Leveraging Large Language Models for Entity Matching

    02:27: Reward-based Input Construction for Cross-document Relation Extraction

    03:40: Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

    05:04: DORY: Deliberative Prompt Recovery for LLM

    06:18: Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

    07:35: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance

    08:59: FinGen: A Dataset for Argument Generation in Finance

    09:42: Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario

    11:26: Multilingual Text Style Transfer: Datasets & Models for Indian Languages

    13:01: An iterated learning model of language change that mixes supervised and unsupervised learning

    14:01: Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment

    15:29: That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses

    16:18: Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models

    17:20: Improving Reward Models with Synthetic Critiques

    18:29: Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

    19:49: clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

    21:05: A comparison of correspondence analysis with PMI-based word embedding methods

    22:05: Large Language Models: A New Approach for Privacy Policy Analysis at Scale

    23:36: Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

    24:22: Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba

    25:48: OR-Bench: An Over-Refusal Benchmark for Large Language Models

    27:20: Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

    28:41: SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

    30:33: Towards a Fluid computer

    31:33: You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

    33:01: LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

    35:02: Direct Alignment of Language Models via Quality-Aware Self-Refinement

    36:19: Code Pretraining Improves Entity Tracking Abilities of Language Models