Episodes
-
ArXiv NLP research for Thursday, June 13, 2024.
00:20: Chain-of-Thought (CoT) prompting strategies for medical error detection and correction
01:31: CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature
02:52: RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL
04:01: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
05:24: Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models
06:38: Investigating the translation capabilities of Large Language Models trained on parallel data only
07:56: LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks
09:09: DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation
11:20: Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
12:46: Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
13:53: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't
14:47: ReadCtrl: Personalizing text generation with readability-controlled instruction learning
16:32: Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models
17:49: Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
19:18: End-to-end Streaming model for Low-Latency Speech Anonymization
20:22: Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
22:25: On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
23:33: Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
24:35: Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
25:47: AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
27:15: Transformers meet Neural Algorithmic Reasoners
28:32: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
30:02: Learning from Natural Language Explanations for Generalizable Entity Matching
31:14: ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models
32:29: DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
33:43: Improving Autoregressive Training with Dynamic Oracles
-
ArXiv NLP research for Thursday, June 13, 2024.
00:20: Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
01:53: Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models
03:26: Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory
04:33: Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
06:05: DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
07:26: Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning
08:41: ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
10:07: An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants
11:42: Plan, Generate and Complicate: Improving Low-resource Dialogue State Tracking via Easy-to-Difficult Zero-shot Data Augmentation
12:42: No perspective, no perception!! Perspective-aware Healthcare Answer Summarization
14:28: Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
16:02: An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
17:21: Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors
18:48: Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
19:52: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation
21:12: Multi-Agent Software Development through Cross-Team Collaboration
22:55: LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models
24:14: Bayesian Statistical Modeling with Predictors from LLMs
25:39: ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
27:28: Language Models are Crossword Solvers
28:32: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
29:51: CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts
31:29: Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?
32:59: 3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
34:08: Modeling Comparative Logical Relation with Contrastive Learning for Text Generation
35:42: SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
-
ArXiv NLP research for Wednesday, June 12, 2024.
00:19: VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
02:05: BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain
03:15: Designing a Dashboard for Transparency and Control of Conversational AI
04:46: Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection
05:51: Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
06:53: Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
07:52: Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
08:55: DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning
10:20: Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA
11:35: Large Language Model Unlearning via Embedding-Corrupted Prompts
13:17: Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
14:46: Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
16:02: LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
17:18: Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation
18:37: It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
20:02: Adversarial Evasion Attack Efficiency against Large Language Models
21:06: Learning Job Title Representation from Job Description Aggregation Network
21:59: Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
23:35: AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection
24:38: Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation
25:56: Multimodal Table Understanding
27:20: CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems
28:51: Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
30:36: Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
31:57: Semi-Supervised Spoken Language Glossification
33:16: Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction
34:37: A Dialogue Game for Eliciting Balanced Collaboration
35:23: Transformer-based Model for ASR N-Best Rescoring and Rewriting
36:16: SumHiS: Extractive Summarization Exploiting Hidden Structure
36:53: Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling
38:08: Leveraging Large Language Models for Web Scraping
39:51: M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation
41:15: Is Programming by Example solved by LLMs?
42:29: Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
43:42: Towards Unsupervised Speech Recognition Without Pronunciation Models
44:50: cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers
45:57: Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
47:02: Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm
48:12: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL
49:56: TasTe: Teaching Large Language Models to Translate through Self-Reflection
51:28: OLMES: A Standard for Language Model Evaluations
52:47: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
-
ArXiv NLP research for Tuesday, June 11, 2024.
00:20: Scientific Computing with Large Language Models
01:08: Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication
02:19: Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning
03:51: Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models
05:26: Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
07:03: Joint Learning of Context and Feedback Embeddings in Spoken Dialogue
07:57: BertaQA: How Much Do Language Models Know About Local Culture?
09:17: MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
10:20: CTC-based Non-autoregressive Textless Speech-to-Speech Translation
11:21: Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities
13:27: GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
14:40: BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction
16:32: When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
18:01: Limited Out-of-Context Knowledge Reasoning in Large Language Models
19:36: MINERS: Multilingual Language Models as Semantic Retrievers
20:42: Learning Domain-Invariant Features for Out-of-Context News Detection
22:03: Textual Similarity as a Key Metric in Machine Translation Quality Estimation
23:02: On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations
24:31: Multimodal Belief Prediction
25:29: Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing
26:56: Paraphrasing in Affirmative Terms Improves Negation Understanding
27:37: CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization
29:38: TextGrad: Automatic "Differentiation" via Text
31:35: Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices
32:35: THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report
33:51: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
35:22: Simple and Effective Masked Diffusion Language Models
36:35: Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena
-
ArXiv NLP research for Tuesday, June 11, 2024.
00:20: A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
01:41: Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
02:32: A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
04:08: Evolving Subnetwork Training for Large Language Models
05:31: Missingness-resilient Video-enhanced Multimodal Disfluency Detection
06:37: Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models
08:14: Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference
09:33: Delving into ChatGPT usage in academic writing through excess vocabulary
10:53: Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model
12:12: CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
13:26: Effectively Compress KV Heads for LLM
15:00: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
16:54: Reading Miscue Detection in Primary School through Automatic Speech Recognition
18:09: HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
20:01: DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs
21:15: Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning
22:35: Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
24:42: Translating speech with just images
25:35: Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement
26:51: Teaching Language Models to Self-Improve by Learning from Language Feedback
28:25: Merging Improves Self-Critique Against Jailbreak Attacks
29:18: Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models
30:11: Improving Autoformalization using Type Checking
31:37: Improving Commonsense Bias Classification by Mitigating the Influence of Demographic Terms
33:19: Decipherment-Aware Multilingual Learning in Jointly Trained Language Models
34:20: DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms
35:20: On the Hallucination in Simultaneous Machine Translation
36:07: MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
37:42: Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway
-
ArXiv NLP research for Monday, June 10, 2024.
00:19: Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research
00:59: HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs
02:29: The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models
03:24: MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
04:51: A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications
05:49: Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
07:10: Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval
09:08: Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
10:35: Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation
11:26: Verifiable Generation with Subsentence-Level Fine-Grained Citations
12:36: Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems
13:55: Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German
15:28: Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
16:28: Language Models Resist Alignment
17:58: LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages
19:27: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
20:27: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection
21:37: MaskLID: Code-Switching Language Identification through Iterative Masking
22:49: Multi-Prompting Decoder Helps Better Language Understanding
24:22: Tx-LLM: A Large Language Model for Therapeutics
26:21: Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
27:43: A Parameter-efficient Language Extension Framework for Multilingual ASR
29:06: MedExQA: Medical Question Answering Benchmark with Multiple Explanations
30:36: Sustained Vowels for Pre- vs Post-Treatment COPD Classification
31:49: MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows
33:40: Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
35:00: Annotation alignment: Comparing LLM and human annotations of conversational safety
36:07: mHuBERT-147: A Compact Multilingual HuBERT Model
37:27: Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
39:00: INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
40:06: Meta Learning Text-to-Speech Synthesis in over 7000 Languages
40:59: Controlling Emotion in Text-to-Speech with Natural Language Prompts
41:55: Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain
43:29: Multimodal Contextualized Semantic Parsing from Speech
44:25: Interpretability of Language Models via Task Spaces
45:45: Evaluating the Retrieval Component in LLM-Based Question Answering Systems
46:52: Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
48:08: Can Language Models Serve as Text-Based World Simulators?
-
ArXiv NLP research for Sunday, June 09, 2024.
00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses
05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
11:20: QGEval: A Benchmark for Question Generation Evaluation
12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model
13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization
14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation
18:14: Hidden Holes: topological aspects of language models
19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models
22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering
23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain
26:27: Are Large Language Models Actually Good at Text Style Transfer?
27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator
28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction
30:12: Why Don't Prompt-Based Fairness Metrics Correlate?
31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
33:12: Semisupervised Neural Proto-Language Reconstruction
34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization
35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training
36:07: ThaiCoref: Thai Coreference Resolution Dataset
-
ArXiv NLP research for Saturday, June 08, 2024.
00:19: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
01:44: Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets
02:30: Flexible and Adaptable Summarization via Expertise Separation
04:18: Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization
06:07: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
07:23: Venn Diagram Prompting : Accelerating Comprehension with Scaffolding Effect
08:45: VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
10:19: Planning Like Human: A Dual-process Framework for Dialogue Planning
11:48: Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
12:57: Recent advancements in computational morphology : A comprehensive survey
14:01: MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature
15:41: Design of reliable technology valuation model with calibrated machine learning of patent indicators
17:08: Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition
18:59: Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
20:25: Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
21:47: ThatiAR: Subjectivity Detection in Arabic News Sentences
23:07: Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts
24:49: Creativity Has Left the Chat: The Price of Debiasing Language Models
25:57: CERET: Cost-Effective Extrinsic Refinement for Text Generation
27:05: GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?
28:07: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
29:03: ATLAS: Improving Lay Summarisation with Attribute-based Control
-
ArXiv NLP research for Friday, June 07, 2024.
00:19: Key-Element-Informed sLLM Tuning for Document Summarization
01:22: Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models
02:42: Large Language Model-guided Document Selection
04:13: More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play
05:24: DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization
06:43: MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources
08:01: Mixture-of-Agents Enhances Large Language Model Capabilities
09:09: AICoderEval: Improving AI Domain Code Generation of Large Language Models
11:00: CRAG -- Comprehensive RAG Benchmark
13:04: CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
14:52: Think out Loud: Emotion Deducing Explanation in Dialogues
16:43: WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
18:46: SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals
19:58: BERTs are Generative In-Context Learners
20:43: Annotating FrameNet via Structure-Conditioned Language Generation
21:49: Revisiting Catastrophic Forgetting in Large Language Model Tuning
22:43: FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
24:33: Do Language Models Exhibit Human-like Structural Priming Effects?
25:27: Uncertainty Aware Learning for Language Model Alignment
26:50: The Russian Legislative Corpus
27:24: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering
28:53: HateDebias: On the Diversity and Variability of Hate Speech Debiasing
30:29: A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
32:00: Sexism Detection on a Data Diet
33:18: XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
34:21: Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models
35:32: LLM-based speaker diarization correction: A generalizable approach
36:52: TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models
38:10: BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
39:10: Quantifying Geospatial in the Common Crawl Corpus
40:14: MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
41:47: Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences
43:19: Compositional Generalization with Grounded Language Models
44:26: Scenarios and Approaches for Situated Natural Language Explanations
46:04: Are Large Language Models More Empathetic than Humans?
47:38: SUMIE: A Synthetic Benchmark for Incremental Entity Summarization
48:52: Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
50:33: An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models
-
ArXiv NLP research for Thursday, June 06, 2024.
00:20: The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses
02:17: Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
03:39: Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
04:36: Intention and Face in Dialog
05:48: Uncovering Limitations of Large Language Models in Information Seeking from Tables
07:15: Are We Done with MMLU?
08:41: Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts
09:53: Do Language Models Understand Morality? Towards a Robust Detection of Moral Content
11:47: Every Answer Matters: Evaluating Commonsense with Probabilistic Measures
12:49: Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
14:26: Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness
15:35: Confabulation: The Surprising Value of Large Language Model Hallucinations
16:42: DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning
18:25: Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model
19:32: ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models
20:50: mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
22:21: What Do Language Models Learn in Context? The Structured Task Hypothesis
23:38: Rethinking LLM and Linguistic Steganalysis: An Efficient Detection of Strongly Concealed Stego
24:58: BEADs: Bias Evaluation Across Domains
26:41: FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages
28:03: Benchmark Data Contamination of Large Language Models: A Survey
29:02: Transformers need glasses! Information over-squashing in language tasks
30:26: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
31:58: Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People
33:44: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
35:19: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
36:41: PaCE: Parsimonious Concept Engineering for Large Language Models
-
ArXiv NLP research for Thursday, June 06, 2024.
00:20: Efficient Knowledge Infusion via KG-LLM Alignment
01:25: NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human
02:34: Character-Level Chinese Dependency Parsing via Modeling Latent Intra-Word Structure
03:30: XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags
04:59: End-to-End Trainable Soft Retriever for Low-resource Relation Extraction
06:07: Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
07:37: Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
08:52: ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
10:29: Chaos with Keywords: Exposing Large Language Models Sycophancy to Misleading Keywords and Evaluating Defense Strategies
11:39: Lean Workbook: A large-scale Lean problem set formalized from natural language math problems
12:56: Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
14:18: Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As
16:24: Recovering document annotations for sentence-level bitext
17:40: BLSP-Emo: Towards Empathetic Large Speech-Language Models
19:01: Decoder-only Streaming Transformer for Simultaneous Translation
20:28: Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation
21:53: Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
23:06: How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?
24:13: HeSum: a Novel Dataset for Abstractive Text Summarization in Hebrew
25:19: ArMeme: Propagandistic Content in Arabic Memes
26:26: Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
27:11: UltraMedical: Building Specialized Generalists in Biomedicine
28:43: Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech
30:02: A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential
31:29: On The Persona-based Summarization of Domain-Specific Documents
33:14: Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing
34:28: American Sign Language Handshapes Reflect Pressures for Communicative Efficiency
-
ArXiv NLP research for Wednesday, June 05, 2024.
00:19: Improving In-Context Learning with Prediction Feedback for Sentiment Analysis
01:24: MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge
03:01: Text Injection for Neural Contextual Biasing
04:16: 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders
06:03: Adversarial Moment-Matching Distillation of Large Language Models
07:05: Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models
08:48: Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
09:56: Evaluation of data inconsistency for multi-modal sentiment analysis
10:55: BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
12:11: Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
13:16: From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation
14:20: StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
15:42: RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization
17:00: Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework
18:14: Cryptocurrency Frauds for Dummies: How ChatGPT introduces us to fraud?
19:48: FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models
20:59: Space Decomposition for Sentence Embedding
22:00: Towards Real-world Scenario: Imbalanced New Intent Discovery
23:40: Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
25:20: CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs
27:03: StatBot.Swiss: Bilingual Open Data Exploration in Natural Language
28:10: Missci: Reconstructing Fallacies in Misrepresented Science
29:43: ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction
30:47: Linking Named Entities in Diderot's Encyclopédie to Wikidata
32:06: Error-preserving Automatic Speech Recognition of Young English Learners' Language
33:37: Document-level Claim Extraction and Decontextualisation for Fact-Checking
34:45: The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches
36:09: LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback
37:39: IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
39:46: Automating Turkish Educational Quiz Generation Using Large Language Models
41:34: Cycles of Thought: Measuring LLM Confidence through Stable Explanations
42:57: Are language models rational? The case of coherence norms and belief revision
43:58: What is the Best Way for ChatGPT to Translate Poetry?
45:20: Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types
46:14: MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization
47:09: BIPED: Pedagogically Informed Tutoring System for ESL Education
48:24: Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends
50:00: Wings: Learning Multimodal LLMs without Text-only Forgetting
-
ArXiv NLP research for Tuesday, June 04, 2024.
00:20: Description Boosting for Zero-Shot Entity and Relation Classification
01:44: Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning
03:09: Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor
04:30: Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation
05:41: mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models
06:53: Technical Language Processing for Telecommunications Specifications
08:09: On Affine Homotopy between Language Encoders
09:25: Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
10:32: Probing the Category of Verbal Aspect in Transformer Language Models
11:58: Linguistic Fingerprint in Transformer Models: How Language Variation Influences Parameter Selection in Irony Detection
13:03: LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
14:33: Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs
15:51: On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
17:30: Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data
19:08: The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
20:07: Representations as Language: An Information-Theoretic Framework for Interpretability
21:32: Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding
22:46: Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion
24:21: Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition
25:37: Deterministic Reversible Data Augmentation for Neural Machine Translation
26:39: CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks
28:14: Scalable MatMul-free Language Modeling
30:03: SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
31:37: Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
33:10: TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
-
ArXiv NLP research for Tuesday, June 04, 2024.
00:20: Conditional Language Learning with Context
01:13: Zyda: A 1.3T Dataset for Open Language Modeling
02:32: RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models
03:50: Personalized Topic Selection Model for Topic-Grounded Dialogue
05:20: Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue
06:58: Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
08:03: Why Would You Suggest That? Human Trust in Language Model Responses
09:10: Multimodal Reasoning with Multimodal Knowledge Graph
10:30: QROA: A Black-Box Query-Response Optimization Attack on LLMs
11:55: Analyzing Social Biases in Japanese Large Language Models
12:52: I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering
13:47: PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
15:16: Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks
16:38: LongSSM: On the Length Extension of State-space Models in Language Modelling
17:30: Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
18:40: MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
20:19: UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models
22:03: Diver: Large Language Model Decoding with Span-Level Mutual Information Verification
23:12: SimulTron: On-Device Simultaneous Speech to Speech Translation
24:28: The current status of large language models in summarizing radiology report impressions
26:10: Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models
27:17: Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models
28:46: A multilingual dataset for offensive language and hate speech detection for hausa, yoruba and igbo languages
29:40: FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models
31:17: Self-Modifying State Modeling for Simultaneous Machine Translation
-
ArXiv NLP research for Monday, June 03, 2024.
00:19: Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
01:38: Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
03:06: Selectively Answering Visual Questions
04:11: Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect
05:36: Predicting Drug-Gene Relations via Analogy Tasks with Word Embeddings
06:51: SemCoder: Training Code Language Models with Comprehensive Semantics
08:39: Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
10:26: Combining Qualitative and Computational Approaches for Literary Analysis of Finnish Novels
11:45: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
13:26: Decompose, Enrich, and Extract! Schema-aware Event Extraction using LLMs
14:34: MACT: Model-Agnostic Cross-Lingual Training for Discourse Representation Structure Parsing
15:48: Guiding ChatGPT to Generate Salient Domain Summaries
17:51: Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling
19:30: TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine
21:38: Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph
22:51: Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
24:08: Are AI-Generated Text Detectors Robust to Adversarial Perturbations?
25:42: Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression
26:35: Improving Pseudo Labels with Global-Local Denoising Framework for Cross-lingual Named Entity Recognition
28:01: Demonstration Augmentation for Zero-shot In-context Learning
29:31: EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs
31:05: Towards Scalable Automated Alignment of LLMs: A Survey
32:19: EduNLP: Towards a Unified and Modularized Library for Educational Resources
33:44: Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
35:07: Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
36:36: When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
37:58: CodeR: Issue Resolving with Multi-Agent and Task Graphs
38:54: Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding
40:10: FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs
41:27: Probing Language Models for Pre-training Data Detection
42:45: R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
44:32: Privacy in LLM-based Recommendation: Recent Advances and Future Directions
45:23: Linguistic Analysis, Description, and Typological Exploration with Categorial Grammar (TheBench Guide)
46:52: D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
48:52: Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
50:07: Sparsity-Accelerated Training for Large Language Models
51:36: Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study
53:34: Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models
54:42: LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation
55:55: Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach
57:10: Understanding Token Probability Encoding in Output Embeddings
-
ArXiv NLP research for Sunday, June 02, 2024.
00:19: Prompt Framework for Role-playing: Generation and Evaluation
01:05: Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models
02:18: Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
03:54: Presence or Absence: Are Unknown Word Usages in Dictionaries?
05:09: Topic Modeling for Short Texts with Large Language Models
06:09: How well do distributed representations convey contextual lexical semantics: a Thesis Proposal
07:05: Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
08:27: Automatic Instruction Evolving for Large Language Models
09:25: Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
10:26: Developing an efficient corpus using Ensemble Data cleaning approach
11:51: BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
13:15: FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models
14:51: The Power of Summary-Source Alignments
16:11: Formality Style Transfer in Persian
17:39: Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
19:08: YODAS: Youtube-Oriented Dataset for Audio and Speech
20:13: MEDIQ: Question-Asking LLMs for Adaptive and Reliable Medical Reasoning
22:15: A Survey of Useful LLM Evaluation
23:31: Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution
25:07: Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification
27:18: Using RL to Identify Divisive Perspectives Improves LLMs Abilities to Identify Communities on Social Media
-
ArXiv NLP research for Saturday, June 01, 2024.
00:19: Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning
01:41: CASE: Curricular Data Pre-training for Building Generative and Discriminative Assistive Psychology Expert Models
03:25: Beyond Metrics: Evaluating LLMs' Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios
05:03: RoBERTa-BiLSTM: A Context-Aware Hybrid Model for Sentiment Analysis
07:09: The Best of Both Worlds: Toward an Honest and Helpful Large Language Model
09:02: Gender Bias Detection in Court Decisions: A Brazilian Case Study
10:41: Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization
11:54: A Survey on Large Language Models for Code Generation
13:43: Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming
14:46: SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing
15:43: LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models
17:24: LLMs Could Autonomously Learn Without External Supervision
-
ArXiv NLP research summaries for May 31, 2024.
00:20: FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores
01:37: Leveraging Large Language Models for Entity Matching
02:27: Reward-based Input Construction for Cross-document Relation Extraction
03:40: Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models
05:04: DORY: Deliberative Prompt Recovery for LLM
06:18: Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement
07:35: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance
08:59: FinGen: A Dataset for Argument Generation in Finance
09:42: Improving code-mixed hate detection by native sample mixing: A case study for Hindi-English code-mixed scenario
11:26: Multilingual Text Style Transfer: Datasets & Models for Indian Languages
13:01: An iterated learning model of language change that mixes supervised and unsupervised learning
14:01: Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
15:29: That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses
16:18: Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
17:20: Improving Reward Models with Synthetic Critiques
18:29: Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning
19:49: clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
21:05: A comparison of correspondence analysis with PMI-based word embedding methods
22:05: Large Language Models: A New Approach for Privacy Policy Analysis at Scale
23:36: Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
24:22: Learning to Estimate System Specifications in Linear Temporal Logic using Transformers and Mamba
25:48: OR-Bench: An Over-Refusal Benchmark for Large Language Models
27:20: Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames
28:41: SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales
30:33: Towards a Fluid computer
31:33: You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
33:01: LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
35:02: Direct Alignment of Language Models via Quality-Aware Self-Refinement
36:19: Code Pretraining Improves Entity Tracking Abilities of Language Models
-
ArXiv NLP research summaries for May 30, 2024.
-
ArXiv NLP research summaries for May 29, 2024.