AI Digest



🎁 Resources


  • A Survey of Deep Learning for Mathematical Reasoning, ACL 2023 [paper]

  • Reasoning with Language Model Prompting: A Survey, ACL 2023 [paper]

  • A Survey for In-context Learning, arXiv.2301.00234 [paper]

  • A Survey of Large Language Models, arXiv.2303.18223 [paper]

  • Natural Language Reasoning, A Survey, arXiv.2303.14725 [paper]


  • How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, Dec 2022, Yao Fu’s Notion [blog]

  • Towards Complex Reasoning: the Polaris of Large Language Models, May 2023, Yao Fu’s Notion [blog]

💯 Benchmarks

Mathematical Reasoning

  • Learning to Solve Arithmetic Word Problems with Verb Categorization, EMNLP 2014 [paper]

  • Parsing Algebraic Word Problems into Equations, TACL 2015 [paper]

  • Solving General Arithmetic Word Problems, EMNLP 2015 [paper]

  • MAWPS: A Math Word Problem Repository, NAACL 2016 [paper]

  • Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]

  • A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, ACL 2020 [paper]

  • Are NLP Models really able to Solve Simple Math Word Problems?, NAACL 2021 [paper]

  • Training Verifiers to Solve Math Word Problems, arxiv.2110.14168 [paper]

  • PAL: Program-aided Language Models, ICML 2023 [paper]

  • MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms, NAACL 2019 [paper]

  • DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, NAACL 2019 [paper]

  • TheoremQA: A Theorem-driven Question Answering dataset, arXiv.2305.12524 [paper]

  • TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, ACL 2021 [paper]

  • FinQA: A Dataset of Numerical Reasoning over Financial Data, EMNLP 2021 [paper]

  • ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering, EMNLP 2022 [paper]

  • Measuring Mathematical Problem Solving With the MATH Dataset, NeurIPS 2021 [paper]

  • NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks, ACL 2022 [paper]

  • LILA: A Unified Benchmark for Mathematical Reasoning, EMNLP 2022 [paper]

Commonsense Reasoning

  • Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge, arxiv.2102.03315 [paper]

  • Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, EMNLP 2018 [paper]

  • PIQA: Reasoning about Physical Commonsense in Natural Language, AAAI 2020 [paper]

  • CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, NAACL 2019 [paper]

  • CommonsenseQA 2.0: Exposing the Limits of AI through Gamification, NeurIPS 2021 [paper]

  • Event2Mind: Commonsense Inference on Events, Intents, and Reactions, ACL 2018 [paper]

  • “Going on a vacation” takes longer than “Going for a walk”: A Study of Temporal Commonsense Understanding, EMNLP 2019 [paper]

  • Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning, EMNLP 2019 [paper]

  • Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation, ACL 2019 [paper]

  • Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies, TACL 2021 [paper]

Symbolic Reasoning

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper]

  • Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, arXiv.2206.04615 [paper]

  • Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, ACL 2023 [paper]

Logical Reasoning

  • ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning, ICLR 2020 [paper]

  • LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning, IJCAI 2020 [paper]

  • ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language, ACL 2021 [paper]

  • FOLIO: Natural Language Reasoning with First-Order Logic, arxiv.2209.00840 [paper]

  • Language Models as Inductive Reasoners, arxiv.2212.10923 [paper]

  • Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought, ICLR 2023 [paper]

Multi-modal Reasoning

Visual-Language (Image)

  • From Recognition to Cognition: Visual Commonsense Reasoning, CVPR 2019 [paper]

  • VisualCOMET: Reasoning About the Dynamic Context of a Still Image, ECCV 2020 [paper]

  • Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues, ACL 2022 [paper]

  • Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS 2022 [paper]

Visual-Language (Video)


  • What is More Likely to Happen Next? Video-and-Language Future Event Prediction, EMNLP 2020 [paper]

  • CLEVRER: Collision Events for Video Representation and Reasoning, ICLR 2020 [paper]

  • NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions, CVPR 2021 [paper]

  • STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]

  • From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering, CVPR 2022 [paper]

  • NewsKVQA: Knowledge-Aware News Video Question Answering, PAKDD 2022 [paper]

🚀 Advances

XoT Construction

Manual Construction

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper]

  • PAL: Program-aided Language Models, ICML 2023 [paper]

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arxiv.2211.12588 [paper]

  • MathPrompter: Mathematical Reasoning using Large Language Models, ACL 2023 [paper]

  • Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]

Automatic Construction

  • Large Language Models are Zero-Shot Reasoners, NeurIPS 2022 [paper]

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arxiv.2211.12588 [paper]

  • Automatic Chain of Thought Prompting in Large Language Models, ICLR 2023 [paper]

  • Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling, arxiv.2305.09993 [paper]

  • Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023 [paper]

Semi-Automatic Construction

  • Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, ICLR 2023 [paper]

  • Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models, arxiv.2302.00618 [paper]

  • Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data, arxiv.2302.12822 [paper]

  • Explanation Selection Using Unlabeled Data for In-Context Learning, arxiv.2302.04813 [paper]

  • Boosted Prompt Ensembles for Large Language Models, arxiv.2304.05970 [paper]

XoT Structural Variants

Chain Structure

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arxiv.2211.12588 [paper]

  • PAL: Program-aided Language Models, ICML 2023 [paper]

  • Chain-of-Symbol Prompting Elicits Planning in Large Language Models, arxiv.2305.10276 [paper]

  • Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, arxiv.2308.10379 [paper]

Tree Structure

  • Large Language Model Guided Tree-of-Thought, arxiv.2305.08291 [paper]

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arxiv.2305.10601 [paper]

  • Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arxiv.2307.15337 [paper]

Graph Structure

  • Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arxiv.2308.09687 [paper]

  • Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arxiv.2308.08614 [paper]

XoT Enhancement Methods

Verify and Refine

  • Making Language Models Better Reasoners with Step-Aware Verifier, ACL 2023 [paper]

  • Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]

  • Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]

  • Large language models are reasoners with self-verification, arxiv.2212.09561 [paper]

  • Reflexion: Language Agents with Verbal Reinforcement Learning, arxiv.2303.11366 [paper]

  • Self-Refine: Iterative Refinement with Self-Feedback, arxiv.2303.17651 [paper]

  • REFINER: Reasoning Feedback on Intermediate Representations, arxiv.2304.01940 [paper]

  • RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought, arxiv.2305.11499 [paper]

  • Deductive Verification of Chain-of-Thought Reasoning, arxiv.2306.03872 [paper]

  • Forward-Backward Reasoning in Large Language Models for Verification, arxiv.2308.07758 [paper]

Question Decomposition

  • Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]

  • Iteratively Prompt Pre-trained Language Models for Chain of Thought, EMNLP 2022 [paper]

  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, ICLR 2023 [paper]

  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks, ICLR 2023 [paper]

  • Binding Language Models in Symbolic Languages, ICLR 2023 [paper]

  • Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning, SIGIR 2023 [paper]

External Knowledge

  • Chain-of-Dictionary Prompting Elicits Translation in Large Language Models, arxiv.2305.06575 [paper]

  • MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts, arxiv.2305.05181 [paper]

  • Chain of Knowledge: A Framework for Grounding Large Language Models with Structured Knowledge Bases, arxiv.2305.13269 [paper]

  • Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arxiv.2306.06427 [paper]

  • Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering, arxiv.2308.13259 [paper]

Vote and Rank

  • Training Verifiers to Solve Math Word Problems, arxiv.2110.14168 [paper]

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023 [paper]

  • Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]

  • Answering Questions by Meta-Reasoning over Multiple Chains of Thought, arxiv.2304.13007 [paper]

  • SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning, arxiv.2308.00436 [paper]


  • Active Prompting with Chain-of-Thought for Large Language Models, arxiv.2302.12246 [paper]

  • Let’s Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs, arxiv.2305.11860 [paper]

  • Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arxiv.2307.15337 [paper]

🛸 Frontier Application

Tool Using

  • MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning, arxiv.2205.00445 [paper]

  • TALM: Tool Augmented Language Models, arxiv.2205.12255 [paper]

  • ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023 [paper]

  • Toolformer: Language Models Can Teach Themselves to Use Tools, arxiv.2302.04761 [paper]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, arxiv.2303.17580 [paper]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, arxiv.2303.11381 [paper]

  • API-Bank: A Benchmark for Tool-Augmented LLMs, arxiv.2304.08244 [paper]


  • Reflexion: Language Agents with Verbal Reinforcement Learning, arxiv.2303.11366 [paper]

  • Self-Refine: Iterative Refinement with Self-Feedback, arxiv.2303.17651 [paper]

  • LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arxiv.2304.11477 [paper]

  • Large Language Model Guided Tree-of-Thought, arxiv.2305.08291 [paper]

  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arxiv.2305.10601 [paper]

  • Reasoning with Language Model is Planning with World Model, arxiv.2305.14992 [paper]

  • Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arxiv.2308.09687 [paper]

  • Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arxiv.2308.08614 [paper]

  • Dynamic Planning with a LLM, arxiv.2308.06391 [paper]


  • STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]

  • Large Language Models Can Self-Improve, arxiv.2210.11610 [paper]

  • Teaching Small Language Models to Reason, ACL 2023 [paper]

  • Large Language Models Are Reasoning Teachers, ACL 2023 [paper]

  • Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step, ACL 2023 [paper]

  • SCOTT: Self-Consistent Chain-of-Thought Distillation, ACL 2023 [paper]

  • Specializing Smaller Language Models towards Multi-Step Reasoning, ICML 2023 [paper]

  • Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, arxiv.2305.02301 [paper]

  • Contrastive Decoding: Open-ended Text Generation as Optimization, ACL 2023 [paper]

  • Contrastive Decoding Improves Reasoning in Large Language Models, arxiv.2309.09117 [paper]

🔭 Future Prospect

Multi-modal XoT

  • Multimodal Chain-of-Thought Reasoning in Language Models, arxiv.2302.00923 [paper]

  • Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models, arxiv.2305.16582 [paper]

  • T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering, arxiv.2305.03453 [paper]

  • Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals, arxiv.2308.06207 [paper]

  • Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning, arxiv.2308.0965 [paper]

Faithful XoT

  • Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]

  • Rethinking with Retrieval: Faithful Large Language Model Inference, arxiv.2301.00303 [paper]

  • Faithful Chain-of-Thought Reasoning, arxiv.2301.13379 [paper]

  • Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arxiv.2306.06427 [paper]

  • Question Decomposition Improves the Faithfulness of Model-Generated Reasoning, arxiv.2307.11768 [paper]

  • Measuring Faithfulness in Chain-of-Thought Reasoning, arxiv.2307.13702 [paper]

CoT Theory

  • Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango, arxiv.2209.07686 [paper]

  • Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters, ACL 2023 [paper]

  • Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners, arxiv.2305.14825 [paper]

  • Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs, arxiv.2305.18869 [paper]

  • Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective, arxiv.2305.15408 [paper]

  • Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions, arxiv.2307.13339 [paper]

🚢 Other works

  • The Unreliability of Explanations in Few-Shot In-Context Learning, arxiv.2205.03401 [paper]

  • A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams, arxiv.2206.05442 [paper]

  • Rationale-Augmented Ensembles in Language Models, arxiv.2207.00747 [paper]

  • Can language models learn from explanations in context?, EMNLP 2022 [paper]

  • Inferring Implicit Relations in Complex Questions with Language Models, EMNLP 2022 [paper]

  • Language Models of Code are Few-Shot Commonsense Learners, EMNLP 2022 [paper]

  • Solving Quantitative Reasoning Problems with Language Models, NeurIPS 2022 [paper]

  • JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding, SIGKDD 2022 [paper]

  • Large Language Models are few(1)-shot Table Reasoners, EACL 2023 [paper]

  • Reasoning Implicit Sentiment with Chain-of-Thought Prompting, ACL 2023 [paper]

  • Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method, ACL 2023 [paper]

  • Tab-CoT: Zero-shot Tabular Chain of Thought, ACL 2023 [paper]

  • Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models, ACL 2023 [paper]

  • Language models are multilingual chain-of-thought reasoners, ICLR 2023 [paper]

  • Ask Me Anything: A simple strategy for prompting language models, ICLR 2023 [paper]

  • Large Language Models Can Be Easily Distracted by Irrelevant Context, ICLR 2023 [paper]
