This repository is used to collect papers and code in the field of AI.
MIT License
The contents are organized into the following parts:
├─ NLP/
│  ├─ Word2Vec/
│  ├─ Seq2Seq/
│  └─ Pretraining/
│     ├─ Large Language Model/
│     ├─ LLM Application/
│     │  ├─ AI Agent/
│     │  ├─ Academic/
│     │  ├─ Code/
│     │  ├─ Financial Application/
│     │  ├─ Information Retrieval/
│     │  ├─ Math/
│     │  ├─ Medicine and Law/
│     │  ├─ Recommend System/
│     │  └─ Tool Learning/
│     ├─ LLM Technique/
│     │  ├─ Alignment/
│     │  ├─ Context Length/
│     │  ├─ Corpus/
│     │  ├─ Evaluation/
│     │  ├─ Hallucination/
│     │  ├─ Inference/
│     │  ├─ MoE/
│     │  ├─ PEFT/
│     │  ├─ Prompt Learning/
│     │  ├─ RAG/
│     │  └─ Reasoning and Planning/
│     ├─ LLM Theory/
│     └─ Chinese Model/
├─ CV/
│  ├─ CV Application/
│  ├─ Contrastive Learning/
│  ├─ Foundation Model/
│  ├─ Generative Model (GAN and VAE)/
│  ├─ Image Editing/
│  ├─ Object Detection/
│  ├─ Semantic Segmentation/
│  └─ Video/
├─ Multimodal/
│  ├─ Audio/
│  ├─ BLIP/
│  ├─ CLIP/
│  ├─ Diffusion Model/
│  ├─ Multimodal LLM/
│  ├─ Text2Image/
│  ├─ Text2Video/
│  └─ Survey/
├─ Reinforcement Learning/
├─ GNN/
└─ Transformer Architecture/
Attention Is All You Need, Vaswani et al., NIPS 2017. [paper][code]
GPT: Improving Language Understanding by Generative Pre-Training, Radford et al., preprint 2018. [paper][code]
GPT-2: Language Models are Unsupervised Multitask Learners, Radford et al., OpenAI blog 2019. [paper][code][llm.c]
GPT-3: Language Models are Few-Shot Learners, Brown et al., NeurIPS 2020. [paper][code][nanoGPT][build-nanogpt][gpt-fast][modded-nanogpt]
InstructGPT: Training language models to follow instructions with human feedback, Ouyang et al., NeurIPS 2022. [paper][MOSS-RLHF]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL 2019 Best Paper. [paper][code][BERT-pytorch][bert4torch][bert4keras]
RoBERTa: A Robustly Optimized BERT Pretraining Approach, Liu et al., arxiv 2019. [paper][code][Chinese-BERT-wwm]
What Does BERT Look At? An Analysis of BERT's Attention, Clark et al., arxiv 2019. [paper][code]
DeBERTa: Decoding-enhanced BERT with Disentangled Attention, He et al., ICLR 2021. [paper][code]
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, Sanh et al., arxiv 2019. [paper][code][albert_pytorch]
BERT Rediscovers the Classical NLP Pipeline, Tenney et al., arxiv 2019. [paper][code]
How to Fine-Tune BERT for Text Classification?, Sun et al., arxiv 2019. [paper][code]
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?, Eldan and Li, arxiv 2023. [paper][dataset][phi-3][SmolLM]
[LLM101n][EurekaLabsAI][llm-course][intro-llm][llm-cookbook][hugging-llm][generative-ai-for-beginners][awesome-generative-ai-guide][LLMs-from-scratch][llm-action][llms_idx][tiny-universe]
[cs230-code-examples][victoresque/pytorch-template][songquanpeng/pytorch-template][Academic-project-page-template][WritingAIPaper]
[tokenizer_summary][minbpe][tokenizers][tiktoken][SentencePiece]
A Watermark for Large Language Models, Kirchenbauer et al., arxiv 2023. [paper][code][MarkLLM][Awesome-LLM-Watermark]
SeqXGPT: Sentence-Level AI-Generated Text Detection, Wang et al., EMNLP 2023. [paper][code][llm-detect-ai][detect-gpt][fast-detect-gpt]
AlpaGasus: Training A Better Alpaca with Fewer Data, Chen et al., ICLR 2024. [paper][code]
AutoMix: Automatically Mixing Language Models, Madaan et al., arxiv 2023. [paper][code]
ChipNeMo: Domain-Adapted LLMs for Chip Design, Liu et al., arxiv 2023. [paper][semikong][circuit_training]
GAIA: A Benchmark for General AI Assistants, Mialon et al., ICLR 2024. [paper][code]
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al., NeurIPS 2023. [paper][code]
MemGPT: Towards LLMs as Operating Systems, Packer et al., arxiv 2023. [paper][code]
UFO: A UI-Focused Agent for Windows OS Interaction, Zhang et al., arxiv 2024. [paper][code]
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement, Wu et al., ICLR 2024. [paper][code][WindowsAgentArena]
AIOS: LLM Agent Operating System, Mei et al., arxiv 2024. [paper][code]
DB-GPT: Empowering Database Interactions with Private Large Language Models, Xue et al., arxiv 2023. [paper][code][DocsGPT][privateGPT][localGPT]
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data, Wang et al., ICLR 2024. [paper][code]
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement, Zheng et al., arxiv 2024. [paper][code][code-interpreter]
Orca: Progressive Learning from Complex Explanation Traces of GPT-4, Mukherjee et al., arxiv 2023. [paper]
PDFTriage: Question Answering over Long, Structured Documents, Saad-Falcon et al., arxiv 2023. [paper][code]
Prompt2Model: Generating Deployable Models from Natural Language Instructions, Viswanathan et al., arxiv 2023. [paper][code]
Shepherd: A Critic for Language Model Generation, Wang et al., arxiv 2023. [paper][code]
Alpaca: A Strong, Replicable Instruction-Following Model, Taori et al., Stanford Blog 2023. [paper][code]
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality, Chiang et al., 2023. [blog]
WizardLM: Empowering Large Language Models to Follow Complex Instructions, Xu et al., ICLR 2024. [paper][code]
WebCPM: Interactive Web Search for Chinese Long-form Question Answering, Qin et al., ACL 2023. [paper][code]
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences, Liu et al., KDD 2023. [paper][code][AutoWebGLM][AutoCrawler][gpt-crawler][webllama][gpt-researcher][skyvern][Scrapegraph-ai][crawl4ai][crawlee-python][Agent-E][CyberScraper-2077]
LLM4Decompile: Decompiling Binary Code with Large Language Models, Tan et al., arxiv 2024. [paper][code]
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, Liu et al., ICML 2024. [paper][code][Awesome-LLMs-on-device]
The Oscars of AI Theater: A Survey on Role-Playing with Language Models, Chen et al., arxiv 2024. [paper][code][RPBench-Auto][Hermes 3 Technical Report]
Apple Intelligence Foundation Language Models, Gunter et al., arxiv 2024. [blog][paper]
Controllable Text Generation for Large Language Models: A Survey, Liang et al., arxiv 2024. [paper][code][guidance][outlines]
[ray][dask][TaskingAI][gpt4all][ollama][llama.cpp][dify][mindsdb][bisheng][phidata][guidance][outlines][jsonformer][fabric][mem0][taipy]
[chatgpt-on-wechat][LLM-As-Chatbot][HuixiangDou][Streamer-Sales][Tianji][metahuman-stream][aiavatarkit][ai-getting-started]
LLM Powered Autonomous Agents, Lilian Weng, 2023. [blog][LLMAgentPapers][LLM-Agents-Papers][awesome-language-agents][Awesome-Papers-Autonomous-Agent]
A Survey on Large Language Model based Autonomous Agents, Wang et al., arxiv 2023. [paper][code][LLM-Agent-Paper-Digest]
The Rise and Potential of Large Language Model Based Agents: A Survey, Xi et al., arxiv 2023. [paper][code]
Agent AI: Surveying the Horizons of Multimodal Interaction, Durante et al., arxiv 2024. [paper]
Position Paper: Agent AI Towards a Holistic Intelligence, Huang et al., arxiv 2024. [paper]
AgentBench: Evaluating LLMs as Agents, Liu et al., ICLR 2024. [paper][code][VisualAgentBench][OSWorld][AgentGym]
Agents: An Open-source Framework for Autonomous Language Agents, Zhou et al., arxiv 2023. [paper][code]
AutoAgents: A Framework for Automatic Agent Generation, Chen et al., arxiv 2023. [paper][code]
AgentTuning: Enabling Generalized Agent Abilities for LLMs, Zeng et al., arxiv 2023. [paper][code]
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors, Chen et al., ICLR 2024. [paper][code]
AppAgent: Multimodal Agents as Smartphone Users, Zhang et al., arxiv 2023. [paper][code][digirl]
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception, Wang et al., arxiv 2024. [paper][code][Mobile-Agent-v2]
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security, Li et al., arxiv 2024. [paper][code]
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Wu et al., arxiv 2023. [paper][code]
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society, Li et al., NeurIPS 2023. [paper][code][crab]
ChatDev: Communicative Agents for Software Development, Qian et al., ACL 2024. [paper][code][gpt-pilot]
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework, Hong et al., ICLR 2024 Oral. [paper][code]
ProAgent: From Robotic Process Automation to Agentic Process Automation, Ye et al., arxiv 2023. [paper][code]
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation, Luo et al., arxiv 2024. [paper][code]
Generative Agents: Interactive Simulacra of Human Behavior, Park et al., arxiv 2023. [paper][code][GPTeam]
CogAgent: A Visual Language Model for GUI Agents, Hong et al., CVPR 2024. [paper][code]
OpenAgents: An Open Platform for Language Agents in the Wild, Xie et al., arxiv 2023. [paper][code]
TaskWeaver: A Code-First Agent Framework, Qiao et al., arxiv 2023. [paper][code]
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge, Fan et al., NeurIPS 2022 Outstanding Paper. [paper][code]
Voyager: An Open-Ended Embodied Agent with Large Language Models, Wang et al., arxiv 2023. [paper][code]
Eureka: Human-Level Reward Design via Coding Large Language Models, Ma et al., ICLR 2024. [paper][code][DrEureka]
LEGENT: Open Platform for Embodied Agents, Cheng et al., ACL 2024. [paper][code]
Mind2Web: Towards a Generalist Agent for the Web, Deng et al., NeurIPS 2023. [paper][code][AutoWebGLM]
WebArena: A Realistic Web Environment for Building Autonomous Agents, Zhou et al., ICLR 2024. [paper][code][visualwebarena][agent-workflow-memory][WindowsAgentArena]
SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded, Zheng et al., arxiv 2024. [paper][code]
Cradle: Empowering Foundation Agents Towards General Computer Control, Tan et al., arxiv 2024. [paper][code]
AgentScope: A Flexible yet Robust Multi-Agent Platform, Gao et al., arxiv 2024. [paper][code][modelscope-agent]
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments, Xi et al., arxiv 2024. [paper][code]
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence, Chen et al., arxiv 2024. [paper][code]
CLASI: Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent, ByteDance Research, 2024. [paper][translation-agent]
Automated Design of Agentic Systems, Hu et al., arxiv 2024. [paper][code][agent-zero][AgentK]
Foundation Models in Robotics: Applications, Challenges, and the Future, Firoozi et al., arxiv 2023. [paper][code]
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI, Liu et al., arxiv 2024. [paper][code]
RT-1: Robotics Transformer for Real-World Control at Scale, Brohan et al., arxiv 2022. [paper][code][IRASim]
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, Brohan et al., arxiv 2023. [paper][Unofficial Implementation][RT-H: Action Hierarchies Using Language]
Open X-Embodiment: Robotic Learning Datasets and RT-X Models, Open X-Embodiment Collaboration, arxiv 2023. [paper][code]
Shaping the future of advanced robotics, Google DeepMind 2024. [blog]
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, Wang et al., ICML 2024. [paper][code]
RL-GPT: Integrating Reinforcement Learning and Code-as-policy, Liu et al., arxiv 2024. [paper]
Genie: Generative Interactive Environments, Bruce et al., ICML 2024 Best Paper. [paper][GameNGen][GameGen-O]
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, Fu et al., arxiv 2024. [paper][code][Hardware Code][Learning Code][UMI][humanplus][TeleVision][Surgical Robot Transformer][lifelike-agility-and-play][ReKep]
Octo: An Open-Source Generalist Robot Policy, Ghosh et al., arxiv 2024. [paper][code][BodyTransformer][crossformer]
GRUtopia: Dream General Robots in a City at Scale, Wang et al., arxiv 2024. [paper][code]
HPT: Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers, Wang et al., NeurIPS 2024 Spotlight. [paper][code]
[LeRobot][DORA][awesome-ai-agents][IsaacLab][Awesome-Robotics-3D][AimRT]
XAgent: An Autonomous Agent for Complex Task Solving, [blog][code]
[crewAI][PraisonAI][llama_deploy][phidata][gpt-computer-assistant][agentic_patterns]
[translation-agent][agent-zero][AgentK][Twitter Personality][RD-Agent]
Galactica: A Large Language Model for Science, Taylor et al., arxiv 2022. [paper][code]
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization, Deng et al., arxiv 2023. [paper][code][pdf_parser]
GeoGalactica: A Scientific Large Language Model in Geoscience, Lin et al., arxiv 2024. [paper][code][sciparser]
Scientific Large Language Models: A Survey on Biological & Chemical Domains, Zhang et al., arxiv 2024. [paper][code][sciknoweval]
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning, Zhang et al., arxiv 2024. [paper][code]
ChemLLM: A Chemical Large Language Model, Zhang et al., arxiv 2024. [paper][model]
LangCell: Language-Cell Pre-training for Cell Identity Understanding, Zhao et al., ICML 2024. [paper][code][scFoundation]
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers, Pramanick et al., arxiv 2024. [paper][code]
STORM: Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models, Shao et al., NAACL 2024. [paper][code]
Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis, Yu et al., arxiv 2024. [paper][code]
OpenResearcher: Unleashing AI for Accelerated Scientific Research, Zheng et al., arxiv 2024. [paper][code][Paper Copilot][SciAgentsDiscovery][paper-qa][GraphReasoning]
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, Lu et al., arxiv 2024. [paper][code]
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers, Si et al., arxiv 2024. [paper][code]
[Awesome-Scientific-Language-Models][gpt_academic][ChatPaper][scispacy][awesome-ai4s][xVal]
Neural code generation, CMU 2024 Spring. [link]
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code, Zhang et al., arxiv 2023. [paper][Awesome-Code-LLM][MFTCoder][Awesome-Code-LLM]
Source Code Data Augmentation for Deep Learning: A Survey, Zhuo et al., arxiv 2023. [paper][code]
Codex: Evaluating Large Language Models Trained on Code, Chen et al., arxiv 2021. [paper][human-eval][CriticGPT][On scalable oversight with weak LLMs judging strong LLMs]
Code Llama: Open Foundation Models for Code, Rozière et al., arxiv 2023. [paper][code][model][llamacoder]
AlphaCode: Competition-Level Code Generation with AlphaCode, Li et al., arxiv 2022. [paper][dataset][AlphaCode2_Tech_Report]
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X, Zheng et al., KDD 2023. [paper][code][CodeGeeX2][CodeGeeX4]
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, Nijkamp et al., ICLR 2023. [paper][code]
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages, Nijkamp et al., ICLR 2023. [paper][code]
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules, Le et al., arxiv 2023. [paper][code]
StarCoder: may the source be with you, Li et al., arxiv 2023. [paper][code][bigcode-project][model]
StarCoder 2 and The Stack v2: The Next Generation, Lozhkov et al., arxiv 2024. [paper][code][starcoder.cpp]
WizardCoder: Empowering Code Large Language Models with Evol-Instruct, Luo et al., ICLR 2024. [paper][code]
Magicoder: Source Code Is All You Need, Wei et al., arxiv 2023. [paper][code]
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, Ridnik et al., arxiv 2024. [paper][code][pr-agent][cover-agent]
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence, Guo et al., arxiv 2024. [paper][code]
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence, Zhu et al., arxiv 2024. [paper][code][DeepSeek-V2.5]
Qwen2.5-Coder Technical Report, Hui et al., arxiv 2024. [paper][code]
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents, Yang et al., arxiv 2024. [paper]
Design2Code: How Far Are We From Automating Front-End Engineering?, Si et al., arxiv 2024. [paper][code]
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct, Lei et al., arxiv 2024. [paper][code]
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, Yang et al., arxiv 2024. [paper][code][swe-bench-technical-report][CodeR]
Agentless: Demystifying LLM-based Software Engineering Agents, Xia et al., arxiv 2024. [paper][code]
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions, Zhuo et al., arxiv 2024. [paper][code]
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents, Wang et al., arxiv 2024. [paper][code]
Planning In Natural Language Improves LLM Search For Code Generation, Wang et al., arxiv 2024. [paper]
Large Language Model-Based Agents for Software Engineering: A Survey, Liu et al., arxiv 2024. [paper][code]
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale, Phan et al., arxiv 2024. [paper][code]
[OpenDevin][devika][auto-code-rover][developer][aider][claude-engineer][SuperCoder]
DocLLM: A layout-aware generative language model for multimodal document understanding, Wang et al., arxiv 2024. [paper]
DocGraphLM: Documental Graph Language Model for Information Extraction, Wang et al., arxiv 2023. [paper]
FinBERT: A Pretrained Language Model for Financial Communications, Yang et al., arxiv 2020. [paper][Wiley paper][code][finBERT][valuesimplex/FinBERT]
FinGPT: Open-Source Financial Large Language Models, Yang et al., IJCAI 2023. [paper][code]
FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models, Yang et al., arxiv 2024. [paper][code]
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets, Wang et al., arxiv 2023. [paper][code]
Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models, Zhang et al., arxiv 2023. [paper][code]
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, Liu et al., arxiv 2020. [paper][code]
FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning, Liu et al., NeurIPS 2022. [paper][code]
DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning, Chen et al., arxiv 2023. [paper][code]
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist, Zhang et al., arxiv 2024. [paper]
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters, Zhang et al., arxiv 2023. [paper][code]
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications, Xie et al., arxiv 2024. [paper][code]
StructGPT: A General Framework for Large Language Model to Reason over Structured Data, Jiang et al., arxiv 2023. [paper][code]
Large Language Model for Table Processing: A Survey, Lu et al., arxiv 2024. [paper][llm-table-survey][table-transformer][Awesome-Tabular-LLMs][Awesome-LLM-Tabular][Table-LLaVA]
rLLM: Relational Table Learning with LLMs, Li et al., arxiv 2024. [paper][code]
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow, Zhang et al., arxiv 2023. [paper][code]
Data Interpreter: An LLM Agent For Data Science, Hong et al., arxiv 2024. [paper][code]
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework, Li et al., COLING 2024. [paper][code]
LLMFactor: Extracting Profitable Factors through Prompts for Explainable Stock Movement Prediction, Wang et al., arxiv 2024. [paper][MIGA]
A Survey of Large Language Models in Finance (FinLLMs), Lee et al., arxiv 2024. [paper][code][Revolutionizing Finance with LLMs: An Overview of Applications and Insights]
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges, Nie et al., arxiv 2024. [paper]
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods, Wang et al., arxiv 2024. [paper][code][Stockagent]
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset, Zhu et al., ACL 2024. [paper][code]
[gpt-investor][FinGLM][agentUniverse][gs-quant][stockbot-on-groq][Real-Time-Stock-Market-Prediction-using-Ensemble-DL-and-Rainbow-DQN]
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, Khattab et al., SIGIR 2020. [paper][simbert][roformer-sim]
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, Santhanam et al., NAACL 2022. [paper][code][RAGatouille][A Reproducibility Study of PLAID][Jina-ColBERT-v2]
ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval, Louis et al., arxiv 2024. [paper][code][model]
NCI: A Neural Corpus Indexer for Document Retrieval, Wang et al., NeurIPS 2022 Outstanding Paper. [paper][code][DSI-transformers][GDR EACL 2024 Oral]
HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels, Gao et al., ACL 2023. [paper][code]
Query2doc: Query Expansion with Large Language Models, Wang et al., EMNLP 2023. [paper][Query Expansion by Prompting Large Language Models]
RankGPT: Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents, Sun et al., EMNLP 2023 Outstanding Paper. [paper][code]
Large Language Models for Information Retrieval: A Survey, Zhu et al., arxiv 2023. [paper][code][YuLan-IR]
Large Language Models for Generative Information Extraction: A Survey, Xu et al., arxiv 2023. [paper][code][UIE][NERRE][uie_pytorch]
LLaRA: Making Large Language Models A Better Foundation For Dense Retrieval, Li et al., arxiv 2023. [paper][code]
UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models, Li et al., AAAI 2024. [paper]
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Zhu et al., ACL 2024. [paper][code][ChatRetriever]
GenIR: From Matching to Generation: A Survey on Generative Information Retrieval, Li et al., arxiv 2024. [paper][code]
D2LLM: Decomposed and Distilled Large Language Models for Semantic Search, Liao et al., ACL 2024. [paper][code]
BM25S: Orders of magnitude faster lexical search via eager sparse scoring, Xing Han Lù, arxiv 2024. [paper][code][rank_bm25][pyserini]
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, Chen et al., arxiv 2024. [paper][code]
Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express, Aroraa et al., arxiv 2024. [paper]
SIGIR-AP 2023 Tutorial: Recent Advances in Generative Information Retrieval. [link]
SIGIR 2024 Tutorial: Large Language Model Powered Agents for Information Retrieval. [link]
[search_with_lepton][LLocalSearch][FreeAskInternet][storm][searxng][Perplexica][rag-search][sensei]
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, Gou et al., ICLR 2024. [paper][code]
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models, Yu et al., ICLR 2024. [paper][code]
MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models, Lu et al., ICLR 2024 Oral. [paper][code][MathBench][OlympiadBench]
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning, Ying et al., arxiv 2024. [paper][code]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, Shao et al., arxiv 2024. [paper][code][DeepSeek-Prover-V1.5]
Common 7B Language Models Already Possess Strong Math Capabilities, Li et al., arxiv 2024. [paper][code]
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline, Xu et al., arxiv 2024. [paper][code]
AlphaMath Almost Zero: Process Supervision without Process, Chen et al., arxiv 2024. [paper][code]
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models, Zhou et al., NeurIPS 2024. [paper][code]
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B, Zhang et al., arxiv 2024. [paper][code][LLaMA-Berry]
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models, Shi et al., arxiv 2024. [paper][code]
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?, Qiao et al., arxiv 2024. [paper][code]
MAVIS: Mathematical Visual Instruction Tuning, Zhang et al., arxiv 2024. [paper][code]
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement, Yang et al., arxiv 2024. [paper][code][Qwen2.5-Math-Demo]
AI Mathematical Olympiad - Progress Prize 1, Kaggle Competition 2024. [Numina 1st Place Solution][project-numina/aimo-progress-prize][How NuminaMath Won the 1st AIMO Progress Prize][NuminaMath-7B-TIR][AI achieves silver-medal standard solving International Mathematical Olympiad problems]
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge, Zhou et al., arxiv 2023. [paper][code][LLM-for-Healthcare][GMAI-MMBench]
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law, Chen et al., arxiv 2024. [paper][code]
PMC-LLaMA: Towards Building Open-source Language Models for Medicine, Wu et al., arxiv 2024. [paper][code][MMedLM]
HuatuoGPT, towards Taming Language Model to Be a Doctor, Zhang et al., arxiv 2023. [paper][code][HuatuoGPT-II][Medical_NLP][Zhongjing][MedicalGPT][huatuogpt-vision][Chain-of-Diagnosis]
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model, Cui et al., arxiv 2023. [paper][code]
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services, Yue et al., arxiv 2023. [paper][code]
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation, Bao et al., arxiv 2023. [paper][code]
BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT, Chen et al., arxiv 2023. [paper][code][SoulChat2.0]
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning, Tang et al., arxiv 2023. [paper][code]
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models, Chen et al., arxiv 2023. [paper][meditron]
Med-PaLM: Large language models encode clinical knowledge, Singhal et al., Nature 2023. [paper][Unofficial Implementation]
Capabilities of Gemini Models in Medicine, Saab et al., arxiv 2024. [paper]
AMIE: Towards Conversational Diagnostic AI, Tu et al., arxiv 2024. [paper][AMIE-pytorch]
Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People, Wang et al., arxiv 2024. [paper][code][Medical_NLP]
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents, Li et al., arxiv 2024. [paper]
AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents, Chen et al., arxiv 2024. [paper][code]
[openfold][alphafold3-pytorch][AlphaFold3][Ligo-Biosciences/AlphaFold3][LucaOne][esm][AlphaPPImd][visual-med-alpaca][chai-lab]
DIN: Deep Interest Network for Click-Through Rate Prediction, Zhou et al., KDD 2018. [paper][code][DIEN][x-deeplearning]
MMoE: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, Ma et al., KDD 2018. [paper][DeepCTR-Torch][pytorch-mmoe]
Recommender Systems with Generative Retrieval, Rajput et al., NeurIPS 2023. [paper][Methodologies for Improving Modern Industrial Recommender Systems]
Unifying Large Language Models and Knowledge Graphs: A Roadmap, Pan et al., arxiv 2023. [paper]
YuLan-Rec: User Behavior Simulation with Large Language Model based Agents, Wang et al., arxiv 2023. [paper][code]
SSLRec: A Self-Supervised Learning Framework for Recommendation, Ren et al., WSDM 2024 Oral. [paper][code][Awesome-SSLRec-Papers]
RLMRec: Representation Learning with Large Language Models for Recommendation, Ren et al., WWW 2024. [paper][code]
LLMRec: Large Language Models with Graph Augmentation for Recommendation, Wei et al., WSDM 2024 Oral. [paper][code][EasyRec]
XRec: Large Language Models for Explainable Recommendation, Ma et al., arxiv 2024. [paper][code][SelfGNN]
Agent4Rec: On Generative Agents in Recommendation, Zhang et al., arxiv 2023. [paper][code]
LLM-KERec: Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph, Zhao et al., arxiv 2024. [paper]
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Zhai et al., ICML 2024. [paper][code][Transformers4Rec]
Wukong: Towards a Scaling Law for Large-Scale Recommendation, Zhang et al., ICML 2024. [paper][unofficial code]
RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems, Lian et al., arxiv 2024. [paper][code]
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application, Jia et al., arxiv 2024. [paper]
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling, Chen et al., arxiv 2024. [paper][code]
[recommenders][Source code for Twitter's Recommendation Algorithm][Awesome-RSPapers][RecBole][RecSysDatasets][LLM4Rec-Awesome-Papers][Awesome-LLM-for-RecSys][Awesome-LLM4RS-Papers][ReChorus]
[fun-rec][RecommenderSystem][AI-RecommenderSystem][RecSysPapers][Algorithm-Practice-in-Industry][AlgoNotes]
Tool Learning with Foundation Models, Qin et al., arxiv 2023. [paper][code]
Tool Learning with Large Language Models: A Survey, Qu et al., arxiv 2024. [paper][code]
Toolformer: Language Models Can Teach Themselves to Use Tools, Schick et al., arxiv 2023. [paper][toolformer-pytorch][conceptofmind/toolformer][xrsrke/toolformer][Graph_Toolformer]
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, Qin et al., ICLR 2024 Spotlight. [paper][code][StableToolBench]
Gorilla: Large Language Model Connected with Massive APIs, Patil et al., arxiv 2023. [paper][code]
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction, Yang et al., arxiv 2023. [paper][code]
RestGPT: Connecting Large Language Models with Real-World RESTful APIs, Song et al., arxiv 2023. [paper][code]
LLMCompiler: An LLM Compiler for Parallel Function Calling, Kim et al., arxiv 2023. [paper][code]
Large Language Models as Tool Makers, Cai et al., arxiv 2023. [paper][code]
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases, Tang et al., arxiv 2023. [paper][code][ToolQA][toolbench]
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, Zhuang et al., arxiv 2023. [paper][code]
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models, Lu et al., NeurIPS 2023. [paper][code]
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios, Ye et al., arxiv 2024. [paper][code]
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls, Du et al., arxiv 2024. [paper][code]
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error, Wang et al., arxiv 2024. [paper][code]
What Are Tools Anyway? A Survey from the Language Model Perspective, Wang et al., arxiv 2024. [paper]
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities, Lu et al., arxiv 2024. [paper][code]
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval, Chen et al., arxiv 2024. [paper]
ToolACE: Winning the Points of LLM Function Calling, Liu et al., arxiv 2024. [paper]
How to Train Really Large Models on Many GPUs, Lilian Weng, 2021. [blog]
Training great LLMs entirely from ground zero in the wilderness as a startup, Yi Tay, 2024. [blog][What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives][New LLM Pre-training and Post-training Paradigms]
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Shoeybi et al., arxiv 2019. [paper][code][GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism][Parameter Server OSDI 2014]
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Rajbhandari et al., arxiv 2019. [paper][DeepSpeed]
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training, Li et al., ICPP 2023. [paper][code]
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs, Jiang et al., NSDI 2024. [paper][veScale][blog][Parameter Server OSDI 2014][ps-lite][ByteCheckpoint]
A Theory on Adam Instability in Large-Scale Machine Learning, Molybog et al., arxiv 2023. [paper]
Loss Spike in Training Neural Networks, Zhang et al., arxiv 2023. [paper]
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling, Biderman et al., arxiv 2023. [paper][code]
Continual Pre-Training of Large Language Models: How to (re)warm your model, Gupta et al., arxiv 2023. [paper]
FLM-101B: An Open LLM and How to Train It with $100K Budget, Li et al., arxiv 2023. [paper][model][Tele-FLM]
Instruction Tuning with GPT-4, Peng et al., arxiv 2023. [paper][code]
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, Khattab et al., arxiv 2023. [paper][code][textgrad][appl][okhat/blog]
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training, Feng et al., ICML 2024. [paper][code]
OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning, Ye et al., arxiv 2024. [paper][code]
Arcee's MergeKit: A Toolkit for Merging Large Language Models, Goddard et al., arxiv 2024. [paper][code][DistillKit][A Survey on Collaborative Strategies in the Era of Large Language Models][FuseAI]
A Survey on Self-Evolution of Large Language Models, Tao et al., arxiv 2024. [paper][code]
Adam-mini: Use Fewer Learning Rates To Gain More, Zhang et al., arxiv 2024. [paper][code]
RouteLLM: Learning to Route LLMs with Preference Data, Ong et al., arxiv 2024. [paper][code]
Instruction Pre-Training: Language Models are Supervised Multitask Learners, Cheng et al., arxiv 2024. [paper][code]
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training, Jaghouar et al., arxiv 2024. [paper][code][DiLoCo][DisTrO]
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models, Jin et al., arxiv 2024. [paper][code][jailbreak_llms]
LLM Pruning and Distillation in Practice: The Minitron Approach, Sreenivas et al., arxiv 2024. [paper][code]
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning, An et al., arxiv 2024. [paper][code][Parameter Server OSDI 2014][ps-lite]
[wandb][aim][tensorboardX][nvitop]
AI Alignment: A Comprehensive Survey, Ji et al., arxiv 2023. [paper][PKU-Alignment]
Large Language Model Alignment: A Survey, Shen et al., arxiv 2023. [paper]
Aligning Large Language Models with Human: A Survey, Wang et al., arxiv 2023. [paper][code]
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More, Wang et al., arxiv 2024. [paper]
Towards a Unified View of Preference Learning for Large Language Models: A Survey, Gao et al., arxiv 2024. [paper][code]
Self-Instruct: Aligning Language Models with Self-Generated Instructions, Wang et al., ACL 2023. [paper][code][open-instruct][Multi-modal-Self-instruct][evol-instruct][MMEvol][Automatic Instruction Evolving for Large Language Models]
Self-Alignment with Instruction Backtranslation, Li et al., ICLR 2024. [paper][unofficial implementation]
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, Liu et al., ICLR 2024. [paper][code][From Quantity to Quality NAACL'24][Reformatted Alignment][MAmmoTH2: Scaling Instructions from the Web]
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, Xu et al., arxiv 2024. [paper][code]
RLHF: [hf blog][OpenAI blog][alignment blog][awesome-RLHF]
Secrets of RLHF in Large Language Models [MOSS-RLHF][Part I][Part II]
Safe RLHF: Safe Reinforcement Learning from Human Feedback, Dai et al., ICLR 2024 Spotlight. [paper][code][align-anything]
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization, Huang et al., arxiv 2024. [paper][code][blog][trl][trlx]
RLHF Workflow: From Reward Modeling to Online RLHF, Dong et al., arxiv 2024. [paper][code]
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework, Hu et al., arxiv 2024. [paper][code]
LIMA: Less Is More for Alignment, Zhou et al., NeurIPS 2023. [paper]
DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafailov et al., NeurIPS 2023 Runner-up Award. [paper][Unofficial Implementation][trl][dpo_trainer]
BPO: Black-Box Prompt Optimization: Aligning Large Language Models without Model Training, Cheng et al., arxiv 2023. [paper][code]
KTO: Model Alignment as Prospect Theoretic Optimization, Ethayarajh et al., arxiv 2024. [paper][code]
ORPO: Monolithic Preference Optimization without Reference Model, Hong et al., arxiv 2024. [paper][code]
TDPO: Token-level Direct Preference Optimization, Zeng et al., arxiv 2024. [paper][code][Step-DPO][FineGrainedRLHF][MCTS-DPO]
SimPO: Simple Preference Optimization with a Reference-Free Reward, Meng et al., arxiv 2024. [paper][code]
Constitutional AI: Harmlessness from AI Feedback, Bai et al., arxiv 2022. [paper][code]
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Lee et al., arxiv 2023. [paper][[code]][awesome-RLAIF]
Direct Language Model Alignment from Online AI Feedback, Guo et al., arxiv 2024. [paper]
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models, Li et al., ICML 2024. [paper][code][policy_optimization]
Zephyr: Direct Distillation of LM Alignment, Tunstall et al., arxiv 2023. [paper][code]
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision, Burns et al., arxiv 2023. [paper][code][weak-to-strong-deception]
SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, Chen et al., arxiv 2024. [paper][code][unofficial implementation]
SPPO: Self-Play Preference Optimization for Language Model Alignment, Wu et al., arxiv 2024. [paper][code][A Survey on Self-play Methods in Reinforcement Learning]
CALM: LLM Augmented LLMs: Expanding Capabilities through Composition, Bansal et al., arxiv 2024. [paper][CALM-pytorch]
Self-Rewarding Language Models, Yuan et al., arxiv 2024. [paper][unofficial implementation][Meta-Rewarding Language Models][Self-Taught Evaluators]
Anthropic: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, Hubinger et al., arxiv 2024. [paper]
LongAlign: A Recipe for Long Context Alignment of Large Language Models, Bai et al., arxiv 2024. [paper][code]
Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction, Ji et al., arxiv 2024. [paper][code]
A Survey on Knowledge Distillation of Large Language Models, Xu et al., arxiv 2024. [paper][code]
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment, Shen et al., arxiv 2024. [paper][code][Nemotron-4 340B Technical Report][Mistral NeMo][MaskLLM][HelpSteer2]
Xwin-LM: Strong and Scalable Alignment Practice for LLMs, Ni et al., arxiv 2024. [paper][code]
Towards Scalable Automated Alignment of LLMs: A Survey, Cao et al., arxiv 2024. [paper][code]
Putting RL back in RLHF, Huang and Ahmadian, 2024. [blog]
Prover-Verifier Games improve legibility of language model outputs, Kirchner et al., 2024. [blog][paper]
Rule Based Rewards for Language Model Safety, Mu et al., OpenAI 2024. [blog][paper][code]
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning, Zhao et al., arxiv 2024. [paper][code][prompt2model]
Thinking about High-Quality Human Data, Lilian Weng, 2024. [blog]
C4: Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus, Dodge et al., arxiv 2021. [paper][dataset]
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset, Laurençon et al., NeurIPS 2023. [paper][code][dataset]
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only, Penedo et al., arxiv 2023. [paper][dataset]
Data-Juicer: A One-Stop Data Processing System for Large Language Models, Chen et al., arxiv 2023. [paper][code]
UltraChat: Enhancing Chat Language Models by Scaling High-quality Instructional Conversations, Ding et al., EMNLP 2023. [paper][code][ultrachat]
UltraFeedback: Boosting Language Models with High-quality Feedback, Cui et al., ICML 2024. [paper][code][UltraInteract_sft]
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, Liu et al., ICLR 2024. [paper][code]
WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset, Qiu et al., arxiv 2024. [paper][dataset][LabelLLM][labelU][MinerU][PDF-Extract-Kit]
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research, Soldaini et al., ACL 2024. [paper][code][OLMo]
Datasets for Large Language Models: A Comprehensive Survey, Liu et al., arxiv 2024. [paper][Awesome-LLMs-Datasets]
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows, Patel et al., arxiv 2024. [paper][code]
Large Language Models for Data Annotation: A Survey, Tan et al., arxiv 2024. [paper][code]
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance, Ye et al., arxiv 2024. [paper][code]
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning, Bai et al., arxiv 2024. [paper][dataset]
Best Practices and Lessons Learned on Synthetic Data for Language Models, Liu et al., arxiv 2024. [paper]
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, HuggingFace, 2024. [paper][blogpost][fineweb][fineweb-edu]
DataComp: In search of the next generation of multimodal datasets, Gadre et al., arxiv 2023. [paper][code]
DataComp-LM: In search of the next generation of training sets for language models, Li et al., arxiv 2024. [paper][code][apple/DCLM-7B-8k]
Scaling Synthetic Data Creation with 1,000,000,000 Personas, Chan et al., arxiv 2024. [paper][code]
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale, Zhou et al., arxiv 2024. [paper][code]
MinerU: An Open-Source Solution for Precise Document Content Extraction, Wang et al., arxiv 2024. [paper][code]
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models, Lai et al., arxiv 2024. [paper][BLIP]
[RedPajama-Data][xland-minigrid-datasets][OmniCorpus][dclm][Infinity-Instruct][MNBVC][LMSYS-Chat-1M]
[Awesome-LLM-Eval][LLM-eval-survey][llm_benchmarks][Awesome-LLMs-Evaluation-Papers]
MMLU: Measuring Massive Multitask Language Understanding, Hendrycks et al., ICLR 2021. [paper][code]
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks, Wang et al., EMNLP 2022. [paper][code]
HELM: Holistic Evaluation of Language Models, Liang et al., arxiv 2022. [paper][code]
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, Zheng et al., arxiv 2023. [paper][code]
SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark, Xu et al., arxiv 2023. [paper][code][SuperCLUE-RAG]
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models, Huang et al., NeurIPS 2023. [paper][code][chinese-llm-benchmark]
CMMLU: Measuring massive multitask language understanding in Chinese, Li et al., arxiv 2023. [paper][code]
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark, Zhang et al., arxiv 2024. [paper][code]
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference, Chiang et al., ICML 2024. [paper][demo]
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models, Kim et al., arxiv 2024. [paper][code]
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models, Zhang et al., arxiv 2024. [paper][code]
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark, Yue et al., arxiv 2024. [paper][code]
Law of the Weakest Link: Cross Capabilities of Large Language Models, Zhong et al., arxiv 2024. [paper][code]
How to make LLMs go fast, 2023. [blog]
A Visual Guide to Quantization, 2024. [blog]
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, Miao et al., arxiv 2023. [paper][Awesome-Quantization-Papers][awesome-model-quantization][qllm-eval]
Full Stack Optimization of Transformer Inference: a Survey, Kim et al., arxiv 2023. [paper]
A Survey on Efficient Inference for Large Language Models, Zhou et al., arxiv 2024. [paper]
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale, Dettmers et al., NeurIPS 2022. [paper][code]
LLM-FP4: 4-Bit Floating-Point Quantized Transformers, Liu et al., arxiv 2023. [paper][code]
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models, Shao et al., ICLR 2024 Spotlight. [paper][code][smoothquant][ABQ-LLM][VPTQ]
BitNet: Scaling 1-bit Transformers for Large Language Models, Wang et al., arxiv 2023. [paper][code][unofficial implementation][BitNet b1.58][T-MAC][BitBLAS][BiLLM]
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, Frantar et al., ICLR 2023. [paper][code][AutoGPTQ]
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models, Frantar et al., arxiv 2023. [paper][code]
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, Lin et al., arxiv 2023. [paper][code][AutoAWQ][qserve]
LLM in a flash: Efficient Large Language Model Inference with Limited Memory, Alizadeh et al., arxiv 2023. [paper][air_llm]
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models, Jiang et al., EMNLP 2023. [paper][code]
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU, Sheng et al., ICML 2023. [paper][code]
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU, Song et al., arxiv 2023. [paper][code][llama.cpp][airllm][PowerInfer-2]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, Dao et al., NeurIPS 2022. [paper][code]
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning, Tri Dao, ICLR 2024. [paper][code][xformers][SageAttention]
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision, Shah et al., arxiv 2024. [paper][code]
vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention, Kwon et al., arxiv 2023. [paper][code][FastChat][Nanoflow]
SGLang: Fast and Expressive LLM Inference with RadixAttention and SGLang, Zheng et al., Stanford blog 2024. [blog][paper][code]
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads, Cai et al., ICML 2024. [paper][code]
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, Li et al., ICML 2024. [paper][code][LLMSpeculativeSampling][Sequoia]
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding, Xia et al., arxiv 2024. [paper][code][Spec-Bench]
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding, Liu et al., arxiv 2024. [paper][[code]][Ouroboros]
CLLMs: Consistency Large Language Models, Kou et al., ICML 2024. [paper][code][LookaheadDecoding][Lookahead]
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, Jiang et al., arxiv 2024. [paper][code]
Sarathi-Serve: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve, Agrawal et al., OSDI 2024. [paper][code][ORCA OSDI 2022][continuous batching blog]
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving, Zhong et al., OSDI 2024. [paper][code]
Prompt Cache: Modular Attention Reuse for Low-Latency Inference, Gim et al., ICML 2024. [paper][code]
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention, Brandon et al., arxiv 2024. [paper][YOCO]
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving, Qin et al., arxiv 2024. [paper][code][ktransformers]
[TensorRT-LLM][FasterTransformer][TritonServer][GenerativeAIExamples][TensorRT-Model-Optimizer][TensorRT][OpenVINO]
[text-generation-inference][quantization][optimum-quanto][huggingface-inference-toolkit][torchao]
[OpenLLM][mlc-llm][ollama][open-webui][torchchat]
[ggml][exllamav2][llama.cpp][gpt-fast][lightllm][fastllm][CTranslate2][ipex-llm][rtp-llm][KsanaLLM]
Mixture of Experts Explained, Sanseviero et al., Hugging Face Blog 2023. [blog]
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Shazeer et al., arxiv 2017. [paper][Re-Implementation]
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Lepikhin et al., arxiv 2020. [paper][mixture-of-experts]
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts, Gale et al., arxiv 2022. [paper][code]
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models, Shen et al., arxiv 2023. [paper][[code]]
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, Fedus et al., arxiv 2021. [paper][code]
Fast Inference of Mixture-of-Experts Language Models with Offloading, Eliseev and Mazur, arxiv 2023. [paper][code]
Mixtral-8×7B: Mixtral of Experts, Jiang et al., arxiv 2023. [paper][code][megablocks-public][model][blog][Chinese-Mixtral-8x7B][Chinese-Mixtral]
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, Dai et al., ACL 2024. [paper][code]
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, DeepSeek-AI, arxiv 2024. [paper][code][DeepSeek-V2.5]
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models, Wang et al., ACL 2024. [paper][code][Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts]
Evolutionary Optimization of Model Merging Recipes, Akiba et al., arxiv 2024. [paper][code]
A Closer Look into Mixture-of-Experts in Large Language Models, Lo et al., arxiv 2024. [paper][code]
A Survey on Mixture of Experts, Cai et al., arxiv 2024. [paper][code]
HMoE: Heterogeneous Mixture of Experts for Language Modeling, Wang et al., arxiv 2024. [paper]
OLMoE: Open Mixture-of-Experts Language Models, Muennighoff et al., arxiv 2024. [paper][code]
[llama-moe][Aurora][OpenMoE][makeMoE][PEER-pytorch][GRIN-MoE]
[PEFT][trl][accelerate][LLaMA-Factory][LMFlow][unsloth][xtuner][MFTCoder][llm-foundry][swift][Liger-Kernel]
LoRA: Low-Rank Adaptation of Large Language Models, Hu et al., ICLR 2022. [paper][code][LoRA From Scratch][lora][dora][MoRA][ziplora-pytorch][alpaca-lora]
QLoRA: Efficient Finetuning of Quantized LLMs, Dettmers et al., NeurIPS 2023 Oral. [paper][code][bitsandbytes][unsloth][ir-qlora][fsdp_qlora]
S-LoRA: Serving Thousands of Concurrent LoRA Adapters, Sheng et al., arxiv 2023. [paper][code][AdaLoRA][LoRAMoE][lorahub][O-LoRA][qa-lora]
LoRA-GA: Low-Rank Adaptation with Gradient Approximation, Wang et al., arxiv 2024. [paper][code][LoRA-Pro blog][dora]
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, Zhao et al., arxiv 2024. [paper][code][Q-GaLore][WeLore]
Prefix-Tuning: Optimizing Continuous Prompts for Generation, Li et al., ACL 2021. [paper][code]
Adapter: Parameter-Efficient Transfer Learning for NLP, Houlsby et al., ICML 2019. [paper][code][unify-parameter-efficient-tuning]
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning, Poth et al., EMNLP 2023. [paper][code][A Survey on LoRA of Large Language Models]
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models, Hu et al., EMNLP 2023. [paper][code]
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, Zhang et al., ICLR 2024. [paper][code]
LLaMA Pro: Progressive LLaMA with Block Expansion, Wu et al., arxiv 2024. [paper][code]
P-Tuning: GPT Understands, Too, Liu et al., arxiv 2021. [paper][code]
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Liu et al., ACL 2022. [paper][code][pet][PrefixTuning]
Towards a Unified View of Parameter-Efficient Transfer Learning, He et al., ICLR 2022. [paper][code]
Parameter-efficient fine-tuning of large-scale pre-trained language models, Ding et al., Nature Machine Intelligence 2023. [paper][code]
Mixed Precision Training, Micikevicius et al., ICLR 2018. [paper]
8-bit Optimizers via Block-wise Quantization, Dettmers et al., ICLR 2022. [paper][code]
FP8-LM: Training FP8 Large Language Models, Peng et al., arxiv 2023. [paper][code]
NEFTune: Noisy Embeddings Improve Instruction Finetuning, Jain et al., ICLR 2024. [paper][code][NoisyTune][transformer_arithmetic]
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey, Han et al., arxiv 2024. [paper]
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models, Diao et al., NAACL 2024. [paper][code]
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, Zheng et al., ACL 2024. [paper][code]
ReFT: Representation Finetuning for Language Models, Wu et al., arxiv 2024. [paper][code]
OpenPrompt: An Open-source Framework for Prompt-learning, Ding et al., arxiv 2021. [paper][code]
Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning, Su et al., arxiv 2022. [paper]
Large Language Models Are Human-Level Prompt Engineers, Zhou et al., ICLR 2023. [paper][code]
Large Language Models as Optimizers, Yang et al., arxiv 2023. [paper][code]
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4, Bsharat et al., arxiv 2023. [paper][code]
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding, Suzgun and Kalai, arxiv 2024. [paper][code]
AutoPrompt: Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases, Levi et al., arxiv 2024. [paper][code][automatic_prompt_engineer][appl][sammo][prompt-poet][ell]
LangGPT: Rethinking Structured Reusable Prompt Design Framework for LLMs from the Programming Language, Wang et al., arxiv 2024. [paper][code]
The Prompt Report: A Systematic Survey of Prompting Techniques, Schulhoff et al., arxiv 2024. [paper][code][A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks][A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications]
[PromptPapers][OpenAI Docs][ChatGPT Prompt Engineering for Developers][Prompt Engineering Guide][k12promptguide][gpt-prompt-engineer][awesome-chatgpt-prompts][awesome-chatgpt-prompts-zh]
The Power of Scale for Parameter-Efficient Prompt Tuning, Lester et al., EMNLP 2021. [paper][code][soft-prompt-tuning][Prompt-Tuning]
A Survey on In-context Learning, Dong et al., arxiv 2023. [paper][code]
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work, Min et al., EMNLP 2022. [paper][code]
Larger language models do in-context learning differently, Wei et al., arxiv 2023. [paper]
PAL: Program-aided Language Models, Gao et al., ICML 2023. [paper][code]
A Comprehensive Survey on Instruction Following, Lou et al., arxiv 2023. [paper][code]
RLHF: Deep reinforcement learning from human preferences, Christiano et al., NIPS 2017. [paper]
RLHF: Fine-Tuning Language Models from Human Preferences, Ziegler et al., arxiv 2019. [paper][code]
RLHF: Learning to summarize from human feedback, Stiennon et al., NeurIPS 2020. [paper][code]
RLHF: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al., arxiv 2022. [paper][code]
Finetuned Language Models Are Zero-Shot Learners, Wei et al., ICLR 2022. [paper]
Instruction Tuning for Large Language Models: A Survey, Zhang et al., arxiv 2023. [paper][code]
What learning algorithm is in-context learning? Investigations with linear models, Akyürek et al., ICLR 2023. [paper]
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers, Dai et al., arxiv 2022. [paper][code]
Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al., arxiv 2023. [paper][code][Modular RAG]
Retrieval-Augmented Generation for AI-Generated Content: A Survey, Zhao et al., arxiv 2024. [paper][code]
A Survey on Retrieval-Augmented Text Generation for Large Language Models, Huang et al., arxiv 2024. [paper][Retrieval-Augmented Generation for Natural Language Processing: A Survey][A Survey on RAG Meeting LLMs]
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, Hu et al., arxiv 2024. [paper][code]
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., NeurIPS 2020. [paper][code][model][docs][FAISS]
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Asai et al., ICLR 2024 Oral. [paper][code][CRAG][Golden-Retriever]
Dense Passage Retrieval for Open-Domain Question Answering, Karpukhin et al., EMNLP 2020. [paper][code]
Internet-Augmented Dialogue Generation, Komeili et al., arxiv 2021. [paper]
RETRO: Improving language models by retrieving from trillions of tokens, Borgeaud et al., arxiv 2021. [paper][RETRO-pytorch]
FLARE: Active Retrieval Augmented Generation, Jiang et al., EMNLP 2023. [paper][code]
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, Vu et al., arxiv 2023. [paper][code]
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, Yu et al., arxiv 2023. [paper]
Learning to Filter Context for Retrieval-Augmented Generation, Wang et al., arxiv 2023. [paper][code]
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval, Sarthi et al., ICLR 2024. [paper][code][tree2retriever][GoMate]
When Large Language Models Meet Vector Databases: A Survey, Jing et al., arxiv 2024. [paper][A Comprehensive Survey on How to Make your LLMs use External Data More Wisely]
RAFT: Adapting Language Model to Domain Specific RAG, Zhang et al., arxiv 2024. [paper][code]
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback, Liu et al., arxiv 2024. [paper][code]
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation, Chan et al., arxiv 2024. [paper][code][Adaptive-RAG][Advanced RAG 11: Query Classification and Refinement]
Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers, Sawarkar et al., arxiv 2024. [paper][code][infinity]
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research, Jin et al., arxiv 2024. [paper][code]
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, Gutiérrez et al., arxiv 2024. [paper][code]
From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Edge et al., arxiv 2024. [paper][code][GraphRAG-Local-UI][nano-graphrag][graph-rag][llm-graph-builder][Triplex][knowledge_graph_maker][itext2kg]
Graph Retrieval-Augmented Generation: A Survey, Peng et al., arxiv 2024. [paper]
Searching for Best Practices in Retrieval-Augmented Generation, Wang et al., arxiv 2024. [paper][code][Seven Failure Points When Engineering a Retrieval Augmented Generation System][Improving Retrieval Performance in RAG Pipelines with Hybrid Search][15 Advanced RAG Techniques from Pre-Retrieval to Generation]
Self-Reasoning: Improving Retrieval Augmented Language Model with Self-Reasoning, Xia et al., arxiv 2024. [paper]
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, Fleischer et al., arxiv 2024. [paper][code][fastRAG]
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, Zhu et al., arxiv 2024. [paper][ragas][RAGChecker][rageval]
A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning, Yuan et al., arxiv 2024. [paper][code][ind_kdd_2024][KDD2024-WhoIsWho-Top3]
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery, Qian et al., arxiv 2024. [paper][code][mem0][Memary][MemoryScope]
Introducing Contextual Retrieval, Anthropic, 2024. [blog]
ACL 2023 Tutorial: Retrieval-based Language Models and Applications, Asai et al., ACL 2023. [link]
[Advanced RAG Techniques: an Illustrated Overview][Chinese Version][RAG_Techniques][Controllable-RAG-Agent][GenAI_Agents]
[LlamaIndex][llama_deploy][A Cheat Sheet and Some Recipes For Building Advanced RAG][Fine-Tuning Embeddings for RAG with Synthetic Data]
[ragas]
Browse the web with GPT-4V and Vimium [vimGPT]
[QAnything][ragflow][fastRAG][anything-llm][FastGPT][mem0][Memary]
[trt-llm-rag-windows][history_rag][gpt-crawler][R2R][rag-notebook-to-microservices][MaxKB][Verba][cognita][quivr][kotaemon][RAGMeUp]
[RAG-Retrieval][FlashRank][rank_bm25][PGRAG][CRUD_RAG][PlanRAG][DPA-RAG][LongRAG][Controllable-RAG-Agent][structured-rag][RAGLab][autogluon-rag][VARAG]
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models, Thakur et al., NeurIPS 2021. [paper][code]
MTEB: Massive Text Embedding Benchmark, Muennighoff et al., arxiv 2022. [paper][code][leaderboard]
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Reimers et al., EMNLP 2019. [paper][code][model][vec2text]
SimCSE: Simple Contrastive Learning of Sentence Embeddings, Gao et al., EMNLP 2021. [paper][code][AnglE ACL 2024]
OpenAI: Text and Code Embeddings by Contrastive Pre-Training, Neelakantan et al., arxiv 2022. [paper][blog]
MRL: Matryoshka Representation Learning, Kusupati et al., NeurIPS 2022. [paper][code]
BGE: C-Pack: Packaged Resources To Advance General Chinese Embedding, Xiao et al., SIGIR 2024. [paper][code][bge reranker][FlagEmbedding]
LLM-Embedder: Retrieve Anything To Augment Large Language Models, Zhang et al., arxiv 2023. [paper][code][ACL 2024][llm_reranker][FlagEmbedding]
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, Chen et al., ACL 2024. [paper][code][FlagEmbedding][blog]
[m3e-base][acge_text_embedding][xiaobu-embedding-v2][stella_en_1.5B_v5][Conan-embedding-v1]
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents, Günther et al., arxiv 2023. [paper][jina-embeddings-v2][jina-reranker-v2][pe_rank][Jina CLIP][jina-embeddings-v3]
GTE: Towards General Text Embeddings with Multi-stage Contrastive Learning, Li et al., arxiv 2023. [paper][model][gte-Qwen2-7B-instruct][gte-large-en-v1.5]
[CohereV3]
One Embedder, Any Task: Instruction-Finetuned Text Embeddings, Su et al., ACL 2023. [paper][code]
E5: Improving Text Embeddings with Large Language Models, Wang et al., ACL 2024. [paper][code][model][llm2vec]
Nomic Embed: Training a Reproducible Long Context Text Embedder, Nussbaum et al., Nomic AI 2024. [paper][code]
GritLM: Generative Representational Instruction Tuning, Muennighoff et al., arxiv 2024. [paper][code][OneGen]
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, BehnamGhader et al., arxiv 2024. [paper][code]
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models, Lee et al., arxiv 2024. [paper][model]
PE-Rank: Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models, Liu et al., arxiv 2024. [paper][code]
Making Text Embedders Few-Shot Learners, Li et al., arxiv 2024. [paper][code]
Few-Shot-CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al., NeurIPS 2022. [paper][chain-of-thought-hub]
Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR 2023. [paper]
Zero-Shot-CoT: Large Language Models are Zero-Shot Reasoners, Kojima et al., NeurIPS 2022. [paper][code]
Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models, Zhang et al., ICLR 2023. [paper][code]
Multimodal Chain-of-Thought Reasoning in Language Models, Zhang et al., arxiv 2023. [paper][code]
Chain-of-Thought Reasoning Without Prompting, Wang et al., arxiv 2024. [paper]
ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al., ICLR 2023. [paper][code]
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al., arxiv 2023. [paper][code]
Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., NeurIPS 2023. [paper][code][Plug in and Play Implementation][tree-of-thought-prompting]
Graph of Thoughts: Solving Elaborate Problems with Large Language Models, Besta et al., arxiv 2023. [paper][code]
Cumulative Reasoning with Large Language Models, Zhang et al., arxiv 2023. [paper][code][On the Diagram of Thought]
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, Sel et al., arxiv 2023. [paper][unofficial code]
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation, Ding et al., arxiv 2023. [paper][code]
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models, Ye et al., arxiv 2024. [paper][code]
Large Language Models Are Reasoning Teachers, Ho et al., ACL 2023. [paper][code]
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, Zhou et al., ICLR 2023. [paper]
DEPS: Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, Wang et al., arxiv 2023. [paper][code]
RAP: Reasoning with Language Model is Planning with World Model, Hao et al., EMNLP 2023. [paper][code][LLM Reasoners COLM 2024]
LEMA: Learning From Mistakes Makes LLM Better Reasoner, An et al., arxiv 2023. [paper][code]
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, Chen et al., TMLR 2023. [paper][code]
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator, Li et al., arxiv 2023. [paper][code]
The Impact of Reasoning Step Length on Large Language Models, Jin et al., arxiv 2024. [paper][code]
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, Wang et al., ACL 2023. [paper][code][maestro]
Improving Factuality and Reasoning in Language Models through Multiagent Debate, Du et al., arxiv 2023. [paper][code][Multi-Agents-Debate]
Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arxiv 2023. [paper][code][MCT Self-Refine]
Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al., NeurIPS 2023. [paper][code]
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, Gou et al., ICLR 2024. [paper][code]
LATS: Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models, Zhou et al., ICML 2024. [paper][code]
Self-Discover: Large Language Models Self-Compose Reasoning Structures, Zhou et al., NeurIPS 2024. [paper][unofficial implementation][SELF-DISCOVER]
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation, Wang et al., arxiv 2024. [paper][code]
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents, Zhu et al., arxiv 2024. [paper][code][KnowLM]
Advancing LLM Reasoning Generalists with Preference Trees, Yuan et al., arxiv 2024. [paper][code]
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, Yang et al., arxiv 2024. [paper][code][SymbCoT]
ReST-EM: Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models, Singh et al., arxiv 2023. [paper][unofficial code]
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent, Aksitov et al., arxiv 2023. [paper][code]
Searchformer: Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping, Lehnert et al., COLM 2024. [paper][code]
How Far Are We from Intelligent Visual Deductive Reasoning?, Zhang et al., arxiv 2024. [paper][code]
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, Lee et al., arxiv 2024. [paper][code]
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning, Kim et al., arxiv 2024. [paper][code]
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning, Wang et al., arxiv 2024. [paper][code]
QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction, Huang et al., ACL 2024. [paper][code]
Internal Consistency and Self-Feedback in Large Language Models: A Survey, Liang et al., arxiv 2024. [paper][code]
Prover-Verifier Games improve legibility of language model outputs, Kirchner et al., 2024. [blog][paper]
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning, Wang et al., ACL 2024. [paper][code]
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search, Zhang et al., arxiv 2024. [paper][code]
rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers, Qi et al., arxiv 2024. [paper][code][Orca 2][Quiet-STaR]
OpenAI o1: Learning to Reason with LLMs, OpenAI, 2024. [blog][Agent Q][Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters][Let's Verify Step by Step][Awesome-LLM-Strawberry][O1-Journey]
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment, Kazemnejad et al., arxiv 2024. [paper][code]
[llm-reasoners][g1][Open-O1][show-me]
Scaling Laws for Neural Language Models, Kaplan et al., arxiv 2020. [paper][unofficial code]
Emergent Abilities of Large Language Models, Wei et al., TMLR 2022. [paper]
Chinchilla: Training Compute-Optimal Large Language Models, Hoffmann et al., NeurIPS 2022. [paper]
Scaling Laws for Autoregressive Generative Modeling, Henighan et al., arxiv 2020. [paper]
Are Emergent Abilities of Large Language Models a Mirage, Schaeffer et al., NeurIPS 2023 Outstanding Paper. [paper]
Understanding Emergent Abilities of Language Models from the Loss Perspective, Du et al., arxiv 2024. [paper]
S2A: System 2 Attention (is something you might need too), Weston et al., arxiv 2023. [paper][Distilling System 2 into System 1][system-2-research]
Memory3: Language Modeling with Explicit Memory, Yang et al., arxiv 2024. [paper]
Scaling Laws for Downstream Task Performance of Large Language Models, Isik et al., arxiv 2024. [paper]
Scalable Pre-training of Large Autoregressive Image Models, El-Nouby et al., arxiv 2024. [paper][code]
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, Zhang et al., ICLR 2024. [paper]
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws, Allen-Zhu et al., arxiv 2024. [paper]
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process, Ye et al., arxiv 2024. [paper][project page]
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems, Ye et al., arxiv 2024. [paper]
Language Modeling Is Compression, Delétang et al., arxiv 2023. [paper]
Language Models Represent Space and Time, Gurnee and Tegmark, ICLR 2024. [paper][code]
The Platonic Representation Hypothesis, Huh et al., arxiv 2024. [paper][code]
Observational Scaling Laws and the Predictability of Language Model Performance, Ruan et al., arxiv 2024. [paper][code]
Language models can explain neurons in language models, OpenAI, 2023. [blog][code][transformer-debugger]
Scaling and evaluating sparse autoencoders, Gao et al., arxiv 2024. [OpenAI Blog][paper][code][sae-auto-interp]
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, Anthropic, 2023. [blog]
Mapping the Mind of a Large Language Model, Anthropic, 2024. [blog]
Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era, Wu et al., arxiv 2024. [paper][code]
LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models, Tufanov et al., arxiv 2024. [paper][code]
Transformer Explainer: Interactive Learning of Text-Generative Models, Cho et al., arxiv 2024. [paper][code][demo]
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation, Singh et al., ICML 2024 Spotlight. [paper][code]
[Transformer Circuits Thread][colah's blog][Transformer Interpretability][Awesome-Interpretability-in-Large-Language-Models][TransformerLens][inseq]
ROME: Locating and Editing Factual Associations in GPT, Meng et al., NeurIPS 2022. [paper][code][FastEdit]
Editing Large Language Models: Problems, Methods, and Opportunities, Yao et al., EMNLP 2023. [paper][code][Knowledge Mechanisms in Large Language Models: A Survey and Perspective]
A Comprehensive Study of Knowledge Editing for Large Language Models, Zhang et al., arxiv 2024. [paper][code]
MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, He et al., CVPR 2020. [paper][code]
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, Chen et al., PMLR 2020. [paper][code]
CoCa: Contrastive Captioners are Image-Text Foundation Models, Yu et al., arxiv 2022. [paper][CoCa-pytorch][multimodal]
DINOv2: Learning Robust Visual Features without Supervision, Oquab et al., arxiv 2023. [paper][code]
FeatUp: A Model-Agnostic Framework for Features at Any Resolution, Fu et al., ICLR 2024. [paper][code]
InfoNCE Loss: Representation Learning with Contrastive Predictive Coding, Oord et al., arxiv 2018. [paper][unofficial code]
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Mildenhall et al., ECCV 2020. [paper][code][nerf-pytorch][NeRF-Factory][LERF][LangSplat]
GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior, Wang et al., CVPR 2021. [paper][code]
CodeFormer: Towards Robust Blind Face Restoration with Codebook Lookup Transformer, Zhou et al., NeurIPS 2022. [paper][code][APISR][EvTexture][video2x]
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers, Li et al., ECCV 2022. [paper][code][occupancy_networks][VoxFormer][TPVFormer][GeMap]
UniAD: Planning-oriented Autonomous Driving, Hu et al., CVPR 2023 Best Paper. [paper][code]
Nougat: Neural Optical Understanding for Academic Documents, Blecher et al., arxiv 2023. [paper][code][marker][MixTeX-Latex-OCR][kosmos-2.5][gptpdf][omniparse][llama_parse][PDF-Extract-Kit]
FaceChain: A Playground for Identity-Preserving Portrait Generation, Liu et al., arxiv 2023. [paper][code]
MGIE: Guiding Instruction-based Image Editing via Multimodal Large Language Models, Fu et al., ICLR 2024 Spotlight. [paper][code]
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding, Li et al., CVPR 2024. [paper][code][AnyDoor]
InstantID: Zero-shot Identity-Preserving Generation in Seconds, Wang et al., arxiv 2024. [paper][code][InstantStyle][ID-Animator][ConsistentID][PuLID][ComfyUI-InstantID]
ReplaceAnything as you want: Ultra-high quality content replacement, [link][OutfitAnyone][IDM-VTON][IMAGDressing][CatVTON]
LayerDiffusion: Transparent Image Layer Diffusion using Latent Transparency, Zhang et al., arxiv 2024. [paper][code][sd-forge-layerdiffusion][LayerDiffuse_DiffusersCLI][IC-Light][Paints-UNDO]
Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image, Wu et al., arxiv 2024. [paper][code][MeshAnything][MeshAnythingV2][InstantMesh][prolificdreamer][Metric3D][ReconX]
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement, Boss et al., arxiv 2024. [paper][code][ViewCrafter][3DTopia-XL]
Sapiens: Foundation for Human Vision Models, Khirodkar et al., ECCV 2024 Oral. [paper][code]
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model, Wei et al., arxiv 2024. [paper][code][PaddleOCR][EasyOCR][llm_aided_ocr]
[deepfakes/faceswap][DeepFaceLab][DeepFaceLive][deepface][Deep-Live-Cam][DeepFakeDefenders][HivisionIDPhotos]
[IOPaint][SPADE][PowerPaint]
[MuseV][ToonCrafter]
ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al., ICLR 2021. [paper][code][vit-pytorch][efficientvit][EfficientFormer][ViT-Adapter]
ViT-Adapter: Vision Transformer Adapter for Dense Predictions, Chen et al., ICLR 2023 Spotlight. [paper][code]
Vision Transformers Need Registers, Darcet et al., ICLR 2024 Outstanding Paper. [paper]
DeiT: Training data-efficient image transformers & distillation through attention, Touvron et al., ICML 2021. [paper][code]
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision, Kim et al., ICML 2021. [paper][code]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, Liu et al., ICCV 2021. [paper][code]
MAE: Masked Autoencoders Are Scalable Vision Learners, He et al., CVPR 2022. [paper][code][FLIP]
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks, Xiao et al., CVPR 2024 Oral. [paper][model][Inference code]
LVM: Sequential Modeling Enables Scalable Learning for Large Vision Models, Bai et al., arxiv 2023. [paper][code]
GLEE: General Object Foundation Model for Images and Videos at Scale, Wu et al., CVPR 2024 Highlight. [paper][code]
Tokenize Anything via Prompting, Pan et al., arxiv 2023. [paper][code]
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, Zhu et al., ICML 2024. [paper][code][VMamba][mambaout]
MambaVision: A Hybrid Mamba-Transformer Vision Backbone, Hatamizadeh and Kautz, arxiv 2024. [paper][code]
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, Yang et al., arxiv 2024. [paper][code][Depth-Anything-V2][ml-depth-pro]
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, Guo et al., arxiv 2024. [paper][code]
TiTok: An Image is Worth 32 Tokens for Reconstruction and Generation, Yu et al., arxiv 2024. [paper][titok-pytorch]
Theia: Distilling Diverse Vision Foundation Models for Robot Learning, Shang et al., arxiv 2024. [paper][code]
InstructPix2Pix: Learning to Follow Image Editing Instructions, Brooks et al., CVPR 2023 Highlight. [paper][code]
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold, Pan et al., SIGGRAPH 2023. [paper][code]
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing, Shi et al., arxiv 2023. [paper][code]
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models, Mou et al., ICLR 2024 Spotlight. [paper][code]
LEDITS++: Limitless Image Editing using Text-to-Image Models, Brack et al., arxiv 2023. [paper][code][demo]
Diffusion Model-Based Image Editing: A Survey, Huang et al., arxiv 2024. [paper][code]
MimicBrush: Zero-shot Image Editing with Reference Imitation, Chen et al., arxiv 2024. [paper][code][EchoMimic]
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models, Shuai et al., arxiv 2024. [paper][code]
DETR: End-to-End Object Detection with Transformers, Carion et al., arxiv 2020. [paper][code]
Focus-DETR: Less is More: Focus Attention for Efficient DETR, Zheng et al., arxiv 2023. [paper][code]
U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Qin et al., arxiv 2020. [paper][code]
YOLO: You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., arxiv 2015. [paper]
YOLOX: Exceeding YOLO Series in 2021, Ge et al., arxiv 2021. [paper][code]
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism, Wang et al., arxiv 2023. [paper][code]
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, Liu et al., ECCV 2024. [paper][code][OV-DINO][OmDet]
YOLO-World: Real-Time Open-Vocabulary Object Detection, Cheng et al., CVPR 2024. [paper][code]
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, Wang et al., arxiv 2024. [paper][code]
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, Jiang et al., arxiv 2024. [paper][code]
YOLOv10: Real-Time End-to-End Object Detection, Wang et al., arxiv 2024. [paper][code]
[detectron2][yolov5][mmdetection][detrex][ultralytics][AlphaPose]
U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., MICCAI 2015. [paper][Pytorch-UNet][xLSTM-UNet-Pytorch]
Segment Anything, Kirillov et al., ICCV 2023. [paper][code][SAM-Adapter-PyTorch]
SAM 2: Segment Anything in Images and Videos, Ravi et al., SIGGRAPH 2024. [blog][paper][code]
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, Xiong et al., CVPR 2024. [paper][code][FastSAM][RobustSAM][MobileSAM]
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks, Ren et al., arxiv 2024. [paper][code][Grounded-SAM-2]
LISA: Reasoning Segmentation via Large Language Model, Lai et al., arxiv 2023. [paper][code][VideoLISA]
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding, Zhang et al., arxiv 2024. [paper][code]
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision, Radford et al., arxiv 2022. [paper][code][whisper.cpp][faster-whisper][WhisperFusion][whisper-diarization]
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio, Bain et al., arxiv 2023. [paper][code]
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling, Gandhi et al., arxiv 2023. [paper][code]
Speculative Decoding for 2x Faster Whisper Inference, Sanchit Gandhi, HuggingFace Blog 2023. [blog][paper]
VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers, Wang et al., arxiv 2023. [paper][code]
VALL-E-X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling, Zhang et al., arxiv 2023. [paper][code]
Seamless: Multilingual Expressive and Streaming Speech Translation, Seamless Communication et al., arxiv 2023. [paper][code][audiocraft]
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation, Seamless Communication et al., arxiv 2023. [paper][code]
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models, Li et al., NeurIPS 2023. [paper][code]
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit, Zhang et al., arxiv 2023. [paper][code][FoleyCrafter][vta-ldm]
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech, Kim et al., ICML 2021. [paper][code][Bert-VITS2][so-vits-svc-fork][GPT-SoVITS][VITS-fast-fine-tuning]
OpenVoice: Versatile Instant Voice Cloning, Qin et al., arxiv 2023. [paper][code][MockingBird][clone-voice][Real-Time-Voice-Cloning]
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models, Ju et al., arxiv 2024. [paper][e2-tts-pytorch]
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild, Peng et al., arxiv 2024. [paper][code]
WavLLM: Towards Robust and Adaptive Speech Large Language Model, Hu et al., arxiv 2024. [paper][code]
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation, Xu et al., arxiv 2024. [paper][code][champ]
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning, Zhang et al., ACL 2024. [paper][code][LLaMA-Omni][SpeechGPT]
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs, Tongyi Speech Team, arxiv 2024. [paper][code][CosyVoice]
Qwen2-Audio Technical Report, Chu et al., arxiv 2024. [blog][paper][code][Qwen-Audio]
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling, Ji et al., arxiv 2024. [paper][code]
Language Model Can Listen While Speaking, Ma et al., arxiv 2024. [paper][demo]
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming, Xie et al., arxiv 2024. [paper][code][moshi][LLaMA-Omni]
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications, Guo et al., arxiv 2024. [paper][code][Seed-TTS]
GitHub Repositories
[coqui-ai/TTS][suno-ai/bark][ChatTTS][WhisperSpeech][MeloTTS][parler-tts][fish-speech][MARS5-TTS][metavoice-src]
[stable-audio-tools][Qwen-Audio][pyannote-audio][ims-toucan][AudioLCM][speech-to-speech]
[FunASR][FunClip][FunAudioLLM][TeleSpeech-ASR][EmotiVoice][wenet]
[SadTalker][Wav2Lip][video-retalking][SadTalker-Video-Lip-Sync][AniPortrait][GeneFacePlusPlus][V-Express][MuseTalk][EchoMimic]
Tutorial on Diffusion Models for Imaging and Vision, Stanley H. Chan, arxiv 2024. [paper][diffusion-models-class]
Denoising Diffusion Probabilistic Models, Ho et al., NeurIPS 2020. [paper][code][Pytorch Implementation][RDDM]
Improved Denoising Diffusion Probabilistic Models, Nichol and Dhariwal, ICML 2021. [paper][code]
Diffusion Models Beat GANs on Image Synthesis, Dhariwal and Nichol, NeurIPS 2021. [paper][code]
Classifier-Free Diffusion Guidance, Ho and Salimans, NeurIPS 2021. [paper][code]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, Nichol et al., arxiv 2021. [paper][code]
DALL-E2: Hierarchical Text-Conditional Image Generation with CLIP Latents, Ramesh et al., arxiv 2022. [paper][code][dalle-mini]
Stable-Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models, Rombach et al., CVPR 2022. [paper][code][CompVis/stable-diffusion][Stability-AI/stablediffusion][ml-stable-diffusion]
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis, Podell et al., arxiv 2023. [paper][code][SDXL-Lightning]
Introducing Stable Cascade, Stability AI, 2024. [link][code][model]
SDXL-Turbo: Adversarial Diffusion Distillation, Sauer et al., arxiv 2023. [paper][code]
LCM: Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference, Luo et al., arxiv 2023. [paper][code][Hyper-SD][DMD2][ddim]
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module, Luo et al., arxiv 2023. [paper][code][diffusion-forcing]
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, Esser et al., ICML 2024 Best Paper. [paper][model][mmdit]
SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation, Sauer et al., arxiv 2024. [paper]
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation, Kodaira et al., arxiv 2023. [paper][code]
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models, Marjit et al., arxiv 2024. [paper][code]
Video Diffusion Models, Ho et al., arxiv 2022. [paper][code]
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, Blattmann et al., arxiv 2023. [paper][code][Stable Video 4D][VideoCrafter][Video-Infinity]
Consistency Models, Song et al., arxiv 2023. [paper][code][Consistency Decoder]
A Survey on Video Diffusion Models, Xing et al., arxiv 2023. [paper][code]
Diffusion Models: A Comprehensive Survey of Methods and Applications, Yang et al., arxiv 2023. [paper][code]
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation, Yu et al., ICLR 2024. [paper][magvit2-pytorch][LlamaGen]
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models, Avrahami et al., arxiv 2023. [paper][code]
U-ViT: All are Worth Words: A ViT Backbone for Diffusion Models, Bao et al., CVPR 2023. [paper][code]
UniDiffuser: One Transformer Fits All Distributions in Multi-Modal Diffusion, Bao et al., arxiv 2023. [paper][code]
Matryoshka Diffusion Models, Gu et al., arxiv 2023. [paper][code]
SEDD: Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution, Lou et al., ICML 2024 Best Paper. [paper][code]
l-DAE: Deconstructing Denoising Diffusion Models for Self-Supervised Learning, Chen et al., arxiv 2024. [paper]
DiT: Scalable Diffusion Models with Transformers, Peebles et al., ICCV 2023 Oral. [paper][code][OpenDiT][VideoSys][MDT][PipeFusion][fast-DiT]
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers, Ma et al., arxiv 2024. [paper][code]
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis, Ren et al., arxiv 2024. [paper][model]
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer, Yang et al., arxiv 2024. [paper][code]
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion, Chen et al., arxiv 2024. [paper][code]
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget, Sehwag et al., arxiv 2024. [paper][code]
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model, Zhou et al., arxiv 2024. [paper][transfusion-pytorch][chameleon][MonoFormer]
GitHub Repositories
[stable-diffusion-webui][stable-diffusion-webui-colab][sd-webui-controlnet][stable-diffusion-webui-forge][automatic]
[ComfyUI][streamlit][gradio][ComfyUI-Workflows-ZHO][ComfyUI_Bxb]
LLaVA: Visual Instruction Tuning, Liu et al., NeurIPS 2023 Oral. [paper][code][vip-llava][LLaVA-pp][TinyLLaVA_Factory][LLaVA-RLHF]
LLaVA-1.5: Improved Baselines with Visual Instruction Tuning, Liu et al., arxiv 2023. [paper][code][LLaVA-UHD][LLaVA-HR]
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models, Li et al., arxiv 2024. [paper][code][Open-LLaVA-NeXT][MG-LLaVA][LongVA][LongLLaVA]
LLaVA-OneVision: Easy Visual Task Transfer, Li et al., arxiv 2024. [paper][code]
LLaVA-Video: Video Instruction Tuning With Synthetic Data, Zhang et al., arxiv 2024. [paper][code][LLaVA-Critic]
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day, Li et al., arxiv 2023. [paper][code]
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection, Lin et al., arxiv 2023. [paper][code][PLLaVA]
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, Lin et al., arxiv 2024. [paper][code]
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models, Zhu et al., arxiv 2023. [paper][code][MiniGPT-4-ZH]
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning, Chen et al., arxiv 2023. [paper][code]
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens, Ataallah et al., arxiv 2024. [paper][code]
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens, Zheng et al., arxiv 2023. [paper][code]
Flamingo: a Visual Language Model for Few-Shot Learning, Alayrac et al., NeurIPS 2022. [paper][open-flamingo][flamingo-pytorch]
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding, Zhang et al., EMNLP 2023. [paper][code][VideoLLaMA2][VideoLLM-online]
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs, Zhao et al., arxiv 2023. [paper][code][AnyGPT]
Emu: Generative Pretraining in Multimodality, Sun et al., ICLR 2024. [paper][code]
Emu3: Next-Token Prediction is All You Need, Wang et al., arxiv 2024. [paper][code]
EVE: Unveiling Encoder-Free Vision-Language Models, Diao et al., arxiv 2024. [paper][code]
CogVLM: Visual Expert for Pretrained Language Models, Wang et al., arxiv 2023. [paper][code][VisualGLM-6B][CogCoM]
CogVLM2: Visual Language Models for Image and Video Understanding, Hong et al., arxiv 2024. [paper][code]
DreamLLM: Synergistic Multimodal Comprehension and Creation, Dong et al., ICLR 2024 Spotlight. [paper][code][dreambench_plus]
Meta-Transformer: A Unified Framework for Multimodal Learning, Zhang et al., arxiv 2023. [paper][code]
NExT-GPT: Any-to-Any Multimodal LLM, Wu et al., arxiv 2023. [paper][code]
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, Wu et al., arxiv 2023. [paper][code]
SoM: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V, Yang et al., arxiv 2023. [paper][code]
Ferret: Refer and Ground Anything Anywhere at Any Granularity, You et al., arxiv 2023. [paper][code][Ferret-UI]
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities, Bachmann et al., arxiv 2024. [paper][code][MM1.5]
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond, Bai et al., arxiv 2023. [paper][code]
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution, Wang et al., arxiv 2024. [paper][code][modeling_qwen2_vl.py][finetune-Qwen2-VL][Oryx]
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition, Zhang et al., arxiv 2023. [paper][code]
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks, Chen et al., CVPR 2024 Oral. [paper][code][InternVideo][InternVid][InternVL1.5 paper]
DeepSeek-VL: Towards Real-World Vision-Language Understanding, Lu et al., arxiv 2024. [paper][code]
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions, Chen et al., arxiv 2023. [paper][code]
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions, Chen et al., arxiv 2024. [paper][code]
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones, Yuan et al., arxiv 2023. [paper][code]
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models, Li et al., CVPR 2024. [paper][code]
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models, Wei et al., arxiv 2023. [paper][code]
Vary-toy: Small Language Model Meets with Reinforced Vision Vocabulary, Wei et al., arxiv 2024. [paper][code]
VILA: On Pre-training for Visual Language Models, Lin et al., CVPR 2024. [paper][code][LongVILA][Eagle][NVLM]
POINTS: Improving Your Vision-language Model with Affordable Strategies, Liu et al., arxiv 2024. [paper]
LWM: World Model on Million-Length Video And Language With RingAttention, Liu et al., arxiv 2024. [paper][code]
Chameleon: Mixed-Modal Early-Fusion Foundation Models, Chameleon Team, arxiv 2024. [paper][code]
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts, Li et al., arxiv 2024. [paper][code]
RL4VLM: Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning, Zhai et al., arxiv 2024. [paper][code][RLHF-V][RLAIF-V]
OpenVLA: An Open-Source Vision-Language-Action Model, Kim et al., arxiv 2024. [paper][code]
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis, Fu et al., arxiv 2024. [paper][code][lmms-eval][VLMEvalKit][multimodal-needle-in-a-haystack][MM-NIAH][VideoNIAH][ChartMimic][WildVision]
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities, Yu et al., ICML 2024. [paper][code][UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling]
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs, Tong et al., arxiv 2024. [paper][code]
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models, Sun et al., ICML 2024. [paper][code]
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation, Chern et al., arxiv 2024. [paper][code]
PaliGemma: A versatile 3B VLM for transfer, Beyer et al., arxiv 2024. [paper][code][pytorch-paligemma][Pixtral-12B-2409]
MiniCPM-V: A GPT-4V Level MLLM on Your Phone, Yao et al., arxiv 2024. [paper][code][VisCPM][RLHF-V][RLAIF-V]
VITA: Towards Open-Source Interactive Omni Multimodal LLM, Fu et al., arxiv 2024. [paper][code]
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation, Xie et al., arxiv 2024. [paper][code][Transfusion][VILA-U][LWM]
MIO: A Foundation Model on Multimodal Tokens, Wang et al., arxiv 2024. [paper]
[MiniCPM-V][moondream][MobileVLM][OmniFusion][Bunny][MiCo][Vitron][mPLUG-Owl][mPLUG-DocOwl][Ovis]
DALL-E: Zero-Shot Text-to-Image Generation, Ramesh et al., arxiv 2021. [paper][code]
DALL-E3: Improving Image Generation with Better Captions, Betker et al., OpenAI 2023. [paper][code][blog][Glyph-ByT5]
ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models, Zhang et al., ICCV 2023 Marr Prize. [paper][code][ControlNet_Plus_Plus][ControlNeXt][ControlAR]
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, Mou et al., AAAI 2024. [paper][code]
AnyText: Multilingual Visual Text Generation And Editing, Tuo et al., arxiv 2023. [paper][code]
RPG: Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs, Yang et al., ICML 2024. [paper][code]
LAION-5B: An open large-scale dataset for training next generation image-text models, Schuhmann et al., NeurIPS 2022. [paper][code][blog][laion-coco]
DeepFloyd IF: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., arxiv 2022. [paper][code]
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Saharia et al., NeurIPS 2022. [paper][unofficial code]
Instruct-Imagen: Image Generation with Multi-modal Instruction, Hu et al., arxiv 2024. [paper][Imagen 3]
CogView: Mastering Text-to-Image Generation via Transformers, Ding et al., NeurIPS 2021. [paper][code][ImageReward]
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers, Ding et al., arxiv 2022. [paper][code]
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion, Zheng et al., ECCV 2024. [paper][code]
TextDiffuser: Diffusion Models as Text Painters, Chen et al., arxiv 2023. [paper][code]
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering, Chen et al., arxiv 2023. [paper][code]
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, Chen et al., arxiv 2023. [paper][code]
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models, Chen et al., arxiv 2024. [paper][code]
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation, Chen et al., arxiv 2024. [paper][code]
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models, Ye et al., arxiv 2023. [paper][code][ID-Animator][InstantID]
Controllable Generation with Text-to-Image Diffusion Models: A Survey, Cao et al., arxiv 2024. [paper][code]
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation, Zhou et al., arxiv 2024. [paper][code][AutoStudio]
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding, Li et al., arxiv 2024. [paper][code][xDiT]
[Kolors][Kolors-Virtual-Try-On][EVLM: An Efficient Vision-Language Model for Visual Understanding]
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, Hu et al., arxiv 2023. [paper][code][Open-AnimateAnyone][Moore-AnimateAnyone][AnimateAnyone][UniAnimate]
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions, Tian et al., arxiv 2024. [paper][code][V-Express]
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation, Wei et al., arxiv 2024. [paper][code]
DreaMoving: A Human Video Generation Framework based on Diffusion Models, Feng et al., arxiv 2023. [paper][code]
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model, Xu et al., arxiv 2023. [paper][code][champ][MegActor]
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors, Xing et al., ECCV 2024. [paper][code]
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control, Guo et al., arxiv 2024. [paper][code][FasterLivePortrait][FollowYourEmoji]
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis, Liang et al., arxiv 2023. [paper][code]
Video Diffusion Models, Ho et al., arxiv 2022. [paper][video-diffusion-pytorch]
Make-A-Video: Text-to-Video Generation without Text-Video Data, Singer et al., arxiv 2022. [paper][make-a-video-pytorch]
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation, Wu et al., ICCV 2023. [paper][code]
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators, Khachatryan et al., ICCV 2023 Oral. [paper][code][StreamingT2V]
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers, Hong et al., ICLR 2023. [paper][code]
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer, Yang et al., arxiv 2024. [paper][code]
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos, Ma et al., AAAI 2024. [paper][code][Follow-Your-Pose v2][Follow-Your-Emoji]
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts, Ma et al., arxiv 2024. [paper][code]
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, Guo et al., arxiv 2023. [paper][code][AnimateDiff-Lightning]
StableVideo: Text-driven Consistency-aware Diffusion Video Editing, Chai et al., ICCV 2023. [paper][code]
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models, Zhang et al., arxiv 2023. [paper][code]
TF-T2V: A Recipe for Scaling up Text-to-Video Generation with Text-free Videos, Wang et al., arxiv 2023. [paper][code]
Lumiere: A Space-Time Diffusion Model for Video Generation, Bar-Tal et al., arxiv 2024. [paper][lumiere-pytorch]
Sora: Creating video from text, OpenAI, 2024. [blog][Generative Models for Image and Long Video Synthesis][Generative Models of Images and Neural Networks][Open-Sora][VideoSys][Open-Sora-Plan][minisora][SoraWebui][MuseV][PhysDreamer][easyanimate]
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models, Liu et al., arxiv 2024. [paper][code]
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework, Yuan et al., arxiv 2024. [paper][code]
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution, Dehghani et al., NeurIPS 2023. [paper][unofficial code]
VideoPoet: A Large Language Model for Zero-Shot Video Generation, Kondratyuk et al., ICML 2024 Best Paper. [paper]
Latte: Latent Diffusion Transformer for Video Generation, Ma et al., arxiv 2024. [paper][code][LaVIT][LaVie][VBench][Vchitect-2.0][LiteGen]
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis, Menapace et al., arxiv 2024. [paper][articulated-animation]
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance, Feng et al., arxiv 2024. [paper][code]
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos, Hu et al., arxiv 2024. [paper][code]
Loong: Generating Minute-level Long Videos with Autoregressive Language Models, Wang et al., arxiv 2024. [paper]
Movie Gen: A Cast of Media Foundation Models, The Movie Gen team @ Meta, 2024. [blog][paper][unofficial code]
Deep Reinforcement Learning: Pong from Pixels, Andrej Karpathy, 2016. [blog][reinforcement-learning-an-introduction][easy-rl][deep-rl-course][wangshusen/DRL]
DQN: Playing Atari with Deep Reinforcement Learning, Mnih et al., arxiv 2013. [paper][code]
DQNNaturePaper: Human-level control through deep reinforcement learning, Mnih et al., Nature 2015. [paper][DQN-tensorflow][DQN_pytorch]
DDQN: Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., AAAI 2016. [paper][RL-Adventure][deep-q-learning][Deep-RL-Keras]
Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al., AAAI 2018. [paper][Rainbow]
DDPG: Continuous control with deep reinforcement learning, Lillicrap et al., ICLR 2016. [paper][pytorch-ddpg]
PPO: Proximal Policy Optimization Algorithms, Schulman et al., arxiv 2017. [paper][code][trl ppo_trainer][PPO-PyTorch][implementation-matters][PPOxFamily]
Diffusion Models for Reinforcement Learning: A Survey, Zhu et al., arxiv 2023. [paper][code][diffusion_policy]
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations, Matthias Lehmann, arxiv 2024. [paper][code]
[tianshou][rlkit][pytorch-a2c-ppo-acktr-gail][Safe-Reinforcement-Learning-Baselines]
A Gentle Introduction to Graph Neural Networks, Sanchez-Lengeling et al., Distill 2021. [paper]
CS224W: Machine Learning with Graphs, Stanford. [link]
GCN: Semi-Supervised Classification with Graph Convolutional Networks, Kipf and Welling, ICLR 2017. [paper][code][pygcn]
GAE: Variational Graph Auto-Encoders, Kipf and Welling, arxiv 2016. [paper][code][gae-pytorch]
GAT: Graph Attention Networks, Veličković et al., ICLR 2018. [paper][code][pyGAT][pytorch-GAT]
GIN: How Powerful are Graph Neural Networks?, Xu et al., ICLR 2019. [paper][code]
Graphormer: Do Transformers Really Perform Bad for Graph Representation?, Ying et al., NeurIPS 2021. [paper][code]
GraphGPT: Graph Instruction Tuning for Large Language Models, Tang et al., SIGIR 2024. [paper][code][Graph-Bert]
OpenGraph: Towards Open Graph Foundation Models, Xia et al., arxiv 2024. [paper][code][AnyGraph][openspg]
A Survey of Large Language Models for Graphs, Ren et al., KDD 2024. [paper][code]