Paper
Abstract
This page collects my paper-reading notes; please forgive any rough writing >_<
TODO
- Restructure this page and categorize the papers
- T-MAC + BitNet series
- CoT
- InstructGPT
- Quantization series (Atom, QuaRot)
- Speculative decoding
- BiS-KM: Enabling Any-Precision K-Means on FPGAs
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- Mixed Precision Training
- FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding
- Softmax Acceleration with Adaptive Numeric Format for both Training and Inference
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- VecPAC: A Vectorizable and Precision-Aware CGRA
- NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
- ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
- Improving Language Understanding by Generative Pre-Training
- Language Models are Unsupervised Multitask Learners
- Language Models are Few-Shot Learners
- APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
- Learning Transferable Visual Models From Natural Language Supervision
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders