4 terms
Showing all terms starting with Q
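Quantisation - a technique that reduces a model's size and memory footprint, and typically speeds up inference, by storing weights as lower-precision numbers (e.g., 4-bit integers instead of 32-bit floats), at the cost of a small rounding error.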
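As a rough illustration of quantisation, the sketch below maps a handful of floating-point weights to 4-bit signed integer codes with one shared scale factor, then maps them back. It assumes simple symmetric uniform quantisation; the function names and sample weights are made up for the example, and production schemes (per-block scales, non-uniform code books) are more involved.

```python
def quantize(weights, bits=4):
    """Symmetric uniform quantisation of a list of floats to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale shared by all weights
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


weights = [0.82, -0.41, 0.05, -1.40, 0.33]          # illustrative values only
codes, scale = quantize(weights, bits=4)
restored = dequantize(codes, scale)

# The rounding error per weight is at most half of one quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, restored, max_error)
```

Each weight now needs 4 bits plus a share of the single stored scale, rather than 32 bits, which is where the size reduction comes from.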
Query - in AI contexts, a question or instruction sent to a model or retrieval system to obtain a relevant result.
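Q-learning - a model-free reinforcement learning algorithm that learns the expected long-term reward (the Q-value) of taking each action in each state, and derives an optimal policy by choosing the highest-valued action in every state.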
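To make the Q-learning definition concrete, here is a minimal tabular sketch on a toy problem. The environment (a five-state corridor with a reward at the right-hand end), the hyperparameters, and all names are assumptions chosen for illustration, not part of any particular library.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1


def step(state, action):
    """Toy environment: reward 1.0 only on reaching the goal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose(Q, state, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose(Q, state, rng)
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward if done else reward + GAMMA * max(Q[nxt])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = nxt
    return Q


Q = train()
# Greedy policy for the non-terminal states: pick the highest-valued action.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The agent never sees the transition rules directly (that is the "model-free" part); it only observes the next state and reward after each step and updates its table from those samples.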
QLoRA (Quantised LoRA) - a memory-efficient fine-tuning method that keeps the base model frozen in 4-bit quantised form and trains only small LoRA adapters on top of it, enabling LLM fine-tuning on consumer GPUs.
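A back-of-the-envelope calculation shows why combining 4-bit quantisation with LoRA adapters saves so much memory. The model size, layer dimensions, and rank below are hypothetical, and the estimate covers weights only: it ignores activations, optimiser state, and the small overhead of quantisation constants.

```python
def weight_memory_gb(n_params, bits):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one d_in x d_out weight matrix."""
    return rank * (d_in + d_out)


N_PARAMS = 7_000_000_000                             # hypothetical model size

full_precision_gb = weight_memory_gb(N_PARAMS, 32)   # 32-bit base weights
four_bit_gb = weight_memory_gb(N_PARAMS, 4)          # 4-bit base weights

# One hypothetical 4096 x 4096 layer: full matrix vs. a rank-16 adapter.
full_matrix = 4096 * 4096
adapter = lora_params(4096, 4096, rank=16)

print(full_precision_gb, four_bit_gb, full_matrix // adapter)
```

Under these assumptions the frozen base weights shrink by a factor of eight, and the adapter for each layer has roughly 1/128 of the parameters of the matrix it modifies, so only a small fraction of the model needs gradients and optimiser state.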