Five Inference Optimization Techniques to Double or Quadruple LLM Serving Throughput on the Same GPU — From Quantization to Speculative Decoding | DEV BAK - Tech Blog