Overview
Inference is the 'production' phase of AI: while training is about learning patterns from data, inference is about applying that learned model to new, real-world inputs to produce predictions.
Optimization
Inference needs to be fast and efficient. Techniques such as quantization (storing weights at lower numerical precision, e.g. 8-bit integers instead of 32-bit floats) and pruning (removing weights that contribute little to the output) shrink models so they run faster during inference on devices like smartphones or edge nodes.
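To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit post-training quantization using NumPy. The function names and the per-tensor scaling scheme are illustrative assumptions, not the API of any particular framework; production toolchains typically use per-channel scales and calibration data.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus a per-tensor scale (illustrative)."""
    scale = np.max(np.abs(weights)) / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, at a small accuracy cost:
print(q.nbytes / w.nbytes)                       # 0.25
print(float(np.max(np.abs(w - w_hat))) < scale)  # True: error bounded by one step
```

The trade-off is visible directly: memory drops by 4x, and the reconstruction error per weight is bounded by the quantization step, which is why accuracy usually degrades only slightly.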
Cost
For large-scale applications, the compute cost of serving inference can dominate total operating cost, since it is paid on every request rather than once during training. This has driven the development of specialized inference accelerators such as Google's TPUs and AWS Inferentia.
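A back-of-envelope calculation shows why inference cost scales with traffic. All figures below (request volume, per-request GPU time, hourly rate) are hypothetical placeholders, not real benchmarks or pricing.

```python
# Hypothetical inputs for a rough inference cost estimate.
requests_per_day = 10_000_000     # assumed daily traffic
gpu_seconds_per_request = 0.05    # assumed GPU time per request
gpu_cost_per_hour = 2.50          # assumed cloud rate, USD

gpu_hours_per_day = requests_per_day * gpu_seconds_per_request / 3600
daily_cost = gpu_hours_per_day * gpu_cost_per_hour

print(round(gpu_hours_per_day, 1))  # 138.9 GPU-hours/day
print(round(daily_cost, 2))         # 347.22 USD/day
```

Because the cost is linear in request volume, halving per-request GPU time (through quantization, batching, or dedicated hardware) halves the daily bill, which is the economic motivation behind specialized inference chips.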