Overview
Superscalar processors exploit a form of parallelism called instruction-level parallelism (ILP). Because the processor contains multiple execution units, such as ALUs and FPUs, it can dispatch several instructions from a single thread in the same clock cycle.
Mechanism
The processor's hardware includes logic to dynamically identify which instructions can be executed simultaneously without violating data dependencies.
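The dependency check described above can be illustrated with a small software model. This is a minimal sketch, not a description of any real processor's issue logic: it assumes in-order issue, a hypothetical issue width of two, and a made-up `Instr` record with one destination register and a tuple of source registers. It also treats write-after-write and write-after-read conflicts as blocking, whereas real designs often remove those with register renaming.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this illustration."""
    text: str
    dest: str
    srcs: tuple


def conflicts(earlier: Instr, later: Instr) -> bool:
    """Return True if `later` cannot issue in the same cycle as `earlier`."""
    raw = earlier.dest in later.srcs   # read-after-write (true dependency)
    waw = earlier.dest == later.dest   # write-after-write
    war = later.dest in earlier.srcs   # write-after-read
    return raw or waw or war


def issue_groups(program, width=2):
    """Greedily pack consecutive instructions into per-cycle issue groups."""
    groups = []
    current = []
    for instr in program:
        # Start a new cycle when the group is full or a dependency blocks issue.
        if len(current) == width or any(conflicts(e, instr) for e in current):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("mul r4, r5, r6", "r4", ("r5", "r6")),
    Instr("sub r7, r1, r4", "r7", ("r1", "r4")),
    Instr("add r8, r7, r2", "r8", ("r7", "r2")),
]

groups = issue_groups(program, width=2)
for cycle, group in enumerate(groups):
    print(cycle, [i.text for i in group])
```

In this model the first two instructions are independent and share a cycle, while the last two each wait because they read a register written by an earlier instruction, so the four instructions take three issue cycles.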
Comparison
A simple scalar pipelined processor completes at most one instruction per cycle, even in the ideal case, whereas a superscalar processor can sustain an IPC (instructions per cycle) greater than one.
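The difference in IPC can be made concrete with an idealized calculation. The sketch below assumes no stalls, fully independent instructions, and hypothetical parameters (a 5-stage pipeline, an issue width of 4, and 1000 instructions). The function names are illustrative, and real workloads will fall short of these upper bounds.

```python
import math


def ideal_cycles(n_instructions, pipeline_depth, issue_width=1):
    """Cycles to finish n instructions with no stalls (idealized model)."""
    # Each cycle, up to `issue_width` instructions enter the pipeline.
    issue_slots = math.ceil(n_instructions / issue_width)
    # The first group needs `pipeline_depth` cycles; each later group adds one.
    return pipeline_depth + issue_slots - 1


def ipc(n_instructions, cycles):
    """Instructions per cycle."""
    return n_instructions / cycles


n = 1000
scalar = ipc(n, ideal_cycles(n, pipeline_depth=5, issue_width=1))
superscalar = ipc(n, ideal_cycles(n, pipeline_depth=5, issue_width=4))

print(f"scalar IPC:      {scalar:.3f}")
print(f"superscalar IPC: {superscalar:.3f}")
```

Under these assumptions the scalar pipeline approaches but never exceeds an IPC of 1 (about 0.996 here), while the 4-wide model reaches about 3.937.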