My understanding is that they are plenty pipelined, but the GPU works on a more predictable workload, so instruction order is more likely to be rewritten by the compiler than by the silicon -- that is, the CPU tries as hard as it can to maximize single-threaded performance on branchy workloads and "wastes" transistors and power on that, while the GPU expects branches and memory accesses to be predictable and spends the transistors and power it saves on adding more cores.
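A minimal sketch of why this trade-off matters in practice (the kernel name and logic are hypothetical, just for illustration): on a GPU, the 32 threads of a warp execute in lockstep, so when a branch splits them the hardware typically runs both paths with threads masked off, rather than predicting and speculating the way a CPU core would.

```cuda
// Hypothetical kernel illustrating warp divergence.
// If threads within the same 32-thread warp disagree at the branch,
// the warp executes BOTH paths in turn, with inactive threads masked,
// instead of relying on CPU-style branch prediction.
__global__ void divergent(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (in[i] % 2 == 0) {       // data-dependent branch: warp may split here
        out[i] = in[i] * 2;
    } else {
        out[i] = in[i] + 1;
    }
}
```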
Yes, GPUs process a single computational task on a vast array of data in parallel. But they cannot process two independent tasks concurrently (except, perhaps, by splitting compute resources between them).
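To make the "one task over a vast array" model concrete, here's a minimal sketch (the classic SAXPY operation; the launch parameters are just an assumed example): every thread runs the same code on its own element.

```cuda
// One task, many data elements: each thread computes one y[i].
__global__ void saxpy(float a, const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Example launch: one thread per element for n = 1 << 20 elements.
//   saxpy<<<(n + 255) / 256, 256>>>(2.0f, d_x, d_y, n);
```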
GPUs do scale because they are parallel processors. Software tools like CUDA and ROCm are very specifically designed for parallel compute on GPUs.
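One way that scaling shows up in the programming model is the grid-stride loop idiom, sketched below (the kernel is hypothetical): the same code runs unchanged on a small or large GPU, because each thread strides across the array and the hardware determines how many threads execute at once.

```cuda
// Grid-stride loop: the kernel is independent of grid size, so it
// scales with however many cores the GPU provides.
__global__ void scale(float *data, float factor, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}
```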