Vector Processing in Computer Architecture

In the 21st century, we need to process large amounts of data as quickly as possible. For example, a weather department has to process large volumes of collected data, held as integer or floating-point arrays, in order to predict the weather of the coming week or month accurately. A general-purpose personal computer is poorly suited to this task because it traverses an array with a loop, handling one element per iteration. With very large arrays, the per-iteration overhead of the loop places a heavy burden on the processor.

This is where VECTOR PROCESSING comes to the rescue. Vector processing operates on an entire array with a single operation, with the elements processed in parallel. For this to work, the operation performed on each element of the array must be independent of the operations performed on the other elements. For example, suppose we have two arrays of size 100 and want to add the elements at the same index and store each sum at that index in a third array. A general-purpose computer would use a loop: fetch the elements at the ith position of the two arrays, add them, store the result at the ith position of the third array, increase i by 1, and move on to the next iteration. A vector processor would instead fetch all 100 elements of both arrays, add them, and store the results in the third array, all with a single vector operation.
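The difference between the two approaches can be sketched in Python. This is only an illustrative model: the function names scalar_add and vector_add are hypothetical, and ordinary Python does not run the second version on vector hardware. It merely expresses the addition as one whole-array operation in which no element depends on another.

```python
def scalar_add(a, b):
    """Element-by-element loop, as a scalar processor would run it."""
    c = [0] * len(a)
    i = 0
    while i < len(a):
        c[i] = a[i] + b[i]   # fetch a[i] and b[i], add, store into c[i]
        i += 1               # loop overhead: increment and test on every pass
    return c


def vector_add(a, b):
    """Whole-array form: conceptually a single 'c = a + b' operation.

    Each result depends only on the inputs at the same index, which is the
    independence condition that vector processing requires.
    """
    return [x + y for x, y in zip(a, b)]


a = list(range(100))
b = list(range(100, 200))
assert scalar_add(a, b) == vector_add(a, b)
```

By contrast, a running sum such as c[i] = c[i-1] + a[i] does not meet the independence condition, because each result needs the one before it.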

Fetching and operating on multiple data elements with one instruction is called Single Instruction, Multiple Data (SIMD), and such instructions are called vector instructions. The data for a vector instruction is held in vector registers. VECTOR REGISTERS can store multiple data elements at once. Together, these data elements are called a vector operand.

The computers in which vector processing was implemented later evolved into SUPERCOMPUTERS. Supercomputers work with billions of data items and perform even more calculations per second. This is achieved through vector processing implemented with a technique called pipelining, or vector pipelining. Pipelined vector processors can be classified into two types, depending on where the operands of a vector instruction are fetched from. The two architectures are Memory-to-Memory and Register-to-Register.

In a Memory-to-Memory vector processor, the operands of a vector instruction, the intermediate results, and the final output are all fetched from or written to main memory. The TI-ASC, CDC STAR-100, and Cyber-205 use the memory-to-memory design for vector instructions.

In a Register-to-Register vector processor, the source operands of an instruction, the intermediate results, and the final output are all held in vector or scalar registers. The Cray-1 and Fujitsu VP-200 use the register-to-register design for vector instructions.
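To make the contrast between the two designs concrete, the following sketch counts main-memory accesses for the computation d = (a + b) * c under each design. It is a simplified model, not a description of any of the machines named above: the function names and the access counter are hypothetical, and the register version assumes each vector fits entirely in one vector register.

```python
def memory_to_memory(a, b, c):
    """Compute (a + b) * c with every operand and result in main memory."""
    n = len(a)
    accesses = 0
    temp = [x + y for x, y in zip(a, b)]   # read a, read b, write temp to memory
    accesses += 3 * n
    d = [t * z for t, z in zip(temp, c)]   # read temp, read c, write d to memory
    accesses += 3 * n
    return d, accesses


def register_to_register(a, b, c):
    """Compute (a + b) * c with operands and intermediates in vector registers."""
    n = len(a)
    accesses = 0
    v1, v2, v3 = list(a), list(b), list(c)   # three vector loads from memory
    accesses += 3 * n
    v4 = [x + y for x, y in zip(v1, v2)]     # intermediate stays in a register
    v5 = [t * z for t, z in zip(v4, v3)]     # no memory traffic for this step
    d = list(v5)                             # one vector store to memory
    accesses += n
    return d, accesses


a, b, c = [1, 2, 3, 4], [5, 6, 7, 8], [2, 2, 2, 2]
d_mem, mem_accesses = memory_to_memory(a, b, c)
d_reg, reg_accesses = register_to_register(a, b, c)
assert d_mem == d_reg
print(mem_accesses, reg_accesses)   # memory accesses under each model
```

In this model the register-to-register version saves the write and re-read of the intermediate result, which is the traffic that the memory-to-memory design has to pay for.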

Vector processing has two overheads, called set-up time and flushing time. When vector processing is pipelined, the time needed to route the vector operands to the functional unit is called the set-up time. Flushing time is the time a vector instruction takes from its decoding until its first result emerges from the pipeline.
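As a rough illustration of how these two overheads are spread over a long vector, the sketch below uses a simplified timing model. It assumes (this is not stated in the text above) that once the first result has left the pipeline, one further result emerges every clock cycle. The function name vector_cycles and the cycle counts 4 and 6 are hypothetical values chosen for illustration.

```python
def vector_cycles(n, setup_cycles, flush_cycles):
    """Cycles needed for an n-element vector instruction in this simplified model.

    setup_cycles: cycles to route the operands to the functional unit.
    flush_cycles: cycles from decoding until the first result leaves the pipeline.
    Assumption: after the first result, one more result emerges per cycle.
    """
    if n <= 0:
        return 0
    return setup_cycles + flush_cycles + (n - 1)


for n in (1, 10, 100, 1000):
    total = vector_cycles(n, setup_cycles=4, flush_cycles=6)
    print(n, total, round(total / n, 2))   # length, total cycles, cycles per element
```

Under these assumptions the fixed overhead dominates for short vectors and becomes negligible per element as the vector grows.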

The vector length also affects processing efficiency, because a longer vector incurs the overhead of being partitioned into smaller segments for processing.
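This partitioning is commonly known as strip mining. A minimal sketch follows, assuming a vector register that holds at most 64 elements; the constant MAX_VECTOR_LENGTH, its value, and the function name strip_mined_add are assumptions made for illustration.

```python
MAX_VECTOR_LENGTH = 64   # assumed register capacity, chosen for illustration


def strip_mined_add(a, b, mvl=MAX_VECTOR_LENGTH):
    """Add two equal-length arrays in strips of at most mvl elements.

    Returns the result together with the number of strips, that is, the
    number of separate vector operations the long vector was split into.
    """
    c = []
    strips = 0
    for start in range(0, len(a), mvl):
        end = min(start + mvl, len(a))        # the last strip may be shorter
        c.extend(x + y for x, y in zip(a[start:end], b[start:end]))
        strips += 1
    return c, strips


a = list(range(100))
b = list(range(100))
result, strips = strip_mined_add(a, b)
assert result == [2 * i for i in range(100)]
print(strips)   # number of vector operations needed for 100 elements
```

Each strip is issued as its own vector operation, so the per-instruction overheads are incurred once per strip rather than once per array.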

To get better performance, optimized object code must be generated that makes maximum use of the pipeline resources. The following approaches help:

1. Improving the vector instructions – We can improve vector instructions by reducing memory accesses and maximizing the use of resources.
2. Integrating the scalar instructions – Scalar instructions of the same type should be grouped into a batch, as this reduces the overhead of repeatedly reconfiguring the pipeline.
3. Algorithm – Choose the algorithm that executes fastest on a pipelined vector processor.
4. Vectorizing compiler – A vectorizing compiler must recover the parallelism hidden in code written in a high-level programming language.
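The benefit of batching instructions of the same type (point 2) can be sketched with a toy model. It assumes that the pipeline must be reconfigured whenever the operation type changes from one instruction to the next, and that the instructions are independent of one another so that they may be reordered. The function name count_reconfigurations and the instruction stream are hypothetical.

```python
def count_reconfigurations(ops):
    """Count how often consecutive operations differ in type."""
    changes = 0
    for prev, cur in zip(ops, ops[1:]):
        if prev != cur:
            changes += 1   # pipeline would be reconfigured here in this model
    return changes


mixed = ["add", "mul", "add", "mul", "add", "mul"]
grouped = sorted(mixed)   # batching by type; valid only for independent operations

print(count_reconfigurations(mixed), count_reconfigurations(grouped))
```

Reordering of this kind is only safe when no instruction depends on the result of one that would be moved after it.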