Pen to Paper

Parallel Programming: What is the present scenario?
In this article, Dr. Suhas Phadke briefly reflects on the trends in parallel processing and C-DAC’s role in them.

There is a strong consensus among computer professionals that the greatest gains in price/performance can only be achieved through multiple-processor parallel systems. Parallel computers are characterized by two or more processing elements and memory, tied together by some interconnection network. An abundance of relatively slow processors, working together to solve one problem, provides the necessary performance.

The trend in parallel computing is to move away from specialized traditional supercomputing platforms, such as the Cray / SGI T3E, to cheaper, general-purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations. This approach has a number of advantages, including the ability to build, for a given budget, a platform that is suitable for a large class of applications and workloads.

The hardware technology and economic forces are right for an explosion of parallel processing into the market at all levels. But if parallel computing is so wonderful, why aren’t we doing it on a large scale? The main problem lies in the lack of software for parallel machines.

Parallel processing, or concurrent computing as it is sometimes termed, is not conceptually new. Jobs that can be broken into multiple tasks, which can in turn be handed out to individual workers for simultaneous execution, are the most suitable for parallel machines.
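The idea of breaking a job into tasks and handing them to workers can be sketched in a few lines of Python. This is only an illustration: the function names (split, partial_sum, parallel_sum_of_squares) and the choice of summing squares are assumptions made for the example, not something taken from any particular parallel system. Threads are used to keep the sketch simple; in standard CPython a CPU-bound job like this one would normally need separate processes to gain real speed.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Break the job into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Task executed by one worker: sum the squares in its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Hand one chunk to each worker, then combine the partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)
```

The tasks here are independent of one another, which is what makes the job easy to hand out; the only coordination needed is the final combination of partial results.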

What is different about parallel programming?

Software development is intrinsically difficult and time consuming for both sequential and parallel computing applications. However, designing and writing software for parallel computers is even more difficult because parallel programmers must keep in mind details of non-determinism, synchronization and scheduling, as well as traditional details of sequential programming.
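A small example may help show what synchronization means in practice. In the sketch below, several threads update one shared counter. The update is a read-modify-write step, so without protection two threads can interleave and lose an increment, and the final value may differ from run to run. The Counter class and the run function are hypothetical names introduced for this illustration.

```python
import threading


class Counter:
    """A shared counter whose update is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, n):
        # Critical section: only one thread at a time may read,
        # modify and write back the shared value.
        with self._lock:
            self.value += n


def run(n_threads=8, increments=10_000):
    """Start n_threads workers that each increment the counter."""
    counter = Counter()

    def worker():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the result
    return counter.value
```

With the lock in place the result is always n_threads times increments. A sequential programmer never has to think about this kind of detail, which is part of the extra burden described above.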

It is much easier to program sequentially because humans tend to think sequentially rather than concurrently. Unfortunately, sequential programming cannot directly make effective use of parallel computers. Parallelizing compilers do not exist in the practical sense, and even if they did, the greatest performance is often achieved by rethinking the underlying algorithm. Tools that aid in “thinking in parallel” offer the greatest prospect for improvement.

What options do we have for parallel software development? Either we must discard decades of sequential software development and embark on a long journey to (re)write all software in parallel form, or we must devise tools to convert past applications into a form that can exploit the power of parallel computers.

In order to effectively exploit the power of parallel computers or a cluster of workstations, good programming environments and development tools are a must. This is particularly necessary if parallel processing and distributed computing are to become the preferred programming model for a typical programmer, as opposed to being limited to a group of experts. The last two decades have seen significant development of various kinds of programming environments, together with a plethora of associated programming aids, including parallel debuggers and monitoring tools. Computer professionals are zeroing in on a standard, which in turn will lead to portability of codes. Presently, only a small population of programmers has the knowledge to use parallel and distributed systems for executing large production codes. Parallel programming technology is still not popular with the average sequential programmer. These programmers lack enthusiasm for moving into a different and more difficult programming environment, though they are aware of the potential performance gains.

Portability and scheduling

Another major concern for parallel programmers is portability and scheduling. In the past, the development of parallel programs was architecture dependent. For example, low-level synchronization was done using locks in a shared-memory architecture and via message passing in a distributed-memory architecture. Thanks to the development of a standard library of function calls, MPI (Message Passing Interface), it is now possible to address the problem of portability to a great extent. MPI is an architecture-independent, higher-level abstraction, which allows program designers to express their algorithms in a high-level structure without having to worry about architecture-specific details of synchronization.
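The message-passing style can be imitated with nothing more than threads and queues. The sketch below is not MPI and uses no MPI calls; the worker and scatter_and_reduce names are hypothetical, and the queues merely stand in for the send and receive operations that a message-passing library would provide. The point is the shape of the program: workers share no data and cooperate only through explicit messages.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk, compute on it locally, send the result back."""
    chunk = inbox.get()                # blocking receive
    outbox.put((rank, sum(chunk)))     # send result to the coordinator


def scatter_and_reduce(data, n_workers=4):
    """Distribute data to the workers and combine their partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # "Scatter": every worker is sent its own slice of the data.
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])

    # "Reduce": collect exactly one message per worker and combine them.
    total = 0
    for _ in range(n_workers):
        _rank, partial = results.get()
        total += partial

    for t in threads:
        t.join()
    return total
```

Because the blocking receive doubles as the synchronization point, no locks appear in the program text. The same structure carries over from one machine to another as long as the send and receive operations are available.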

The second major problem is scheduling: the process of allocating tasks to physical processors and specifying the order of execution of these tasks. How does a parallel programmer schedule tasks onto a particular parallel computer in a pattern that guarantees the shortest execution time? This problem is mathematically complex, often requiring exponential time to solve for the absolute best schedule. Spending more time on scheduling an application than on running it can defeat the purpose of parallel computing.
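The trade-off can be made concrete with a toy version of the problem: independent tasks of known duration, identical processors, and the goal of minimizing the finishing time of the busiest processor. These simplifications, the function names, and the task durations used in the usage note are assumptions for illustration only. The first scheduler tries every possible assignment, so its cost grows exponentially with the number of tasks; the second is a cheap greedy heuristic that places the longest remaining task on the least loaded processor.

```python
from itertools import product


def makespan(assignment, durations, n_procs):
    """Finishing time of the busiest processor for a given assignment."""
    loads = [0] * n_procs
    for task, proc in enumerate(assignment):
        loads[proc] += durations[task]
    return max(loads)


def optimal_schedule(durations, n_procs):
    """Exhaustive search over n_procs ** len(durations) assignments."""
    best = None
    for assignment in product(range(n_procs), repeat=len(durations)):
        span = makespan(assignment, durations, n_procs)
        if best is None or span < best[0]:
            best = (span, assignment)
    return best


def greedy_schedule(durations, n_procs):
    """Heuristic: longest task first, always onto the least loaded processor."""
    loads = [0] * n_procs
    assignment = [None] * len(durations)
    order = sorted(range(len(durations)), key=lambda t: -durations[t])
    for task in order:
        proc = loads.index(min(loads))
        assignment[task] = proc
        loads[proc] += durations[task]
    return max(loads), tuple(assignment)
```

For durations [3, 3, 2, 2, 2] on two processors, the exhaustive search finds a finishing time of 6, while the greedy heuristic settles for 7. The heuristic is slightly worse but examines each task only once, which is usually the better bargain when the number of tasks is large.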

Thus parallel programming has all the problems associated with traditional programming. But the programmer must also be concerned with the architectural details and tuning through scheduling of the parallel tasks onto the multiple processors. Parallel programming also requires assistance at all levels – debuggers, performance analyzers and reusable components.

Implicit and explicit parallelism

There are two approaches to programming parallel computers: (1) implicit parallelization and (2) explicit parallelization. Each has its merits and disadvantages. Implicit parallelism uses existing languages and conceals the underlying parallel computer from the programmer. Intelligent, high-level compilers are required to automatically translate the application into parallel form. Some research is being done on parallelizing compilers and parallel languages, but their functionality is still very limited. Parallelizing compilers are only useful for applications that exhibit regular parallelism, such as computations in loops. For shared-memory multiprocessor systems, parallelizing compilers have proven to be relatively successful. However, for distributed-memory machines they are largely unproven. Thus automatic parallelization is very limited in scope and only rarely provides adequate speedup.

In explicit parallelization, the programming language incorporates all the explicit parallel control statements in its syntax and the programmer must know about parallelism. Therefore, the explicit approach requires a clever programmer. Since it is easier to develop tools and techniques to help the programmer be clever than to develop a smart compiler, most of the progress has been made in this direction.

Some parallel languages came into existence, but never gained popularity because users are not willing to learn a completely new language for parallel processing. They would rather use traditional languages such as C and Fortran. Low-level communication libraries such as MPI and PVM have now become popular with parallel software developers, because they provide an interface for C and Fortran. Programmers can now write efficient parallel programs in a traditional language of their choice, using MPI or PVM.

Interconnection network

Parallel computers require interprocessor communication to perform sufficiently well, so that multiple processors can execute an application more quickly than a single processor acting alone. There are cases where 32 processors are slower than 16 processors working on the same problem. This is not because the problem is insufficiently parallel, but because the interprocessor communication overhead is too high. Communication speeds have not grown in the same proportion as processor speeds. However, if an application has the right ratio of computation to communication, good performance gains can be obtained.
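A simple cost model shows how 32 processors can end up slower than 16. The model below is an assumption chosen purely for illustration, not a measurement of any real machine: the computation time is divided evenly among p processors, while the communication time grows linearly with the number of other processors involved. The values work=256 and comm_per_proc=1 are arbitrary time units.

```python
def run_time(p, work=256.0, comm_per_proc=1.0):
    """Modelled run time on p processors.

    The compute term shrinks as work / p; the communication term
    grows as comm_per_proc * (p - 1), so it is zero on one processor.
    """
    return work / p + comm_per_proc * (p - 1)


def speedup(p, work=256.0, comm_per_proc=1.0):
    """Speedup over a single processor under the same model."""
    return run_time(1, work, comm_per_proc) / run_time(p, work, comm_per_proc)


def best_processor_count(candidates, work=256.0, comm_per_proc=1.0):
    """Processor count from candidates with the smallest modelled run time."""
    return min(candidates, key=lambda p: run_time(p, work, comm_per_proc))
```

Under these assumed numbers the modelled run time is 31 units on 16 processors and 39 units on 32, so adding processors beyond 16 makes matters worse. Raising the amount of computation per unit of communication moves the best processor count upward, which is the ratio the paragraph above refers to.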

Parallel Programming on PARAM 10000

The National Param Supercomputer Facility (NPSF), located at Pune, is well equipped with the hardware, programming tools, optimized message passing libraries and compilers necessary for writing parallel applications. C-DAC is now in the process of installing PARAM 10000 at 12 premier institutes within the country. Researchers at these institutes can now develop applications on these smaller configurations and later run large problems at NPSF. In order to exploit the full potential of PARAM 10000, a user needs to overcome inertia and start “thinking in parallel”. Both industry and academic institutions can derive benefits from parallel programming and solve their large problems.

Some parting thoughts

The above description points to the fact that programming parallel computers is relatively difficult and requires more expertise than programming uniprocessor machines. Parallel machines address the big problems of their time. Because they are expensive, they need a computationally intensive application to warrant their use. Fundamental problems in science and engineering with broad economic and scientific impact, the so-called Grand Challenge Applications (GCAs), require parallel machines. Typical examples of GCAs include applications from the areas of Meteorology, Computational Fluid Dynamics (CFD), Chemistry, Biotechnology, Seismic Data Processing and Optimization.

Sequential programming evolved from architecture-specific low-level languages. Since the development of architecture-independent languages, programmers no longer need to worry about portability. Extensions were then made to these high-level languages to make them more structured, leading to programs that are easier to develop, test and maintain. Parallel programming is, I believe, following the same direction.

Most parallel programming problems can be solved by a clever programmer, but they require intimate knowledge of both the programming language and the machine hardware. Some form of automatic or semi-automatic assistance is desirable for development purposes. The final goal is to have an architecture-independent tool to “glue together” sequential code segments into synchronized, highly parallel, machine-efficient programs. This would ensure portability across a diversity of parallel computer architectures. Considerable effort has already gone into the development of tools and programming environments, and we are moving in the right direction. Until then, we must learn to enjoy parallel programming with the existing tools and message passing libraries.

Dr. Suhas Phadke is the Group Coordinator, Scientific & Engineering Computing Group. He has 15 years of research and industrial experience in the areas of Seismic Data Processing and High Performance Computing applications, including experience with ONGC, TOTAL (France) and Western Geophysical (USA).