|
The National PARAM Supercomputing Facility (NPSF) of C-DAC
can be utilized by researchers for scientific and engineering
applications. A number of prestigious organisations have been
using the NPSF. Some of the key research undertaken by them,
and typical results obtained, are reproduced here.
Coupled Cluster Calculations
Scientists of the Indian Institute of Astrophysics, Bangalore
have developed a FORTRAN code to perform large-scale Coupled
Cluster (CC) calculations on heavy atoms in order to investigate
the effects of parity non-conservation (PNC) in atomic systems.
This code has been tested on smaller systems and on heavy systems
with small basis sets. To reach the required accuracy, heavy
systems have to be evaluated with a large basis set; this
is possible only on a supercomputer. The parallel version
of this code was written to use the shared memory on the PARAM
machine.
Scaling
A couple of test runs using a small system (Na+) and a small
basis set (Nc = 4 core orbitals, Nv = 21 virtual orbitals) were
performed in order to test the scaling of the execution time
with the number of processors. The results are shown in Table 1.
Table 1: Number of processors vs. time
| No. of processors | Time in seconds |
|---|---|
| 4 | 259 |
| 8 | 142 |
| 16 | 68.8 |
| 32 | 37.5 |
The nodes used for the experiments were not in dedicated mode,
and other processes may have influenced the timings. Nevertheless,
excellent scaling behavior is visible. For larger basis sets this
is likely to improve further, since the size of the data to
be broadcast scales with (Nv * Nc)^2 whereas the number
of operations scales with (Nv * Nc)^4, so that for large systems
the communication cost becomes negligible.
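
As a rough check on the scaling claim, the timings in Table 1 can be turned into speedup and parallel efficiency figures. The sketch below is illustrative only: taking the 4-processor run as the baseline and the function name `scaling_report` are choices made here, not part of the original study.

```python
# Timings copied from Table 1 (number of processors -> wall time in seconds).
TIMINGS = {4: 259.0, 8: 142.0, 16: 68.8, 32: 37.5}


def scaling_report(timings, base=None):
    """Return {procs: (speedup, efficiency)} relative to a baseline run.

    Speedup is T(base) / T(p); efficiency divides that by the ideal
    speedup p / base. Using the smallest run as the default baseline
    is an assumption of this sketch.
    """
    base = min(timings) if base is None else base
    t_base = timings[base]
    report = {}
    for procs in sorted(timings):
        speedup = t_base / timings[procs]
        efficiency = speedup / (procs / base)
        report[procs] = (speedup, efficiency)
    return report


if __name__ == "__main__":
    for procs, (speedup, eff) in scaling_report(TIMINGS).items():
        print(f"{procs:3d} processors: speedup {speedup:5.2f}, efficiency {eff:6.1%}")
```

Relative to the 4-processor run, the efficiency stays above roughly 85% up to 32 processors, bearing in mind that the nodes were shared with other processes.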

Runs using Linear CCSD (LCCSD) approximation
Three systems have been evaluated in the LCCSD approximation.
Each calculation was done on a moderate number of processors
(15 - 30) within a few hours. However, the memory requirements
scale with (Nv * Nc)^4 because of the number of Coulomb integrals
that have to be stored. The nonlinear CCSD calculations are much
more time consuming than the LCCSD approximation. It is
therefore essential to use the maximum number of processors
on each node to reduce the execution time. To achieve this,
a shared memory version of the CCSD code has been developed.
Nonlinear run
For the first time, a nonlinear calculation for the Tl3 system
has been performed. It was run on 28 nodes in dedicated mode,
using the shared memory CCSD code on 112 processors. This
computation shows that MPI runs stably on many nodes for a long
time, and that the shared memory approach is suitable for running
the code on 4 processors per node despite the large memory
requirements. The run also showed that further optimization
of the nonlinear part will be required to reduce the execution
time. An estimated speed-up of 30% can be achieved by optimizing
the load sharing among the processes; another major speed-up
will be achieved by storing the 3j and 6j symbols in RAM instead
of evaluating them on the fly. With the estimated time and memory
usage, the other systems Ca+, Ba2+ and Tl+ were also tested.
The results are summarized below, with the corresponding time
taken on PARAM 10000.
Table 2: Results of linear and non-linear systems
| System | #Basis | # of unknowns | # nodes | Time (linear) | Time (non-linear) |
|---|---|---|---|---|---|
| Ca2+ | 7 core, 71 virtual | 42705 | 1 | 54 min | 7.7 hr |
| Ba2+ | 13 core, 58 virtual | 126653 | 14 | 15 min | 20.9 hr |
| Ba2+ | 13 core, 66 virtual | 155333 | 15 | 22 min | 37.4 hr |
| Ba2+ | 16 core, 58 virtual | 183462 | 5 | 66 min | ----- |
| Ba2+ | 17 core, 58 virtual | 201197 | 2 | 156 min | ----- |
| Tl+ | 13 core, 60 virtual | 147917 | 15 | 31 min | 35.7 hr |
| Tl+ | 18 core, 60 virtual | 278368 | 15 | 1 hr | 60 hr* |
| Tl+ | 22 core, 60 virtual | 381390 | 4 | 8.9 hr | ----- |
* Only 2 nonlinear iterations have been completed for this system

Multiple sequence alignment

In collaboration with C-DAC, scientists from Anna University,
Chennai have developed a parallel genetic algorithm (PGA) code
on PARAM 10000 for the multiple sequence alignment problem. The
algorithm searches for an alignment among independent, isolated,
evolving populations by optimizing the weighted sum-of-pairs,
an objective function that measures alignment quality. Using
isolated subpopulations of alignments in a quasi-evolutionary
manner, this approach gradually improves the fitness of the
subpopulations as measured by the objective function. This
approach performs better than the sequential approach and
an alternative method, CLUSTAL W.
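
To make the objective concrete: a weighted sum-of-pairs score adds up, over every pair of sequences in the alignment, a weight times the column-by-column cost of that pairwise alignment. The following is a minimal sketch only. The mismatch cost, the per-column gap penalty, the uniform default weights, and the convention that lower scores are better are all placeholder assumptions, not the scoring scheme used in the PARAM 10000 code.

```python
from itertools import combinations

GAP = "-"


def pair_cost(a, b, mismatch=1.0, gap=2.0):
    """Cost of one aligned pair of rows (lower is better, by assumption).

    Hypothetical scheme: identical residues cost 0, mismatches cost
    `mismatch`, a residue against a gap costs `gap`, and columns where
    both rows have a gap are ignored.
    """
    cost = 0.0
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue
        if x == GAP or y == GAP:
            cost += gap
        elif x != y:
            cost += mismatch
    return cost


def weighted_sum_of_pairs(alignment, weights=None):
    """Sum weight(i, j) * pair_cost(row_i, row_j) over all pairs of rows.

    `weights` maps index pairs (i, j) with i < j to a weight; if omitted,
    every pair gets weight 1.0.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * pair_cost(alignment[i], alignment[j])
    return total
```

For example, `weighted_sum_of_pairs(["AC-T", "ACGT", "A-GT"])` sums the costs of the three row pairs. A genetic algorithm would evaluate each candidate alignment with a function of this kind and keep the better-scoring ones.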
Results
To assess the efficiency of optimization by the parallel genetic
algorithm on the weighted sum-of-pairs function, 8 test cases
were designed. They are based on sequences with known
three-dimensional structures, for which a structural alignment
is available. All of the test cases were extracted from 3D_ali
Release 2.0. We chose test cases of varying length (52 - 248
residues) and various numbers of sequences (6 - 37). The results
of PGA are compared with the best-known results of other
algorithms for the multiple alignment test cases. PGA was
executed 110 times per test case with varying parameters. Table 3
presents the best results for all the algorithms. Figure 1 shows
the behavior in the context of all subpopulations; that is, it
presents the convergence behavior of the best individuals
in each of the parallel evolving subpopulations. It indicates
the importance of migration in avoiding premature stagnation
by introducing new genetic material into a stagnating
subpopulation. Furthermore, the plot points out the "stabilizing"
effect of migration, as expressed in the limited variation among
the best subpopulations.
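
The migration step discussed above can be sketched as a ring exchange between subpopulations: each island sends a copy of its best individual to a neighbour, where it replaces that neighbour's worst individual. This is a generic island-model sketch under assumed conventions (ring topology, one migrant per exchange, lower fitness value is better). The topology and migration policy of the actual PGA are not described in the text, and `migrate_ring` is a hypothetical name.

```python
def migrate_ring(islands, fitness):
    """Copy each island's best individual to the next island in a ring.

    `islands` is a list of lists of individuals; `fitness` maps an
    individual to a cost (lower is better, by assumption). The migrant
    replaces the worst individual of the receiving island. New lists
    are returned and the input is left untouched.
    """
    best = [min(pop, key=fitness) for pop in islands]
    migrated = []
    for k, pop in enumerate(islands):
        newcomer = best[(k - 1) % len(islands)]  # best of the previous island
        worst_index = max(range(len(pop)), key=lambda i: fitness(pop[i]))
        new_pop = list(pop)
        new_pop[worst_index] = newcomer
        migrated.append(new_pop)
    return migrated
```

In a full algorithm this exchange would run every few generations, between rounds of selection, crossover, and mutation within each island, so that a stagnating subpopulation receives material from a better one.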
Table 3: Comparison of PGA with other methods
| Test Case | Nseq | Length | Clustal W Score | SGA Score | PGA Score | CPU time (secs) | N.G. |
|---|---|---|---|---|---|---|---|
| ac prot | 14 | 248 | 18239760 | 18052396 | 17905996 | 4563 | 485 |
| Gcr | 8 | 52 | 1201472 | 1187558 | 1186959 | 93 | 48 |
| Globin A | 15 | 183 | 10898878 | 10857787 | 10832517 | 2176 | 320 |
| Globin B | 17 | 183 | 14050372 | 14030690 | 13930367 | 2799 | 360 |
| Igb | 37 | 194 | 44756936 | 43984368 | 43306467 | 62744 | 545 |
| Lzm | 6 | 213 | 1724257 | 1694431 | 1677190 | 262 | 120 |
| Sbt | 7 | 331 | 2357581 | 2349134 | 2320023 | 670 | 160 |
| s prot | 15 | 229 | 21284220 | 21005754 | 20915316 | 6959 | 660 |
Test case: generic name of the test case, taken from 3D_ali
(ac prot: acid proteases, gcr: crystallins, globin A and
globin B: globins/phycocyanins/colicins, igb: immunoglobulin
fold, lzm: lysozymes/lactalbumin, sbt: subtilisin, s prot:
serine protease fold). Nseq: number of sequences in the
alignment. Length: length of the reference alignment. Score:
score measured with the weighted sum-of-pairs objective
function. CPU time: CPU time in seconds using a Sun Ultra 450
workstation. N.G.: number of generations needed by PGA to find
the solution.

Catalyst Modeling
Scientists from National Chemical Laboratory, Pune have used
GAMESS to design reforming catalysts (worldwide reforming
capacity = 8.7 m bbl/day) through accurate quantum chemical
calculations that need the high-end computing power of PARAM
10000, and to identify the active sites in zeolite so that
catalysts with higher conversion and selectivity are possible.
GAMESS is a program for ab initio quantum chemistry. Briefly,
GAMESS can compute RHF, ROHF, UHF, GVB, and MCSCF wave functions,
with CI and MP2 energy corrections available for some of these.
Analytic gradients are available for these SCF functions,
for automatic geometry optimization, transition state searches,
or reaction path following. Computation of the energy Hessian
permits prediction of vibrational frequencies. A variety of
molecular properties, ranging from simple dipole moments
to frequency-dependent hyperpolarizabilities, may be computed.
Many basis sets are stored internally, and together with
effective core potentials, all elements up to radon may
be included in molecules. Many of the computational functions
can be performed using direct techniques, or in parallel
on appropriate hardware.
GAMESS BENCHMARK
GAMESS is a public domain package that is very widely used by
quantum chemists. The parallel version of GAMESS has been ported
to and tested on PARAM 10000 for various molecular systems. The
table below gives the wall clock timings for this package:
| S.No. | System | Basis set | No. of Contractions | No. of Processors | Wallclock time (secs) |
|---|---|---|---|---|---|
| 1 | FM-8H2O | 6-31G** | 260 | 6 | 500 |
| 2 | FM-12H2O | 6-31G** | 360 | 7 | 1099 |
| 3 | FM-16H2O | 6-31G** | 460 | 13 | 1584 |
| 4 | T-6H2O | 6-31G** | 315 | 27 | 376 |
Pt-LTY system
The catalysis group at National Chemical Laboratory has
performed Hartree-Fock calculations on Ptn (n = 1 to 12) cluster
models using the parallel version of GAMESS on PARAM. It was
observed that 3-d Ptn clusters are more stable than 2-d Ptn
clusters for fixed values of n. The binding energy of the cluster
per Pt atom for the Ptn cluster models showed that Pt5 and
Pt6 are the most stable among the small clusters. Hence, a
Pt atom and a Pt5 cluster deposited over the zeolite LTY catalyst
were modeled in this study.
LTY has hexagonal prisms,
sodalite cages and super cages. There are three types of
non-framework cationic sites, namely sites I, II and III.
In a unit cell of LTY, [MxSi192-xAlxO384], there are 96
possible locations for non-framework cations. The 96 locations
are distributed as: 16 sites I, 32 sites II and 48 sites
III (refer to the Figure). We have represented the active
site in LTY by a 6-membered (6-m) ring cluster, which represents
two non-framework cation sites, namely I and II. Site I is inside
the hexagonal prism, whereas site II is in the window connecting
the sodalite and super cages. The 6-m ring cluster represents
such a window.
In the zeolite cluster models (with Si/Al = 2), the binding
energy of alkali metals decreases with increasing size of the
alkali metal from Li to K. However, Rb and Cs are more strongly
bound than Li. The net charge on the various metals shows that
smaller cations prefer the MI site, whereas larger cations
prefer MII (refer to Table 4).
When a single Pt5 cluster is deposited over the LTY cluster
model, the binding energy correlates with the binding energy
of the alkali metal to the zeolite. The electron density on Pt
and Pt5 increases as the extra-framework cation varies from
H to Rb, indicating the catalytic influence.

Table 4: The electronic properties of the cluster model
[MIMIISi4Al2O18H12], where MI is the alkali metal present
in cationic site I and MII is the alkali metal present in
site II.
| Cluster | MI | MII | Total Energy (a.u.) | Binding Energy# (kcal/mol) | Net electron charge on MI | Net electron charge on MII |
|---|---|---|---|---|---|---|
| [MIMIISi4Al2O18H12] | H | H | -309.3972 | -510.8953 | 0.64 | 0.74 |
| [MIMIISi4Al2O18H12] | Li | Li | -309.2468 | -369.4493 | 0.66 | 0.66 |
| [MIMIISi4Al2O18H12] | Na | Na | -309.1949 | -336.9668 | 0.85 | 0.75 |
| [MIMIISi4Al2O18H12] | K | K | -309.1270 | -294.4704 | 0.92 | 0.88 |
| [MIMIISi4Al2O18H12] | H | Rb | -309.3449 | -470.9649 | 0.62 | 0.94 |
| [MIMIISi4Al2O18H12] | H | Cs | -309.3155 | -453.4407 | 0.62 | 0.93 |
| [MIMIISi4Al2O18H12] | Na | Li | -309.2107 | -346.8556 | 0.82 | 0.66 |
| [MIMIISi4Al2O18H12] | Na | Na | -309.1949 | -336.9668 | 0.85 | 0.88 |
| [MIMIISi4Al2O18H12] | Na | K | -309.1622 | -316.5010 | 0.81 | 0.89 |
| [MIMIISi4Al2O18H12] | Na | Rb | -309.1854 | -331.0211 | 0.82 | 0.93 |
| [MIMIISi4Al2O18H12] | Na | Cs | -309.1537 | -311.1811 | 0.81 | 0.94 |
#B.E. = T.E.[MIMIISi4Al2O18H12] - {T.E.[Si4Al2O18H12]2- + T.E.[MIMII]}
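
The binding energy formula in the footnote can be evaluated directly once the three total energies are known. The sketch below assumes the total energies are in hartree (a.u.) and uses the standard conversion 1 hartree ≈ 627.5095 kcal/mol. The function name and the example energies are hypothetical, since the fragment energies are not listed in the table.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor


def binding_energy_kcal(te_cluster, te_framework, te_cations):
    """B.E. = T.E.[cluster] - (T.E.[framework] + T.E.[cations]).

    All inputs are total energies in hartree; the result is in kcal/mol.
    A negative value means the cation-loaded cluster is lower in energy
    than the separated framework and cations.
    """
    return (te_cluster - (te_framework + te_cations)) * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Made-up energies purely for illustration, not values from Table 4.
    print(round(binding_energy_kcal(-10.50, -9.80, -0.20), 2))
```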


|