High
Performance Computing Hardware Technologies from
C-DAC |
C-DAC’s
mandate on inception was to develop a supercomputer.
A little over a decade later, C-DAC has emerged as
one of the leading organisations in developing Hardware
Technologies. What follows is a glimpse into some of the pioneering
work undertaken by C-DAC in this area.
System
Area Networks
| Background |
|
The
term System Area Network (SAN), coined by Gordon
Bell, refers to the high performance network
that interconnects Servers, Workstations and
Storage Devices within a Supercomputer. SAN
is a key element, a lifeline, of any High Performance
Supercomputer based on Parallel Processing Technology.
An efficient, robust, low latency, high bandwidth
SAN is crucial for extracting performance
from any parallel machine.
It
is well established that the throughput of the
parallel supercomputer depends predominantly
upon two major components
-
Central Processing Unit (CPU) - Processing
Element
-
System Area Network (SAN) - An Interconnect
Fabric
The
high throughput achieved by using High Speed
CPUs (with large cache and secondary storage)
would be of no advantage without a suitable High
Speed Interconnect Fabric, the SAN. The requirements
of such a network are very different from those of the conventional
LAN/WAN. Achieving both high throughput, which depends on
high-speed technology, and low latency, which
depends on the architecture and software model,
poses a challenging problem.
C-DAC
has therefore taken a major initiative in the development
of SAN technologies as part of its Third Mission.
Developing High-Speed, low latency Networks
operating in ranges close to Gigabytes/Second
requires a highly advanced and intense R&D effort
at the Architecture, Circuit Design, VLSI Design
and Implementation, and software levels, including the porting of Active Messages
(AM) protocols.
The
design philosophy adopted has been to use the
standards as far as possible at various levels,
avoiding proprietary implementations, to take
advantage of third party components and modules.
|
| Evolution |
|
Client/Server
applications and a need to share peripherals
gave rise to Local Area Networks (LAN). Distributed
computing drove the advent of Wide Area Networks
(WAN). The desire to create high-performance,
reliable, scalable computer systems by clustering
low-cost off the shelf Servers/Workstations
thus sparked an entirely different paradigm:
the System Area Network, or SAN.
SANs
fall somewhere in between standard LANs that
offer good scalability and reusability across
different computer systems and custom Memory
Buses that offer low latency, high bandwidth,
and high reliability.
The
demand for high performance communication subsystems,
which are used to connect clients and powerful
commercial servers and to build high-end servers
from clusters of SMPs, cannot be met by commodity
LANs. Shared media LANs such as Ethernet or
FDDI suffer from very high latency (100 to 1000s of
microseconds) and relatively low bandwidth (1-10
megabytes/second). The transition to Gigabit
Ethernet or switched LANs such as switched-Ethernet
or ATM alleviates this situation only partially.
These recent networks increase the bandwidth
only somewhat and their latencies continue to
be in the hundreds of microseconds range.
The
poor performance of LANs is aggravated by the
huge latency to access a LAN interface from
within a commodity workstation node. This latency
has three components: the software protocol
(TCP/IP) latency, the latency through the operating
system, and the hardware latency to access (e.g.
read data from) the LAN interface.
On
the contrary, Memory Buses deliver extremely
low latencies (tens of nanoseconds) and very
high bandwidth (100-2500 megabytes/second).
Memory Buses can be accessed from processors
in a few processor cycles because their high
reliability and highly trusted environment avoid
software intervention, their direct accessibility
(e.g., through cache misses) avoids operating
system intervention, and their proximity to
the processor (only a few processor cycles away)
avoids the overhead of I/O bridge and I/O bus
traversals. But, unlike LANs, often Memory Buses
are customized and have nonstandard interfaces,
and are hard to extend to hundreds of hosts
spread across a room.
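The gap between the two ends of this spectrum can be illustrated with a first-order cost model in which the time to move one message is a fixed latency plus the message size divided by the bandwidth. The sketch below is only an illustration: the operating points are picked from the ranges quoted above (the 50 ns memory bus latency is an assumed value for "tens of nanoseconds"), and the function and constant names are hypothetical.

```python
def transfer_time(size_bytes: int, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """First-order cost of moving one message: fixed latency plus serialisation time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Illustrative operating points chosen from the ranges quoted in the text.
LAN = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 10e6}         # 100 us, 10 MB/s
MEMORY_BUS = {"latency_s": 50e-9, "bandwidth_bytes_per_s": 100e6}  # assumed 50 ns, 100 MB/s

if __name__ == "__main__":
    for size in (64, 4096, 1_000_000):
        lan = transfer_time(size, **LAN)
        bus = transfer_time(size, **MEMORY_BUS)
        print(f"{size:>9} B   LAN {lan * 1e6:12.2f} us   memory bus {bus * 1e6:12.2f} us")
```

Under these assumptions a 64 byte message costs about 106 microseconds on the LAN, almost all of it latency, against well under a microsecond on the memory bus. This is why short messages, which are typical of parallel programs, gain little from extra LAN bandwidth alone.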
A
SAN combines the best of Memory Buses and LANs: it offers
the performance and reliability of a Memory
Bus, so that one can avoid running TCP/IP, together with
the scalability and standardized interfaces
of LANs, so that SAN components can be reused across
several generations of machines and/or
manufactured by third party vendors.
|
| PARAMNet |
|
C-DAC
has introduced the PARAM OpenFrame Architecture
for scalable and flexible High Performance computing
and has built the new generation PARAM 10000
series of machines driven
by UltraSparc Nodes interconnected through C-DAC
developed system area network PARAMNet. The
PARAMNet has been designed taking into consideration
requirements of low latency, high bandwidth
and dynamically adaptive routing architecture.
The network is designed with built in capability
for performance scalability.
PARAMNet
offers a high speed, low latency packet switched
network based on wormhole routing. The network
has the ability to scale in bandwidth as well
as in distance and can be evolved into a high
speed powerful local area network.
The
three basic components of the state-of-the-art
PARAMNet are:
(i)
PARAMNet II SAN Switch
(ii)
PCI Bus Interface Adapter Cards with C-DAC’s
Communication Processor CCP-II
(iii)
Software.
|
| PARAMNET-II
Switch |
|
The
PARAMNet-II SAN switch is a high performance,
8-port, non-blocking, cascadable switch. Its
main features are:
-
Up to 8 ports, each operating at 1 Gbit/s.
Multi-port configurations are realizable by cascading
switches
-
Low latency (0.5 microsec), high throughput
(~110 Mbytes/s per port)
-
256 byte buffer (128 + 128 at the input and
output) per port
-
Uses low cost copper media for PHY interface
-
Management interface enables easy management
of multiple switches
-
Group adaptive routing for high performance
even under heavy load conditions
-
Flow control including back pressure prevents
excessive packet retry conditions
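As a quick check on the figures in the feature list, the arithmetic below converts the 1 Gbit/s link rate to bytes and compares it with the quoted ~110 Mbytes/s per port. It assumes 1 Mbyte = 10^6 bytes, which the source does not state, and the function names are illustrative only.

```python
LINK_RATE_BITS_PER_S = 1e9        # 1 Gbit/s per port (from the feature list)
PORT_THROUGHPUT_MB_PER_S = 110.0  # ~110 Mbytes/s per port (from the feature list)
PORTS = 8                         # ports on one switch


def raw_link_mbytes_per_s(bits_per_s: float) -> float:
    """Raw link rate in Mbytes/s, assuming 1 Mbyte = 10**6 bytes."""
    return bits_per_s / 8 / 1e6


def throughput_ratio(throughput_mb_per_s: float, bits_per_s: float) -> float:
    """Fraction of the raw link rate represented by the quoted throughput."""
    return throughput_mb_per_s / raw_link_mbytes_per_s(bits_per_s)


def aggregate_mbytes_per_s(ports: int, throughput_mb_per_s: float) -> float:
    """Sum over all ports, assuming every port sustains the quoted figure at once."""
    return ports * throughput_mb_per_s


if __name__ == "__main__":
    print(raw_link_mbytes_per_s(LINK_RATE_BITS_PER_S))
    print(throughput_ratio(PORT_THROUGHPUT_MB_PER_S, LINK_RATE_BITS_PER_S))
    print(aggregate_mbytes_per_s(PORTS, PORT_THROUGHPUT_MB_PER_S))
```

On these assumptions the raw link rate is 125 Mbytes/s, the quoted throughput is about 88% of it, and a fully loaded 8-port switch would carry about 880 Mbytes/s in aggregate.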
The
switch consists of a fully non-blocking crossbar
block controlled by distributed schedulers.
Each of the schedulers gets its input data from
a flow control block which handles the low level
hardware handshake between the end points. The
data from the flow control is handled in parallel
by the distributed schedulers. If there are
no port contentions at the output, then the
distributed schedulers give the minimum latency
per port. The routing protocol is proprietary,
and is aimed at reducing the overhead involved
in deciding the destination for a given packet.
The
output port arbiter handles the request in-sequence
if there is any contention for the output ports.
The arbiter handles the input request on a first-come-first-serve
basis. The output from the crossbar is handled
by the Tx portion of the flow control.
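The first-come-first-serve arbitration described above can be sketched in software as one queue per output port. This is a simplified model and not the switch's actual hardware logic: the class and function names are hypothetical, one grant per output port per cycle is an assumption, and contention on the input side, flow control and routing are not modelled.

```python
from collections import deque


class OutputPortArbiter:
    """Grants one pending request per cycle, oldest request first."""

    def __init__(self) -> None:
        self._pending = deque()

    def request(self, input_port: int) -> None:
        self._pending.append(input_port)

    def grant(self):
        """Return the input port served this cycle, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None


def run_switch(requests, num_ports: int = 8):
    """Serve (input_port, output_port) requests given in arrival order.

    Returns one dict per cycle mapping output_port -> granted input_port.
    """
    arbiters = [OutputPortArbiter() for _ in range(num_ports)]
    for input_port, output_port in requests:
        arbiters[output_port].request(input_port)

    schedule = []
    while True:
        cycle = {}
        for output_port, arbiter in enumerate(arbiters):
            granted = arbiter.grant()
            if granted is not None:
                cycle[output_port] = granted
        if not cycle:
            return schedule
        schedule.append(cycle)


if __name__ == "__main__":
    # Inputs 0 and 1 contend for output 3; input 2 has output 5 to itself.
    print(run_switch([(0, 3), (1, 3), (2, 5)]))
```

In the example, inputs 0 and 2 are served in the first cycle because they target different outputs, while input 1 waits one cycle behind input 0. When no two requests target the same output, every request is served in the first cycle, which corresponds to the minimum latency case described above.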
The
management interface for the switch uses a microcontroller
coupled to a CAN bus which allows connecting
multiple switches on a shared management bus.
The management software allows the network designer
to configure and monitor a given topology of
N PARAMNet hosts.
|
| PCI
Interface Adapter |
|
C-DAC
has developed PCI based 1 Gigabit Link adapters
for interconnecting nodes through the PARAMNet Switch.
The heart of these adapters is C-DAC’s Communication
Processor (CCP-II), one of the largest and most
complex chips ever designed in our country. C-DAC
has also developed, in parallel, PCI Bus Drivers
for Solaris 2.7.
|
| C-DAC’S
Communication Processor (CCP-II) |
|
The
C-DAC’s Communication Processor (CCP-II) is
envisaged as an autonomous device capable of
handling the message segmentation, re-assembly
and guaranteed delivery of messages which are
normally done in software with minimum intervention
of the host processor.
It
provides a highly functional interface between
PCI based node and 1Gigabit Link based Communication
Network. The CCP-II has been designed to reduce
software latency (less than 10 microseconds
at application level) and increase data throughput
which are the main parameters for good and effective
communication. It avoids unnecessary copying
of data either by delivering the message directly
into the destination buffer or by copying it to
a page aligned temporary area from where the kernel
can remap it, thereby reducing the number of copies.
It allows full utilisation of the 1 Gigabit Link
network throughput by packetising the messages
and delivering them without any requirement
of reordering at the destination. The transmitter
is capable of multiplexing multiple messages on to the link
while the receiver is capable of receiving multiple
messages simultaneously.
The
CCP-II provides a flexible interface to system
programmers as it does not put any limit on
the packet or message sizes. It allows the programmer
to define these parameters based on the expected
network message pattern and traffic.
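The segmentation and re-assembly that the CCP-II performs in hardware can be illustrated in software. The sketch below is not the CCP-II packet format, which the text does not describe: the `Packet` fields and the function names are hypothetical. Each packet carries its byte offset, so the receiver can write the payload straight into a destination buffer even when packets of several messages are interleaved on the link.

```python
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    msg_id: int      # which message this packet belongs to
    offset: int      # byte offset of the payload within the message
    total_len: int   # length of the whole message in bytes
    payload: bytes


def segment(msg_id: int, message: bytes, packet_size: int) -> List[Packet]:
    """Split a message into packets of at most packet_size payload bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [
        Packet(msg_id, off, len(message), message[off:off + packet_size])
        for off in range(0, len(message), packet_size)
    ]


def reassemble(packets: Iterable[Packet]) -> Dict[int, bytes]:
    """Place each payload at its offset; return only fully received messages."""
    buffers: Dict[int, bytearray] = {}
    received: Dict[int, int] = {}
    for p in packets:
        buf = buffers.setdefault(p.msg_id, bytearray(p.total_len))
        buf[p.offset:p.offset + len(p.payload)] = p.payload
        received[p.msg_id] = received.get(p.msg_id, 0) + len(p.payload)
    return {m: bytes(b) for m, b in buffers.items() if received[m] == len(b)}


if __name__ == "__main__":
    a = segment(1, b"hello world", 4)   # packet size is chosen by the programmer
    b = segment(2, b"PARAM", 2)
    link = [p for pair in zip(a, b) for p in pair]  # two messages multiplexed
    print(reassemble(link))
```

The packet size is a free parameter here, mirroring the point above that the programmer can tune it to the expected message pattern.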
Operating
at 33 MHz, the CCP-II ASIC is based on 0.8 micron,
two metal layer technology implemented in a 317K
Gate Array, housed in a 304 pin MQUAD Package.
The chip has been designed using the high level
design entry language VHDL, and the design has been synthesised
using Synopsys tools.
It
finds applications in areas of multimedia, cluster
computing and massively parallel computing.
|
| Interconnection
Topology |
|
PARAM
OpenFrame Topology is based on multistage interconnection
CLOS network. This switch interconnection topology
has been chosen to provide scalability, high
bandwidth and flexibility. For supporting high
bandwidth, it is desirable that the bandwidth
of the topology should scale linearly. Thus
as nodes are added, the bandwidth
of the network should grow in proportion.
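To make the linear scaling concrete, the sketch below sizes a two-level folded Clos network built from identical switches. The text does not give the number of stages or the exact wiring of the PARAM OpenFrame topology, so the two-level, full-bandwidth arrangement, the function name and the use of the 8-port, ~110 Mbytes/s switch figures are all assumptions made for illustration.

```python
def two_level_clos(ports_per_switch: int, port_mbytes_per_s: float) -> dict:
    """Size a two-level folded Clos built from identical k-port switches.

    Each leaf switch uses half its ports for hosts and half for uplinks,
    with one uplink to every spine switch.
    """
    if ports_per_switch < 2 or ports_per_switch % 2:
        raise ValueError("ports_per_switch must be an even number >= 2")
    half = ports_per_switch // 2
    spines = half               # one uplink from every leaf to each spine
    leaves = ports_per_switch   # each spine port connects to one leaf
    hosts = leaves * half
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "hosts": hosts,
        # Injection bandwidth grows in direct proportion to the host count.
        "aggregate_mbytes_per_s": hosts * port_mbytes_per_s,
    }


if __name__ == "__main__":
    print(two_level_clos(8, 110.0))
```

With 8-port switches this arrangement connects 32 hosts using 8 leaf and 4 spine switches. Because every leaf has as many uplinks as host ports, the aggregate bandwidth is simply the host count times the per-port figure.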
The
communication protocol supporting synchronous
and asynchronous formats is built upon a very
thin layer of software and supports three types
of messages based on message length. Messages
are sent in packets and priority can be assigned.
Messages less than a packet length can be assigned
high priority and transmitted immediately. Messages
up to a few KB in length are transmitted with
packetisation and message multiplexing.
|
| C-DAC’s
KSHIPRA: Software Communication Model for SAN |
|
KSHIPRA,
a communication substrate designed to support
low latency and high bandwidth, is the key to
the high level of aggregate system performance
and scalability of C-DAC HPCC software. KSHIPRA
caters to both parallel and distributed client
server programming models.
The
heart of KSHIPRA is the protected user level
primitives - Active Messages (AM). It conforms
to Active Messages II specifications given by
University of California, Berkeley (UCB). Its
implementation is an augmentation of the AM
II software provided by UCB.
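The central idea of Active Messages is that each message names a handler that the receiver runs on arrival, so the data goes straight into the computation without intermediate buffering layers. The toy model below shows only that dispatch idea within a single process. It is not the AM-II API or the KSHIPRA interface: `Endpoint`, `register`, `send` and `poll` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


class Endpoint:
    """Toy receive endpoint: a handler table plus a queue of arrived messages."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Callable[..., None]] = {}
        self._inbox: List[Tuple[int, tuple]] = []

    def register(self, index: int, handler: Callable[..., None]) -> None:
        self._handlers[index] = handler

    def deliver(self, index: int, args: tuple) -> None:
        """Stand-in for the network placing a message at this endpoint."""
        self._inbox.append((index, args))

    def poll(self) -> int:
        """Run the named handler for every queued message; return how many ran."""
        handled = 0
        while self._inbox:
            index, args = self._inbox.pop(0)
            self._handlers[index](*args)
            handled += 1
        return handled


def send(dest: Endpoint, handler_index: int, *args) -> None:
    """A message is just a handler index plus a few small arguments."""
    dest.deliver(handler_index, args)


if __name__ == "__main__":
    sums: List[int] = []
    ep = Endpoint()
    ep.register(0, lambda a, b: sums.append(a + b))
    send(ep, 0, 2, 3)
    send(ep, 0, 10, 20)
    ep.poll()
    print(sums)
```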
KSHIPRA
Fast Sockets exports BSD sockets interface providing
source code compatibility for legacy distributed
/ client server applications. Kernel Active
Messages provide a high performance kernel to
kernel, and kernel to user communication allowing
kernel modules to leverage on AM technology.
One such example is the Data Link Provider Interface
(DLPI) driver which is layered over Kernel Active
Messages. KSHIPRA thus provides binary compatibility
to legacy distributed/client server applications.
It also provides the Abstract Device Interface
for layering Message Passing Interface (MPI)
over AM.
KSHIPRA
exports the following Application Programming
Interfaces:
-
Active Messages
- Fast
Sockets
-
DLPI
-
MPI
|
| Future
Road Map |
|
C-DAC
is already well advanced in its work on the development
of the next phase of SAN, at more than 4 Gigabit/sec, using
CCP-III, a VIA Interface, and NICs compatible with the
66 MHz 64 bit PCI Bus standard.
|
| Virtual
Interface Architecture (VIA) |
|
Virtual
Interface Architecture is an open industry specification
that defines an interface for low latency, high
bandwidth communication between servers, workstations
and storage devices within a cluster.
Intel
Corporation, Compaq Computer Corporation, and
Microsoft Corporation jointly authored the
VI architecture specification. VIA significantly
reduces the software overheads between a high
performance CPU / memory subsystem and a high
performance network. It achieves low-latency,
high-bandwidth communication and data exchange
between processes running on two nodes within
a computing cluster, with minimal CPU usage.
|
| INFINIBAND
Emerging as the Standard for Next Generation Interconnect: |
|
While
Gigabit/sec media exist to transfer the data,
the current bottleneck is the interface between
CPU and NIC. This is about to change with INFINIBAND.
It is set to emerge as a High Speed PCI Bus
standard to create High Speed System Clusters
and Storage Area Networks. It provides scalable
solutions to ever increasing bandwidth demand
of server systems. Version 1.0 of INFINIBAND
Architecture Specifications has been recently
released. C-DAC will ensure that Next Generation
SAN is compliant with emerging INFINIBAND standards.
S.
P. Dixit
Programme Coordinator,
Hardware Technology Development Group (HTDG)
|
MULTILINGUAL
HARDWARE PRODUCTS from C-DAC GIST
|
|
|
The
GIST 9000 ASIC chip was the first commercial
ASIC designed at C-DAC in 1989 followed by GIST
9001 and GIST 9002, fabricated respectively
at VTI, SAMSUNG & LSI foundries. These chips are
used in the PC plug-in GIST card and the stand-alone
GIST terminal. GIST cards & terminals are used
in DOS and UNIX environments for multilingual
word-processing and data processing applications.
During the last ten years over 100,000 chips
have been sold by way of GIST cards and terminals
by C-DAC and its associated GIST technology
partners. Many applications have been developed
around the GIST cards & terminals, such as land
records, milk & sugar cooperative information
systems, accounting, electoral roll databases,
radio-text, multilingual dictionary building
etc.
LIPS
(Language Independent Program Subtitle) technology
is used for closed caption subtitle systems.
A PC plug-in encoder and a standalone decoder are
the hardware used for encoding and decoding closed
caption subtitles of video programmes. These
systems are deployed in the country-wide Doordarshan
TV network for broadcast and reception of closed
caption video programmes. The LIPS creation station
is used for making subtitles for video programmes
on VHS, U-Matic and Betacam formats. Over 40 installations
in private & Govt. video post production studios
in India are using these systems, over 100 translators
are engaged in creating movie subtitles and
over 5000 movies have been subtitled in various
Indian and Arabic languages for Indian and overseas
viewers. Some of the major Broadcast users are
Doordarshan and ZEE TV channels.
Video
CD creation station is used for creating master
Video CD titles from various video formats like
VHS, S-VHS, U-Matic, Betacam. The main component
in building the Video CD creation station is
the MPEG-1 encoder. C-DAC is the only organization
in India to have developed an MPEG-1 encoder indigenously,
and now MPEG-II. Video CD titles have been prepared
for varied uses, including home rituals
& festivals, multimedia presentation titles
etc.
S.S.
Pujari
Programme Coordinator,
GIST
|
Real
Time Systems Hardware Activities at C-DAC
|
|
|
The
major hardware development activities at the
Real Time Systems Development Group in C-DAC
Bangalore Centre include :
-
Development of CompactPCI Bus based Industrial
Computer System supported by Department of Information Technology.
-
Development of Universal Serial Bus (USB)
based Remote Terminal Unit (RTU) - Internally
supported Project.
|
| CompactPCI
based Industrial Computer System |
|
CompactPCI
(CPCI) is emerging as an Industrial Standard
Architecture for Embedded Real-time controllers,
Instrumentation, Telecom and Military Applications.
The PCI electrical specification 2.0 is implemented
with 3U/6U Eurocard form factor.
The
specification was evolved by a consortium of
over 350 companies known as PCI Industrial Computer
Manufacturers Group (PICMG) and C-DAC has recently
become a member of PICMG.
Its
easy peripheral expansion, large software support,
low cost, ease of maintenance and hot-swap facilities
make it well suited to the Industrial Computing Environment.
CPCI uses rear-panel connectors, which is important for
mission critical applications that require
the rapid replacement of boards. Unlike PCI
bus motherboards, CPCI can accommodate up to
eight boards, with additional expansion easily
supported through the use of readily available
bridge chips. CPCI is independent of the
processor and uses a backplane with no active components.
The
following modules are being developed adhering
to CPCI specification, leading to rugged Industrial
Computer System:
-
CompactPCI Motherboard
-
Processor Module
-
Analog Input Module
-
Analog Output Module
-
Digital Input Module
-
Digital Output Module
-
DSP Module
-
GPS Module
-
Counter Module
-
Field bus Module
-
Communication Controller
The
sub-systems developed are designed to be configured
to engineer the following products, which would
find application in process and manufacturing
industry.
-
Process Computers
-
SCADA
-
High Speed Data Acquisition System
-
Multi-loop process Controllers
-
High end RTUs
|
| USB
based Remote Terminal Unit |
|
The
Remote Terminal Units are basically remote Data
Acquisition Systems, collecting real-time data
from various processes and sending the data to the
master station for further processing.
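The acquire-and-forward role of an RTU can be sketched as packing each sample into a fixed-size frame for the master station. The frame layout below is purely hypothetical and is not the COPS-90 or any C-DAC protocol: the field choice (unit id, channel, timestamp, value) and the function names are assumptions made for illustration.

```python
import struct

# Hypothetical frame: unit id (uint16), channel (uint16),
# timestamp in seconds (float64), measured value (float32), network byte order.
FRAME = struct.Struct("!HHdf")


def encode_sample(unit_id: int, channel: int, timestamp: float, value: float) -> bytes:
    """Pack one acquired sample into a fixed-size frame for the master station."""
    return FRAME.pack(unit_id, channel, timestamp, value)


def decode_sample(frame: bytes) -> tuple:
    """Inverse of encode_sample, as the master station would apply it."""
    return FRAME.unpack(frame)


if __name__ == "__main__":
    frame = encode_sample(unit_id=7, channel=2, timestamp=1000.0, value=3.5)
    print(len(frame), decode_sample(frame))
```

A fixed-size binary frame keeps the per-sample overhead small and predictable, which matters when many channels are polled at a high rate.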
C-DAC’s
earlier COPS-90 RTU was based on Transputers
and the I/O Modules were interconnected using
a proprietary backplane bus.
COPS-90
RTUs have been deployed in 15 Electrical substations
/ generating stations at Punjab State Electricity
Board. These RTUs have been functioning at PSEB
for the last four years. COPS-90 RTU has also
been installed at Bhilai Steel Plant and is functioning
as a high speed Data Acquisition System for
skid detection in BBM Mill.
Since
the transputer chips are now obsolete, C-DAC
initiated an internal R&D project to develop
a Pentium based RTU with Universal Serial Bus
backplane. The development is at an advanced stage.
The USB based RTUs would have state-of-the-art
features such as plug-and-play, hot swappability
and web connectivity.
R.
Pitchiah,
Group Co-ordinator,
Real Time Systems, C-DAC Bangalore
|
|