High Performance Computing Hardware Technologies from C-DAC

C-DAC’s mandate on inception was to develop a supercomputer. A little over a decade later, C-DAC has emerged as one of the leading organisations in developing Hardware Technologies. What follows is a glimpse into some of the pioneering work undertaken by C-DAC in this area.

System Area Networks

Background

The term System Area Network (SAN), coined by Gordon Bell, refers to the high performance network that interconnects Servers, Workstations and Storage Devices within a Supercomputer. The SAN is a key element, a lifeline, of any High Performance Supercomputer based on Parallel Processing Technology. An efficient, robust, low latency, high bandwidth SAN is crucial for extracting performance from any parallel machine.

It is well established that the throughput of a parallel supercomputer depends predominantly upon two major components:

  • Central Processing Unit (CPU) - Processing Element
  • System Area Network (SAN) - An Interconnect Fabric

The high throughput achieved by using high-speed CPUs (with large cache and secondary storage) would be of no advantage without a suitable high-speed interconnect fabric, the SAN. The requirements of such a network are very different from those of a conventional LAN/WAN. Achieving both high throughput, which depends on high-speed technology, and low latency, which depends on the architecture and software model, poses a challenging problem.

C-DAC has therefore taken a major initiative in the development of SAN technologies as part of its Third Mission. Developing high-speed, low-latency networks operating in ranges close to Gigabytes/second requires a highly advanced and intense R&D effort at the levels of architecture, circuit design, VLSI design and implementation, porting of Active Messages (AM) protocols, and software.

The design philosophy adopted has been to use standards as far as possible at various levels, avoiding proprietary implementations, in order to take advantage of third party components and modules.

Evolution

Client/Server applications and a need to share peripherals gave rise to Local Area Networks (LAN). Distributed computing drove the advent of Wide Area Networks (WAN). The desire to create high-performance, reliable, scalable computer systems by clustering low-cost off the shelf Servers/Workstations thus sparked an entirely different paradigm: the System Area Network, or SAN.

SANs fall somewhere in between standard LANs that offer good scalability and reusability across different computer systems and custom Memory Buses that offer low latency, high bandwidth, and high reliability.

The demand for high performance communication subsystems, which are used to connect clients to powerful commercial servers and to build high-end servers from clusters of SMPs, cannot be met by commodity LANs. Shared media LANs such as Ethernet or FDDI suffer from very high latency (hundreds to thousands of microseconds) and relatively low bandwidth (1-10 megabytes/second). The transition to Gigabit Ethernet or switched LANs such as switched Ethernet or ATM alleviates this situation only partially. These recent networks increase the bandwidth only somewhat, and their latencies continue to be in the hundreds of microseconds range.

The poor performance of LANs is aggravated by the huge latency to access a LAN interface from within a commodity workstation node. This latency has three parts: the software protocol (TCP/IP) latency, the latency through the operating system, and the hardware latency to access (e.g. read data from) the network interface.

Memory Buses, by contrast, deliver extremely low latencies (tens of nanoseconds) and very high bandwidth (100-2500 megabytes/second). Memory Buses can be accessed from processors in a few processor cycles because their high reliability and highly trusted environment avoid software intervention, their direct accessibility (e.g., through cache misses) avoids operating system intervention, and their proximity to the processor (only a few processor cycles away) avoids the overhead of I/O bridge and I/O bus traversals. But, unlike LANs, Memory Buses are often customized, have nonstandard interfaces, and are hard to extend to hundreds of hosts spread across a room.

A SAN combines the best of Memory Buses and LANs: the performance and reliability of a Memory Bus, so that one can avoid running TCP/IP, together with the scalability and standardized interfaces of LANs, so that the network can be reused across several generations of machines and/or manufactured by third party vendors.
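
The trade-off described above can be made concrete with a simple cost model, in which the time to deliver a message is a fixed latency plus the message size divided by the bandwidth. The sketch below is illustrative only: the function and dictionary names are hypothetical, and the figures are single values picked from the ranges quoted in the text, not measurements.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way delivery time in microseconds.

    Model: fixed latency plus serialisation time. Since 1 megabyte/second
    equals 1 byte/microsecond, bytes divided by MB/s gives microseconds.
    """
    return latency_us + message_bytes / bandwidth_mb_per_s


# Assumed sample points taken from the ranges quoted in the text.
NETWORKS = {
    "shared-media LAN": {"latency_us": 500.0, "bandwidth_mb_per_s": 10.0},
    "memory bus": {"latency_us": 0.05, "bandwidth_mb_per_s": 1000.0},
}


def compare(message_bytes):
    """Return the estimated delivery time on each network for one message size."""
    return {
        name: transfer_time_us(message_bytes, **params)
        for name, params in NETWORKS.items()
    }


if __name__ == "__main__":
    for size in (64, 1024, 65536):
        print(size, compare(size))
```

Under this model, short messages are dominated by the latency term and long messages by the bandwidth term, which is why a SAN has to improve both.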

PARAMNet

C-DAC has introduced the PARAM OpenFrame Architecture for scalable and flexible High Performance Computing and has built the new generation PARAM 10000 series of machines, driven by UltraSparc nodes interconnected through PARAMNet, the system area network developed by C-DAC. PARAMNet has been designed taking into consideration the requirements of low latency, high bandwidth and a dynamically adaptive routing architecture. The network is designed with built-in capability for performance scalability.

PARAMNet offers a high speed, low latency packet switched network based on wormhole routing. The network has the ability to scale in bandwidth as well as in distance and can be evolved into a high speed powerful local area network.

The three basic components of the state-of-the-art PARAMNet are:

(i) PARAMNet II SAN Switch

(ii) PCI Bus Interface Adapter Cards with C-DAC’s Communication Processor CCP-II

(iii) Software.

 

PARAMNet-II Switch

The PARAMNet-II SAN switch is a high performance, 8-port, non-blocking, cascadable switch. Its main features are:

  • Up to 8 ports, each working at 1 Gbit/s. Larger multi-port configurations are realizable by cascading switches
  • Low latency (0.5 microsec), high throughput (~110 Mbytes/s per port)
  • 256 byte buffer (128 + 128 at the input and output) per port
  • Uses low cost copper media for PHY interface
  • Management interface that enables easy management of multiple switches
  • Group adaptive routing for high performance even under heavy load conditions
  • Flow control including back pressure prevents excessive packet retry conditions
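
As a quick check on the figures in the list above, the quoted per-port throughput can be compared with the raw link rate. The sketch below assumes decimal units and that the 1 Gbit/s figure is the raw data rate of the link; the source does not say how the link is encoded, and the function names are hypothetical.

```python
def raw_mbytes_per_s(link_gbit_per_s):
    """Raw link rate in Mbytes/s, assuming decimal units and 8 bits per byte."""
    return link_gbit_per_s * 1000 / 8


def link_efficiency(payload_mbytes_per_s, link_gbit_per_s):
    """Fraction of the raw link rate that is delivered as throughput."""
    return payload_mbytes_per_s / raw_mbytes_per_s(link_gbit_per_s)


if __name__ == "__main__":
    print(raw_mbytes_per_s(1))      # raw rate of one 1 Gbit/s port in Mbytes/s
    print(link_efficiency(110, 1))  # share achieved at about 110 Mbytes/s
```

Under these assumptions a 1 Gbit/s port carries at most 125 Mbytes/s, so a throughput of about 110 Mbytes/s corresponds to roughly 88% of the raw rate.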

The switch consists of a fully non-blocking crossbar block controlled by distributed schedulers. Each scheduler gets its input data from a flow control block, which handles the low level hardware handshake between the end points. The data from the flow control blocks is handled in parallel by the distributed schedulers. If there is no contention for the output ports, the distributed schedulers give the minimum latency per port. The routing protocol is proprietary, and is aimed at reducing the overhead involved in deciding the destination for a given packet.

If there is contention for an output port, the output port arbiter handles the requests in sequence, on a first-come-first-served basis. The output from the crossbar is handled by the Tx portion of the flow control.
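
The first-come-first-served arbitration described above can be sketched in software as one queue per output port. This is only a behavioural model under simplifying assumptions, not the switch's actual hardware logic: each input port is assumed to have at most one outstanding request, a "slot" stands for the time to forward one packet, and all names are hypothetical.

```python
from collections import deque


class OutputArbiter:
    """First-come-first-served arbiter for a single output port."""

    def __init__(self):
        self.waiting = deque()  # requesting input ports, in arrival order

    def request(self, input_port):
        self.waiting.append(input_port)

    def grant(self):
        """Return the input port served next, or None if nothing is waiting."""
        return self.waiting.popleft() if self.waiting else None


def schedule(requests, num_ports=8):
    """Resolve (input_port, output_port) requests given in arrival order.

    Returns a list of slots; each slot maps output_port -> granted input_port.
    Requests for different outputs are served in the same slot (no blocking);
    requests for the same output are served in successive slots.
    """
    arbiters = [OutputArbiter() for _ in range(num_ports)]
    for input_port, output_port in requests:
        arbiters[output_port].request(input_port)

    slots = []
    while any(arbiter.waiting for arbiter in arbiters):
        grants = {}
        for output_port, arbiter in enumerate(arbiters):
            input_port = arbiter.grant()
            if input_port is not None:
                grants[output_port] = input_port
        slots.append(grants)
    return slots


if __name__ == "__main__":
    # Inputs 0 and 1 contend for output 3; input 2 goes to output 5.
    print(schedule([(0, 3), (1, 3), (2, 5)]))
```

In the example, inputs 0 and 2 are forwarded in the first slot because they target different outputs, while input 1 waits one slot behind input 0.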

The management interface for the switch uses a microcontroller coupled to a CAN bus which allows connecting multiple switches on a shared management bus. The management software allows the network designer to configure and monitor a given topology of N PARAMNet hosts.

PCI Interface Adapter

C-DAC has developed PCI based 1 Gigabit link adapters for interconnecting nodes through the PARAMNet switch. The heart of these adapters is C-DAC’s Communication Processor (CCP-II), one of the largest and most complex chips ever designed in our country. In parallel, C-DAC has also developed PCI bus drivers for Solaris 2.7.

C-DAC’s Communication Processor (CCP-II)

C-DAC’s Communication Processor (CCP-II) is envisaged as an autonomous device that handles message segmentation, re-assembly and guaranteed delivery, tasks normally done in software, with minimum intervention from the host processor.

It provides a highly functional interface between a PCI based node and the 1 Gigabit link based communication network. The CCP-II has been designed to reduce software latency (to less than 10 microseconds at the application level) and to increase data throughput, which are the main parameters for effective communication. It avoids unnecessary copying of data either by delivering the message directly into the destination buffer or by copying it to a page aligned temporary area from where the kernel can remap it, thereby reducing the number of copies. It allows full utilisation of the 1 Gigabit link throughput by packetising the messages and delivering them without any requirement of reordering at the destination. The transmitter is capable of multiplexing multiple messages onto the link, while the receiver is capable of receiving multiple messages simultaneously.

The CCP-II provides a flexible interface to system programmers as it does not put any limit on the packet or message sizes. It allows the programmer to define these parameters based on the expected network message pattern and traffic.
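
The segmentation and re-assembly that the CCP-II performs in hardware can be illustrated in software. The sketch below is a minimal model under stated assumptions: the packet layout (message id, sequence number, last-packet flag) and all names are hypothetical, the packet size is a caller-chosen parameter as the text suggests, and packets of one message are assumed to arrive in order while different messages may be interleaved.

```python
from collections import namedtuple

# Hypothetical packet layout used only for this illustration.
Packet = namedtuple("Packet", "message_id seq last payload")


def segment(message_id, data, packet_size):
    """Split one message into packets of at most packet_size bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    chunks = [data[i:i + packet_size] for i in range(0, len(data), packet_size)]
    chunks = chunks or [b""]  # an empty message still needs one packet
    return [
        Packet(message_id, n, n == len(chunks) - 1, chunk)
        for n, chunk in enumerate(chunks)
    ]


class Reassembler:
    """Rebuilds several interleaved messages without reordering packets."""

    def __init__(self):
        self.partial = {}  # message_id -> payloads received so far

    def receive(self, packet):
        """Accept one packet; return the full message once it is complete."""
        parts = self.partial.setdefault(packet.message_id, [])
        if packet.seq != len(parts):
            raise ValueError("packet arrived out of order")
        parts.append(packet.payload)
        if packet.last:
            return b"".join(self.partial.pop(packet.message_id))
        return None


if __name__ == "__main__":
    a = segment(1, b"hello world", 4)
    b = segment(2, b"abc", 2)
    receiver = Reassembler()
    for packet in (a[0], b[0], a[1], b[1], a[2]):  # two messages multiplexed
        done = receiver.receive(packet)
        if done is not None:
            print(packet.message_id, done)
```

Because packets of a given message stay in order, the receiver only needs to append payloads, which is what makes re-assembly cheap enough to perform without the host processor.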

Operating at 33 MHz, the CCP-II ASIC is based on 0.8 micron, two-metal-layer technology, implemented in a 317K gate array and housed in a 304-pin MQUAD package. The chip has been designed using VHDL for high level design entry, and the design has been synthesised using Synopsys tools.

It finds applications in areas of multimedia, cluster computing and massively parallel computing.

Interconnection Topology

The PARAM OpenFrame topology is based on a multistage Clos interconnection network. This switch interconnection topology has been chosen to provide scalability, high bandwidth and flexibility. To support high bandwidth, it is desirable that the bandwidth of the topology should scale linearly: as nodes are added, the bandwidth of the network should grow in proportion.

The communication protocol, supporting synchronous and asynchronous formats, is built upon a very thin layer of software and supports three types of messages based on message length. Messages are sent in packets and priority can be assigned. Messages shorter than a packet length can be assigned high priority and transmitted immediately. Messages up to a few KB in length are transmitted with packetisation and message multiplexing.
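
The linear scaling property can be illustrated with a small sizing calculation. The sketch below assumes one common arrangement, a two-level folded Clos network built from identical switches in which each leaf switch uses half its ports for hosts and half for uplinks to spine switches. The source does not describe the actual PARAM configuration, so the arrangement, the function name and the default values (8 ports, 1 Gbit/s links) are assumptions for illustration.

```python
def clos_capacity(leaf_switches, ports_per_switch=8, link_gbit_per_s=1.0):
    """Size a two-level folded Clos network built from identical switches.

    Each leaf uses half its ports for hosts and half for uplinks, one uplink
    to each spine. A spine has one port per leaf, which caps the leaf count.
    """
    if ports_per_switch < 2 or ports_per_switch % 2:
        raise ValueError("ports_per_switch must be an even number >= 2")
    if not 1 <= leaf_switches <= ports_per_switch:
        raise ValueError("a spine can reach at most ports_per_switch leaves")
    half = ports_per_switch // 2
    hosts = leaf_switches * half
    return {
        "hosts": hosts,
        "spine_switches": half,
        "uplinks": leaf_switches * half,  # equals hosts: no oversubscription
        "aggregate_gbit_per_s": hosts * link_gbit_per_s,
    }


if __name__ == "__main__":
    for leaves in (2, 4, 8):
        print(leaves, clos_capacity(leaves))
```

In this arrangement every added leaf switch brings as many uplinks as host ports, so the aggregate bandwidth grows in proportion to the number of nodes.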

 

C-DAC’s KSHIPRA: Software Communication Model for SAN

KSHIPRA, a communication substrate designed to support low latency and high bandwidth, is the key to the high level of aggregate system performance and scalability of C-DAC HPCC software. KSHIPRA caters to both parallel and distributed client server programming models.

At the heart of KSHIPRA are the protected user level primitives, Active Messages (AM). KSHIPRA conforms to the Active Messages II specifications given by the University of California, Berkeley (UCB). Its implementation is an augmentation of the AM II software provided by UCB.

KSHIPRA Fast Sockets exports the BSD sockets interface, providing source code compatibility for legacy distributed / client server applications. Kernel Active Messages provide high performance kernel to kernel and kernel to user communication, allowing kernel modules to leverage AM technology. One such example is the Data Link Provider Interface (DLPI) driver, which is layered over Kernel Active Messages. KSHIPRA thus provides binary compatibility to legacy distributed / client server applications. It also provides the Abstract Device Interface for layering the Message Passing Interface (MPI) over AM.
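
The general idea behind Active Messages is that each message names a handler that the receiver runs on arrival, rather than being buffered for a later receive call. The sketch below models that idea in a single process. It is not the AM II API or KSHIPRA's interface: the class and method names are hypothetical, and a plain list stands in for the network.

```python
class Endpoint:
    """Toy endpoint: messages carry a handler index that runs on arrival."""

    def __init__(self):
        self.handlers = {}  # handler index -> callable(source, *args)
        self.inbox = []     # stands in for the network interface queue

    def register(self, index, handler):
        self.handlers[index] = handler

    def request(self, dest, handler_index, *args):
        """Send a message asking dest to run one of its handlers."""
        dest.inbox.append((self, handler_index, args))

    def poll(self):
        """Run the handler named by each pending message; return the count."""
        pending, self.inbox = self.inbox, []
        for source, handler_index, args in pending:
            self.handlers[handler_index](source, *args)
        return len(pending)


if __name__ == "__main__":
    results = []
    client, server = Endpoint(), Endpoint()

    # Server handler 0 adds two numbers and replies via client handler 1.
    server.register(0, lambda src, x, y: server.request(src, 1, x + y))
    client.register(1, lambda src, value: results.append(value))

    client.request(server, 0, 2, 3)
    server.poll()
    client.poll()
    print(results)
```

Because the handler consumes the data as soon as it arrives, intermediate buffering and copying are kept small, which is where the low latency of this style of communication comes from.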

KSHIPRA exports the following Application Programming Interfaces:

  • Active Messages
  • Fast Sockets
  • DLPI
  • MPI

Future Road Map

C-DAC is already well advanced in its work on the next phase of SAN development, targeting more than 4 Gigabit/sec using CCP-III, a VIA interface, and NICs compatible with the 66 MHz, 64 bit PCI bus standard.
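
The link between the 4 Gigabit/sec target and the 66 MHz, 64 bit PCI bus follows from the bus's theoretical peak rate, which is the clock frequency multiplied by the bus width. The sketch below uses nominal clock values and ignores protocol overhead. The 33 MHz, 32 bit case is included only for comparison and is an assumption, since the source does not state the bus width used by the current adapter.

```python
def pci_peak(clock_mhz, width_bits):
    """Theoretical peak PCI transfer rate: one bus-width transfer per clock."""
    mbit_per_s = clock_mhz * width_bits
    return {"mbit_per_s": mbit_per_s, "mbyte_per_s": mbit_per_s / 8}


if __name__ == "__main__":
    print("33 MHz, 32 bit:", pci_peak(33, 32))
    print("66 MHz, 64 bit:", pci_peak(66, 64))
```

With these nominal figures, a 33 MHz, 32 bit bus peaks at 1056 Mbit/s (132 Mbytes/s), while a 66 MHz, 64 bit bus peaks at 4224 Mbit/s (528 Mbytes/s), just above the 4 Gigabit/sec target.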

 

Virtual Interface Architecture (VIA)

Virtual Interface Architecture is an open industry specification that defines an interface for low latency, high bandwidth communication between servers, workstations and storage devices within a cluster.

Intel Corporation, Compaq Computer Corporation, and Microsoft Corporation jointly authored the VI architecture specification. VIA significantly reduces the software overheads between a high performance CPU / memory subsystem and a high performance network. It achieves low-latency, high-bandwidth communication and data exchange between processes running on two nodes within a computing cluster, with minimal CPU usage.

 

INFINIBAND Emerging as the Standard for Next Generation Interconnect:

While Gigabit/sec media exist to transfer the data, the current bottleneck is the interface between the CPU and the NIC. This is about to change with INFINIBAND. It is set to emerge as a High Speed PCI Bus standard to create High Speed System Clusters and Storage Area Networks. It provides scalable solutions to the ever increasing bandwidth demands of server systems. Version 1.0 of the INFINIBAND Architecture Specifications has recently been released. C-DAC will ensure that its Next Generation SAN is compliant with the emerging INFINIBAND standards.

S. P. Dixit
Programme Coordinator,
Hardware Technology Development Group (HTDG)

 

 

MULTILINGUAL HARDWARE PRODUCTS from C-DAC GIST

The GIST 9000 ASIC chip, designed in 1989, was the first commercial ASIC designed at C-DAC. It was followed by the GIST 9001 and GIST 9002. The three chips were fabricated at the VTI, SAMSUNG and LSI foundries respectively. These chips are used in the PC plug-in GIST card and the stand-alone GIST terminal. GIST cards and terminals are used in DOS and UNIX environments for multilingual word-processing and data processing applications. During the last ten years, over 100,000 chips have been sold by way of GIST cards and terminals by C-DAC and its associated GIST technology partners. Many applications have been developed around the GIST card and terminals, such as land records, information systems for milk and sugar cooperatives, accounting, electoral roll databases, radio-text, multilingual dictionary building etc.

LIPS (Language Independent Program Subtitle) technology is used for closed caption subtitle systems. A PC plug-in encoder and a standalone decoder are the hardware used for encoding and decoding the closed caption subtitles of a video programme. These systems are deployed in the countrywide Doordarshan TV network for broadcast and reception of closed caption video programmes. The LIPS creation station is used in making subtitles for video programmes in VHS, U-Matic and Betacam formats. Over 40 installations in private and Govt. video post production studios in India are using these systems, over 100 translators are engaged in creating movie subtitles, and over 5000 movies have been subtitled in various Indian and Arabic languages for Indian and overseas viewers. Some of the major broadcast users are the Doordarshan and ZEE TV channels.

The Video CD creation station is used for creating master Video CD titles from various video formats like VHS, S-VHS, U-Matic and Betacam. The main component in building the Video CD creation station is the MPEG-1 encoder. C-DAC is the only organization in India to have indigenously developed an MPEG-1 encoder, and now MPEG-II. Video CD titles have been prepared for varied uses, ranging from home rituals and festivals to multimedia presentation titles.

S.S. Pujari
Programme Coordinator,
GIST

 

 

 

Real Time Systems Hardware Activities at C-DAC

The major hardware development activities at the Real Time Systems Development Group in C-DAC Bangalore Centre include:

  • Development of CompactPCI Bus based Industrial Computer System supported by Department of Information Technology.
  • Development of Universal Serial Bus (USB) based Remote Terminal Unit (RTU) - Internally supported Project.

 

CompactPCI based Industrial Computer System

CompactPCI (CPCI) is emerging as an Industrial Standard Architecture for Embedded Real-time controllers, Instrumentation, Telecom and Military Applications. It implements the PCI electrical specification 2.0 in a 3U/6U Eurocard form factor.

The specification was evolved by a consortium of over 350 companies known as the PCI Industrial Computer Manufacturers Group (PICMG), and C-DAC has recently become a member of PICMG.

Its easy peripheral expansion, large software support, low cost, ease of maintenance and hot-swap facilities make it well suited to the Industrial Computing Environment. CPCI uses rear-panel connectors, which is important for mission critical applications that require the rapid replacement of boards. Unlike PCI bus motherboards, CPCI can accommodate up to eight boards, with additional expansion easily supported through the use of readily available bridge chips. CPCI is independent of the processor and uses a backplane with no active components.

The following modules are being developed adhering to CPCI specification, leading to rugged Industrial Computer System:

  • CompactPCI Motherboard
  • Processor Module
  • Analog Input Module
  • Analog Output Module
  • Digital Input Module
  • Digital Output Module
  • DSP Module
  • GPS Module
  • Counter Module
  • Field bus Module
  • Communication Controller

The sub-systems developed are designed to be configured to engineer the following products, which would find application in process and manufacturing industry.

  • Process Computers
  • SCADA
  • High Speed Data Acquisition System
  • Multi-loop process Controllers
  • High end RTUs

USB based Remote Terminal Unit

Remote Terminal Units are essentially remote Data Acquisition Systems, which collect real-time data from various processes and send the data to the master station for further processing.

C-DAC’s earlier COPS-90 RTU was based on Transputers, and its I/O Modules were interconnected using a proprietary backplane bus.

COPS-90 RTUs have been deployed in 15 electrical substations / generating stations of the Punjab State Electricity Board (PSEB). These RTUs have been functioning at PSEB for the last four years. A COPS-90 RTU has also been installed at Bhilai Steel Plant, where it functions as a high speed Data Acquisition System for skid detection in the BBM Mill.

Since the transputer chips are now obsolete, C-DAC initiated an internal R&D project to develop a Pentium based RTU with a Universal Serial Bus backplane. The development is at an advanced stage. The USB based RTUs would have state-of-the-art features such as plug-and-play, hot swappability and web connectivity.

R. Pitchiah,
Group Co-ordinator,
Real Time Systems, C-DAC Bangalore