Dorodnicyn Computing Centre of the Russian Academy of Sciences

Glossary of High-Performance Computing Terms


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z



ANSI:

(American National Standards Institute)

The national standards organization of the United States.

AFS:

(Andrew File System) A distributed file system that allows users on different machines to access the same files. AFS allows files to be shared not only on different machines at the same site, but also at different sites across the country.

Amdahl's Law (parallel computing):

If F is the fraction of a computation that is inherently sequential and (1-F) is the fraction that can be parallelized, then the maximum speedup, S, achievable using N parallel processors is:

S = (parallel execution rate)/(single-processor rate) = 1/(F + (1-F)/N).

[Gene Amdahl, "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities", AFIPS Conference Proceedings, (30), pp. 483-485, 1967].
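
A minimal sketch of the formula at work; the serial fraction F = 0.1 and the helper name amdahl_speedup are illustrative assumptions:

    /* Amdahl's law: maximum speedup for serial fraction F on N processors */
    #include <stdio.h>

    double amdahl_speedup(double F, int N)
    {
        return 1.0 / (F + (1.0 - F) / N);
    }

    int main(void)
    {
        double F = 0.1;                 /* assumed serial fraction */
        for (int N = 1; N <= 1024; N *= 4)
            printf("N = %4d  S = %6.2f\n", N, amdahl_speedup(F, N));
        /* As N grows, S approaches 1/F = 10: the serial fraction bounds
           the achievable speedup, however many processors are added. */
        return 0;
    }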

architecture:

The design of the basic hardware components of a computer and of the ways those components interact to make up the complete machine. For a parallel processor, the architecture includes both the topology of the machine as a whole and the detailed design of each node.

array:

A sequence of objects in which order matters (as opposed to a set, which is a group of objects in which order does not matter). An array variable in a program holds an n-dimensional matrix or vector of data. The term computer array is also used to describe the set of nodes in a parallel processor; this usage suggests, but does not require, that the processor has a geometric or matrix-like connectivity.

ATM:

Asynchronous Transfer Mode, a data transfer protocol (also called cell switching) which features dynamic bandwidth allocation and a fixed cell length.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


bandwidth:

A measure of the speed of information transfer, typically used to quantify the communication capability of concurrent computers. Bandwidth can be used to measure both node-to-node and collective (bus) communication capability. Bandwidth is usually measured in megabytes of data per second.

barrier:

Point of synchronization for multiple simultaneous processes. Each process reaches the barrier, and then waits until all of the other processes have reached that same barrier before proceeding.
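
A minimal sketch of a barrier, assuming an MPI installation (compile with mpicc, launch with mpirun on several processes):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        printf("process %d: before the barrier\n", rank);
        MPI_Barrier(MPI_COMM_WORLD);  /* no process proceeds until all arrive */
        printf("process %d: after the barrier\n", rank);
        /* every "before" line prints before any "after" line can appear */

        MPI_Finalize();
        return 0;
    }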

batch:

A method of processing a series of commands on a computer with no human interaction. Typically, a list of commands is placed within a file, and then that file is executed. Cluster CoNTroller provides batch access to the CTC cluster resources.

benchmark:

A standardized program (or suite of programs) that is run to measure the performance of one computer against the performance of other computers running the same program.

blocking:

The action of communication routines that wait until their function is complete before returning control to the calling program. For example, a routine that sends a message might delay its exit until it receives confirmation that the message has been received.

bus:

A common medium connecting multiple electronic components. Low-cost computers use a bus topology to connect the processors of a multiprocessor.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


C:

Programming language, originally based on "B", designed by Dennis Ritchie. C is a low-level language that has many features commonly found in higher-level languages. C and C++ are two of the most common programming languages used today.

C++:

An object-oriented superset of the C programming language. C++ allows the user to use abstract data classes and other advanced data representation/manipulation methods.

cache:

A fast memory used to hold commonly used variables which are automatically fetched by hardware from the slower and larger main computer memory. Large memory requirements often lead to dense but slow memories. Memory throughput is high for large amounts of data, but for individual or small amounts of data, the fetch times can be very long. To overcome long fetch times, computer architects use smaller interface memories with better fetch speeds, or cache memories. The term is more often used when these memories are required to interface with main memory. If the required data are already stored in the cache, fetches are fast. If the required data are not in the cache, a cache miss results in the cache being refilled from main memory at the expense of time.

Cache memories are usually made transparent to the user. A reference to a given area of main memory for one piece of data or instruction is usually closely followed by several additional references to that same area for other data or instructions. Consequently, caches are automatically filled by a pre-defined algorithm. The computer system manages the "prefetch" process.
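
A minimal C sketch of why cache behavior matters; the array size N = 2048 is an assumption chosen to exceed a typical cache, and on most machines the first loop runs several times faster than the second:

    #include <stdio.h>

    #define N 2048
    static double a[N][N];   /* ~32 MB, far larger than a typical cache */

    int main(void)
    {
        double sum = 0.0;

        /* unit stride: consecutive j accesses fall in the same cache line */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[i][j];

        /* stride N: each access touches a different cache line, so most
           fetches miss and the cache is refilled from main memory */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += a[i][j];

        printf("sum = %g\n", sum);
        return 0;
    }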

Cluster CoNTroller System (CCS):

Name of the scheduling system developed at CTC and licensed to MPI Software Technology. It is the batch system used at CTC.

cluster, clustering:

A type of architecture consisting of a networked set of nodes.

coarse grain:

See granularity.

collective communication functions:

Message-passing functions that exchange data among all of the processes in a group. These functions typically include a barrier function for synchronization, a broadcast function for sending data from one processor to all processors, and gather/scatter functions.
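
A minimal sketch of one such function, a broadcast, assuming an MPI installation:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            value = 42;               /* only the root holds the data */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("process %d now holds %d\n", rank, value);  /* all print 42 */

        MPI_Finalize();
        return 0;
    }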

communication overhead:

A measure of the additional workload incurred by a parallel algorithm due to communication between the nodes of the parallel processor. If communication is the only source of overhead, the communication overhead is given by: ((number of processors * parallel execution time) - sequential execution time) / sequential execution time.
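
A worked example of the formula with assumed timings (100 s sequentially, 7 s on 16 nodes); the result also satisfies the speedup = number of nodes / (1 + f) relation given under overhead:

    #include <stdio.h>

    int main(void)
    {
        double t_seq = 100.0;  /* assumed sequential run time, seconds */
        double t_par = 7.0;    /* assumed parallel run time on P nodes */
        int    P     = 16;

        double f = (P * t_par - t_seq) / t_seq;      /* overhead: 0.12 */
        printf("overhead f = %.2f\n", f);
        printf("speedup    = %.2f (= P / (1 + f))\n", t_seq / t_par);
        return 0;
    }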

Compaq Visual Fortran:

Compaq's commercial Fortran and Fortran 90 compiler for Windows. It is integrated into and includes the Microsoft Visual Studio programming environment.

computational science:

A field that concentrates on the effective use of computer software, hardware, and mathematics to solve real problems. It is a term used when it is desirable to distinguish the more pragmatic aspects of computing from (1) computer science, which often deals with the more theoretical aspects of computing; and from (2) computer engineering, which deals primarily with the design and construction of computers themselves. Computational science is often thought of as the third leg of science along with experimental and theoretical science.

computer science:

The systematic study of computing systems and computation. The body of knowledge resulting from this discipline contains theories for understanding computing systems and methods; design methodology, algorithms, and tools; methods for the testing of concepts; methods of analysis and verification; and knowledge representation and implementation.

Connection Machine:

A SIMD concurrent computer once manufactured by the now defunct Thinking Machines Corporation.

contention:

A situation that occurs when several processes attempt to access the same resource simultaneously. An example is memory contention in shared memory multiprocessors, which occurs when two processors attempt to read from or write to a location in shared memory in the same clock cycle.

CPU:

Central Processing Unit; the arithmetic and control portions of a sequential computer.

critical section:

A section of code that should be executed by only one processor at a time. Typically such a section involves data that must be read, modified, and rewritten; if processor 2 is allowed to read the data after processor 1 has read it but before processor 1 has updated it, then processor 1's update will be overwritten and lost when processor 2 writes its update. Spin locks or semaphores are used to ensure strict sequential execution of a critical section of code by one processor at a time.
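
A minimal sketch of guarding a critical section with an OpenMP critical directive (see the openMP entry), assuming an OpenMP-capable compiler:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        long counter = 0;

        #pragma omp parallel for
        for (int i = 0; i < 100000; i++) {
            /* read-modify-write: only one thread may execute this at a
               time, or concurrent updates would be lost as described above */
            #pragma omp critical
            counter++;
        }
        printf("counter = %ld\n", counter);  /* always 100000 */
        return 0;
    }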

CCAS:

The Dorodnicyn Computing Centre of the Russian Academy of Sciences.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


data decomposition:

A way of dividing arrays among cooperating CPUs so as to minimize communication.

data dependency:

A situation that occurs when there are two or more references to the same memory location, and at least one of them is a write. Parallel programmers must be aware of data dependencies in their programs to ensure that modifications made to data by one processor are communicated to other processors that use the same data. See recurrence for an example of a particular type of dependency.

data parallel:

A programming model in which each processor performs the same work on a unique segment of the data. Either message passing libraries such as MPI or higher-level languages such as HPF can be used for coding with this model. An alternative to data parallel is functional parallel.

deadlock:

A situation in which the processors of a concurrent processor are waiting on an event which will never occur. A simple version of deadlock for a loosely synchronous environment arises when blocking reads and writes are not correctly matched. For example, if two nodes both execute blocking writes to each other at the same time, deadlock will occur, since neither write can complete until a complementary read is executed in the other node.
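
A minimal MPI sketch of the two-node exchange just described, assuming exactly two processes; ordering the blocking calls by rank avoids the deadlock:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, peer, out = 1, in = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;   /* assumes exactly 2 processes */

        if (rank == 0) {   /* rank 0 sends first, then receives */
            MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {           /* rank 1 receives first, then sends */
            MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        }
        /* If both ranks called a blocking MPI_Send first and neither send
           buffered, neither call could complete until the other rank posted
           a receive: that is the deadlock described above. */
        MPI_Finalize();
        return 0;
    }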

distributed memory:

Memory that is split into segments, each of which may be directly accessed by only one node of a parallel processor. Distributed memory and shared memory are two major architectures that require very different programming styles.

distributed processing:

Processing on a number of networked computers, each of which has its own local memory. The computers may or may not differ in relative power and function.

distributed shared memory (DSM):

Memory that is physically distributed but hidden by the operating system, so that it appears to the user as shared memory with a single address space. Also called virtual shared memory.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


efficiency:

A measure of the amount of time a parallel program spends doing computation as opposed to communication. Efficiency is defined as speedup / number of processors. The closer it is to 1, the more perfectly parallel the task is at that level of parallelism; the closer to 0, the less parallel.

Emulex:

Emulex is a company (formerly Giganet) that makes a high performance cluster interconnect (or network) called Clan that supports the VIA protocol. CTC has Clan installed on V1 and Vplus. It provides a high-bandwidth (~100 MBytes/sec), low-latency (~10 usec) connection between any two nodes on the cluster.

ESSL:

The Engineering and Scientific Subroutine Library, a library of mathematical subroutines that have been highly optimized for the SP family of machines and so run much faster than equivalent routines from other libraries. For equivalent functionality in other numerical libraries, see CTC's Software page.

ethernet:

A common network interconnect typically used for local area networks. It supports the TCP/IP network protocol, a standard for connecting machines on a network and the internet.

Express:

A language supporting parallelism by means of message passing, dubbed message queues. It was supported by the ParaSoft Corporation. See ftp://ftp.parasoft.com/express/docs/.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


FFT:

Abbreviation for Fast Fourier Transform, a technique used to compute a Fourier series very rapidly. The discrete Fourier transform of a series of N points can be computed in N log N operations with this technique, whereas direct evaluation would require N**2 operations.

fine grain:

See granularity.

FLOPS:

Floating point operations per second; a measure of floating-point performance, equal to the rate at which a machine can perform single-precision floating-point calculations.

Fortran:

Acronym for FORmula TRANslator, one of the oldest high-level programming languages but one that is still widely used in scientific computing because of its compact notation for equations, ease in handling large arrays, and huge selection of library routines for solving mathematical problems efficiently. Fortran 77 and Fortran 90 are the two standards currently in use. HPF is a set of extensions to Fortran that support parallel programming. Fortran compilers available at CTC are from Compaq, PGI, and Gnu.

functional decomposition:

A method of program decomposition in which a problem is broken up into several independent tasks, or functions, which can be run simultaneously on different processors.

functional parallel:

A programming model in which a program is broken down by tasks, and parallelism is achieved by assigning each task to a different processor. Message passing libraries such as MPI are commonly used for communication among the processors. An alternative to functional parallel is data parallel.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


gather/scatter:

A collective communication operation in which (for gather) one process collects data from each participating process and stores it in process-rank order, or (for scatter) one process divides some data and distributes a piece to each participating process, again in process-rank order.
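
A minimal sketch of a gather, assuming an MPI installation and at most 64 processes:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int recv[64];                    /* assumes size <= 64 */
        MPI_Gather(&rank, 1, MPI_INT, recv, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0)                   /* root holds data in rank order */
            for (int i = 0; i < size; i++)
                printf("slot %d holds %d\n", i, recv[i]);
        MPI_Finalize();
        return 0;
    }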

gigabyte (GB):

2**30 (hex 40 000 000) bytes of data, i.e. 1,073,741,824 bytes.

gigaflop or gflop:

One billion (10**9) floating point operations per second.

global memory:

The main memory accessible by all processors or CPUs.

grain size:

The number of fundamental entities, or members, in a grain. For example, if a grid is spatially decomposed into subgrids, then the grain size is the number of grid points in a subgrid.

granularity:

A term often used in parallel processing to indicate independent processes that could be distributed to multiple CPUs. Fine granularity is illustrated by execution of statements or small loop iterations as separate processes; coarse granularity involves subroutines or sets of subroutines as separate processes. The more processes, the "finer" the granularity and the more overhead required to keep track of them. Granularity can also be related to the temporal duration of a "task" at work. It is not only the number of processes but also how much work each process does, relative to the time of synchronization, that determines the overhead and reduces speedup figures.

granule:

The fundamental grouping of members of a domain (system) into an object manipulated as a unit.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


hierarchy, hierarchical:

The logical organization of memory into a pyramid-type structure. At the "top" are relatively few fast-access registers; "underneath" these are larger and larger organizational groupings of memory that require correspondingly greater access times.

High Performance Computing (HPC):

A computing system that provides more computing performance, power, or resources than is generally available. Sufficient memory to store large problem sets, memory throughput, computational rates, and other related computer capabilities all contribute to performance.

HPF:

High Performance Fortran, an extension to Fortran 77 or 90 that provides: opportunities for parallel execution automatically detected by the compiler; various types of available parallelism - MIMD, SIMD, or some combination; allocation of data among individual processor memories; and placement of data within a single processor.

hypercube architecture:

Multiple CPU architecture with 2**N processors. Each CPU has N nearest neighbors, in a manner similar to a hypercube, where each corner has N edges. The 2**3 machine would have eight CPUs arranged at the corners of a cube connected by the edges.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


IBM:

International Business Machines Corporation, a firm that manufactures computer hardware and software including the IBM RS/6000 SP. CTC has a long history of joint work with IBM in developing and testing products needed by the scientific research community.

IMSL:

International Mathematical and Statistical Library, a useful set of computational subroutines written in Fortran.

inhomogeneous problem:

A problem whose underlying domain contains members of different types, e.g., an ecological problem such as WaTor with different species of fish. Inhomogeneous problems are typically irregular, but the converse is not generally true.

Intel Math Kernel Library (MKL):

An Intel-supplied library which provides implementations of the BLAS and LAPACK optimized for the IA32 and IA64 architectures.

I/O:

Input/Output, the hardware and software mechanisms connecting a computer with the "outside world". This includes computer to disk and computer to terminal/network/graphics connections. Standard I/O is a particular software package used by the C language.

irregular problem:

A problem with a geometrically irregular domain containing many similar members, e.g., finite element nodal points.

ISO:

(International Organization for Standardization)

The international standards organization.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


No J entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


kilobyte (KB):

2**10 (hex 400) bytes of data, i.e. 1,024 bytes.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


LAPACK:

A library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. LAPACK has been designed to be efficient on a wide range of high performance computers.

latency:

The time to send a zero-length message from one node of a concurrent processor to another. Non-zero latency arises from the overhead in initiating and completing the message transfer. Also called startup time.

load balance:

Load balance is the goal of algorithms running on parallel processors; it is achieved when all of the nodes efficiently perform roughly equal amounts of work, so that no node sits idle for a significant amount of time.

local disk space:

Disk space within a given node. For example, on a four-processor node, all of the node's processors (1-4) access the same local disk.

local memory:

The memory associated with a single CPU in a multiple CPU architecture, or memory associated with a local node in a distributed system.

loop unrolling:

An optimization technique valid for both scalar and vector architectures. The iterations of an inner loop are decreased by a factor of two or more by explicit inclusion of the very next one or several iterations. Loop unrolling can allow traditional compilers to make better use of the registers and to improve overlap operations. On vector machines loop unrolling may either improve or degrade performance, and the process involves a tradeoff between overlap and register use on one hand and vector length on the other.
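
A minimal sketch of unrolling by a factor of four; the function name daxpy_unrolled is illustrative, and n is assumed divisible by 4 for brevity:

    /* y = a*x + y, with the inner loop unrolled four ways */
    void daxpy_unrolled(int n, double a, const double *x, double *y)
    {
        /* the iteration count drops by a factor of four; the next three
           iterations are included explicitly in the loop body */
        for (int i = 0; i < n; i += 4) {
            y[i]     += a * x[i];
            y[i + 1] += a * x[i + 1];
            y[i + 2] += a * x[i + 2];
            y[i + 3] += a * x[i + 3];
        }
    }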


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


main memory:

A level of random access memory that lies between cache or register memory and extended random access memory. Main memory has higher capacity but slower access than cache or registers, and less capacity but faster access than extended random access memory.

mass storage:

An external storage device capable of storing large amounts of data. Online, directly accessible disk space is limited on most systems; mass storage systems provide additional space that is slower and more difficult to access, but can be virtually unlimited in size.

master-worker:

A programming approach in which one process, designated the "master," assigns tasks to other processes known as "workers."

megabyte (MB):

2**20 (hex 100 000) bytes of data, i.e. 1,048,576 bytes.

member:

Members are grouped first into granules and then grains. In a finite difference problem, the members are grid points.

memory:

See cache and main memory.

message passing:

A communication paradigm in which processes communicate by exchanging messages over communication channels.

MIMD:

(Multiple Instruction stream, Multiple Data stream) An architecture in which several instruction streams are executed simultaneously. Each single instruction may operate on several data elements (e.g., one or more vectors on a vector machine). Although a single-processor vector computer is capable of operating in MIMD fashion because of its overlapping functional units, the term MIMD is more commonly used to refer to multiprocessor machines. See also SIMD, SISD.

MIPS:

Millions of instructions per second.

MOPS:

Millions of operations per second.

MPI:

Message Passing Interface, a de facto standard for communication among the nodes running a parallel program on a distributed memory system. MPI is a library of routines that can be called from both Fortran and C programs. MPI's advantage over older message passing libraries is that it is both portable (because MPI has been implemented for almost every distributed memory architecture) and fast (because each implementation is optimized for the hardware it runs on).
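
A minimal sketch of MPI point-to-point message passing, assuming at least two processes:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, msg = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            msg = 123;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* to rank 1 */
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }
        MPI_Finalize();
        return 0;
    }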

MPI/Pro:

A commercial, supported version of MPI that runs extremely well on Windows, supports SMP, TCP, and VIA interconnects seamlessly, and adheres to the MPI standard, so MPI applications from other platforms should work without modification.

multiprocessing:

The ability of a computer to intermix jobs on one or more CPUs.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


NAg:

NAg is a company that provides a set of libraries for the solution of numerical and statistical problems. The C and Fortran libraries are planned for installation on the CTC Complex.

nearest neighbor:

A computer architecture that involves a connectivity which can be interpreted as that between adjacent members in geometric space.

node:

One of the individual computers linked together to form a parallel system. A computer may have multiple processors that share system resources, such as disk, memory, and the network interface.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


object-oriented programming:

Style of programming characterized by the use of separate "objects" to perform different tasks within a program. These "objects" usually consist of an abstract data type or class, along with the methods and procedures used to manipulate that abstract data type.

openMP:

A set of compiler directives for C/C++ and FORTRAN which allow the programmer to expose thread-level parallelism in an architecture-independent fashion. OpenMP may make it easier to use in-box multiprocessing without using an OS-specific thread library.
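
A minimal sketch of the directive style, assuming an OpenMP-capable compiler (built with its OpenMP flag enabled):

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const int n = 1000000;
        double sum = 0.0;

        /* the compiler splits the iterations among threads; the
           reduction clause combines each thread's partial sum */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= n; i++)
            sum += 1.0 / i;

        printf("harmonic sum H_%d = %f\n", n, sum);
        return 0;
    }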

optimization:

The act of tuning a program to achieve the fastest possible performance on the system where it is running. There are tools available to help with this process, including optimization flags on compiler commands and optimizing preprocessors such as KAP from Kuck & Associates. You can also optimize a program by hand, using profiling tools to identify "hot spots" where the program spends most of its execution time. Optimization requires repeated test runs and comparison of timings, so it is usually worthwhile only for production programs that will be rerun many times.

overhead:

There are four contributions to the overhead, f, defined so that the speedup = number of nodes / (1 + f). The communication overhead and load balance contributions are also defined in this glossary. There are also algorithmic and software contributions to the overhead.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


P4:

A macro/subroutine package for parallel programming, developed by Rusty Lusk (lusk@anta.mcs.anl.gov). P4 uses monitors on shared-memory machines and message passing on distributed-memory machines. It is used as a subroutine library for C and Fortran. An improvement on the "Argonne macros" (PARMACS).

parallel processing:

Processing with more than one CPU on a single application simultaneously.

parallelization:

The process of achieving a high percentage of the CPU time expended in parallel; minimizing idle CPU time in a parallel processing environment. For one program, parallelization refers to the splitting of program execution among many CPUs.

partitioning:

Restructuring a program or algorithm into semi-independent computational segments to take advantage of multiple CPUs simultaneously. The goal is to achieve roughly equivalent work in each segment with minimal need for intersegment communication. It is also worthwhile to have fewer segments than CPUs on a dedicated system.

petabyte:

2**50 (hex 4 000 000 000 000) bytes of data, i.e. 1,125,899,906,842,624 bytes.

pipelining:

Pipelining is the decomposition of a computation into operations which may then be executed concurrently. Pipelining increases the utilization of functional units in the processor, leading to greater computational throughput. Common microprocessors have between four and twenty pipeline stages.

polled communication:

Polling involves a node inspecting the communication hardware -- typically a flag bit -- to see if information has arrived or departed. Polling is an alternative to an interrupt-driven system and is typically the basis for implementing the crystalline operating systems. The natural synchronization of the nodes imposed by polling is used in the implementation of blocking communication primitives.

preprocessor:

Software that takes a given source code and transforms it into an equivalent program in the same language. In theory, after compilation the transformed code should run faster than the original code -- but this does not always happen in practice, so one should verify. A preprocessor tries to improve performance by applying a series of transformation rules designed to optimize loops based on the processor's cache size, minimize memory stride, perform loop unrolling, etc.

primary memory:

Main memory accessible by the CPU(s) without using input/output processes.

process:

A task executing on a given processor at a given time.

processor:

The part of the computer that actually executes your instructions. Also known as the central processing unit or CPU.

PVM (Parallel Virtual Machine):

A message-passing library and set of tools used to create and execute concurrent or parallel applications.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


No Q entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


RAID:

Redundant array of inexpensive disks; a file system containing many disks, some of which are used to hold redundant copies of data or error correction codes to increase reliability. RAIDs are often used as parallel access file systems, where the sheer size of storage capacity required precludes using more conventional (but more expensive) disk technology.

recurrence:

A dependency in a DO-loop whereby a result depends upon completion of the previous iteration of the loop. Such dependencies inhibit vectorization. For example:

A(I) = A(I-1) + B(I)

In a loop on I, this process would not be vectorizable on most vector computers without marked degradation in performance. This is not an axiom or law, but rather is simply a fact resulting from current machine design.

reduced instruction set computer (RISC):

A philosophy of instruction set design where a small number of simple, fast instructions are implemented rather than a larger number of slower, more complex instructions.

rendering:

The process of turning mathematical data into a picture or graph.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


scalably parallel:

All nodes are an equal communication distance apart and multiple routes avoid bottlenecks.

scattered decomposition:

A technique for decomposing data domains that involves scattering, or sprinkling, the elements of the domain over the nodes of the concurrent processor. This technique is used when locality, which is preserved by the alternate decomposition into connected domains often called domain decomposition, is less important than the gain in load balance obtained by associating each node with all parts of the domain.

scheduler:

The part of a computer system's software that determines which task is assigned to each system resource at any given time. In a batch system such as Cluster CoNTroller System that maintains a queue of jobs waiting to run, the scheduler determines when each job can start based on such criteria as the order in which job requests were submitted and the availability of the system resources needed by each job.

sequential execution:

See serial processing. Parallel programs may have sections that must be executed sequentially; see critical section.

serial processing:

Running an application on a single CPU.

shared memory:

A memory that is directly accessed by more than one node of a concurrent processor. Shared memory and distributed memory are two major architectures that require very different programming styles.

SIMD:

(Single Instruction stream, Multiple Data stream) The architecture that characterizes most vector computers. A single instruction stream launches a process that sets streams of data and results in motion. The term also applies to parallel processors in which a single instruction stream causes the same operation to be executed synchronously on more than one processor, each on a different piece of data (e.g., the ILLIAC). See also MIMD, SISD, and SPMD.

SISD:

(Single Instruction stream, Single Data stream) The architecture of a traditional computer, in which each instruction (from a single instruction stream) operates on specific data elements or pairs of operands rather than on "streams" of data. See also SIMD and MIMD.

SMP:

Symmetric Multi-Processor, a shared memory system.

SP:

Scalable POWERparallel System, a distributed memory machine from IBM. It consists of nodes (RS/6000 processors with associated memory and disk) connected by an ethernet and by a high-performance switch.

speedup:

A measure of how much faster a given program runs when executed in parallel on several processors as compared to serial execution on a single processor. Speedup is defined as S = sequential run time / parallel run time.

SPMD:

(Single Program, Multiple Data stream) A generalization of SIMD data-parallel programming. SPMD relaxes the synchronization requirements that constrain how the various processes operate: it specifies only which program all of the processes must run, not which instruction each must execute at any given moment. Data distribution is nevertheless still a key concept; indeed, another term commonly used for SPMD is data decomposition, which points to the fact that an overall data set is decomposed while each participating process executes the same code.

square decomposition:

A strategy in which the array of nodes is decomposed into a two-dimensional mesh; we can then define scattered or local (domain) versions of this decomposition.

Standard I/O:

See I/O.

startup time:

See latency.

stride:

A term derived from the concept of walking (striding) through the data from one noncontiguous location to the next. If data are to be accessed as a number of evenly spaced, discontiguous blocks, then stride is the distance between the beginnings of successive blocks. For example, consider accessing rows of a column-stored matrix. The rows have elements that are spaced in memory by a stride of N, the dimension of the matrix.
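
A minimal C sketch of the matrix example, emulating column-major (Fortran-order) storage in a one-dimensional array:

    #include <stdio.h>

    #define N 4

    int main(void)
    {
        double a[N * N];  /* column-major storage: A(i,j) lives at a[i + j*N] */
        for (int k = 0; k < N * N; k++)
            a[k] = k;

        /* walking row i touches elements N doubles apart: a stride of N */
        int i = 1;
        for (int j = 0; j < N; j++)
            printf("A(%d,%d) = %g at offset %d\n", i, j,
                   a[i + j * N], i + j * N);
        return 0;
    }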

striping:

Another technique for avoiding serialized I/O. In this case, the idea is to have each node write out its own portion of data into its own file. This is particularly good if checkpointing is what is really desired, because having each node's state saved in its own file is actually what you want.

superlinear:

Greater than linear; usually used in reporting results on parallel computers where the processing speed increases more rapidly than does the number of processors. Example: a job taking 16 hours on a one-processor machine may take only a half-hour on a 16-processor machine instead of one hour (from linear speedup). Upon close examination such examples are seen to be the result of algorithmic improvements. A demonstration that this is the case is easily described. Consider a uni-processor that simulates a multiple-processor machine (such as the 16-processor machine in the example cited). The uni-processor simulates the first cycle of each of the 16 parallel machines. It then simulates the second cycle of each of the parallel machines, and so on until the program concludes. Clearly (ignoring overhead), this takes 16 times as long as would be the case on a 16-processor machine of the same cycle time. Thus, any speedup seen by the 16-processor machine can be simulated at one-sixteenth speed by the one-processor machine. Speed increase is at most and at best linear in the number of processors.

superscalar:

A type of microprocessor design incorporating multiple functional units together with sufficient hardware complexity to allow units to function relatively autonomously. This type of design provides opportunities for concurrent operations to take place during a single clock cycle.

switch:

A high bandwidth data transmission device used to communicate between different nodes on an IBM SP.

synchronization:

The act of bringing known points in the execution of two or more processes together at the same moment in time. Explicit synchronization is not needed in SIMD programs (in which every processor either executes the same operation as every other processor or does nothing), but is often necessary in SPMD and MIMD programs. The time spent by processes waiting for other processes to synchronize with them can be a major source of inefficiency in parallel programs.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


task:

The basic unit of work to be performed by the computer. A program typically consists of a number of tasks, such as reading data, performing calculations, and writing output. A common way to parallelize a program is to divide up the tasks and assign one task to each processor or node. For this reason, the term "task" is sometimes used interchangeably with processor or node in discussions of parallel programming.

T1:

Network transmissions of a DS1 formatted digital signal at a rate of 1.5 Mb/s.

T3:

Network transmissions of a DS3 formatted digital signal at a rate of 45 Mb/s.

TCP/IP:

Transmission Control Protocol/Internet Protocol, the protocol used for communications on the internet. TCP/IP software includes the telnet and ftp commands.

terabyte:

2**40 (hex 10,000,000,000) bytes of data, i.e. 1,099,511,627,776 bytes.

teraflop:

A processor speed of one trillion (10**12) floating point operations per second.

token:

When you login to a machine in AFS you must be issued a "token" in order to become an authenticated AFS user. This token is issued to you automatically at login when you enter your password. Every token has an expiration date associated with it; on CTC machines, tokens are set to expire after 100 hours. To see what tokens you have, enter the command "tokens".


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


UNIX:

An operating system originally developed by AT&T which, in various incarnations, is now available on most types of supercomputer.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


vector:

A computer vector is an array of numbers with a location prescribed according to a contiguous or random formula. A mathematical vector is a set of numbers or components with no conditions on their retrievability from computer memory. (See also stride.)

vector processing:

The practice of using computers that can process multiple vectors or arrays. Modern supercomputers achieve speed through pipelined arithmetic units. Pipelining, when coupled with instructions designed to process multiple vectors, arrays, or numbers rather than one data pair at a time, leads to great performance improvements.

vectorization:

The act of tuning an application code to take advantage of vector architecture.

virtual concurrent processor (aka virtual machine):

The virtual concurrent processor is a programming environment which allows the user a hardware-independent, portable programming environment within the message passing paradigm. The virtual machine is composed of virtual nodes which correspond to individual processes; there may be several processes or virtual nodes on the node of a real computer.

visualization:

In the broadest sense, visualization is the art or science of transforming information to a form "comprehensible" by the sense of sight. Visualization is broadly associated with graphical display in the form of pictures (printed or photo), workstation displays, or video.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


Windows 2000, W2K:

This self-contained operating system with a graphical user interface was developed by Microsoft Corporation. This operating system, sometimes called W2K, is currently running on all CTC login and compute nodes.

wormhole routing:

A technique for routing messages in which the head of the message establishes a path, which is reserved for the message until the tail has passed through it. Unlike virtual cut-through, the tail proceeds at a rate dictated by the progress of the head, which reduces the demand for intermediate buffering.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


No X entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


No Y entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


No Z entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


Sources:

·         The Free On-line Dictionary of Computing, Editor: Denis Howe.
Copyright Denis Howe 1993.

·         Supercomputing - An Informal Glossary of Terms, Prepared by the Scientific Supercomputer Subcommittee of the Committee on Communications and Information Policy

Copyright © 1996 Institute of Electrical and Electronics Engineers. Portions reprinted, with permission, from SUPERCOMPUTING 2nd Edition, IEEE Catalog No. UH0182-6.

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of CTC's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by sending a blank email message to info.pub.permission@ieee.org.

By choosing to view this document, you agree to all provisions of thecopyright laws protecting it.

·         Solving Problems On Concurrent Processors, Volume 1: General Techniques and Regular Problems, by Geoffrey C. Fox, Mark A. Johnson, Gregory A. Lyzenga, Steve W. Otto, John K. Salmon, and David W. Walker of the California Institute of Technology

Reprinted by permission of Prentice-Hall, Inc., Upper Saddle River, NJ, 1996