Dorodnicyn Computing Centre of the Russian Academy of Sciences

Glossary of High Performance Computing Terms

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z



ANSI:
(American National Standards Institute)

The national standards body of the United States.

AFS:
A distributed file system that allows users on different machines to access the same files. AFS allows files to be shared not only on different machines at the same site, but also at different sites across the country.

Amdahl's Law (parallel computing):
If F is the fraction of a computation that is inherently sequential and (1-F) is the fraction that can be parallelized, then the maximum speedup S achievable using N parallel processors is:

S = (parallel execution rate)/(single-processor rate) = 1/(F + (1-F)/N).

[Gene Amdahl, "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities", AFIPS Conference Proceedings, (30), pp. 483-485, 1967].
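The bound can be checked numerically. A minimal Python sketch (the function name amdahl_speedup is illustrative, not from any library):

```python
def amdahl_speedup(F, N):
    """Maximum speedup with sequential fraction F on N processors."""
    return 1.0 / (F + (1.0 - F) / N)

# With a 5% sequential fraction, 100 processors yield under 17x speedup,
# and no number of processors can ever push the speedup past 1/F = 20x.
print(amdahl_speedup(0.05, 100))
```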

architecture:
The design of the fundamental components of a computer (its hardware) and the ways those components interact to form a complete machine. For a parallel processor, the architecture includes both the topology of the machine as a whole and the detailed design of each node.

array:
A sequence of objects in which order matters (as opposed to a set, which is a group of objects in which order does not matter). An array variable in a program stores an n-dimensional matrix or vector of data. The term computer array is also used to describe the set of nodes in a parallel processor; this usage suggests, but does not require, that the processor has a geometric or matrix-like connectivity.

ATM:
Asynchronous Transfer Mode, a data transfer protocol (also called cell switching) which features dynamic bandwidth allocation and a fixed cell length.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

bandwidth:
A measure of the speed of information transfer, typically used to quantify the communication capability of concurrent computers. Bandwidth can be used to measure both node to node and collective (bus) communication capability. Bandwidth is usually measured in megabytes of data per second.

barrier:
Point of synchronization for multiple simultaneous processes. Each process reaches the barrier, and then waits until all of the other processes have reached that same barrier before proceeding.
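As a sketch of barrier semantics on a shared memory machine, here is an illustrative example using Python's standard threading.Barrier (message passing systems such as MPI provide the analogous MPI_Barrier call):

```python
import threading

# Each worker does some local work, then waits at the barrier until all
# NUM_WORKERS threads have arrived; only then may any of them proceed.
NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
results = []

def worker(rank):
    results.append(("before", rank))   # pre-barrier phase
    barrier.wait()                     # block here until everyone arrives
    results.append(("after", rank))    # post-barrier phase

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" record is guaranteed to precede every "after" record.
```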

batch:
A method of processing a series of commands on a computer with no human interaction. Typically, a list of commands is placed within a file, and then that file is executed. Cluster CoNTroller provides batch access to the CTC cluster resources.

benchmark:
A standardized program (or suite of programs) that is run to measure the performance of one computer against the performance of other computers running the same program.

blocking:
The action of communication routines that wait until their function is complete before returning control to the calling program. For example, a routine that sends a message might delay its exit until it receives confirmation that the message has been received.

bus:
A common medium connecting multiple electronic components. Low-cost computers use a bus topology to connect the processors of a multiprocessor.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

C:
Programming language, originally based on "B", designed by Dennis Ritchie. C is a low-level language that has many features commonly found in higher-level languages. C and C++ are two of the most common programming languages used today.

C++:
An object-oriented superset of the C programming language. C++ allows the user to use abstract data classes and other advanced data representation/manipulation methods.

cache:
A fast memory used to hold commonly used variables which are automatically fetched by hardware from the slower and larger main computer memory. Large memory requirements often lead to dense but slow memories. Memory throughput is high for large amounts of data, but for individual or small amounts of data, the fetch times can be very long. To overcome long fetch times, computer architects use smaller interface memories with better fetch speeds or cache memories. The term is more often used when these memories are required to interface with main memory. If the required data are already stored in the cache, fetches are fast. If the required data are not in the cache, a cache miss results in the cache being refilled from main memory at the expense of time.
Cache memories are usually made transparent to the user. A reference to a given area of main memory for one piece of data or instruction is usually closely followed by several additional references to that same area for other data or instruction. Consequently, caches are automatically filled by a pre-defined algorithm. The computer system manages the "prefetch" process.

Cluster CoNTroller System (CCS):
Name of the scheduling system developed at CTC and licensed to MPI Software Technology. It is the batch system used at CTC.

cluster, clustering:
A type of architecture consisting of a networked set of nodes.

coarse grain:
See granularity

collective communication functions:
Message passing functions that exchange data among all processes in a group. They typically include a barrier function for synchronization, a broadcast function for sending data from one process to all processes, and gather/scatter functions.

communication overhead:
A measure of the extra workload a parallel algorithm incurs because of communication between the nodes of the parallel processor. If communication is the only source of overhead, the communication overhead is given by: ((number of processors * parallel run time) - sequential run time) / sequential run time.
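This formula is easy to evaluate directly. A short Python sketch (the helper name is illustrative):

```python
def communication_overhead(p, t_parallel, t_serial):
    """f = (p * parallel run time - sequential run time) / sequential run time."""
    return (p * t_parallel - t_serial) / t_serial

# A perfectly parallel run (parallel time = sequential time / p) has zero
# overhead; communication stretches the parallel time and makes f positive.
print(communication_overhead(8, 15.0, 100.0))  # 8 * 15 = 120 node-seconds spent on 100 s of work
```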

Compaq Visual Fortran:
Compaq's commercial Fortran and Fortran 90 compiler for Windows. It is integrated into and includes the Microsoft Visual Studio programming environment.

computational science:
A field that concentrates on the effective use of computer software, hardware and mathematics to solve real problems. It is a term used when it is desirable to distinguish the more pragmatic aspects of computing from (1) computer science, which often deals with the more theoretical aspects of computing; and from (2) computing engineering, which deals primarily with the design and construction of computers themselves. Computational science is often thought of as the third leg of science along with experimental and theoretical science.

computer science:
The systematic study of computing systems and computation. The body of knowledge resulting from this discipline contains theories for understanding computing systems and methods; design methodology, algorithms, and tools; methods for the testing of concepts; methods of analysis and verification; and knowledge representation and implementation.

Connection Machine:
A SIMD concurrent computer once manufactured by the now defunct Thinking Machines Corporation.

contention:
A situation that occurs when several processes attempt to access the same resource simultaneously. An example is memory contention in shared memory multiprocessors, which occurs when two processors attempt to read from or write to a location in shared memory in the same clock cycle.

CPU:
Central Processing Unit, the arithmetic and control portions of a sequential computer.

critical section:
A section of code that should be executed by only one processor at a time. Typically such a section involves data that must be read, modified, and rewritten; if processor 2 is allowed to read the data after processor 1 has read it but before processor 1 has updated it, then processor 1's update will be overwritten and lost when processor 2 writes its update. Spin locks or semaphores are used to ensure strict sequential execution of a critical section of code by one processor at a time.
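The lost-update problem and its lock-based cure can be sketched with Python's standard threading.Lock (illustrative only; spin locks and semaphores serve the same role elsewhere):

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # critical section: read, modify, rewrite
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: with the lock held, no update is lost
```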

CCAS (ВЦ РАН):
Dorodnicyn Computing Centre of the Russian Academy of Sciences.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

data decomposition:
A way of dividing arrays among the connected CPUs so as to minimize communication.

data dependency:
A situation that occurs when there are two or more references to the same memory location, and at least one of them is a write. Parallel programmers must be aware of data dependencies in their programs to ensure that modifications made to data by one processor are communicated to other processors that use the same data. See recurrence for an example of a particular type of dependency.

data parallel:
A programming model in which each processor performs the same work on a unique segment of the data. Either message passing libraries such as MPI or higher-level languages such as HPF can be used for coding with this model. An alternative to data parallel is functional parallel.

deadlock:
A situation in which processors of a concurrent processor are waiting on an event which will never occur. A simple version of deadlock for a loosely synchronous environment arises when blocking reads and writes are not correctly matched. For example, if two nodes both execute blocking writes to each other at the same time, deadlock will occur since neither write can complete until a complementary read is executed in the other node.

distributed memory:
Memory that is divided into segments, each of which may be directly accessed by only one node of a parallel processor. Distributed memory and shared memory are two major architectures that require very different programming styles.

distributed processing:
Processing on a number of networked computers, each of which has its own local memory. The computers may or may not differ in relative power and function.

distributed shared memory (DSM):
Memory that is physically distributed but hidden by the operating system, so that it appears to the user as shared memory with a single address space. Also called virtual shared memory.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

efficiency:
A measure of the amount of time a parallel program spends doing computation as opposed to communication. Efficiency is defined as speedup / number of processors. The closer it is to 1, the more perfectly parallel the task is at that level of parallelism; the closer to 0, the less parallel.
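The definition reduces to one line of arithmetic; a Python sketch with illustrative numbers:

```python
def efficiency(t_serial, t_parallel, n_procs):
    """speedup / number of processors; 1.0 means perfectly parallel."""
    speedup = t_serial / t_parallel
    return speedup / n_procs

# 16 processors cutting a 64-second serial job to 5 seconds:
# speedup = 64/5 = 12.8, efficiency = 12.8 / 16 = 0.8
print(efficiency(64.0, 5.0, 16))
```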

Emulex:
Emulex is a company (formerly Giganet) that makes a high performance cluster interconnect (or network) called Clan that supports the VIA protocol. CTC has Clan installed on V1 and Vplus. It provides a high bandwidth (~100 MBytes/Sec) low latency (~10 usec) connection between any two nodes on the cluster.

ESSL:
The Engineering and Scientific Subroutine Library, a library of mathematical subroutines that have been highly optimized for the SP family of machines and so run much faster than equivalent routines from other libraries. For equivalent functionality in other numerical libraries, see CTC's Software page.

ethernet:
A common network interconnect typically used for local area networks. Typically it supports the TCP/IP network protocol, a standard for connecting machines on a network and the internet.

Express:
A language that supports parallelism through message passing, also described in terms of message queues. It was supported by the ParaSoft Corporation. See ftp://ftp.parasoft.com/express/docs/.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

FFT:
An acronym for Fast Fourier Transform, a technique for very fast computation of Fourier series. A discrete Fourier transform using N points can be computed in N log N steps by the fast method, whereas the straightforward method would take N**2 steps.

fine grain:
See granularity

FLOPS:
Floating point operations per second; a measure of computational performance, equal to the rate at which a machine can perform single-precision floating-point calculations.

Fortran:
Fortran: Acronym for FORmula TRANslator, one of the oldest high-level programming languages but one that is still widely used in scientific computing because of its compact notation for equations, ease in handling large arrays, and huge selection of library routines for solving mathematical problems efficiently. Fortran 77 and Fortran 90 are the two standards currently in use. HPF is a set of extensions to Fortran that support parallel programming. Fortran compilers available at CTC are from Compaq, PGI, and Gnu.

functional decomposition:
A method of programming decomposition in which a problem is broken up into several independent tasks, or functions, which can be run simultaneously on different processors.

functional parallel:
A programming model in which a program is broken down by tasks, and parallelism is achieved by assigning each task to a different processor. Message passing libraries such as MPI are commonly used for communication among the processors. An alternative to functional parallel is data parallel.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

gather/scatter:
A collective communication operation in which (for gather) one process collects data from every participating process and stores them in process-rank order, or (for scatter) one process divides some data and distributes one piece to each participating process, again in rank order.
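The semantics can be sketched with plain Python lists standing in for message buffers (illustrative only; MPI provides the real MPI_Gather and MPI_Scatter calls):

```python
def scatter(data, n_procs):
    """Split `data` into n_procs equal contiguous chunks, one per process rank."""
    chunk = len(data) // n_procs
    return [data[r * chunk:(r + 1) * chunk] for r in range(n_procs)]

def gather(chunks):
    """Collect the per-rank chunks back together in rank order."""
    return [x for chunk in chunks for x in chunk]

data = list(range(8))
chunks = scatter(data, 4)       # rank 0 receives [0, 1], rank 1 receives [2, 3], ...
assert gather(chunks) == data   # gathering in rank order restores the data
```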

gigabyte (GB):
2**30 (hex 40 000 000) bytes of data, i.e. 1,073,741,824 bytes.

gigaflop or gflop:
One billion (10**9) floating point operations per second.

global memory:
The main memory accessible by all processors or CPUs.

grain size:
The number of fundamental entities, or members, in a grain. For example, if a grid is spatially decomposed into subgrids, then the grain size is the number of grid points in a subgrid.

granularity:
A term often used in parallel processing to indicate independent processes that could be distributed to multiple CPUs. Fine granularity is illustrated by execution of statements or small loop iterations as separate processes; coarse granularity involves subroutines or sets of subroutines as separate processes. The more processes, the "finer" the granularity and the more overhead required to keep track of them. Granularity can also be related to the temporal duration of a "task" at work. It is not only the number of processes but also how much work each process does, relative to the time of synchronization, that determines the overhead and reduces speedup figures.

granule:
The fundamental grouping of members of a domain (system) into an object manipulated as a unit.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

hierarchy, hierarchical:
The logical organization of memory into a pyramid-type structure. At the "top" are relatively few fast-access registers; "underneath" these are larger and larger organizational groupings of memory that require correspondingly greater access times.

High Performance Computing (HPC):
A computing system that provides more computing performance, power, or resources than are generally available. Sufficient memory to store large problem sets, memory throughput, computational rates, and other related computer capabilities all contribute to performance.

HPF:
High Performance Fortran, an extension to Fortran 77 or 90 that provides: opportunities for parallel execution automatically detected by the compiler; various types of available parallelism - MIMD, SIMD, or some combination; allocation of data among individual processor memories; and placement of data within a single processor.

hypercube architecture:
Multiple CPU architecture with 2^N processors. Each CPU has N nearest neighbors in a manner similar to a hypercube, where each corner has N edges. The 2**3 machine would have eight CPUs arranged at the corners of a cube connected by the edges.
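If nodes are numbered 0 to 2**N - 1, each node's N neighbors are obtained by flipping one bit of its number; a short Python sketch of this standard labeling:

```python
def hypercube_neighbors(node, n):
    """Neighbors of `node` in an n-dimensional hypercube: flip each of its n bits."""
    return [node ^ (1 << k) for k in range(n)]

# In a 2**3 = 8-node cube, node 0 (corner 000) connects to 001, 010, and 100.
print(hypercube_neighbors(0, 3))  # [1, 2, 4]
```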


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

IBM:
International Business Machines Corporation, a firm that manufactures computer hardware and software including the IBM RS/6000 SP. CTC has a long history of joint work with IBM in developing and testing products needed by the scientific research community.

IMSL:
International Mathematical and Statistical Library, a useful set of computational subroutines written in Fortran.

inhomogeneous problem:
A problem whose underlying domain contains members of different types, e.g., an ecological problem such as WaTor with different species of fish. Inhomogeneous problems are typically irregular, but the converse is not generally true.

Intel Math Kernel Library (MKL):
An Intel-supplied library which provides implementations of the BLAS and LAPACK optimized for the IA32 and IA64 architectures.

I/O:
Input/Output, the hardware and software mechanisms connecting a computer with the "outside world". This includes computer to disk and computer to terminal/network/graphics connections. Standard I/O is a particular software package used by the C language.

irregular problem:
A problem with a geometrically irregular domain containing many similar members, e.g., finite element nodal points.

ISO:
(International Organization for Standardization)

The international standards-setting organization.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

No J entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

kilobyte (KB):
2**10 (hex 400) bytes of data, i.e. 1,024 bytes.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

LAPACK:
A library of Fortran 77 subroutines for solving the most common problems in numerical linear algebra: systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. LAPACK has been designed to be efficient on a wide range of high performance computers.

latency:
The time to send a zero-length message from one node of a concurrent processor to another. Non-zero latency arises from the overhead in initiating and completing the message transfer. Also called startup time.

load balance:
A goal for algorithms running on parallel processors, achieved when all nodes perform roughly equal amounts of work, so that no node sits idle for a significant amount of time.

local disk space:
Disk space within a given node. For example, on a four-processor node, all of the node's processors (1-4) access the same local disk.

local memory:
The memory associated with a single CPU in a multiple CPU architecture, or memory associated with a local node in a distributed system.

loop unrolling:
An optimization technique valid for both scalar and vector architectures. The iterations of an inner loop are decreased by a factor of two or more by explicit inclusion of the very next one or several iterations. Loop unrolling can allow traditional compilers to make better use of the registers and to improve overlap operations. On vector machines loop unrolling may either improve or degrade performance, and the process involves a tradeoff between overlap and register use on one hand and vector length on the other.
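Python will not see the register-level benefits, but the transformation itself can be sketched (illustrative dot-product functions, unrolled by a factor of two):

```python
def dot(a, b):
    s = 0.0
    for i in range(len(a)):
        s += a[i] * b[i]
    return s

def dot_unrolled(a, b):
    """The same loop unrolled by a factor of two (assumes len(a) is even)."""
    s0 = s1 = 0.0
    for i in range(0, len(a), 2):
        s0 += a[i] * b[i]           # the "current" iteration
        s1 += a[i + 1] * b[i + 1]   # the very next iteration, included explicitly
    return s0 + s1

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
assert dot(a, b) == dot_unrolled(a, b)
```

Halving the iteration count and keeping two independent partial sums is what lets a compiler overlap operations and use more registers on real hardware.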


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

main memory:
A level of random access memory that lies between cache or register memory and extended random access memory. Main memory has higher capacity, but it is slower than cache or registers and has less capacity but faster access than extended random access memory.

mass storage:
An external storage device capable of storing large amounts of data. Online, directly accessible disk space is limited on most systems; mass storage systems provide additional space that is slower and more difficult to access, but can be virtually unlimited in size.

master-worker:
A programming approach in which one process, designated the "master," assigns tasks to other processes known as "workers."
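A minimal sketch of the pattern with Python's standard queue and threading modules (task queues and the stop sentinel are one common convention, not the only one):

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        n = tasks.get()
        if n is None:           # sentinel from the master: stop working
            break
        results.put(n * n)      # perform the assigned task

# The master assigns the tasks, then sends one stop sentinel per worker.
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for n in range(10):
    tasks.put(n)
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()

collected = sorted(results.get() for _ in range(10))
print(collected)  # squares of 0..9, sorted
```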

megabyte (MB):
2**20 (hex 100 000) bytes of data, i.e. 1,048,576 bytes.

member:
Members are grouped first into granules and then grains. In a finite difference problem, the members are grid points.

memory:
See cache and main memory.

message passing:
A communication paradigm in which processes communicate by exchanging messages over communication channels.

MIMD:
(Multiple Instruction stream, Multiple Data stream), an architecture in which several instruction streams are executed simultaneously. Each individual instruction may operate on several data elements (for example, one or more vectors on a vector machine). Although a single-processor vector computer can operate in MIMD fashion thanks to its overlapping functional units, the term MIMD is generally used to refer to multiprocessor machines. See also SIMD, SISD.

MIPS:
Millions of instructions per second.

MOPS:
Millions of operations per second.

MPI:
Message Passing Interface, a de facto standard for communication among the nodes running a parallel program on a distributed memory system. MPI is a library of routines that can be called from both Fortran and C programs. MPI's advantage over older message passing libraries is that it is both portable (because MPI has been implemented for almost every distributed memory architecture) and fast (because each implementation is optimized for the hardware it runs on).

MPI/Pro:
A commercial, supported version of MPI that runs extremely well on Windows, supports SMP, TCP, and VIA interconnects seamlessly, and adheres to the MPI standard, so MPI applications from other platforms should work without modification.

multiprocessing:
The ability of a computer to intermix jobs on one or more CPUs.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

NAg:
NAg is a company that provides a set of libraries for the solution of numerical and statistical problems. The C and Fortran libraries are planned for installation on the CTC Complex.

nearest neighbor:
A computer architecture that involves a connectivity which can be interpreted as that between adjacent members in geometric space.

node:
One of the individual computers linked together to form a parallel system. A node may contain multiple processors that share system resources such as disk, memory, and the network interface.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

object-oriented programming:
Style of programming characterized by the use of separate "objects" to perform different tasks within a program. These "objects" usually consist of an abstract data type or class, along with the methods and procedures used to manipulate that abstract data type.

OpenMP:
A set of compiler directives for C/C++ and FORTRAN which allow the programmer to expose thread-level parallelism in an architecture-independent fashion. OpenMP may make it easier to use in-box multiprocessing without using an OS-specific thread library.

optimization:
The act of tuning a program to achieve the fastest possible performance on the system where it is running. There are tools available to help with this process, including optimization flags on compiler commands and optimizing preprocessors such as KAP from Kuck & Associates. You can also optimize a program by hand, using profiling tools to identify "hot spots" where the program spends most of its execution time. Optimization requires repeated test runs and comparison of timings, so it is usually worthwhile only for production programs that will be rerun many times.

overhead:
There are four contributions to the overhead, f, defined so that the speedup = number of nodes / (1 + f). The communication overhead and load balance contributions are also defined in this glossary. There are also algorithmic and software contributions to the overhead.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

P4:
A macro/subroutine package for parallel programming developed by Rusty Lusk (lusk@anta.mcs.anl.gov). P4 uses monitors on shared memory machines and message passing on distributed memory machines. It is used as a subroutine library for C and Fortran. An improvement on the "Argonne macros" (PARMACS).

parallel processing:
Processing with more than one CPU on a single application simultaneously.

parallelization:
The process of achieving a high percentage of the CPU time expended in parallel; minimizing idle CPU time in a parallel processing environment. For one program, parallelization refers to the splitting of program execution among many CPUs.

partitioning:
Restructuring a program or algorithm in semi-independent computational segments to take advantage of multiple CPUs simultaneously. The goal is to achieve roughly equivalent work in each segment with minimal need for intersegment communication. It is also worthwhile to have fewer segments than CPUs on a dedicated system.

petabyte:
2**50 (hex 4 000 000 000 000) bytes of data, i.e. 1,125,899,906,842,624 bytes.

pipelining:
Pipelining is the decomposition of a computation into operations which may then be executed concurrently. Pipelining increases the utilization of functional units in the processor, leading to greater computational throughput. Common microprocessors have between four and twenty pipeline stages.

polled communication:
Polling involves a node inspecting the communication hardware -- typically a flag bit -- to see if information has arrived or departed. Polling is an alternative to an interrupt-driven system and is typically the basis for implementing the crystalline operating systems. The natural synchronization of the nodes imposed by polling is used in the implementation of blocking communication primitives.

preprocessor:
Software that takes a given source code and transforms it into an equivalent program in the same language. In theory, after compilation the transformed code should run faster than the original code--but this does not always happen in practice, so one should verify. A preprocessor tries to improve performance by applying a series of transformation rules designed to optimize loops based on the processor's cache size; minimize memory stride; perform loop unrolling; etc.

primary memory:
Main memory accessible by the CPU(s) without using input/output processes.

process:
A task executing on a given processor at a given time.

processor:
The part of a computer that actually executes your instructions. Also known as the central processing unit or CPU.

PVM:
(Parallel Virtual Machine), a message passing library and set of tools used to create and execute concurrent or parallel applications.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

No Q entries yet.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

RAID:
Redundant array of inexpensive disks; a file system containing many disks, some of which are used to hold redundant copies of data or error correction codes to increase reliability. RAIDS are often used as parallel access file systems, where the sheer size of storage capacity required precludes using more conventional (but more expensive) disk technology.

recurrence:
A dependency in a DO-loop whereby a result depends upon completion of the previous iteration of the loop. Such dependencies inhibit vectorization. For example:

A(I) = A(I-1) + B(I)

In a loop on I, this process would not be vectorizable on most vector computers without marked degradation in performance. This is not an axiom or law, but rather is simply a fact resulting from current machine design.
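Taking A(1) = B(1) as the starting value, the loop above is a running (prefix) sum; a Python sketch makes the iteration-to-iteration dependency explicit:

```python
def running_sum(b):
    """Equivalent of the loop A(I) = A(I-1) + B(I), with A(1) = B(1):
    each iteration depends on the result of the previous one."""
    a = [b[0]]
    for i in range(1, len(b)):
        a.append(a[i - 1] + b[i])   # cannot start until a[i-1] exists
    return a

print(running_sum([1.0, 2.0, 3.0, 4.0]))  # [1.0, 3.0, 6.0, 10.0]
```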

reduced instruction set computer (RISC):
A philosophy of instruction set design where a small number of simple, fast instructions are implemented rather than a larger number of slower, more complex instructions.

rendering:
The process of turning mathematical data into a picture or graph.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

scalably parallel:
A property of an architecture in which all nodes are an equal communication distance apart and multiple routes between nodes avoid bottlenecks.

scattered decomposition:
A technique for decomposing data domains that involves scattering, or sprinkling, the elements of the domain over the nodes of the concurrent processor. This technique is used when locality, which is preserved by the alternate decomposition into connected domains often called domain decomposition, is less important than the gain in load balance obtained by associating each node with all parts of the domain.

scheduler:
The part of a computer system's software that determines which task is assigned to each system resource at any given time. In a batch system such as Cluster CoNTroller System that maintains a queue of jobs waiting to run, the scheduler determines when each job can start based on such criteria as the order in which job requests were submitted and the availability of the system resources needed by each job.

sequential execution:
See serial processing. Parallel programs may have sections that must be executed sequentially; see critical section.

serial processing:
Running an application on a single CPU.

shared memory:
A memory that is directly accessed by more than one node of a concurrent processor. Shared memory and distributed memory are two major architectures that require very different programming styles.

SIMD:
(Single Instruction stream, Multiple Data stream), the architecture that characterizes most vector computers. A single instruction stream launches a process that sets the streams of data and results in motion. The term also applies to parallel processors in which a single instruction stream causes the same operation to be executed synchronously on more than one processor, each on a different piece of data (for example, the ILLIAC). See also MIMD, SISD, and SPMD.

SISD:
(Single Instruction stream, Single Data stream), the architecture of the traditional sequential computer, in which each instruction (from a single instruction stream) operates on specific data elements or a pair of operands rather than on "streams" of data. See also SIMD and MIMD.

SMP:
Symmetric Multi-Processor, a shared memory system.

SP:
Scalable POWERparallel System, a distributed memory machine from IBM. It consists of nodes (RS/6000 processors with associated memory and disk) connected by an ethernet and by a high-performance switch.

speedup:
A measure of how much faster a given program runs when executed in parallel on several processors as compared to serial execution on a single processor. Speedup is defined as S = sequential run time / parallel run time.

SPMD:
(Single Program, Multiple Data stream), a generalization of SIMD data-parallel programming. SPMD relaxes the synchronization constraints on how the various processes may run: it specifies only which program all processes must execute, not which instruction each one must perform at any particular moment. Data distribution nevertheless remains a key concept; indeed, another term commonly used for SPMD is data decomposition, reflecting the fact that an overall data set is decomposed while every participating process runs the same code.

square decomposition:
A strategy in which the array of nodes is decomposed into a two-dimensional mesh; we can then define scattered or local (domain) versions of this decomposition.

Standard I/O:
See I/O.

startup time:
See latency.

stride:
A term derived from the concept of walking (striding) through the data from one noncontiguous location to the next. If data are to be accessed as a number of evenly spaced, discontiguous blocks, then stride is the distance between the beginnings of successive blocks. For example, consider accessing rows of a column-stored matrix. The rows have elements that are spaced in memory by a stride of N, the dimension of the matrix.
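The matrix example can be sketched directly, with a flat Python list standing in for the column-stored memory layout:

```python
# A 3x3 matrix stored column by column ("column-stored") as a flat buffer:
n = 3
flat = [1, 4, 7,   # column 0
        2, 5, 8,   # column 1
        3, 6, 9]   # column 2

# Accessing row 0 means walking through memory with stride n:
row0 = flat[0::n]
print(row0)  # [1, 2, 3] -- successive row elements lie n slots apart
```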

striping:
Another technique for avoiding serialized I/O. In this case, the idea is to have each node write out its own portion of data into its own file. This is particularly good if checkpointing is what is really desired, because having each node's state saved in its own file is actually what you want.
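A sketch of the per-node checkpoint idea; the rank argument and the file-naming scheme are illustrative, not any particular system's convention:

```python
# Striped checkpoint I/O: each node writes its portion of the state
# into its own file, so writes need not be serialized through a
# single writer.
import os
import tempfile

def checkpoint(rank, state, directory):
    """Write this node's state into a per-node file and return its path."""
    path = os.path.join(directory, f"checkpoint.{rank:04d}")
    with open(path, "w") as f:
        f.write(repr(state))
    return path

with tempfile.TemporaryDirectory() as d:
    # four "nodes", each saving only its own slice of the problem state
    paths = [checkpoint(r, list(range(r, 100, 4)), d) for r in range(4)]
    print([os.path.basename(p) for p in paths])
```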

superlinear:
Greater than linear; usually used in reporting results on parallel computers where the processing speed increases more rapidly than the number of processors. Example: a job taking 16 hours on a one-processor machine may take only half an hour on a 16-processor machine, instead of the one hour that linear speedup would predict. On close examination, such examples turn out to be the result of algorithmic improvements. A simple argument shows why. Consider a uni-processor that simulates a multiple-processor machine (such as the 16-processor machine in the example cited): it simulates the first cycle of each of the 16 parallel processors, then the second cycle of each, and so on until the program concludes. Ignoring overhead, this clearly takes 16 times as long as on a 16-processor machine of the same cycle time. Thus any speedup seen by the 16-processor machine can be simulated at one-sixteenth speed by the one-processor machine, and the speed increase is at best linear in the number of processors.
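The simulation argument can be sketched in a few lines; the cycle counts are illustrative:

```python
# One processor round-robins over the cycles of p simulated
# processors, so (ignoring overhead) it needs exactly p times the
# parallel cycle count; any p-processor speedup can therefore be
# replayed at 1/p speed, bounding speedup by p.
def simulate(parallel_cycles, p):
    """Cycles a uni-processor spends simulating p processors."""
    total = 0
    for _cycle in range(parallel_cycles):
        total += p                   # one cycle of each simulated processor
    return total

print(simulate(1000, 16))            # 16000 = 16 * 1000
```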

superscalar:
A type of microprocessor design incorporating multiple functional units together with sufficient hardware complexity to allow units to function relatively autonomously. This type of design provides opportunities for concurrent operations to take place during a single clock cycle.

switch:
A high bandwidth data transmission device used to communicate between different nodes on an IBM SP.

synchronization:
The act of bringing known points in the execution of two or more processes to the same moment in time. Explicit synchronization is not needed in SIMD programs (in which every processor either performs the same operation as every other or does nothing), but is often necessary in SPMD and MIMD programs. The time spent by processes waiting for other processes to synchronize with them can be a major source of inefficiency in parallel programs.
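The idea can be sketched with a barrier among threads, standing in for the barrier primitive a real message-passing system would provide:

```python
# Barrier synchronization: no thread proceeds past barrier.wait()
# until all N threads have reached it.
import threading

N = 4
barrier = threading.Barrier(N)
order = []                           # records of the form (tag, rank)

def worker(rank):
    order.append(("before", rank))
    barrier.wait()                   # known point reached by all N threads
    order.append(("after", rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" record precedes every "after" record: the barrier
# brought all threads to the same point before any moved on.
first_after = min(i for i, (tag, _) in enumerate(order) if tag == "after")
assert all(i < first_after for i, (tag, _) in enumerate(order) if tag == "before")
```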


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

task:
The basic unit of work to be performed by the computer. A program typically consists of a number of tasks, such as reading data, performing calculations, and writing output. A common way to parallelize a program is to divide up the tasks and assign one task to each processor or node. For this reason, the term "task" is sometimes used interchangeably with processor or node in discussions of parallel programming.

T1:
Network transmissions of a DS1 formatted digital signal at a rate of 1.544 Mb/s.

T3:
Network transmissions of a DS3 formatted digital signal at a rate of 44.736 Mb/s.

TCP/IP:
Transmission Control Protocol/Internet Protocol, the protocol used for communications on the internet. TCP/IP software includes the telnet and ftp commands.

terabyte:
2**40 (hex 10,000,000,000) bytes of data, i.e. 1,099,511,627,776 bytes.

teraflop:
A processor speed of one trillion (10**12) floating point operations per second.

token:
When you log in to a machine in AFS you must be issued a "token" in order to become an authenticated AFS user. This token is issued to you automatically at login when you enter your password. Every token has an expiration date associated with it; on CTC machines, tokens are set to expire after 100 hours. To see what tokens you have, enter the command "tokens".


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

UNIX:
An operating system originally developed by AT&T which, in various incarnations, is now available on most types of supercomputer.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

vector:
A computer vector is an array of numbers with a location prescribed according to a contiguous or random formula. A mathematical vector is a set of numbers or components with no conditions on their retrievability from computer memory. (See also stride.)

vector processing:
The practice of using computers that can process multiple vectors or arrays. Modern supercomputers achieve speed through pipelined arithmetic units. Pipelining, when coupled with instructions designed to process multiple vectors, arrays or numbers rather than one data pair at a time, leads to great performance improvements.

vectorization:
The act of tuning an application code to take advantage of vector architecture.

virtual concurrent processor (aka virtual machine):
The virtual concurrent processor is a programming environment which allows the user a hardware-independent, portable programming environment within the message passing paradigm. The virtual machine is composed of virtual nodes which correspond to individual processes; there may be several processes (virtual nodes) on a single node of a real computer.

visualization:
In the broadest sense, visualization is the art or science of transforming information to a form "comprehensible" by the sense of sight. Visualization is broadly associated with graphical display in the form of pictures (printed or photo), workstation displays, or video.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Windows 2000, W2K:
This self-contained operating system with a graphical user interface was developed by Microsoft Corporation. This operating system, sometimes called W2K, is currently running on all CTC login and compute nodes.

wormhole routing:
A technique for routing messages in which the head of the message establishes a path, which is reserved for the message until the tail has passed through it. Unlike virtual cut-through, the tail proceeds at a rate dictated by the progress of the head, which reduces the demand for intermediate buffering.


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

No X entries yet


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

No Y entries yet


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

No Z entries yet


A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


Sources:

  • The Free On-line Dictionary of Computing, Editor Denis Howe,
    Copyright Denis Howe 1993.

  • Supercomputing - An Informal Glossary of Terms, Prepared by the Scientific Supercomputer Subcommittee of the Committee on Communications and Information Policy

    Copyright © 1996 Institute of Electrical and Electronics Engineers. Portions reprinted, with permission, from SUPERCOMPUTING 2nd Edition, IEEE Catalog No. UH0182-6.

    This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of CTC's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by sending a blank email message to info.pub.permission@ieee.org.

    By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

  • Solving Problems On Concurrent Processors, Volume 1, General Techniques and Regular Problems, by Geoffrey C. Fox, Mark A. Johnson, Gregory A. Lyzenga, Steve W. Otto, John K. Salmon, and David W. Walker of the California Institute of Technology

    Reprinted by permission of Prentice-Hall, Inc., Upper Saddle River, NJ, 1996