- ANSI:
- American National Standards Institute, the national standards organization of the United States.
- AFS:
- A distributed file system that allows users on different machines to
access the same files. AFS allows files to be shared not only on different
machines at the same site, but also at different sites across the country.
- Amdahl's Law (parallel computing):
- If F is the fraction of a computation that is sequential and (1-F) is the fraction that can be parallelized, then the maximum speedup, S, achievable using N parallel processors is:
S = (parallel execution rate)/(single-processor execution rate) = 1/(F + (1-F)/N).
[Gene Amdahl, "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities",
AFIPS Conference Proceedings, (30), pp. 483-485, 1967].
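As a quick illustration, a minimal C sketch of the formula (the function name and sample numbers are ours, not part of the law's original statement):

    #include <stdio.h>

    /* Amdahl's law: maximum speedup for serial fraction F on N processors. */
    double amdahl_speedup(double F, int N) {
        return 1.0 / (F + (1.0 - F) / N);
    }

    int main(void) {
        /* even a 5% serial fraction caps the speedup near 20, however large N gets */
        printf("N = 16:   S = %.2f\n", amdahl_speedup(0.05, 16));
        printf("N = 1024: S = %.2f\n", amdahl_speedup(0.05, 1024));
        return 0;
    }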
- architecture:
- The design of the basic components of a computer (its hardware) and of the ways those components interact to make up a complete machine. For a parallel processor, the architecture includes both the topology of the machine as a whole and the detailed design of each node.
- array:
- A sequence of objects in which order matters (as opposed to a set, which is a group of objects in which order does not matter). An array variable in a program holds an n-dimensional matrix or vector of data. The term computer array is also used to describe the set of nodes in a parallel processor; the term suggests, but does not require, that the processor has a geometric or matrix-like connectivity.
- ATM:
- Asynchronous Transfer Mode, a data
transfer protocol (also called cell switching) which features dynamic
bandwidth allocation and a fixed cell length.
- bandwidth:
- A measure of the speed of
information transfer, typically used to quantify the communication
capability of concurrent computers. Bandwidth can be used to measure
both node to node and collective (bus) communication capability.
Bandwidth is usually measured in megabytes of data per second.
- barrier:
- Point of synchronization for
multiple simultaneous processes. Each process reaches the barrier, and
then waits until all of the other processes have reached that same
barrier before proceeding.
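With MPI (see MPI below) a barrier is a single call; this small C sketch is illustrative:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* ... each process does its own share of the work here ... */
        MPI_Barrier(MPI_COMM_WORLD); /* no process proceeds past this point
                                        until every process has reached it */
        if (rank == 0) printf("all processes reached the barrier\n");
        MPI_Finalize();
        return 0;
    }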
- batch:
- A method of processing a series of commands on a computer with no human
interaction. Typically, a list of commands is placed within a file, and then
that file is executed. Cluster CoNTroller provides batch access to the CTC cluster resources.
- benchmark:
- A standardized program (or suite of programs) that is run to
measure the performance of one computer against the performance of
other computers running the same program.
- blocking:
- The action of communication routines that wait until their
function is complete before returning control to the calling program.
For example, a routine that sends a message might delay its exit until
it receives confirmation that the message has been received.
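A C/MPI sketch of a blocking send/receive pair (assumes the program is run with at least two processes):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            /* blocking: returns only once 'value' is safe to reuse */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocking: returns only once the message has actually arrived */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }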
- bus:
- A common medium connecting multiple electronic components. Low-cost
computers use a bus topology to connect the processors of a multiprocessor.
- C:
- Programming language, originally based
on "B", designed by Dennis Ritchie. C is a low-level language that has
many features commonly found in higher-level languages. C and C++ are
two of the most common programming languages used today.
- C++:
- An object-oriented superset of the C
programming language. C++ allows the user to use abstract data classes
and other advanced data representation/manipulation methods.
- cache:
- A fast memory used to hold commonly
used variables which are automatically fetched by hardware from the
slower and larger main computer memory. Large memory requirements often
lead to dense but slow memories. Memory throughput is high for large
amounts of data, but for individual or small amounts of data, the fetch
times can be very long. To overcome long fetch times, computer
architects use smaller interface memories with better fetch speeds or
cache memories. The term is more often used when these memories are
required to interface with main memory. If the required data are
already stored in the cache, fetches are fast. If the required data are
not in the cache, a cache miss results in the cache being refilled from main memory at the expense of time.
- Cache memories are usually made transparent to the user. A
reference to a given area of main memory for one piece of data or
instruction is usually closely followed by several additional
references to that same area for other data or instruction.
Consequently, caches are automatically filled by a pre-defined
algorithm. The computer system manages the "prefetch" process.
- Cluster CoNTroller System (CCS):
- Name of the scheduling system developed at CTC and licensed to MPI Software Technology.
It is the batch system used at CTC.
- cluster, clustering:
- A type of architecture consisting of a networked set of
nodes.
- coarse grain:
- See granularity
- collective communication functions:
- Message passing functions that exchange data among all the processes in a group. These functions typically include a barrier function for synchronization, a broadcast function for sending data from one process to all processes, and gather/scatter functions.
- communication overhead:
- A measure of the additional workload incurred by a parallel algorithm because of communication between the nodes of the parallel processor. If communication is the only source of overhead, the communication overhead is given by: ((number of processors * parallel execution time) - sequential execution time) / sequential execution time.
- Compaq Visual Fortran:
- Compaq's commercial Fortran and Fortran 90 compiler for Windows.
It is integrated into and includes the Microsoft Visual Studio programming environment.
- computational science:
- A field that concentrates on the effective use of computer
software, hardware and mathematics to solve real problems. It is a term
used when it is desirable to distinguish the more pragmatic aspects of
computing from (1) computer science,
which often deals with the more theoretical aspects of computing; and
from (2) computer engineering, which deals primarily with the design
and construction of computers themselves. Computational science is
often thought of as the third leg of science along with experimental
and theoretical science.
- computer science:
- The systematic study of
computing systems and computation. The body of knowledge resulting from
this discipline contains theories for understanding computing systems
and methods; design methodology, algorithms, and tools; methods for the
testing of concepts; methods of analysis and verification; and
knowledge representation and implementation.
- Connection Machine:
- A SIMD concurrent computer once manufactured by the now defunct Thinking Machines Corporation.
- contention:
- A situation that occurs when
several processes attempt to access the same resource simultaneously.
An example is memory contention in shared memory
multiprocessors, which occurs when two processors attempt to read from
or write to a location in shared memory in the same clock cycle.
- CPU:
- Central Processing Unit, the arithmetic and control portions of a sequential computer.
- critical section:
- A section of code that
should be executed by only one processor at a time. Typically such a
section involves data that must be read, modified, and rewritten; if
processor 2 is allowed to read the data after processor 1 has read it
but before processor 1 has updated it, then processor 1's update will
be overwritten and lost when processor 2 writes its update. Spin locks
or semaphores are used to ensure strict sequential execution of a
critical section of code by one processor at a time.
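A minimal C sketch using a POSIX mutex, one common way to guard a critical section (the names are illustrative):

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);   /* enter the critical section */
            counter++;                   /* protected read-modify-write */
            pthread_mutex_unlock(&lock); /* leave the critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter); /* reliably 200000 with the lock */
        return 0;
    }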
- ВЦ РАН:
- The A. A. Dorodnicyn Computing Centre of the Russian Academy of Sciences (Вычислительный центр им. А.А. Дородницына РАН).
- data decomposition:
- A method of dividing arrays among connected CPUs so as to minimize communication.
- data dependency:
- A situation that occurs
when there are two or more references to the same memory location, and
at least one of them is a write. Parallel programmers must be aware of
data dependencies in their programs to ensure that modifications made
to data by one processor are communicated to other processors that use
the same data. See recurrence for an example of a particular type of dependency.
- data parallel:
- A programming model in
which each processor performs the same work on a unique segment of the
data. Either message passing libraries such as MPI or higher-level languages such as HPF can be used for coding with this model. An alternative to data parallel is functional parallel.
- deadlock:
- A situation in which processors
of a concurrent processor are waiting on an event which will never
occur. A simple version of deadlock for a loosely synchronous
environment arises when blocking reads and writes are not correctly
matched. For example, if two nodes
both execute blocking writes to each other at the same time, deadlock
will occur since neither write can complete until a complementary read
is executed in the other node.
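A C/MPI sketch of the safe ordering; if both ranks called MPI_Send first, each blocking send could wait forever for its matching receive (assumes two processes):

    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, out, in;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        out = rank;
        /* the calls are ordered differently on each rank, so every
           blocking send meets a receive rather than another send */
        if (rank == 0) {
            MPI_Send(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&in, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }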
- distributed memory:
- Memory that is divided into segments, each of which may be directly accessed by only one node of a parallel processor. Distributed memory and shared memory are two major architectures that require very different programming styles.
- distributed processing:
- Processing on a number of networked computers, each of which has its own local memory. The computers may or may not differ in relative power and function.
- distributed shared memory (DSM):
- Memory that is physically distributed but hidden by the operating system, so that it appears to the user as shared memory with a single address space. Also called virtual shared memory.
- efficiency:
- A measure of the amount of
time a parallel program spends doing computation as opposed to
communication. Efficiency is defined as speedup / number of processors. The closer it is to 1, the more perfectly parallel the task is at that level of parallelism; the closer to 0, the less parallel.
- Emulex:
- Emulex is a company (formerly Giganet) that makes a high performance cluster interconnect (or network)
called Clan that supports the VIA protocol. CTC has Clan installed on V1 and Vplus.
It provides a high bandwidth (~100 MBytes/Sec) low latency (~10 usec) connection between any
two nodes on the cluster.
- ESSL:
- The Engineering and Scientific Subroutine Library, a library
of mathematical subroutines that have been highly optimized for the SP family of
machines and so run much faster than equivalent routines from other libraries.
For equivalent functionality in other numerical libraries, see CTC's
Software page.
- ethernet:
- A common network interconnect typically used for local area networks. It usually carries the TCP/IP network protocol, a standard for connecting machines on a network and the internet.
- Express:
- A language supporting parallelism by means of message passing, referred to as message queues. It was supported by the ParaSoft Corporation. See ftp://ftp.parasoft.com/express/docs/.
- FFT:
- An acronym for Fast Fourier Transform, a technique for very
fast computation of Fourier series. A discrete Fourier transform using
N points can be computed in N log N steps by the fast method, whereas
the straightforward method would take N**2 steps.
- fine grain:
- See granularity
- FLOPS:
- Floating point operations per
second; a measure of floating-point computation performance, equal to the rate at
which a machine can perform single-precision floating-point
calculations.
- Fortran:
- Acronym for FORmula TRANslator, one of the oldest high-level programming
languages but one that is still widely used in scientific computing because of
its compact notation for equations, ease in handling large arrays, and huge
selection of library routines for solving mathematical problems efficiently.
Fortran 77 and Fortran 90 are the two standards currently in use. HPF is a set of extensions to
Fortran that support parallel programming. Fortran compilers available at CTC are from
Compaq, PGI, and Gnu.
- functional decomposition:
- A method of
programming decomposition in which a problem is broken up into several
independent tasks, or functions, which can be run simultaneously on
different processors.
- functional parallel:
- A programming model
in which a program is broken down by tasks, and parallelism is achieved
by assigning each task to a different processor. Message passing
libraries such as MPI are commonly used for communication among the processors. An alternative to functional parallel is data parallel.
- gather/scatter:
- A collective communication operation in which (for gather) one process collects data from each participating process and stores it in rank order, or (for scatter) one process divides up some data and distributes a piece to each participating process, again in rank order.
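A C/MPI sketch of the scatter/gather round trip (the array size is illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, piece;
        int data[64]; /* assumes at most 64 processes */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0)
            for (int i = 0; i < size; i++) data[i] = i * i;
        /* scatter: the root hands one piece to each process, in rank order */
        MPI_Scatter(data, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);
        piece += 1; /* every process works on its own piece */
        /* gather: the root collects the pieces back, again in rank order */
        MPI_Gather(&piece, 1, MPI_INT, data, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (rank == 0) {
            for (int i = 0; i < size; i++) printf("%d ", data[i]);
            printf("\n");
        }
        MPI_Finalize();
        return 0;
    }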
- gigabyte (GB):
- 2**30 (hex 40 000 000) bytes of data, i.e. 1,073,741,824 bytes.
- gigaflop or gflop:
- One billion (10**9) floating point operations per second.
- global memory:
- The main memory accessible by all processors or CPUs.
- grain size:
- The number of fundamental entities, or members,
in a grain. For example, if a grid is spatially decomposed into
subgrids, then the grain size is the number of grid points in a
subgrid.
- granularity:
- A term often used in parallel processing to indicate independent processes that could be distributed to multiple CPUs.
Fine granularity is illustrated by execution of statements or small
loop iterations as separate processes; coarse granularity involves
subroutines or sets of subroutines as separate processes. The more
processes, the "finer" the granularity and the more overhead
required to keep track of them. Granularity can also be related to the
temporal duration of a "task" at work. It is not only the number of
processes but also how much work each process does, relative to the
time of synchronization, that determines the overhead and reduces speedup figures.
- granule:
- The fundamental grouping of members of a domain (system) into an object manipulated as a unit.
- hierarchy, hierarchical:
- The logical
organization of memory into a pyramid-type structure. At the "top" are
relatively few fast-access registers; "underneath" these are larger and
larger organizational groupings of memory that require correspondingly
greater access times.
- High Performance Computing (HPC):
- A computing system that provides more computing performance, power, or resource
than is generally available. Sufficient memory to store large problem sets,
memory throughput, computational rates, and other related computer
capabilities all contribute to performance.
- HPF:
- High Performance Fortran, an extension to
Fortran 77 or 90 that provides:
opportunities for parallel execution automatically detected by the
compiler; various types of available parallelism - MIMD, SIMD,
or some combination; allocation of data among individual processor
memories, and placement of data within a single processor.
- hypercube architecture:
- Multiple CPU architecture with 2**N processors. Each CPU has N nearest neighbors, in the manner of an N-dimensional hypercube, where each corner has N edges. The 2**3 machine would have eight CPUs arranged at the corners of a cube and connected by its edges.
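Node IDs in a hypercube differ from their neighbors' IDs in exactly one bit, so the neighbors can be computed by flipping bits; a small C illustration:

    #include <stdio.h>

    int main(void) {
        int N = 3, node = 5; /* a 2**3 = 8-node cube; node 5 is binary 101 */
        for (int k = 0; k < N; k++)
            printf("neighbor across dimension %d: %d\n", k, node ^ (1 << k));
        /* prints 4 (100), 7 (111), and 1 (001) */
        return 0;
    }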
- IBM:
- International Business Machines
Corporation, a firm that manufactures computer hardware and software
including the IBM RS/6000 SP. CTC has a long history of joint work with
IBM in developing and testing products needed by the scientific
research community.
- IMSL:
- International Mathematical and Statistical Library, a useful set of computational subroutines written in Fortran.
- inhomogeneous problem:
- A problem whose
underlying domain contains members of different types, e.g., an
ecological problem such as WaTor with different species of fish.
Inhomogeneous problems are typically irregular, but the converse is not
generally true.
- Intel Math Kernel Library (MKL):
- An Intel-supplied library which provides implementations of the BLAS and
LAPACK optimized for the IA32 and IA64 architectures.
- I/O:
- Input/Output, the hardware and software mechanisms connecting
a computer with the "outside world". This includes computer to disk and computer
to terminal/network/graphics connections. Standard I/O is a particular software
package used by the C language.
- irregular problem:
- A problem with a geometrically irregular domain containing many similar members, e.g., finite element nodal points.
- ISO:
- (International Organization for Standardization) The international standards organization.
- kilobyte (KB):
- 2**10 (hex 400) bytes of data, i.e. 1,024 bytes.
- LAPACK:
- A library of Fortran 77
subroutines for solving the most common problems in numerical linear
algebra: systems of linear equations, linear least squares problems,
eigenvalue problems, and singular value problems. LAPACK has been designed
to be efficient on a wide range of high performance computers.
- latency:
- The time to send a zero-length message from one node of a concurrent processor to another. Non-zero latency arises from the overhead in initiating and completing the message transfer. Also called startup time.
- load balance:
- The goal of algorithms running on parallel processors, achieved when all nodes perform roughly equal amounts of work, so that no node is idle for a significant amount of time.
- local disk space:
- Disk space within a given node. For example, on a four-processor node, all of the node's processors (1-4) access the same local disk.
- local memory:
- The memory associated with a single CPU in a multiple CPU architecture, or memory associated with a local node in a distributed system.
- loop unrolling:
- An optimization technique valid for both scalar and vector architectures.
The iterations of an inner loop are decreased by a factor of two or
more by explicit inclusion of the very next one or several iterations.
Loop unrolling can allow traditional compilers to make better use of
the registers and to improve the overlap of operations. On vector machines
loop unrolling may either improve or degrade performance, and the
process involves a tradeoff between overlap and register use on one
hand and vector length on the other.
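A C sketch of the idea (the unrolling factor and the use of two accumulators are illustrative choices):

    #include <stddef.h>

    /* straightforward form: one loop test and one add per element */
    double sum(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += a[i];
        return s;
    }

    /* unrolled by four: fewer loop tests per element, and two independent
       accumulators give the hardware more opportunity for overlap */
    double sum_unrolled(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i] + a[i + 1];
            s1 += a[i + 2] + a[i + 3];
        }
        for (; i < n; i++) s0 += a[i]; /* remainder when n is not a multiple of 4 */
        return s0 + s1;
    }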
- main memory:
- A level of random access memory that lies between
cache or register memory and extended random
access memory. Main memory has higher capacity, but it is slower than cache
or registers and has less capacity but faster access than extended random
access memory.
- mass storage:
- An external storage device capable of storing large amounts of data.
Online, directly accessible disk space is limited on most systems; mass
storage systems provide additional space that is slower and more difficult
to access, but can be virtually unlimited in size.
- master-worker:
- A programming approach in which one process, designated the "master," assigns tasks to other processes known as "workers."
- megabyte (MB):
- 2**20 (hex 100 000) bytes of data, i.e. 1,048,576 bytes.
- member:
- Members are grouped first into granules and then grains. In a finite difference problem, the members are grid points.
- memory:
- See cache and main memory.
- message passing:
- A communication paradigm in which processes communicate by exchanging messages over communication channels.
- MIMD:
- (Multiple Instruction stream, Multiple Data stream) An architecture in which several instruction streams are executed simultaneously. Each individual instruction may operate on several data elements (for example, one or more vectors on a vector machine). Although a single-processor vector computer can operate in MIMD fashion because of its overlapping functional units, the term MIMD is more commonly used to refer to multiprocessor machines. See also SIMD, SISD.
- MIPS:
- Millions of instructions per second.
- MOPS:
- Millions of operations per second.
- MPI:
- Message Passing Interface, a de facto
standard for communication among the nodes running a parallel program
on a distributed memory system. MPI is a library of routines that can
be called from both Fortran and C programs. MPI's advantage over older
message passing libraries is that it is both portable (because MPI has
been implemented for almost every distributed memory architecture) and
fast (because each implementation is optimized for the hardware it runs
on).
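The canonical first MPI program, in C (runnable as-is with any MPI implementation):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start the MPI runtime     */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's number     */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
        printf("hello from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }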
- MPI/Pro:
-
A commercial, supported version of MPI that runs extremely well on Windows, supports SMP,
TCP, and VIA interconnects seamlessly, and adheres to the MPI standard, so MPI applications from
other platforms should work without modification.
- multiprocessing:
- The ability of a computer to intermix jobs on one or more CPUs.
- NAg:
- NAg is a company that provides a set of libraries for the solution of
numerical and statistical problems. The C and Fortran libraries are
planned for installation on the CTC Complex.
- nearest neighbor:
- A computer architecture that involves a connectivity which can be interpreted as that between adjacent members in geometric space.
- node:
- One of the individual computers linked together to form a parallel system. A computer may have multiple processors that share system resources, such as disk, memory, and the network interface.
- object-oriented programming:
- Style of
programming characterized by the use of separate "objects" to perform
different tasks within a program. These "objects" usually consist of an
abstract data type or class, along with the methods and procedures used
to manipulate that abstract data type.
- OpenMP:
- A set of compiler directives for C/C++ and FORTRAN which allow the
programmer to expose thread-level parallelism in an architecture-independent
fashion. OpenMP may make it easier to use in-box multiprocessing without
using an OS-specific thread library.
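A minimal C example (built with an OpenMP-aware compiler, e.g. with a flag such as -fopenmp):

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        double sum = 0.0;
        /* the directive marks the loop's iterations as parallel work;
           reduction(+:sum) gives each thread a private partial sum */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;
        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }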
- optimization:
- The act of tuning a program
to achieve the fastest possible performance on the system where it is
running. There are tools available to help with this process, including
optimization flags on compiler commands and optimizing preprocessors
such as KAP from Kuck & Associates. You can also optimize a program
by hand, using profiling tools to identify "hot spots" where the
program spends most of its execution time. Optimization requires
repeated test runs and comparison of timings, so it is usually
worthwhile only for production programs that will be rerun many times.
- overhead:
- There are four contributions to the overhead, f, defined so that the speedup = number of nodes / (1 + f). The communication overhead and load balance contributions are also defined in this glossary. There are also algorithmic and software contributions to the overhead.
- P4:
- A macro and subroutine package for parallel programming, developed by Rusty Lusk (lusk@anta.mcs.anl.gov). P4 uses monitors on shared memory machines and message passing on distributed memory machines. It is used as a subroutine library for C and Fortran. An improvement on the "Argonne macros" (PARMACS).
- parallel processing:
- Processing with more than one CPU on a single application simultaneously.
- parallelization:
- The process of achieving a high percentage of the CPU time expended in parallel; minimizing idle CPU time in a parallel processing environment. For one program, parallelization refers to the splitting of program execution among many CPUs.
- partitioning:
- Restructuring a program or algorithm in semi-independent computational segments to take advantage of multiple CPUs
simultaneously. The goal is to achieve roughly equivalent work in each
segment with minimal need for intersegment communication. It is also
worthwhile to have fewer segments than CPUs on a dedicated system.
- petabyte:
- 2**50 (hex 4 000 000 000 000) bytes of data, i.e. 1,125,899,906,842,624 bytes.
- pipelining:
- Pipelining is the decomposition of a computation into operations which may
then be executed concurrently. Pipelining increases the utilization of
functional units in the processor, leading to greater computational
throughput. Common microprocessors have between four and twenty pipeline
stages.
- polled communication:
- Polling involves a node
inspecting the communication hardware -- typically a flag bit -- to see
if information has arrived or departed. Polling is an alternative to an
interrupt-driven system and is typically the basis for implementing the
crystalline operating systems. The natural synchronization of the nodes imposed by polling is used in the implementation of blocking communication primitives.
- preprocessor:
- Software that takes a given source code and transforms it
into an equivalent program in the same language. In theory, after compilation the
transformed code should run faster than the original code--but this does not always
happen in practice, so one should verify. A preprocessor tries to improve performance
by applying a series of transformation rules designed to optimize loops based on the
processor's cache size; minimize memory stride; perform loop unrolling; etc.
- primary memory:
- Main memory accessible by the CPU(s) without using input/output processes.
- process:
- A task executing on a given processor at a given time.
- processor:
- The part of the computer that actually executes your instructions. Also known as the central processing unit or CPU.
- PVM:
- (Parallel Virtual Machine) A message passing library and set of tools used to create and run concurrent or parallel applications.
- RAID:
- Redundant array of inexpensive
disks; a file system containing many disks, some of which are used to
hold redundant copies of data or error correction codes to increase
reliability. RAIDs are often used as parallel access file systems,
where the sheer size of storage capacity required precludes using more
conventional (but more expensive) disk technology.
- recurrence:
- A dependency
in a DO-loop whereby a result depends upon completion of the previous
iteration of the loop. Such dependencies inhibit vectorization. For
example:
A(I) = A(I-1) + B(I)
In a loop on I, this process would not be vectorizable on most vector
computers without marked degradation in performance. This is not an
axiom or law, but rather is simply a fact resulting from current
machine design.
- reduced instruction set computer (RISC):
- A
philosophy of instruction set design where a small number of simple,
fast instructions are implemented rather than a larger number of
slower, more complex instructions.
- rendering:
- The process of turning mathematical data into a picture or graph.
- scalably parallel:
- All nodes are an equal communication distance apart and multiple routes avoid bottlenecks.
- scattered decomposition:
- A technique for decomposing data domains that involves scattering, or sprinkling, the elements of the domain over the nodes
of the concurrent processor. This technique is used when locality,
which is preserved by the alternate decomposition into connected
domains often called domain decomposition, is less important than the
gain in load balance obtained by associating each node with all parts of the domain.
- scheduler:
- The part of a computer system's software that determines which task is assigned
to each system resource at any given time. In a batch system such as
Cluster CoNTroller System
that maintains a queue of jobs waiting to run, the scheduler
determines when each job can start based on such criteria as the order in which
job requests were submitted and the availability of the system resources needed by each job.
- sequential execution:
- See serial processing. Parallel programs may have sections that must be executed sequentially; see critical section.
- serial processing:
- Running an application on a single CPU.
- shared memory:
- A memory that is directly accessed by more than one node of a concurrent processor. Shared memory and distributed memory are two major architectures that require very different programming styles.
- SIMD:
- (Single Instruction stream, Multiple Data stream) The architecture that characterizes most vector computers. A single instruction stream launches a process that sets streams of data and results in motion. The term also applies to parallel processors in which a single instruction stream causes the same operation to execute synchronously on more than one processor, each on different pieces of data (for example, the ILLIAC). See also MIMD, SISD, and SPMD.
- SISD:
- (Single Instruction stream, Single Data stream) The architecture of a traditional computer, in which each instruction (from a single instruction stream) operates on specified data elements or pairs of operands rather than on "streams" of data. See also SIMD and MIMD.
- SMP:
- Symmetric Multi-Processor, a
shared memory system.
- SP:
- Scalable POWERparallel System, a distributed memory machine from IBM. It consists of nodes (RS/6000 processors with associated memory and disk) connected by an ethernet and by a high-performance switch.
- speedup:
- A measure of how much faster a given program runs when
executed in parallel on several processors as compared to serial
execution on a single processor. Speedup is defined as S = sequential
run time / parallel run time.
- SPMD:
- (Single Program, Multiple Data stream) A generalization of SIMD data-parallel programming. SPMD relaxes the synchronization constraints that restrict how the various processes operate: it explicitly specifies which program all the processes must run, but not which instruction each must execute at any given moment. Data distribution is nevertheless still a key concept; indeed, another term commonly used for SPMD is data decomposition, which points to the fact that an overall data set is to be decomposed while every participating process executes the same code.
- square decomposition:
- A strategy in which the array of nodes is decomposed into a two-dimensional mesh; we can then define scattered or local (domain) versions of this decomposition.
- Standard I/O:
- See I/O.
- startup time:
- See latency.
- stride:
- A term derived from the concept of
walking (striding) through the data from one noncontiguous location to
the next. If data are to be accessed as a number of evenly spaced,
discontiguous blocks, then stride is the distance between the
beginnings of successive blocks. For example, consider accessing rows
of a column-stored matrix. The rows have elements that are spaced in
memory by a stride of N, the dimension of the matrix.
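The matrix example in C (a column-stored N x N matrix kept in a flat array, so walking a row means a stride of N):

    #include <stdio.h>
    #define N 4

    int main(void) {
        double a[N * N]; /* column-stored: element (i,j) lives at a[i + j*N] */
        for (int k = 0; k < N * N; k++) a[k] = k;
        int i = 1; /* walk row i: successive elements are N apart in memory */
        for (int j = 0; j < N; j++)
            printf("%g ", a[i + j * N]); /* touches a[1], a[5], a[9], a[13] */
        printf("\n");
        return 0;
    }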
- striping:
- Another technique for avoiding
serialized I/O. In this case, the idea is to have each node write out
its own portion of data into its own file. This is particularly good if
checkpointing is what is really desired, because having each node's
state saved in its own file is actually what you want.
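A C/MPI sketch of per-node checkpoint files (the file name pattern is illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        char name[64];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* each node writes its own portion of the data to its own file */
        snprintf(name, sizeof name, "checkpoint.%04d", rank);
        FILE *f = fopen(name, "w");
        if (f != NULL) {
            fprintf(f, "state of rank %d\n", rank);
            fclose(f);
        }
        MPI_Finalize();
        return 0;
    }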
- superlinear:
- Greater than linear; usually
used in reporting results on parallel computers where the processing
speed increases more rapidly than does the number of processors.
Example: A job taking 16 hours on a one-processor machine may take only
a half-hour on a 16-processor machine instead of one hour (from linear speedup).
Upon close examination such examples are seen to be the result of
algorithmic improvements. A demonstration that this is the case is
easily described. Consider a uni-processor that simulates a
multiple-processor machine (such as the 16-processor machine in the
example cited). The uni-processor simulates the first cycle of each of
the 16 parallel machines. It then simulates the second of each of the
parallel machines, and so on until the program concludes. Clearly,
(ignoring overhead) this takes
16 times as long as would be the case on a 16-processor machine of the
same cycle time. Thus, any speedup seen by the 16-processor machine can
be simulated at one-sixteenth speed by the one-processor machine. Speed
increase is at most and at best linear in the number of processors.
- superscalar:
- A type of microprocessor
design incorporating multiple functional units together with sufficient
hardware complexity to allow units to function relatively autonomously.
This type of design provides opportunities for concurrent operations to
take place during a single clock cycle.
- switch:
- A high bandwidth data transmission device used to communicate between different nodes on an IBM SP.
- synchronization:
- The act of bringing known points in the execution of two or more processes together at the same given moment in time. Explicit synchronization is not needed in SIMD programs (in which every processor either performs the same operation as every other or does nothing), but is often necessary in SPMD and MIMD programs. The time processes spend waiting for other processes to synchronize with them can be a major source of inefficiency in parallel programs.
- task:
- The basic unit of work to be performed by the computer. A
program typically consists of a number of tasks, such as reading data,
performing calculations, and writing output. A common way to
parallelize a program is to divide up the tasks and assign one task to
each processor or node. For this reason, the term "task" is sometimes
used interchangeably with processor or node in discussions of parallel
programming.
- T1:
- Network transmission of a DS1 formatted digital signal at a rate of 1.5 Mb/s.
- T3:
- Network transmission of a DS3 formatted digital signal at a rate of 45 Mb/s.
- TCP/IP:
- Transmission Control
Protocol/Internet Protocol, the protocol used for communications on the
internet. TCP/IP software includes the telnet and ftp commands.
- terabyte:
- 2**40 (hex 10,000,000,000) bytes of data, i.e. 1,099,511,627,776 bytes.
- teraflop:
- A processor speed of one trillion (10**12) floating point operations per second.
- token:
- When you login to a machine in AFS
you must be issued a "token" in order to become an authenticated AFS
user. This token is issued to you automatically at login when you enter
your password. Every token has an expiration date associated with it;
on CTC machines, tokens are set to expire after 100 hours. To see what
tokens you have, enter the command "tokens".
- UNIX:
- An operating system originally
developed by AT&T which, in various incarnations, is now available
on most types of supercomputer.
- vector:
- A computer vector is an array of
numbers with a location prescribed according to a contiguous or random
formula. A mathematical vector is a set of numbers or components with
no conditions on their retrievability from computer memory. (See also stride.)
- vector processing:
- The practice of using computers that can process multiple vectors or arrays. Modern supercomputers achieve speed through pipelined
arithmetic units. Pipelining, when coupled with instructions designed
to process multiple vectors, arrays or numbers rather than one data
pair at a time, leads to great performance improvements.
- vectorization:
- The act of tuning an application code to take advantage of vector architecture.
- virtual concurrent processor (aka virtual machine):
- The
virtual concurrent processor is a programming environment which allows
the user a hardware-independent, portable programming environment
within the message passing paradigm. The virtual machine is composed of
virtual nodes which correspond to individual processes; there may be
several processes or virtual nodes on the node of a real computer.
- visualization:
- In the broadest sense,
visualization is the art or science of transforming information to a
form "comprehensible" by the sense of sight. Visualization is broadly
associated with graphical display in the form of pictures (printed or
photo), workstation displays, or video.
- Windows 2000, W2K:
- This self-contained operating system with a graphical user interface was
developed by Microsoft Corporation. This operating system, sometimes
called W2K, is currently running on all CTC login and compute nodes.
- wormhole routing:
- A technique for routing
messages in which the head of the message establishes a path, which is
reserved for the message until the tail has passed through it. Unlike
virtual cut-through, the tail proceeds at a rate dictated by the
progress of the head, which reduces the demand for intermediate
buffering.