Entries |
Document | Title | Date |
20080244221 | EXPOSING SYSTEM TOPOLOGY TO THE EXECUTION ENVIRONMENT - Embodiments of apparatuses, methods, and systems for exposing system topology to an execution environment are disclosed. In one embodiment, an apparatus includes execution cores and resources on a single integrated circuit, and topology logic. The topology logic is to populate a data structure with information regarding a relationship between the execution cores and the resources. | 10-02-2008 |
20090055624 | CONTROL OF PROCESSING ELEMENTS IN PARALLEL PROCESSORS - The present invention relates to the control of an array of processing elements in a parallel processor using row and column select lines. For each column in the array, a column select line connects to all of the processing elements in the column. For each row in the array, a row select line connects to all of the processing elements in the row. A processing element in the array may be selected by activating its row and column select lines. | 02-26-2009 |
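The row/column select scheme in 20090055624 can be sketched in a few lines: a processing element is active only when both its row select line and its column select line are asserted. This is a minimal illustrative model, not the patented implementation; all names are assumptions.

```python
# Illustrative sketch of row/column selection: a PE at (r, c) is selected
# only when both row_select[r] and col_select[c] are asserted.
def selected_elements(rows, cols, row_select, col_select):
    """Return coordinates of processing elements whose row AND column
    select lines are both active."""
    return [(r, c)
            for r in range(rows)
            for c in range(cols)
            if row_select[r] and col_select[c]]

# Asserting row 1 and column 2 in a 3x3 array selects the single PE (1, 2).
picked = selected_elements(3, 3,
                           row_select=[False, True, False],
                           col_select=[False, False, True])
```

Asserting several row or column lines at once would select a whole sub-block of elements, which is consistent with the abstract's per-line wiring.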
20090094436 | ULTRA-SCALABLE SUPERCOMPUTER BASED ON MPU ARCHITECTURE - The invention provides an ultra-scalable supercomputer based on MPU architecture that achieves well-balanced application performance in the hundreds-of-TFLOPS to PFLOPS range. The supercomputer system design includes the interconnect topology and its corresponding routing strategies, the communication subsystem design and implementation, and the software and hardware implementations. The supercomputer comprises a plurality of processing nodes powering the parallel processing and Axon nodes connecting the computing nodes while implementing the external interconnections. The interconnect topology can be based on MPU architecture, and the communication routing logic required by the switching logic is implemented in FPGA chips, while modular designs for accelerating particular application traffic patterns and reducing communication overhead can be deployed as well. | 04-09-2009 |
20090158007 | SCALEABLE ARRAY OF MICRO-ENGINES FOR WAVEFORM PROCESSING - A system for implementing waveform processing in a software defined radio (SDR) includes a scaleable array processor having a plurality of micro-engines (MEs) interconnected by a two dimensional topology. Each micro-engine includes multiple FIFOs for interconnecting to each other in the two dimensional topology. One micro-engine communicates with another adjacent micro-engine by way of the respective FIFOs. The micro-engines are dedicated to predetermined algorithms. The two dimensional topology includes an array of N×M micro-engines interconnected by the multiple FIFOs. The N×M are integer numbers of rows and columns, respectively, in the array of micro-engines. The micro-engines are dedicated to baseband processing of data for RF transmission or RF reception. | 06-18-2009 |
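The FIFO-coupled micro-engine grid of 20090158007 can be modeled with per-link queues: a sender pushes into the FIFO of the adjacent micro-engine that faces back at it. This is a hypothetical sketch under assumed names, not the patented design.

```python
from collections import deque

class MicroEngine:
    """One node in an N x M grid; holds an inbound FIFO per direction."""
    def __init__(self):
        self.fifos = {d: deque() for d in ("north", "south", "east", "west")}

    def receive(self, direction):
        """Pop the oldest value from the FIFO facing the given neighbor."""
        return self.fifos[direction].popleft()

def send(grid, r, c, direction, value):
    """Push `value` into the adjacent engine's FIFO facing the sender."""
    dr, dc, back = {"north": (-1, 0, "south"), "south": (1, 0, "north"),
                    "east": (0, 1, "west"), "west": (0, -1, "east")}[direction]
    grid[r + dr][c + dc].fifos[back].append(value)

grid = [[MicroEngine() for _ in range(2)] for _ in range(2)]
send(grid, 0, 0, "east", 42)        # ME(0,0) -> ME(0,1)
value = grid[0][1].receive("west")  # ME(0,1) pops the sample
```

Because each link is an independent FIFO, adjacent engines can run decoupled and exchange baseband samples without a global clock handshake, which matches the abstract's dedicated-algorithm pipeline.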
20090193225 | SYSTEM AND METHOD FOR APPLICATION SPECIFIC ARRAY PROCESSING - A processing architecture and methods therein for building application specific array processing utilizing a sequential data bus for control and data propagation. The methods of array processing provided by the architecture allow for numerical analysis of large numerical data, such as simulation, image processing, computer modeling, or other numerical functions. The architecture is unlimited in scalability and facilitates mixed mode processing of idealized, analytical, and real data, in conjunction with real time input and output. | 07-30-2009 |
20100100703 | System For Parallel Computing - A system and a method for parallel computing for solving complex problems are envisaged. In particular, this invention envisages a hierarchical parallel computing system formed by multiple levels of groups, where each group consists of multiple processing elements. Each group of the parallel computing system appears as a single processing element to its immediate upper layer. Thus, each processing element is hierarchically tagged to its immediate upper level, and a multi-level tier of groups is formed. In accordance with this invention, the parallel computing system operates by breaking any problem hierarchically, first across the groups and then within the groups. This hierarchical breakup of the problem significantly improves the time required to process a problem. | 04-22-2010 |
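The hierarchical breakup in 20100100703 (first across groups, then within groups) amounts to recursive partitioning of the work. A minimal sketch, assuming a fixed fan-out per level; the function name and parameters are illustrative, not from the patent.

```python
# Recursively split `work` across groups, then within groups, until the
# leaves hold the chunks handed to individual processing elements.
def partition(work, fanout, levels):
    """Split `work` into `fanout` chunks at each of `levels` levels."""
    if levels == 0:
        return work
    size = (len(work) + fanout - 1) // fanout  # ceiling division
    chunks = [work[i:i + size] for i in range(0, len(work), size)]
    return [partition(chunk, fanout, levels - 1) for chunk in chunks]

# 8 work items, 2 groups per level, 2 hierarchy levels:
tree = partition(list(range(8)), fanout=2, levels=2)
```

Each nesting level of the returned tree corresponds to one tier of groups, so the depth of the hierarchy directly controls how far the problem is subdivided before reaching individual processing elements.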
20100161938 | System-On-A-Chip Supporting A Networked Array Of Configurable Symmetric Multiprocessing Nodes - An integrated circuit having an array of programmable processing elements linked by an on-chip communication network. Each processing element includes a plurality of processing cores, a local memory, and thread scheduling means for scheduling execution of threads on the processing cores of the given processing element. The thread scheduling means assigns threads to the processing cores of the given processing element in a configurable manner. The configuration of the thread scheduling means defines one or more logical symmetric multiprocessors for executing threads on the given processing element. A logical symmetric multiprocessor is realized by a defined set of processing cores assigned to a group of threads executing on the given processing element. | 06-24-2010 |
20100174883 | PROCESSOR ARCHITECTURES FOR ENHANCED COMPUTATIONAL CAPABILITY AND LOW LATENCY - A processor includes a compute array comprising a first plurality of compute engines serially connected along a data flow path such that data flows between successive compute engines at successive times. The first plurality of compute engines includes an initial compute engine and a final compute engine. The data flow path includes a recirculation path connecting the final compute engine to the initial compute engine with no compute engine therebetween. | 07-08-2010 |
20110072237 | Methods and apparatus for efficiently sharing memory and processing in a multi-processor - A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations. | 03-24-2011 |
20110145544 | MULTI-LEVEL HIERARCHICAL ROUTING MATRICES FOR PATTERN-RECOGNITION PROCESSORS - Multi-level hierarchical routing matrices for pattern-recognition processors are provided. One such routing matrix may include one or more programmable and/or non-programmable connections in and between levels of the matrix. The connections may couple routing lines to feature cells, groups, rows, blocks, or any other arrangement of components of the pattern-recognition processor. | 06-16-2011 |
20110161625 | Interconnection network connecting operation-configurable nodes according to one or more levels of adjacency in multiple dimensions of communication in a multi-processor and a neural processor - A Wings array system for communicating between nodes using store and load instructions is described. Couplings between nodes are made according to a 1 to N adjacency of connections in each dimension of a G×H matrix of nodes, where G≧N and H≧N and N is a positive odd integer. Also, a 3D Wings neural network processor is described as a 3D G×H×K network of neurons, each neuron with an N×N×N array of synaptic weight values stored in coupled memory nodes, where G≧N, H≧N, K≧N, and N is determined from a 1 to N adjacency of connections used in the G×H×K network. Further, a hexagonal processor array is organized according to an INFORM coordinate system having axes at 60 degree spacing. Nodes communicate on row paths parallel to an FM dimension of communication, column paths parallel to an IO dimension of communication, and diagonal paths parallel to an NR dimension of communication. | 06-30-2011 |
20110173413 | EMBEDDING GLOBAL BARRIER AND COLLECTIVE IN A TORUS NETWORK - Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure. | 07-14-2011 |
20110213946 | PARALLEL COMPUTING SYSTEM AND COMMUNICATION CONTROL PROGRAM - A parallel computing system includes a plurality of processors multi-dimensionally connected by an interconnection network. Each of the processors in the parallel computing system determines, in dimensional order, communication channels to other processors in the interconnection network. For data communications performed at the same timing, each of the processors sets relative coordinates of destination processors that are common to all of the processors, and each of the processors performs data communications with destination processors having the set relative coordinates. | 09-01-2011 |
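The dimension-order channel determination described in 20110213946 can be sketched as classic X-then-Y routing on a mesh: the offset in the lowest dimension is fully corrected before the next dimension is touched. This is a generic illustration of dimension-order routing, not the patented control program.

```python
# Dimension-order routing: resolve each dimension completely, in order,
# before moving to the next. Works for any number of dimensions.
def dimension_order_route(src, dst):
    """Yield the sequence of hops from src to dst in dimension order."""
    pos = list(src)
    hops = []
    for dim in range(len(src)):
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            hops.append(tuple(pos))
    return hops

# Route from (0, 0) to (2, 1): X is corrected first, then Y.
path = dimension_order_route((0, 0), (2, 1))
```

Because every processor applies the same common relative coordinates, all of them compute structurally identical paths at the same timing, which is what keeps the simultaneous communications free of conflicting channel choices.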
20120017066 | LOW LATENCY MASSIVE PARALLEL DATA PROCESSING DEVICE - A data processing device comprising a multidimensional array of ALUs, having at least two dimensions in which the number of ALUs is greater than or equal to 2, adapted to process data without register-caused latency between at least some of the ALUs in the corresponding array. | 01-19-2012 |
20120179893 | Area efficient arrangement of interface devices within an integrated circuit - An integrated circuit is disclosed that comprises: a core comprising logic circuitry; a plurality of interface devices for transmitting signals to and from the processing core, the plurality of interface devices comprising two types of interface devices: one type being a power interface device for delivering power to the core; and a second type being a signal interface device for transmitting data signals between the core and devices external to the integrated circuit; wherein the plurality of interface devices are arranged in two rows, an outer row towards an outer edge of the core and an inner row within the outer row, closer to a centre of the core, the inner row comprising one of the two types of interface devices and the outer row comprising the other of the two types of interface devices. | 07-12-2012 |
20120191945 | Processor Architecture With Switch Matrices For Transferring Data Along Buses - There is described a processor architecture, comprising: a plurality of first bus pairs, each first bus pair including a respective first bus running in a first direction (for example, left to right) and a respective second bus running in a second direction opposite to the first direction (for example right to left); a plurality of second bus pairs, each second bus pair including a respective third bus running in a third direction (for example downwards) and a respective fourth bus running in a fourth direction opposite to the third direction (for example upwards), the third and fourth buses intersecting the first and second buses; a plurality of switch matrices, each switch matrix located at an intersection of a first and a second pair of buses; a plurality of elements arranged in an array, each element being arranged to receive data from a respective first or second bus, and transfer data to a respective first or second bus. The elements in the array include processing elements, for operating on received data, and memory elements, for storing received data. The described architecture has the advantage that it requires relatively little memory, and the memory requirements can be met by local memory elements in the array. | 07-26-2012 |
20120216012 | SEQUENTIAL PROCESSOR COMPRISING AN ALU ARRAY - The present invention discloses a single-chip sequential processor comprising at least one ALU-Block, wherein said sequential processor is capable of maintaining its op-codes while processing data, so as to overcome the necessity of requiring a new instruction in every clock cycle. | 08-23-2012 |
20120216013 | EFFICIENT AND SCALABLE MULTI-VALUE PROCESSOR AND SUPPORTING CIRCUITS - Briefly, an efficient and scalable processor device is disclosed that uses multi-value voltages for operands, results, and signaling. An array of cells is arranged in rows and columns, and one or more multi-value operands are used to select a cell from the array. A row driver may be used to select a row of cells, and a column driver is used to select a particular column from the selected row. Once a particular cell is selected, a voltage value associated with that cell is passed as an output, which is typically a multi-value result. The multi-value processor is constructed such that the row driver and column driver can be substantially identical, and have a structure that enables significant circuit reuse, provides substantial reduction in size for a circuit layout, has increased layout symmetry, simple scalability, and advantageous power conservation. | 08-23-2012 |
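The multi-value cell selection in 20120216013 behaves like a table lookup: row and column drivers decode two multi-value operands to pick one cell, whose stored level becomes the result. Below is a hypothetical software model (the class and values are assumptions, not the patented circuit), shown here storing a ternary multiplication table.

```python
# Software model of a multi-value cell array: two operands select a row
# and a column, and the selected cell's stored level is the result.
class MultiValueArray:
    def __init__(self, cells):
        self.cells = cells  # 2D list of multi-value voltage levels

    def read(self, row_operand, col_operand):
        """Use the operands as row/column selectors; return the cell value."""
        return self.cells[row_operand][col_operand]

# A 3x3 array programmed as a ternary (mod-3) multiplication table:
mult = MultiValueArray([[(r * c) % 3 for c in range(3)] for r in range(3)])
result = mult.read(2, 2)  # 2 * 2 mod 3
```

Because the row and column drivers perform the same decode in each dimension, the abstract's point about substantially identical driver circuits and layout reuse follows naturally from this symmetric structure.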
20120331268 | RECONFIGURABLE PROCESSOR ARCHITECTURE - A reconfigurable data processor architecture. The processor architecture includes: a first plurality of data processing elements, each having a respective synchronization unit, a data link structure adapted for dynamically interconnecting a number of the data processing elements, at least one configuration register, and at least one control unit in operative connection with the configuration register for controlling the contents thereof, wherein, based on the contents, the first plurality of data processing elements is adapted for temporarily constituting at runtime at least one group of one or more of said data processing elements dynamically via the data link structure. The synchronization units are adapted for synchronizing data processing by individual data processing elements within the group. The first plurality of data processing elements may be reconfigurably grouped and thus adapted to various data processing tasks at runtime. This increases data processing efficiency. | 12-27-2012 |
20130111188 | LOW LATENCY MASSIVE PARALLEL DATA PROCESSING DEVICE | 05-02-2013 |
20130198487 | DATA PROCESSING APPARATUS AND METHOD FOR DECODING PROGRAM INSTRUCTIONS IN ORDER TO GENERATE CONTROL SIGNALS FOR PROCESSING CIRCUITRY OF THE DATA PROCESSING APPARATUS - A data processing apparatus and method for accessing operands stored within a set of registers. Instruction decoder circuitry, responsive to program instructions, generates register access control signals identifying for each program instruction which registers in the register set are to be accessed by the processing circuitry when performing the processing operation specified by that program instruction. The set of registers are logically arranged as a plurality of register groups, with each register in the set being a member of more than one register group. Each program instruction includes a register specifier field, and instruction decoder circuitry is responsive to each program instruction to determine a selected register group, and to determine one or more selected members of that selected register group. The instruction decoder circuitry then outputs register access control signals identifying the register corresponding to each selected member of the selected register group. | 08-01-2013 |
20140244971 | ARRAY OF PROCESSOR CORE CIRCUITS WITH REVERSIBLE TIERS - Embodiments of the invention relate to an array of processor core circuits with reversible tiers. One embodiment comprises multiple tiers of core circuits and multiple switches for routing packets between the core circuits. Each tier comprises at least one core circuit. Each switch comprises multiple router channels for routing packets in different directions relative to the switch, and at least one routing circuit configured for reversing a logical direction of at least one router channel. | 08-28-2014 |
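The reversible router channel of 20140244971 can be illustrated as a link whose logical endpoints can be swapped so packets flow the other way between tiers. A minimal sketch with assumed names; the real routing circuit is hardware, not a Python object.

```python
# Toy model of a router channel whose logical direction can be reversed.
class RouterChannel:
    def __init__(self, endpoint_a, endpoint_b):
        self.src, self.dst = endpoint_a, endpoint_b

    def reverse(self):
        """Reverse the logical direction: packets now flow dst -> src."""
        self.src, self.dst = self.dst, self.src

    def route(self, packet):
        """Report how a packet traverses the channel in its current state."""
        return (self.src, self.dst, packet)

ch = RouterChannel("tier0", "tier1")
up = ch.route("p0")    # flows tier0 -> tier1
ch.reverse()
down = ch.route("p1")  # same wires, now tier1 -> tier0
```

Reversing a channel in place is what lets a tier's role be flipped without rewiring, which is the point of the abstract's "reversible tiers".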
20140359254 | Logical cell array and bus system - A logic cell array having a number of logic cells and a segmented bus system for logic cell communication, the bus system including different segment lines having shorter and longer segments for connecting two points, in order to minimize the number of bus elements traversed between separate communication start and end points. | 12-04-2014 |
20140359255 | Coarse-Grained Data Processor Having Both Global and Direct Interconnects - A data processor having a plurality of coarse-grained data processing elements arranged in rows and columns, an interconnect structure comprising both global and direct interconnects, the global interconnects interconnecting the coarse-grained data processing elements globally and the direct interconnects interconnecting adjacent data processing elements. | 12-04-2014 |
20150039855 | METHODS AND APPARATUS FOR SIGNAL FLOW GRAPH PIPELINING THAT REDUCE STORAGE OF TEMPORARY VARIABLES - A system for pipelining signal flow graphs by a plurality of shared memory processors organized in a 3D physical arrangement, with the memory overlaid on the processor nodes, that reduces storage of temporary variables. A group function is formed by two or more instructions that specify two or more parts of the group function. A first instruction specifies a first part and specifies control information for a second instruction adjacent to the first instruction or at a pre-specified location relative to the first instruction. The first instruction, when executed, transfers the control information to a pending register and produces a result which is transferred to an operand input associated with the second instruction. The second instruction specifies a second part of the group function and, when executed, transfers the control information from the pending register to a second execution unit to adjust the second execution unit's operation on the received operand. | 02-05-2015 |
20150106589 | SMALL FORM HIGH PERFORMANCE COMPUTING MINI HPC - A computing platform comprising a small form factor high performance computer for mobile high performance computing is provided. The computing platform uses a small form factor design with a 64-core microprocessor/co-processor. The small form factor high performance computer may include 64-core microprocessor/co-processors based on the ANNI Stem Cell HPC multicore datacenter chipset cluster of REMTEC. | 04-16-2015 |
20150356055 | EXECUTION ENGINE FOR EXECUTING SINGLE ASSIGNMENT PROGRAMS WITH AFFINE DEPENDENCIES - The execution engine is a new organization for a digital data processing apparatus, suitable for highly parallel execution of structured fine-grain parallel computations. The execution engine includes a memory for storing data and a domain flow program, a controller for requesting the domain flow program from the memory and translating the program into programming information, a processor fabric for processing the domain flow programming information, and a crossbar for sending tokens and the programming information to the processor fabric. | 12-10-2015 |
20170235580 | SYSTEM FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING | 08-17-2017 |
20190146802 | INFORMATION PROCESSING APPARATUS, ARITHMETIC PROCESSING APPARATUS, AND CONTROL METHOD FOR INFORMATION PROCESSING APPARATUS | 05-16-2019 |