Class / Patent application number | Description | Number of patent applications / Date published |
712203000 | Multiprocessor instruction | 23 |
20080209172 | Selective hardware lock disabling - Controlling a reorder buffer (ROB) to selectively perform functional hardware lock disabling (HLD) is described. One apparatus embodiment includes a unit to enable an ROB to selectively disable a lock upon Identifying a lock acquire operation (LAO) associated with a critical section (CS) entry point, a unit to selectively retire the LAO, a unit to cause the ROB to selectively disable the lock, and a unit to snoop a buffer. The apparatus may, based on the snooping, selectively abort a transaction associated with the CS. | 08-28-2008 |
20080229065 | Configurable Microprocessor - A configurable microprocessor which combines a plurality of corelets into a single microprocessor core to handle high computing-intensive workloads. The process first selects two or more corelets in the plurality of corelets. The process combines resources of the two or more corelets to form combined resources, wherein each combined resource comprises a larger amount of a resource available to each individual corelet. The process then forms a single microprocessor core from the two or more corelets by assigning the combined resources to the single microprocessor core, wherein the combined resources are dedicated to the single microprocessor core, and wherein the single microprocessor core processes instructions with the dedicated combined resources. | 09-18-2008 |
20080263325 | SYSTEM AND STRUCTURE FOR SYNCHRONIZED THREAD PRIORITY SELECTION IN A DEEPLY PIPELINED MULTITHREADED MICROPROCESSOR - A microprocessor and system with improved performance and power in simultaneous multithreading (SMT) microprocessor architecture. The microprocessor and system includes a process wherein the processor has the ability to select instructions from one thread or another in any given processor clock cycle. Instructions from each, thread may be assigned selection priorities at multiple decision points in a processor in a given cycle dynamically. The thread priority is based on monitoring performance behavior and activities in the processor. In the exemplary embodiment, the present invention discloses a microprocessor and system for synchronizing thread priorities among multiple decision points throughout the micro-architecture of the microprocessor. This system and method for synchronizing thread priorities allows each thread priority to he in sync and aware of the status of other thread priorities at various decision points within the microprocessor. | 10-23-2008 |
20090089546 | Multiple multi-threaded processors having an L1 instruction cache and a shared L2 instruction cache - In general, in one aspect, the disclosure describes a processor that includes an instruction store to store instructions of at least a portion of at least one program and multiple engines coupled to the shared instruction store. The engines provide multiple execution threads and include an instruction cache to cache a subset of the at least the portion of the at least one program from the instruction store, with different respective portions of the engine's instruction cache being allocated to different respective ones of the engine threads. | 04-02-2009 |
20090198960 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT SUPPORT PARTIAL CACHE LINE READS - According to a method of data processing in a multiprocessor data processing system, in response to a processor request to read a target granule of a target cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial read request that requests permission to read only the target granule of the target cache line. In response to a combined response to the partial read request indicating success, the combined response representing a system-wide response to the partial read request, the processing unit receives the target granule of the target cache line, supplies the target granule to a requesting processor core, and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the target cache line. | 08-06-2009 |
20090198961 | Cross-Thread Interrupt Controller for a Multi-thread Processor - An interrupt controller for a dual thread processor has for a first thread, an interrupt request register accessible to the second thread, an interrupt count accessible to the second thread, and an interrupt acknowledge accessible to the first thread. Additionally, the interrupt controller has, for a second thread, an interrupt request register accessible to the first thread, an interrupt count accessible to the first thread, and an interrupt acknowledge accessible to the second thread. Each interrupt controller separately has a counter for each request which increments upon assertion of a request and decrements upon assertion of an acknowledgement. | 08-06-2009 |
20090235050 | DYNAMIC SERVICE MANAGEMENT FOR MULTICORE PROCESSORS - A system, apparatus, method and article to perform dynamic service management for multicore processors are described. The apparatus may include, for example, a processing device having multiple types of processors to process packets. A service manager may dynamically assign executable files for multiple services to the multiple types of processors during execution of the executable files based on packets processed for each service. Other embodiments are described and claimed. | 09-17-2009 |
20090319757 | ARCHITECTURE FOR LOCAL PROGRAMMING OF QUANTUM PROCESSOR ELEMENTS USING LATCHING QUBITS - An architecture for a quantum processor may include a set of superconducting flux qubits operated as computation qubits and a set of superconducting flux qubits operated as latching qubits. Latching qubits may include a first closed superconducting loop with serially coupled superconducting inductors, interrupted by a split junction loop with at least two Josephson junctions; and a clock signal input structure configured to couple clock signals to the split junction loop. Flux-based superconducting shift registers may be formed from latching qubits and sets of dummy latching qubits. The devices may include clock lines to clock signals to latch the latching qubits. Thus, latching qubits may be used to program and configure computation qubits in a quantum processor. | 12-24-2009 |
20100005274 | VIRTUAL FUNCTIONAL UNITS FOR VLIW PROCESSORS - A virtual functional unit design is presented that is employed in a statically scheduled VLIW processor “Virtual” views of the function unit appear to the processor scheduler that exceed the number of physical instantiations of the functional unit. As a result, significant processor performance improvements can be achieved for those types of functional units that are too difficult or too costly to physically duplicate. By providing different virtual views to the different clusters of a VLIW processor, the compiler/scheduler can generate more efficient code for the processor, than a processor without virtual views and the physical unit restricted to a subset of the processor's clusters. The compiler/scheduler guarantees that the restrictions with respect to scheduling of operations for functional units with multiple virtual views is met. NON-clustered processors also benefit from virtual views. By providing multiple virtual views in multiple issue slots of a physical function unit, the compiler/scheduler has more freedom to schedule operations for the functional unit. | 01-07-2010 |
20100005275 | MULTIPROCESSING SYSTEM - A multiprocessing system includes a storage part that stores to a memory, a first operating system (OS) task set that is constituted by a combination of a first task and a first OS corresponding to the first task, the first task being designated by an execution instruction; and a task executing part that refers to the first OS task set stored to the memory, loads the OS constituting the first OS task set, and executes the first task designated by the execution instruction. | 01-07-2010 |
20100122065 | System and Method for Large-Scale Data Processing Using an Application-Independent Framework - A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data. | 05-13-2010 |
20100185833 | MULTIPROCESSOR CONTROL APPARATUS, MULTIPROCESSOR CONTROL METHOD, AND MULTIPROCESSOR CONTROL CIRCUIT - An object of the invention is to reduce the electric power consumption resulting from temporarily activating a processor requiring a large electric power consumption, out of a plurality of processors. A multiprocessor system ( | 07-22-2010 |
20100228951 | PARALLEL PROCESSING MANAGEMENT FRAMEWORK - The present disclosure includes a management framework system for processing a parallel task. The framework includes a job package, a job submitter, task trackers, communicators, a plurality of processors, and a node service. The job package has a bundle of implementations defined by a user and an input data domain. The job submitter module has a splitter interface and a reducer interface. The job submitter is configured to split the input data domain into a plurality of sub-data domains. In addition, the job submitter module is configured to send and receive the plurality of sub-data domains to a plurality of processors. The one or more processors are configured to execute parallel tasks on sub-data domains. The management framework separates user-defined applications from parallel execution such that user-implementations are separated from management framework implementations. | 09-09-2010 |
20100318768 | CHANNEL-BASED RUNTIME ENGINE FOR STREAM PROCESSING - An apparatus, including a memory device for storing a program, and a processor in communication with the memory device, the processor operative with the program to facilitate design of a stream processing flow that satisfies an objective, wherein the stream processing flow includes at least three processing groups, wherein a first processing group includes a data source and an operator, a second processing group includes a data source and an operator and a third processing group includes a join operator at its input and another operator, wherein data inside each group is organized by channels and each channel is a sequence of data, wherein an operator producing a data channel does not generate new data for the channel until old data of the channel is received by all other operators in the same group, and wherein data that flows from the first and second groups to the third group is done asynchronously and is stored in a queue if not ready for processing by an operator of the third group, and deploy the stream processing flow in a concurrent computing system to produce an output. | 12-16-2010 |
20110055522 | REQUEST CONTROL DEVICE, REQUEST CONTROL METHOD AND ASSOCIATED PROCESSORS - A request control device, request control method, and a multiprocessor cooperation architecture. The request control device is connected to a request storage module and includes a comparing means and an identifier means. The comparing means is configured to determine if an incoming first queue unit corresponds to the same message with a queue unit that has existed in the request storage module. The identifier setting means is configured to set a save identifier of the queue unit that has existed in the request storage module to indicate not to save a state associated with the message if the first queue unit corresponds to the same message with the queue unit that has existed in the request storage module. According to the technical solution of the invention, the access to the memory caused by saving/loading the states is reduced and thereby increases the processing speed of the processor. | 03-03-2011 |
20110125986 | Reducing inter-task latency in a multiprocessor system - A method of reducing inter-task latency for software comprising a sequence of instructions including a synchronous remote procedure call to be executed on a multiprocessor system comprising a calling processor and at least one remote engine. The method comprises the steps of: inputting the software; inputting a runtime resource description describing a runtime environment of the multiprocessor system; identifying the synchronous remote procedure call in the sequence of instructions; replacing the synchronous remote procedure call in the sequence of instructions with an initiation instruction and a wait instruction to generate a substitute sequence of instructions; reordering the substitute sequence of instructions with reference to the runtime resource description and the dependencies to generate a reordered sequence of instructions; and outputting the reordered sequence of instructions. | 05-26-2011 |
20120066477 | ADVANCED PROCESSOR WITH MECHANISM FOR PACKET DISTRIBUTION AT HIGH LINE RATE - An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner. | 03-15-2012 |
20120131310 | Methods And Apparatus For Independent Processor Node Operations In A SIMD Array Processor - A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions. | 05-24-2012 |
20120179896 | METHOD AND APPARATUS FOR A HIERARCHICAL SYNCHRONIZATION BARRIER IN A MULTI-NODE SYSTEM - A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing by each of a plurality of threads on a chip, input bit signal to a respective bit in a register, in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip, if it is determined that only on-chip synchronization is needed; and after all of the plurality of threads on the chip reached the barrier, communicating the synchronization signal to outside of the chip, if it is determined that inter-node synchronization is needed. | 07-12-2012 |
20130019083 | Redundant Transactional MemoryAANM Cain, III; Harold W.AACI HartsdaleAAST NYAACO USAAGP Cain, III; Harold W. Hartsdale NY USAANM Daly; David M.AACI Croton on HudsonAAST NYAACO USAAGP Daly; David M. Croton on Hudson NY USAANM Ekanadham; KattamuriAACI Mohegan LakeAAST NYAACO USAAGP Ekanadham; Kattamuri Mohegan Lake NY USAANM Huang; Michael C.AACI RochesterAAST NYAACO USAAGP Huang; Michael C. Rochester NY USAANM Moreira; Jose E.AACI IrvingtonAAST NYAACO USAAGP Moreira; Jose E. Irvington NY USAANM Serrano; Mauricio J.AACI BronxAAST NYAACO USAAGP Serrano; Mauricio J. Bronx NY US - A mechanism is provided for redundant execution of a set of instructions. A redundant execution begin (rbegin) instruction to be executed by a first hardware thread on the first processor is identified in the set of instructions. The set of instructions immediately after the rbegin instruction are executed on the first hardware thread and on a second hardware thread. Responsive to both the first processor and the second processor ending execution of the set of instructions, responsive to a first set of cache lines in a first speculative store matching a second set of cache lines in a second speculative store, and responsive to a first set of register states in a first status register matching a second set of register states in a second status register, dirty lines in the first speculative store are committed thereby committing a redundant transaction state to an architectural state. | 01-17-2013 |
20130138923 | MULTITHREADED DATA MERGING FOR MULTI-CORE PROCESSING UNIT - Described herein are methods, systems, apparatuses and products for multithreaded data merging for multi-core central and graphical processing units. An aspect provides for executing a plurality of threads on at least one central processing unit comprising a plurality of cores, each thread comprising an input data set (IDS) and being executed on one of the plurality of cores; initializing at least one local data set (LDS) comprising a size and a threshold; inserting IDS data elements into the at least one LDS such that each inserted IDS data element increases the size of the at least one LDS; and merging the at least one LDS into a global data set (GDS) responsive to the size of the at least one LDS being greater than the threshold. Other aspects are disclosed herein. | 05-30-2013 |
20140181473 | System and Method for Implementing Shared Probabilistic Counters Storing Update Probability Values - The systems and methods described herein may implement probabilistic counters and/or update mechanisms for those counters such that they are dependent on the value of a configurable accuracy parameter. The accuracy parameter value may be adjusted to provide fine-grained control over the tradeoff between the accuracy of the counters and the performance of applications that access them. The counters may be implemented as data structures that include a mantissa portion and an exponent portion that collectively represent an update probability value. When updating the counters, the value of the configurable accuracy parameter may affect whether, when, how often, or by what amount the mantissa portion and/or the exponent portion are updated. Updating a probabilistic counter may include multiplying its value by a constant that is dependent on the value of a configurable accuracy parameter. The counters may be accessible within transactions. The counters may have deterministic update policies. | 06-26-2014 |
20140380020 | SYSTEM AND METHODS FOR SYNCHRONIZATION OF REDUNDANT PROCESSING ELEMENTS - System and methods for synchronizing redundant processing elements are provided. In certain embodiments, a self-checking pair of system on chips (SoCs) includes a first SoC configured to execute a first plurality of instructions; and a second SoC configured to execute a second plurality of instructions that are approximately identical; wherein the first SoC exchanges a first instruction count with the second SoC, the first instruction count identifying a number of instructions executed by the first SoC; wherein the second SoC exchanges a second instruction count with the first SoC, the second instruction count identifying a number of instructions executed by the second SoC; and wherein the first SoC executes a first single step execution utility to synchronize the first instruction count with the second instruction count and the second SoC executes a second single step execution utility to synchronize the first instruction count with the second instruction count. | 12-25-2014 |