Class / Patent application number | Description | Number of patent applications / Date published |
712215000 | Simultaneous issuance of multiple instructions | 60 |
20080263330 | Clocked ports - A processor has an interface portion and an internal environment. The interface portion comprises at least one port. The internal environment comprises an execution unit arranged to execute instructions in dependence on a first timing signal and to transfer data between the interior portion and the at least one port in dependence on the first timing signal; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions and the thread scheduler being arranged to schedule the threads in dependence on the first timing signal. The port is arranged to transfer data between the port and an external environment in dependence on a second timing signal, and to alter a ready signal in dependence on the second timing signal to indicate a transfer of data with the external environment. The thread scheduler is configured to schedule one or more associated threads for execution in dependence on the ready signal. | 10-23-2008 |
20080313433 | PROCESSOR FOR SIMULTANEOUSLY EXECUTING MULTIPLE CONDITIONAL EXECUTION INSTRUCTION GROUPS - A processor is disclosed including several features allowing the processor to simultaneously execute instructions of multiple conditional execution instruction groups. Each conditional execution instruction group includes a conditional execution instruction and a code block specified by the conditional execution instruction. In one embodiment, the processor includes multiple registers for storing marking data pertaining to a number of instructions in each of multiple execution pipeline stages. In another embodiment, the processor includes write enable logic and an execution unit. The write enable logic produces write enable signals dependent upon received attributes, and the execution unit saves results of instructions of conditional execution instruction groups dependent upon the write enable signals. | 12-18-2008 |
20090006816 | Inter-Cluster Communication Network And Heirarchical Register Files For Clustered VLIW Processors - A VLIW processor has a hierarchy of functional unit clusters that communicate through explicit control in the instruction stream and store data in register files at each level of the hierarchy. Explicit instructions transfer values between sub-clusters through a cluster level switch network. Transfer instructions issue in dedicated instruction issue slots in parallel with instructions that perform computation in functional units. The switch network can perform permutations on the data being moved. The switch network enables for operands to be broadcast between the sub-clusters, global register file and memory. | 01-01-2009 |
20090031114 | MULTITHREAD PROCESSOR - To guarantee response time while strictly maintaining the priority specified by software, a processor ( | 01-29-2009 |
20090049279 | THREAD INTERLEAVING IN A MULTITHREADED EMBEDDED PROCESSOR - The present invention provides a network multithreaded processor, such as a network processor, including a thread interleaver that implements fine-grained thread decisions to avoid underutilization of instruction execution resources in spite of large communication latencies. In an upper pipeline, an instruction unit determines an-instruction fetch sequence responsive to an instruction queue depth on a per thread basis. In a lower pipeline, a thread interleaver determines a thread interleave sequence responsive to thread conditions including thread latency conditions. The thread interleaver selects threads using a two-level round robin arbitration. Thread latency signals are active responsive to thread latencies such as thread stalls, cache misses, and interlocks. During the subsequent one or more clock cycles, the thread is ineligible for arbitration. In one embodiment, other thread conditions affect selection decisions such as local priority, global stalls, and late stalls. | 02-19-2009 |
20090070559 | DATA PROCESSING CIRCUIT WHEREIN FUNCTIONAL UNITS SHARE READ PORTS - A data processing circuit comprises a register file ( | 03-12-2009 |
20090089551 | Apparatus and method of avoiding bank conflict in single-port multi-bank memory system - Provided are a method and apparatus for avoiding bank conflict. A first instruction that is one of access instructions that are predicted to cause the bank conflict is replaced with a second instruction by changing an execute timing of the first instruction to a timing prior to the execute timing of the first instruction so as for the access instructions not to cause the bank conflict. Next, a load/store unit that is scheduled to access the bank according to the first instruction accesses the bank and reads out a data from the bank at an execute timing of the second instruction, and after that, the load/store unit is allowed to be inputted the read data at the execute timing of the first instruction. Accordingly, although the access instructions that are predicted to cause the bank conflict are allocated to the load/store units, the bank conflict can be prevented, so that it is possible to avoid deterioration in performance due the occurrence of the bank conflict. | 04-02-2009 |
20090106534 | System and Method for Implementing a Software-Supported Thread Assist Mechanism for a Microprocessor - A system and computer-implementable method for implementing software-supported thread assist within a data processing system, wherein the data processing system supports processing instructions within at least a first thread and a second thread. An instruction dispatch unit (IDU) places the first thread into a sleep mode. The IDU separates an instruction stream for the second thread into at least a first independent instruction stream and a second independent instruction stream. The first independent instruction stream is processed utilizing facilities allocated to the first thread and the second independent instruction stream is processed utilizing facilities allocated to the second thread. In response to determining a result of the processing in the first independent instruction stream requires write back to registers allocated to the second thread, the IDU sets at least one selection bit to enable selective copying of content within registers allocated to the first thread to registers allocated to the second thread. | 04-23-2009 |
20090113181 | Method and Apparatus for Executing Instructions - A method and apparatus for executing instructions in a processor are provided. In one embodiment of the invention, the method includes receiving a plurality of instructions. The plurality of instructions includes first instructions in a first thread and second instructions in a second thread. The method further includes forming a common issue group including an instruction of a first instruction type and an instruction of a second instruction type. The method also includes issuing the common issue group to a first execution unit and a second execution unit. The instruction of the first instruction type is issued to the first execution unit and the instruction of the second instruction type is issued to the second execution unit. | 04-30-2009 |
20090164758 | System and Method for Performing Locked Operations - A mechanism for performing locked operations in a processing unit. A dispatch unit may dispatch a plurality of instructions including a locked instruction and a plurality of non-locked instructions. One or more of the non-locked instructions may be dispatched before and after the locked instruction. An execution unit may execute the plurality of instructions including the non-locked and locked instruction. A retirement unit may retire the locked instruction after execution of the locked instruction. During retirement, the processing unit may begin enforcing a previously obtained exclusive ownership of a cache line accessed by the locked instruction. Furthermore, the processing unit may stall the retirement of the one or more non-locked instructions dispatched after the locked instruction until after the writeback operation for the locked instruction is completed. At some point in time after retirement of the locked instruction, the writeback unit may perform a writeback operation associated with the locked instruction. | 06-25-2009 |
20090172359 | PROCESSING PIPELINE HAVING PARALLEL DISPATCH AND METHOD THEREOF - One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage. | 07-02-2009 |
20090177868 | APPARATUS, SYSTEM, AND METHOD FOR DISCONTIGUOUS MULTIPLE ISSUE OF INSTRUCTIONS - An apparatus, system, and method are disclosed for discontiguous multiple issue of instructions. An assignment unit assigns a plurality of instruction blocks to a plurality of issue units. The plurality of issue units each comprises a renaming map that maps each architecturally visible register address to a rename register. Each issue unit maps each architecturally visible register in the decoded instruction to a register placeholder if the renaming map entry for that architecturally visible register is invalid else maps the architecturally visible register in the decoded instruction to a rename register if the rename register entry is valid. Each issue unit further receives predecessor mapping information from the renaming map of the issue unit's predecessor issue unit in response to the assignment unit identifying a relationship with the predecessor issue unit and the final mapping information being available from the predecessor issue unit. | 07-09-2009 |
20090182987 | Processing Unit Incorporating Multirate Execution Unit - A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate to increase overall instruction throughput. | 07-16-2009 |
20090204792 | Scalar Processor Instruction Level Parallelism (ILP) Coupled Pair Morph Mechanism - Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times. Separate processor cores may be morphed to appear differently for different applications. For example, two processor cores each capable of executing N-wide issue groups of instructions may be morphed to appear as a single processor core capable of executing 2N-wide issue groups. | 08-13-2009 |
20090210665 | System and Method for a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit. The method, among others, can be broadly summarized by the following steps: receiving an issue group of instructions, reordering the issue group of instructions using instruction type priority, and executing the reordered issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210666 | System and Method for Resolving Issue Conflicts of Load Instructions - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: receive an issue group of instructions; determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; determine if there is a issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one load instruction in a different execution pipeline; and schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210667 | System and Method for Optimization Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine the dependency chain depth of all the instructions in the issue group, (3) schedule the instructions in an order of the longest dependency chain depth to shortest dependency chain depth, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20100064121 | DUAL-ISSUANCE OF MICROPROCESSOR INSTRUCTIONS USING DUAL DEPENDENCY MATRICES - A dual-issue instruction is decoded to determine a plurality of LSU dependencies needed by an LSU part of the dual-issue instruction and a plurality of non-LSU dependencies needed by a non-LSU part of the dual-issue instruction. During dispatch of the dual-issue instruction by the microprocessor, the dual dependency matrices are employed as follows: a Load-Store Unit (LSU) dependency matrix is written with the plurality of LSU dependencies and a non-LSU dependency matrix is written with the plurality of non-LSU dependencies; an LSU issue valid (LSU IV) indicator is set as valid to issue; an LSU portion of the dual-issue instruction is issued once the plurality of LSU dependencies of the dual issue instruction are satisfied; a non-LSU issue valid (non-LSU IV) indicator is set as valid to issue; and a non-LSU portion of the dual-issue instruction is issued once the plurality of non-LSU dependencies of the dual issue instruction are satisfied. The LSU dependency matrix and the non-LSU dependency matrix can then be notified that one or more instructions dependent upon the dual-issue instruction may now issue. | 03-11-2010 |
20100131741 | MULTI-CORE MICROCONTROLLER HAVING COMPARATOR FOR CHECKING PROCESSING RESULT - A microcontroller capable of improving processing performance as a whole by executing different programs by a plurality of CPUs and capable of detecting abnormality for safety-required processing by evaluating results of the same processing executed by the plurality of CPUs. A plurality of processing systems including CPUs and memories are provided, data output from the CPUs in each of the processing systems is separately compressed and stored by compressors for each of the CPUs, respectively. The compressed storage data is mutually compared by a comparator, and abnormality of processing can be detected when the comparison result indicates a mismatch. Even when the timings by which the same processing results are obtained are different when the plurality of CPUs asynchronously execute the same processing, the processing results of both of them can be easily compared with each other since compression is carried out by the compressors. Moreover, since the comparison of the comparator is enabled when comparison enable is given from all the CPUs, the comparison operation result can be obtained based on the timing at which the results of compression by the plurality of compressors are determined. | 05-27-2010 |
20100161945 | INFORMATION HANDLING SYSTEM WITH REAL AND VIRTUAL LOAD/STORE INSTRUCTION ISSUE QUEUE - An information handling system includes a processor that may perform issue queue virtual load/store instruction operations. The issue queue maintains load and store instructions with a real/virtual dependency flag. The issue queue provides storage resources for real and virtual load/store instructions. Real load/store instructions execute in a load store unit LSU. Virtual load/store instructions are pending execution in the LSU. The LSU may keep track of each virtual load/store instruction within the issue queue by thread, type, and pointer data. Provided that all dependencies are clear for a pending virtual load/store instruction, the LSU marks the pending virtual load/store instruction as real. The pending virtual load/store instruction may then issue to the LSU as a real load/store instruction. | 06-24-2010 |
20110138153 | MECHANISM FOR SELECTING INSTRUCTIONS FOR EXECUTION IN A MULTITHREADED PROCESSOR - In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication. | 06-09-2011 |
20110231634 | SYSTEM AND METHOD FOR GROUPING ALTERNATIVE POSSIBILITIES IN AN UNKNOWN INSTRUCTION PATH - A method, system and device is provided for processing digital data, for example, video, image, and media data. A dispatch unit may simultaneously issue a plurality of instructions to an execution unit. The instructions may correspond to different mutually exclusive outcomes of a common condition. A processor may determine the actual outcome of the common condition. An execution unit may execute the one of the plurality of instructions which corresponds to the actual outcome of the condition and discard the remaining simultaneously issued instructions. | 09-22-2011 |
20110276787 | MULTITHREAD PROCESSOR, COMPILER APPARATUS, AND OPERATING SYSTEM APPARATUS - A multithread processor for executing, in parallel, instructions included in a plurality of threads includes: a calculating group including a plurality of calculators each of which is for executing an instruction; instruction grouping units which classify, for each thread, the instructions included in the thread into groups each of which includes instructions that are simultaneously executable by the calculators; a thread selecting unit which selects, per execution cycle of the multithread processor, a thread including instructions to be issued to the calculators, from among the threads, by controlling execution frequency for executing the instructions included in the threads; and an instruction issuing unit which issues, to the calculators, per execution cycle of the multithread processor, the instructions classified into each of the groups and being among the instructions included in the thread selected by the thread selecting unit. | 11-10-2011 |
20110302392 | INSTRUCTION TRACKING SYSTEM FOR PROCESSORS - A method and apparatus for tracking instructions in a processor. A completion unit in the processor receives an instruction group to add to a table to form a received instruction group. In response to receiving the received instruction group, the completion unit determines whether an entry is present that contains a previously stored instruction group in a first location and has space for storing the received instruction group. In response to the entry being present, the completion unit stores the received instruction group in a second location in the entry to form a stored instruction group. | 12-08-2011 |
20120047352 | PROCESSOR - A processor includes: an instruction buffer which stores the instructions to be dispatched to the arithmetic units; a dependency detecting unit which (i) detects a first dependency and a second dependency and (ii) determines an instruction group including the instructions to be dispatched to the corresponding arithmetic units, the first dependency found between any given two of the instructions stored in the instruction buffer, the second dependency found between each of the instructions stored in the instruction buffer and each of the dispatched instructions, and the instruction group including at least one instruction (i) found in the instructions stored in the instruction buffer and (ii) having neither the first dependency nor the second dependency; and a dispatching unit which dispatches, to the corresponding one of the arithmetic units, the instruction included in the determined instruction group. | 02-23-2012 |
20120066481 | Dynamic instruction splitting - A data processing apparatus and method are provided. The data processing apparatus is configured to perform data processing operations in response to data processing instructions including a multiple operation instruction, in response to which multiple data processing operations are performed. The data processing apparatus comprises two or more data processing units configured to perform the data processing operations and an instruction arbitration unit configured to perform sub-division of a multiple operation instruction into a plurality of sub-instructions and to perform allocation of the plurality of sub-instructions amongst the two or more data processing units, wherein each sub-instruction is arranged to cause one of the two or more data processing units to perform at least one data processing operation of the multiple data processing operations. The instruction arbitration unit is configured to perform the sub-division and the allocation dynamically in dependence on a current availability of a resource for each of the two or more data processing units, enabling more efficient usage of the resources of each of the data processing units to be made. | 03-15-2012 |
20120166768 | PIPELINE REPLAY SUPPORT FOR MULTICYCLE OPERATIONS - Instructions asserted in the instruction pipeline of the microprocessor are accompanied by control information, comprising a group of bits, asserted within a control information pipeline of the processor. The control information pipeline is synchronized to the instruction pipeline so that the control information for an instruction progresses in synchronism with the instruction. The control information may identify, directly or indirectly, the type of operation called for by the instruction and, if the operation is to be performed in parts, indicate the part to be performed. Means are included in the processor, such as a number of functional execution units, to interpret that control information and take appropriate action. | 06-28-2012 |
20120260070 | THREAD SELECTION FOR MULTITHREADED PROCESSING - A multithreading processor | 10-11-2012 |
20120278593 | LOW COMPLEXITY OUT-OF-ORDER ISSUE LOGIC USING STATIC CIRCUITS - Instruction issue circuits are disclosed that are configured to issue multiple instructions within a superscalar pipeline of a microprocessor. The instruction issue circuit includes an instruction queue that stores instructions. A ready generation circuit is operably associated with the instruction queue and generates ready signals that indicate which instructions in the instruction queue are ready for execution. To simplify the instruction issue circuit, the instruction issue circuit has group blocks. Each group block receives a different group of the ready signals corresponding to a different group of the instructions. Each group block generates a group output indicating a group set within the corresponding group of the instructions that has a highest instruction execution priority and are ready for execution. By splitting the ready signals into groups, the groups of ready signals can be processed in parallel thereby reducing both the resulting delay and complexity of the instruction issue circuit. | 11-01-2012 |
20120331274 | Instruction Execution - A method of executing an instruction set having a first instruction and a second instruction, includes: reading the first instruction; determining whether the first instruction is integral with the second instruction; reading the second instruction; if the first instruction is integral with the second instruction, interpreting a first operator field of the second instruction to represent a first operator; and if the first instruction is not integral with the second instruction, interpreting the first operator field of the second instruction to represent a second operator, wherein the first operator is different to the second operator. | 12-27-2012 |
20130145128 | PROCESSING CORE WITH PROGRAMMABLE MICROCODE UNIT - A method and circuit arrangement utilize a programmable microcode unit that is capable of being programmed via software to modify the instruction sequences output by the microcode unit in response to microcode instructions issued to the microcode unit. Among other benefits, a programmable microcode unit consistent with the invention enables customization of a processor design to handle specific applications or tasks, as well as to support specific hardware configurations such as specific execution units. In addition, a programmable microcode unit may be updatable, e.g., to correct bugs or faults found in previous instruction sequences supported by the unit. | 06-06-2013 |
20130159677 | INSTRUCTION GENERATION - Generating instructions, in particular for mailbox verification in a simulation environment. A sequence of instructions is received, as well as selection data representative of a plurality of commands including a special command. Repeatedly selecting one of the plurality of commands and outputting an instruction based on the selected command. The outputting of an instruction includes outputting a next instruction in the sequence of instructions if the selected command is the special command, and outputting an instruction associated with the command if the selected command is not the special command. | 06-20-2013 |
20130173886 | Processor with Hazard Tracking Employing Register Range Compares - Systems and methods for tracking data hazards in a processor. The processor comprises a pipelined architecture configured to execute a first instruction and a second instruction, wherein the second instruction is older than the first instruction. At least one of the first and second instructions comprises at least one operand expressed as a range of registers. Hazard detection logic is configured to compare the first instruction and the second instruction to determine if there is a data hazard, prior to expanding the second instruction. | 07-04-2013 |
20130179664 | DIVISION UNIT WITH MULTIPLE DIVIDE ENGINES - Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter whether to schedule subsequent instructions for issuance to the division unit. | 07-11-2013 |
20130262831 | METHODS AND APPARATUS TO AVOID SURGES IN DI/DT BY THROTTLING GPU EXECUTION PERFORMANCE - Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period. | 10-03-2013 |
20140089639 | PROCESSOR WITH INSTRUCTION CONCATENATION - A processor includes a plurality of execution units. At least one of the execution units is configured to determine, based on a field of a first instruction, a number of additional instructions to execute in conjunction with the first instruction and prior to execution of the first instruction. | 03-27-2014 |
20150074380 | METHOD AND APPARATUS FOR ASYNCHRONOUS PROCESSOR PIPELINE AND BYPASS PASSING - A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results. | 03-12-2015 |
20150106595 | PRIORITIZING INSTRUCTIONS BASED ON TYPE - Methods and reservation stations for selecting instructions to issue to a functional unit of an out-of-order processor. The method includes classifying each instruction into one of a number of categories based on the type of instruction. Once classified an instruction is stored in an instruction queue corresponding to the category in which it was classified. Instructions are then selected from one or more of the instruction queues to issue to the functional unit based on a relative priority of the plurality of types of instructions. This allows certain types of instructions (e.g. control transfer instructions, flag setting instructions and/or address generation instructions) to be prioritized over other types of instructions even if they are younger. | 04-16-2015 |
20150113252 | THREAD CONTROL AND CALLING METHOD OF MULTI-THREAD VIRTUAL PIPELINE (MVP) PROCESSOR, AND PROCESSOR THEREOF - The present invention relates to a thread control method of a multi-thread virtual pipeline (MVP) processor, which comprises the following steps: allocating directly and sequentially threads in a central processing unit (CPU) thread operation queue to multi-path parallel hardware thread time slots of the MVP processor for operation; allowing an operating thread to generate hardware thread call instructions corresponding thereto to a hardware thread management unit; allowing the hardware thread management unit to enable the call instructions of ithread threads to form a program queue according to receiving time, and calling and preparing the hardware threads; and allowing the hardware threads to operate sequentially in idle multi-path parallel hardware thread time slots of the MVP processor according to the sequence of the hardware threads in the queue of the hardware thread management unit. The present invention also relates to a processor. | 04-23-2015 |
20150134934 | VIRTUAL LOAD STORE QUEUE HAVING A DYNAMIC DISPATCH WINDOW WITH A DISTRIBUTED STRUCTURE - An out of order processor. The processor includes a distributed load queue and a distributed store queue that maintain single program sequential semantics while allowing an out of order dispatch of loads and stores across a plurality of cores and memory fragments; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue. | 05-14-2015 |
20150324204 | PARALLEL SLICE PROCESSOR WITH DYNAMIC INSTRUCTION STREAM MAPPING - A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The dispatch routing network is controlled to dynamically vary the relationship between the slices and instruction streams according to execution requirements for the instruction streams and the availability of resources in the instruction execution slices. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution and ordinary instruction execution on a per-instruction basis, permitting the mixture of those instruction types. Instructions having an operand width greater than the width of a single instruction execution slice may be processed by multiple instruction execution slices configured to act in concert for the particular instructions. When an instruction execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150324205 | PROCESSING OF MULTIPLE INSTRUCTION STREAMS IN A PARALLEL SLICE PROCESSOR - Techniques for managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provide flexibility in execution of program instructions by a processor core. An event is detected indicating that either resource requirement or resource availability will not be met by the execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution. When an instruction execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150324206 | PARALLEL SLICE PROCESSOR WITH DYNAMIC INSTRUCTION STREAM MAPPING - A method of operation of a processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues coupled by a dispatch routing network provides flexible and efficient use of internal resources. The dispatch routing network is controlled to dynamically vary the relationship between the slices and instruction streams according to execution requirements for the instruction streams and the availability of resources in the instruction execution slices. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution and ordinary instruction execution on a per-instruction basis. Instructions having an operand width greater than the width of a single instruction execution slice may be processed by multiple instruction execution slices configured to act in concert for the particular instructions. When an instruction execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150324207 | PROCESSING OF MULTIPLE INSTRUCTION STREAMS IN A PARALLEL SLICE PROCESSOR - A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provides instruction processing flexibility. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution. When an execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150347150 | HARDWARE COUNTERS TO TRACK UTILIZATION IN A MULTITHREADING COMPUTER SYSTEM - Embodiments relate tracking utilization in a multithreading (MT) computer system. According to one aspect, a computer system includes a configuration with a core configured to operate in a MT that supports multiple threads on shared resources of the core. The core is configured to perform a method that includes resetting a plurality of utilization counters. The utilization counters include a plurality of sets of counters. During each clock cycle on the core, a set of counters is selected from the plurality of sets of counters. The selecting is based on a number of currently active threads on the core. In addition, during each clock cycle a counter in the selected set of counters is incremented based on an aggregation of one or more execution events at the multiple threads of the core. Values of the utilization counters are provided to a software program. | 12-03-2015 |
20150355908 | ADDRESS EXPANSION AND CONTRACTION IN A MULTITHREADING COMPUTER SYSTEM - Embodiments relate to address expansion and contraction in a multithreading computer system. According to one aspect, a computer implemented method for address adjustment in a configuration is provided. The configuration includes a core configurable between an ST mode and an MT mode, where the ST mode addresses a primary thread and the MT mode addresses the primary thread and one or more secondary threads on shared resources of the core. The primary thread is accessed in the ST mode using a core address value. Switching from the ST mode to the MT mode is performed. The primary thread or one of the one or more secondary threads is accessed in the MT mode using an expanded address value. The expanded address value includes the core address value concatenated with a thread address value. | 12-10-2015 |
20160004538 | MULTIPLE ISSUE INSTRUCTION PROCESSING SYSTEM AND METHOD - A multiple issue instruction processing system is provided. The system includes a central processing unit (CPU), a memory system and an instruction control unit. The CPU is configured to execute one or more instructions of the executable instructions at the same time. The memory system is configured to store the instructions. The instruction control unit is configured to, based on location of a branch instruction stored in a track table, control the memory system to output the instructions likely to be executed to the CPU. | 01-07-2016 |
20160055002 | Method and Apparatus for Scheduling the Issue of Instructions in a Multithreaded Processor - There is provided a method to dynamically determine which instructions from a plurality of available instructions to issue in each clock cycle in a multithreaded processor capable of issuing a plurality of instructions in each clock cycle, comprising the steps of: determining a highest priority instruction from the plurality of available instructions; determining the compatibility of the highest priority instruction with each of the remaining available instructions; and issuing the highest priority instruction together with other instructions compatible with the highest priority instruction in the same clock cycle; wherein the highest priority instruction cannot be a speculative instruction. The effect of this is that speculative instructions are only ever issued together with at least one non-speculative instruction. | 02-25-2016 |
20160085557 | PROCESSOR AND PROCESSING METHOD OF VECTOR INSTRUCTION - A processor includes: a plurality of pipelines including a first pipeline and a second pipeline and configured to pipeline-process vector instructions including load instructions with respect to a memory, and when an instruction issuance controller configured to decode a vector instruction read out from an instruction memory and issue instructions to the pipelines issues a first load instruction with respect to a first region of a memory to the first pipeline and a second load instruction with respect to the first region of the memory is being processed in the second pipeline, a processing order in the first load instruction in the first pipeline is changed on the basis of an offset value determined according to a number of cycles that have been processed already in the second load instruction so that an access address of the first load instruction matches an access address of the second load instruction. | 03-24-2016 |
20160098275 | THERMAL-AWARE COMPILER FOR PARALLEL INSTRUCTION EXECUTION IN PROCESSORS - Embodiments are described for a method for compiling instruction code for execution in a processor having a number of functional units by determining a thermal constraint of the processor, and defining instruction words comprising both real instructions and one or more no operation (NOP) instructions to be executed by the functional units within a single clock cycle, wherein a number of NOP instructions executed over a number of consecutive clock cycles is configured to prevent exceeding the thermal constraint during execution of the instruction code. | 04-07-2016 |
20160110200 | FLEXIBLE INSTRUCTION EXECUTION IN A PROCESSOR PIPELINE - Executing instructions in a processor includes analyzing operations to be performed by instructions, including: determining a latency associated with a first operation to be performed by a first instruction, determining a second operation to be performed by a second instruction, where a result of the second operation depends on a result of the first operation, and assigning a value to the second instruction corresponding to the determined latency associated with the first operation. One or more instructions are selected to be issued together in the same clock cycle of the processor from among instructions whose operations have been analyzed, the selected instructions occurring consecutively according to a program order. A start of execution of the second instruction is delayed by a particular number of clock cycles after the clock cycle in which the second instruction is issued according to the value assigned to the second instruction. | 04-21-2016 |
20160110201 | FLEXIBLE INSTRUCTION EXECUTION IN A PROCESSOR PIPELINE - Executing instructions in a processor includes: selecting or more instructions to be issued together in the same clock cycle of the processor from among a plurality of instructions, the selected one or more instructions occurring consecutively according to a program order; and executing instructions that have been issued, through multiple execution stages of a pipeline of the processor. The executing includes: determining a delay assigned to a first instruction, and sending a result of a first operation performed by the first instruction in a first execution stage to a second execution stage, where the number of execution stages between the first execution stage and the second execution stage is based on the determined delay. | 04-21-2016 |
20160117172 | FREELIST BASED GLOBAL COMPLETION TABLE - Managing a global completion table used to track progress of groups of instructions, in which each group of instructions includes one or more instructions. Entries of the global completion table are allocated to the groups of instructions from a freelist of entries. That is, entries are allocated from a pool of entries, rather than allocating entries in-order in a circular queue. | 04-28-2016 |
20160117175 | FREELIST BASED GLOBAL COMPLETION TABLE - Managing a global completion table used to track progress of groups of instructions, in which each group of instructions includes one or more instructions. Entries of the global completion table are allocated to the groups of instructions from a freelist of entries. That is, entries are allocated from a pool of entries, rather than allocating entries in-order in a circular queue. | 04-28-2016 |
20160179551 | PIPELINING OUT-OF-ORDER INSTRUCTIONS | 06-23-2016 |
20160202989 | RECONFIGURABLE PARALLEL EXECUTION AND LOAD-STORE SLICE PROCESSOR | 07-14-2016 |
20160202991 | RECONFIGURABLE PARALLEL EXECUTION AND LOAD-STORE SLICE PROCESSING METHODS | 07-14-2016 |
20160202993 | INSTRUCTION STREAM TRACING OF MULTI-THREADED PROCESSORS | 07-14-2016 |
20160378481 | INSTRUCTION AND LOGIC FOR ENCODED WORD INSTRUCTION COMPRESSION - A processor includes a memory and a decompressor. The memory is to store compressed instruction. The decompressor includes logic to receive a request for an instruction in the compressed instructions to be executed by the processor, determine a block in the memory including the requested instruction, and determine a start address of the block in the compressed instructions. The decompressor also includes logic decompress chunks of the block, a given chunk to include parts of a plurality of very-long instruction word (VLIW) instructions. | 12-29-2016 |
20160378505 | SYSTEM OPERATION QUEUE FOR TRANSACTION - Embodiments relate to a system operation queue for a transaction. An aspect includes determining whether a system operation is part of an in-progress transaction of a central processing unit (CPU). Another aspect includes based on determining that the system operation is part of the in-progress transaction, storing the system operation in a system operation queue corresponding to the in-progress transaction. Yet another aspect includes, based on the in-progress transaction ending, processing the system operation in the system operation queue. | 12-29-2016 |