Entries |
Document | Title | Date |
20080201561 | MULTI-THREADED PARALLEL PROCESSOR METHODS AND APPARATUS - A processor system, a processor readable medium and a method for implementing multiple contexts on one or more SPE are disclosed. | 08-21-2008 |
20080215864 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor - A simultaneous multithreaded processor that reduces the number of hardware components necessary as well as the complexity of design over current systems is disclosed. As opposed to requiring individual storage elements for saving instruction pointer information for each re-steer logic component within a processor pipeline, the present invention allows for instruction pointer information of an inactive thread to be stored in a single, ‘inactive thread’ storage element until the thread becomes active again. | 09-04-2008 |
20080222401 | METHOD AND SYSTEM FOR ENABLING STATE SAVE AND DEBUG OPERATIONS FOR CO-ROUTINES IN AN EVENT-DRIVEN ENVIRONMENT - A method of enabling state save and debug operations for co-routines for first failure data capture (FFDC) in an event-driven environment. A stack management utility allocates space for a context structure, which includes a state field, and a stack pointer in a buffer. A context management utility initializes a first context structure of a first co-routine and saves a state of the first context structure in response to an execution request for a second co-routine. The context management utility sets a second context structure as a current context. When execution of the current context is complete, the context management utility restores the first context structure of the first co-routine as the current context. If the state field is not set to a valid value, a state save function “state saves” all allocated co-routine stacks and context structures, restores the entire system to a previous valid state, and restarts operations. | 09-11-2008 |
20080229083 | Processor instruction set - The invention provides a processor comprising an execution unit and a thread scheduler configured to schedule a plurality of threads for execution by the execution unit in dependence on a respective status for each thread. The execution unit is configured to execute thread scheduling instructions which manage said statuses, the thread scheduling instructions including at least: a thread event enable instruction which sets a status to event-enabled to allow a thread to accept events, a wait instruction which sets the status to suspended pending at least one event upon which continued execution of the thread depends, and a thread event disable instruction which sets the status to event-disabled to stop the thread from accepting events. The continued execution comprises retrieval of a continuation point vector for the thread. | 09-18-2008 |
20080244246 | INTEGRATED MPE-FEC RAM FOR DVB-H RECEIVERS - A MPE-FEC memory chip and method for use in a DVB-H receiver, wherein the memory chip comprises a TS demux; a RS decoder; a system bus; and a RAM unit adapted to simultaneously interface to the TS demux, the RS decoder, and the system bus through time-multiplexing, wherein the RAM unit is adapted to (i) access multiple-words per clock cycle, and (ii) cache write and read accesses to reduce memory access from the TS demux and the system bus, and wherein the RAM unit is adapted to be clocked at a speed higher than an interfacing data-path to increase an effective throughput of the RAM unit. The RAM unit may comprise multiple RAM sub units, wherein while a first RAM sub unit is clock gated, the remaining multiple RAM sub units are accessible. | 10-02-2008 |
20080244247 | Processing long-latency instructions in a pipelined processor - There is provided a method and processor for processing a thread. The thread comprises a plurality of sequential instructions, the plurality of sequential instructions comprising some short-latency instructions and some long-latency instructions and at least one hazard instruction, the hazard instruction requiring one or more preceding instructions to be processed before the hazard instruction is processed. The method comprises the steps of: a) before processing each long-latency instruction, incrementing by one, a counter associated with the thread; b) after each long-latency instruction has been processed, decrementing by one, the counter associated with the thread; c) before processing each hazard instruction, checking the value of the counter associated with the thread, and i) if the counter value is zero, processing the hazard instruction, or ii) if the counter value is non-zero, pausing processing of the hazard instruction until a later time. The processor includes means for performing steps a), b) and c) of the method. | 10-02-2008 |
20080250233 | Providing thread fairness in a hyper-threaded microprocessor - A method and apparatus for providing fairness in a multi-processing element environment is herein described. Mask elements are utilized to associate portions of a reservation station with each processing element, while still allowing common access to another portion of reservation station entries. Additionally, bias logic biases selection of processing elements in a pipeline away from a processing element associated with a blocking stall to provide fair utilization of the pipeline. | 10-09-2008 |
20080263339 | Method and Apparatus for Context Switching and Synchronization - A method, computer-readable medium, and apparatus for context switching between a first thread and a second thread. The method includes detecting an exception, wherein the exception is generated in response to receiving a packet of information directed to one of the first thread and the second thread, and in response to detecting the exception, invoking an exception handler. The exception handler is configured to execute one or more instructions removing access to at least a portion of a processor cache. The portion of the processor cache contains cached information for the first thread using a first address translation. Removing access to the portion of the processor cache prevents the second thread using a second address translation from accessing the cached information in the processor cache. The exception handler is also configured to branch to at least one of the first thread and the second thread. | 10-23-2008 |
20080270771 | METHOD OF OPTIMIZING MULTI-SET CONTEXT SWITCH FOR EMBEDDED PROCESSORS - A method of optimizing multi-set context switch for embedded processors includes the steps of partitioning a plurality of registers into a plurality of register sets based on a live-range-sensitive context-switch procedure that is associated with a usage frequency of each of the registers, storing contents of first target registers according to live set information of a current task, wherein the first target registers are selected from the register sets, determining a next task by an operating system and updating the live set information according to the next task, and restoring contents of second target registers according to the updated live set information, wherein the second target registers are selected from the register sets. | 10-30-2008 |
20080270772 | Reduced data transfer during processor context switching - Data transfer during processor context switching is reduced, particularly in relation to a time-sharing microtasking programming model. Prior to switching context of a processor having local memory from a first to a second process, a portion of the local memory that does not require transfer to system memory for proper saving of data associated with the first process is determined. The context of the processor is then switched from the first to the second process, including transferring all of the local memory as the data associated with the first process, to system memory—except for the portion of the local memory that has been determined as not requiring saving to the system memory for proper saving of the data associated with the first process. Therefore, switching the context from the first to the second process results in a reduction of data transferred from the local memory to the system memory. | 10-30-2008 |
20080282071 | Microprocessor and register saving method - A microprocessor which realizes fast register saving and restoring which are involved in subroutine calls, and is capable of reducing the scale of a program. A register file is provided with at least one register for storing data to be used for computational processing. A saving memory stores therein data saved from the registers. A saving control unit saves data from a writing destination register to the saving memory when an instruction to write to the register is executed in a subroutine. Then the saving control unit restores data saved in the saving memory back to the original registers when an instruction to return from the subroutine is executed. | 11-13-2008 |
20080307208 | Application specific processor having multiple contexts - An application specific processor executes multiple dedicated applications in a system having a main control processor for controlling the operation of the system. The application specific processor includes a first context for executing a corresponding first application and a second context for executing a corresponding second application. An instruction memory outputs instructions for executing the first and second applications, and a context switch instruction for switching from one context to the other context. Context is switched in response to the context switch instruction while executing the first or second application. | 12-11-2008 |
20090024841 | Register File Backup Queue - A register file backup system for use with a computer which processes instructions to generate results which thereby change the visual state of the computer. The computer has a register file with a plurality of addressable locations for storing data. The backup system is adapted to return the visual state of the computer to a previous state if an instruction generates an exception. The backup system utilizes less overhead so as to provide easier register file backup than a comparable software or hardware device. The backup system comprises first means for sequentially storing in program order, address information corresponding to destination locations in the register file where instruction results are to be stored. The first means has first and second outputs for transferring the address information stored therein: the first output being coupled to the register file for transferring a first portion of the address information to the register file, and the second output is used for transferring a second portion of address information for backup storage of the register file contents. The backup system also has a second means coupled to (1) the second output of the first means, for receiving and storing the second portion of the address information, and (2) the register file, for receiving and backup storing further information corresponding to the contents of one or more destination locations in the register file before that destination location is changed according to second portion of the address information. A third means is used for transferring the further information from the second means back to the register file locations according to the second portion of the address information stored in the second means after an instruction generates an exception. | 01-22-2009 |
20090043996 | USER CO-ROUTINE INTERFACE FOR CUSTOMIZING SIP AND SDP PROTOCOLS - A method of using co-routines to implement a function-like interface between a BASIC program and the points in the system where SIP and SDP data (for example) are to be modified. This co-routine interface is intuitive from the end-user's perspective, and both real-time efficient and flexible from the system designer's perspective, and is applied to provide user-customized SIP and SDP modifications in an easy-to-use way that gives the end-user great flexibility while protecting the system from the undesirable side-effects that could result from a tightly coupled co-routine interface. | 02-12-2009 |
20090089561 | VISUALIZING CHANGES TO CONTENT OVER TIME - A processing device and method are provided for visualizing changes to dynamic content. Dynamic content may be obtained from a content source and a state of the content may be saved. The saved state of the content may be compared with a previously saved state of the content to produce difference data, indicating differences between the saved state of the content and the previously saved state of the content. The obtained content may be presented to a user and may include visual indications pointing out added portions of the content, deleted portions of the content, and/or unchanged portions of the content. In some embodiments, a scheduler may be configured to obtain content and save a state of the content at particular times or upon occurrences of particular events. In various embodiments, aged states of the content may be degraded. | 04-02-2009 |
20090089562 | Methods and apparatuses for reducing power consumption of processor switch operations - Methods and apparatuses for reducing power consumption of processor switch operations are disclosed. One or more embodiments may comprise specifying a subset of registers or state storage elements to be involved in a register or state storage operation, performing the register or state storage operation, and performing a switch operation. The embodiments may minimize the number of registers or state storage elements involved with the standby operation by specifying only the subset of registers or state storage elements, which may involve considerably fewer than the total number of registers or state storage elements of the processor. The switch operation may be a switch from one mode to another, such as a transition to or from a sleep mode, a context switch, or the execution of various types of instructions. | 04-02-2009 |
20090089563 | METHOD AND SYSTEM OF PERFORMING THREAD SCHEDULING - A method and system of performing thread scheduling. At least some of the illustrative embodiments are computer-readable mediums storing a program that, when executed by a processor of a host system, causes the processor to instantiate a CPU object that represents a processor abstraction, create a CPU context object that represents a thread abstraction (wherein the CPU context object is associated to a method, and wherein the CPU context object is mapped onto the CPU object), and execute the method within the CPU object. | 04-02-2009 |
20090094444 | Link Stack Repair of Erroneous Speculative Update - Whenever a link address is written to the link stack, the prior value of the link stack entry is saved, and is restored to the link stack after a link stack push operation is speculatively executed following a mispredicted branch. This condition is detected by maintaining a count of the total number of uncommitted link stack write instructions in the pipeline, and a count of the number of uncommitted link stack write instructions ahead of each branch instruction. When a branch is evaluated and determined to have been mispredicted, the count associated with it is compared to the total count. A discrepancy indicates a link stack write instruction was speculatively issued into the pipeline after the mispredicted branch instruction, and pushed a link address onto the link stack. The prior link address is restored to the link stack from the link stack restore buffer. | 04-09-2009 |
20090150656 | Reducing Aging Effect On Registers - Methods and apparatus to reduce aging effect on registers are described. In one embodiment, a select value is stored in a register that is unused, for example, to reduce the effects of negative bias temperature instability (NBTI) or oxide degradation on the register. Other embodiments are also described. | 06-11-2009 |
20090172369 | SAVING AND RESTORING ARCHITECTURAL STATE FOR PROCESSOR CORES - A method and apparatus for saving and restoring architectural states utilizing hardware is herein described. A first portion of an architectural state of a processing element, such as a core, is concurrently saved upon being updated. A remaining portion of the architectural state is saved to memory in response to a save state triggering event, which may include a hardware event or a software event. Once saved, the state is potentially transferred to another processing element, such as a second core. As a result, hardware, software, or combination thereof may transfer architectural states between multiple processing elements, such as threads or cores, of a processor utilizing hardware support. | 07-02-2009 |
20090187749 | PIPELINE PROCESSOR - A bypass circuit is provided in a pipeline processor. A pipeline register is provided between an instruction execution stage and a write-back stage. The pipeline register stores a data validity flag and a WRITE control flag to control writing data into a general purpose register unit. The data retained in the pipeline register is allowed to be written back into the general purpose register unit when the WRITE control flag indicates “valid”. The pipeline register continues to retain the retained data even after the writing of the retained data into the general purpose register unit. The first pipeline register supplies the retained data to the second stage through the bypass circuit at the time of executing a subsequent instruction having data dependency on a preceding instruction. | 07-23-2009 |
20090210682 | DATA TRANSFER BUS COMMUNICATION USING SINGLE REQUEST TO PERFORM COMMAND AND RETURN DATA TO DESTINATION INDICATED IN CONTEXT TO ALLOW THREAD CONTEXT SWITCH - Systems and methods for managing context switches among threads in a processing system. A processor may perform a context switch between threads using separate context registers. A context switch allows a processor to switch from processing a thread that is waiting for data to one that is ready for additional processing. The processor includes control registers with entries which may indicate that an associated context is waiting for data from an external source. | 08-20-2009 |
20090217013 | METHOD AND APPARATUS FOR PROGRAMMATICALLY REWINDING A REGISTER INSIDE A TRANSACTION - Embodiments of the present invention provide a system that allocates registers in a processor. The system starts by commencing a transaction, wherein commencing the transaction involves preserving a pre-transactional state of registers in a first register file. The system then allocates one or more registers for temporary use during the transaction. Upon finishing using each allocated register during the transaction, the system executes an instruction that restores the allocated register to the pre-transactional state. | 08-27-2009 |
20090217014 | PROCESSOR, MEMORY DEVICE, PROCESSING DEVICE, AND METHOD FOR PROCESSING INSTRUCTION - A processor includes a VM trap logic and a buffering logic. The VM trap logic determines whether or not an instruction acquired from a VM (Virtual Machine) satisfies a predetermined VM trap condition. The buffering logic determines whether or not the instruction acquired from the VM satisfies a predetermined buffering condition. | 08-27-2009 |
20090271594 | SEMICONDUCTOR INTEGRATED CIRCUIT, SEMICONDUCTOR INTEGRATED CIRCUIT CONTROL DEVICE, LOAD DISTRIBUTION METHOD, LOAD DISTRIBUTION PROGRAM, AND ELECTRONIC DEVICE - A damage control unit includes: a switching judgment unit to judge the CPU configuration which performs smoothing of the damage ratio, according to the damage ratio of the CPUs; and a switching unit to perform switching of I/O signals of all the CPUs. The switching judgment unit observes the damage ratio calculated from values such as the temperature, voltage, current consumption amount, operation ratio, the number of accesses to the resources in the CPU, at all times or at some extent of time intervals and notifies the switching unit of the CPU configuration to be changed by using the calculation method for smoothing the damage ratio of each CPU. The switching unit makes a connection to the I/O signals of all the CPUs and a system bus and switches the I/O signal of the CPU to be switched according to the notification from the switching judgment unit. | 10-29-2009 |
20090300338 | AGGRESSIVE STORE MERGING IN A PROCESSOR THAT SUPPORTS CHECKPOINTING - Embodiments of the present invention provide a processor that merges stores in an N-entry first-in-first-out (FIFO) store queue. In these embodiments, the processor starts by executing instructions before a checkpoint is generated. When executing instructions before the checkpoint is generated, the processor is configured to perform limited or no merging of stores into existing entries in the store queue. Then, upon detecting a predetermined condition, the processor is configured to generate a checkpoint. After generating the checkpoint, the processor is configured to continue to execute instructions. When executing instructions after the checkpoint is generated, the processor is configured to freely merge subsequent stores into post-checkpoint entries in the store queue. | 12-03-2009 |
20090307469 | REGISTER SET USED IN MULTITHREADED PARALLEL PROCESSOR ARCHITECTURE - A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor maintains execution threads. The execution threads access a register set organized into a plurality of relatively addressable windows of registers that are relatively addressable per thread. | 12-10-2009 |
20100011194 | STATE AS A FIRST-CLASS CITIZEN OF AN IMPERATIVE LANGUAGE - A state component saves a present state of a program or model. This state component can be invoked by the program or model itself, thereby making state a first-class citizen. As the state of the program evolves from the saved state, the saved state remains for reflection and recall, for example, for testing, verification, transaction processing, etc. Using a state reference token, the saved state of the program or model can be accessed by the program or model. For example, the program or model by utilizing a state component, can return itself to the saved state. After returning to the saved state, a second execution path can be introduced without requiring re-execution of the actions leading to the saved state. In another example, the state space of an executing model is saved in order to generate inputs required to exercise a program or model. | 01-14-2010 |
20100082951 | MULTI-THREADED PARALLEL PROCESSOR METHODS AND APPARATUS - A processor system may implement multiple contexts on one or more processors having a local memory. Code and/or data for first and second contexts may be stored simultaneously in first and second regions, respectively, of a processor's local memory. The processor may execute the first context while the second context waits. Code and/or data for the first context may be transferred from the first region to the second and code and/or data for the second context may be transferred from the second region to the first, and the processor may execute the second context during a pause or stoppage of execution of the first context. Alternatively, the code and/or data for the second context may be transferred to another processor's local memory. | 04-01-2010 |
20100088494 | Total cost based checkpoint selection - A method, system, and computer usable program product for total cost based checkpoint selection are provided in the illustrative embodiments. A cost associated with taking a checkpoint is determined. The cost includes an energy cost. An interval between checkpoints is computed so as to minimize the cost. An instruction is sent to schedule the checkpoints at the computed interval. The energy cost may further include a cost of energy consumed in collecting and saving data at a checkpoint, a cost of energy consumed in re-computing a computation lost due to a failure after taking the checkpoint, or a combination thereof. The cost may further include, converted to a cost equivalent, administration time consumed in recovering from a checkpoint, computing resources expended in taking a checkpoint, computing resources expended after a failure in restoring information from a checkpoint, performance degradation of an application while taking a checkpoint, or a combination thereof. | 04-08-2010 |
20100095100 | Checkpointing A Hybrid Architecture Computing System - A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element and at least one accelerator element and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element. | 04-15-2010 |
20100095101 | Capturing Context Information in a Currently Occurring Event - According to a sample embodiment, a method is provided for capturing context information about an event. A data collector is created comprising instructions to collect specific context data in response to specific conditions in a call stack, and the data collector is registered with a first failure data capture application. In a sample embodiment the first failure data capture application receives a registration for a context data collector. Then, in response to being called, the first failure data capture application looks for at least one of a class and a method defined in the context data collection registration that matches conditions of the call stack. In response to said call stack conditions being met, the first failure data capture application calls the data collector to collect context data from the call stack, receives context data from the context data collector, and presents the context data. | 04-15-2010 |
20100115249 | Support of a Plurality of Graphic Processing Units - Included are systems and methods for supporting a plurality of Graphics Processing Units (GPUs). At least one embodiment of a system includes a context status register configured to send data related to a status of at least one context and a context switch configuration register configured to send instructions related to at least one event for the at least one context. At least one embodiment of a system includes a context status management component coupled to the context status register and the context switch configuration register. | 05-06-2010 |
20100115250 | CONTEXT SWITCHING AND SYNCHRONIZATION - A method, computer-readable medium, and apparatus for context switching between a first thread and a second thread. The method includes detecting an exception, wherein the exception is generated in response to receiving a packet of information directed to one of the first thread and the second thread, and in response to detecting the exception, invoking an exception handler. The exception handler is configured to execute one or more instructions removing access to at least a portion of a processor cache. The portion of the processor cache contains cached information for the first thread using a first address translation. Removing access to the portion of the processor cache prevents the second thread using a second address translation from accessing the cached information in the processor cache. The exception handler is also configured to branch to at least one of the first thread and the second thread. | 05-06-2010 |
20100125722 | Multithreaded Processing Unit With Thread Pair Context Caching - A circuit arrangement and method utilize thread pair context caching, where a pair of hardware threads in a multithreaded processor, which are each capable of executing a process, are effectively paired together, at least temporarily, to perform context switching operations such as context save and/or load operations in advance of context switches performed in one or more of such paired hardware threads. By doing so, the overall latency of a context switch, where both the context for a process being switched from must be saved, and the context for the process being switched to must be loaded, may be reduced. | 05-20-2010 |
20100161948 | Apparatus and Method for Processing Complex Instruction Formats in a Multi-Threaded Architecture Supporting Various Context Switch Modes and Virtualization Schemes - A unified architecture for dynamic generation, execution, synchronization and parallelization of complex instruction formats includes a virtual register file, register cache and register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching. | 06-24-2010 |
20100169622 | PROCESSOR REGISTER RECOVERY AFTER FLUSH OPERATION - An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation that an exception, such as a branch misprediction, causes. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation. | 07-01-2010 |
20100180106 | ASYNCHRONOUS CHECKPOINTING WITH AUDITS IN HIGH AVAILABILITY NETWORKS - Example embodiments are directed to methods of ensuring high availability of a network using asynchronous checkpointing of application state data related to an object. Example embodiments include a method of asynchronous checkpointing application state data related to at least one object, including receiving application events and processing the application events to obtain new application state data. The method further includes modifying at least a portion of previously stored application state data and asynchronously and independently checkpointing the modified application state data based on whether the modified application state data has reached a stable state. Example embodiments also include a method of ensuring consistent application state data across a network. This method may include having at least two CAPs independently and asynchronously storing application state data related to at least one object at two different network nodes and automatically auditing the stored application state data to ensure data consistency. | 07-15-2010 |
20100191942 | Information processor and control method - A northbridge, when detecting a synchronization break of a redundant CPU, stops the operation of an abnormal CPU bus where an error has occurred and the firmware in an FWH instructs the northbridge to inhibit an external instruction. In addition, the firmware saves the inside information of a normal CPU connected to a normal CPU bus and cache data on a memory and the northbridge issues a reset to all CPUs in the home system board. The firmware then restores the inside information of the CPU saved on the memory to all the CPUs and instructs the northbridge to cancel the inhibition of the external instruction. | 07-29-2010 |
20100262812 | REGISTER CHECKPOINTING MECHANISM FOR MULTITHREADING - Methods and apparatus are disclosed for using a register checkpointing mechanism to resolve multithreading mis-speculations. Valid architectural state is recovered and execution is rolled back. Some embodiments include memory to store checkpoint data. Multiple thread units concurrently execute threads. They execute a checkpoint mask instruction to initialize memory to store active checkpoint data including register contents and a checkpoint mask indicating the validity of stored register contents. As register contents change, threads execute checkpoint write instructions to store register contents and update the checkpoint mask. Threads also execute a recovery function instruction to store a pointer to a checkpoint recovery function, and in response to mis-speculation among the threads, branch to the checkpoint recovery function. Threads then execute one or more checkpoint read instructions to copy data from a valid checkpoint storage area into the registers necessary to recover a valid architectural state, from which execution may resume. | 10-14-2010 |
20100306512 | COMPILER TECHNIQUE FOR EFFICIENT REGISTER CHECKPOINTING TO SUPPORT TRANSACTION ROLL-BACK - A method and apparatus for efficient register checkpointing is herein described. A transaction is detected in program code. A recovery block is inserted in the program code to perform recovery operations in response to an abort of the first transaction. A roll-back edge is potentially inserted from an abort point to the recovery block. A control flow edge is inserted from the recovery block to an entry point of the transaction. Checkpoint code is inserted before the entry point to backup live-in registers in backup storage elements and recovery code is inserted in the recovery block to restore the live-in registers from the backup storage elements in response to an abort of the transaction. | 12-02-2010 |
20100332809 | Methods and Devices for Saving and/or Restoring a State of a Pattern-Recognition Processor - Systems and methods are disclosed for saving and restoring the search state of a pattern-recognition processor. Embodiments include a pattern-recognition processor having a state variable array and a state variable storage array stored in on-chip memory (on-silicon memory with the processor). State variable storage control logic of the pattern-recognition processor may control the saving of state variables from the state variable array to the state variable storage array. The state variable storage control logic may also control restoring of the state variables from the state variable storage array to restore a search state. | 12-30-2010 |
20100332810 | Reconfigurable Functional Unit Having Instruction Context Storage Circuitry To Support Speculative Execution of Instructions - A functional unit is described. The functional unit includes a reconfigurable logic circuitry and instruction context storage circuitry to store instruction context information generated from instructions executed by the reconfigurable logic circuitry within the reconfigurable functional unit. The instructions include speculatively executed instructions. | 12-30-2010 |
20110029761 | Method and apparatus of reducing CPU chip size - A new compression method and apparatus compress instructions embedded in a CPU chip, which significantly reduces the density of the storage device that stores the program. Multiple groups of instructions in the form of binary code are compressed separately by a mapping unit indicating the starting location of a group of instructions, which helps to quickly recover the corresponding instructions. A mapping unit is applied to interpret the corresponding address of a group of data so that the corresponding instructions can be quickly recovered for a CPU to execute smoothly. | 02-03-2011 |
20110047364 | Recovering from an Error in a Fault Tolerant Computer System - A leading thread and a trailing thread are executed in parallel. Assuming that no transient fault occurs in each section, a system is speculatively executed in the section, with the leading thread and the trailing thread preferably being assigned to two different cores. At this time, the leading thread and the trailing thread are simultaneously executed, performing a buffering operation on a thread local area without performing a write operation on a shared memory. When the respective execution results of the two threads match each other, the content buffered to the thread local area is committed and written to the shared memory. When the respective execution results of the two threads do not match each other, the leading thread and the trailing thread are rolled back to a preceding commit point and re-executed. | 02-24-2011 |
20110066830 | CACHE PREFILL ON THREAD MIGRATION - Techniques for pre-filling a cache associated with a second core prior to migration of a thread from a first core to the second core are generally disclosed. The present disclosure contemplates that some computer systems may include a plurality of processor cores, and that some cores may have hardware capabilities different from other cores. In order to assign threads to appropriate cores, thread/core mapping may be utilized and, in some cases, a thread may be reassigned from one core to another core. In a probabilistic anticipation that a thread may be migrated from a first core to a second core, a cache associated with the second core may be pre-filled (e.g., may become filled with some data before the thread is rescheduled on the second core). Such a cache may be a local cache to the second core and/or an associated buffer cache, for example. | 03-17-2011 |
20110066831 | SYSTEM AND METHOD FOR SOFTWARE INITIATED CHECKPOINT OPERATIONS - A method, system and computer program product for issuing one or more software initiated operations for creating a checkpoint of a register file and memory, and for restoring a register file and memory to the checkpointed state. At the execution of a checkpoint operation, the system returns a condition code indicating success or failure. When the condition code is set equal to one, one or more checkpoints are initiated. Contents of the register file and gated store buffer are stored each time the one or more checkpoints are initiated. When the checkpoint is created, the system notifies software when a hardware checkpoint capacity has been reached. One or more of the software checkpoint, hardware checkpoint, and handler checkpoint are utilized to provide a more precise point of restoration. During software execution, the register file and gated store buffer can be restored as defined by the one or more previous checkpoints. | 03-17-2011 |
20110072247 | FAST APPLICATION PROGRAMMABLE TIMERS - Methods, systems, and computer program products for implementing fast application programmable timers are provided. A computer program product includes a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes receiving a request to set a user accessible timer, the request received from an application thread. The user accessible timer is set in response to receiving the request, the setting including initializing a counter. The counter is decremented until an interrupt threshold has been reached. An interrupt signal is transmitted to the application thread in response to detecting that the interrupt threshold has been reached. | 03-24-2011 |
20110113222 | METHOD AND APPARATUS FOR ASSIGNING THREAD PRIORITY IN A PROCESSOR OR THE LIKE - In a multi-threaded processor, thread priority variables are set up in memory. The actual assignment of thread priority is based on the expiration of a thread precedence counter. To further augment the effectiveness of the thread precedence counters, starting counters are associated with each thread that serve as a multiplier for the value to be used in the thread precedence counter. The values in the starting counters are manipulated so as to prevent one thread from getting undue priority to the resources of the multi-threaded processor. | 05-12-2011 |
20110131397 | MULTIPROCESSOR SYSTEM AND MULTIPROCESSOR CONTROL METHOD - A multiprocessor system includes a memory that stores a program; an address notification register; a first processor; and a second processor, in which the first processor stores address information indicating an address from which the program is executed in the address notification register, when the first processor notifies an interrupt request to the second processor and causes the second processor to execute the program, and the second processor obtains the interrupt request notified from the first processor and the address information stored in the address notification register, and starts to execute the program from the address indicated by the obtained address information. | 06-02-2011 |
20110145552 | Handling Operating System (OS) Transitions In An Unbounded Transactional Memory (UTM) Mode - In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed. | 06-16-2011 |
20110153999 | METHODS AND APPARATUS TO MANAGE PARTIAL-COMMIT CHECKPOINTS WITH FIXUP SUPPORT - Example methods and apparatus to manage partial commit-checkpoints are disclosed. A disclosed example method includes identifying a commit instruction associated with a region of instructions executed by a processor, identifying candidate instructions from the region of instructions, and generating a processor partial commit-checkpoint to save a current state of the processor, the checkpoint based on calculated register values associated with live instructions, and including instruction reference addresses to link the candidate instructions. | 06-23-2011 |
20110154000 | Adaptive optimized compare-exchange operation - A technique to perform a fast compare-exchange operation is disclosed. More specifically, a machine-readable medium, processor, and system are described that implement a fast compare-exchange operation as well as a cache line mark operation that enables the fast compare-exchange operation. | 06-23-2011 |
20110161639 | Event counter checkpointing and restoring - A method of one aspect may include storing an event count of an event counter that counts events that occur during execution within a logic device. The method may further include restoring the event counter to the stored event count after the event counter has counted additional events. Other methods are also disclosed. Apparatus, systems, and machine-readable medium having software are also disclosed. | 06-30-2011 |
20110179258 | PRECISE DATA RETURN HANDLING IN SPECULATIVE PROCESSORS - The described embodiments provide a system for executing instructions in a processor. In the described embodiments, upon detecting a return of input data for a deferred instruction while executing instructions in an execute-ahead mode, the processor determines whether a replay bit is set in a corresponding entry for the returned input data in a miss buffer. If the replay bit is set, the processor transitions to a deferred-execution mode to execute deferred instructions. Otherwise, the processor continues to execute instructions in the execute-ahead mode. | 07-21-2011 |
20110219218 | DISTRIBUTED ORDER ORCHESTRATION SYSTEM WITH ROLLBACK CHECKPOINTS FOR ADJUSTING LONG RUNNING ORDER MANAGEMENT FULFILLMENT PROCESSES - A computer-readable medium, computer-implemented method, and system are provided. In one embodiment, a rollback checkpoint for a step in an executable process is established, and the executable process is executed. A change request is received, and the step with the established rollback checkpoint is adjusted. Any subsequent steps of the executable process are also adjusted. | 09-08-2011 |
20110238962 | Register Checkpointing for Speculative Modes of Execution in Out-of-Order Processors - A mechanism is provided for generating a checkpoint for a speculatively executed portion of code. The mechanisms identify, during a speculative execution of a portion of code, a register renaming operation occurring to an entry in a register renaming table of the processor. In response to the register renaming operation occurring to the register renaming table, a determination is made as to whether an update to an entry in a hardware-implemented recovery renaming table is to be performed. If so, the entry in the hardware-implemented recovery renaming table is updated. The entry in the recovery renaming table is part of the checkpoint for the speculative execution of the portion of code. | 09-29-2011 |
20110264898 | CHECKPOINT ALLOCATION IN A SPECULATIVE PROCESSOR - The embodiments described in the instant application provide a system for generating checkpoints. In the described embodiments, while speculatively executing instructions with one or more checkpoints in use, upon detecting an occurrence of a predetermined operating condition or encountering a predetermined type of instruction, the system is configured to determine whether an additional checkpoint is to be generated by computing a factor based on one or more operating conditions of the processor. When the factor is greater than a predetermined value, the processor is configured to generate the additional checkpoint. | 10-27-2011 |
20110283095 | Hardware Assist Thread for Increasing Code Parallelism - Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of the data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware implemented pervasive thread control logic determines if one or more already spawned idle threads are available for use as an assist thread. The hardware implemented pervasive thread control logic selects an idle thread from the one or more already spawned idle threads if it is determined that one or more already spawned idle threads are available for use as an assist thread, to thereby provide the assist thread. In addition, the hardware implemented pervasive thread control logic offloads a portion of a workload of the main thread to the assist thread. | 11-17-2011 |
20110289303 | SETJMP/LONGJMP FOR SPECULATIVE EXECUTION FRAMEWORKS - A process for checkpointing in speculative execution frameworks identifies calls to a set of setjmp/longjmp instructions to form identified calls to setjmp/longjmp, determines a control flow path between a call to a setjmp and a longjmp pair of instructions in the identified calls to setjmp/longjmp and replaces calls to the setjmp/longjmp pair of instructions with calls to an improved_setjmp and improved_longjmp instruction pair. The process creates a context data structure in memory, computes a non-volatile save/restore set and replaces the call to improved_setjmp of the setjmp/longjmp pair of instructions with instructions to save all required non-volatile and special purpose registers and replaces a call to improved_longjmp of the setjmp/longjmp pair of instructions with instructions to restore all required non-volatile and special purpose registers and to branch to an instruction immediately following a block of code containing the call to improved_setjmp. | 11-24-2011 |
20110296148 | Transactional Memory System Supporting Unbroken Suspended Execution - Mechanisms are provided, in a data processing system having a processor and a transactional memory, for executing a transaction in the data processing system. These mechanisms execute a transaction comprising one or more instructions that modify at least a portion of the transactional memory. The transaction is suspended in response to a transaction suspend instruction being executed by the processor. A suspended block of code is executed in a non-transactional manner while the transaction is suspended. A determination is made as to whether an interrupt occurs while the transaction is suspended. In response to an interrupt occurring while the transaction is suspended, a transaction abort operation is delayed until after the transaction suspension is discontinued. | 12-01-2011 |
20120005461 | System and Method for Performing Incremental Register Checkpointing in Transactional Memory - Systems and methods described herein for performing incremental register checkpointing may employ a special register to indicate which registers have already been checkpointed. This register may include one bit per register. These systems may also include a special pointer register whose value identifies a location in user memory or in dedicated on-chip storage at which a copy of a register's value should be saved by a checkpointing operation. Only registers modified during speculative execution or execution of a transaction may be checkpointed (e.g., when register modifying instructions are encountered) and subsequently restored (e.g., due to misspeculation or transaction abort), rather than all of the registers of the processor. Each register may be checkpointed at most once for a given speculative episode or atomic transaction. Setting a bit in the special register may prevent checkpointing of the corresponding register. Setting all of the bits in the special register may disable checkpointing. | 01-05-2012 |
20120036340 | Data processing apparatus and method using checkpointing - A data processing apparatus and method of data processing are provided. The data processing apparatus comprises execution circuitry configured to execute a sequence of program instructions. Checkpoint circuitry is configured to identify an instance of a predetermined type of instruction in the sequence of program instructions and to store checkpoint information associated with that instance. The checkpoint information identifies a state of the data processing apparatus prior to execution of that instance of the predetermined type of instruction, wherein the predetermined type of instruction has an expected long completion latency. If the execution circuitry does not complete execution of that instance of the predetermined type of instruction due to occurrence of a predetermined event, the data processing apparatus is arranged to reinstate the state of the data processing apparatus with reference to the checkpoint information, such that the execution circuitry is then configured to recommence execution of the sequence of program instructions at that instance of the predetermined type of instruction. | 02-09-2012 |
20120089821 | DEBUGGING APPARATUS AND METHOD - A debugging apparatus and method are provided. The debugging apparatus may include a breakpoint setting unit configured to store a first instruction corresponding to a breakpoint in a table, stop a program currently being executed, and insert a breakpoint instruction including current location information of the first instruction into the breakpoint; and an instruction execution unit configured to selectively execute one of the breakpoint instruction and the first instruction according to a value of a status bit. | 04-12-2012 |
20120117361 | Processing Data Communications Events In A Parallel Active Messaging Interface Of A Parallel Computer - Processing data communications events in a parallel active messaging interface (‘PAMI’) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context. | 05-10-2012 |
20120191958 | SYSTEM AND METHOD FOR CONTEXT MIGRATION ACROSS CPU THREADS - One embodiment of the present invention sets forth a technique for associating arbitrary parallel processing unit (PPU) contexts with a given central processing unit (CPU) thread. The technique introduces two operators used to manage the PPU contexts. The first operator is a PPU context push, which causes a PPU driver to store the current PPU context of a calling thread on a PPU context stack and to associate a named PPU context with the calling thread. The second operator is a PPU context pop, which causes the PPU driver to restore the PPU context of a calling function to the PPU context at the top of the PPU context stack. By performing a PPU context push at the beginning of a function and a PPU context pop prior to returning from the function, the function may execute within a single CPU thread, but operate on two distinct PPU contexts. | 07-26-2012 |
20120278596 | APPARATUS AND METHOD FOR CHECKPOINT REPAIR IN A PROCESSING DEVICE - A data processing device maintains register map information that maps accesses to architectural registers, as identified by instructions being executed, to physical registers of the data processing device. In response to determining that an instruction, such as a speculatively-executing conditional branch, indicates a checkpoint, the data processing device stores the register map information for subsequent retrieval depending on the resolution of the instruction. In addition, in response to the checkpoint indication the data processing device generates new register map information such that accesses to the architectural registers are mapped to different physical registers. The data processing device maintains a list, referred to as a free register list, of physical registers available to be mapped to architectural registers. | 11-01-2012 |
20130042093 | CONTEXT STATE MANAGEMENT FOR PROCESSOR FEATURE SETS - Embodiments of an invention related to context state management based on processor features are disclosed. In one embodiment, a processor includes instruction logic and state management logic. The instruction logic is to receive a state management instruction having a parameter to identify a subset of the features supported by the processor. The state management logic is to perform a state management operation specified by the state management instruction. | 02-14-2013 |
20130042094 | COMPUTING SYSTEM WITH TRANSACTIONAL MEMORY USING MILLICODE ASSISTS - A computing system processes memory transactions for parallel processing of multiple threads of execution with millicode assists. The computing system transactional memory support provides a Transaction Table in memory and a method of fast detection of potential conflicts between multiple transactions. Special instructions may mark the boundaries of a transaction and identify memory locations applicable to a transaction. A ‘private to transaction’ (PTRAN) tag, directly addressable as part of the main data storage memory location, enables a quick detection of potential conflicts with other transactions that are concurrently executing on another thread of said computing system. The tag indicates whether (or not) a data entry in memory is part of a speculative memory state of an uncommitted transaction that is currently active in the system. Program millicode provides transactional memory functions including creating and updating transaction tables, committing transactions and controlling the rollback of transactions which fail. | 02-14-2013 |
20130046963 | ACCESS TO CONTEXT INFORMATION IN A HETEROGENEOUS APPLICATION ENVIRONMENT - Various embodiments of systems and methods to provide access to context information in a heterogeneous application environment are described herein. The context information of a source application is received. The context information is based on the execution of the source application. Further, the context information is stored in one or more context vectors of a global context unit, the one or more context vectors corresponding to the source application and one or more target applications. Furthermore, access to the context information of the global context unit is provided for the one or more target applications upon receiving invoking access indication from the one or more target applications. Also, the source application and the one or more target applications are integrated with the global context unit. | 02-21-2013 |
20130097411 | TRANSFERRING ARCHITECTED STATE BETWEEN CORES - A method and apparatus for transferring architected state bypasses system memory by directly transmitting architected state between processor cores over a dedicated interconnect. The transfer may be performed by state transfer interface circuitry with or without software interaction. The architected state for a thread may be transferred from a first processing core to a second processing core when the state transfer interface circuitry detects an error that prevents proper execution of the thread corresponding to the architected state. A program instruction may be used to initiate the transfer of the architected state for the thread to one or more other threads in order to parallelize execution of the thread or perform load balancing between multiple processor cores by distributing processing of multiple threads. | 04-18-2013 |
20130111194 | METHOD AND SYSTEM TO PROVIDE USER-LEVEL MULTITHREADING | 05-02-2013 |
20130132711 | COMPUTE THREAD ARRAY GRANULARITY EXECUTION PREEMPTION - One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity. | 05-23-2013 |
20130173894 | SHARING VIRTUAL FUNCTIONS IN A SHARED VIRTUAL MEMORY BETWEEN HETEROGENEOUS PROCESSORS OF A COMPUTING PLATFORM - A computing platform may include heterogeneous processors (e.g., CPU and a GPU) to support sharing of virtual functions between such processors. In one embodiment, a CPU side vtable pointer used to access a shared object from the CPU | 07-04-2013 |
20130179665 | RESTORING A REGISTER RENAMING MAP - A technique for restoring a register renaming map is described. In one example, a restore table having a number of storage locations saves a copy of the register renaming map whenever a flow-risk instruction is passed to a re-order buffer. When all storage locations are full, further instructions still pass to the re-order buffer, but a copy of the map is not saved. A storage location subsequently becomes available when its associated flow-risk instruction is executed. A register renaming map state for an unrecorded flow-risk instruction passed to the re-order buffer whilst the storage locations were full is generated and stored in the available location. This is generated using the restore table entry for a previous flow-risk instruction and re-order buffer values for intervening instructions between the previous and unrecorded flow-risk instructions. The restore table can be used to restore the map if an unexpected change in instruction flow occurs. | 07-11-2013 |
20130179666 | MULTI-CORE PROCESSOR SYSTEM, SYNCHRONIZATION CONTROL SYSTEM, SYNCHRONIZATION CONTROL APPARATUS, INFORMATION GENERATING METHOD, AND COMPUTER PRODUCT - A multi-core processor system includes a given core that includes a detecting unit that detects migration of a thread under execution by a synchronization source core to a synchronization destination core in the multi-core processor; an identifying unit that refers to a table identifying a combination of a thread and a register associated with the thread, and identifies a particular register corresponding to the thread for which migration was detected; a generating unit that generates synchronization control information identifying the synchronization destination core and the particular register; and a synchronization controller that, communicably connected to the multi-core processor, acquires from the given core, the synchronization control information, reads in from the particular register of the synchronization source core, a value of the particular register obtainable from the synchronization control information, and writes to the particular register of the synchronization destination core, the value. | 07-11-2013 |
20130179667 | METHODS AND SYSTEMS FOR STATE SWITCHING - Disclosed are methods and systems for state switching. The method is applied to a first hardware system. The first hardware system is connected with a second hardware system. The first hardware system has a first operation state and a second operation state. The second hardware system includes a memory unit. The memory unit has a first access state and a second access state. The memory unit is currently in the first access state. The method includes: the first hardware system sends an access state switching instruction to the second hardware system when the first hardware system enters the second operation state from the first operation state, wherein the access state switching instruction is adapted to switch the memory unit of the second hardware system from the first access state to the second access state. The application of the present invention can ensure the security of key data, prevent malicious software from accessing key data, reduce implementation costs, and provide higher extensibility. | 07-11-2013 |
20130219154 | CONTEXT STATE MANAGEMENT FOR PROCESSOR FEATURE SETS - Embodiments of an invention related to context state management based on processor features are disclosed. In one embodiment, a processor includes instruction logic and state management logic. The instruction logic is to receive a state management instruction having a parameter to identify a subset of the features supported by the processor. The state management logic is to perform a state management operation specified by the state management instruction. | 08-22-2013 |
20130227254 | DIFFERENTIAL STACK-BASED SYMMETRIC CO-ROUTINES - A computing device initiates execution of a first co-routine on the computing device. The first co-routine utilizes an execution stack in a memory of the computing device. A differential symmetric co-routine module pauses execution of the first co-routine and, subsequently, resumes execution of the first co-routine utilizing the same execution stack. | 08-29-2013 |
20130238882 | MULTI-CORE PROCESSOR SYSTEM, MONITORING CONTROL METHOD, AND COMPUTER PRODUCT - A multi-core processor system includes a given core among multiple cores, wherein the given core is configured to detect execution of a process by the cores; and generate upon detecting the execution of the process, a specific thread that saves state information indicating an executed state of the process and an executed state of each thread to be monitored of the process. | 09-12-2013 |
20130290688 | Method of Concurrent Instruction Execution and Parallel Work Balancing in Heterogeneous Computer Systems - Embodiments of the present invention provide for concurrent instruction execution in heterogeneous computer systems by forming a parallel execution context whenever a first software thread encounters a parallel execution construct. The parallel execution context may comprise a reference to instructions to be executed concurrently, a reference to data said instructions may depend on, and a parallelism level indicator whose value specifies the number of times said instructions are to be executed. The first software thread may then signal to other software threads to begin concurrent execution of instructions referenced in said context. Each software thread may then decrease the parallelism level indicator and copy data referenced in the parallel execution context to said thread's private memory location and modify said data to accommodate for the new location. Software threads may be executed by a processor and operate on behalf of other processing devices or remote computer systems. | 10-31-2013 |
20140006758 | Extension of CPU Context-State Management for Micro-Architecture State | 01-02-2014 |
20140019735 | Computer Processor Providing Exception Handling with Reduced State Storage - A computer architecture allows for simplified exception handling by restarting the program after exceptions at the beginning of idempotent regions, the idempotent regions allowing re-execution without the need for restoring complex state information from checkpoints. Recovery from mis-speculation may be provided by a similar mechanism but using smaller idempotent regions reflecting a more frequent occurrence of mis-speculation. A compiler generating different idempotent regions for speculation and exception handling is also disclosed. | 01-16-2014 |
20140032884 | Out-of-Order Checkpoint Reclamation in a Checkpoint Processing and Recovery Core Microarchitecture - Reclaiming checkpoints in a system in an order that differs from the order in which the checkpoints are created. Reclaiming the checkpoints includes: creating one or more checkpoints, each of which has an initial state using system resources and holds the checkpoint's state; identifying the completion of all the instructions associated with the checkpoint; reassigning all the instructions associated with the identified checkpoint to an immediately preceding checkpoint; and freeing the resources associated with the identified checkpoint. The checkpoint is created when the instruction that is checked is a conditional branch having a direction that cannot be predicted with a predetermined confidence level. | 01-30-2014 |
20140032885 | METHODS AND APPARATUS TO MANAGE PARTIAL-COMMIT CHECKPOINTS WITH FIXUP SUPPORT - Example methods and apparatus to manage partial commit-checkpoints are disclosed. A disclosed example method includes identifying a commit instruction associated with a region of instructions executed by a processor, identifying candidate instructions from the region of instructions, and generating a processor partial commit-checkpoint to save a current state of the processor, the checkpoint based on calculated register values associated with live instructions, and including instruction reference addresses to link the candidate instructions. | 01-30-2014 |
20140095847 | INSTRUCTION AND HIGHLY EFFICIENT MICRO-ARCHITECTURE TO ENABLE INSTANT CONTEXT SWITCH FOR USER-LEVEL THREADING - A processor uses multiple banks of an extended register set to store the contexts of multiple user-level threads. A current bank register provides a pointer to the bank that is currently active. A first thread saves its context (first context) in a first bank of the extended register set and a second thread saves its context (second context) in a second bank of the extended register set. When the processor receives an instruction for exchanging contexts between the first thread and the second thread, the processor changes the pointer from the first bank to the second bank, and executes the second thread using the second context stored in the second bank. | 04-03-2014 |
20140095848 | Tracking Operand Liveliness Information in a Computer System and Performing Function Based on the Liveliness Information - Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module. | 04-03-2014 |
20140122844 | INTELLIGENT CONTEXT MANAGEMENT - Intelligent context management for thread switching is achieved by determining that a register bank has not been used by a thread for a predetermined number of dispatches, and responsively disabling the register bank for use by that thread. A counter is incremented each time the thread is dispatched but the register bank goes unused. Usage or non-usage of the register bank is inferred by comparing a previous checksum for the register bank to a current checksum. If the previous and current checksums match, the system concludes that the register bank has not been used. If a thread attempts to access a disabled bank, the processor takes an interrupt, enables the bank, and resets the corresponding counter. For a system utilizing transactional memory, it is preferable to enable all of the register banks when thread processing begins to avoid aborted transactions from register banks disabled by lazy context management techniques. | 05-01-2014 |
20140122845 | OVERLAPPING ATOMIC REGIONS IN A PROCESSOR - In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed. | 05-01-2014 |
20140156976 | METHOD, APPARATUS AND SYSTEM FOR SELECTIVE EXECUTION OF A COMMIT INSTRUCTION - Techniques and mechanisms for a processor to determine whether a commit action is to be performed. In an embodiment, a processor performs operations to determine whether a commit instruction is for contingent performance of a commit action. In another embodiment, one or more conditions of processor state are evaluated in response to determining that the commit instruction is for contingent performance of the commit action, where the evaluation is performed to determine whether the commit action indicated by the commit instruction is to be performed. | 06-05-2014 |
20140189328 | POWER REDUCTION BY USING ON-DEMAND RESERVATION STATION SIZE - A computer processor, a computer system and a corresponding method involve a reservation station that stores instructions which are not ready for execution. The reservation station includes a storage area that is divided into bundles of entries. Each bundle is switchable between an open state in which instructions can be written into the bundle and a closed state in which instructions cannot be written into the bundle. A controller selects which bundles are open based on occupancy levels of the bundles. | 07-03-2014 |
20140189329 | COOPERATIVE THREAD ARRAY GRANULARITY CONTEXT SWITCH DURING TRAP HANDLING - Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors. | 07-03-2014 |
20140208083 | MULTI-THREADED LOGGING - A data slot may be reserved for a first thread selected from a plurality of threads executed by a computer system. A memory of the computer system may comprise a plurality of log files and a next free data slot pointer. Each log file may comprise a plurality of data slots and each of the data slots may be of a common size. Reserving the data slot for the first thread may comprise attempting to perform a first atomic operation to write to a first data slot pointed to by a current value of the next free data slot pointer an indication that the first data slot is filled. If the first atomic operation is successful, the computer system may update the next free data slot pointer to point to a second data slot positioned sequentially after the first data slot. If the first atomic operation is unsuccessful, the computer system may analyze the second data slot. | 07-24-2014 |
20140244985 | INTELLIGENT CONTEXT MANAGEMENT - Intelligent context management for thread switching is achieved by determining that a register bank has not been used by a thread for a predetermined number of dispatches, and responsively disabling the register bank for use by that thread. A counter is incremented each time the thread is dispatched but the register bank goes unused. Usage or non-usage of the register bank is inferred by comparing a previous checksum for the register bank to a current checksum. If the previous and current checksums match, the system concludes that the register bank has not been used. If a thread attempts to access a disabled bank, the processor takes an interrupt, enables the bank, and resets the corresponding counter. For a system utilizing transactional memory, it is preferable to enable all of the register banks when thread processing begins to avoid aborted transactions from register banks disabled by lazy context management techniques. | 08-28-2014 |
20140281437 | Robust and High Performance Instructions for System Call - Robust system call and system return instructions are executed by a processor to transfer control between a requester and an operating system kernel. The processor includes execution circuitry and registers that store pointers to data structures in memory. The execution circuitry receives a system call instruction from a requester to transfer control from a first privilege level of the requester to a second privilege level of an operating system kernel. In response, the execution circuitry swaps the data structures that are pointed to by the registers between the requester and the operating system kernel in one atomic transition. | 09-18-2014 |
20140325192 | MEMRISTOR BASED MULTITHREADING - A method and a device that includes a set of multiple pipeline stages, wherein the set of multiple pipeline stages is arranged to execute a first thread of instructions; multiple memristor based registers that are arranged to store a state of another thread of instructions that differs from the first thread of instructions; and a control circuit that is arranged to control a thread switch between the first thread of instructions and the other thread of instructions by controlling a storage of a state of the first thread of instructions at the multiple memristor based registers and by controlling a provision of the state of the other thread of instructions by the set of multiple pipeline stages; wherein the set of multiple pipeline stages is arranged to execute the other thread of instructions upon a reception of the state of the other thread of instructions. | 10-30-2014 |
20140325193 | DYNAMIC INSTRUMENTATION - Techniques for dynamic instrumentation are provided. A method for instrumentation preparation may include obtaining address data of an original instruction in an original instruction stream, obtaining kernel mode data comprising a kernel breakpoint handler, obtaining user mode data comprising a user breakpoint handler, allocating a page of a process address space, creating a trampoline, associating the trampoline with a breakpoint instruction, and replacing the original instruction with the breakpoint instruction. A method for instrumentation may include detecting the breakpoint instruction, calling the kernel breakpoint handler, modifying an instruction pointer via the kernel breakpoint handler such that the instruction pointer points to the trampoline, and executing the trampoline. The system for instrumentation may include a breakpoint setup module and a breakpoint execution module for respectively setting up and completing instrumentation involving the trampoline. | 10-30-2014 |
20150026441 | METHOD AND SYSTEM OF INSERTING MARKING VALUES USED TO CORRELATE TRACE DATA AS BETWEEN PROCESSOR CORES - A method and system of inserting marker values used to correlate trace data as between processor cores. At least some of the illustrative embodiments are integrated circuit devices comprising a first processor core, a first data collection portion coupled to the first processor core and configured to gather data comprising addresses of instructions executed by the first processor core, a second processor core communicatively coupled to the first processor core, and a second data collection portion coupled to the first processor core and configured to gather data comprising addresses of instructions executed by the second processor core. The integrated circuit device is configured to insert marker values into the data of the first and second processor cores which allow correlation of the data such that contemporaneously executed instructions are identifiable. | 01-22-2015 |
20150039869 | Handling Operating System (OS) Transitions In An Unbounded Transactional Memory (UTM) Mode - In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed. | 02-05-2015 |
20150095627 | TWO LEVEL RE-ORDER BUFFER - In response to detecting one or more conditions are met, a checkpoint of a current state of a thread may be created. One or more incomplete instructions may be moved from a first level of a re-order buffer to a second level of the re-order buffer. Each incomplete instruction may be currently executing or awaiting execution. | 04-02-2015 |
20150113255 | SHARING VIRTUAL FUNCTIONS IN A SHARED VIRTUAL MEMORY BETWEEN HETEROGENEOUS PROCESSORS OF A COMPUTING PLATFORM - A computing platform may include heterogeneous processors (e.g., CPU and a GPU) to support sharing of virtual functions between such processors. In one embodiment, a CPU side vtable pointer used to access a shared object from the CPU | 04-23-2015 |
20150317161 | SYSTEM AND METHOD OF CONTEXT SWITCHING - Techniques related to systems, articles, and methods of context switching. | 11-05-2015 |
20150347132 | THREAD CONTEXT PRESERVATION IN A MULTITHREADING COMPUTER SYSTEM - According to one aspect, a computer-implemented method for thread context preservation in a configuration including a core configurable between a single thread (ST) mode and a multithreading (MT) mode is provided. The ST mode addresses a primary thread, and the MT mode addresses the primary thread and one or more secondary threads on shared resources of the core. Based on determining, by the core in the MT mode, that MT is to be disabled, switching from the MT mode to the ST mode is performed, where the primary thread of the MT mode is maintained as the primary thread of the ST mode. A thread context including program accessible register values and program counter values of the one or more secondary threads is made inaccessible to programs. Based on the switching, any one of clearing the program accessible register values or retaining the program accessible register values is performed. | 12-03-2015 |
20150370570 | Computer Processor Employing Temporal Addressing For Storage Of Transient Operands - A computer processor including a plurality of storage elements logically organized as a fixed length queue referenced by logical temporal addresses. The fixed length queue operates over multiple cycles to temporarily store operands referenced by at least one instruction utilizing the logical temporal addresses. A plurality of functional units performs operations over the multiple cycles, wherein the operations produce and access operands stored in the logical fixed length queue. Operands can be added to the front of the logical fixed length queue according to the temporal order that operands are produced by the functional units, and operands can drop from the end of the logical fixed length queue as operands are added to the front of the fixed length queue. A plurality of operands produced by the plurality of functional units (possibly with different latencies in producing such operands) can be added to the logical fixed length queue in a single cycle. A plurality of operands operated on by the functional units can be accessed from the logical fixed length queue in a single cycle. | 12-24-2015 |
20160004533 | Methods And Apparatuses For Reducing Power Consumption Of Processor Switch Operations - Methods and apparatuses for reducing power consumption of processor switch operations are disclosed. One or more embodiments may comprise specifying a subset of registers or state storage elements to be involved in a register or state storage operation, performing the register or state storage operation, and performing a switch operation. The embodiments may minimize the number of registers or state storage elements involved with the standby operation by specifying only the subset of registers or state storage elements, which may involve considerably fewer than the total number of registers or state storage elements of the processor. The switch operation may be a switch from one mode to another, such as a transition to or from a sleep mode, a context switch, or the execution of various types of instructions. | 01-07-2016 |
20160019066 | EXECUTION OF DIVERGENT THREADS USING A CONVERGENCE BARRIER - A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared. | 01-21-2016 |
20160092222 | INSTRUCTION AND LOGIC FOR BULK REGISTER RECLAMATION - A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register. | 03-31-2016 |
20160092224 | CHECKPOINTS FOR A SIMULTANEOUS MULTITHREADING PROCESSOR - According to an aspect, a system for checkpoint acceleration in a simultaneous multithreading (SMT) processor includes circuitry of a processor core of the SMT processor to execute one or more threads in a processing pipeline. The processing pipeline includes a completion stage followed by a checkpoint stage. The system also includes a checkpoint accelerator disposed between the completion stage and the checkpoint stage. The checkpoint accelerator includes a backlog queue that stores a list of next-to-complete groups of instructions from the one or more threads anticipated to complete in an upcoming cycle. The checkpoint accelerator also includes a selection control that drives one or more of the next-to-complete groups of instructions from the backlog queue to the checkpoint stage based on one or more completion indicators that identify which of the next-to-complete groups of instructions actually completed. | 03-31-2016 |
20160092225 | CHECKPOINTS FOR A SIMULTANEOUS MULTITHREADING PROCESSOR - According to an aspect, a method of checkpoint acceleration in a simultaneous multithreading (SMT) processor includes executing one or more threads in a processing pipeline of a processor core of the SMT processor, where the processing pipeline includes a completion stage followed by a checkpoint stage. A list of next-to-complete groups of instructions from the one or more threads anticipated to complete in an upcoming cycle is stored in a backlog queue. One or more of the next-to-complete groups of instructions are driven from the backlog queue to the checkpoint stage based on one or more completion indicators identifying which of the next-to-complete groups of instructions actually completed. | 03-31-2016 |
20160098273 | SERVICING MULTIPLE COUNTERS BASED ON A SINGLE ACCESS CHECK - A system and method for implementing a servicing instruction for a plurality of counters that includes determining a counter set based on the servicing instruction, whether access is authorized to the counter set, and a block of storage in a memory based on the servicing instruction. In response to determining that the access is authorized, the system and method extract the plurality of counters within the counter set and store the plurality of counters in the block of storage. | 04-07-2016 |
20160110197 | PROCESSOR STRESSMARKS GENERATION - One aspect is a method that includes analyzing, by a processor of an analysis system, an instruction set architecture of a targeted processor to generate an instruction set profile for each instruction of the instruction set architecture. A combination of instruction sequences for the targeted processor is determined from the instruction set profile that corresponds to a desired stressmark type. The desired stressmark type defines a metric representative of functionality of interest of the targeted processor. Performance of the targeted processor is monitored with respect to the desired stressmark type while executing each of the instruction sequences. One of the instruction sequences is identified as most closely aligning with the desired stressmark type based on performance results of execution of the instruction sequences with respect to the desired stressmark type. | 04-21-2016 |
20160110198 | GENERATION AND APPLICATION OF STRESSMARKS IN A COMPUTER SYSTEM - One aspect is a method that includes analyzing, by a processor of an analysis system, an instruction set architecture of a targeted complex-instruction set computer (CISC) processor to generate an instruction set profile for each CISC architectural instruction variant of the instruction set architecture. A combination of instruction sequences for the targeted CISC processor is determined from the instruction set profile that corresponds to a desired stressmark type. The desired stressmark type defines a metric representative of functionality of interest of the targeted CISC processor. Performance of the targeted CISC processor is monitored with respect to the desired stressmark type while executing each of the instruction sequences. One of the instruction sequences is identified as most closely aligning with the desired stressmark type based on performance results of execution of the instruction sequences with respect to the desired stressmark type. | 04-21-2016 |
20160117169 | INSTRUCTIONS CONTROLLING ACCESS TO SHARED REGISTERS OF A MULTI-THREADED PROCESSOR - Atomic instructions, including a Compare And Swap Register, a Load and AND Register, and a Load and OR Register instruction, use registers instead of storage to communicate and share information in a multi-threaded processor. The registers are accessible to multiple threads of the multi-threaded processor, and the instructions operate on these shared registers. Access to the shared registers is controlled by the instructions via interlocking. | 04-28-2016 |
20160139922 | CONTEXT SENSITIVE BARRIERS IN DATA PROCESSING - Apparatus for data processing and a method of data processing are provided, according to which the processing circuitry of the apparatus can access a memory system and execute data processing instructions in one context of multiple contexts which it supports. When the processing circuitry executes a barrier instruction, the resulting access ordering constraint may be limited to being enforced for accesses which have been initiated by the processing circuitry when operating in an identified context, which may for example be the context in which the barrier instruction has been executed. This provides a separation between the operation of the processing circuitry in its multiple possible contexts and in particular avoids delays in the completion of the access ordering constraint, for example relating to accesses to high latency regions of memory, from affecting the timing sensitivities of other contexts. | 05-19-2016 |
20160147559 | MODIFICATION OF CONTEXT SAVING FUNCTIONS - A method for modifying a context saving function is disclosed. The method identifies a context saving function within a code fragment. The method further modifies the context saving function to determine a size of a register save buffer, allocate the register save buffer using the determined size, and save a register value in the register save buffer. | 05-26-2016 |
20160154649 | SWITCHING METHODS FOR CONTEXT MIGRATION AND SYSTEMS THEREOF | 06-02-2016 |