Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Flachs, TX

Brian Flachs, Georgetown, TX US

Patent application number	Description	Published
20080307201	Method and Apparatus for Cooperative Software Multitasking In A Processor System with a Partitioned Register File - A processor system executes multiple applet programs within a software application program in an information handling system. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In particular, the operating system software manages partitioning of a register file in the processor system to achieve a cooperative relationship among multiple applet programs within respective partitions of the register file. In one embodiment, the operating system software manages unique applet ID's to modify register file partition sizes and locations during applet program instruction text execution. In one embodiment, applet ID masking hardware provides sharing of register file space among multiple copies of applet program code.	12-11-2008
20090070654	Design Structure For A Processor System With Background Error Handling Feature - A design structure for a processor system may be embodied in a machine readable medium for designing, manufacturing or testing a processor integrated circuit. The design structure may embody a processor system that integrates error correcting code (ECC) detection and correction hardware within an memory management circuit. The design structure may specify ECC hardware circuitry that provides detection, correction and generation of ECC data bits in conjunction with memory data read and writes. The design structure for the processor system may permit the detection and correction of soft single bit errors read from local memory in-line while using read modify write DMA circuit logic to correct local memory data. The design structure may provide for local memory data error detection and correction in a background memory scrub process without the need for additional in-line data logic.	03-12-2009
20090204781	System for Limiting the Size of a Local Storage of a Processor - A system for limiting the size of a local storage of a processor is provided. A facility is provided in association with a processor for setting a local storage size limit. This facility is a privileged facility and can only be accessed by the operating system running on a control processor in the multiprocessor system or the associated processor itself. The operating system sets the value stored in the local storage limit register when the operating system initializes a context switch in the processor. When the processor accesses the local storage using a request address, the local storage address corresponding to the request address is compared against the local storage limit size value in order to determine if the local storage address, or a modulo of the local storage address, is used to access the local storage.	08-13-2009
20110145503	ON-LINE OPTIMIZATION OF SOFTWARE INSTRUCTION CACHE - A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program.	06-16-2011
20110320785	Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.	12-29-2011
20110320786	Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.	12-29-2011
20110321002	Rewriting Branch Instructions Using Branch Stubs - Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system.	12-29-2011
20110321021	Arranging Binary Code Based on Call Graph Partitioning - Mechanisms are provided for arranging binary code to reduce instruction cache conflict misses. These mechanisms generate a call graph of a portion of code. Nodes and edges in the call graph are weighted to generate a weighted call graph. The weighted call graph is then partitioned according to the weights, affinities between nodes of the call graph, and the size of cache lines in an instruction cache of the data processing system, so that binary code associated with one or more subsets of nodes in the call graph are combined into individual cache lines based on the partitioning. The binary code corresponding to the partitioned call graph is then output for execution in a computing device.	12-29-2011
20120023316	PARALLEL LOOP MANAGEMENT - The illustrative embodiments comprise a method, data processing system, and computer program product having a processor unit for processing instructions with loops. A processor unit creates a first group of instructions having a first set of loops and second group of instructions having a second set of loops from the instructions. The first set of loops have a different order of parallel processing from the second set of loops. A processor unit processes the first group. The processor unit monitors terminations in the first set of loops during processing of the first group. The processor unit determines whether a number of terminations being monitored in the first set of loops is greater than a selectable number of terminations. In response to a determination that the number of terminations is greater than the selectable number of terminations, the processor unit ceases processing the first group and processes the second group.	01-26-2012
20120057637	Arithmetic Decoding Acceleration - Mechanisms for performing decoding of context-adaptive binary arithmetic coding (CABAC) encoded data. The mechanisms receive, in a first single instruction multiple data (SIMD) vector register of the data processing system, CABAC encoded data of a bit stream. The CABAC encoded data comprises a value to be decoded and bit stream state information. The mechanisms receive, in a second SIMD vector register of the data processing system, CABAC decoder context information. The mechanisms process the value, the bit stream state information, and the CABAC decoder context information in a non-recursive manner to generate a decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms store, in a third SIMD vector register, a result vector that combines the decoded value, updated bit stream state information, and updated CABAC decoder context information. The mechanisms use the decoded value to generate a video output on the data processing system.	03-08-2012
20120110348	Secure Page Tables in Multiprocessor Environments - A system comprises a memory module configured to store signed page table data and a selected processing element coupled to the memory module. The selected processing element is one of a plurality of processing elements, which together comprise a portion of a multiprocessor system. The selected processing element is configured to authenticate page table management code and, based on authenticated page table management code, to sign page table data that is subsequently stored in the memory module, and to verify signed page table data that is read from the memory module.	05-03-2012
20120198169	Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.	08-02-2012
20120198170	Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.	08-02-2012
20120198429	Arranging Binary Code Based on Call Graph Partitioning - Mechanisms are provided for arranging binary code to reduce instruction cache conflict misses. These mechanisms generate a call graph of a portion of code. Nodes and edges in the call graph are weighted to generate a weighted call graph. The weighted call graph is then partitioned according to the weights, affinities between nodes of the call graph, and the size of cache lines in an instruction cache of the data processing system, so that binary code associated with one or more subsets of nodes in the call graph are combined into individual cache lines based on the partitioning. The binary code corresponding to the partitioned call graph is then output for execution in a computing device.	08-02-2012
20120204016	Rewriting Branch Instructions Using Branch Stubs - Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system.	08-09-2012
20140250275	SELECTION OF POST-REQUEST ACTION BASED ON COMBINED RESPONSE AND INPUT FROM THE REQUEST SOURCE - A data structure includes a plurality of entries each corresponding to a different systemwide combined response of a data processing system. A particular entry includes identifiers of multiple possible actions that can be taken in response to a systemwide combined response. Master logic issues a memory access request on a system fabric of the data processing system. The master logic, responsive to receiving the systemwide combined response and a selection of one of the multiple possible actions from a source of the memory access request prior to receipt of the systemwide combined response, selects the particular entry based on the systemwide combined response and selects one of the multiple possible actions identified in the particular entry based on the received selection. The master logic services the memory access request in accordance with the systemwide combined response by performing the selected one of the multiple possible actions.	09-04-2014
20140250276	SELECTION OF POST-REQUEST ACTION BASED ON COMBINED RESPONSE AND INPUT FROM THE REQUEST SOURCE - A data structure includes a plurality of entries each corresponding to a different systemwide combined response of a data processing system. A particular entry includes identifiers of multiple possible actions that can be taken in response to a systemwide combined response. Master logic issues a memory access request on a system fabric of the data processing system. The master logic, responsive to receiving the systemwide combined response and a selection of one of the multiple possible actions from a source of the memory access request prior to receipt of the systemwide combined response, selects the particular entry based on the systemwide combined response and selects one of the multiple possible actions identified in the particular entry based on the received selection. The master logic services the memory access request in accordance with the systemwide combined response by performing the selected one of the multiple possible actions.	09-04-2014

Patent applications by Brian Flachs, Georgetown, TX US

Brian Flachs, Austin, TX US

Patent application number	Description	Published
20110161548	Efficient Multi-Level Software Cache Using SIMD Vector Permute Functionality - A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location.	06-30-2011
20110161641	SPE Software Instruction Cache - An application thread executes a direct branch instruction that is stored in an instruction cache line. Upon execution, the direct branch instruction branches to a branch descriptor that is also stored in the instruction cache line. The branch descriptor includes a trampoline branch instruction and a target instruction space address. Next, the trampoline branch instruction sends a branch descriptor pointer, which points to the branch descriptor, to an instruction cache manager. The instruction cache manager extracts the target instruction space address from the branch descriptor, and executes a target instruction corresponding to the target instruction space address. In one embodiment, the instruction cache manager generates a target local store address by masking off a portion of bits included in the target instruction space address. In turn, the application thread executes the target instruction located at the target local store address accordingly.	06-30-2011
20120198425	MANAGEMENT OF CONDITIONAL BRANCHES WITHIN A DATA PARALLEL SYSTEM - A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The compiler converts those “if-then-else” statements into “conditional branch and prepare” statements as well as “branch return” statements. The compiler compiles source code file information containing “if-then-else” statement opportunities into compiled code, namely an executable program. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements. The processor continues processing next statements inline after all SIMD lanes are complete, while providing speculative and parallel processing capability for multiple data operations of the executable program.	08-02-2012

Brian K. Flachs, Georgetown, TX US

Patent application number	Description	Published
20090112550	System and Method for Generating a Worst Case Current Waveform for Testing of Integrated Circuit Devices - A system and method for generating a worst case current waveform for testing of integrated circuit devices are provided. Architectural analysis of an integrated circuit device is first performed to determine an initial worst case power workload to be applied to the integrated circuit device. Thereafter, the derived worst case power workload is applied to a model and is simulated to generate a worst case current waveform that is input to an electrical model of the integrated circuit device to generate a worst case noise budget value. The worst case noise budget value is then compared to measured noise from application of the worst case power workload to a hardware implemented integrated circuit device. The worst case current waveform may be selected for future testing of integrated circuit devices or modifications to the simulation models may be performed and the process repeated based on the results of the comparison.	04-30-2009
20100161846	Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.	06-24-2010
20100161848	Programmable Direct Memory Access Engine - A mechanism for programming a direct memory access engine operating as a single thread processor is provided. A program is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the program located in the local memory is to be executed. The direct memory access engine executes the program without intervention by a host processor. Responsive to the program completing execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.	06-24-2010
20110066769	Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.	03-17-2011
20110161623	Data Parallel Function Call for Determining if Called Routine is Data Parallel - Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.	06-30-2011
20110161624	Floating Point Collect and Operate - Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.	06-30-2011
20110161642	Parallel Execution Unit that Extracts Data Parallelism at Runtime - Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.	06-30-2011
20110161643	Runtime Extraction of Data Parallelism - Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determining, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory.	06-30-2011
20110252260	Reducing Power Requirements of a Multiple Core Processor - A mechanism is provided for reducing power consumed by a multi-core processor. Responsive to a number of properly functioning processor cores being more than a required number of processor cores in a multi-core processor, the power consumption measurement module determines a number of the properly functioning processor cores to disable. The power consumption measurement module initiates an equal amount of workload to be processed by each of the properly functioning processor cores. The power consumption measurement module determines power consumed by each of the properly functioning processor cores. The power consumption measurement module deactivates one or more of the properly functioning processor cores that have maximum power in order that the number of properly functioning processor cores deactivated is equal to the number of properly functioning processor cores to disable.	10-13-2011
20120180031	Data Parallel Function Call for Determining if Called Routine is Data Parallel - Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.	07-12-2012
20120191953	Parallel Execution Unit that Extracts Data Parallelism at Runtime - Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor, Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.	07-26-2012
20120192167	Runtime Extraction of Data Parallelism - Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determining, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory.	07-26-2012
20120246354	Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.	09-27-2012

Patent applications by Brian K. Flachs, Georgetown, TX US

Brian King Flachs, Georgetown, TX US

Patent application number	Description	Published
20110072170	Systems and Methods for Transferring Data to Maintain Preferred Slot Positions in a Bi-endian Processor - A bi-endian multiprocessor system having multiple processing elements, each of which includes a processor core, a local memory and a memory flow controller. The memory flow controller transfers data between the local memory and data sources external to the processing element. If the processing element and the data source implement data representations having the same endian-ness, each multi-word line of data is stored in the local memory in the same word order as in the data source. If the processing element and the data source implement data representations having different endian-ness, the words of each multi-word line of data are transposed when data is transferred between local memory and the data source. The processing element may incorporate circuitry to add doublewords, wherein the circuitry can alternately carry bits from a first word to a second word or vice versa, depending upon whether the words in lines of data are transposed.	03-24-2011

Patent applications by Brian King Flachs, Georgetown, TX US