Patent application number | Description | Published |
20080288780 | LOW-LATENCY DATA DECRYPTION INTERFACE - Methods and apparatus for reducing the impact of latency associated with decrypting encrypted data are provided. Rather than wait until an entire packet of encrypted data is validated (e.g., by checking for data transfer errors), the encrypted data may be pipelined to a decryption engine as it is received, thus allowing decryption to begin prior to validation. In some cases, the decryption engine may be notified of data transfer errors detected during the validation process, in order to prevent reporting false security violations. | 11-20-2008 |
20090063867 | Method, System and Computer Program Product for Preventing Execution of Software Without a Dynamically Generated Key - A method, system and computer program product for partitioning the binary image of a software program, and partially removing code bits to create an encrypted software key, to increase software security. The software program's binary image is partitioned along a random segment length or a byte/nibble segment length, and the code bits are removed and stored, along with their positional data, in a software key. The software key is encrypted and is separately distributed from the inoperable binary image to the end user. The encrypted key is stored on a secure remote server. When the end user properly authenticates with the developer's remote servers, the encrypted security key is downloaded from the secure remote server and is locally decrypted. The removed code bits are reinserted into the fractioned binary image utilizing the positional location information. The binary image is then operable to complete execution of the software program. | 03-05-2009 |
20090063868 | Method, System and Computer Program Product for Preventing Execution of Pirated Software - A method, system and computer program product for preventing execution of pirated software. A file is loaded on an end user's computer containing a binary image that is generated by removing one or more code bits from an executable code. A request is sent to a remote server to return a software key required for execution of the executable code from the binary image. The software key is downloaded to the end user's computer on which the binary image is loaded. One or more bits from the software key are inserted into the appropriate location of the binary image to regenerate the executable code. The executable code is enabled for execution on the end user's computer only following the embedding of the one or more bits. | 03-05-2009 |
20090144564 | DATA ENCRYPTION INTERFACE FOR REDUCING ENCRYPT LATENCY IMPACT ON STANDARD TRAFFIC - Methods and apparatus that may be utilized in systems to reduce the impact of latency associated with encrypting data on non-encrypted data are provided. Secure and non-secure data may be routed independently. Thus, non-secure data may be forwarded on (e.g., to targeted write buffers), without waiting for previously sent secure data to be encrypted. As a result, non-secure data may be made available for subsequent processing much earlier than in conventional systems utilizing a common data path for both secure and non-secure data. | 06-04-2009 |
20090157976 | Network on Chip That Maintains Cache Coherency With Invalidate Commands - A network on chip (‘NOC’) that maintains cache coherency with invalidate commands, the NOC comprising integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including a port on a router of the network through which is received an invalidate command, the invalidate command including an identification of a cache line, the invalidate command representing an instruction to invalidate the cache line, the router configured to send the invalidate command to an IP block served by the router; the router further configured to send the invalidate command horizontally and vertically to neighboring routers if the port is a vertical port; and the router further configured to send the invalidate command only horizontally to neighboring routers if the port is a horizontal port. | 06-18-2009 |
20090187716 | Network On Chip that Maintains Cache Coherency with Invalidate Commands - A network on chip (‘NOC’) that maintains cache coherency, the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, at least one memory communications controller further comprising a cache coherency controller, each memory communications controller controlling communication between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, wherein the memory communications controller configured to execute a memory access instruction and configured to determine a state of a cache line addressed by the memory access instruction, the state of the cache line being one of shared, exclusive, or invalid; the memory communications controller configured to broadcast an invalidate command to a plurality of IP blocks of the NOC if the state of the cache line is shared; and the memory communications controller configured to transmit an invalidate command only to an IP block that controls a cache where the cache line is stored if the state of the cache line is exclusive. | 07-23-2009 |
20090201302 | Graphics Rendering On A Network On Chip - Graphics rendering on a network on chip (‘NOC’) including receiving, in the geometry processor, a representation of an object to be rendered; converting, by the geometry processor, the representation of the object to two dimensional primitives; sending, by the geometry processor, the primitives to the plurality of scan converters; converting, by the scan converters, the primitives to fragments, each fragment comprising one or more portions of a pixel; for each fragment: selecting, by the scan converter for the fragment in dependence upon sorting rules, a pixel processor to process the fragment; sending, by the scan converter to the pixel processor, the fragment; and processing, by the pixel processor, the fragment to produce pixels for an image. | 08-13-2009 |
20090271597 | Branch Prediction In A Computer Processor - Methods, apparatus, and products for branch prediction in a computer processor are disclosed that include: recording for a sequence of occurrences of a branch, in an algorithm in which the branch occurs more than once, each result of the branch, including maintaining a pointer to a location of a most recently recorded result; resetting the pointer to a location of the first recorded result upon completion of the algorithm; and predicting subsequent results of the branch, in subsequent occurrences of the branch, in dependence upon the recorded results. | 10-29-2009 |
20090282214 | Network On Chip With Low Latency, High Bandwidth Application Messaging Interconnects That Abstract Hardware Inter-Thread Data Communications Into An Architected State of A Processor - Data processing on a network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, each of a plurality of the IP blocks including at least one computer processor, each such computer processor implementing a plurality of hardware threads of execution; low latency, high bandwidth application messaging interconnects; memory communications controllers; network interface controllers; and routers; each of the IP blocks adapted to a router through a separate one of the low latency, high bandwidth application messaging interconnects, a separate one of the memory communications controllers, and a separate one of the network interface controllers; each application messaging interconnect abstracting into an architected state of each processor, for manipulation by computer programs executing on the processor, hardware inter-thread communications among the hardware threads of execution; each memory communications controller controlling communication between an IP block and memory; each network interface controller controlling inter-IP block communications through routers. | 11-12-2009 |
20090287885 | Administering Non-Cacheable Memory Load Instructions - Administering non-cacheable memory load instructions in a computing environment where cacheable data is produced and consumed in a coherent manner without harming performance of a producer, the environment including a hierarchy of computer memory that includes one or more caches backed by main memory, the caches controlled by a cache controller, at least one of the caches configured as a write-back cache. Embodiments of the present invention include receiving, by the cache controller, a non-cacheable memory load instruction for data stored at a memory address, the data treated by the producer as cacheable; determining by the cache controller from a cache directory whether the data is cached; if the data is cached, returning the data in the memory address from the write-back cache without affecting the write-back cache's state; and if the data is not cached, returning the data from main memory without affecting the write-back cache's state. | 11-19-2009 |
20120215988 | Administering Non-Cacheable Memory Load Instructions - Administering non-cacheable memory load instructions in a computing environment where cacheable data is produced and consumed in a coherent manner without harming performance of a producer, the environment including a hierarchy of computer memory that includes one or more caches backed by main memory, the caches controlled by a cache controller, at least one of the caches configured as a write-back cache. Embodiments of the present invention include receiving, by the cache controller, a non-cacheable memory load instruction for data stored at a memory address, the data treated by the producer as cacheable; determining by the cache controller from a cache directory whether the data is cached; if the data is cached, returning the data in the memory address from the write-back cache without affecting the write-back cache's state; and if the data is not cached, returning the data from main memory without affecting the write-back cache's state. | 08-23-2012 |
20120221711 | REGULAR EXPRESSION SEARCHES UTILIZING GENERAL PURPOSE PROCESSORS ON A NETWORK INTERCONNECT - A first hardware node in a network interconnect receives a data packet from a network. The first hardware node examines the data packet for a regular expression. In response to the first hardware node failing to identify the regular expression in the data packet, the data packet is forwarded to a second hardware node in the network interconnect for further examination of the data packet in order to search for the regular expression in the data packet. | 08-30-2012 |
20120260252 | SCHEDULING SOFTWARE THREAD EXECUTION - A computer-implemented method, system, and/or computer program product schedules execution of software threads. A first software thread is executed together with a second software thread as a first software thread pair. A first content of at least one performance counter, which resulted from executing the first software thread pair together, is stored. The first software thread is then executed with a third software thread as a second software thread pair, and the resulting second content of the performance counter(s) is stored. An identification is made of a most efficient software thread pair from the first and second software thread pairs. Upon receiving a request to re-execute the first software thread, the first software thread is selectively matched with either the second software thread or the third software thread for execution based on whether the first software thread pair or the second software thread pair has been identified as the most efficient software thread pair. | 10-11-2012 |
20130160026 | INDIRECT INTER-THREAD COMMUNICATION USING A SHARED POOL OF INBOXES - A circuit arrangement, method, and program product for communicating data between hardware threads of a network on a chip processing unit utilizes shared inboxes to communicate data to pools of hardware threads. The hardware threads in the pools receive data packets from the shared inboxes in response to issuing work requests to an associated shared inbox. Data packets include a source identifier corresponding to a hardware thread from which the data packet was generated, and the shared inboxes may manage data packet distribution to associated hardware threads based on the source identifier of each data packet. A shared inbox may also manage workload distribution and uneven workload lengths by communicating data packets to hardware threads associated with the shared inbox in response to receiving work requests from associated hardware threads. | 06-20-2013 |
20130185704 | PROVIDING PERFORMANCE TUNED VERSIONS OF COMPILED CODE TO A CPU IN A SYSTEM OF HETEROGENEOUS CORES - A compiler may optimize source code and any referenced libraries to execute on a plurality of different processor architecture implementations. For example, if a compute node has three different types of processors with three different architecture implementations, the compiler may compile the source code and generate three versions of object code where each version is optimized for one of the three different processor types. After compiling the source code, the resultant executable code may contain the necessary information for selecting between the three versions. For example, when a program loader assigns the executable code to the processor, the system determines the processor's type and ensures only the optimized version that corresponds to that type is executed. Thus, the operating system is free to assign the executable code to any processor based on, for example, the current status of the processor (i.e., whether its CPU is being fully utilized) and still enjoy the benefits of executing code that is optimized for whichever processor is assigned the executable code. | 07-18-2013 |
20130185705 | PROVIDING PERFORMANCE TUNED VERSIONS OF COMPILED CODE TO A CPU IN A SYSTEM OF HETEROGENEOUS CORES - A compiler may optimize source code and any referenced libraries to execute on a plurality of different processor architecture implementations. For example, if a compute node has three different types of processors with three different architecture implementations, the compiler may compile the source code and generate three versions of object code where each version is optimized for one of the three different processor types. After compiling the source code, the resultant executable code may contain the necessary information for selecting between the three versions. For example, when a program loader assigns the executable code to the processor, the system determines the processor's type and ensures only the optimized version that corresponds to that type is executed. Thus, the operating system is free to assign the executable code to any of the different types of processors. | 07-18-2013 |
20130191600 | COMBINED CACHE INJECT AND LOCK OPERATION - A circuit arrangement and method utilize cache injection logic to perform a cache inject and lock operation to inject a cache line in a cache memory and automatically lock the cache line in the cache memory in parallel with communication of the cache line to a main memory. The cache injection logic may additionally limit the maximum number of locked cache lines that may be stored in the cache memory, e.g., by aborting a cache inject and lock operation, injecting the cache line without locking, or unlocking and/or evicting another cache line in the cache memory. | 07-25-2013 |
20140143557 | DISTRIBUTED CHIP LEVEL POWER SYSTEM - A method, circuit arrangement, and program product for dynamically reallocating power consumption at a component level of a processor. Power tokens representative of a power consumption metric are allocated to interconnected IP blocks of the processor, and as additional power is required by an IP block to perform assigned operations, the IP block may communicate a request for additional power tokens to one or more interconnected IP blocks. The interconnected IP blocks may grant power tokens for the request based on a priority, availability, and/or power consumption target. The requesting IP block may modify power consumption based on power tokens granted by interconnected IP blocks for the request. | 05-22-2014 |
20140143558 | DISTRIBUTED CHIP LEVEL MANAGED POWER SYSTEM - A method, circuit arrangement, and program product for dynamically reallocating power consumption at a component level of a processor. Power tokens representative of a power consumption metric are allocated to interconnected IP blocks of the processor, and as additional power is required by an IP block to perform assigned operations, the IP block may communicate a request for additional power tokens to one or more interconnected IP blocks. The interconnected IP blocks may grant power tokens for the request based on a priority, availability, and/or power consumption target. The requesting IP block may modify power consumption based on power tokens granted by interconnected IP blocks for the request. A power management block may adjust power token allocation of one or more IP blocks by communicating a command to one or more IP blocks and/or by adjusting a power token request. | 05-22-2014 |
20140164703 | CACHE SWIZZLE WITH INLINE TRANSPOSITION - A method and circuit arrangement selectively swizzle data in one or more levels of cache memory coupled to a processing unit based upon one or more swizzle-related page attributes stored in a memory address translation data structure such as an Effective To Real Translation (ERAT) or Translation Lookaside Buffer (TLB). A memory address translation data structure may be accessed, for example, in connection with a memory access request for data in a memory page, such that attributes associated with the memory page in the data structure may be used to control whether data is swizzled, and if so, how the data is to be formatted in association with handling the memory access request. | 06-12-2014 |
20140164704 | CACHE SWIZZLE WITH INLINE TRANSPOSITION - A method and circuit arrangement selectively swizzle data in one or more levels of cache memory coupled to a processing unit based upon one or more swizzle-related page attributes stored in a memory address translation data structure such as an Effective To Real Translation (ERAT) or Translation Lookaside Buffer (TLB). A memory address translation data structure may be accessed, for example, in connection with a memory access request for data in a memory page, such that attributes associated with the memory page in the data structure may be used to control whether data is swizzled, and if so, how the data is to be formatted in association with handling the memory access request. | 06-12-2014 |
20150020078 | THREAD SCHEDULING ACROSS HETEROGENEOUS PROCESSING ELEMENTS WITH RESOURCE MAPPING - A system, method, and program product for scheduling processes of a workload on a plurality of hardware threads configured in a plurality of processing elements of a multithreading parallel computing system for processing thereby. Process dimensions for each process are determined based on processing attributes associated with each process, and a place and route algorithm is utilized to map the processes to a processor space representative of the processing resources of the computing system based at least in part on the process dimensions to thereby distribute the processes of the workload. | 01-15-2015 |
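The abstracts above describe their mechanisms compactly; the minimal Python sketches that follow illustrate several of them. Each sketch is a hypothetical model keyed to an application number, not the patented implementation, and every class, function, and parameter name is invented for illustration.

For 20080288780 (low-latency decryption), the idea is that encrypted data is pipelined to the decryption engine as it arrives, a transfer-error check runs alongside, and the engine is notified of any error so it does not report a false security violation. The XOR "cipher" and CRC check are stand-ins for the real engine and validation logic.

```python
import zlib

class ToyDecryptEngine:
    """Stand-in decryption engine; decrypts data as it is pipelined in."""
    def __init__(self, key: bytes):
        self.key, self.pos, self.plaintext = key, 0, bytearray()
        self.transfer_error = False

    def feed(self, chunk: bytes):
        for b in chunk:                       # decryption begins before the
            self.plaintext.append(b ^ self.key[self.pos % len(self.key)])
            self.pos += 1                     # whole packet is validated

    def notify_transfer_error(self):
        self.transfer_error = True            # suppress a false security alarm

def receive_packet(chunks, expected_crc, key):
    engine, crc = ToyDecryptEngine(key), 0
    for chunk in chunks:
        engine.feed(chunk)                    # pipeline data to the engine
        crc = zlib.crc32(chunk, crc)          # validation proceeds in parallel
    if crc != expected_crc:
        engine.notify_transfer_error()        # data transfer error, not an attack
        return None
    return bytes(engine.plaintext)
```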
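For 20090063867 and 20090063868 (software keys built from removed code bits), a fractured binary image plus a key of (position, removed bits) pairs can be modeled directly; segment lengths, the key format, and its encryption are all simplified away here.

```python
import os
import random

def extract_key(image: bytes, n_positions: int = 8):
    """Remove the byte at each of n random positions; return the fractured
    image plus a 'software key' of (position, removed byte) pairs."""
    positions = random.sample(range(len(image)), n_positions)
    key = [(p, image[p]) for p in positions]
    fractured = bytearray(image)
    for p, _ in key:
        fractured[p] = 0x00          # code bits removed -> image inoperable
    return bytes(fractured), key

def restore_image(fractured: bytes, key):
    """Reinsert the removed code bits using their positional data."""
    image = bytearray(fractured)
    for p, b in key:
        image[p] = b
    return bytes(image)

original = os.urandom(64)            # stand-in for a program's binary image
broken, key = extract_key(original)
assert restore_image(broken, key) == original
```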
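For 20090157976 (invalidate forwarding on a NOC), the routing rule in the abstract maps almost directly to a function of the arrival port; the port names and the surrounding router are illustrative.

```python
def forward_ports(arrival_port: str) -> list[str]:
    """Ports on which a router re-sends an incoming invalidate command.

    A command arriving on a vertical port is re-sent horizontally and
    vertically; one arriving on a horizontal port is re-sent only
    horizontally. (Delivery to the IP block served by the router is
    omitted here.)
    """
    if arrival_port in ("north", "south"):        # vertical arrival
        return ["north", "south", "east", "west"]
    if arrival_port in ("east", "west"):          # horizontal arrival
        return ["east", "west"]
    raise ValueError(f"unknown port {arrival_port!r}")

print(forward_ports("south"))   # ['north', 'south', 'east', 'west']
print(forward_ports("east"))    # ['east', 'west']
```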
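For 20090271597 (branch prediction), the abstract describes recording each result of a branch that recurs inside an algorithm, resetting a pointer to the first recorded result when the algorithm completes, and predicting from the recording afterwards. The class below is a minimal model of that record/replay behavior; the fallback prediction once the history runs out is an assumption.

```python
class ReplayPredictor:
    """Record branch outcomes on one pass, replay them as predictions."""
    def __init__(self):
        self.history = []        # recorded results of the branch
        self.ptr = 0             # next recorded result to replay

    def record(self, taken: bool):
        self.history.append(taken)

    def algorithm_completed(self):
        self.ptr = 0             # reset pointer to the first recorded result

    def predict(self) -> bool:
        if self.ptr < len(self.history):
            result = self.history[self.ptr]
            self.ptr += 1
            return result
        return True              # assumed default once the history is exhausted

p = ReplayPredictor()
for outcome in (True, True, False):       # first run: record each result
    p.record(outcome)
p.algorithm_completed()
print([p.predict() for _ in range(3)])    # next run: [True, True, False]
```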
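For 20120221711 (regular-expression search on a network interconnect), the first node examines the packet, and only if it fails to identify the expression is the packet forwarded to the next node. The abstract does not say how the examination is divided between nodes; this sketch assumes each node scans a different region of the packet.

```python
import re

def search_on_interconnect(packet: bytes, pattern: bytes,
                           regions=((0, 64), (64, None))):
    """Each (start, end) region stands in for one hardware node's share of
    the packet; the packet is 'forwarded' to the next node only when the
    previous one failed to find the expression."""
    regex = re.compile(pattern)
    for node, (start, end) in enumerate(regions):
        if regex.search(packet[start:end]):
            return f"expression found by node {node}"
    return "expression not found by any node"

print(search_on_interconnect(b"A" * 80 + b"GET /index", rb"GET /\w+"))
```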
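For 20120260252 (thread-pair scheduling), the performance-counter content recorded for each pairing can be kept in a small table and consulted when the first thread is re-executed; treating a larger counter value as "more efficient" is an assumption made only for this sketch.

```python
pair_counters = {}      # (thread, partner) -> recorded performance-counter value

def record_pair(thread: str, partner: str, counter_value: float):
    """Store the counter content observed when the two threads ran together."""
    pair_counters[(thread, partner)] = counter_value

def best_partner(thread: str) -> str:
    """Pick the partner whose pairing was identified as most efficient."""
    candidates = {p: v for (t, p), v in pair_counters.items() if t == thread}
    return max(candidates, key=candidates.get)

record_pair("A", "B", 3.1e9)    # e.g. instructions retired while A ran with B
record_pair("A", "C", 2.4e9)    # ... and while A ran with C
print(best_partner("A"))        # -> 'B'; reuse this pairing when A re-executes
```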
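For 20130160026 (shared inboxes), hardware threads in a pool pull packets from a shared inbox by issuing work requests, and each packet carries a source identifier the inbox can use when distributing work. The FIFO queue below is an illustrative stand-in for the hardware structure.

```python
from collections import deque

class SharedInbox:
    """Pool of pending data packets shared by a group of hardware threads."""
    def __init__(self):
        self.packets = deque()                  # (source_id, payload) pairs

    def deliver(self, source_id: int, payload: str):
        self.packets.append((source_id, payload))

    def work_request(self, thread_id: int):
        """A pool thread asks for work; hand it the next packet, tagged with
        the requesting thread. (A real inbox could also use source_id to
        steer packets; FIFO order is an assumption here.)"""
        if not self.packets:
            return None
        source_id, payload = self.packets.popleft()
        return {"to": thread_id, "from": source_id, "payload": payload}

inbox = SharedInbox()
inbox.deliver(source_id=7, payload="render tile 12")
print(inbox.work_request(thread_id=3))
```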
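For 20130191600 (combined cache inject and lock), a line headed for main memory is also injected into the cache and locked there, but only while the number of locked lines stays below a cap; above the cap this sketch falls back to injecting without the lock, one of the fallback behaviors the abstract mentions. The cap value and data structures are illustrative.

```python
MAX_LOCKED = 4
cache = {}          # address -> (data, locked?)
main_memory = {}    # address -> data

def inject_and_lock(addr: int, data: str) -> bool:
    """Inject the line into the cache in parallel with the memory write and
    lock it unless the locked-line limit has been reached."""
    main_memory[addr] = data
    locked_now = sum(1 for _, locked in cache.values() if locked)
    lock = locked_now < MAX_LOCKED            # respect the locked-line cap
    cache[addr] = (data, lock)
    return lock

def evict(addr: int):
    data, locked = cache[addr]
    if locked:
        raise RuntimeError("locked cache lines are not eligible for eviction")
    main_memory[addr] = data
    del cache[addr]

for i in range(6):
    print(hex(i), inject_and_lock(i, f"line {i}"))   # True x4, then False x2
```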
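For 20140143557 and 20140143558 (distributed power tokens), each IP block holds tokens representing a share of the chip's power budget; a block that needs more power asks its neighbors, which grant tokens out of their own headroom. Reserve sizes and the granting policy are illustrative, and the managed variant's power-management block is omitted.

```python
class IPBlock:
    def __init__(self, name: str, tokens: int, reserve: int = 2):
        self.name, self.tokens, self.reserve = name, tokens, reserve
        self.neighbors = []                    # interconnected IP blocks

    def request_tokens(self, needed: int) -> int:
        """Ask neighboring blocks for tokens until the request is met."""
        granted = 0
        for nb in self.neighbors:
            granted += nb.grant(needed - granted)
            if granted >= needed:
                break
        self.tokens += granted
        return granted                         # caller scales power use to this

    def grant(self, asked: int) -> int:
        spare = max(0, self.tokens - self.reserve)   # keep a local reserve
        give = min(asked, spare)
        self.tokens -= give
        return give

cpu, dma = IPBlock("cpu", tokens=4), IPBlock("dma", tokens=10)
cpu.neighbors = [dma]
print(cpu.request_tokens(5), cpu.tokens, dma.tokens)   # -> 5 9 5
```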
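For 20150020078 (thread scheduling with resource mapping), each process gets a size derived from its processing attributes and is mapped onto a grid of hardware threads. The abstract calls for a place-and-route algorithm; the greedy first-fit placement below is only a stand-in used to keep the sketch short.

```python
def place(processes, rows: int, cols: int):
    """processes: list of (name, threads_needed) pairs derived from each
    process's attributes; returns a mapping from process name to the grid
    slots (row, col) it occupies."""
    free = [(r, c) for r in range(rows) for c in range(cols)]
    layout = {}
    for name, needed in sorted(processes, key=lambda p: -p[1]):  # largest first
        if needed > len(free):
            raise RuntimeError(f"not enough hardware threads for {name}")
        layout[name], free = free[:needed], free[needed:]
    return layout

print(place([("fft", 4), ("io", 1), ("sim", 6)], rows=4, cols=4))
```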