Patent application number | Description | Published |
20130159812 | MEMORY ARCHITECTURE FOR READ-MODIFY-WRITE OPERATIONS - According to one embodiment, a memory architecture implemented method is provided, where the memory architecture includes a logic chip and one or more memory chips on a single die, and where the method comprises: reading values of data from the one or more memory chips to the logic chip, where the one or more memory chips and the logic chip are on a single die; modifying, via the logic chip on the single die, the values of data; and writing, from the logic chip to the one or more memory chips, the modified values of data. | 06-20-2013 |
20130262780 | Apparatus and Method for Fast Cache Shutdown - An apparatus and method to enable a fast cache shutdown is disclosed. In one embodiment, a cache subsystem includes a cache memory and a cache controller coupled to the cache memory. The cache controller is configured to, upon restoring power to the cache subsystem, inhibit writing of modified data exclusively into the cache memory. | 10-03-2013 |
20140040532 | STACKED MEMORY DEVICE WITH HELPER PROCESSOR - A processing system comprises one or more processor devices and other system components coupled to a stacked memory device having a set of stacked memory layers and a set of one or more logic layers. The set of logic layers implements a helper processor that executes instructions to perform tasks in response to a task request from the processor devices or otherwise on behalf of the other processor devices. The set of logic layers also includes a memory interface coupled to memory cell circuitry implemented in the set of stacked memory layers and coupleable to the processor devices. The memory interface operates to perform memory accesses for the processor devices and for the helper processor. By virtue of the helper processor's tight integration with the stacked memory layers, the helper processor may perform certain memory-intensive operations more efficiently than could be performed by the external processor devices. | 02-06-2014 |
20140040698 | STACKED MEMORY DEVICE WITH METADATA MANAGEMENT - A processing system comprises one or more processor devices and other system components coupled to a stacked memory device having a set of stacked memory layers and a set of one or more logic layers. The set of logic layers implements a metadata manager that offloads metadata management from the other system components. The set of logic layers also includes a memory interface coupled to memory cell circuitry implemented in the set of stacked memory layers and coupleable to the devices external to the stacked memory device. The memory interface operates to perform memory accesses for the external devices and for the metadata manager. By virtue of the metadata manager's tight integration with the stacked memory layers, the metadata manager may perform certain memory-intensive metadata management operations more efficiently than could be performed by the external devices. | 02-06-2014 |
20140068304 | METHOD AND APPARATUS FOR POWER REDUCTION DURING LANE DIVERGENCE - A method and device for reducing power during an instruction lane divergence includes idling an inactive execution lane during the lane divergence. | 03-06-2014 |
20140089699 | POWER MANAGEMENT SYSTEM AND METHOD FOR A PROCESSOR - The present disclosure relates to a method and apparatus for dynamically controlling power consumption by at least one processor. A power management method includes monitoring, by power control logic of the at least one processor, performance data associated with each of a plurality of executions of a repetitive workload by the at least one processor. The method includes adjusting, by the power control logic following an execution of the repetitive workload, an operating frequency of at least one of a compute unit and a memory controller upon a determination that the at least one processor is at least one of compute-bound and memory-bound based on monitored performance data associated with the execution of the repetitive workload. | 03-27-2014 |
20140136870 | TRACKING MEMORY BANK UTILITY AND COST FOR INTELLIGENT SHUTDOWN DECISIONS - A device receives an indication that a memory bank is to be powered down, and determines, based on receiving the indication, shutdown scores corresponding to powered up memory banks. Each shutdown score is based on a shutdown metric associated with powering down a powered up memory bank. The device may power down a selected memory bank based on the shutdown scores. | 05-15-2014 |
20140136873 | TRACKING MEMORY BANK UTILITY AND COST FOR INTELLIGENT POWER UP DECISIONS - A device receives an indication that a memory bank is to be powered up, and determines, based on receiving the indication, power scores corresponding to powered down memory banks. Each power score corresponds to a power metric associated with powering up a powered down memory bank. The device powers up a selected memory bank based on the plurality of power scores. | 05-15-2014 |
20140143492 | Using Predictions for Store-to-Load Forwarding - The described embodiments include a core that uses predictions for store-to-load forwarding. In the described embodiments, the core comprises a load-store unit, a store buffer, and a prediction mechanism. During operation, the prediction mechanism generates a prediction that a load will be satisfied using data forwarded from the store buffer because the load loads data from a memory location in a stack. Based on the prediction, the load-store unit first sends a request for the data to the store buffer in an attempt to satisfy the load using data forwarded from the store buffer. If data is returned from the store buffer, the load is satisfied using the data. However, if the attempt to satisfy the load using data forwarded from the store buffer is unsuccessful, the load-store unit then separately sends a request for the data to a cache to satisfy the load. | 05-22-2014 |
20140143493 | Bypassing a Cache when Handling Memory Requests - The described embodiments include a computing device that handles memory requests. In some embodiments, a bypass mechanism determines, based on expected memory request resolution times, whether a memory request is to be sent to a cache in the computing device or bypassed to a next lower level of a memory hierarchy in the computing device. The bypass mechanism is configured to then send the memory request to the cache or bypass the memory request to the next lower level of the memory hierarchy accordingly. | 05-22-2014 |
20140143495 | METHODS AND APPARATUS FOR SOFT-PARTITIONING OF A DATA CACHE FOR STACK DATA - A method of partitioning a data cache comprising a plurality of sets, the plurality of sets comprising a plurality of ways, is provided. Responsive to a stack data request, the method stores a cache line associated with the stack data in one of a plurality of designated ways of the data cache, wherein the plurality of designated ways is configured to store all requested stack data. | 05-22-2014 |
20140143498 | METHODS AND APPARATUS FOR FILTERING STACK DATA WITHIN A CACHE MEMORY HIERARCHY - A method of storing stack data in a cache hierarchy is provided. The cache hierarchy comprises a data cache and a stack filter cache. Responsive to a request to access a stack data block, the method stores the stack data block in the stack filter cache, wherein the stack filter cache is configured to store any requested stack data block. | 05-22-2014 |
20140143499 | METHODS AND APPARATUS FOR DATA CACHE WAY PREDICTION BASED ON CLASSIFICATION AS STACK DATA - A method of way prediction for a data cache having a plurality of ways is provided. Responsive to an instruction to access a stack data block, the method accesses identifying information associated with a plurality of most recently accessed ways of a data cache to determine whether the stack data block resides in one of the plurality of most recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the plurality of most recently accessed ways; and when the stack data block resides in one of the plurality of most recently accessed ways of the data cache, the method accesses the stack data block from the data cache. | 05-22-2014 |
20140149710 | CREATING SIMD EFFICIENT CODE BY TRANSFERRING REGISTER STATE THROUGH COMMON MEMORY - Methods, media, and computing systems are provided. The method includes, the media are configured for, and the computing system includes a processor with control logic for allocating memory for storing a plurality of local register states for work items to be executed in single instruction multiple data hardware and for repacking wavefronts that include work items associated with a program instruction responsive to a conditional statement. The repacking is configured to create repacked wavefronts that include at least one of a wavefront containing work items that all pass the conditional statement and a wavefront containing work items that all fail the conditional statement. | 05-29-2014 |
20140156941 | Tracking Non-Native Content in Caches - The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode. | 06-05-2014 |
20140156975 | Redundant Threading for Improved Reliability - In some embodiments, a method for improving reliability in a processor is provided. The method can include replicating input data for first and second lanes of a processor, the first and second lanes being located in a same cluster of the processor and the first and second lanes each generating a respective value associated with an instruction to be executed in the respective lane, and responsive to a determination that the generated values do not match, providing an indication that the generated values do not match. | 06-05-2014 |
20140164708 | SPILL DATA MANAGEMENT - A processor discards spill data from a memory hierarchy in response to determining that the final access to the spill data has been performed by a compiled program executing at the processor. In some embodiments, the final access is determined based on a special-purpose load instruction configured for this purpose. In some embodiments, the determination is made based on the location of a stack pointer indicating that a method of the executing program has returned, so that data of the returned method that remains in the stack frame is no longer to be accessed. Because the spill data is discarded after the final access, it is not transferred through the memory hierarchy. | 06-12-2014 |
20140173378 | PARITY DATA MANAGEMENT FOR A MEMORY ARCHITECTURE - A processor system as presented herein includes a processor core, cache memory coupled to the processor core, a memory controller coupled to the cache memory, and a system memory component coupled to the memory controller. The system memory component includes a plurality of independent memory channels configured to store data blocks, wherein the memory controller controls the storing of parity bits in at least one of the plurality of independent memory channels. In some implementations, the system memory is realized as a die-stacked memory component. | 06-19-2014 |
20140173379 | DIRTY CACHELINE DUPLICATION - A method of managing memory includes installing a first cacheline at a first location in a cache memory and receiving a write request. In response to the write request, the first cacheline is modified in accordance with the write request and marked as dirty. Also in response to the write request, a second cacheline is installed that duplicates the first cacheline, as modified in accordance with the write request, at a second location in the cache memory. | 06-19-2014 |
20140181410 | MANAGEMENT OF CACHE SIZE - In response to a processor core exiting a low-power state, a cache is set to a minimum size so that fewer than all of the cache's entries are available to store data, thus reducing the cache's power consumption. Over time, the size of the cache can be increased to account for heightened processor activity, thus ensuring that processing efficiency is not significantly impacted by a reduced cache size. In some embodiments, the cache size is increased based on a measured processor performance metric, such as an eviction rate of the cache. In some embodiments, the cache size is increased at regular intervals until a maximum size is reached. | 06-26-2014 |
20140181412 | MECHANISMS TO BOUND THE PRESENCE OF CACHE BLOCKS WITH SPECIFIC PROPERTIES IN CACHES - A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache and one or more sources for memory requests. In response to receiving a request to allocate data of a first type, a cache controller allocates the data in the cache responsive to determining a limit of an amount of data of the first type permitted in the cache is not reached. The controller maintains an amount and location information of the data of the first type stored in the cache. Additionally, the cache may be partitioned with each partition designated for storing data of a given type. Allocation of data of the first type is dependent at least upon the availability of a first partition and a limit of an amount of data of the first type in a second partition. | 06-26-2014 |
20140181414 | MECHANISMS TO BOUND THE PRESENCE OF CACHE BLOCKS WITH SPECIFIC PROPERTIES IN CACHES - A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response to a write request to a second bank for data indicated to be stored in the powered down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases. | 06-26-2014 |
20140181421 | PROCESSING ENGINE FOR COMPLEX ATOMIC OPERATIONS - A system includes an atomic processing engine (APE) coupled to an interconnect. The interconnect is to couple to one or more processor cores. The APE receives a plurality of commands from the one or more processor cores through the interconnect. In response to a first command, the APE performs a first plurality of operations associated with the first command. The first plurality of operations references multiple memory locations, at least one of which is shared between two or more threads executed by the one or more processor cores. | 06-26-2014 |
20140181427 | Compound Memory Operations in a Logic Layer of a Stacked Memory - Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various data movement and address calculation operations. This functionality would allow compound memory operations—a single request communicated to the memory that characterizes the accesses and movement of many data items. This eliminates the performance and power overheads associated with communicating address and control information on a fine-grain, per-data-item basis from a host processor (or other device) to the memory. This approach also provides better visibility of macro-level memory access patterns to the memory system and may enable additional optimizations in scheduling memory accesses. | 06-26-2014 |
20140181453 | Processor with Host and Slave Operating Modes Stacked with Memory - A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode. | 06-26-2014 |
20140181457 | Write Endurance Management Techniques in the Logic Layer of a Stacked Memory - A system, method, and memory device embodying some aspects of the present invention for remapping external memory addresses and internal memory locations in stacked memory are provided. The stacked memory includes one or more memory layers configured to store data. The stacked memory also includes a logic layer connected to the memory layer. The logic layer has an Input/Output (I/O) port configured to receive read and write commands from external devices, a memory map configured to maintain an association between external memory addresses and internal memory locations, and a controller coupled to the I/O port, memory map, and memory layers, configured to store data received from external devices to internal memory locations. | 06-26-2014 |
20140181458 | DIE-STACKED MEMORY DEVICE PROVIDING DATA TRANSLATION - A die-stacked memory device incorporates a data translation controller at one or more logic dies of the device to provide data translation services for data to be stored at, or retrieved from, the die-stacked memory device. The data translation operations implemented by the data translation controller can include compression/decompression operations, encryption/decryption operations, format translations, wear-leveling translations, data ordering operations, and the like. Due to the tight integration of the logic dies and the memory dies, the data translation controller can perform data translation operations with higher bandwidth and lower latency and power consumption compared to operations performed by devices external to the die-stacked memory device. | 06-26-2014 |
20140181467 | HIGH LEVEL SOFTWARE EXECUTION MASK OVERRIDE - Methods, media, and computer systems are provided. The method includes, the media includes control logic for, and the computer system includes a processor with control logic for overriding an execution mask of SIMD hardware to enable at least one of a plurality of lanes of the SIMD hardware. Overriding the execution mask is responsive to a data parallel computation and a diverged control flow of a workgroup. | 06-26-2014 |
20140181483 | Computation Memory Operations in a Logic Layer of a Stacked Memory - Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various computation operations. This functionality would be desired where performing the operations locally near the memory devices would allow increased performance and/or power efficiency by avoiding transmission of data across the interface to the host processor. | 06-26-2014 |
20140223445 | Selecting a Resource from a Set of Resources for Performing an Operation - The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation. | 08-07-2014 |
20140372711 | SCHEDULING MEMORY ACCESSES USING AN EFFICIENT ROW BURST VALUE - A memory accessing agent includes a memory access generating circuit and a memory controller. The memory access generating circuit is adapted to generate multiple memory accesses in a first ordered arrangement. The memory controller is coupled to the memory access generating circuit and has an output port, for providing the multiple memory accesses to the output port in a second ordered arrangement based on the memory accesses and characteristics of an external memory. The memory controller determines the second ordered arrangement by calculating an efficient row burst value and interrupting multiple row-hit requests to schedule a row-miss request based on the efficient row burst value. | 12-18-2014 |
20140376320 | SPARE MEMORY EXTERNAL TO PROTECTED MEMORY - A memory subsystem employs spare memory cells external to one or more memory devices. In some embodiments, a processing system uses the spare memory cells to replace individual selected cells at the protected memory, whereby the selected cells are replaced on a cell-by-cell basis, rather than exclusively on a row-by-row, column-by-column, or block-by-block basis. This allows faulty memory cells to be replaced efficiently, thereby improving memory reliability and manufacturing yields, without requiring large blocks of spare memory cells. | 12-25-2014 |
20150016172 | QUERY OPERATIONS FOR STACKED-DIE MEMORY DEVICE - An integrated circuit (IC) package includes a stacked-die memory device. The stacked-die memory device includes a set of one or more stacked memory dies implementing memory cell circuitry. The stacked-die memory device further includes a set of one or more logic dies electrically coupled to the memory cell circuitry. The set of one or more logic dies includes a query controller and a memory controller. The memory controller is coupleable to at least one device external to the stacked-die memory device. The query controller is to perform a query operation on data stored in the memory cell circuitry responsive to a query command received from the external device. | 01-15-2015 |
20150019813 | MEMORY HIERARCHY USING ROW-BASED COMPRESSION - A system includes a first memory and a device coupleable to the first memory. The device includes a second memory to cache data from the first memory. The second memory includes a plurality of rows, each row including a corresponding set of compressed data blocks of non-uniform sizes and a corresponding set of tag blocks. Each tag block represents a corresponding compressed data block of the row. The device further includes decompression logic to decompress data blocks accessed from the second memory. The device further includes compression logic to compress data blocks to be stored in the second memory. | 01-15-2015 |
20150019834 | MEMORY HIERARCHY USING PAGE-BASED COMPRESSION - A system includes a device coupleable to a first memory. The device includes a second memory to cache data from the first memory. The second memory is to store a set of compressed pages of the first memory and a set of page descriptors. Each compressed page includes a set of compressed data blocks. Each page descriptor represents a corresponding page and includes a set of location identifiers that identify the locations of the compressed data blocks of the corresponding page in the second memory. The device further includes compression logic to compress data blocks of a page to be stored to the second memory and decompression logic to decompress compressed data blocks of a page accessed from the second memory. | 01-15-2015 |
20150026511 | PARTITIONABLE DATA BUS - A method and a system are provided for partitioning a system data bus. The method can include partitioning off a portion of a system data bus that includes one or more faulty bits to form a partitioned data bus. Further, the method includes transferring data over the partitioned data bus to compensate for data loss due to the one or more faulty bits in the system data bus. | 01-22-2015 |
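
As an illustration of one entry in the table above, the selection mechanism summarized for application 20140223445 (look up a resource in a table chosen from a set of tables, then step to the next entry until an available resource is found) might be sketched roughly as follows. This is a minimal sketch, not the method claimed in the application: the function name `select_resource`, the `is_available` callback, the resource names, and the wrap-around at the end of the table are all assumptions made for this example.

```python
from typing import Callable, Optional, Sequence


def select_resource(
    tables: Sequence[Sequence[str]],
    table_index: int,
    entry_index: int,
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Pick the first available resource, starting from a table lookup.

    All names here are hypothetical. The abstract does not say how the
    table or the starting entry is chosen, so both are passed in directly.
    """
    table = tables[table_index]  # table selected from the set of tables
    # Visit each entry at most once, starting at the looked-up entry.
    # Wrapping around to the start of the table is an assumption.
    for offset in range(len(table)):
        candidate = table[(entry_index + offset) % len(table)]
        if is_available(candidate):
            return candidate
    return None  # no resource in this table is currently available


# Example: two tables that list the same three resources in different orders.
tables = [["bank0", "bank1", "bank2"], ["bank2", "bank0", "bank1"]]
available = {"bank1"}
chosen = select_resource(tables, 0, 0, lambda r: r in available)
```

Holding several tables with different orderings lets different requests fall back to different resources when their first choice is busy; whether the application uses the tables this way is not stated in the abstract.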