Entries |
Document | Title | Date |
20080209183 | FAST SPARSE LIST WALKER - Provided are a method, information processing system, and computer readable medium for identifying active bits in a vector. The method comprises receiving a pointer associated with a vector of bits. The pointer is associated with a current bit within the vector of bits. The vector of bits is grouped into groups of a mathematical power of two, that is, any non-negative integer power of two. One or more current groups are determined which are the groups of the mathematical power of two comprising the current bit. The one or more current groups of the power of two are analyzed. A largest group of the power of two is identified in the one or more current groups comprising all empty bits. The pointer is set to point to a bit following a last bit in the identified largest group of the power of two comprising all empty bits. | 08-28-2008 |
20080209184 | PROCESSOR WITH RECONFIGURABLE FLOATING POINT UNIT - A technique of operating a processor includes determining whether a floating point unit (FPU) of the processor is to operate in a full-bit mode or a reduced-bit mode. An instruction is fetched and the instruction is decoded into a single operation, when the full-bit mode is indicated, or multiple operations, when the reduced-bit mode is indicated. | 08-28-2008 |
20080209185 | Processor with reconfigurable floating point unit - A technique of operating a processor includes determining whether a floating point unit (FPU) of the processor is to operate in a full-bit mode or a reduced-bit mode. An instruction is fetched and the instruction is decoded into one or more full-bit operations, when the full-bit mode is indicated, or one or more reduced-bit operations, when the reduced-bit mode is indicated. | 08-28-2008 |
20080222398 | Programmable processor with group floating-point operations - A programmable processor that comprises a general purpose processor architecture, capable of operation independent of another host processor, having a virtual memory addressing unit, an instruction path and a data path; an external interface; a cache operable to retain data communicated between the external interface and the data path; at least one register file configurable to receive and store data from the data path and to communicate the stored data to the data path; and a multi-precision execution unit coupled to the data path. The multi-precision execution unit is configurable to dynamically partition data received from the data path to account for an elemental width of the data and is capable of performing group floating-point operations on multiple operands in partitioned fields of operand registers and returning catenated results. In other embodiments the multi-precision execution unit is additionally configurable to execute group integer and/or group data handling operations. | 09-11-2008 |
20080244241 | HANDLING FLOATING POINT OPERATIONS - A computing system capable of handling floating point operations during program code conversion is described, comprising a processor including a floating point unit and an integer unit. The computing system further comprises a translator unit arranged to receive subject code instructions including at least one instruction relating to a floating point operation and in response to generate corresponding target code for execution on said processor. To handle floating point operations a floating point status unit and a floating point control unit are provided within the translator. These units cause the translator unit to generate either: target code for performing the floating point operations directly on the floating point unit; or target code for performing the floating point operations indirectly, for example using a combination of the integer unit and the floating point unit. In this way the efficiency of the computing system is improved. | 10-02-2008 |
20080263335 | Representation of Modal Intervals within a Computer - A modal interval representation having improved computational utility is provided. The modal interval representation generally includes a binary quantifier, and a set theoretical interval for select permutations of marks of a pair of marks of an IEEE standard 754 digital scale. The set theoretical interval includes combinations of real numbers, infinities, signed zeros, and pseudo-numbers, with select permutations of the marks comprising bounded, unbounded, pointwise and indefinite modal intervals. | 10-23-2008 |
20080263336 | Processor Having Efficient Function Estimate Instructions - High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic. | 10-23-2008 |
20080288756 | "OR" BIT MATRIX MULTIPLY VECTOR INSTRUCTION - A processor is operable to execute a bit matrix multiply instruction. In further examples, the processor is operable to perform a vector bit matrix multiply instruction, and is a part of a computerized system. | 11-20-2008 |
20080313438 | Unified Cascaded Delayed Execution Pipeline for Fixed and Floating Point Instructions - Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times. | 12-18-2008 |
20090094441 | Perform Floating Point Operation Instruction - A method and system are disclosed for executing a machine instruction in a central processing unit. The method comprises the steps of obtaining a perform floating-point operation instruction; obtaining a test bit; and determining a value of the test bit. If the test bit has a first value, (a) a specified floating-point operation function is performed, and (b) a condition code is set to a value determined by said specified function. If the test bit has a second value, (c) a check is made to determine if said specified function is valid and installed on the machine, (d) if said specified function is valid and installed on the machine, the condition code is set to one code value, and (e) if said specified function is either not valid or not installed on the machine, the condition code is set to a second code value. | 04-09-2009 |
20090100252 | VECTOR PROCESSING SYSTEM - A vector processing system for executing vector instructions, each instruction defining multiple pairs of values, an operation to be executed on each of said value pairs and a scalar modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and to implement the defined operation on said value pair to generate a respective result; and a scalar result unit for receiving the results of the parallel processing units and for using said results in a manner defined by the scalar modifier to generate a single output value for said instruction. | 04-16-2009 |
20090113186 | MICROCONTROLLER AND CONTROLLING SYSTEM - A microcontroller and a controlling system having the same are provided, in which the increase in the program code for performing floating-point arithmetic, in particular, the increase in the amount of code due to a variable are suppressed, and the processing overhead for converting fixed-point data into floating-point data is reduced. The microcontroller includes a floating-point converter which inputs integer data and corresponding decimal point position data as fixed-point data and which converts the input data into floating-point data by acquiring a fraction part, an exponent part, and a sign of the floating type from the input data, and a floating-point arithmetic logic unit which receives the output of the floating-point converter and calculates the floating-point data. The floating-point converter acquires the exponent part by performing addition and subtraction of the decimal point position data and the shift amount of the fraction part to the integer data. | 04-30-2009 |
20090158012 | Method and Apparatus for Performing Improved Group Instructions - Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, wherein the execution unit is capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions. | 06-18-2009 |
20090158013 | Method and Apparatus Implementing a Minimal Area Consumption Multiple Addend Floating Point Summation Function in a Vector Microprocessor - Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed. | 06-18-2009 |
20090177869 | EFFICIENT CHECK NODE MESSAGE TRANSFORM APPROXIMATION FOR LDPC DECODER - In modern iterative coding systems such as LDPC decoders and turbo-convolutional decoders in which the invention may be used, the core computations can often be reduced to a sequence of additions and subtractions alternating between logarithm and linear domains. A computationally efficient and robust approximation method for log and exp functions is described which involves using a simple bit mapping between fixed point fractional data format and floating point format. The method avoids costly lookup tables and complex computations and further reduces the core processing to a sequence of additions and subtractions using alternating fixed point and floating point processing units. The method is well suited for use in highly optimized hardware implementations which can take advantage of modern advances in standard floating point arithmetic circuit design as well as for software implementation on a wide class of processors equipped with an FPU, where the invention avoids the need for a typical multi-cycle series of log/exp instructions, and especially on SIMD FPU-equipped processors where log/exp functions are typically scalar. | 07-09-2009 |
20090182991 | PROCESSOR INCLUDING EFFICIENT SIGNATURE GENERATION FOR LOGIC ERROR PROTECTION - A processor core includes an instruction decode unit that may dispatch a same integer instruction stream to a plurality of integer execution units operating in lock-step. The processor core also includes signature generation logic that may generate, concurrently with execution of the integer instructions, a respective signature from result signals conveyed on respective result buses in one or more pipeline stages within each of the integer execution units in response to the result signals becoming available. The processor core also includes compare logic that may detect a mismatch between signatures from each of the integer execution units. Further, in response to the compare logic detecting any mismatch, the compare logic may cause instructions causing the mismatch to be re-executed. | 07-16-2009 |
20090187746 | Apparatus and method for performing permutation operations on data - An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate, in dependence upon a bit-mask, control signals to configure the permutation circuitry for performing a permutation operation on an input operand. The bit-mask identifies within the input operand a first group of data elements having a first ordering and a second group of data elements having a second ordering and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering. | 07-23-2009 |
20090198974 | METHODS FOR CONFLICT-FREE, COOPERATIVE EXECUTION OF COMPUTATIONAL PRIMITIVES ON MULTIPLE EXECUTION UNITS - A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit. | 08-06-2009 |
20090210678 | Handling of Denormals In Floating Point Number Processing - A data processing apparatus operable to process floating point operands is disclosed. The data processing apparatus comprises: an instruction decoder operable to decode an instruction for processing floating point operands; and a data processor operable to perform data processing operations controlled by the instruction decoder wherein: in response to the decoded instruction indicating operation according to a flush-to-zero semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as zero operands; and in response to the decoded instruction indicating operation according to a denormal semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as denormal operands. | 08-20-2009 |
20090240927 | PROCESSOR AND INFORMATION PROCESSING APPARATUS - A processor capable of executing conditional store instructions without being limited by the number of condition codes is provided. Condition data is stored in floating-point registers, and an operation unit executes a conditional floating-point store instruction of determining whether to store, in cache, store data. | 09-24-2009 |
20090249040 | Embedded Control System - An embedded control system capable of ensuring precision in arithmetic with data in the floating-point format and also avoiding a shortage of the storage area of a memory is provided. | 10-01-2009 |
20090265529 | Processor apparatus and method of processing multiple data by single instructions - A processor (and method) of processing multiple data by a single instruction includes first and second register sets each of which includes a plurality of registers, and an arithmetic unit to rearrange data being registered in the first and second register sets according to a relative size of an absolute value of the data between the first and second register sets so that the relative size is defined before executing an instruction considering the relative size. | 10-22-2009 |
20090327665 | Efficient parallel floating point exception handling in a processor - Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication. | 12-31-2009 |
20100031009 | Floating Point Execution Unit for Calculating a One Minus Dot Product Value in a Single Pass - A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects. | 02-04-2010 |
20100042815 | METHOD AND APPARATUS FOR EXECUTING PROGRAM CODE - The described embodiments provide a system that executes program code. While executing program code, the processor encounters at least one vector instruction and at least one vector-control instruction. The vector instruction includes a set of elements, wherein each element is used to perform an operation for a corresponding iteration of a loop in the program code. The vector-control instruction identifies elements in the vector instruction that may be operated on in parallel without causing an error due to a runtime data dependency between the iterations of the loop. The processor then executes the loop by repeatedly executing the vector-control instruction to identify a next group of elements that can be operated on in the vector instruction and selectively executing the vector instruction to perform the operation for the next group of elements in the vector instruction, until the operation has been performed for all elements of the vector instruction. | 02-18-2010 |
20100042816 | BREAK, PRE-BREAK, AND REMAINING INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector. | 02-18-2010 |
20100042817 | SHIFT-IN-RIGHT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-18-2010 |
20100042818 | COPY-PROPAGATE, PROPAGATE-POST, AND PROPAGATE-PRIOR INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied or propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-18-2010 |
20100049950 | RUNNING-SUM INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-25-2010 |
20100049951 | RUNNING-AND, RUNNING-OR, RUNNING-XOR, AND RUNNING-MULTIPLY INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then writes the product of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-25-2010 |
20100058037 | RUNNING-SHIFT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts the copy of the base value by the number of bit positions and writes the value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 03-04-2010 |
20100095097 | Floating Point Only Single Instruction Multiple Data Instruction Set Architecture - Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system. | 04-15-2010 |
20100095098 | Generating and Executing Programs for a Floating Point Single Instruction Multiple Data Instruction Set Architecture - Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements. | 04-15-2010 |
20100095099 | SYSTEM AND METHOD FOR STORING NUMBERS IN FIRST AND SECOND FORMATS IN A REGISTER FILE - A system and a method for storing numbers in a register file are provided. The system and the method store single precision numbers in double precision format in a register file that is shared between floating point computational units and computational units not supporting floating point numbers. | 04-15-2010 |
20100100713 | FAST FLOATING POINT COMPARE WITH SLOWER BACKUP FOR CORNER CASES - A floating point processor unit executes a floating point compare instruction with two operands of the same or different precision by comparing the two operands in integer format, which speeds up the execution of the floating point compare instruction significantly. The floating point processor now executes the floating point compare instruction at least twice as fast (e.g., two clock cycles instead of five clock cycles in the prior art) for nearly all operand cases (e.g., 99% of all cases). Only the rare corner cases require additional operations on one of the operands and thus require additional cycles of execution time because the integer compare operation will not work for these corner cases. This is due to the fact that one operand is a single precision subnormal number in an unnormalized representation (i.e., has two representations) and the other operand is in the SP subnormal range such that the integer compare operation will fail. | 04-22-2010 |
20100122070 | Combined associative and distributed arithmetics for multiple inner products - Subvector slices x(i,r,s) of a first vector x(i) are stored (e.g., in a CAM array) in a bit-parallel word-serial manner. For each of the stored subvector slices and in parallel on bits of said each subvector slice, an operation is executed that outputs a pre-calculated inner product result of the said bits and a second vector a. If the subvector slices x(i,r,s) of the first vector x(i) are initially stored in a bit-serial word-serial manner, there is a transform to store them in the bit-parallel word serial manner by copying relevant bits of each of the subvector slices from a 0 | 05-13-2010 |
20100153692 | Media Action Script Acceleration Apparatus - Exemplary apparatus, method, and system embodiments provide for accelerated hardware processing of an action script for a graphical image for visual display. An exemplary apparatus comprises: a first memory; and a plurality of processors to separate the action script from other data, to convert a plurality of descriptive elements of the action script into a plurality of hardware-level operational or control codes, and to perform one or more operations corresponding to an operational code of the plurality of operational codes using corresponding data to generate pixel data for the graphical image. In an exemplary embodiment, at least one processor further is to parse the action script into the plurality of descriptive elements and the corresponding data, and to extract data from the action script and to store the extracted data in the first memory as a plurality of control words having the corresponding data in predetermined fields. | 06-17-2010 |
20100174891 | RECONFIGURABLE SIMD PROCESSOR AND METHOD FOR CONTROLLING ITS INSTRUCTION EXECUTION - In a reconfigurable SIMD processor, a unit of operation for executing an instruction corresponds to one group, and the one group that includes a plurality of PEs implements at least a part of an operation unit that executes at least one of an integer divide instruction; a floating decimal point add/subtract instruction; a floating decimal point multiply instruction; and a floating decimal point divide instruction, using operation units and general purpose registers provided in a plurality of the PEs. The number of the PEs that compose the one group is varied in accordance with the instruction. | 07-08-2010 |
20100191939 | TRIGONOMETRIC SUMMATION VECTOR EXECUTION UNIT - A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product. | 07-29-2010 |
20100205411 | HANDLING COMPLEX REGEX PATTERNS STORAGE-EFFICIENTLY USING THE LOCAL RESULT PROCESSOR - A result processor accesses a result table for an entry associated with a predetermined sub-expression of a regular expression in response to a finite state machine finding the predetermined sub-expression in the input stream. The result processor executes an instruction associated with the entry, the instruction including one or more operations to be performed on one or more bits in a bit vector register, and determines as a function of the one or more bits in the bit vector register whether the complex regular expression has been found in the input stream. | 08-12-2010 |
20100268920 | MECHANISM FOR HANDLING UNFUSED MULTIPLY-ACCUMULATE ACCRUED EXCEPTION BITS IN A PROCESSOR - A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined with the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state. | 10-21-2010 |
20100281239 | RELIABLE EXECUTION USING COMPARE AND TRANSFER INSTRUCTION ON AN SMT MACHINE - A system and method for efficient reliable execution on a simultaneous multithreading machine. A processor is placed in a reliable execution mode (REM) to detect possible errors during execution of a mission critical software application. Only two threads may be configured to operate in this mode. Floating-point store and integer-transfer unary instructions may be converted to new binary instructions. Each new instruction has two source operands, each of which corresponds to a different thread and is specified by the same logical register number as the single source operand of the original unary instruction. All other instructions are replicated, wherein the original instruction and its twin are assigned to different threads. Simultaneous multi-threaded (SMT) floating-point logic may only be able to provide lockstep execution when it communicates using the new instruction with instantiated integer independent clusters. The new instruction cannot begin until both source operands are ready, which are subsequently compared to determine any mismatches or errors. | 11-04-2010 |
20100318772 | SUPERSCALAR REGISTER-RENAMING FOR A STACK-ADDRESSED ARCHITECTURE - A system and method for increasing processor throughput by decreasing a loop critical path. In one embodiment, a table comprises multiple stack entries, each comprising an x87 floating-point (FP) stack specifier. The combinatorial logic for operand translation of N FP instructions per clock cycle may require N instantiated copies of a combinatorial logic block. Each instantiated copy may determine a new ordering of the stack entries. Control logic may receive necessary information from the corresponding N FP instructions and determine a corresponding combined computational effect, or stack reordering, on entries within the table based on two or more instructions. Resulting control signals are conveyed to the N instantiated copies. A resulting accumulative delay from an input of the first copy to the output of the Nth copy may be less than or equal to (N−1)*time_delay versus a longer N*time_delay. | 12-16-2010 |
20100325397 | Data processing apparatus and method - A data processing apparatus is described which comprises processing circuitry responsive to data processing instructions to execute integer data processing operations and floating point data processing operations, a first set of integer registers useable by the processing circuitry in executing the integer data processing operations, and a second set of floating point registers useable by the processing circuitry in executing the floating point data processing operations. The processing circuitry is responsive to an interrupt request to perform one of an integer state preservation function in which at least a subset of only the integer registers are copied to a stack memory, and a floating point state preservation function in which at least a subset of both the integer registers and the floating point registers are copied to the stack memory, the one of said integer state preservation function and the floating point state preservation function being selected by the processing circuitry in dependence on state information. In this way, it is possible to reduce the memory size requirement through reduced stack sizes, and to reduce the number of memory accesses required compared with the basic solution of always preserving floating point registers. As a result, power usage and interrupt latency can be reduced. | 12-23-2010 |
20100325398 | RUNNING-MIN AND RUNNING-MAX INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When subsequently generating a result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, the processor compares the base value and values from relevant elements to the left of a corresponding element in the second input vector, and writes the result into the element in the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector. | 12-23-2010 |
20100325399 | VECTOR TEST INSTRUCTION FOR PROCESSING VECTORS - The described embodiments provide a processor that executes a vector instruction. The processor starts by receiving a vector instruction that uses at least one vector of values that includes N elements as an input. In addition, the processor optionally receives a predicate vector that includes N elements. The processor then executes the vector instruction. In the described embodiments, when executing the vector instruction, if the predicate vector is received, for one or more selected elements in the vector of values for which a corresponding element in the predicate vector is active, otherwise, for one or more selected elements in the vector of values, the processor checks the one or more selected elements to determine if the selected elements contain a predetermined value. When the selected elements contain the predetermined value, the processor sets a corresponding status flag. | 12-23-2010 |
20110047358 | In-Data Path Tracking of Floating Point Exceptions and Store-Based Exception Indication - Mechanisms are provided for tracking exceptions in the execution of vectorized code. A speculative instruction is executed on a vector element of a vector. An exception condition is detected in association with the vector element based on a result of executing the speculative instruction on the vector element. A special exception value is stored in the vector element in a vector register corresponding to the vector, indicative of the exception condition, without invoking an exception handler for the exception condition. The special exception value is propagated with the vector element of the vector through a processor architecture of the processor, without invoking the exception handler for the exception condition. An exception corresponding to the exception condition indicated by the special exception value is generated only in response to a non-speculative instruction being executed that performs a non-speculative operation on the vector element. | 02-24-2011 |
20110047359 | Insertion of Operation-and-Indicate Instructions for Optimized SIMD Code - Mechanisms are provided for inserting indicated instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction. The data processing system executing the compiled code is configured to store special exception values in vector output registers, in response to a speculative instruction generating an exception condition, without initiating exception handling. | 02-24-2011 |
20110060892 | SPECULATIVE FORWARDING OF NON-ARCHITECTED DATA FORMAT FLOATING POINT RESULTS - A microprocessor having an instruction set architecture (ISA) that specifies at least one architected data format (ADF) for floating-point operands includes first and second floating-point units. The first floating-point unit is configured to speculatively forward a non-ADF result generated by the first floating-point unit to the second floating-point unit. The non-ADF result is associated with a first instruction. The second floating-point unit is configured to use the speculatively forwarded non-ADF result associated with the first instruction as a source operand to generate a result of a second instruction. The second floating-point unit is further configured to convert the non-ADF result to an ADF result and to determine whether the non-ADF result creates an exception condition when converted to the ADF result. The microprocessor is configured to cancel the second instruction, in response to determining that the non-ADF result creates an exception condition when converted to the ADF result. | 03-10-2011 |
20110093686 | Register state saving and restoring - In a data processing apparatus | 04-21-2011 |
20110119471 | Method and apparatus to extract integer and fractional components from floating-point data - A method is presented including decomposing a first value into many parts. Decomposing includes shifting ( | 05-19-2011 |
20110138155 | VECTOR COMPUTER AND INSTRUCTION CONTROL METHOD THEREFOR - A vector computer executing vector operations via vector pipeline processing is restructured to dynamically perform an overtaking control on vector gather/scatter instructions. Minimum/maximum values among vector elements of vector registers are determined based on the result of fixed-point calculation defining an address dependency source instruction in accordance with a vector gather/scatter instruction, wherein minimum/maximum values are determined in a redundant time owing to a short turnaround time of the fixed-point calculation compared to floating-point calculation. An access range of addresses attributed to the vector gather/scatter instruction is specified based on minimum/maximum values. An overtaking control is performed on the vector gather/scatter instruction in light of the access range of addresses. | 06-09-2011 |
20110153996 | Parallel and Vectored Gilbert-Johnson-Keerthi Graphics Processing - Parallel and vectored data structures may be used in a single instruction multiple data processor that applies the Gilbert-Johnson-Keerthi algorithm. As a result, the performance of multi-core processors doing graphics processing may be increased in some cases. | 06-23-2011 |
20110173421 | MULTI-INPUT AND BINARY REPRODUCIBLE, HIGH BANDWIDTH FLOATING POINT ADDER IN A COLLECTIVE NETWORK - To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generates a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting of the floating point numbers, the adding, the generating, and the converting of the summation in one pass. One pass indicates that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device. | 07-14-2011 |
20110231636 | APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR PERFORMING A CYCLIC REDUNDANCY CHECK (CRC) - Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41. In some embodiments, the first instance of the CRC instruction specifies an initial value to be used in performing the first CRC operation, the set of data, and a storage location in which the cryptographic unit is configured to store the checksum value produced by the first CRC operation. | 09-22-2011 |
20110258418 | Load/Move Duplicate Instructions for a Processor - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register. | 10-20-2011 |
20110276790 | INSTRUCTION SUPPORT FOR PERFORMING MONTGOMERY MULTIPLICATION - Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, and N spans at least two registers of the general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R^−1. | 11-10-2011 |
20110283092 | GETFIRST AND ASSIGNLAST INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments comprise a processor that executes vector instructions. In the described embodiments, while executing program code, the processor receives a vector instruction that indicates an input vector that includes N elements, wherein receiving the vector instruction comprises optionally receiving a predicate vector that includes N elements. The processor then executes the vector instruction. When executing the vector instruction, if the predicate vector is received, based on active elements in the predicate vector, otherwise, if the predicate vector is not received, based on an assumed predicate vector for which each element is active, the processor sets a value in a scalar register equal to a predetermined element of the input vector. In the described embodiments, the vector instruction can be a GetFirst, an AssignLast1P, or an AssignLast2P instruction. | 11-17-2011 |
20110296146 | HARDWARE INSTRUCTIONS TO ACCELERATE TABLE-DRIVEN MATHEMATICAL FUNCTION EVALUATION - A set of instructions for implementation in a floating-point unit or other computer processor hardware is disclosed herein. In one embodiment, an extended-range fused multiply-add operation, a first look-up operation, and a second look-up operation are each embodied in hardware instructions configured to be operably executed in a processor. These operations are accompanied by a table which provides a set of defined values in response to various function types, supporting the computation of elementary functions such as reciprocal, square, cube, fourth roots and their reciprocals, exponential, and logarithmic functions. By allowing each of these functions to be computed with a hardware instruction, branching and predicated execution may be reduced or eliminated, while also permitting the use of distributed instructions across a number of execution units. | 12-01-2011 |
20110296147 | METHOD OF TESTING COMPUTER, COMPUTER TEST APPARATUS AND NON-TRANSITORY COMPUTER-READABLE MEDIUM - A method of testing a computer, the method comprising: designating a register as an input-only register that is set to a value which does not cause an exception interruption upon execution of a specific type of instruction; generating a test instruction array having a plurality of instructions for a test, by assigning a register other than the input-only register as an output destination of an execution result of each of the plurality of instructions; executing the plurality of instructions included in the generated test instruction array; and evaluating the execution results by the computer. | 12-01-2011 |
20110302394 | SYSTEM AND METHOD FOR PROCESSING REGULAR EXPRESSIONS USING SIMD AND PARALLEL STREAMS - A system and method for performing regular expression computations includes loading a plurality of input values corresponding to one or more input streams as elements of a vector register implemented on programmable storage media. New state indexes are computed using the input values, and current state values corresponding to different automata by using single instruction, multiple data (SIMD) vector operations. New state values associated with the different automata are determined using the new state indexes to look up new state values such that state transitions for a plurality of regular expressions are processed concurrently. | 12-08-2011 |
20120011348 | Matrix Multiplication Operations Using Pair-Wise Load and Splat Operations - Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand. | 01-12-2012 |
20120060020 | VECTOR INDEX INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector. | 03-08-2012 |
20120079252 | PERFORMING A MULTIPLY-MULTIPLY-ACCUMULATE INSTRUCTION - In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed. | 03-29-2012 |
20120079253 | FUNCTIONAL UNIT FOR VECTOR LEADING ZEROES, VECTOR TRAILING ZEROES, VECTOR OPERAND 1s COUNT AND VECTOR PARITY CALCULATION - A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction. | 03-29-2012 |
20120096244 | METHOD, SYSTEM, AND PRODUCT FOR PERFORMING UNIFORMLY FINE-GRAIN DATA PARALLEL COMPUTING - A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed. | 04-19-2012 |
20120151191 | REDUCING POWER CONSUMPTION IN MULTI-PRECISION FLOATING POINT MULTIPLIERS - Methods and apparatus relating to reducing power consumption in multi-precision floating point multipliers are described. In an embodiment, certain portions of a multiplier are disabled in response to two or more multiplication operations with the same data size and data type occurring back-to-back. Other embodiments are also claimed and described. | 06-14-2012 |
20120173854 | PROCESSOR HAVING INCREASED EFFECTIVE PHYSICAL FILE SIZE VIA REGISTER MAPPING - Methods and apparatuses are provided for an efficient technique for processing registers having a known value while improving processor performance. The apparatus comprises a processor having a plurality of physical registers available for use in computations and a decoder for determining that a logical register contains a known value. A renaming unit maps the logical register containing the known value to an address outside an address range for the plurality of physical registers once the known value is determined. Thereafter, scheduling and execution units perform computations using the known value without storing the known value in one of the plurality of physical registers. The method comprises determining that a logical register of a processor has a known value and then mapping that logical register to a physical register address outside an expected range of physical register addresses, which indicates that the logical register represents the known value. Thereafter, the processor processes any instruction using the known value without storing the known value in a physical register. | 07-05-2012 |
20120191955 | METHOD AND SYSTEM FOR FLOATING POINT ACCELERATION ON FIXED POINT DIGITAL SIGNAL PROCESSORS - A system for performing floating point operations comprising a floating point multiply function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point add function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point normalize function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. | 07-26-2012 |
20120191956 | PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA OPERAND REMAPPING - Methods and apparatuses are provided for achieving increased processor performance and energy saving via reordering operand mapping as opposed to the actual operand data. The apparatus comprises a plurality of physical registers available for use in storing operands and includes a unit capable of mapping logical registers to the plurality of physical registers. A multiplexer then reorders the operands by reordering the mapping of logical registers to the plurality of physical registers, which increases processor performance and energy saving by reordering narrow registers instead of wide registers. The method comprises mapping logical registers to physical registers storing operands in a processor and then reordering the mapping to achieve the equivalent of reordering the operands without reordering the operands from the physical registers in the processor. | 07-26-2012 |
20120191957 | PREDICTING A RESULT FOR AN ACTUAL INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true. | 07-26-2012 |
20120204013 | SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT ARITHMETIC OPERATIONS - Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results. | 08-09-2012 |
20120221837 | RUNNING MULTIPLY-ACCUMULATE INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector. | 08-30-2012 |
20120239910 | CONDITIONAL EXTRACT INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a vector processor that executes a ConditionalExtract instruction. In the described embodiments, the processor receives an input scalar variable, an input vector, and a predicate vector, wherein each of the vectors has N elements. The processor then executes the ConditionalExtract instruction, which causes the processor to determine if at least one element in the predicate vector is active. If so, the processor copies a value from a last element in the input vector for which a corresponding element in the predicate vector is active into a scalar result variable. Otherwise, if no elements of the predicate vector are active, the processor copies a value from the input scalar variable into the scalar result variable. | 09-20-2012 |
20120239911 | VALUE CHECK INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector. | 09-20-2012 |
20120272046 | Vector Completion Mask Handling - Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation. | 10-25-2012 |
20120290819 | DSP BLOCK WITH EMBEDDED FLOATING POINT STRUCTURES - A specialized processing block includes a first floating-point arithmetic operator stage, a second floating-point arithmetic operator stage, and configurable interconnect within the specialized processing block for routing signals into and out of each of the first and second floating-point arithmetic operator stages. In some embodiments, the configurable interconnect may be configurable to route a plurality of block inputs to inputs of the first floating-point arithmetic operator stage, at least one of the block inputs to an input of the second floating-point arithmetic operator stage, output of the first floating-point arithmetic operator stage to an input of the second floating-point arithmetic operator stage, at least one of the block inputs to a direct-connect output to another such block, output of the first floating-point arithmetic operator stage to the direct-connect output, and a direct-connect input from another such block to an input of the second floating-point arithmetic operator stage. | 11-15-2012 |
20120317401 | Load/Move Duplicate Instructions for a Processor - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register. | 12-13-2012 |
20130007422 | PROCESSING VECTORS USING WRAPPING ADD AND SUBTRACT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector. | 01-03-2013 |
20130013901 | SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT INFLATE AND DEFLATE OPERATIONS - Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results. | 01-10-2013 |
20130019084 | Processor - Apparatus ( | 01-17-2013 |
20130024669 | PROCESSING VECTORS USING WRAPPING SHIFT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector. | 01-24-2013 |
20130024670 | PROCESSING VECTORS USING WRAPPING MULTIPLY AND DIVIDE INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector. | 01-24-2013 |
20130024671 | PROCESSING VECTORS USING WRAPPING NEGATION INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector. | 01-24-2013 |
20130024672 | PROCESSING VECTORS USING WRAPPING PROPAGATE INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive a basis vector, an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping propagate operation dependent upon the input vectors. | 01-24-2013 |
20130036296 | FLOATING POINT EXECUTION UNIT WITH FIXED POINT FUNCTIONALITY - A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit. | 02-07-2013 |
20130042092 | MERGE OPERATIONS OF DATA ARRAYS BASED ON SIMD INSTRUCTIONS - A method and apparatus are provided to perform efficient merging operations of two or more streams of data by using SIMD instructions. Streams of data are merged together in parallel and with mitigated or removed conditional branching. The merge operations of the streams of data include Merge AND and Merge OR operations. | 02-14-2013 |
20130067203 | PROCESSING DEVICE AND A SWIZZLE PATTERN GENERATOR - A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations. | 03-14-2013 |
20130067204 | Instructions With Floating Point Control Override - Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. Other embodiments are also described. | 03-14-2013 |
20130073836 | FINE-GRAINED INSTRUCTION ENABLEMENT AT SUB-FUNCTION GRANULARITY - Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.). | 03-21-2013 |
20130073837 | Input Vector Analysis for Memoization Estimation - A function's purity may be estimated by comparing a new input vector to previously analyzed input vectors. When a new input vector is within a confidence boundary, the new input vector may be treated as a known vector, even when that vector has not been evaluated. The input vector may reflect the input parameters passed to a function, and the function may be analyzed to determine whether to memoize with the input vector. The function may be a function that behaves as a pure function in some circumstances and with some input vectors, but not with others. By memoizing the function when possible, the function may be executed much faster, thereby improving performance. | 03-21-2013 |
20130086367 | Tracking operand liveliness information in a computer system and performance function based on the liveliness information - Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module. | 04-04-2013 |
20130103932 | MULTI-ADDRESSABLE REGISTER FILES AND FORMAT CONVERSIONS ASSOCIATED THEREWITH - A multi-addressable register file is addressed by a plurality of types of instructions, including scalar, vector and vector-scalar extension instructions. It may be determined that data is to be translated from one format to another format. If so determined, a convert machine instruction is executed that obtains a single precision datum in a first representation in a first format from a first register; converts the single precision datum of the first representation in the first format to a converted single precision datum of a second representation in a second format; and places the converted single precision datum in a second register. | 04-25-2013 |
20130159680 | SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR PARALLELIZING LARGE NUMBER ARITHMETIC - Methods, systems, and computer program products for the performance of arithmetic operations on large numbers. The addition of large numbers may be parallelized by adding corresponding sections of the numbers in parallel. The multiplication of large numbers may be accomplished by applying a multiplier to a multiplicand after the latter is divided into sections, where the multiplication of the sections is performed in parallel. Products for each section are saved in high and low order vectors, which may then be aligned and added. The comparison of two large numbers may be performed by comparing the numbers, section by section, in parallel. In an embodiment, these processes may be performed in a graphics processing unit (GPU) having multiple cores. In an embodiment, such a GPU may be integrated into a larger die that also incorporates one or more conventional central processing unit (CPU) cores. | 06-20-2013 |
20130159681 | VERIFYING SPECULATIVE MULTITHREADING IN AN APPLICATION - Verifying speculative multithreading in an application executing in a computing system, including: executing one or more test instructions serially thereby producing a serial result, including insuring that all data dependencies among the test instructions are satisfied; executing the test instructions speculatively in a plurality of threads thereby producing a speculative result; and determining whether a speculative multithreading error exists including: comparing the serial result to the speculative result and, if the serial result does not match the speculative result, determining that a speculative multithreading error exists. | 06-20-2013 |
20130159682 | DECIMAL FLOATING-POINT PROCESSOR - A method for operating a decimal-floating point (DFP) processor. The method includes identifying a first op-code requiring read access to a first plurality of DFP operands in a vector register of the DFP processor; granting read access from a first port of the vector register to a first execution unit of the DFP processor selected to execute the first op-code; initializing a read pointer of the first port; reading out, from the first port and based on the read pointer, a first DFP operand of the plurality of DFP operands in response to a read request from the first execution unit; and adjusting the read pointer of the first port in response to reading out the first DFP operand. | 06-20-2013 |
20130173891 | CONVERT FROM ZONED FORMAT TO DECIMAL FLOATING POINT FORMAT - Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location. | 07-04-2013 |
20130191619 | MULTIFUNCTION HEXADECIMAL INSTRUCTION FORM SYSTEM AND PROGRAM PRODUCT - A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle. | 07-25-2013 |
20130219153 | Load/Move and Duplicate Instructions for a Processor - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register. | 08-22-2013 |
20130238880 | OPERATION PROCESSING DEVICE, MOBILE TERMINAL AND OPERATION PROCESSING METHOD - An operation processing device for executing a plurality of operations for aligned data by one vector instruction includes a first mask storage unit and a second mask storage unit. The first mask storage unit stores first mask data to designate each of the plurality of operations a true or false operation, and the second mask storage unit stores second mask data to designate a number to be true continuously, in the plurality of operations. | 09-12-2013 |
20130246757 | VECTOR FIND ELEMENT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare. | 09-19-2013 |
20130246758 | VECTOR STRING RANGE COMPARE - Processing of character data is facilitated. A Vector String Range Compare instruction is provided that compares each element of a vector with a range of values based on a set of controls to determine if there is a match. An index associated with the matched element or a mask representing the matched element is stored in a target vector register. Further, the same instruction, the Vector String Range Compare instruction, also searches a selected vector for null elements, also referred to as zero elements. | 09-19-2013 |
20130246759 | VECTOR FIND ELEMENT NOT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare. | 09-19-2013 |
20130262837 | PROGRAMMABLE COUNTERS FOR COUNTING FLOATING-POINT OPERATIONS IN SIMD PROCESSORS - A processor includes an execution unit to execute instructions, where each operand of each executed instruction has one or more elements of an element size and at least one operand of the instruction corresponds to a register of a register size. The processor further includes a counter configured to count a number of instructions that have been executed by the execution unit associated with a particular combination of register size and element size. | 10-03-2013 |
20130275730 | APPARATUS AND METHOD OF IMPROVED EXTRACT INSTRUCTIONS - An apparatus is described that includes instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction select a first group of input vector elements from one of multiple first non overlapping sections of respective first and second input vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups of the first and third instructions at a first granularity, where, respective resultants produced therewith are respective resultants of the first and third instructions. The masking circuitry is also to mask the first and second groups of the second and fourth instructions at a second granularity, where, respective resultants produced therewith are respective resultants of the second and fourth instructions. | 10-17-2013 |
20130275731 | VECTOR INSTRUCTION FOR PRESENTING COMPLEX CONJUGATES OF RESPECTIVE COMPLEX NUMBERS - An apparatus is described having a semiconductor chip that has an instruction execution pipeline. The instruction execution pipeline has an execution unit with logic circuitry to perform the following for an instruction: accept input vector elements representing real and imaginary parts of a plurality of complex numbers; and, present the complex conjugates of the complex numbers. | 10-17-2013 |
20130290685 | FLOATING POINT ROUNDING PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed. | 10-31-2013 |
20130290686 | INTEGRATED CIRCUIT DEVICE AND METHOD FOR CALCULATING A PREDICATE VALUE - An integrated circuit device comprises at least one instruction processing module arranged to perform branch predication. The at least one instruction processing module comprises at least one predicate calculation module arranged to receive as an input at least one result vector for a predicate function and at least one conditional parameter value therefor and output a predicate result value from the at least one result vector based at least partly on the at least one received conditional parameter value. | 10-31-2013 |
20130326199 | METHOD AND APPARATUS FOR CONTROLLING A MXCSR - Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction. | 12-05-2013 |
20130332707 | SPEED UP BIG-NUMBER MULTIPLICATION USING SINGLE INSTRUCTION MULTIPLE DATA (SIMD) ARCHITECTURES - A processing apparatus may be configured to include logic to generate a first set of vectors based on a first integer and a second set of vectors based on a second integer, logic to calculate sub products by multiplying the first set of vectors to the second set of vectors, logic to split each sub product into a first half and a second half and logic to generate a final result by adding together all first and second halves at respective digit positions. | 12-12-2013 |
20130339678 | MULTI-ELEMENT INSTRUCTION WITH DIFFERENT READ AND WRITE MASKS - A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation on the set of elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register. | 12-19-2013 |
20140006755 | VECTOR MULTIPLICATION WITH ACCUMULATION IN LARGE REGISTER SPACE | 01-02-2014 |
20140006756 | Systems, Apparatuses, and Methods for Performing a Shuffle and Operation (Shuffle-Op) | 01-02-2014 |
20140019728 | CONTROLLING AN ORDER FOR PROCESSING DATA ELEMENTS DURING VECTOR PROCESSING - A data processing apparatus includes a register bank having a plurality of registers for storing vectors being processed; a pipelined processor for processing the stream of vector instructions; the pipelined processor comprising circuitry configured to detect data dependencies for the vectors processed by the stream of vector instructions and stored in the plurality of registers and to determine constraints on timing of execution for the vector instructions such that no register data hazards arise. Register data hazards arise where two accesses to a same register, at least one of said accesses being a write, occur in an order different to an order of said instruction stream such that an access occurring later in said instruction stream starts before an access occurring earlier in said instruction stream has completed. The pipelined processor includes data element hazard determination circuitry. | 01-16-2014 |
20140040603 | VECTOR PROCESSING IN AN ACTIVE MEMORY DEVICE - Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions. | 02-06-2014 |
20140052968 | SUPER MULTIPLY ADD (SUPER MADD) INSTRUCTION - A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address. | 02-20-2014 |
20140052969 | SUPER MULTIPLY ADD (SUPER MADD) INSTRUCTIONS WITH THREE SCALAR TERMS - A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further includes a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c. | 02-20-2014 |
20140052970 | OPCODE COUNTING FOR PERFORMANCE MEASUREMENT - Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups. | 02-20-2014 |
20140059328 | MECHANISM FOR PERFORMING SPECULATIVE PREDICATED INSTRUCTIONS - A mechanism for executing speculative predicated instructions may include initiating execution of a vector instruction when one or more operands upon which the vector instruction depends are available for use, even if a predicate vector upon which the vector instruction also depends is not available. If the predicate vector was not available, the results of the execution of the vector instruction may be temporarily held until the predicate vector becomes available, at which time a destination vector may be updated with the results. | 02-27-2014 |
20140082333 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING AN ABSOLUTE DIFFERENCE CALCULATION BETWEEN CORRESPONDING PACKED DATA ELEMENTS OF TWO VECTOR REGISTERS - Embodiments of systems, apparatuses, and methods for performing in a computer processor absolute difference calculation in response to a single vector packed absolute difference instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described. | 03-20-2014 |
20140089644 | CIRCUIT AND METHOD FOR IDENTIFYING EXCEPTION CASES IN A FLOATING-POINT UNIT AND GRAPHICS PROCESSING UNIT EMPLOYING THE SAME - A floating-point unit and a method of identifying exception cases in a floating-point unit. In one embodiment, the floating-point unit includes: (1) a floating-point computation circuit having a normal path and an exception path and operable to execute an operation on an operand and (2) a decision circuit associated with the normal path and the exception path and configured to employ a flush-to-zero mode of the floating-point unit to determine which one of the normal path and the exception path is appropriate for carrying out the operation on the operand. | 03-27-2014 |
20140095842 | ACCELERATED INTERLANE VECTOR REDUCTION INSTRUCTIONS - A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane. | 04-03-2014 |
20140095843 | Systems, Apparatuses, and Methods for Performing Conflict Detection and Broadcasting Contents of a Register to Data Element Positions of Another Register - Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting. | 04-03-2014 |
20140136820 | Recycling Error Bits in Floating Point Units - A mechanism for recycling error bits in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprises a floating point unit (FPU) to generate a result value from applying an operation on floating point number inputs to the FPU and generate an error value using the result value. The FPU also writes the result value to a first register of the processing device dedicated to storing results from the operation of the FPU and writes the error value to a second register of the processing device dedicated to storing errors from the operation of the FPU. | 05-15-2014 |
20140149720 | FLOATING POINT EXECUTION UNIT FOR CALCULATING PACKED SUM OF ABSOLUTE DIFFERENCES - A method and circuit arrangement provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost. | 05-29-2014 |
20140181481 | DETECTION OF POTENTIAL NEED TO USE A LARGER DATA FORMAT IN PERFORMING FLOATING POINT OPERATIONS - Detection of whether a result of a floating point operation is safe. Characteristics of the result are examined to determine whether the result is safe or potentially unsafe, as defined by the user. An instruction is provided to facilitate detection of safe or potentially unsafe results. | 06-26-2014 |
20140189320 | Instruction for Determining Histograms - A processor is described having a functional unit of an instruction execution pipeline. The functional unit has comparison bank circuitry and adder circuitry. The comparison bank circuitry is to compare one or more elements of a first input vector against an element of a second input vector. The adder circuitry is coupled to the comparison bank circuitry to add the number of elements of the second input vector that match a value of the first input vector on an element by element basis of the first input vector. | 07-03-2014 |
20140189321 | INSTRUCTIONS AND LOGIC TO VECTORIZE CONDITIONAL LOOPS - Instructions and logic provide vectorization of conditional loops. A vector expand instruction has a parameter to specify a source vector, a parameter to specify a conditions mask register, and a destination parameter to specify a destination vector to hold n consecutive vector elements, each of the plurality of n consecutive vector elements having a same variable partition size of m bytes. In response to the processor instruction, data is copied from consecutive vector elements in the source vector, and expanded into unmasked vector elements of the specified destination vector, without copying data into masked vector elements of the destination vector, wherein n varies responsive to the processor instruction executed. The source vector may be a register and the destination vector may be in memory. Some embodiments store counts of the condition decisions. Alternative embodiments may store other data, such as target addresses, table offsets, or indicators of processing directives. | 07-03-2014 |
20140195783 | DOT PRODUCT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a dot product instruction. The dot product instruction indicates a first source packed data including at least four data elements, indicates a second source packed data including at least eight data elements, and indicates a destination storage location. A result packed data is stored in the destination storage location in response to the dot product instruction. The result includes a plurality of data elements that each includes a dot product result. Each of the dot product results includes a sum of products of the at least four data elements of the first source packed data with corresponding data elements in a different subset of at least four data elements of the second source packed data. Other methods, apparatus, systems, and instructions are disclosed. | 07-10-2014 |
20140208077 | VECTOR FLOATING POINT TEST DATA CLASS IMMEDIATE INSTRUCTION - A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value. | 07-24-2014 |
20140208078 | VECTOR CHECKSUM INSTRUCTION - A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand. | 07-24-2014 |
20140208079 | VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE INSTRUCTION - A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand. | 07-24-2014 |
20140237217 | VECTORIZATION IN AN OPTIMIZING COMPILER - An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only scalar instructions, the vectorization mechanism determines whether the vectorized code will likely result in processor stalls. If not, the vectorization mechanism substitutes the vectorized code for the code with only scalar instructions. When the vectorized code will likely result in processor stalls, the vectorization mechanism does not substitute the vectorized code, and the code with only scalar instructions remains in the computer program. | 08-21-2014 |
20140237218 | SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION FOR MULTI-PRECISION ARITHMETIC - A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register. | 08-21-2014 |
20140281419 | COMBINED FLOATING POINT MULTIPLIER ADDER WITH INTERMEDIATE ROUNDING LOGIC - An error handling method includes identifying a code region eligible for cumulative multiply add (CMA) optimization and translating code region instructions into interpreter code instructions, which may include translating sequences of multiply add instructions in the code region instructions into fusion code including CMA instructions. Floating point (FP) exceptions generated by the fusion code may be monitored and at least a portion of the code region instructions may be re-translated to eliminate some or all fusion code if CMA intermediate rounding exceptions exceed a threshold. | 09-18-2014 |
20140281420 | ADD-COMPARE-SELECT INSTRUCTION - An apparatus includes memory storing an instruction that identifies a first register, a second register, and a third register. Upon execution of the instruction by a processor, a vector addition operation is performed by the processor to add first values from the first register to second values from the second register. A vector subtraction operation is also performed upon execution of the instruction to subtract the second values from third values from the third register. A vector compare operation is also performed upon execution of the instruction to compare results of the vector addition operation to results of the vector subtraction operation. | 09-18-2014 |
20140281421 | ARBITRARY SIZE TABLE LOOKUP AND PERMUTES WITH CROSSBAR - An example method of updating an output data vector includes identifying a data value vector including element data values. The method also includes identifying an address value vector including a set of elements. The method further includes applying a conditional operator to each element of the set of elements in the address value vector. The method also includes, for each element data value in the data value vector, determining whether to update an output data vector based on applying the conditional operator. | 09-18-2014 |
20140289502 | ENHANCED VECTOR TRUE/FALSE PREDICATE-GENERATING INSTRUCTIONS - Systems, apparatuses and methods for utilizing enhanced vector true/false instructions. The enhanced vector true/false instructions generate enhanced predicates to correspond to the requested element width and/or vector size. A vector true instruction generates an enhanced predicate where all elements supported by the processing unit are active. A vector false instruction generates an enhanced predicate where all elements supported by the processing unit are inactive. The enhanced predicate specifies the requested element width in addition to designating the element selectors. | 09-25-2014 |
20140317387 | METHOD FOR PERFORMING DUAL DISPATCH OF BLOCKS AND HALF BLOCKS - A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprises two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit. | 10-23-2014 |
20140331031 | RECONFIGURABLE PROCESSOR HAVING CONSTANT STORAGE REGISTER - A reconfigurable processor configured to include a constant storage register to store a constant is provided, thereby improving efficiency in the use of a memory space. Specifically, a reconfigurable processor includes a plurality of Functional Units (FUs), a configuration memory configured to store configuration information, and a constant storage register configured to store a constant that is used as an operand for an operation in the plurality of FUs. | 11-06-2014 |
20140344555 | Scalable Partial Vectorization - A system, method and computer program product to compute latencies of a plurality of expression trees in a basic block and to select a first and a second expression tree from the plurality of expression trees based on the computed latencies. The first expression tree is isomorphic to the second expression tree and the first and second expression trees are selected in order of largest to smallest latency. This selection ensures that the largest isomorphic expression trees are vectorized first. By vectorizing the largest isomorphic expression trees first, a basic block containing hundreds of statements can be vectorized without significant compile time. Moreover, vectorization of the largest isomorphic expression trees results in a significant improvement in system performance on SIMD processors. | 11-20-2014 |
20140351564 | SIMPLIFICATION OF LARGE NETWORKS AND GRAPHS - Embodiments relate to simplifying large and complex networks and graphs using global connectivity information based on calculated node centralities. An aspect includes calculating node centralities of a graph until a designated number of central nodes are detected. A percentage of the central nodes are then selected as pivot nodes. The neighboring nodes to each of the pivot nodes are then collapsed until the graph shrinks to a predefined threshold of total nodes. Responsive to the number of total nodes reaching the predefined threshold, the simplified graph is outputted. | 11-27-2014 |
20140351565 | SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT INFLATE AND DEFLATE OPERATIONS - Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results. | 11-27-2014 |
20140351566 | MOVING AVERAGE PROCESSING IN PROCESSOR AND PROCESSOR - A processor, which executes m number of arithmetic operations in parallel, executes a partial sum instruction which takes the i-th to (i+m−1)-th elements of an input data series as input elements, so as to obtain first vector data, executes the partial sum instruction which takes the (i+x)-th to (i+x+m−1)-th elements of the input data series as the input elements, so as to obtain second vector data, and performs operations to subtract the p-th element of the first vector data and add the p-th element of the second vector data from and to a sum of the i-th to (i+x−1)-th elements of the input data series in parallel for each of the 0-th to (m−1)-th elements, so as to calculate sums of elements for m sections different from each other in parallel, and moving average processing to calculate a moving average from the sums of elements of the sections. | 11-27-2014 |
20150039866 | COMPUTER FOR AMDAHL-COMPLIANT ALGORITHMS LIKE MATRIX INVERSION - A family of computers is disclosed and claimed that supports simultaneous processes from the single core up to multi-chip Program Execution Systems (PES). The instruction processing of the instructed resources is local, dispensing with the need for large VLIW memories. The cores through the PES have maximum performance for Amdahl-compliant algorithms like matrix inversion, because the multiplications do not stall and the other circuitry keeps up. Cores with log based multiplication generators improve this performance by a factor of two for sine and cosine calculations in single precision floating point and have even greater performance for log | 02-05-2015 |
20150052335 | INTERPOLATION IMPLEMENTATION - Techniques are disclosed relating to floating-point operations in computer processors. In one embodiment, an apparatus includes a floating-point unit and circuitry configured to receive an initial value X for a floating-point operation. In this embodiment, X is between 0 and 1.0 inclusive. In this embodiment, the circuitry is configured to generate, based on an exponent of X, first and second floating-point values that sum to 1. In this embodiment, the floating-point unit is configured to perform an operation using the first and second floating-point values. The apparatus may reduce drift, in this embodiment, when a floating-point representation of X does not guarantee that the sum of X and (1−X) is 1. The apparatus may be configured to perform blending and/or interpolation operations using the first and second floating-point values. | 02-19-2015 |
20150052336 | SELECTIVELY CONTROLLING INSTRUCTION EXECUTION IN TRANSACTIONAL PROCESSING - Execution of instructions in a transactional environment is selectively controlled. A TRANSACTION BEGIN instruction initiates a transaction and includes controls that selectively indicate whether certain types of instructions are permitted to execute within the transaction. The controls include one or more of an allow access register modification control and an allow floating point operation control. | 02-19-2015 |
20150074383 | VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE INSTRUCTION - A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand. | 03-12-2015 |
20150089205 | CONVERT FROM ZONED FORMAT TO DECIMAL FLOATING POINT FORMAT - Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location. | 03-26-2015 |
20150089206 | CONVERT TO ZONED FORMAT FROM DECIMAL FLOATING POINT FORMAT - Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location. | 03-26-2015 |
20150095623 | VECTOR INDEXED MEMORY ACCESS PLUS ARITHMETIC AND/OR LOGICAL OPERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor including a decode unit to receive a vector indexed load plus arithmetic and/or logical (A/L) operation plus store instruction. The instruction is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices. The instruction is also to indicate a source packed data operand that is to have a plurality of packed data elements. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices, perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and store a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices. | 04-02-2015 |
20150095624 | VECTOR FLOATING POINT TEST DATA CLASS IMMEDIATE INSTRUCTION - A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value. | 04-02-2015 |
20150121043 | COMPUTER AND METHODS FOR SOLVING MATH FUNCTIONS - Computers and methods for performing mathematical functions are disclosed. An embodiment of a computer includes an operations level and a driver level. The operations level performs mathematical operations. The driver level includes a first lookup table and a second lookup table, wherein the first lookup table includes first data for calculating at least one mathematical function using a first level of accuracy. The second lookup table includes second data for calculating the at least one mathematical function using a second level of accuracy, wherein the first level of accuracy is greater than the second level of accuracy. A driver executes either the first data or the second data depending on a selected level of accuracy. | 04-30-2015 |
20150121044 | MERGED FLOATING POINT OPERATION USING A MODEBIT - A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output. | 04-30-2015 |
20150347141 | OPCODE COUNTING FOR PERFORMANCE MEASUREMENT - Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups. | 12-03-2015 |
20150370557 | FLOATING POINT EXECUTION UNIT FOR CALCULATING PACKED SUM OF ABSOLUTE DIFFERENCES - A method provides support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost. | 12-24-2015 |
20160054995 | SINGLE-INSTRUCTION MULTIPLE DATA PROCESSOR - In accordance with at least one embodiment, a processor system is disclosed having a SIMD processor device that has a plurality of subsidiary processing elements that are controlled to process multiple data concurrently. In accordance with at least one embodiment, the SIMD processor is a vector processor (VPU) having a plurality of vector Arithmetic Units (AUs) as subsidiary processing elements, and the VPU executes an instruction to transfer table information from a global memory of the VPU to a plurality of local memories accessible by each AU. The VPU also executes an instruction that results in each processing element performing a table lookup from a table stored at its local memory. In response to the instruction, this table lookup uses a portion of a lookup value to access information from the table, and uses another portion of the lookup information to calculate an interpolated resultant based upon the accessed information. | 02-25-2016 |
20160070573 | CONDITION CODE GENERATION - A condition code can depend upon a numerical output of a floating point operation for a processing pipeline. A classification can be determined for the floating point operation of a received instruction. In response to the classification and using condition determination logic, a value can be calculated for the condition code by inferring from data that is available from the processing pipeline before the numerical output is available. The value for the condition code can be provided to branch decision logic of the processing pipeline. | 03-10-2016 |
20160124746 | VECTOR OPERANDS WITH COMPONENT REPRESENTING DIFFERENT SIGNIFICANCE PORTIONS - A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector. | 05-05-2016 |
20160139918 | Performing Rounding Operations Responsive To An Instruction - In one embodiment, the present invention includes a method for receiving a rounding instruction and an immediate value in a processor, determining if a rounding mode override indicator of the immediate value is active, and if so executing a rounding operation on a source operand in a floating point unit of the processor responsive to the rounding instruction and according to a rounding mode set forth in the immediate operand. Other embodiments are described and claimed. | 05-19-2016 |
20160154647 | METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS | 06-02-2016 |
20160179524 | COMPILER METHOD FOR GENERATING INSTRUCTIONS FOR VECTOR OPERATIONS ON A MULTI-ENDIAN PROCESSOR | 06-23-2016 |
20160202973 | FLOATING POINT EXECUTION UNIT FOR CALCULATING PACKED SUM OF ABSOLUTE DIFFERENCES | 07-14-2016 |
20160202974 | FLOATING POINT EXECUTION UNIT FOR CALCULATING PACKED SUM OF ABSOLUTE DIFFERENCES | 07-14-2016 |
20170235573 | INFERENCE BASED CONDITION CODE GENERATION | 08-17-2017 |
20170235574 | INFERENCE BASED CONDITION CODE GENERATION | 08-17-2017 |
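
Illustrative note on entry 20140208078 (VECTOR CHECKSUM INSTRUCTION): the abstract describes adding elements one by one with end-around carry, where a carry out of a chosen position is added back into the sum. The following is only a minimal software sketch of that general idea, not the patented instruction. The function name `end_around_carry_sum`, the 32-bit element width, and the choice to fold the carry into the lowest bit position are all assumptions made for illustration.

```python
def end_around_carry_sum(elements, width=32):
    """Sum unsigned elements, folding any carry out of bit `width` back in.

    Hypothetical helper: the width and the position the carry is added to
    are assumptions, not details taken from the abstract.
    """
    mask = (1 << width) - 1
    total = 0
    for value in elements:
        # Add the next element, truncated to the element width.
        total += value & mask
        # A carry out of the top position is at most 1 here.
        carry = total >> width
        # End-around carry: add the carry back into the low position.
        total = (total & mask) + carry
    return total


if __name__ == "__main__":
    # 0xFFFFFFFF + 1 overflows 32 bits; the carry wraps around to give 1.
    print(hex(end_around_carry_sum([0xFFFFFFFF, 0x00000001])))
```

Folding the carry after every addition keeps the running total within the element width, which mirrors the abstract's "after each addition of an element" wording.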