Entries |
Document | Title | Date |
20080198167 | Computing system capable of parallelizing the operation of graphics processing units (GPUs) supported on an integrated graphics device (IGD) and one or more external graphics cards, employing a software-implemented multi-mode parallel graphics rendering subsystem - A computing system capable of parallelizing the operation of multiple graphics processing units (GPUs) supported on external graphics cards, employing a software-implemented multi-mode parallel graphics rendering subsystem. The computing system includes (i) CPU memory space for storing one or more graphics-based applications, (ii) one or more CPUs for executing the graphics-based applications, and (iii) a bridge circuit operably connecting one or more CPUs and the CPU memory space and including an integrated graphics device (IGD) having one or more GPUs. The computing system also includes (iv) one or more graphics cards supporting multiple GPUs and being connected to the bridge circuit by way of a data communication interface, (v) a multi-mode parallel graphics rendering subsystem supporting multiple modes of parallel operation, (vi) a plurality of graphic processing pipelines (GPPLs), implemented using the GPUs, and (vii) an automatic mode control module. In an illustrative embodiment, the IGD has one internal GPU, and the external graphics card(s) supports multiple GPUs. During the run-time of the graphics-based application, the automatic mode control module automatically controls the mode of parallel operation of the multi-mode parallel graphics rendering subsystem so that the GPUs are driven in a parallelized manner. | 08-21-2008 |
20080211816 | Multiple parallel processor computer graphics system - An accelerated graphics processing subsystem combines the processing power of multiple graphics processing units (GPUs) or video cards. Video processing by the multiple video cards is organized by time division such that each video card is responsible for video data processing during a different time period. For example, two video cards may take turns, with the first video card controlling a display for a certain time period and the second video card sequentially assuming video processing duties for a subsequent period. In this way, as one video card is managing the display in one time period, the second video card is processing video data for the next time period, thereby allowing extensive processing of the video data before the start of the next time period. The present invention may further incorporate load balancing such that the duration of the processing time periods for each of the video cards is dynamically modified to maximize composite video processing. | 09-04-2008 |
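The alternating scheme this abstract describes can be sketched in Python (the function name, tuple layout, and two-card default are illustrative assumptions, not taken from the patent):

```python
# Sketch of time-division multi-GPU video processing: each card owns the
# display for one time period while another card prepares the next one.

def assign_time_periods(num_periods, num_cards=2):
    """For each time period, return (period, displaying card, preparing card)."""
    schedule = []
    for period in range(num_periods):
        displaying = period % num_cards        # card driving the display now
        preparing = (period + 1) % num_cards   # card rendering the next frame
        schedule.append((period, displaying, preparing))
    return schedule
```

With two cards this yields the strict alternation described above; the dynamic load balancing mentioned at the end of the abstract would instead vary the length of each period rather than use this fixed round-robin assignment.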
20080211817 | Internet-based application profile database server system for updating graphic application profiles (GAPs) stored within the multi-mode parallel graphics rendering system of client machines running one or more graphic applications - An Internet-based application profile database server system for updating graphic application profiles (GAPs) stored within a behavioral profile database in a multi-mode parallel graphics rendering system (MMPGRS) embodied within a client machine running one or more graphic applications. Each graphics application has a graphic application profile (GAP), and each client machine has a behavior profile associated with the running of a particular graphics application on said client machine. The Internet-based application profile database server system includes an Internet server, interfaced with a RDBMS, for communicating with one or more client machines over a communication network, such as the Internet, and programming one or more GAPs in each client machine so that the client machine supports users with high graphics performance through adaptive multi-modal parallel graphics operation. | 09-04-2008 |
20080246772 | Multi-mode parallel graphics rendering system (MMPGRS) employing multiple graphics processing pipelines (GPPLs) and real-time performance data collection and analysis during the automatic control of the mode of parallel operation of said GPPLs - A multi-mode parallel graphics rendering system (MMPGRS) employing multiple graphics processing pipelines (GPPLs) and real-time performance data collection and analysis during the automatic control of the mode of parallel operation of the GPPLs. The MMPGRS supports multiple modes of parallel operation selected from the group consisting of object division, image division, and time division. The GPPLs support a parallel graphics rendering process that employs one or more of the object division, image division and/or time division modes of parallel operation in order to execute graphic commands and process graphics data, and render pixel-composited images containing graphics for display on a display device during the run-time of the graphics-based application. An automatic mode control module automatically controls the mode of parallel operation of the MMPGRS during the run-time of the graphics-based application by (i) automatically collecting performance data from at least one of the MMPGRS and the host computing system during the run-time of the graphics-based application, and (ii) automatically profiling the graphics-based application using the performance data and the analysis thereof. | 10-09-2008 |
20080297522 | IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - An image processing apparatus has a memory in which a plurality of image processing commands are stored, a dependent information producing unit which produces dependent information in each image data block becoming a target image processing, the dependent information indicating a dependency relationship between image processing of the image data block and another processing, a dependency relationship solving unit which makes a determination of a practicable image processing based on the dependent information, the dependency relationship solving unit writing an image processing command of the practicable image processing in the memory, and a plurality of image processing units which read an image processing command stored in the memory, the image processing units performing the image processing to the image data block based on the image processing command. | 12-04-2008 |
20080303833 | Asynchronous notifications for concurrent graphics operations - A method and an apparatus for notifying a display driver to update a display with a graphics frame including multiple graphics data rendered separately by multiple graphics processing units (GPUs) substantially concurrently are described. Graphics commands may be received to dispatch to each GPU for rendering corresponding graphics data. The display driver may be notified when each graphics data has been completely rendered respectively by the corresponding GPU. | 12-11-2008 |
20080316216 | Computing system capable of parallelizing the operation of multiple graphics processing pipelines (GPPLs) supported on a multi-core CPU chip, and employing a software-implemented multi-mode parallel graphics rendering subsystem - A computing system capable of parallelizing the operation of multiple graphics processing units (GPUs) supported on external graphics cards, employing a software-implemented multi-mode parallel graphics rendering subsystem. The computing system includes (i) CPU memory space for storing one or more graphics-based applications, (ii) a multi-core CPU chip including one or more CPU-cores, a memory controller for controlling the CPU memory space, and an interconnect network, and (iii) one or more external graphics cards supporting multiple GPUs and being connected to the multi-core CPU chip by way of a data communication interface. The computing system also includes (iv) one or more graphics cards supporting multiple GPUs and being connected to the multi-core CPU chip by way of a data communication interface, (v) the multi-mode parallel graphics rendering subsystem supporting multiple modes of parallel operation, (vi) a plurality of graphic processing pipelines (GPPLs) implemented using some of the CPU-cores, and (vii) an automatic mode control module. During the run-time of the graphics-based application, the automatic mode control module automatically controls the mode of parallel operation of the MMPGRS, so that the GPUs are driven in a parallelized manner. | 12-25-2008 |
20090027402 | Method of controlling the mode of parallel operation of a multi-mode parallel graphics processing system (MMPGPS) embodied within a host computing system - A method of controlling the mode of parallel operation of a multi-mode parallel graphics processing system (MMPGPS) embodied within a host computing system having (i) host memory space (HMS) for storing one or more graphics-based applications and a graphics library for generating graphics commands and data (GCAD) during the run-time (i.e. execution) of the graphics-based application, (ii) one or more CPUs for executing said graphics-based applications, (iii) a display device for displaying images containing graphics during the execution of said graphics-based applications, and (iv) a multi-mode parallel graphics rendering subsystem supporting multiple modes of parallel operation selected from the group consisting of object division, image division, and time division and having a plurality of graphic processing pipelines (GPPLs) supporting a parallel graphics rendering process that employs one of the object division, image division and/or time division modes of parallel operation. | 01-29-2009 |
20090027403 | GRAPHIC DATA PROCESSING APPARATUS AND METHOD - The present invention relates to an apparatus and method for processing graphic data. According to an embodiment, the graphic data processing apparatus includes a CPU having at least one core; a GPU configured to process graphic data; a usage level checking unit configured to check a usage level of the CPU and/or a usage level of the GPU; and a control unit configured to compare the checked usage level of the CPU with a usage level reference of the CPU and/or to compare the checked usage level of the GPU with a usage level reference of the GPU, to allow the graphic data to be processed in parallel by the CPU and the GPU or only by the GPU according to the comparison results. | 01-29-2009 |
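The comparison logic in the abstract above can be sketched minimally in Python (the threshold values and mode names are assumptions for illustration; the patent does not specify them):

```python
# Assumed usage-level references; the patent leaves these unspecified.
CPU_USAGE_REF = 0.80
GPU_USAGE_REF = 0.90

def choose_processing_mode(cpu_usage, gpu_usage):
    """Process graphic data on CPU and GPU in parallel only when the GPU
    has reached its usage reference while the CPU still has headroom;
    otherwise leave the work to the GPU alone."""
    if gpu_usage >= GPU_USAGE_REF and cpu_usage < CPU_USAGE_REF:
        return "cpu+gpu"
    return "gpu-only"
```

The control unit described in the abstract would run such a comparison against the checked usage levels before dispatching graphic data.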
20090066706 | Image Processing System - The present multi-processor system performs information processing efficiently. The system can receive, reproduce and record a variety of image contents. By including a powerful CPU among the multiple processors, a plurality of pieces of large image data, such as high-definition image data, can be processed simultaneously in parallel, which was conventionally difficult. Since task processing, such as demodulation processing, is assigned in view of the remaining processing capacity of each of the plurality of processors, the system can reproduce contents efficiently. By sharing roles, a plurality of different contents, such as image and voice, can be processed simultaneously and can be displayed or reproduced at a desired timing. | 03-12-2009 |
20090128570 | Method And System For Automatically Analyzing GPU Test Results - A method and system for automatically analyzing graphics processing unit (“GPU”) test results are disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of identifying the GPU test results associated with a first register type, creating a template document associated with the same first register type, wherein the template document is pre-configured to store and operate on the GPU test results of the first register type, filling the GPU test results in the template document, aggregating the GPU test results associated with the first register type to establish a common output, and determining a suitable register value from a passing range of register values based on the common output without human intervention. | 05-21-2009 |
20090147013 | PROCESSOR TASK AND DATA MANAGEMENT - Task and data management systems, methods, and apparatus are disclosed. A processor event that requires more memory space than is available in a local storage of a co-processor is divided into two or more segments. Each segment has a segment size that is less than or the same as an amount of memory space available in the local storage. The segments are processed with one or more co-processors to produce two or more corresponding outputs. | 06-11-2009 |
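The segmentation step can be sketched directly (the names and the greedy chunking policy are illustrative assumptions):

```python
def split_event(event_size, local_store_size):
    """Divide a processor event needing more memory than the co-processor's
    local storage into segments that each fit the available space."""
    if local_store_size <= 0:
        raise ValueError("local storage size must be positive")
    segments, remaining = [], event_size
    while remaining > 0:
        chunk = min(remaining, local_store_size)  # largest segment that fits
        segments.append(chunk)
        remaining -= chunk
    return segments
```

Each resulting segment can then be dispatched to a co-processor, matching the abstract's requirement that every segment size be less than or equal to the available local storage.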
20090207179 | PARALLEL PROCESSING METHOD FOR SYNTHESIZING AN IMAGE WITH MULTI-VIEW IMAGES - A parallel processing method for synthesizing multi-view images is provided, which may parallel process at least a portion of the following steps. First, multiple reference images are input, wherein each reference image is correspondingly taken from a reference viewing angle. Next, an intended synthesized image corresponding to a viewpoint and an intended viewing angle is determined. Next, the intended synthesized image is divided to obtain multiple meshes and multiple vertices of the meshes, wherein the vertices are divided into several vertex groups, and each vertex and the viewpoint form a view direction. Next, the view direction is referenced to find several near-by images from the reference images for synthesizing an image of a novel viewing angle. After the foregoing actions are totally or partially processed according to the parallel processing mechanism, separate results are combined for use in a next processing stage. | 08-20-2009 |
20090207180 | FPD for AIRCRAFT - A method and apparatus are provided for displaying information on a flat panel display. The method includes the step of providing a plurality of serial data sources, where each serial data source provides pixel data for display within a corresponding respective portion of the display where the respective corresponding portions are each discrete, incorporate a plurality of horizontal and vertical lines of pixel data and are non-overlapping. The method further includes the steps of reformatting and combining the data from the plurality of data sources into a single parallel data stream and displaying the reformatted data on the liquid crystal display. | 08-20-2009 |
20090284535 | SOFTWARE RASTERIZATION OPTIMIZATION - Systems, methods, and computer-readable media for optimizing emulated fixed-function and programmable graphics operations are provided. Data comprising fixed function and programmable states for an image or scenario to be rendered is received. The data for the image is translated into operations. One or more optimizations are applied to the operations. The optimized operations are implemented to render the scenario. | 11-19-2009 |
20090289945 | CENTRALIZED STREAMING GAME SERVER - Exemplary embodiments include an interception mechanism for rendering commands generated by interactive applications, and a feed-forward control mechanism based on the processing of the commands on a rendering engine, on a pre-filtering module, and on a visual encoder. Also a feed-back control mechanism from the encoder is described. The mechanism is compression-quality optimized subject to some constraints on streaming bandwidth and system delay. The mechanisms allow controllable levels of detail for different rendered objects, controllable post filtering of rendered images, and controllable compression quality of each object in compressed images. A mechanism for processing and streaming of multiple interactive applications in a centralized streaming application server is also described. | 11-26-2009 |
20100007668 | Systems and methods for providing scalable parallel graphics rendering capability for information handling systems - Systems and methods for providing scalability of multiple graphic processor units (GPUs) that work together in a multi-coprocessor fashion to provide parallel graphics rendering methodology for an information handling system. The total number of active GPUs working together to provide parallel graphics rendering methodology for a given information handling system may be increased in a modular manner beyond one or two GPUs, e.g., so as to allow as many GPUs as desired to be attached to a given information handling system such as a desktop computer or notebook computer. | 01-14-2010 |
20100066748 | Method And Apparatus For Scheduling The Processing Of Multimedia Data In Parallel Processing Systems - An efficient method and device for the parallel processing of multimedia data. Blocks (or portions thereof) are transmitted to various parallel processors, in the order of their dependency data. Earlier blocks are sent to the parallel processors first, with later blocks sent later. The blocks are stored in the parallel processors in specific locations, and shifted around as necessary, so that every block, when it is processed, has its dependency data located in a specific set of earlier blocks with specified relative positions. In this manner, its dependency data can be retrieved with the same commands. That is, earlier blocks are shifted around so that later blocks can be processed with a single set of commands that instructs each processor to retrieve its dependency data from specific known relative locations that do not vary. | 03-18-2010 |
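One way to read the fixed-relative-position scheme above is as a shifting window of earlier blocks on each processor; this Python sketch is an interpretation, not the patent's implementation, and all names are hypothetical:

```python
# Each processor keeps recent earlier blocks in fixed relative slots, so a
# block's dependency data can always be fetched with the same offsets.

class ProcessorWindow:
    def __init__(self, depth=3):
        self.slots = [None] * depth   # slot 0 = most recently received block

    def push(self, block):
        # Shift earlier blocks down one slot; relative positions stay constant.
        self.slots = [block] + self.slots[:-1]

    def dependency(self, relative_pos):
        """Fetch an earlier block by its fixed relative position."""
        return self.slots[relative_pos]
```

Because every block's dependencies sit at known offsets, the same retrieval commands work for all blocks, which is the property the abstract emphasizes.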
20100141665 | System and Method for Photorealistic Imaging Workload Distribution - A graphics client receives a frame, the frame comprising scene model data. A server load balancing factor is set based on the scene model data. A prospective rendering factor is set based on the scene model data. The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor. The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers. A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image. | 06-10-2010 |
20100149193 | Method And System For Enabling Managed Code-Based Application Program To Access Graphics Processing Unit - One embodiment of the present invention sets forth a method for enabling an intermediate code-based application program to access a target graphics processing unit (GPU) in a parallel processing environment. The method includes the steps of compiling a source code of the intermediate code-based application program to an intermediate code, translating the intermediate code to a PTX instruction code, and translating the PTX instruction code to a machine code executable by the target graphics processing unit before delivering the machine code to the target GPU. | 06-17-2010 |
20100149194 | Method And System For Enabling Managed Code-Based Application Program To Access Graphics Processing Unit - One embodiment of the present invention sets forth a method for enabling an intermediate code-based application program to access a target graphics processing unit (GPU) in a parallel processing environment. The method includes the steps of compiling a source code of the intermediate code-based application program to an intermediate code, translating the intermediate code to a PTX instruction code, and translating the PTX instruction code to a machine code executable by the target graphics processing unit before delivering the machine code to the target GPU. | 06-17-2010 |
20100149195 | LOAD BALANCING IN MULTIPLE PROCESSOR RENDERING SYSTEMS - Methods and systems for allocating workloads in a pixel sequential rendering system comprising a plurality of processors are disclosed. Such workloads typically comprise a raster pixel image comprising a plurality of graphical objects. For each scan line ( | 06-17-2010 |
20100164964 | DISPLAY SYSTEM WITH IMPROVED GRAPHICS ABILITIES WHILE SWITCHING GRAPHICS PROCESSING UNITS - Methods and apparatuses are disclosed for improving graphics abilities while switching between graphics processing units (GPUs). Some embodiments may include a display system, including a plurality of graphics processing units (GPUs) and a memory buffer coupled to the GPUs via a timing controller, where the memory buffer stores data associated with a first video frame from a first GPU within the plurality of GPUs and where the timing controller is switching between the first GPU and a second GPU within the plurality. | 07-01-2010 |
20100271375 | ADAPTIVE LOAD BALANCING IN A MULTI PROCESSOR GRAPHICS PROCESSING SYSTEM - Systems and methods for balancing a load among multiple graphics processors that perform different portions of a rendering task. A rendering task is partitioned into portions for each of two (or more) graphics processors. The graphics processors perform their respective portions of the rendering task and return feedback data indicating completion of the assigned portion. Based on the feedback data, an imbalance can be detected between respective loads of two of the graphics processors. In the event that an imbalance exists, the rendering task is re-partitioned to increase the portion assigned to the less heavily loaded processor and to decrease the portion assigned to the more heavily loaded processor. | 10-28-2010 |
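The feedback loop can be sketched for the two-processor case (the step size, clamping bounds, and the use of frame times as the feedback data are assumptions for illustration):

```python
def rebalance_split(split, time_a, time_b, step=0.05, lo=0.1, hi=0.9):
    """`split` is the fraction of the rendering task assigned to GPU A.
    Shrink the share of whichever GPU reported the longer frame time."""
    if time_a > time_b:
        split -= step   # GPU A is more heavily loaded: give it less work
    elif time_b > time_a:
        split += step   # GPU B is more heavily loaded: give A more work
    return max(lo, min(hi, split))   # keep both portions non-trivial
```

Called once per frame with the feedback data each GPU returns, this nudges the two loads toward balance, as described above.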
20110057937 | METHOD AND SYSTEM FOR BLOCKING DATA ON A GPU - A method is provided for optimizing computer processes executing on a graphics processing unit (GPU) and a central processing unit (CPU). Process data is subdivided into sequentially processed data and parallel processed data. The parallel processed data is subdivided into a plurality of data blocks assigned to a plurality of processing cores of the GPU. The data blocks on the GPU are processed with other data blocks in parallel on the plurality of processing cores. Sequentially processed data is processed on the CPU. Result data processed on the CPU is returned. | 03-10-2011 |
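The subdivision of the parallel-processed data into per-core blocks might look like the following (an even partition with the remainder spread over the first cores; the patent does not mandate a particular partitioning):

```python
def block_partition(data, num_cores):
    """Split the parallel-processed data into one block per GPU core,
    distributing any remainder across the leading blocks."""
    base, extra = divmod(len(data), num_cores)
    blocks, start = [], 0
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks
```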
20110057938 | Variable Frequency Output To One Or More Buffers - A system and method are presented by which data on a graphics processing unit (GPU) can be output to one or more buffers with independent output frequencies. In one embodiment, a GPU includes a shader processor configured to respectively emit a plurality of data sets into a plurality of streams in parallel. Each data set is emitted into at least a portion of its respective stream. Also included is a first number of counters configured to respectively track the emitted data sets. | 03-10-2011 |
20110063308 | DISPLAY SYSTEM WITH FRAME REUSE USING DIVIDED MULTI-CONNECTOR ELEMENT DIFFERENTIAL BUS CONNECTOR - A method includes reducing power of a first graphics processor by disabling or not using its rendering engine and leaving a display engine of the same first graphics processor capable of outputting display frames from a corresponding first frame buffer to a display. A display frame is rendered by a second graphics processor while the rendering engine of the first graphics processor is in a reduced power state, such as a non-rendering state. The rendered frame is stored in a corresponding second frame buffer of the second graphics processor, such as a local frame buffer, and copied from the second frame buffer to the first frame buffer. The copied frame in the first frame buffer is then displayed on a display while the rendering engine of the first graphics processor is in the reduced power state. Accordingly, thermal output and power output are reduced with respect to the first graphics processor, since it does not perform frame generation using its rendering engine; it only uses its display engine to display frames generated by the second graphics processor. | 03-17-2011 |
20110109636 | Method and System for Communicating with External Device Through Processing Unit in Graphics System - The present invention sets forth a method and system for communicating with an external device through a processing unit in a graphics system of a computing device. In one embodiment, the method comprises allocating a first set of memory buffers having a first memory buffer and a second memory buffer in the graphics system based on an identification information of the external device, and invoking a first thread processor of the processing unit of the graphics system to perform services associated with a physical layer according to the identification information of the external device by storing a first data stream received from the external device through an I/O interface of the processing unit of the graphics system in the first memory buffer and retrieving a second data stream from the second memory buffer for transmission to the external device through the I/O interface. | 05-12-2011 |
20110141121 | Parallel Processing for Distance Transforms - Parallel processing for distance transforms is described. In an embodiment a raster scan algorithm is used to compute a distance transform such that each image element of a distance image is assigned a distance value. This distance value is a shortest distance from the image element to the seed region. In an embodiment two threads execute in parallel with a first thread carrying out a forward raster scan over the distance image and a second thread carrying out a backward raster scan over the image. In an example, a thread pauses when a cross-over condition is met until the other thread meets the condition after which both threads continue. In embodiments distances may be computed in Euclidean space or along geodesics defined on a surface. In an example, four threads execute two passes in parallel with each thread carrying out a raster scan over a different quarter of the image. | 06-16-2011 |
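The underlying sequential algorithm being parallelized above is the classic two-pass raster-scan distance transform; here is a city-block-metric sketch (single-threaded, so the cross-over synchronization between the two threads is not shown):

```python
INF = float("inf")

def distance_transform(seed, width, height):
    """seed: set of (x, y) seed cells. Returns a grid d with d[y][x] equal to
    the shortest city-block distance from each cell to the seed region."""
    d = [[0 if (x, y) in seed else INF for x in range(width)]
         for y in range(height)]
    # Forward raster scan: propagate distances from left and top neighbours.
    for y in range(height):
        for x in range(width):
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
    # Backward raster scan: propagate from right and bottom neighbours.
    for y in range(height - 1, -1, -1):
        for x in range(width - 1, -1, -1):
            if x < width - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
            if y < height - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
    return d
```

In the embodiment described above, one thread runs the forward scan while a second runs the backward scan concurrently, pausing at a cross-over condition so the two passes do not race.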
20110141122 | DISTRIBUTED STREAM OUTPUT IN A PARALLEL PROCESSING UNIT - A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit. | 06-16-2011 |
20110157192 | Parallel Block Compression With a GPU - Disclosed is a system and method for determining, in parallel on a graphics processing unit, a block compression case which results in a least error to a block. Once determined, the block compression case may be used to compress the block. | 06-30-2011 |
20110157193 | LOAD BALANCING IN A SYSTEM WITH MULTI-GRAPHICS PROCESSORS AND MULTI-DISPLAY SYSTEMS - In typical embodiments a three GPU configuration is provided comprising three discrete video cards, each connected to a standard monitor placed horizontally for a 3× horizontal resolution. In this configuration, depending on the load on each GPU, the vertical split lines are dynamically adjusted. To adjust the load balancing according to these virtual split lines, the rendering clip rectangle of each GPU is adjusted, in order to reduce the number of pixels rendered by the heavily loaded GPU. These split lines define the boundary of the scene to be rendered by each GPU, and, according to some embodiments, may be moved horizontally. Thus, for example, if a GPU has a more complex rendering clip polygon to render than the other GPUs, the neighboring GPUs may render the rendering clip polygon each displays plus a portion of the rendering clip polygon to be displayed by the heavily loaded GPU. The assisting GPUs transmit to the heavily loaded GPU, via the chipset with a peer-to-peer protocol or through a communication bus, the portion of the rendering clip polygon to be displayed by that GPU. The split line is dynamically adjusted after each scene. | 06-30-2011 |
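Deriving each GPU's rendering clip rectangle from the two movable vertical split lines can be sketched as follows (the function name and coordinate convention are illustrative assumptions):

```python
def clip_rects(split1, split2, total_width, height):
    """Return (x, y, width, height) clip rectangles for three GPUs whose
    shares of the composite desktop are bounded by two vertical split lines."""
    assert 0 < split1 < split2 < total_width
    return [
        (0,      0, split1,               height),  # left GPU
        (split1, 0, split2 - split1,      height),  # middle GPU
        (split2, 0, total_width - split2, height),  # right GPU
    ]
```

Moving `split1` or `split2` after each scene shrinks the pixel count of the heavily loaded GPU and grows those of its neighbours, which is the dynamic adjustment described above.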
20110169840 | COMPUTING SYSTEM EMPLOYING A MULTI-GPU GRAPHICS PROCESSING AND DISPLAY SUBSYSTEM SUPPORTING SINGLE-GPU NON-PARALLEL (MULTI-THREADING) AND MULTI-GPU APPLICATION-DIVISION PARALLEL MODES OF GRAPHICS PROCESSING OPERATION - A computing system employing a multi-GPU graphics processing and display subsystem supporting single-GPU non-parallel (i.e. multi-tasking) and multi-GPU parallel application-division modes of graphics processing operations, in order to execute graphic commands and process graphics data (GCAD) render pixel-composited images containing graphics for display on a display device during the run-time of the multiple graphics-based applications, while managing and conserving electrical power and graphics processing resources. An automatic mode control module (AMCM) analyzes the application profiles assigned to graphics applications running on the computing system, and automatically controls the mode of operation of the multi-GPU graphics processing and display subsystem during the run-time of the multiple graphics-based applications. | 07-14-2011 |
20110242113 | Method And System For Processing Pixels Utilizing Scoreboarding - In a graphics processing device, a plurality of processors write fragment shading results for order-dependent data to a buffer, according to the order in which the data is received. Fragment shading results for order-neutral data are written to the buffer one batch at a time. The order-dependent data comprises spatially overlapping data. Order-neutral data may not overlap. A scheduler controls the order of reception of one batch of data at a time by the processors. The order for receiving the order-dependent data may be determined. The plurality of processors may process the data in parallel. A writing order for writing results to a buffer from the processing in parallel may be enforced. A portion of the processors may be instructed to wait before writing results to the buffer in a specified order. Processors signal when writing results to the buffer is complete. | 10-06-2011 |
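The in-order write-back for order-dependent data can be modeled with a small scoreboard (an interpretation of the mechanism; the names are not from the patent):

```python
class Scoreboard:
    """Release fragment-shading results to the buffer in receive order,
    even though processors may finish shading out of order."""

    def __init__(self):
        self.next_to_write = 0
        self.pending = {}   # sequence number -> finished, unwritten result
        self.buffer = []

    def complete(self, seq, result):
        """A processor signals that fragment `seq` has finished shading."""
        self.pending[seq] = result
        # Drain every result now permitted to write, in order.
        while self.next_to_write in self.pending:
            self.buffer.append(self.pending.pop(self.next_to_write))
            self.next_to_write += 1
```

A processor whose result arrives early simply waits in `pending`, mirroring the abstract's instruction that some processors wait before writing in the specified order.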
20110242114 | METHOD AND SYSTEM FOR MINIMIZING AN AMOUNT OF DATA NEEDED TO TEST DATA AGAINST SUBAREA BOUNDARIES IN SPATIALLY COMPOSITED DIGITAL VIDEO - A method and system for minimizing an amount of data needed to test data against subarea boundaries in spatially composited digital video. Spatial compositing uses a graphics unit or pipeline to render a portion (subarea) of each overall frame of digital video images. This reduces the amount of data that each processor must act on and increases the rate at which an overall frame is rendered. Optimization of spatial compositing depends on balancing the processing load among the different pipelines. The processing load typically is a direct function of the size of a given subarea and a function of the rendering complexity for objects within this subarea. Load balancing strives to measure these variables and adjust, from frame to frame, the number, sizes, and positions of the subareas. The cost of this approach is the necessity to communicate, in conjunction with each frame, the graphics data that will be rendered. Graphics data for a frame is composed of geometry chunks. Each geometry chunk is defined by its own bounding region, where the bounding region defines the space the geometry chunk occupies on the compositing window. Only the parameters that define the bounding region are communicated to each graphics unit in conjunction with the determination of which graphics unit will render the geometry chunk defined by the bounding region. The actual graphics data that comprises the geometry chunk is communicated only to those geometry units that will actually render the geometry chunk. This reduces the amount of data needed to communicate graphics data information in spatially composited digital video. | 10-06-2011 |
20110273459 | DEVICE FOR THE PARALLEL PROCESSING OF A DATA STREAM - A device for processing a data stream originating from a device generating matrices of N | 11-10-2011 |
20110273460 | SYSTEM CO-PROCESSOR - Embodiments of the invention provide for assigning two different class identifiers to a device so that it can be loaded by an operating system as different devices. The device may be a graphics device. The graphics device may be integrated in various configurations, including but not limited to a central processing unit or chipset. The processor or chipset may be associated with a first device identifier associated with a graphics processor and a second device identifier that enables the processor or chipset as a co-processor. | 11-10-2011 |
20110279462 | METHOD OF AND SUBSYSTEM FOR GRAPHICS PROCESSING IN A PC-LEVEL COMPUTING SYSTEM - A graphics processing subsystem for use in a computing system, including a plurality of GPUs operating according to time division mode of graphics parallelization. At least one of the GPUs is a display-designated GPU that is connectable to a screen for displaying images produced by the graphics processing subsystem, and at least one of the GPUs is a non-display-designated GPU. The subsystem includes a hardware hub having a router, and being located between a CPU of the computing system and the plurality of GPUs. For images to be generated and displayed on the screen, the router directs to the plurality of GPUs successively a stream of geometric data and graphics commands. The geometric data and graphics commands directed to a non-display-designated GPU are processed by the GPU into image pixel data associated with a frame, the image pixel data is then redirected to the router, the image pixel data is then redirected to the display-designated GPU, and the image pixel data is then displayed on the screen. Geometric data and graphics commands directed to the display-designated GPU are processed by the GPU into image pixel data associated with a frame, and the image pixel data is then displayed on the screen. | 11-17-2011 |
20110292056 | PROGRAMMING AND MULTIPROCESSING ENVIRONMENT FOR COMPUTERIZED RECOGNITION - Embodiments of the present invention are directed to techniques for providing an environment for the efficient execution of recognition tasks. A novel environment is provided which automatically and efficiently executes a recognition program on as many computer processors as available. This program, deconstructed into separate tasks, may be executed by constructing a dependency network from known inputs and outputs of the tasks, applying project planning methods for scheduling these tasks into multiple processing threads, and dynamically assigning tasks within these threads to processors. Therefore, an efficient schedule of tasks to complete a recognition program can be created and executed automatically, for any type of recognition problem. The system will not only allow for the ability to leverage multiple processors for efficiently generating variable and customizable automatically created schedules, but will also still maintain the flexibility to use serial programming in recognition algorithms for individual objects, properties, or features. | 12-01-2011 |
20110316863 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD - A memory section provides an input buffer capable of holding image data being a processing target of each processing by an image processing unit, and an output buffer capable of holding image data being a processing result. Through an input section, a user selects a plurality of kinds of processing to be executed by the image processing unit, and an execution sequence of the plurality of kinds of processing. A controller section reserves, based on information selected by a user through the input section, an input buffer and an output buffer for each processing in the memory section, sets an input-output connection relation between the buffers, and notifies, based on the set connection relation, the image processing unit of address information of the input buffer in the memory section and the output buffer for each processing sequentially executed by the image processing unit. | 12-29-2011 |
20120019541 | Multi-Primitive System - Disclosed herein is a vertex core. The vertex core includes a grouper module configured to process two or more primitives during one clock period and two or more vertex translators configured to respectively receive the two or more processed primitives in parallel. | 01-26-2012 |
20120056892 | COMPUTER-AIDED PARALLELIZING OF COMPUTATION GRAPHS - An approach to automatically specifying, or assisting with the specification of, a parallel computation graph involves determining data processing characteristics of the linking elements that couple data processing elements of the graph. The characteristics of the linking elements are determined according to the characteristics of the upstream and/or downstream data processing elements associated with the linking element, for example, to enable computation by the parallel computation graph that is equivalent to computation of an associated serial graph. | 03-08-2012 |
20120092351 | FACILITATING ATOMIC SWITCHING OF GRAPHICS-PROCESSING UNITS - The disclosed embodiments provide a system that configures a computer system to switch between two graphics-processing units (GPUs). During operation, the system receives a request to switch from using a first GPU to using a second GPU to drive the display. In response to this request, the system executes a user thread that copies pixel values from a first framebuffer for the first GPU to a second framebuffer for the second GPU. Next, the user thread initiates a switch from the first framebuffer to the second framebuffer as a signal source for driving the display. Finally, the user thread sends an asynchronous notification of the switch to one or more applications, wherein the asynchronous notification allows the applications to transition from rendering graphics using the first GPU to rendering graphics using the second GPU. | 04-19-2012 |
20120127182 | PARALLEL PROCESSING OF PIXEL DATA - One or more techniques and/or systems are disclosed for processing vector-based information for an image. From a set of pixels that comprises the image, a first subset of one or more pixels that are used in a raster representation of an element in the image, such as pixel values used to render the image, is identified. A first operation is performed in parallel for the respective one or more pixels in the first subset, such as by evaluating a batched first subset of pixels using stacked instructions for the first operation. The first operation comprises instructions for at least a first portion of a function for generating an image pixel value used to represent the element in the image. | 05-24-2012 |
20120147016 | IMAGE PROCESSING DEVICE AND IMAGE PROCESSING METHOD - Disclosed are an image processing device and an image processing method which achieve an increase in the speed of image processing by designating and operating a plurality of image processing units, each corresponding to a specific function for the image processing, in accordance with a program. | 06-14-2012 |
20120162235 | EXECUTION OF REAL TIME APPLICATIONS WITH AN AUTOMATION CONTROLLER - A method and system are provided for performing the computational execution of automation tasks with automation devices by combining one or more central processing units (CPU) and one or more Graphics Processing Units (GPU). The control tasks and/or control algorithms are executed by the single-core or multi-core control unit (CPU) and a multi-core-graphics processor (GPU) or both in parallel at the same time. | 06-28-2012 |
20120194528 | Method and System for Context Switching - Embodiments of the present invention provide a method of preempting a task. The method includes removing the task from the parallel processors via a scheduling mechanism. Responsive to the removing, the method also includes ceasing (i) retrieval of commands from a buffer associated with the task, (ii) dispatch of groups of work-items associated with the task, (iii) dispatch of wavefronts associated with the task, and (iv) execution of the wavefronts. State information related to the task is saved. | 08-02-2012 |
20120200580 | SYNCHRONOUS PARALLEL PIXEL PROCESSING FOR SCALABLE COLOR REPRODUCTION SYSTEMS - What is disclosed is a novel system and method for parallel processing of intra-image data in a distributed computing environment. A generic architecture and method are presented which collectively facilitate image segmentation and block sorting and merging operations with a certain level of synchronization in a parallel image processing environment which has been traditionally difficult to parallelize. The present system and method enables pixel-level processing at higher speeds thus making it a viable service for a print/copy job document reproduction environment. The teachings hereof have been simulated on a cloud-based computing environment with a demonstrable increase of ≈2× with nominal 8-way parallelism, and an increase of ≈20×-100× on a graphics processor. In addition to production and office scenarios where intra-image processing is likely to be performed, these teachings are applicable to other domains where high-speed video and audio processing are desirable. | 08-09-2012 |
20120249560 | PARALLEL COMPUTATION OF MATRIX PROBLEMS - In order to perform computation concerning a large sparse matrix of values, a computer stores in its memory the nonzero values of each row and as many null or preferably zero values as are required to make up a predetermined number of stored values for each row. Associated column indices are also stored. Storage in this format can be carried out by massively parallel processing using a graphics processing unit. The format is acceptable input for programs written to expect input in conventional compressed sparse row format yet avoids the constraints which enforce serial processing in order to store in that conventional format. | 10-04-2012 |
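The padded-row storage scheme this entry describes (nonzeros per row padded out to a fixed width, with parallel column indices) is commonly known as the ELLPACK format. The sketch below is an illustrative reconstruction, not code from the patent; the zero-padding and repeated-last-column conventions are assumptions.

```python
# Sketch of an ELLPACK-style padded sparse-row format: every row stores
# the same number of (value, column) slots, so each GPU thread can run a
# fixed-length loop. Names and conventions here are illustrative.

def pad_sparse_rows(rows, width=None):
    """rows: list of lists of (col, value) pairs, one list per matrix row.
    Returns parallel lists of values and column indices, with short rows
    padded by zero values and a repeated valid column index."""
    if width is None:
        width = max((len(r) for r in rows), default=0)
    values, cols = [], []
    for r in rows:
        row_vals = [v for _, v in r] + [0.0] * (width - len(r))
        last_col = r[-1][0] if r else 0   # any valid index works for padding
        row_cols = [c for c, _ in r] + [last_col] * (width - len(r))
        values.append(row_vals)
        cols.append(row_cols)
    return values, cols

def spmv(values, cols, x):
    """Matrix-vector product over the padded format; the padded zero
    values contribute nothing to each row's sum."""
    return [sum(v * x[c] for v, c in zip(rv, rc))
            for rv, rc in zip(values, cols)]
```

Because every row has the same stored width, the format trades a little extra storage for fully regular, branch-free per-row work, which is what makes it amenable to massively parallel processing.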
20120262465 | Parallel Image Processing System - System and method for a parallel image processing mechanism for applying mask data patterns to a substrate in a lithography manufacturing process are disclosed. In one embodiment, the parallel image processing system includes a graphics engine configured to partition an object into a plurality of trapezoids and form an edge list for representing each of the plurality of trapezoids, and a distributor configured to receive the edge list from the graphics engine and distribute the edge list to a plurality of scan line image processing units. The system further includes a sentinel configured to synchronize operations of the plurality of scan line image processing units, and a plurality of buffers configured to store image data from corresponding scan line image processing units and output the stored image data using the sentinel. | 10-18-2012 |
20120268469 | Parallel Entropy Encoding On GPU - An invention is disclosed for performing entropy encoding in a parallelized manner, using a GPU. In embodiments, an input sequence of integers is received, and run-length encoding is performed on any runs of zeros in parallel operations on the GPU. Then, a plurality of parallelized operations are performed on the run-length encoded sequence to entropy encode the sequence. The value N may be entropy encoded using only N and the value that precedes it in the sequence, N−1, so the encoding may be sub-divided into multiple operations that may be performed in parallel on the GPU. After entropy encoding is performed, a bitstream may be produced using parallelized operations on the GPU. | 10-25-2012 |
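The zero-run-length step this entry describes can be sketched serially as below. On a GPU the step parallelizes because each element can decide independently (by comparing with its neighbor) whether it starts or ends a run, with prefix sums computing run lengths; the encoding convention here, a run of k zeros emitted as the pair (0, k), is an illustrative assumption, not taken from the patent.

```python
# Serial sketch of run-length encoding restricted to zero runs:
# nonzero values pass through unchanged; each maximal run of zeros
# is replaced by a (0, run_length) pair.

def rle_zeros(seq):
    out = []
    i = 0
    while i < len(seq):
        if seq[i] == 0:
            j = i
            while j < len(seq) and seq[j] == 0:
                j += 1
            out.append((0, j - i))   # run of (j - i) zeros
            i = j
        else:
            out.append(seq[i])
            i += 1
    return out
```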
20120320069 | METHOD AND APPARATUS FOR TILE BASED RENDERING USING TILE-TO-TILE LOCALITY - Disclosed is a method and apparatus for performing tile-based rendering. A sequence of tiles to be processed may be determined based on a locality among the tiles. A tile dispatch unit selects a subsequent tile to be dispatched, based on the determined sequence. The tile dispatch unit may check whether an idle fragment processor exists among the plurality of fragment processors, and may dynamically dispatch the selected tile to an idle fragment processor. | 12-20-2012 |
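One simple way to realize "a sequence of tiles determined based on locality" is a boustrophedon (snake) traversal, in which consecutive tiles are always spatial neighbors, so caches warmed by one tile help the next. The patent does not specify a particular curve; this ordering is an illustrative choice.

```python
# Snake-order tile sequence: even rows left-to-right, odd rows
# right-to-left, so every pair of consecutive tiles shares an edge.

def snake_order(cols, rows):
    """Yield (x, y) tile coordinates in a locality-preserving order."""
    for y in range(rows):
        xs = range(cols) if y % 2 == 0 else range(cols - 1, -1, -1)
        for x in xs:
            yield (x, y)
```

Space-filling curves such as Hilbert or Morton order preserve locality even better, at the cost of a slightly more involved index computation.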
20130038616 | Dataport and Methods Thereof - A context-free (stateless) dataport may allow multiple processors to perform read and write operations on a shared memory. The operations may include, for example, structured data operations such as image and video operations. The dataport may perform addressing computations associated with block memory operations. Therefore, the dataport may be able, for example, to relieve the processors that it serves from this duty. The dataport may be accessed using a message interface that may be implemented in a standard and generalized manner and that may therefore be easily transportable between different types of processors. | 02-14-2013 |
20130063452 | CAPTURING SCREEN DISPLAYS IN VIDEO MEMORY AND DETECTING RENDER ARTIFACTS - Image data is captured from a specified area of a rendered screen display from the video memory for a number of frames. The image data can be captured in another area of video memory, enabling a video memory to video memory copy to be performed, thus bypassing system memory. This captured image data can be synchronized with event trace data, or other metadata from the operating system, associated with the application. Analysis tools can read and analyze the captured image data in real time to detect and report render artifacts. A graphics processing unit can implement the analysis and operate on the image data directly in the video memory. Such analysis can include a statistical analysis of the images in a sequence of screen captures to identify outliers in the sequence. These outliers have render artifacts. | 03-14-2013 |
20130069960 | MULTISTAGE COLLECTOR FOR OUTPUTS IN MULTIPROCESSOR SYSTEMS - Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of the components) can be formed based on matching of index elements. | 03-21-2013 |
20130106871 | DMA CONTROL OF A DYNAMICALLY RECONFIGURABLE PIPELINED PRE-PROCESSOR | 05-02-2013 |
20130113809 | TECHNIQUE FOR INTER-PROCEDURAL MEMORY ADDRESS SPACE OPTIMIZATION IN GPU COMPUTING COMPILER - A device compiler and linker is configured to optimize program code of a co-processor enabled application by resolving generic memory access operations within that program code to target specific memory spaces. In situations where a generic memory access operation cannot be resolved and may target constant memory, constant variables associated with those generic memory access operations are transferred to reside in global memory. | 05-09-2013 |
20130120410 | MULTI-PASS METHOD OF GENERATING AN IMAGE FRAME OF A 3D SCENE USING AN OBJECT-DIVISION BASED PARALLEL GRAPHICS RENDERING PROCESS - A multi-pass method of generating an image frame of a 3D scene, using a parallel graphics processing system having a plurality of graphics processing pipelines (GPPLs), including a primary GPPL. In the system, each GPPL includes a color frame buffer and Z depth buffer, and the GPPLs support an object-division based parallel graphics rendering process, in which the 3D scene is decomposed into objects that are assigned to particular GPPLs for processing. The multi-pass method involves, during a first pass, providing a Global Data Map (GDM) to the Z depth buffer of each GPPL. This step involves the transmission of graphics commands and data for all objects in the frame, to all GPPLs to be rendered. Then, during subsequent passes, a complementary-type partial image is generated within the color buffer of each GPPL using the GDM and a Z test filter supported by the Z depth buffer, and transmitting graphics commands and data of objects in the image frame, to only assigned GPPLs. After subsequent passes are performed, a complete color image is recomposited within the primary GPPL, using the complementary-type partial images stored in the color buffers of the GPPLs, without comparing depth values in the Z depth buffers. | 05-16-2013 |
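The final recomposition step this entry describes can be sketched as a plain merge: because the Global Data Map and Z-test filter guarantee that each pixel is colored by exactly one pipeline, the partial images are complementary and can be combined with no depth comparison. The sentinel value marking unowned pixels is an illustrative convention, not from the patent.

```python
# Merge complementary partial images: at each pixel position, at most
# one partial image holds a real value; all others hold EMPTY.

EMPTY = None

def composite_complementary(partials):
    """partials: list of equally-sized 2D pixel grids (lists of rows)."""
    h, w = len(partials[0]), len(partials[0][0])
    out = [[EMPTY] * w for _ in range(h)]
    for img in partials:
        for y in range(h):
            for x in range(w):
                if img[y][x] is not EMPTY:
                    out[y][x] = img[y][x]
    return out
```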
20130120411 | ASYNCHRONOUS NOTIFICATIONS FOR CONCURRENT GRAPHICS OPERATIONS - A method and an apparatus for notifying a display driver to update a display with a graphics frame including multiple graphics data rendered separately by multiple graphics processing units (GPUs) substantially concurrently are described. Graphics commands may be received to dispatch to each GPU for rendering corresponding graphics data. The display driver may be notified when each graphics data has been completely rendered respectively by the corresponding GPU. | 05-16-2013 |
20130141443 | SOFTWARE LIBRARIES FOR HETEROGENEOUS PARALLEL PROCESSING PLATFORMS - Systems, methods, and media for providing libraries within an OpenCL framework. Library source code is compiled into an intermediate representation and distributed to an end-user computing system. The computing system typically includes a CPU and one or more GPUs. The CPU compiles the intermediate representation of the library into an executable binary targeted to run on the GPUs. The CPU executes a host application, which invokes a kernel from the binary. The CPU retrieves the kernel from the binary and conveys the kernel to a GPU for execution. | 06-06-2013 |
20130141444 | GPU ENABLED DATABASE SYSTEMS - Methods for resolving a number of in-memory issues associated with parallel query execution of a database operation on a database utilizing a graphics processing unit (GPU) are presented including: tying a table choice to a number of accesses per second made to a table; and synchronizing threads in a same shared GPU multiprocessor to avoid compromising concurrency, and where the parallel query execution of the database operation is performed solely by the GPU. In some embodiments, methods further include storing data from the GPU to a disk to solve volatility; and enabling a user, at any time, to query the amount of memory being used by the table created by the user to monitor memory consumption. | 06-06-2013 |
20130176320 | MACHINE PROCESSOR - Disclosed are machine processors and methods performed thereby. The processor has access to processing units for performing data processing and to libraries. Functions in the libraries are implementable to perform parallel processing and graphics processing. The processor may be configured to acquire (e.g., to download from a web server) a download script, possibly with extensions specifying bindings to library functions. Running the script may cause the processor to create, for each processing unit, contexts in which functions may be run, and to run, on the processing units and within a respective context, a portion of the download script. Running the script may also cause the processor to create, for a processing unit, a memory object, transfer data into that memory object, and transfer data back to the processor in such a way that a memory address of the data in the memory object is not returned to the processor. | 07-11-2013 |
20130187935 | LOW LATENCY CONCURRENT COMPUTATION - One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks. | 07-25-2013 |
20130207983 | CENTRAL PROCESSING UNIT, GPU SIMULATION METHOD THEREOF, AND COMPUTING SYSTEM INCLUDING THE SAME - A central processing unit (CPU) according to embodiments of the inventive concept may include an upper core allocated with a main thread and a plurality of lower cores, each of the plurality of the lower cores being allocated with at least one worker thread. The worker thread may perform simulation operations on operation units of a graphics processing unit (GPU) to generate simulation data, and the main thread may generate synchronization data based on the generated simulation data. | 08-15-2013 |
20130207984 | First And Second Software Stacks And Discrete And Integrated Graphics Processing Units - A first software stack and a second software stack are run in a virtual environment. The virtual environment may be created by a hardware virtualizer. The hardware virtualizer may send the first software stack to the discrete graphics processing unit and the second software stack to the integrated graphics processing unit. | 08-15-2013 |
20130235049 | FULLY PARALLEL IN-PLACE CONSTRUCTION OF 3D ACCELERATION STRUCTURES AND BOUNDING VOLUME HIERARCHIES IN A GRAPHICS PROCESSING UNIT - A non-transitory computer-readable storage medium having computer-executable instructions for causing a computer system to perform a method for constructing bounding volume hierarchies from binary trees is disclosed. The method includes providing a binary tree including a plurality of leaf nodes and a plurality of internal nodes. Each of the plurality of internal nodes is uniquely associated with two child nodes, wherein each child node comprises either an internal node or leaf node. The method also includes determining a plurality of bounding volumes for nodes in the binary tree by traversing the binary tree from the plurality of leaf nodes upwards toward a root node, wherein each parent node is processed once by a later arriving corresponding child node. | 09-12-2013 |
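The "each parent is processed once by the later-arriving child" traversal this entry describes can be sketched with per-node visit counters standing in for the GPU atomics the parallel version would use: every leaf walks toward the root, the first child to reach a parent stops, and the second computes the parent's bounding volume from its now-complete children. Node numbering and 2D boxes below are illustrative assumptions.

```python
# Bottom-up bounding-box refit over a binary tree with n leaves.
# Node ids: internal nodes 0..n-2, leaves n-1..2n-2. Boxes are 2D
# AABBs (xmin, ymin, xmax, ymax). `visits` plays the role of the
# atomic counter: only the second arrival proceeds upward.

def refit_bounding_boxes(n, parent, left, right, leaf_boxes):
    boxes = {}
    visits = [0] * (n - 1)
    for leaf in range(n - 1, 2 * n - 1):
        boxes[leaf] = leaf_boxes[leaf - (n - 1)]
        node = parent[leaf]
        while node is not None:
            visits[node] += 1
            if visits[node] < 2:
                break                      # first arrival stops; sibling finishes
            l, r = boxes[left[node]], boxes[right[node]]
            boxes[node] = (min(l[0], r[0]), min(l[1], r[1]),
                           max(l[2], r[2]), max(l[3], r[3]))  # AABB union
            node = parent[node]
    return boxes
```

The counter guarantees each internal node's box is computed exactly once, and only after both children are ready, regardless of the order in which the leaf walks run.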
20130235050 | FULLY PARALLEL CONSTRUCTION OF K-D TREES, OCTREES, AND QUADTREES IN A GRAPHICS PROCESSING UNIT - A non-transitory computer-readable storage medium having computer-executable instructions for causing a computer system to perform a method for constructing k-d trees, octrees, and quadtrees from radix trees is disclosed. The method includes assigning a Morton code for each of a plurality of primitives corresponding to leaf nodes of a binary radix tree, and sorting the plurality of Morton codes. The method includes building a radix tree requiring at most a linear amount of temporary storage with respect to the leaf nodes, wherein an internal node is built in parallel with one or more of its ancestor nodes. The method includes, partitioning the plurality of Morton codes for each node of the radix tree into categories based on a corresponding highest differing bit to build a k-d tree. A number of octree or quadtree nodes is determined for each node of the k-d tree. A total number of nodes in the octree or quadtree is determined, allocated and output. | 09-12-2013 |
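The Morton codes assigned to primitives above interleave the bits of quantized coordinates so that spatially nearby primitives receive numerically nearby codes, which is what makes sorting them a spatial ordering. Below is a standard 30-bit 3D Morton encoding; the 10-bit-per-axis quantization is an illustrative choice, not specified by the patent.

```python
# Standard bit-interleaving for 3D Morton (Z-order) codes: each
# multiply-and-mask step spreads the bits of a 10-bit integer so that
# two zero bits separate every original bit.

def expand_bits(v):
    """Spread the low 10 bits of v, leaving two zero bits between each."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton3d(x, y, z):
    """x, y, z in [0, 1): quantize each to 10 bits and interleave."""
    def q(c):
        return min(max(int(c * 1024.0), 0), 1023)
    return (expand_bits(q(x)) << 2) | (expand_bits(q(y)) << 1) | expand_bits(q(z))
```

Each primitive's code is independent of all others, so on a GPU one thread per primitive computes its code, after which a parallel sort yields the leaf ordering for the radix tree.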
20130241941 | STATIC VERIFICATION OF PARALLEL PROGRAM CODE - A symbolic encoding of predicated execution for static verification, based on a plurality of data parallel program instructions, is obtained. A result of static verification of one or more attributes associated with the plurality of data parallel program instructions is obtained, based on the symbolic encoding. | 09-19-2013 |
20130257882 | IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND RECORDING MEDIUM ON WHICH AN IMAGE PROCESSING PROGRAM IS RECORDED - An image processing device, in a case in which an image processing module, uses in image processing a processor that is different than a processor used in image processing by an image processing module of a preceding stage, is connected at a subsequent stage, carries out transfer processing that transfers image data, that has been written into a buffer by the image processing module of the preceding stage, to a buffer for transfer that is reserved in a memory space corresponding to the processor that the image processing module of the subsequent stage uses in image processing, and carries out processing that causes the image processing module of the subsequent stage to read-out the image data transferred to the buffer for transfer. | 10-03-2013 |
20130342547 | EARLY SAMPLE EVALUATION DURING COARSE RASTERIZATION - A technique for early sample evaluation during coarse rasterization of primitives reduces the number of pixel tiles that are processed during fine rasterization of the primitive. A primitive bounding box determines when a primitive is small and may not actually cover any samples within at least one fine raster tile. Early sample evaluation is performed for the small primitive during coarse rasterization and the small primitive is discarded when no samples are actually covered by the small primitive. When the small primitive lies on a boundary between at least two fine raster tiles, early sample evaluation is performed during coarse rasterization to correctly identify which, if any, of the at least two fine raster tiles includes samples that are actually covered by the small primitive. | 12-26-2013 |
20140035937 | SAVING AND LOADING GRAPHICAL PROCESSING UNIT (GPU) ARRAYS PROVIDING HIGH COMPUTATIONAL CAPABILITIES IN A COMPUTING ENVIRONMENT - A device receives, via a technical computing environment, a program that includes a parallel construct and a command to be executed by graphical processing units, and analyzes the program. The device also creates, based on the parallel construct and the analysis, one or more instances of the command to be executed in parallel by the graphical processing units, and transforms, via the technical computing environment, the one or more command instances into one or more command instances that are executable by the graphical processing units. The device further allocates the one or more transformed command instances to the graphical processing units for parallel execution, and receives, from the graphical processing units, one or more results associated with parallel execution of the one or more transformed command instances by the graphical processing units. | 02-06-2014 |
20140043345 | RENDERING PROCESSING APPARATUS AND METHOD USING MULTIPROCESSING - A rendering processing apparatus and method using multiprocessing are disclosed. The rendering processing method includes dividing an application execution window into frames and generating a rendering processing command for rendering processing of an image on a frame basis by a pre-rendering manager, generating a rendering image for a frame according to the generated rendering processing command by a rendering manager, and storing the generated rendering image in a memory. A task for generating a rendering processing command is divided into at least one task, a task for generating a rendering image is divided into at least one task, and the divided tasks can be processed simultaneously in a plurality of threads. | 02-13-2014 |
20140043346 | RENDERING PROCESSING APPARATUS AND METHOD USING MULTIPROCESSING - A rendering processing apparatus and method using multiprocessing are disclosed. The rendering processing method includes dividing an application execution window into frames and generating a rendering processing command for rendering processing of an image on a frame basis by a pre-rendering manager, generating a rendering image on a frame basis according to the rendering processing command by a rendering manager, and storing the generated rendering image in a memory. The generation of a rendering processing command and the generation of a rendering image are performed in a plurality of threads. | 02-13-2014 |
20140078156 | Work Distribution for Higher Primitive Rates - A system, method and a computer program product are provided for distributing prim groups for parallel processing in a single clock cycle. A work distributor divides a draw call for primitive processing into a plurality of prim groups according to a prim group size. The work distributor then distributes the plurality of prim groups to a plurality of shader engines for parallel processing of the plurality of prim groups during a clock cycle. The size of a prim group and a number of prim groups are scaled to the plurality of shader engines. | 03-20-2014 |
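The division of a draw call into fixed-size prim groups distributed across shader engines can be sketched as below; the round-robin assignment and the parameter names are illustrative assumptions, since the abstract only states that group size and count are scaled to the number of engines.

```python
# Divide `num_prims` primitives into groups of `group_size` and deal
# the groups round-robin across `num_engines` shader engines.

def distribute_prim_groups(num_prims, group_size, num_engines):
    """Return, per engine, a list of (start, count) primitive ranges."""
    engines = [[] for _ in range(num_engines)]
    group = 0
    for start in range(0, num_prims, group_size):
        count = min(group_size, num_prims - start)
        engines[group % num_engines].append((start, count))
        group += 1
    return engines
```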
20140078157 | INFORMATION PROCESSING APPARATUS AND PARALLEL PROCESSING METHOD - According to one embodiment, an information processing apparatus includes a stage determination module, a score calculator and a pass window determination module. The stage determination module determines a process-target stage or process-target stages from plural stages, each of the plural stages rejecting a window of windows set on an image, wherein the rejected window does not include a target object. The score calculator calculates, in parallel, scores of the windows in the process-target stages when the process-target stages have been determined. The pass window determination module determines, in parallel, pass or rejection of a window of the windows, based on two or more scores of the window in the process-target stages. | 03-20-2014 |
20140092105 | SPATIAL LIGHT MODULATOR WITH MASKING-COMPARATORS - Described is a device comprising a spatial light modulator comprising a plurality of comparators for computing a respective drive for each pixel of a plurality of pixels. | 04-03-2014 |
20140098113 | NETWORK-ENABLED GRAPHICS PROCESSING UNIT - The present invention provides an apparatus that includes a network-enabled graphics processing unit. In one embodiment, the apparatus includes integrated circuit that includes a graphics processing element, a media fragmentation engine, and a network interface controller for conveying packets to or from the integrated circuit. The media fragmentation engine translates between a packet format used by the network interface and a graphics format used by the graphics processing element. | 04-10-2014 |
20140118362 | BARRIER COMMANDS IN A CACHE TILING ARCHITECTURE - One embodiment of the present invention includes a graphics subsystem. The graphics subsystem includes a first processing entity and a second processing entity. Both the first processing entity and the second processing entity are configured to receive first and second batches of primitives, and a barrier command in between the first and second batches of primitives. The barrier command may be either a tiled or a non-tiled barrier command. A tiled barrier command is transmitted through the graphics subsystem for each cache tile. A non-tiled barrier command is transmitted through the graphics subsystem only once. The barrier command causes work that is after the barrier command to stop at a barrier point until a release signal is received. The back-end unit transmits a release signal to both processing entities after the first batch of primitives has been processed by both the first processing entity and the second processing entity. | 05-01-2014 |
20140118363 | MANAGING DEFERRED CONTEXTS IN A CACHE TILING ARCHITECTURE - A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command. | 05-01-2014 |
20140118364 | DISTRIBUTED TILED CACHING - One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed cache tiling. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization. | 05-01-2014 |
20140125681 | METHOD AND APPARATUS FOR ENABLING PARALLEL PROCESSING OF PIXELS IN AN IMAGE - A method, non-transitory computer readable medium, and apparatus for enabling parallel processing of pixels in an image are disclosed. For example, the method performs, via a multiple core processor, a one-dimensional error diffusion on the pixels in the image to reduce a number of bits per pixel to a value lower than an initial number of bits per pixel and greater than one, and performs a two-dimensional error diffusion on the pixels in the image that have undergone the one-dimensional error diffusion, to reduce the number of bits per pixel to one bit per pixel. | 05-08-2014 |
20140125682 | METHOD OF DYNAMIC LOAD-BALANCING WITHIN A PC-BASED COMPUTING SYSTEM EMPLOYING A MULTIPLE GPU-BASED GRAPHICS PIPELINE ARCHITECTURE SUPPORTING MULTIPLE MODES OF GPU PARALLELIZATION - A hub mechanism for use in a multiple graphics processing unit (GPU) system includes a hub routing unit positioned on a bus between a controller unit and multiple GPUs. The hub mechanism is used for routing data and commands over a graphic pipeline between a user interface and one or more display units. The hub mechanism also includes a hub driver for issuing commands for controlling the hub routing unit. | 05-08-2014 |
20140125683 | Automated Latency Management And Cross-Communication Exchange Conversion - A system and method for communication in a parallel computing system is applied to a system having multiple processing units, each processing unit including processor(s), memory, and a network interface, where the network interface is adapted to support virtual connections. The memory has at least a portion of a parallel processing application program and a parallel processing operating system. The system has a network fabric between processing units. The method involves identifying a need for communication by a first processing unit with a group of processing units, creating virtual connections between the processing units, and transferring data between the first processing unit and the group of processing units. | 05-08-2014 |
20140132612 | BOOT DISPLAY DEVICE DETECTION AND SELECTION TECHNIQUES IN MULTI-GPU DEVICES - Techniques for selecting a boot display device in a multi-GPU computing device include a graphics initialization routine for determining a topology of a plurality of GPUs. It is then determined whether a display is coupled to any of the plurality of GPUs. The determination of whether the display is coupled to a GPU is communicated to the other GPUs of the plurality based upon the determined topology. Thereafter, selection of a given GPU as a primary boot device by a system initialization routine is influenced as follows: if one or more displays are coupled to GPUs, each GPU not coupled to a display is represented as a graphics device and a GPU coupled to a display is represented as the primary boot device; if no display is coupled to any of the GPUs, the given GPU is represented as the primary boot device and all other GPUs as graphics devices. In addition or in the alternative, selection of the given GPU as the primary boot device may be influenced by hiding the expansion ROM of GPUs not coupled to a display. | 05-15-2014 |
20140160135 | Memory Cell Array with Dedicated Nanoprocessors - A processing architecture uses stationary operands and opcodes common to a plurality of processors. Only data moves through the processors. The same opcode and operand are used by each processor assigned to operate, for example, on one row of pixels, one row of numbers, or one row of points in space. | 06-12-2014 |
20140168228 | FINE-GRAINED PARALLEL TRAVERSAL FOR RAY TRACING - Techniques are disclosed for tracing a ray within a parallel processing unit. A first thread receives a ray or a ray segment for tracing and identifies a first node within an acceleration structure associated with the ray, where the first node is associated with a volume of space traversed by the ray. The thread identifies the child nodes of the first node, where each child node is associated with a different sub-volume of space, and each sub-volume is associated with a corresponding ray segment. The thread determines that two or more nodes are associated with sub-volumes of space that intersect the ray segment. The thread selects one of these nodes for processing by the first thread and another for processing by a second thread. One advantage of the disclosed technique is that the threads in a thread group perform ray tracing more efficiently in that idle time is reduced. | 06-19-2014 |
20140168229 | CPU-GPU PARALLELIZATION - Embodiments described herein relate to improving throughput of a CPU and a GPU working in conjunction to render graphics. Time frames for executing CPU and GPU work units are synchronized with a refresh rate of a display. Pending CPU work is performed when a time frame starts (a vsync occurs). When a prior GPU work unit is still executing on the GPU, a parallel mode is entered in which some GPU work and some CPU work are performed concurrently. The parallel mode may be exited, for example, when there is no CPU work to perform. | 06-19-2014 |
20140168230 | ASYNCHRONOUS COMPUTE INTEGRATED INTO LARGE-SCALE DATA RENDERING USING DEDICATED, SEPARATE COMPUTING AND RENDERING CLUSTERS - An asynchronous computing and rendering system includes a data storage unit that provides storage for processing a large-scale data set organized in accordance to data subregions and a computing cluster containing a parallel plurality of asynchronous computing machines that provide compute results based on the data subregions. The asynchronous computing and rendering system also includes a rendering cluster containing a parallel multiplicity of asynchronous rendering machines coupled to the asynchronous computing machines, wherein each rendering machine renders a subset of the data subregions. Additionally, the asynchronous computing and rendering system includes a data interpretation platform coupled to the asynchronous rendering machines that provides user interaction and rendered viewing capabilities for the large-scale data set. An asynchronous computing and rendering method is also provided. | 06-19-2014 |
20140176574 | Method and Apparatus for Interprocessor Communication Employing Modular Space Division - A novel method and system for distributed-database ray tracing are presented, based on modular mapping of scene data among processors. Inherent properties include scattering data among processors for improved load balancing, and matching geographical proximity in the scene to communication proximity between processors. High utilization is enabled by a unique mechanism of cache sharing. The resulting improved performance enables a deep level of ray tracing for real-time applications. | 06-26-2014 |
20140176575 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR TILED DEFERRED SHADING - A system, method, and computer program product are provided for tiled deferred shading. In operation, a plurality of photons associated with at least one scene are identified. Further, a plurality of screen-space tiles associated with the at least one scene are identified. Additionally, each of the plurality of screen-space tiles capable of being affected by a projection of an effect sphere for each of the plurality of photons is identified. Furthermore, at least a subset of photons associated with each of the screen-space tiles from which to compute shading are selected. Moreover, shading for the at least one scene is computed utilizing the selected at least a subset of photons. | 06-26-2014 |
20140176576 | SYSTEM AND METHOD FOR GRAPHICAL PROCESSING OF MEDICAL DATA - The invention provides a computer server with a graphics processor that can process data from multiple medical imaging systems simultaneously. Data sets can be provided by any suitable imaging system (x-ray, angiography, PET scans, MRI, IVUS, OCT, cath labs, etc.) and a processing system of the invention allocates resources in the form of a virtual machine, processing power, operating system, applications, etc., as-needed. Embodiments of the invention may find particular application with cath labs due to the particular processing requirements of typical cath lab systems. | 06-26-2014 |
20140184616 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IDENTIFYING A FAULTY PROCESSING UNIT - A system, process, and computer program product are provided for identifying a faulty processing unit. A shader program that configures a plurality of processing units to generate data is executed and the data is compared with verification data to produce a test result. The test result is examined to identify a faulty processing unit of the plurality of processing units, where a unique identifier corresponding to each processing unit is encoded into the data generated by the respective processing unit. | 07-03-2014 |
20140192065 | Parallel Image Processing System - System and method for a parallel image processing mechanism for applying mask data patterns to a substrate in a lithography manufacturing process are disclosed. In one embodiment, the parallel image processing system includes a graphics engine configured to partition an object into a plurality of trapezoids and form an edge list for representing each of the plurality of trapezoids, and a distributor configured to receive the edge list from the graphics engine and distribute the edge list to a plurality of scan line image processing units. The system further includes a sentinel configured to synchronize operations of the plurality of scan line image processing units, and a plurality of buffers configured to store image data from corresponding scan line image processing units and output the stored image data using the sentinel. | 07-10-2014 |
20140240327 | FINE-GRAINED CPU-GPU SYNCHRONIZATION USING FULL/EMPTY BITS - A heterogeneous computing system includes a central processing unit (CPU) and a graphics processing unit (GPU). The CPU and the GPU are synchronized using a data-based synchronization scheme, wherein offloading of a kernel from the CPU to the GPU is coordinated based upon the data associated with the kernel transferred between the CPU and the GPU. By using a data-based synchronization scheme, additional synchronization operations between the CPU and the GPU are reduced or eliminated, and the overhead of offloading a process from the CPU to the GPU is reduced. | 08-28-2014 |
20140253565 | System on Chip Having Processing and Graphics Units - A system on chip comprises a general-purpose processing element, a graphics processing unit, and a display interface, supporting graphics visualization on mobile computing devices and embedded systems. | 09-11-2014 |
20140292774 | SYSTEM AND METHOD FOR PERFORMING SAMPLE-BASED RENDERING IN A PARALLEL PROCESSOR - A processing system, a method of carrying out sample-based rendering (such as true or quasi-Monte Carlo rendering) in a multi- or many-core processing system, and a graphics processing unit (GPU) incorporating the processing system or the method. In one embodiment, the processing system includes: (1) a sample-space distributor operable to distribute a first subset of samples for a pixel of an image to a first compute core for sample-based rendering therewith and a second subset of samples for the pixel to a second compute core for the sample-based rendering therewith, the second subset differing from the first subset and (2) a sample-space combiner associated with the sample-space distributor and operable to combine results of the sample-based rendering. | 10-02-2014 |
20140292775 | SILICON CHIP OF A MONOLITHIC CONSTRUCTION FOR USE IN IMPLEMENTING MULTIPLE GRAPHIC CORES IN A GRAPHICS PROCESSING AND DISPLAY SUBSYSTEM - A graphics processing chip includes multiple graphics pipeline cores and multi-pipeline core logic circuitry to process graphic data streams received from a processor and to drive multiple GPUs on the multiple graphics pipeline cores. | 10-02-2014 |
20140327682 | REDUCING THE NUMBER OF IO REQUESTS TO MEMORY WHEN EXECUTING A PROGRAM THAT ITERATIVELY PROCESSES CONTIGUOUS DATA - Methods and apparatuses to reduce the number of IO requests to memory when executing a program that iteratively processes contiguous data are provided. A first set of data elements may be loaded in a first register and a second set of data elements may be loaded in a second register. The first set of data elements and the second set of data elements can be used during the execution of a program to iteratively process the data elements. For each of a plurality of iterations, a corresponding set of data elements to be used during the execution of an operation for the iteration may be selected from the first set of data elements stored in the first register and the second set of data elements stored in the second register. In this way, the same data elements are not re-loaded from memory during each iteration. | 11-06-2014 |
20140327683 | Graphics Processor with Non-Blocking Concurrent Architecture - In some aspects, systems and methods provide for forming groupings of a plurality of independently-specified computation workloads, such as graphics processing workloads, and in a specific example, ray tracing workloads. The workloads include a scheduling key, which is one basis on which the groupings can be formed. Workloads grouped together can all execute from the same source of instructions, on one or more different private data elements. Such workloads can recursively instantiate other workloads that reference the same private data elements. In some examples, the scheduling key can be used to identify a data element to be used by all the workloads of a grouping. Memory conflicts to private data elements are handled through scheduling of non-conflicted workloads or specific instructions and/or deferring conflicted workloads instead of locking memory locations. | 11-06-2014 |
20140333635 | HIERARCHICAL HASH TABLES FOR SIMT PROCESSING AND A METHOD OF ESTABLISHING HIERARCHICAL HASH TABLES - A graphics processing unit having an implementation of a hierarchical hash table thereon, a method of establishing a hierarchical hash table in a graphics processing unit and a GPU computing system are disclosed herein. In one embodiment, the graphics processing unit includes: (1) a plurality of parallel processors, wherein each of the plurality of parallel processors includes parallel processing cores, a shared memory coupled to each of the parallel processing cores, and registers, wherein each one of the registers is uniquely associated with one of the parallel processing cores and (2) a controller configured to employ at least one of the registers to establish a hierarchical hash table for a key-value pair of a thread processing on one of the parallel processing cores. | 11-13-2014 |
20140333636 | ASYNCHRONOUS NOTIFICATIONS FOR CONCURRENT GRAPHICS OPERATIONS - A method and an apparatus for notifying a display driver to update a display with a graphics frame including multiple graphics data rendered separately by multiple graphics processing units (GPUs) substantially concurrently are described. Graphics commands may be received to dispatch to each GPU for rendering corresponding graphics data. The display driver may be notified when each graphics data has been completely rendered respectively by the corresponding GPU. | 11-13-2014 |
20140340411 | Facilitating Efficient Switching Between Graphics-Processing Units - The disclosed embodiments provide a system that facilitates seamlessly switching between graphics-processing units (GPUs) to drive a display. In one embodiment, the system receives a request to switch from using a first GPU to using a second GPU to drive the display. In response to this request, the system uses a kernel thread which operates in the background to configure the second GPU to prepare the second GPU to drive the display. While the kernel thread is configuring the second GPU, the system continues to drive the display with the first GPU and a user thread continues to execute a window manager which performs operations associated with servicing user requests. When configuration of the second GPU is complete, the system switches the signal source for the display from the first GPU to the second GPU. | 11-20-2014 |
20140347373 | METHOD OF GENERATING TERRAIN MODEL AND DEVICE USING THE SAME - A method of generating a terrain model and a device using the same are provided. The method of generating a terrain model includes dividing a primitive terrain model into a plurality of partial terrain sections based on a predetermined criterion, assigning the plurality of partial terrain sections to a multiprocessor, and generating a final terrain model by performing a terrain transformation simulation of the plurality of partial terrain sections through parallel processing based on the multiprocessor. Therefore, it is possible to rapidly generate a realistic terrain model. | 11-27-2014 |
20140354656 | MULTI CORE GRAPHIC PROCESSING DEVICE - A multi core graphic processing device includes a first graphic core that processes a first segment of a graphic frame divided into a plurality of segments and generates a first local decision that defines a scene property of the first segment, a second graphic core that processes a second segment of the graphic frame different from the first segment and generates a second local decision that defines a scene property of the second segment, and a global decision unit that receives the first local decision and the second local decision from the first graphic core and the second graphic core, and selects one of the received first local decision and second local decision as a global decision. | 12-04-2014 |
20140368515 | Coalescing Graphics Operations - Techniques for coalescing graphics operations are described. In at least some embodiments, multiple graphics operations can be generated to be applied to a graphical element, such as a graphical user interface (GUI). The graphics operations can be coalesced into a single renderable graphics operation that can be processed and rendered. | 12-18-2014 |
20140368516 | Multi-Processor Graphics Rendering - An operating system that includes an image processing framework as well as a job management layer is provided. The image processing framework is for performing image processing operations and the job management layer is for assigning the image processing operations to multiple concurrent computing resources. The computing resources include several processing units and one or more direct memory access (DMA) channels for concurrently rendering image data and transferring image data between the processing units. | 12-18-2014 |
20150009222 | METHOD AND SYSTEM FOR CLOUD BASED VIRTUALIZED GRAPHICS PROCESSING FOR REMOTE DISPLAYS - An apparatus for providing graphics processing is disclosed. The apparatus includes a dual CPU socket architecture comprising a first CPU socket and a second CPU socket. The apparatus includes a plurality of GPU boards providing a plurality of GPU processors coupled to the first CPU socket and the second CPU socket, wherein each GPU board comprises two or more of the plurality of GPU processors. The apparatus includes a communication interface coupling the first CPU socket to a first subset of one or more GPU boards and the second CPU socket to a second subset of one or more GPU boards. | 01-08-2015 |
20150022534 | GRAPHICS PROCESSOR WITH ARITHMETIC AND ELEMENTARY FUNCTION UNITS - A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance. | 01-22-2015 |
20150035841 | MULTI-THREADED GPU PIPELINE - Techniques are disclosed relating to a multithreaded execution pipeline. In some embodiments, an apparatus is configured to assign a number of threads to an execution pipeline that is an integer multiple of a minimum number of cycles that an execution unit is configured to use to generate an execution result from a given set of input operands. In one embodiment, the apparatus is configured to require strict ordering of the threads. In one embodiment, the apparatus is configured so that the same thread accesses (e.g., reads and writes) a register file in a given cycle. In one embodiment, the apparatus is configured so that the same thread does not write back an operand and a result to an operand cache in a given cycle. | 02-05-2015 |
20150035842 | DEDICATED VOICE/AUDIO PROCESSING THROUGH A GRAPHICS PROCESSING UNIT (GPU) OF A DATA PROCESSING DEVICE - A method includes providing an input port and/or an output port directly interfaced with a Graphics Processing Unit (GPU) of a data processing device further including a Central Processing Unit (CPU) to enable a corresponding reception of input data and/or rendering of output data therethrough. The method also includes implementing a voice/audio processing engine in the data processing device. Further, the method includes performing voice/audio related processing of the input data received through the input port and/or voice/audio related processing of data in the data processing device to realize the output data based on executing the voice/audio processing engine solely through the GPU. | 02-05-2015 |
20150042665 | GPGPU SYSTEMS AND SERVICES - Graphics processing units (GPUs) deployed in general purpose GPU (GPGPU) units are combined into a GPGPU cluster. Access to the GPGPU cluster is then offered as a service to users who can use their own computers to communicate with the GPGPU cluster. The users develop applications to be run on the cluster and a profiling module tracks the applications' resource utilization and can report it to the user and to a subscription server. The user can examine the report to thereby optimize the application or the cluster's configuration. The subscription server can interpret the report to thereby invoice the user or otherwise govern the users' access to the cluster. | 02-12-2015 |
20150049095 | Method for Handling Virtual Machine Graphics Processing Requests - A method for handling graphics processing requests that includes creating a network communication pipeline for transmitting graphics data between a first virtual machine (VM) and a second VM via corresponding software installed on the first and second VMs, wherein the second VM has access to one or more graphics processing units (GPUs) via a hypervisor, obtaining a graphics processing request and associated unprocessed graphics data generated by the first VM with the software installed on the first VM, and transmitting the unprocessed graphics data to the second VM via the network communication pipeline. The method further includes processing the unprocessed graphics data with at least one of the one or more GPUs allocated to the second VM, thereby generating processed graphics data, and transmitting the processed graphics data to the first VM via the network communication pipeline. | 02-19-2015 |
20150054836 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR REDISTRIBUTING A MULTI-SAMPLE PROCESSING WORKLOAD BETWEEN THREADS - A system, method, and computer program product are provided for redistributing multi-sample processing workloads between threads. A workload for a plurality of multi-sample pixels is received and each thread in a parallel thread group is associated with a corresponding multi-sample pixel of the plurality of pixels. The workload is redistributed between the threads in the parallel thread group based on a characteristic of the workload and the workload is processed by the parallel thread group. In one embodiment, the characteristic is rasterized coverage information for the plurality of multi-sample pixels. | 02-26-2015 |
20150070364 | LINK AGGREGATOR FOR AN ELECTRONIC DISPLAY - Video data and auxiliary data may be sent between a processor and a display device via a single cable using a link aggregator. As such, the link aggregator may receive a first parallel signal that may include the video data and a second parallel signal that may include auxiliary data from the processor. The link aggregator may then send the first parallel signal and the second parallel signal as an aggregated signal to the display device. Upon receiving the aggregated signal at the display device, the link aggregator may de-aggregate the aggregated signal into the first parallel signal and the second parallel signal. The link aggregator may then send the first parallel signal and the second parallel signal to a timing controller of the display device, such that the timing controller may display the video data using the display device. | 03-12-2015 |
20150091912 | INDEPENDENT MEMORY HEAPS FOR SCALABLE LINK INTERFACE TECHNOLOGY - A method to render graphics on a computer system having a plurality of graphics-processing units (GPUs) includes the acts of instantiating an independent physical-memory allocator for each GPU, receiving a physical-memory allocation request from a graphics-driver process, and passing the request to one of the independent physical-memory allocators. The method also includes creating a local physical-memory descriptor to reference physical memory on the GPU associated with that physical-memory allocator, assigning a physical-memory handle to the local physical-memory descriptor, and returning the physical-memory handle to the graphics-driver process to fulfill a subsequent memory-map request from the graphics-driver process. | 04-02-2015 |
20150097844 | SPLIT DRIVER TO CONTROL MULTIPLE GRAPHICS PROCESSORS IN A COMPUTER SYSTEM - A computer system includes an operating system having a kernel and configured to launch a plurality of computing processes. The system also includes a plurality of graphics processing units (GPUs), a front-end driver module, and a plurality of back-end driver modules. The GPUs are configured to execute instructions on behalf of the computing processes subject to a GPU service request. The front-end driver module is loaded into the kernel and configured to receive the GPU service request from one of the computing processes. Each back-end driver module is associated with one or more of the GPUs and configured to receive the GPU service request from the front-end driver module and pass the GPU service request to an associated GPU. | 04-09-2015 |
20150097845 | HEURISTICS FOR IMPROVING PERFORMANCE IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from a world-space pipeline, and transmitting the first plurality of graphics primitives to a screen-space pipeline for processing while a tiling function is enabled. The technique further includes storing, in the buffer, a second plurality of graphics primitives and a second plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the tiling function should be disabled and that the second plurality of graphics primitives should be flushed from the buffer, and transmitting the second plurality of graphics primitives to the screen-space pipeline for processing while the tiling function is disabled. | 04-09-2015 |
20150109310 | GPU BASED PARALLEL IMAGE PROCESSING AT THIN CLIENT - Disclosed herein is a computing device that includes: a processor; a graphics processing unit having N graphics processing cores, N being an integer greater than 1; a random access memory (RAM); a video port; a non-volatile memory; and a display processing unit. The non-volatile memory stores a virtual desktop client (VDC). The VDC can communicate with a first virtual machine (VM) of a hypervisor running on a remote computing device and receive an encoded image frame from the first VM; instruct the plurality of graphics processing cores to decode the encoded image frame in parallel; and generate a decoded image frame of the encoded image frame. The display processing unit can generate display signals representing the decoded image frame and transmit the display signals to the video port. | 04-23-2015 |
20150145871 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT TO ENABLE THE YIELDING OF THREADS IN A GRAPHICS PROCESSING UNIT TO TRANSFER CONTROL TO A HOST PROCESSOR - A method, system, and computer-program product are provided to enable threads executing in a processing unit to yield, transferring control to a host processor. The method includes the steps of receiving an intermediate representation of a program, replacing a yield instruction in the intermediate representation with a yield operation that includes one or more instructions, and compiling at least a portion of the modified intermediate representation into a machine code for execution on a parallel processing unit. | 05-28-2015 |
20150145872 | SCHEDULING, INTERPRETING AND RASTERISING TASKS IN A MULTI-THREADED RASTER IMAGE PROCESSOR - A method of rasterising a document using a plurality of threads interprets objects of the document by performing interpreting tasks associated with the objects. Objects associated with different pages are interpreted in parallel. A plurality of rasterising tasks associated with the performed interpreting tasks are established, each performed interpreting task establishing a plurality of rasterising tasks. The method estimates an amount of parallelisable work available to be performed using the plurality of threads. The amount of parallelisable work is estimated using the established rasterising tasks and an expected number of interpreting tasks to be performed. The method selects, based on the estimated amount of parallelisable work, one of (i) an interpreting task to interpret objects of the document, and (ii) a rasterising task from the established plurality of rasterising tasks, and then executes the selected task using at least one thread to rasterize the document. | 05-28-2015 |
20150317762 | CPU/GPU DCVS CO-OPTIMIZATION FOR REDUCING POWER CONSUMPTION IN GRAPHICS FRAME PROCESSING - Systems, methods, and computer programs are disclosed for minimizing power consumption in graphics frame processing. One such method comprises: initiating graphics frame processing to be cooperatively performed by a central processing unit (CPU) and a graphics processing unit (GPU); receiving CPU activity data and GPU activity data; determining a set of available dynamic clock and voltage/frequency scaling (DCVS) levels for the GPU and the CPU; and selecting from the set of available DCVS levels an optimal combination of a GPU DCVS level and a CPU DCVS level, based on the CPU and GPU activity data, which minimizes a combined power consumption of the CPU and the GPU during the graphics frame processing. | 11-05-2015 |
20150324323 | INFORMATION PROCESSING SYSTEM AND GRAPH PROCESSING METHOD - A parallel computer system executes a plurality of processes each being assigned a memory space, by placing the information of a first graph vertex and the information of a first graph vertex group connected to the first graph vertex in a first memory space assigned to a first process, placing the information of the first graph vertex and the information of a second graph vertex group connected to the first graph vertex in a second memory space assigned to a second process, and sharing the result of computation concerning the first graph vertex in the first process and the result of computation concerning the first graph vertex in the second process between the first process and the second process. | 11-12-2015 |
20150332426 | TRANSFORMABLE COMPUTING DEVICE WITH PARALLEL PROCESSING CPUS AND DISCRETE GPU - A transformable computing device has a detachable first part with a main central processing unit (CPU), a second part having a display and a secondary CPU, and a detachable third part having a graphical processing unit (GPU). The main CPU can work with the secondary CPU via parallel processing means when the first part is connected to the second part; and the display of the second part can run on the GPU when the third part is connected to the second part. | 11-19-2015 |
20150339171 | DYNAMIC FEEDBACK LOAD BALANCING - A method for rendering a scene across N number of processors is provided. The method includes evaluating performance statistics for each of the processors and establishing load rendering boundaries for each of the processors, the boundaries defining a respective portion of the scene. The method also includes dynamically adjusting the boundaries based upon the establishing and the evaluating. | 11-26-2015 |
20150371355 | Host-Based Heterogeneous Multi-GPU Assignment - Examples of the disclosure assign a plurality of graphics processing units (GPUs) to a plurality of virtual machines (VMs) or processes. A composite score is generated for each GPU. The composite score represents the normalized processing capabilities of the multiple GPUs. Based on a comparison between the composite scores and an allocated quantum corresponding to a proportional amount of GPU resources to which each VM is entitled, each VM is assigned to at least one of the GPUs. Graphics commands from the VMs are scheduled for execution by the assigned GPUs. | 12-24-2015 |
20150371357 | PROCESSING RESOURCE MANAGEMENT SYSTEM & METHODS - The present invention discloses a system and methods for parallel processing of multiple processing job requests; the system may include a server for receiving a job request, an algorithm for segmenting the job request into a few sub-jobs, and a few processors for processing the sub-jobs in parallel. Each sub-job contains a few frames to be processed by the job processors, and the outputs of the job processors are combined into a single output. The invention further discloses methods for proportional allocation of job segments and an optimization algorithm to automatically assign job requests and to adapt the resources of the system to meet customer demand according to predefined criteria. | 12-24-2015 |
20160019672 | METHOD AND APPARATUS FOR AN INTER-CELL SHORTEST COMMUNICATION - A novel method and system for distributed database ray-tracing are presented, based on modular mapping of scene-data among processors. Its inherent properties include matching geographical proximity in the scene with communication proximity between processors. | 01-21-2016 |
20160055615 | Smart Frequency Boost For Graphics-Processing Hardware - A technique, as well as select implementations thereof, pertaining to smart frequency boost for graphics-processing hardware is described. A method may involve monitoring a queue of a plurality of graphics-related processes pending to be executed by a graphics-processing hardware to determine whether one or more predetermined conditions of the graphics-related processes in the queue are met. The one or more predetermined conditions may include an accumulation condition of the graphics-related processes in the queue. The method may also involve dynamically adjusting at least one operating parameter of the graphics-processing hardware in response to a determination that each of the one or more predetermined conditions of the graphics-related processes in the queue is met. | 02-25-2016 |
20160063665 | Processor, System, and Method for Efficient, High-Throughput Processing of Two-Dimensional, Interrelated Data Sets - Systems, processors and methods are disclosed for organizing processing datapaths to perform operations in parallel while executing a single program. Each datapath executes the same sequence of instructions, using a novel instruction sequencing method. Each datapath is implemented through a processor having a data memory partitioned into identical regions. A master processor fetches instructions and conveys them to the datapath processors. All processors are connected serially by an instruction pipeline, such that instructions are executed in parallel datapaths, with execution in each datapath offset in time by one clock cycle from execution in adjacent datapaths. The system includes an interconnection network that enables full sharing of data in both horizontal and vertical dimensions, with the effect of coupling any datapath to the memory of any other datapath without adding processing cycles in common usage. This approach enables programmable visual computing with throughput approaching that of hardwired solutions. | 03-03-2016 |
20160070552 | GENERAL PURPOSE SOFTWARE PARALLEL TASK ENGINE - A software engine for decomposing work to be done into tasks, and distributing the tasks to multiple, independent CPUs for execution is described. The engine utilizes dynamic code generation, with run-time specialization of variables, to achieve high performance. Problems are decomposed according to methods that enhance parallel CPU operation, and provide better opportunities for specialization and optimization of dynamically generated code. A specific application of this engine, a software three dimensional (3D) graphical image renderer, is described. | 03-10-2016 |
20160071305 | GENERAL PURPOSE SOFTWARE PARALLEL TASK ENGINE - A software engine for decomposing work to be done into tasks, and distributing the tasks to multiple, independent CPUs for execution is described. The engine utilizes dynamic code generation, with run-time specialization of variables, to achieve high performance. Problems are decomposed according to methods that enhance parallel CPU operation, and provide better opportunities for specialization and optimization of dynamically generated code. A specific application of this engine, a software three dimensional (3D) graphical image renderer, is described. | 03-10-2016 |
20160093069 | METHOD AND APPARATUS FOR PIXEL HASHING - An apparatus and method for pixel hashing. For example, one embodiment of a method comprises: determining X and Y coordinates for a pixel block to be processed; performing a lookup in a data structure indexed based on the X and Y coordinates of the pixel block, the lookup identifying an entry in the data structure corresponding to the X and Y coordinates of the pixel block; reading information from the entry identifying an execution cluster to process the pixel block; and executing the pixel block by the execution cluster. | 03-31-2016 |
20160098812 | APPLICATION PROCESSOR SHARING RESOURCE BASED ON IMAGE RESOLUTION AND DEVICES INCLUDING SAME - An application processor includes a first scaler including a first vertical scaler and a first horizontal scaler, and a second scaler including a second vertical scaler and a second horizontal scaler, wherein the second vertical scaler is selectively shared between the first scaler and the second scaler in response to a determination of resolution for an image being processed. | 04-07-2016 |
20160124852 | MEMORY MANAGEMENT FOR GRAPHICS PROCESSING UNIT WORKLOADS - A method, a device, and a non-transitory computer readable medium for performing memory management in a graphics processing unit are presented. Hints about the memory usage of an application are provided to a page manager. At least one runtime memory usage pattern of the application is sent to the page manager. Data is swapped into and out of a memory by analyzing the hints and the at least one runtime memory usage pattern. | 05-05-2016 |
20160125567 | IMAGE PROCESSING CIRCUIT AND METHODS FOR PROCESSING IMAGE ON-THE-FLY AND DEVICES INCLUDING THE SAME - An application processor includes an image processing circuit configured to process an image on-the-fly. The image processing circuit includes N pipelines, where N is a natural number of at least 2, and an enable control circuit configured to receive first information indicating a size of the image stored in a memory and second information indicating whether the image rotates and to enable M pipelines among the N pipelines based on the first information and the second information, where 2≤M≤N. The enabled M pipelines divide the image into M image segments and process the M image segments in parallel. | 05-05-2016 |
20160171643 | SYSTEM AND METHOD FOR PHOTOREALISTIC IMAGING WORKLOAD DISTRIBUTION | 06-16-2016 |
20160180486 | FACILITATING DYNAMIC PIPELINING OF WORKLOAD EXECUTIONS ON GRAPHICS PROCESSING UNITS ON COMPUTING DEVICES | 06-23-2016 |
20160180487 | LOAD BALANCING AT A GRAPHICS PROCESSING UNIT | 06-23-2016 |
20160253773 | PATH CALCULATION DEVICE, PATH CALCULATION METHOD AND PROGRAM | 09-01-2016 |