Patent application number | Description | Published |
20140372090 | INCREMENTAL RESPONSE MODELING - A method of selecting a one-class support vector machine (SVM) model for incremental response modeling is provided. Exposure group data generated from first responses by an exposure group receiving a request to respond is received. Control group data generated from second responses by a control group not receiving the request to respond is received. A response is either positive or negative. A one-class SVM model is defined using the positive responses in the control group data and an upper bound parameter value. The defined one-class SVM model is executed with the identified positive responses from the exposure group data. An error value is determined based on execution of the defined one-class SVM model. A final one-class SVM model is selected by validating the defined one-class SVM model using the determined error value. | 12-18-2014 |
20150261846 | COMPUTERIZED CLUSTER ANALYSIS FRAMEWORK FOR DECORRELATED CLUSTER IDENTIFICATION IN DATASETS - A computing device to automatically cluster a dataset is provided. Data that includes a plurality of observations with a plurality of data points defined for each observation is received. Each data point of the plurality of data points is associated with a variable to define a plurality of variables. A number of clusters into which to segment the received data is repeatedly selected by repeatedly executing a clustering algorithm with the received data. A plurality of sets of clusters is defined based on the repeated execution of the clustering algorithm that resulted in the selected number of clusters. A plurality of composite clusters is defined based on the defined plurality of sets of clusters. The plurality of observations is assigned to the defined plurality of composite clusters using the plurality of data points defined for each observation. | 09-17-2015 |
20150269241 | TIME SERIES CLUSTERING - A method of transforming time series data to cluster data is provided. Time series data including a plurality of time series is received. A distance is computed pairwise between a first time series of the plurality of time series and each of a remaining set of time series of the plurality of time series. The computed values of the distance are sorted in increasing order. Gap width values are computed as a difference between successive pairs of the sorted, computed values. Whether a cluster including the received time series data is uniform is determined based on the computed gap width values. Cluster data including the first time series and the remaining set of time series assigned to the cluster is output when the cluster is determined to be uniform. | 09-24-2015 |
20150324398 | CONTINGENCY TABLE GENERATION - A method of creating a contingency table is provided. Whether or not a variable level list exists for a second variable in tree data is determined. When the variable level list exists for the second variable in the tree data, a first node memory structure is determined for the second variable from the variable level list, a first value of a first variable is determined using a first observation indicator and the tree data, and a first counter value is added to the contingency table in association with the first value of the first variable and a first value of the second variable. The first node memory structure includes the first value indicator, the first counter value, and the first observation indicator. The first value indicator indicates a first value of the second variable. | 11-12-2015 |
20150324403 | DATA STRUCTURE SUPPORTING CONTINGENCY TABLE GENERATION - A method of converting data to tree data is provided. A first node memory structure that includes a first value indicator, a first counter value, and a first observation indicator is initialized for a first variable. The first value indicator is initialized with a first value of the first variable selected from first observation data, and the first observation indicator is initialized with a first indicator that indicates the first observation data. The first value of the first variable is compared to a second value of the first variable. The first counter value included in the first node memory structure is incremented when the first value of the first variable matches the second value of the first variable. Corresponding values of second observation data are compared to the identified values from first observation data when the first value of the first variable matches the second value of the first variable. A next observation is read from the data when the identified values match the corresponding values. The tree data is output after a last observation of the data is processed. | 11-12-2015 |
20160048557 | GRAPH BASED SELECTION OF DECORRELATED VARIABLES - A computing device to select decorrelated variables using a graph based method is provided. A correlation value is computed between each pair of a plurality of variables to define a correlation matrix. A binary threshold value is compared to each correlation value to define a binary similarity matrix from the correlation matrix. An undirected graph comprising a subgraph that includes one or more connected nodes is defined based on the binary similarity matrix to store connectivity information for the plurality of variables. Each node of the subgraph is pairwise associated with a unique variable of the variables. (a) A least connected node is selected from the undirected graph based on the connectivity information. (b) The selected least connected node is removed from the undirected graph. (c) The connectivity information for the undirected graph is updated based on the removed node. (d) (a)-(c) are repeated until a stop criterion is satisfied. | 02-18-2016 |
20160048577 | CLUSTER COMPUTATION USING RANDOM SUBSETS OF VARIABLES - A computing device to compute clusters using random subsets of variables is provided. Each data point of a plurality of data points is associated with a variable to define a plurality of variables. A subset of the plurality of variables is randomly selected. The subset does not include all of the plurality of variables. A number of clusters into which to segment the received data is determined. Cluster data that defines each cluster of the determined number of clusters is determined by executing a clustering algorithm with the received data using only the plurality of data points defined for each observation that are associated with the randomly selected subset of the plurality of variables. The determined cluster data is stored to cluster second data into the determined number of clusters. The second data is different from the received data. | 02-18-2016 |
20160048578 | DETERMINATION OF COMPOSITE CLUSTERS - A computing device to compute composite clusters is provided. A first and a second plurality of centroid locations are computed by executing a clustering algorithm with a first portion of data and a first input parameter and a second portion of the data and a second input parameter, respectively. The first portion is different from the second portion or the first input parameter is different from the second input parameter. A plurality of composite centroid locations is computed using the computed first and second plurality of centroid locations to define a composite set of clusters. An observation is selected. A cluster of the composite set of clusters to which to assign the observation is determined using the plurality of composite centroid locations. The selecting and the determining are repeated with each observation of the plurality of observations as the observation to define cluster assignments for the plurality of observations. | 02-18-2016 |
20160048579 | PROBABILISTIC CLUSTER ASSIGNMENT - A computing device to assign observations to clusters based on a statistical probability is provided. A first cluster assignment is defined by assigning the plurality of observations to a first set of clusters. A second cluster assignment is defined by assigning the plurality of observations to a second set of clusters. A set of composite clusters is defined based on the defined first set of clusters and the defined second set of clusters. For each observation, a statistical probability value for assigning an observation to each composite cluster of the defined set of composite clusters is computed based on the first and second cluster assignments and a composite cluster assignment is defined by assigning the observation to a cluster of the set of composite clusters based on the computed statistical probability value. The defined composite cluster assignment is stored. | 02-18-2016 |
20160048756 | NEURAL NETWORK BASED CLUSTER VISUALIZATION - A computing device presents a cluster visualization based on a neural network computation. First centroid locations are computed for first clusters. Second centroid locations are computed for second clusters. Each centroid location includes a plurality of coordinate values where each coordinate value relates to a single variable of a plurality of variables. Distances are computed pairwise between each centroid location. An optimum pairing is selected based on a minimum distance of the computed pairwise distances where each pair is associated with a different cluster of a set of composite clusters. Noised centroid location data is created. A multi-layer neural network is trained with the noised centroid location data. A projected centroid location is determined in a multidimensional space for each centroid location as values of hidden units of a middle layer of the multi-layer neural network. A graph is presented for display that indicates the determined, projected centroid locations. | 02-18-2016 |
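
The steps listed for application 20160048557 (GRAPH BASED SELECTION OF DECORRELATED VARIABLES) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The abstract does not state the correlation measure, the stop criterion, the tie-breaking rule, or how the removed nodes are used, so the sketch assumes Pearson correlation compared by absolute value, a loop that stops when the graph is empty, alphabetical tie-breaking, and an output that simply records the removal order. The function names `pearson`, `build_graph`, and `removal_order` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_graph(columns, threshold):
    """Threshold the correlation matrix into a binary similarity relation.

    Returns adjacency sets: variables i and j share an edge when
    |corr(i, j)| >= threshold (using the absolute value is an assumption).
    """
    names = list(columns)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def removal_order(graph):
    """Repeat steps (a)-(c) until the graph is empty (assumed stop criterion)."""
    graph = {node: set(nbrs) for node, nbrs in graph.items()}  # work on a copy
    order = []
    while graph:
        # (a) select the least connected node; ties broken by name (assumption)
        node = min(graph, key=lambda n: (len(graph[n]), n))
        # (b) remove the node, (c) update its neighbors' connectivity
        for nbr in graph.pop(node):
            graph[nbr].discard(node)
        order.append(node)
    return order


columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [1, -1, 1, -1],
    "d": [4, 3, 2, 1],
}
graph = build_graph(columns, threshold=0.9)
print(removal_order(graph))
```

In this toy data, `a`, `b`, and `d` are perfectly correlated or anti-correlated with each other, while `c` is only weakly correlated with the rest. `c` is therefore an isolated node and is selected first.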