This section includes 50 curated data mining and data warehousing interview questions with answers to sharpen your knowledge and support exam and interview preparation. Choose a question below to get started.
| 1. |
Explain How To Use DMX, The Data Mining Query Language? |
|
Answer» Data Mining Extensions (DMX) is based on the syntax of SQL. It is built on relational concepts and is mainly used to create and manage data mining models. DMX comprises two types of statements: data definition and data manipulation. Data definition statements are used to define or create new models and structures; data manipulation statements are used to manage existing models and structures. |
|
| 2. |
What Are The Different Stages Of "Data Mining"? |
|
Answer» Exploration: This stage involves the preparation and collection of data, including data cleaning and transformation. Depending on the size of the data, different tools may be required to analyze it. This stage helps to determine the different variables of the data and their behavior. Model building and validation: This stage involves choosing the best model based on predictive performance. The model is applied to different data sets and compared for best performance. This stage is also called pattern identification. It is a little complex because it involves choosing the best pattern to allow easy predictions. Deployment: The model selected in the previous stage is applied to the data sets to generate predictions or estimates of the expected outcome. |
|
| 3. |
What Are The Different Problems That "Data Mining" Can Solve? |
|
Answer» *Data mining helps analysts make faster business decisions, which increases revenue at lower cost. *Data mining helps to understand, explore and identify patterns in data. *Data mining automates the process of finding predictive information in large databases. *It helps to identify previously hidden patterns. |
|
| 4. |
Define Rollup And Cube? |
|
Answer» Custom rollup operators provide a simple way of controlling the process of rolling up a member to its parent's values. The rollup uses the contents of the column as a custom rollup operator for each member, which is used to evaluate the value of the member's parents. If a cube has multiple custom rollup formulas and custom rollup members, the formulas are resolved in the order in which the dimensions were added to the cube. |
|
| 5. |
What Is Etl? |
|
Answer» ETL stands for extraction, transformation and loading. ETL provides developers with an interface for designing source-to-target mappings, transformations and job control parameters. |
|
| 6. |
What Is Cure? |
|
Answer» Clustering Using Representatives is called CURE. Clustering algorithms generally work on spherical and similar-size clusters. CURE overcomes the problem of spherical and similar-size clusters and is more robust with respect to outliers. |
|
| 7. |
What Is Hierarchical Method? |
|
Answer» The hierarchical method groups all the objects into a tree of clusters that are arranged in hierarchical order. This method works on bottom-up or top-down approaches. |
|
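To make the bottom-up approach concrete, here is a minimal pure-Python sketch of agglomerative clustering on 1-D points (the single-linkage distance and the tiny data set are illustrative assumptions, not from any particular library):

```python
# Illustrative bottom-up (agglomerative) hierarchical clustering.
# Every object starts in its own cluster; the two closest clusters
# are repeatedly merged until k clusters remain.

def single_linkage(a, b):
    # Distance between clusters = distance of their closest pair of points.
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, k):
    clusters = [[p] for p in points]          # bottom-up: one cluster per object
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
        del clusters[j]
    return [sorted(c) for c in clusters]
```

The top-down (divisive) variant would instead start from one cluster containing all objects and split recursively.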
| 8. |
Differences Between Star And Snowflake Schemas? |
|
Answer» Star schema - all dimensions are linked directly to the fact table. Snowflake schema - dimension tables are normalized, so some dimensions join the fact table only through other dimension tables. |
|
| 9. |
What Is A Snowflake Schema? |
|
Answer» In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table. |
|
| 10. |
What Are The Foundations Of Data Mining? |
|
Answer» Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Data mining is ready for application in the business community because it is supported by three technologies that are now sufficiently mature. Commercial databases are growing at unprecedented rates: a META Group survey of data warehouse projects found that 19% of respondents were beyond the 50 gigabyte level, while 59% expected to be there by the second quarter of 1996; in some industries, such as retail, these numbers can be much larger. The accompanying need for improved computational engines can now be met in a cost-effective manner with parallel multiprocessor computer technology. Data mining algorithms embody techniques that have existed for at least 10 years, but have only recently been implemented as mature, reliable, understandable tools that consistently outperform older statistical methods. |
|
| 11. |
What Is Unique Index? |
|
Answer» A unique index is an index that is applied to a column of unique values. |
|
| 12. |
What Is Dimensional Modelling? Why Is It Important? |
|
Answer» Dimensional modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated. |
|
| 13. |
What Is The Use Of Regression? |
|
Answer» Regression can be used to solve classification problems, but it can also be used for applications such as forecasting. Regression can be performed using many different types of techniques; in actuality, regression takes a set of data and fits the data to a formula. |
|
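The "fit the data to a formula" idea can be sketched with the simplest case, a least-squares straight line in pure Python (the function names and sample data are illustrative):

```python
# Simple linear regression via the closed-form least-squares solution:
# slope = covariance(x, y) / variance(x), intercept from the means.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    # Forecasting: apply the fitted formula to a new x.
    return slope * x + intercept
```

For data generated by y = 2x + 1, the fit recovers slope 2 and intercept 1, and `predict` then forecasts unseen x values.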
| 14. |
Describe Important Index Characteristics? |
|
Answer» The characteristics of the indexes are: |
|
| 15. |
What Is Meta Learning? |
|
Answer» Meta learning is the concept of combining the predictions made from multiple data mining models and analyzing those predictions to formulate a new and previously unknown prediction. |
|
| 16. |
Explain Mining Single-Dimensional Boolean Association Rules From Transactional Databases? |
|
Answer» The Apriori algorithm: finding frequent itemsets using candidate generation. An alternative approach is mining frequent itemsets without candidate generation. |
|
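A compact pure-Python sketch of the Apriori idea (the transactions and threshold below are made-up examples): candidate (k+1)-itemsets are generated from frequent k-itemsets and pruned by minimum support.

```python
# Apriori sketch: grow frequent itemsets level by level, pruning any
# candidate whose support falls below the minimum support threshold.

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        # Candidate generation: union pairs of frequent k-itemsets into (k+1)-itemsets.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates if support(c) >= min_support]
    return frequent
```

With min_support = 2, an itemset like {a, b} survives only if at least two transactions contain both a and b.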
| 17. |
What Is Time Series Analysis? |
|
Answer» A time series is a set of attribute values over a period of time. Time series analysis may be viewed as finding patterns in the data and predicting future values. |
|
| 18. |
Define Wave Cluster? |
|
Answer» It is a grid-based multi-resolution clustering method. In this method all the objects are represented by a multidimensional grid structure, and a wavelet transformation is applied to find the dense regions. Each grid cell contains the information of the group of objects that map into that cell. A wavelet transformation is a signal processing technique that decomposes a signal into various frequency sub-bands. |
|
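The sub-band idea can be illustrated with one level of the 1-D Haar wavelet transform (a deliberately minimal sketch, not the full WaveCluster algorithm): pairwise averages form the low-frequency band and pairwise differences the high-frequency detail band.

```python
# One level of the 1-D Haar wavelet transform: splits a signal into a
# low-frequency band (averages) and a high-frequency band (details).

def haar_step(signal):
    """One Haar level; signal length must be even."""
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details

def haar_inverse(averages, details):
    # The original signal is perfectly recoverable from the two sub-bands.
    out = []
    for a, d in zip(averages, details):
        out += [a + d, a - d]
    return out
```

For [4, 6, 10, 12] the averages band is [5.0, 11.0] and the detail band is [-1.0, -1.0]; the inverse step reconstructs the original signal exactly.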
| 19. |
Explain Statistical Perspective In Data Mining? |
Answer»
|
|
| 20. |
What Is Attribute Selection Measure? |
|
Answer» The information gain measure is used to select the test attribute at each node in the decision tree. Such a measure is referred to as an attribute selection measure or a measure of the goodness of split. |
|
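Information gain can be sketched directly from its definition: gain(A) = entropy(labels) - weighted entropy of the labels after splitting on attribute A (the row/label data below are illustrative):

```python
from math import log2

# Information gain as an attribute selection measure:
# gain(A) = entropy(labels) - sum over values v of A of
#           (|partition_v| / n) * entropy(labels in partition_v)

def entropy(labels):
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * log2(k / n) for k in counts.values())

def information_gain(rows, labels, attr):
    # Partition the labels by the value each row takes on `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates the classes gets the maximum gain, while an uninformative attribute gets gain 0, so the tree builder picks the former as the split.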
| 21. |
What Is A Lookup Table? |
|
Answer» A lookup table is one that is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based upon the primary key of the target, it updates the table by allowing only new records or updated records, based on the lookup condition. |
|
| 22. |
What Are The Steps Involved In Kdd Process? |
Answer» The KDD process involves: data selection, data cleaning and preprocessing, data transformation, data mining, and interpretation/evaluation of the discovered patterns.
|
|
| 23. |
What Is A Star Schema? |
|
Answer» A star schema is a way of organising the tables such that results can be retrieved from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables arranged around a fact table; because the layout looks like a star, the schema got its name. |
|
| 24. |
Define Descriptive Model? |
|
Answer» It is used to determine the patterns and relationships in a sample data. Data mining tasks that belong to the descriptive model include clustering, summarization, association rules and sequence discovery. |
|
| 25. |
What Is Meteorological Data? |
|
Answer» Meteorology is the interdisciplinary scientific study of the atmosphere. It observes the changes in temperature, air pressure, moisture and wind direction. Usually, temperature, pressure, wind measurements and humidity are the variables that are measured by a thermometer, barometer, anemometer, and hygrometer, respectively. There are many methods of collecting data, and radar, lidar and satellites are some of them. Weather forecasts are made by collecting quantitative data about the current state of the atmosphere. The main issue that arises in this prediction is that it involves high-dimensional characters. To overcome this issue, it is necessary to first analyze and simplify the data before proceeding with other analysis. Some data mining techniques are appropriate in this context. |
|
| 26. |
What Are Non-additive Facts? |
|
Answer» Non-additive: non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. |
|
| 27. |
Explain The Issues Regarding Classification And Prediction? |
|
Answer» Preparing the data for classification and prediction involves data cleaning, relevance analysis, and data transformation. |
|
| 28. |
Define Binary Variables? And What Are The Two Types Of Binary Variables? |
|
Answer» Binary variables have two states, 0 and 1; when the state is 0 the variable is absent, and when the state is 1 the variable is present. There are two types of binary variables: symmetric and asymmetric. Symmetric binary variables are those whose states carry the same value and weight; asymmetric binary variables are those whose states do not carry the same value and weight. |
|
| 29. |
Mention Some Of The Data Mining Techniques? |
Answer» Common data mining techniques include classification, clustering, association rule mining, regression/prediction, and sequential pattern analysis.
|
|
| 30. |
What Is An Index? |
|
Answer» Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve data quicker. Indexes are of two types: clustered and non-clustered. Rows in the table are stored in the order of the clustered index key. |
|
| 31. |
What Is Model Based Method? |
|
Answer» Model-based methods optimize a fit between a given data set and a mathematical model. This method uses the assumption that the data are distributed by probability distributions. There are two basic approaches in this method: the statistical approach and the neural network approach. |
|
| 32. |
What Are The Advantages Data Mining Over Traditional Approaches? |
|
Answer» Data mining is used for estimating the future. For example, for a company or business organization, using data mining we can predict the future of the business in terms of revenue, employees, customers, orders etc. Traditional approaches use simple algorithms for estimating the future, but they do not give results as accurate as data mining. |
|
| 33. |
What Is Smoothing? |
|
Answer» Smoothing is an approach that is used to remove the nonsystematic behaviors found in time series. It usually takes the form of finding moving averages of attribute values. It is used to filter out noise and outliers. |
|
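A moving-average smoother is small enough to sketch directly (the window size and series below are illustrative):

```python
# Moving-average smoothing: each output value is the mean of a sliding
# window over the series, which damps noise and outliers.

def moving_average(series, window):
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]
```

Note how the outlier 10 at the end of [1, 2, 3, 4, 10] is pulled toward the trend: the smoothed series is [2.0, 3.0, 5.67].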
| 34. |
What Is Spatial Data Mining? |
|
Answer» Spatial data mining is the application of data mining methods to spatial data. Spatial data mining follows the same functions as data mining, with the end objective of finding patterns in geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualization and data analysis. In particular, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasises the importance of developing data-driven inductive approaches to geographical analysis and modeling. Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as various public and private sector organizations possessing huge databases with thematic and geographically referenced data begin to realise the huge potential of the information hidden there. Among those organizations are: * offices requiring analysis or dissemination of geo-referenced statistical data |
|
| 35. |
What Is Ods? |
|
Answer» ODS means Operational Data Store. |
|
| 36. |
Define Genetic Algorithm? |
|
Answer» A genetic algorithm enables us to locate an optimal binary string by processing an initial random population of binary strings, performing operations such as artificial mutation, crossover and selection. |
|
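A toy sketch of those three operations on binary strings, under the assumed fitness "number of 1 bits" (population size, rates and seed are arbitrary illustration choices):

```python
import random

# Toy genetic algorithm over binary strings: selection keeps the fitter
# half, one-point crossover recombines parents, and mutation flips one bit.

def evolve(n_bits=12, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)

    def fitness(s):
        return sum(s)                         # maximize the number of 1 bits

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(map(fitness, pop))
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]      # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_bits)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_bits)         # artificial mutation: flip one bit
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return initial_best, fitness(best)
```

Because survivors are carried over unchanged (elitism), the best fitness never decreases from generation to generation.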
| 37. |
What Do You Mean By Partitioning Method? |
|
Answer» In the partitioning method a partitioning algorithm arranges all the objects into various partitions, where the total number of partitions is less than the total number of objects. Each partition represents a cluster. The two types of partitioning method are k-means and k-medoids. |
|
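The k-means variant can be sketched in a few lines (1-D points for brevity; the starting means are illustrative): objects are assigned to the nearest mean, then each mean is recomputed from its partition until the assignments stop changing.

```python
# Bare-bones 1-D k-means: alternate assignment and mean-update steps
# until the means stop moving.

def kmeans(points, means, iters=20):
    clusters = [[] for _ in means]
    for _ in range(iters):
        # Assignment step: each object joins the partition of its closest mean.
        clusters = [[] for _ in means]
        for p in points:
            idx = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[idx].append(p)
        # Update step: each mean moves to the centroid of its partition.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return means, clusters
```

k-medoids works the same way except each cluster representative must be an actual object, which makes it less sensitive to outliers.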
| 38. |
Define Chameleon Method? |
|
Answer» Chameleon is another hierarchical clustering method that uses dynamic modeling. Chameleon was introduced to overcome the drawbacks of the CURE method. In this method two clusters are merged if the interconnectivity between the two clusters is greater than the interconnectivity between the objects within a cluster. |
|
| 39. |
Define Density Based Method? |
|
Answer» The density-based method deals with arbitrarily shaped clusters. In density-based methods, clusters are formed on the basis of the regions where the density of objects is high. |
|
| 40. |
What Is A Dbscan? |
|
Answer» Density-Based Spatial Clustering of Applications with Noise is called DBSCAN. DBSCAN is a density-based clustering method that converts high-density object regions into clusters with arbitrary shapes and sizes. DBSCAN defines a cluster as a maximal set of density-connected points. |
|
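A compact 1-D sketch of the DBSCAN idea (eps, min_pts and the point set are illustrative assumptions): core points, those with at least min_pts neighbors within eps, grow clusters of density-connected points, and everything unreachable is labeled noise (-1).

```python
# DBSCAN sketch on 1-D points: expand clusters from core points;
# border points join a cluster but are not expanded; the rest is noise.

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                  # noise (may be adopted by a cluster later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point previously marked noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:        # j is itself a core point: keep expanding
                queue.extend(more)
    return labels
```

Neighborhood counts here include the point itself, so min_pts = 3 means "at least two other points within eps".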
| 41. |
What Is A Sting? |
|
Answer» Statistical Information Grid is called STING; it is a grid-based multi-resolution clustering method. In the STING method, all the objects are contained in rectangular cells; these cells are kept at various levels of resolution, and the levels are arranged in a hierarchical structure. |
|
| 42. |
What Are Interval Scaled Variables? |
|
Answer» Interval-scaled variables are continuous measurements on a linear scale, for example height and weight, weather temperature, or coordinates for any cluster. Distances between these measurements can be calculated using the Euclidean distance or the Minkowski distance. |
|
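The Minkowski distance generalizes both: p = 2 gives the Euclidean distance and p = 1 the Manhattan distance. A one-line sketch:

```python
# Minkowski distance of order p between two interval-scaled objects
# given as equal-length coordinate sequences.

def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)
```

For the points (0, 0) and (3, 4), p = 2 gives the Euclidean distance 5 and p = 1 gives the Manhattan distance 7.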
| 43. |
Define Pre Pruning? |
|
Answer» In pre-pruning, a tree is pruned by halting its construction early. Upon halting, the node becomes a leaf. The leaf may hold the most frequent class among the subset samples. |
|
| 44. |
What Are The Benefits Of User-defined Functions? |
|
Answer» a. They can be used in a number of places without restrictions, as compared to stored procedures. |
|
| 45. |
What Are The Different Ways Of Moving Data/databases Between Servers And Databases In Sql Server? |
|
Answer» There are several ways of doing this. One can use any of the following options: BACKUP/RESTORE, detaching and attaching databases, replication, DTS/SSIS packages, BCP, or INSERT...SELECT over a linked server. |
|
| 46. |
Explain How To Mine An Olap Cube? |
|
Answer» A data mining extension can be used to slice the data of the source cube in the order discovered by data mining. When a cube is mined, the case table is a dimension. |
|
| 47. |
Explain How To Use DMX, The Data Mining Query Language? |
|
Answer» Data Mining Extensions (DMX) is based on the syntax of SQL. It is built on relational concepts and is mainly used to create and manage data mining models. DMX comprises two types of statements: data definition and data manipulation. Data definition statements are used to define or create new models and structures. |
|
| 48. |
Explain How To Work With The Data Mining Algorithms Included In Sql Server Data Mining? |
|
Answer» SQL Server data mining offers Data Mining Add-ins for Office 2007 that allow discovering the patterns and relationships in the data. This also helps in enhanced analysis. The add-in called Data Mining Client for Excel is used to first prepare data, then build, evaluate, manage and predict results. |
|
| 49. |
Explain The Concepts And Capabilities Of Data Mining? |
|
Answer» Data mining is used to examine or explore the data using queries. These queries can be fired on the data warehouse. Exploring the data in data mining helps in reporting, planning strategies, finding meaningful patterns etc. It is more commonly used to transform large amounts of data into a meaningful form. Data here can be facts, numbers or any real-time information like sales figures, cost, metadata etc. Information would be the patterns and relationships discovered amongst the data. |
|
| 50. |
What Is Sequence Clustering Algorithm? |
|
Answer» The sequence clustering algorithm collects similar or related paths - sequences of data containing events. The data represents a series of events or transitions between states in a dataset, like a series of web clicks. The algorithm examines all probabilities of transitions and measures the differences, or distances, between all the possible sequences in the data set. This helps it to determine which sequences are the best input for clustering. E.g. the sequence clustering algorithm may help find the path to store products of a similar nature in a retail warehouse. |
|
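A rough sketch of the underlying idea, not Microsoft's actual algorithm: model each sequence of events (e.g. click paths, which are made-up examples here) by its state-transition probabilities, then compare sequences by the distance between those transition distributions.

```python
# Sequence comparison via transition probabilities: sequences with similar
# event-to-event transition behavior end up at small distance and would be
# grouped together by a clustering step.

def transition_probs(sequence):
    counts = {}
    for a, b in zip(sequence, sequence[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def sequence_distance(s1, s2):
    p, q = transition_probs(s1), transition_probs(s2)
    keys = set(p) | set(q)
    # L1 distance between the two transition distributions.
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)
```

Identical click paths get distance 0, while paths that move between different states score a positive distance, which is what a clustering step needs to separate them.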