Paper 14

Paper Title: Visual Mining of DNA Sequence; Actionable Knowledge Discovery: A Brain Informatics Perspective; Privacy-Preserving Data Mining: Past, Present, and Future; Toward Knowledge-Driven Data Mining

Three Critical Questions

Monday

Group 1:

Member Name: Chiranjeevi Ashok Puvvula

1. How can the privacy, security, and integrity of user data be guaranteed while it is being mined? How can the user be assured of this security and privacy?
2. How can mined data be integrated with knowledge that has already been inferred and reasoned? How can we assess the truthfulness of the inferred knowledge?
3. Do we need a unifying theory or a standard for data mining, given that most mining techniques are developed for individual problems?
4. How can the data mining process be automated? That is, from combining the operations needed for data mining, to filtering the data, to the final visualization, is there a way to automate all three stages?

Group 2:

Member Name: Srikanth Kodali

1. By looking only at the search results, the user cannot tell whether a result satisfies his requirement, so he has to open every search link. But each click places a misleading entry in his history. Can we trust the genuineness of results that are based on the history in the user's profile?
2. The proposed approach works effectively for web pages that have metadata and RDF data sources. Building RDF resources is complicated and time consuming. How can this approach handle web systems that do not use RDF and metadata sources?
3. The semantic relationship between the results is retrieved by analyzing factors such as RDF and metadata sources and hybrid search engines. Suppose a web page has both RDF and metadata sources. This document is searched three times because it has both sources. How is this redundancy eliminated?
4. The user's history will grow unevenly. How is the user classified by analyzing his history? How does the system behave when the user searches for something that is not relevant to his profile?

Group 3:

Member Name: Sunil Kumar Garrepally

1. Visual mining uses a spanning tree pattern technique that is based on 2D mapping and comparison. But it would give better comparison results if the analysis were done in 3D. Are there any methods to form trajectories in 3D for the comparisons?
2. fMRI produces images of brain activity for various brain functions, which is very helpful for analyzing those functions and building algorithms. But capturing the images, processing the tasks, and integrating them makes the process very slow. Are there any methods that could make the process faster, or alternatives to the image processing technique?
3. Two methods, fMRI image processing and EEG frequency analysis, are discussed for data mining. Which of the two is more cost and time efficient in collecting and processing the data, which is most important in data mining?
4. To provide privacy, two kinds of methods are used: SMC-based methods, which suffer in accuracy, and item-invariant methods, which are more secure but involve more communication cost. Is there any method that could provide high security while overcoming the difficulties of accuracy and communication cost?
5. The intelligent algorithms that differentiate clusters and process the data for mining depend on size, shape, or orientation. Is there any algorithm that does not depend on these factors?
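
As a rough illustration for question 1, one way a sequence could be turned into a 3D trajectory is a "DNA walk", where each base moves the current point by a fixed step. The step vectors below (the four alternating corners of a cube) are an assumption chosen for symmetry; this is not the mapping used in the paper, and `STEPS_3D` and `trajectory` are hypothetical names.

```python
# Hypothetical step vectors: one tetrahedral direction per base (an assumption,
# not the paper's mapping).
STEPS_3D = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def trajectory(seq, steps=STEPS_3D):
    """Return the list of points visited when walking the sequence base by base."""
    dims = len(next(iter(steps.values())))
    pos = (0,) * dims
    path = [pos]
    for base in seq.upper():
        # Move from the current point by the step assigned to this base.
        pos = tuple(p + s for p, s in zip(pos, steps[base]))
        path.append(pos)
    return path


print(trajectory("AC"))  # [(0, 0, 0), (1, 1, 1), (2, 0, 0)]
```

Passing a dictionary of 2D steps as `steps` gives a planar walk from the same function, so 2D and 3D trajectories of one sequence can be compared side by side.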

Group 4:

Member Name: Ramya Devabhakthuni, Nikhilesh Katakam

• The paper stated that “DNA representation and visualization techniques” are used only for small data sets. How will they be used for a large domain or large data sets, and how is consistency achieved when mapping DNA to “trajectories”?
• The paper stated that “spanning tree visualization” is used only for “metric distances”, not for “non-metric” distances. How will this type of visualization be used for “non-metric distances”?
• “Data aggregation” needs to combine data from different data sets. How will this be achieved without incurring “privacy risks”, including when the results are sent to a “third-party site”?
• “SMC” involves a number of “participating sites” to identify an “item’s global frequency”. How will issues like cost be handled as the number of “participating sites” increases, and how is “performance” maintained?
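
To make the cost question in the last bullet concrete, here is a minimal sketch of a textbook ring-based secure sum, one simple way several sites can compute an item's global frequency without revealing their local counts. It is an assumption that the paper's SMC method resembles this; `secure_sum` and its parameters are hypothetical names.

```python
import random


def secure_sum(local_counts, modulus=2**32, rng=None):
    """Simulate a ring-based secure sum over the sites' local counts.

    Returns (global_count, messages_sent). Each site only ever sees a
    masked running total, never another site's raw count.
    """
    rng = rng or random.Random()
    mask = rng.randrange(modulus)  # secret kept by the initiating site
    running = mask
    messages = 0
    for count in local_counts:
        # Each site adds its own count and forwards the masked total.
        running = (running + count) % modulus
        messages += 1
    # The initiator removes its mask to obtain the true total.
    return (running - mask) % modulus, messages


total, messages = secure_sum([12, 7, 30])
print(total, messages)  # 49 3
```

In this sketch the number of messages grows linearly with the number of sites, and for every item whose frequency is needed. It also shows the collusion weakness: the sites immediately before and after a given site on the ring can subtract the totals they saw to recover that site's count.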

Group 5:

Member Name: Ritesh Mengji

• In the first topic, the author considered a small data set for visualizing DNA. Can the technique discussed by the author be used for a large collection of data?
• For calculating the distance between two points in the spanning tree visualization technique, the author did not clearly explain how the distance is calculated and which metrics are used.
• In discussing business knowledge and its scope for knowledge-driven data mining, the author says that data miners do not have sufficient knowledge about businesses and hence need to be complemented by business experts over the lifecycle. But I feel that this is not the right idea, as the process and the model are expected to be automated rather than a gradual process requiring frequent feedback from the very user who uses the system. So is such an approach consistent with the set standards of intelligent data mining, or is it a complementary technique that has to be implemented on a large scale?
• From the authors' explanation, it seems that privacy and accuracy cannot both be achieved efficiently, and that reconciling them involves really hard problems. So the global frequency discovery approach and similar approaches are only a partial solution toward a final output. In real-world situations, which is the better one to move toward? Should privacy or accuracy be given more weight?

Group 6:

Member Name:

Group 7:

Member Name: Priyanka Koneru, Mohana Siri Kamineni

1. When the participating sites collude, privacy is vulnerable under the “Secure Multiparty Computation (SMC)” approach, and the communication cost also increases because the sites must interact among themselves to find the accurate global frequency of each item-set. So how does this technique ensure privacy, given that it is meant to provide “privacy in frequent item-set mining”?
2. Since data mining models make use of public data sources, can the privacy of one data mining model be revealed by other models through those public sources? Is privacy being breached by the patterns of the data mining model itself?
3. There should be a clear distinction between signal and noise, because data treated as noise in one context is used as signal in another. There are resourceful algorithms currently used in “data mining” to provide a clear distinction between noise and signal.
4. The privacy risk grows rapidly when data is aggregated from many sources, because the accuracy model is based on sharing data between the sources. The paper does not mention how this privacy risk is eliminated.
5. Is it possible to perform “visualization and representation of DNA” for large data sets using more “advanced indexing and compression techniques”? For now this visualization is done only for small data sets, but to which applications can it be extended for use with large data sets?

Group 8:

Member Name: Muppalla, Putheti

1. How can data mining privacy be breached by “associating the” “data model” with “sources” that are public?
2. It is stated that data mining cannot reliably recover certain forms such as “banana shaped clusters”. How can data be grouped into “clusters” without taking “shape, size, and spatial orientation” into consideration?
3. In “SMC”, how are factors like cost and performance taken care of when there is a large number of participating sites? How are feasibility issues addressed as the number of participating sites increases?
4. How is privacy achieved while aggregating data from other sources, and while involving a third party during the mining or using a centralized server?
5. How are the challenges of data mining addressed, such as the shift from a “method driven to a process driven approach”? And how can issues like the “combination” of business and technical knowledge be addressed?

Wednesday

Group 1:

Member Name: Pelluri, Lattupalli, Voruganti

Developing intelligent points and maintaining them is a tedious task. Developing a collection point and tracking everything about it requires extensive storage. Is such storage worthwhile for every data collection point? If not, are there any techniques to determine which points should be made intelligent points?
POM/MDA techniques collect data from the various stimuli the human brain generates, without the intervention of a human-centric approach. Will all human brains react in a similar way to a given task?
This question is simple, but differing reactions can generate random data that must be dealt with before a knowledge-based system is developed over the complex brain informatics domain. How is this issue resolved?
A fully developed data mining model will require a highly efficient data privacy model. The level of arbitration and data sharing increases in order to develop good knowledge-based systems. Systems that preserve the privacy of varied data are complex to develop. As mining models improve in efficiency, the privacy preservation technique must also improve. Do the two not grow together on a linear curve, nullifying the chance of developing an efficient privacy-preserving mining model?

Group 2:

Member Name: Addagalla, Bobbili, Gopinath

1. The author talks about using decision trees for extracting actions. The question here is whether decision trees are efficient when trying to classify data that is highly heterogeneous; in other words, has the author accounted for excessive fan-out of the decision tree?
2. The author introduces a technique by which the system learns from examples. The question is whether wrong actions, for example unsuccessful trials of an experiment, can affect the learning and, if so, to what extent.
3. The author mentions the potential effects and opportunities of data mining in the field of video analysis. The question is whether data mining technology today is mature enough to be applied in these areas that are, in the author’s own words, much too complex for automated analysis.
4. The authors have used a small set of examples to demonstrate how data mining technology can be used for DNA matching and analysis. The critical question is whether the sample set is large enough to prove the feasibility of the technology.

Group 3:

Member Name: Chi Zhang

1. The first paper points out a way to illustrate the current findings about DNA. However, as we acquire more and more knowledge, how can we use this knowledge to guide data mining? It is an interesting question, and it may be an important one, since biological science is in some sense a rising academic field.

2. The second paper stresses the methods of protecting privacy in data mining. In my opinion, privacy protection should involve certain people, such as the database administrator. It is a systematic project rather than a matter of using certain methods. Why does the paper not state those issues clearly?

3. The fourth paper advocates adding a knowledge approach to data mining. However, not all data mining techniques need to change to knowledge-based systems. We really need to weigh the costs and benefits in order to make a sound decision. There are noise factors that may ruin the result, so how can we minimize these factors? It still remains a question.

Group 4:

Member Name: Shaiv, Karuna Priya Rameshwaram, Anusha Vunnam

1) How is the warping distance calculated for objects that have non-metric distance values between them?
2) What is the indexing technique defined in the paper that is used in determining the different stages of cancer?
3) In the SMC technique used to preserve privacy in data mining, how can we determine whether a participating site is a legitimate participant or an intruder?
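
For question 1, a minimal sketch of a warping distance, assuming the paper means something like standard dynamic time warping (DTW) between two sequences. The local cost is a plain function argument, so it does not have to be a metric; `warping_distance` and `cost` are hypothetical names, not the paper's notation.

```python
def warping_distance(a, b, cost=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between sequences a and b.

    `cost` is any non-negative dissimilarity between two elements; it is
    not required to satisfy the triangle inequality.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cheapest alignment of the first i items of a with the first j of b
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the best of: advance in a only, in b only, or in both.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


print(warping_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The DTW result itself does not obey the triangle inequality in general, even when the local cost does, which is one reason metric-only indexing and visualization methods cannot be applied to it directly.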

Group 5:

Member Name: Rahul Mootha, Rahul Reddy

1. Why does “Multiaspect data analysis” (MDA), a Brain Informatics methodology, require web-based models when different sources of data are available to mine rules and extract knowledge?
2. Do the EEG and fMRI instruments use multiple algorithms to collect information about the brain? How do they implement the two levels of processing required by MDA?
3. How are the key issues implemented: designing the experiments to obtain data about the brain, and analyzing that data using data mining?
4. The randomization technique is used to minimize privacy breaches without exposing the exact frequency of data. Since it is transaction variant, how is the transaction size considered a privacy breach?
5. How is dumb data converted to intelligent data? Sometimes dumb data is also used by algorithms in extracting knowledge. How can efficient algorithms distinguish noisy data from the original data when extracting knowledge?
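
As background for question 4, a minimal sketch of one common randomization scheme, randomized response by bit flipping. It is an assumption that the paper's randomization technique works along these lines; `randomize`, `estimate_frequency`, and `keep_prob` are hypothetical names.

```python
import random


def randomize(bits, keep_prob, rng):
    """Keep each 0/1 item indicator with probability keep_prob, else flip it."""
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


def estimate_frequency(observed_freq, keep_prob):
    """Recover the true frequency f from the randomized frequency.

    Inverts E[observed] = keep_prob * f + (1 - keep_prob) * (1 - f).
    Requires keep_prob != 0.5.
    """
    return (observed_freq - (1 - keep_prob)) / (2 * keep_prob - 1)


rng = random.Random(42)
true_bits = [1] * 300 + [0] * 700  # true frequency 0.3
noisy = randomize(true_bits, keep_prob=0.8, rng=rng)
print(estimate_frequency(sum(noisy) / len(noisy), keep_prob=0.8))
```

The miner sees only the flipped data, so no single transaction can be trusted, yet the aggregate frequency can still be estimated. The estimate is approximate and becomes noisier as `keep_prob` approaches 0.5.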

Group 6:

Member Name:

Group 7:

Member Name:

Group 8:

Member Name: Brugu Kumar Bhargava

1) The layer to be built needs to interact with all layers, so can we make the layer distributed? If so, does making this layer distributed reduce the problem or increase the complexity of the work?
2) The author says he has a greedy solution for the problem, which in the previous statement he described as NP-complete. Is this not a paradox? Please explain.
3) The author says that the data-centric model is not an efficient model for action-driven KDD. What security features are affected by using this model, or by not using it?

Group 9:

Member Name: Satish Bhat, Holly Vo

1. Should the other pivot point have the furthest distance from its closest neighbor among all points? That would guarantee the intersection of any two transcribed circles.
2. What grid size is needed to support brain task analysis at a reasonable rate? Can current technology support the brain's complexity? (Consider complexity of computation only and disregard the specific algorithm and computation.)
3. How can the characteristics of a data model (real-time, few sources, small local frequent-item) be determined in order to choose the best approach?
4. Can multiple levels of data aggregation improve privacy and other costs? Each level would contain data sources of the same characteristic and apply the best approach.
5. How can domain expert knowledge be fed to the technical model? And who will do the transformation (knowledge formulation)?

Group 10:

Member Name: Sunae Shin, Hyungbae Park

1. Although the Wikid hierarchy is proposed, the concept of intelligence is still too abstract. Also, the intelligent algorithm and intelligent data can be interpreted in many ways. The authors should compare the two and establish the turning point between intelligent and non-intelligent. That would make knowledge-driven data mining clearer and more understandable.
2. It is obvious that privacy is a significant issue. They discussed protecting data in terms of two major issues, privacy protection and accurate data mining. This is reasonable when dealing with large and complex data. In my opinion, combining brain informatics with privacy protection techniques could be a new area of research.
3. They point out that business knowledge and technical information are not very useful before the data mining technique is used. However, there is no specific explanation of how to combine the data mining technique with that knowledge, or of how to collect knowledge for the business area. In addition, it would be better to include a discussion of how to make business knowledge intelligent.
4. Before proposing data mining for privacy, they should first address whether data mining can be protected at all. Then the privacy-protecting data mining proposal would be stronger and more reasonable. Furthermore, the threats to basic data mining techniques need to be considered.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License