Paper 16

Paper Title: Mining Association Rules for Adaptive Search Engine Based on RDF Technology

Three Critical Questions

Monday

Group 1:

Member Name:Chiranjeevi Ashok Puvvula

1. How can a non-expert in data mining understand algorithms that have so many input parameters, and the rules that the mining generates?
2. Search engines often produce huge result sets, which may be due to a mismatch between the user query and the documents being searched. Failing to exploit the document structure properly is also responsible for such huge result sets. How can such mismatches be rectified? How should the document structure be used properly?
3. How can complex search strategies in metadata search engines produce the desired search results, taking the interpretation problem into account?
4. How can hybrid search engines overcome the problem of information loss?

Group 2:

Member Name: Srikanth Kodali

1. By only seeing the search results, the user cannot tell whether a result satisfies his requirement, so he has to open every result link. But each click places a misleading entry in his history. Can we trust the genuineness of results that are based on the history in the user's profile?
2. The proposed approach works effectively for web pages that have metadata and RDF data sources. Building RDF resources is complicated and time consuming. How can this approach handle web systems that do not use RDF and metadata sources?
3. The semantic relationships between results are retrieved by analyzing all the factors, such as RDF and metadata sources and hybrid search engines. Suppose a web page has both RDF and metadata sources. This document is searched thrice because it has both sources. How is this redundancy eliminated?
4. The user's history will grow unevenly. How is the user classified by analyzing his history? How does the system behave when the user searches for something that is not relevant to his profile?

Group 3:

Member Name: Sunil Kakaraparthi

1. Even though the web dominates in the real world, the systems do not have sufficient knowledge of the language. It would be more helpful for the system if users supplied the metadata, but adding metadata takes more time. What about the time constraint and time complexity of adding the metadata?

2. Experiments are performed on association rules, and these experiments are used to check the rules. A virtual user was introduced whose job is to submit a query by selecting keywords with attributes. What if the user is inexperienced and unable to select the attributes correctly?

3. The author proposes association rules for adaptive search and introduces a generalization of the association rules. The retrieval history is saved as a log using the Resource Description Framework (RDF). Users generate various patterns. How can the generalization concept be justified when there are different patterns?

4. The author has mentioned the use of Metadata and User log files for enhancing the search criteria. But, it would increase the memory requirements. So, it needs some compression techniques to store the large amounts of log files. Does this system support the compressed log storage?

Group 4:

Member Name:Prashant Sunkari, Nikhilesh Katakam

• How is the author going to cluster documents of a similar type together? Clustering is important because a general search by users is expected to return results of the same type.
• The authors stressed their effort towards using their model in applications that include humantronics and using it for sharing information, but how is this done?
• The proposed model combines the best concepts of the web and the semantic web. The range of applications where it could be applied should be wider than the two mentioned in the paper. In which areas could this concept be applied as we move from the web to the semantic web?

Group 5:

Member Name: Ritesh Mengji

1) The author has written the paper considering a single document repository. Can his suggestion and the proposed architecture work for distributed systems?
2) The experiment conducted in this paper considers a virtual user. Can we rely on the outcome of the experiment, and can we consider it to be appropriate?
3) The author has proposed an architecture. Are there any other models against which we can compare the model presented by the author? [The author proposed his architecture based on the experiments he conducted, but a different architecture in the same domain could reveal different outcomes.]

Group 6:

Member Name:

Group 7:

Member Name: Mohana Siri Kamineni

1. It is mentioned that we need to append the metadata at the end of each file. The paper also states that when the query term is not specified as part of the metadata, the required document is not retrieved. How accurate, then, is this method for retrieving the required documents?
2. The author mentions that full-text search does not take the structural and semantic information of the files into consideration and tends to retrieve many irrelevant documents when we submit a query. How apt, then, is this method for document retrieval, considering time, cost, and performance?
3. The paper states that computers do not understand the language and sense of human beings. As a result, the metadata needs to be appended at the end of each document every time, which takes a lot of time. Also, many users are not familiar with languages like RDF. When this method has so many factors to take into consideration, why are we using such a technique for identifying the files?

Group 8:

Member Name:Muppalla Durga Maheswari

1. How effective is the generalization method explained through the RDF schema, given that users' behaviors show several patterns in the retrieval history of the RDF-based reasoning engines?
2. How is the query term searched based on the metadata, given that metadata has to be added manually to each document and there are a large number of files on the web? How can metadata-based search be accurate every time, when it is also stated that a document cannot be retrieved if it contains the query term but the term was not added as specified metadata?
3. It is stated that full-text search does not compare the structural or semantic information of the documents and retrieves documents that are irrelevant to the user, so metadata-based search is used instead. But metadata search also has constraints, such as when the query term is not specified in the metadata. How are issues like precision and recall addressed in this case?

Wednesday

Group 1:

Member Name: Lattupalli, Pelluri, Voruganti

1. The history base is something that needs to be balanced. How much history will it hold? Will the history be separate for each user? When will it be refreshed? If it is refreshed, will all the previously inferred rules remain persistent?

2. Maintaining metadata, that is, data about data, is a brilliant idea, but with so many new documents coming onto the net, is it worth maintaining metadata all over the web?

3. 300 trials extracted 12 rules, so these trials have a certain history. What if the trials are the same throughout? Then there will not be so many rules. Assuming that duplication is avoided, how will the system handle such cases?

Group 2:

Member Name: Addagalla, Bobbili, Gopinath

• The author talks about the Vector Space model as a retrieval system. How will this system be successful in data retrieval with metadata and without metadata?

• The 12 rules are observers of the user's behavior. If any one of the rules fails to observe the behavior, how will the system be affected, and how trivial or how adverse will the effects be?

• The author defines rules based on three terms, namely Artificial Intelligence, Interaction and Rules. Why was he satisfied with using only these terms? Are these three alone sufficient to have a positive impact on building rules?

• Regarding the 12 rules and the 300 queries, are these rules appropriate for extraction in the proposed method?

Group 3:

Member Name: Swathi Shastry

1>The metadata stated for each document needs to be exhaustive and cover all possible keywords and/or query terms that are provided in the search query. Is there any updating of metadata pertaining to a document when it is updated by a person other than the creator? And is the new updated metadata exhaustive?

2>How are the association rules formed for the first run if there is no pre-existing data in the history base that describes the behavior of previous users? How would the adaptive search engine handle a corruption of the history base?

3>The metadata search engine needs to learn the relationships between element types by learning the relationships from the users’ logs and thus adaptively search the document effectively. Is there any verification scheme to validate the relationships learnt by the metadata search engine?

Group 4:

Member Name:Karunapiya Rameshwaram, Shaiv, Anusha Vunnam

Critical Questions:
1. The author has discussed the weighting of keywords. But how is this weighting achieved? What concepts or algorithms do term frequency and inverse document frequency follow in enforcing the weighting?
2. Is the reasoning base transaction safe? There can be situations where a reasoning rule is updated. Is some locking then performed on the association rule while it is being updated, in order to maintain consistency?
3. Is it cost efficient to generate rules through a certain number of trials?
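
As background for question 1, here is a minimal sketch of one common way to weight keywords by term frequency and inverse document frequency. It assumes the textbook formulation (tf = count / document length, idf = log(N / df)); the paper's exact weighting may differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. This is the textbook variant:
    weight = (count / len(doc)) * log(N / df). It is an assumption,
    not necessarily the weighting used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["rdf", "metadata", "search"],
    ["rdf", "association", "rules"],
    ["rdf", "search", "engine"],
]
weights = tf_idf(docs)
```

In this toy corpus "rdf" appears in every document, so its weight is zero, while a term that appears in only one document gets the highest weight. That is the intuition behind the weighting: terms that discriminate between documents count for more.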

Group 5:

Member Name: Gayathri Devi Bojja

1) What is the RDF specification that the author refers to in the document repository of the adaptive hybrid search?
2) Will the technologies presented in the paper help in realizing topics such as sharing knowledge and information?
3) When too many queries are posted, is the capacity of the association rules to analyse the document retrieval also questionable?

Group 6:

Member Name:

Group 7:

Member Name:

Group 8:

Member Name:Brugu Kumar Bhargava Mudumba

1) The author explains different kinds of heterogeneity: syntactic, semantic and structural. Using RDF we can address syntactic and structural heterogeneity, but how can semantic heterogeneity be addressed?
2) Recall and precision are, by definition, inverse to each other, so how can both be improved at the same time? An improvement in one will surely degrade the other. What is the justification?
3) We need to add the metadata tags manually, or else we can group the attributes. But is there a specific way to define the metadata automatically or semi-automatically, so that precision can be increased significantly?
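
As background for question 2, the sketch below computes precision and recall from their standard set-based definitions. The function name `precision_recall` and the document IDs are hypothetical, and the example is not taken from the paper's experiments. It only illustrates that the two measures share a numerator, so a retrieval run that returns more relevant documents in place of irrelevant ones can raise both.

```python
def precision_recall(retrieved, relevant):
    """Standard definitions:
    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth and two hypothetical retrieval runs.
relevant = {"d1", "d2", "d3", "d4"}
run_a = ["d1", "d2", "d5", "d6"]  # two of the four results are relevant
run_b = ["d1", "d2", "d3", "d5"]  # three of the four results are relevant

p_a, r_a = precision_recall(run_a, relevant)
p_b, r_b = precision_recall(run_b, relevant)
```

In practice the two measures often trade off when only the number of returned results changes, which may be what the question has in mind.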

Group 9:

Member Name: Satish Bhat, Holly Vo

1. Similar to the document registration module, which forces metadata on registration, why don't we check retrieval results against the existing association rules and adaptively modify or add association rules immediately after each successful search, rather than saving the behavior and periodically extracting rules? This could address both the storage issue and the processing time.
2. How can the user evaluate search results? Since only a human (not a machine) understands whether a document is well matched, the search engine could be significantly improved with this help from the user. Usually, the user opens the returned documents one by one; however, only the well-matched documents are carefully read or downloaded. If we keep the ratio of well-matched documents to returned documents and also extract some rules from these well-matched documents, we have more semantic data to improve the search engine.
3. How should the thresholds for the support and confidence values be determined?
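
As background for question 3, here is a minimal sketch of how support and confidence are usually computed for a candidate rule and then compared against thresholds. The log entries, the attribute names, and the threshold values `MIN_SUPPORT` and `MIN_CONFIDENCE` are all hypothetical; the paper's actual log format and thresholds are not reproduced here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent AND consequent) / support(antecedent)."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / sup_a


# Hypothetical retrieval log: attributes used together in each query.
log = [
    {"title", "author"},
    {"title", "author", "keyword"},
    {"title", "keyword"},
    {"author"},
    {"title", "author"},
]

# Hypothetical thresholds; choosing them is exactly the open question.
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.7

sup = support(log, {"title", "author"})
conf = confidence(log, {"title"}, {"author"})
rule_kept = sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE
```

Raising either threshold keeps fewer but stronger rules, and lowering them keeps more rules at the risk of including coincidental ones.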

Group 10:

Member Name: Sunae Shin, Hyungbae Park

1. The adaptive retrieval method, which can improve precision as well as recall, works by matching the current user's retrieval against the similar retrieval behaviors of past users. I think this is similar to current ranking algorithms, because a ranking algorithm stores users' frequent selections and uses these behaviors to retrieve the results.
2. The system may derive wrong association rules from users' misbehavior. Thus, there is a possibility that we get wrong results, since they are generated by the wrong association rules.
3. In the query submission of the virtual user, the probabilities are predefined, which means the system is not running with truly random behaviors. We might get different results if we used totally random user behaviors.
4. As we can see from the paper, fewer association rules are found as the number of trials used to extract the rules increases. This suggests that with a massive repository we may find no association rules at all, in which case there would be no benefit in adopting their idea. Thus, it is a pity that they did not set up the experiment with a massive repository.
