
Posted: 2011-06-29 08:08:59   Source: 文档文库
Intelligent Wrapping of Information Sources in an Electronic Commerce Environment

Sebastian Pulkowski
Institute for Program Structures and Data Organization, University of Karlsruhe, Germany
E-mail: pulkowsk@ira.uka.de

1 Introduction

The World Wide Web can be seen as one big virtual library. Information about documents, or even the documents themselves in electronic format, can be found on nearly every subject area. Thus literature search and delivery is a rapidly expanding market. Today almost all booksellers and publishers place their offers on the Internet, and intermediaries that catalogue and index documents for search assist users in the retrieval of relevant information. Almost all of these do so to make a profit and, consequently, charge users and/or providers for their services.

The problem facing a customer searching for information is to become knowledgeable about all these sources, find the most suitable ones, and combine all sources into a search and delivery process that meets his or her needs.

The UniCats [2] project at the Universität Karlsruhe solves these problems by means of user agents and traders. The user agents submit requests only to selected information sources, guided by the users' profiles. The selection is done by so-called traders, which hold technical and textual information about the sources.
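To make the trader's role concrete, the following is a minimal sketch of source selection, not the UniCats implementation. The class names, the fields (`subjects`, `charges_for_search`, `accepts_search_fees`) and the ranking by overlap of subject areas are all assumptions introduced for illustration; the text only states that traders hold technical and textual information about the sources and that selection is guided by user profiles.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    """What a trader might store about one source (all fields hypothetical)."""
    name: str
    subjects: frozenset[str]   # textual information: subject areas covered
    charges_for_search: bool   # technical information: is searching billed?


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical user profile consulted during selection."""
    interests: frozenset[str]
    accepts_search_fees: bool


def select_sources(profile: UserProfile,
                   catalogue: list[SourceDescription]) -> list[SourceDescription]:
    """Return the sources worth querying, largest subject overlap first."""
    candidates = []
    for source in catalogue:
        overlap = len(profile.interests & source.subjects)
        if overlap == 0:
            continue  # source covers none of the user's interests
        if source.charges_for_search and not profile.accepts_search_fees:
            continue  # user does not want to pay for searching
        candidates.append((overlap, source))
    # Stable sort: sources with equal overlap keep their catalogue order.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [source for _, source in candidates]


if __name__ == "__main__":
    catalogue = [
        SourceDescription("library_catalogue",
                          frozenset({"databases", "networks"}), False),
        SourceDescription("commercial_search_center",
                          frozenset({"databases"}), True),
        SourceDescription("chemistry_index",
                          frozenset({"chemistry"}), False),
    ]
    profile = UserProfile(frozenset({"databases", "networks"}),
                          accepts_search_fees=False)
    print([s.name for s in select_sources(profile, catalogue)])
```

In this sketch the user agent would forward the request only to the sources returned by `select_sources`, so sources that are irrelevant or that the user is unwilling to pay for are never contacted.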
The system mainly relies on wrappers to adapt the data sources such that user agents and traders can function as desired. The wrapper translates the user request to the source and the source's results to the user. User agents and traders expect that a wrapper will carry out a request in its entirety, without further interaction on their part. If we assume that the sources are provided in the form of a set of statically or dynamically generated HTML pages, the following problems arise while wrapping information sources:

- The information contributing to the answer for a request is usually distributed over several Web pages. Thus, the wrapper has to navigate the source, collect the required information from the pages, and present only the final result to the user. Interaction with the user must be reduced to a minimum.
- Today more and more commercial search centers like the Fachinformationszentrum Karlsruhe appear on the literature market. They charge a user not only for the delivery of documents, but also for searching their database. Their search fees are typically based on processing time and result size and are thus hard to predict. The search has to be stopped when a user-given limit is reached, possibly with an empty result. To avoid this, a pre-calculation of costs based on metadata and data about previous requests is desirable.
- The functionality provided by sources varies considerably. A uniform wrapper for all information sources would have to cover login handling, security, payment, registration, metadata collection, and result formatting.
- The web pages change their layout frequently. A wrapper accessing a commercial database must check for such changes and make sure it does not spend money on queries whose result pages it is unable to parse.
- Another problem is the wrapper construction. E.g., a university library, which usually has a lot of sources included in its search system, must be able to generate new and modify existing wrappers quickly.

After presenting some other projects in this subject area, we will show our approach to wrapping and wrapper construction for information sources in an electronic commerce environment. Of course these wrappers could also be used for conventional, non-profit information sources.

2 Related Work

Querying Web sources and retrieving data from semistructured and structured Web sources has become more and more important and receives attention in the database literature (see [4] for a survey). Some projects cover the extraction of information from HTML pages [8, 10, 7]. Most of these use their
