Bibliography Agent Community

Glossary

Spider: A robot program that "spiders" (crawls) the Internet to find and collect documents.
BibTeX: A metadata record that describes the bibliographic details of a document, usually a paper.
Agent: A computer program that autonomously performs jobs delegated to it by humans.
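Every agent in the community starts from a BibTeX entry, so it helps to see what "the fields of an entry" means concretely. The following is a minimal sketch, assuming a single, simple entry per string; the function name `parse_bibtex_entry` and the `entrytype`/`key` dictionary keys are hypothetical, and `@string` macros and `#` concatenation are deliberately not handled.

```python
import re

# Hypothetical helper: parses ONE simple BibTeX entry. It does not handle
# @string macros, concatenation with #, or several entries in one string.
_HEAD = re.compile(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")   # @type{key,
_FIELD = re.compile(r"\s*(\w+)\s*=\s*")                  # name =
_BARE = re.compile(r"[^,}\s]+")                          # unquoted value, e.g. 1999
_SEP = re.compile(r"\s*,?")                              # optional comma between fields


def parse_bibtex_entry(text: str) -> dict[str, str]:
    head = _HEAD.match(text)
    if head is None:
        raise ValueError("not a BibTeX entry")
    entry = {"entrytype": head.group(1).lower(), "key": head.group(2)}
    pos = head.end()
    while (field := _FIELD.match(text, pos)) is not None:
        name, pos = field.group(1).lower(), field.end()
        if text[pos] == "{":
            # Braced value: walk forward, tracking nested braces such as {AAAI}.
            depth, start = 0, pos + 1
            while True:
                if text[pos] == "{":
                    depth += 1
                elif text[pos] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                pos += 1
            value, pos = text[start:pos], pos + 1
        elif text[pos] == '"':
            # Quoted value: runs to the next double quote.
            end = text.index('"', pos + 1)
            value, pos = text[pos + 1:end], end + 1
        else:
            bare = _BARE.match(text, pos)
            if bare is None:
                raise ValueError(f"empty value for field {name!r}")
            value, pos = bare.group(0), bare.end()
        # Field names are case-insensitive in BibTeX; collapse line-wrapping whitespace.
        entry[name] = " ".join(value.split())
        pos = _SEP.match(text, pos).end()
    return entry
```

Lower-casing the field names lets later steps look up "author" or "title" regardless of how the entry was typed.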

Mission

Given a BibTeX entry, how can we find the full text or the abstract of the document?

Why agents? Why a community?

First of all, this began as a term project for a course called "Intelligent Agents". There are also several reasons why we think agents are a good approach for this domain. Let me start with an example:

Suppose you are looking for anything about "Agents" written by Jane Yung-jen Hsu and published by AAAI. What would you do? You might:

  1. Use a general-purpose search engine.
  2. Go to the homepage of your nearest library.
  3. Search the AAAI website directly.
  4. ...

As you can see, there are several ways to do it, and each one needs a different search mechanism. We therefore decided to build many specialized agents that cooperate to finish the job. Moreover, agents are autonomous: we want each agent to operate on its own, to keep running, and to adapt itself over time.
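The idea of several specialized agents working on the same request can be illustrated with a small coordinator. This is only a sketch, assuming each agent is modelled as a plain function from parsed BibTeX fields to candidate URLs; the names `Agent` and `ask_agents`, and the "most agents agree first" ranking rule, are hypothetical illustrations rather than the project's actual design.

```python
from typing import Callable

# Hypothetical agent interface: takes parsed BibTeX fields, returns candidate URLs.
Agent = Callable[[dict[str, str]], list[str]]


def ask_agents(agents: dict[str, Agent], entry: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Send the same entry to every specialized agent and merge their answers.

    Returns (url, names_of_agents_that_proposed_it) pairs, most-supported first.
    """
    support: dict[str, list[str]] = {}
    for name, agent in agents.items():
        try:
            candidates = agent(entry)
        except Exception:
            # One agent failing (site down, parse error) should not sink the whole query.
            continue
        for url in candidates:
            names = support.setdefault(url, [])
            if name not in names:
                names.append(name)
    # sorted() is stable, so URLs with equal support keep first-seen order.
    return sorted(support.items(), key=lambda item: -len(item[1]))
```

Keeping the name of every agent that proposed a URL is what would let each agent later learn which of its strategies paid off.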

Spider

Many agents in our community need to search starting from a given URL. Instead of duplicating this work, we decided to build a Spider which, given a BibTeX entry, searches through the indicated web pages or URLs and the pages near them. It then returns the possible links, their scores, how each link was found, and which BibTeX fields were matched, so that the requesting agents can use this feedback to improve themselves.
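The search described above (start from given URLs, look at nearby pages, report link, score, path and matched fields) can be sketched as a breadth-first crawl. This is a minimal sketch under stated assumptions: `fetch` is a hypothetical function returning a page's text and its outgoing links, a field "matches" when its value appears verbatim in the page (case-insensitive), and the score is simply the number of matched fields. The real Spider's scoring is not described here.

```python
from collections import deque
from typing import Callable

# Hypothetical fetch function: url -> (page text, outgoing links).
# A real Spider would use an HTTP client and an HTML parser here.
Fetch = Callable[[str], tuple[str, list[str]]]


def spider(start_urls: list[str], entry: dict[str, str], fetch: Fetch,
           max_depth: int = 2) -> list[dict]:
    """Breadth-first search from start_urls, scoring each page by the BibTeX fields it contains."""
    results = []
    seen = set(start_urls)
    queue = deque((url, [url]) for url in start_urls)   # (url, path of links that led here)
    while queue:
        url, path = queue.popleft()
        try:
            text, links = fetch(url)
        except Exception:
            continue                                    # unreachable page: skip it
        lowered = text.lower()
        # Naive matching: a field matches if its value appears verbatim (case-insensitive).
        matched = [name for name, value in entry.items() if value and value.lower() in lowered]
        if matched:
            results.append({"url": url, "score": len(matched), "path": path, "matched": matched})
        if len(path) - 1 < max_depth:                   # depth of this page = len(path) - 1
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, path + [link]))
    results.sort(key=lambda r: -r["score"])             # best candidates first
    return results
```

The `path` list is one way to record "the way we found it": the chain of links followed from the starting page.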

Task / Input

URL / HTML page: The URL or HTML page from which the search starts. An agent that has more than one starting URL constructs an HTML page containing them and delivers that page to Spider.
MatchingList: Some agents, based on previous feedback, know how to find this document. They tell Spider directly "how to get it"; for example, "try to match 'Author' first, then 'Title', then ...".
ExploreLevel: Some agents are certain that the document resides at a particular depth of the HTML hierarchy; this parameter tells Spider how deep to search.
Timeout: Sometimes the network is simply very slow. As an agent, Spider should always return feedback within a reasonable time, and this limit can be adjusted by the agent that delegated the task.
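The four inputs above fit naturally into one task record. A minimal sketch, assuming Python field names that mirror the parameters (`start_urls`, `matching_list`, `explore_level`, `timeout`); the class name `SpiderTask`, its default values, and the rule "MatchingList fields first, remaining fields after" are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class SpiderTask:
    start_urls: list[str]                                    # URL / HTML page
    matching_list: list[str] = field(default_factory=list)   # MatchingList, e.g. ["author", "title"]
    explore_level: int = 2                                   # ExploreLevel (hypothetical default)
    timeout: float = 30.0                                    # Timeout in seconds (hypothetical default)

    def __post_init__(self) -> None:
        if not self.start_urls:
            raise ValueError("at least one start URL is required")
        if self.explore_level < 0 or self.timeout <= 0:
            raise ValueError("explore_level must be >= 0 and timeout > 0")

    def start_page(self) -> str:
        """Wrap several start URLs into a single HTML page of links for Spider."""
        items = "".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in self.start_urls)
        return f"<html><body><ul>{items}</ul></body></html>"

    def field_order(self, entry: dict[str, str]) -> list[str]:
        """BibTeX fields in the order Spider should try them: MatchingList first, then the rest."""
        ordered = [name for name in self.matching_list if name in entry]
        return ordered + [name for name in entry if name not in ordered]
```

Escaping the URLs matters because query strings often contain `&`, which must appear as `&amp;` inside an HTML attribute.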

Output / Feedback

HTML: To keep the output consistent, Spider returns a formatted HTML page.
Number: The number of links found.
Matched: More than one result may be returned; this field stores the candidate URLs.
MatchedList: For each matched URL, this records how Spider found it.
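One way to produce the formatted feedback is to render the result list into a single HTML page carrying Number, the Matched URLs and, for each, its MatchedList. This is a sketch only: it assumes each result is a dictionary with `url` and `matched` keys, and the function name `render_feedback` and the exact markup are hypothetical, since the actual page layout is not given here.

```python
from html import escape


def render_feedback(results: list[dict]) -> str:
    """Format Spider results (dicts with "url" and "matched") as one HTML page.

    Number -> count of links; each list item is one Matched URL with its MatchedList.
    """
    items = []
    for result in results:
        url = escape(result["url"])
        how = escape(", ".join(result["matched"]))
        items.append(f'<li><a href="{url}">{url}</a> matched: {how}</li>')
    return ("<html><body>"
            f"<p>Number: {len(results)}</p>"
            f"<ol>{''.join(items)}</ol>"
            "</body></html>")
```

An empty result still yields a well-formed page with `Number: 0`, so the requesting agent can parse every reply the same way.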

Techniques used in Spider

  1. Multi-connection: Because the network may be slow, Spider maintains multiple connections at the same time to fetch and evaluate documents.
  2. Agent Communicator: All agents talk to a "Broker" using a special language (a simplified KQML).
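Multiple simultaneous connections combined with the Timeout input can be sketched with a thread pool that keeps whatever pages arrive before the deadline. This assumes a hypothetical blocking `fetch(url)` function; the name `fetch_all` and the default of four connections are illustrative, not taken from the Spider implementation.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def fetch_all(urls: list[str], fetch: Callable[[str], str],
              max_connections: int = 4, timeout: float = 10.0) -> dict[str, str]:
    """Fetch several URLs over parallel connections; keep whatever finishes within `timeout` seconds."""
    pool = ThreadPoolExecutor(max_workers=max_connections)
    futures = {pool.submit(fetch, url): url for url in urls}
    done, not_done = wait(futures, timeout=timeout)
    for future in not_done:
        future.cancel()            # only cancels requests still queued, not ones in flight
    pool.shutdown(wait=False)      # return now instead of waiting for slow connections
    pages = {}
    for future in done:
        if future.exception() is None:   # failed fetches are simply left out
            pages[futures[future]] = future.result()
    return pages
```

Threads suit this job because each connection spends most of its time waiting on the network; one slow or failing server then cannot delay the answer past the deadline.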
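KQML messages are parenthesized expressions made of a performative (such as `ask-all` or `tell`) followed by keyword parameters like `:sender`, `:receiver` and `:content`. The exact simplified dialect used with the Broker is not specified here, so the sketch below is an assumption: every parameter value is written as a double-quoted string, and the helper names `to_kqml` and `from_kqml` are hypothetical.

```python
import re


def to_kqml(performative: str, **params: str) -> str:
    """Render a KQML-style message, e.g. (ask-all :sender "a" :receiver "broker" :content "...")."""
    parts = [performative]
    for key, value in params.items():
        # Escape backslashes and quotes so any content survives the round trip.
        quoted = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f":{key.replace('_', '-')} {quoted}")   # reply_with -> :reply-with
    return "(" + " ".join(parts) + ")"


_PARAM = re.compile(r':([\w-]+)\s+"((?:[^"\\]|\\.)*)"')   # :keyword "quoted value"


def from_kqml(message: str) -> tuple[str, dict[str, str]]:
    """Parse a message produced by to_kqml back into (performative, params)."""
    body = message.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("not a KQML-style message")
    body = body[1:-1].strip()
    performative = body.split(None, 1)[0]
    # Undo the escaping applied in to_kqml.
    params = {key: re.sub(r"\\(.)", r"\1", value) for key, value in _PARAM.findall(body)}
    return performative, params
```

For example, a hypothetical citation agent could send `to_kqml("ask-all", sender="citation-agent", receiver="broker", content=...)` and the Broker would recover the same fields with `from_kqml`.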

Related information

  1. Bibliography Agent is still under development and is advised by Jane Yung-jen Hsu at the Intelligent Robot Lab., Computer Science and Information Engineering Department, NTU, Taiwan.
  2. The Bibliography Agent Community: by Jane Yung-jen Hsu, Tzong-han Tsai, Keh-ming Luoh, and Shih-jui Lin.
  3. Part of Spider was implemented by Bo-heng Lin and Yu-chong Li.

Last Updated : 7/19/2005 by Bo-chieh Yang (bcyang@alumni.cmu.edu)