notes.txt
(continued at:
http://bot.blog-city.com/ )
I'm looking for something along the lines of TopicMaps or
XFML for topic hierarchies, but I also need to be able to
optionally associate a set of weighted regular expressions
with each topic. This lets the topics do other "concept-based"
things, such as automatically assigning topics to documents.
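A minimal sketch of what I mean, in Python. The structure is my own assumption, not TopicMaps or XFML: a `Topic` node holds child topics plus optional (regex, weight) pairs, a topic's score is the sum of weight times match count, and `classify` returns every topic path whose score clears a threshold. The drug topics, patterns and weights are made-up examples.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A node in a topic hierarchy with optional weighted regular expressions."""
    name: str
    patterns: list = field(default_factory=list)  # (compiled regex, weight) pairs
    children: list = field(default_factory=list)

    def add_pattern(self, regex, weight):
        # Case-insensitive matching is an assumption, not part of any standard.
        self.patterns.append((re.compile(regex, re.IGNORECASE), weight))
        return self

    def add_child(self, child):
        self.children.append(child)
        return child

    def score(self, text):
        # Sum of weight * number of matches, using this topic's own patterns only.
        return sum(w * len(rx.findall(text)) for rx, w in self.patterns)

    def walk(self, path=()):
        # Yield (path, topic) for this node and every descendant, depth-first.
        here = path + (self.name,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


def classify(root, text, threshold=1.0):
    """Return (topic path, score) pairs at or above threshold, best first."""
    hits = [("/".join(p), t.score(text)) for p, t in root.walk()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Hypothetical example topics and weights.
drugs = Topic("drugs")
stimulants = drugs.add_child(Topic("stimulants"))
stimulants.add_pattern(r"\bcocaine\b", 2.0).add_pattern(r"\bamphetamines?\b", 1.5)
opiates = drugs.add_child(Topic("opiates"))
opiates.add_pattern(r"\bheroin\b", 2.0).add_pattern(r"\bmorphine\b", 1.0)

print(classify(drugs, "Police seized cocaine and amphetamine pills."))
# [('drugs/stimulants', 3.5)]
```

Parent topics only score on their own patterns here; rolling child scores up into parents would be an easy variation.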
=================================================
Notes: Concept-based systems, Thesauri, TopicMaps
=================================================
Word Wranglers
Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources
http://www.intelligentkm.com/feature/010101/feat1.shtml
SVMlight
Support Vector Machine
http://svmlight.joachims.org/
http://www.econtentmag.com/r5/2002/reamy11_02.html
Auto-Categorization: Coming to a Library or Intranet Near You!
(Nov 2002)
--
''[BOT] Concept Dictionary. This is a set of topics and
subtopics about drugs. Manually, a set of weighted generating
terms were added. A spider searches the web for stories related
to drugs and automatically assigns topics to these stories. The
end result gets exported in this XFML feed, which is the first
known case of someone using the occurrence strength concept
which lets you indicate how much trust you have in the
occurrence. I say cool.''
http://poorbuthappy.com/ease/archives/2002/12/07/611/bot-concept-dictionary-this-is
Yay! Someone gets it :-)
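One way an occurrence strength could be derived from weighted generating terms, sketched in Python. Assumptions: strength is the fraction of a topic's total term weight found in a story, terms are single lowercase words, the `GENERATING_TERMS` table and `min_strength` threshold are hypothetical, and the element and attribute names (`topic`, `page`, `occurrence`, `topicid`, `strength`) only approximate XFML; check them against the spec at xfml.org before relying on them.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical generating terms: topic id -> {single lowercase word: weight}.
GENERATING_TERMS = {
    "cannabis": {"cannabis": 1.0, "marijuana": 1.0, "hemp": 0.3},
    "heroin": {"heroin": 1.0, "opiate": 0.5},
}


def occurrence_strength(text, terms):
    """Share of the topic's total term weight whose terms appear in the text (0..1)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    total = sum(terms.values())
    found = sum(w for term, w in terms.items() if term in words)
    return found / total if total else 0.0


def build_feed(stories, min_strength=0.2):
    """stories: iterable of (url, title, text). Returns an XFML-like XML string."""
    root = ET.Element("xfml", version="1.0")
    for topic_id in GENERATING_TERMS:
        topic = ET.SubElement(root, "topic", id=topic_id)
        ET.SubElement(topic, "name").text = topic_id
    for url, title, text in stories:
        strengths = {t: occurrence_strength(text, terms)
                     for t, terms in GENERATING_TERMS.items()}
        hits = {t: s for t, s in strengths.items() if s >= min_strength}
        if not hits:
            continue  # skip stories that match no topic strongly enough
        page = ET.SubElement(root, "page", url=url)
        ET.SubElement(page, "title").text = title
        for topic_id, s in hits.items():
            ET.SubElement(page, "occurrence", topicid=topic_id, strength=f"{s:.2f}")
    return ET.tostring(root, encoding="unicode")


print(build_feed([("http://example.com/raid", "Raid",
                   "Officers found marijuana and hemp seeds.")]))
```

The strength here is a crude "how much of the evidence did we see" measure; it says nothing about how trustworthy each term is beyond the hand-assigned weights.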
http://www.wikipedia.org/wiki/Ontology_(computer_science)
The next scenario I can imagine is as a way
of producing a consolidated "on-topic" feed from
a number of other feeds. Combined with technology
to scrape RSS from sites and databases and with a
little automagic to add topics where they don't
exist, this could be very powerful.
http://radio.weblogs.com/0107808/2002/11/28.html#a577
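A rough sketch of the consolidated "on-topic" feed idea, in Python with only the standard library. It assumes the source feeds are RSS 2.0 strings that have already been fetched (or scraped), treats an item as on-topic if its title or description contains one of the given keywords, drops duplicate links, and writes a minimal RSS 2.0 channel. Plain keyword matching stands in for the "automagic" topic assignment; the function names and the placeholder channel link are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def rss_items(rss_xml):
    """Yield (title, link, description) for each <item> in an RSS 2.0 string."""
    for item in ET.fromstring(rss_xml).iter("item"):
        yield (item.findtext("title", ""), item.findtext("link", ""),
               item.findtext("description", ""))


def on_topic_feed(feeds, keywords, title="On-topic", link="http://example.com/"):
    """Merge RSS 2.0 feeds into one feed of items that mention any keyword."""
    rx = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b",
                    re.IGNORECASE)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link  # placeholder channel link
    ET.SubElement(channel, "description").text = "Items matching: " + ", ".join(keywords)
    seen = set()
    for feed in feeds:
        for item_title, item_link, desc in rss_items(feed):
            # Skip duplicates across feeds and anything off-topic.
            if item_link in seen or not rx.search(item_title + " " + desc):
                continue
            seen.add(item_link)
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = item_title
            ET.SubElement(item, "link").text = item_link
            ET.SubElement(item, "description").text = desc
    return ET.tostring(rss, encoding="unicode")
```

Swapping the regex test for a weighted-topic classifier would turn this from a keyword filter into a proper topic-based aggregator.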
[!!!!!]
''The idea is to produce a facility within liveTopics for
suggesting topics based upon the text entered in a post.
At the moment there is a simple facility based upon a
word search for existing topics, I'm keen to improve upon
that in the future.''
http://radio.weblogs.com/0107808/2002/09/08.html#a365
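The "simple facility based upon a word search for existing topics" might look roughly like this. It is my guess at the approach, not liveTopics' code: normalise the post and each topic name to lowercase words, count whole-phrase occurrences of each topic name, and suggest the most frequent topics first. `suggest_topics` and its `limit` parameter are hypothetical names.

```python
import re
from collections import Counter


def words(text):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def suggest_topics(post, existing_topics, limit=5):
    """Suggest existing topics whose names occur in the post, most frequent first."""
    text = " ".join(words(post))
    counts = Counter()
    for topic in existing_topics:
        phrase = " ".join(words(topic))
        if not phrase:
            continue
        n = len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        if n:
            counts[topic] = n
    return [topic for topic, _ in counts.most_common(limit)]


post = "Topic maps and XFML: why topic maps matter. Also RSS, RSS, RSS."
print(suggest_topics(post, ["Topic Maps", "XFML", "RSS", "FOAF"]))
# ['RSS', 'Topic Maps', 'XFML']
```

An obvious improvement is to attach weighted patterns to each topic instead of matching only the topic's own name.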
Untangling Text Data Mining (TDM), LINDI Project,
http://www.sims.berkeley.edu/~hearst/papers/acl99/acl99-tdm.html
http://poorbuthappy.com/
Interesting RSS, XFML, FOAF discussions
Syndication News from Bill Kearney
http://www.syndic8.com/~wkearney/blogs/syndic8/
Mind your phraseology!
Using controlled vocabularies to improve findability
http://www.digital-web.com/tutorials/tutorial_2002-08.shtml
XFML Core is an open XML format for publishing and sharing
hierarchical faceted metadata and indexing efforts. XFML Core is
lightweight and easy to implement, yet uniquely powerful.
http://xfml.org/
XML Topic Maps
http://www.topicmaps.org/xtm/1.0/
http://www.topicmap.com/
==>
http://www.ontopia.net/topicmaps/materials/tao.html
*****
Ontology for describing topic hierarchies
http://www.daml.org/ontologies/135
http://daml.umbc.edu/ontologies/topic-ont.daml
http://babage.dia.fi.upm.es/ontoweb/wp1/OntoRoadMap/show_onto.jsp?onto_name=topic-ont.daml
Concept-Based Systems
----------------------
ArchiText .... Excite (history)
http://www.redherring.com/mag/issue19/inside.html
Sun Labs: Conceptual Indexing/Retrieval
Improving your ability to find information online
Conceptual Indexing for Precision Content Retrieval
http://research.sun.com/research/knowledge/index.html
DCARS (Document Content Analysis and Retrieval System)
http://www.cs.buffalo.edu/~thies/Calspan/report.html
As military organizations continue to collect and store
increasing amounts of electronic textual documents, users need
automated tools to facilitate precise, timely retrieval. The
Information Directorate, in collaboration with the National Air
Intelligence Center (NAIC), developed the Document Content
Analysis and Retrieval System (DCARS) to permit users to
retrieve text documents with increased speed and precision from
the Central Information Reference and Control Repository. DCARS
uses a state-of-the-art search engine that includes several
dictionaries and a thesaurus. The directorate's Global
Information Base Branch and NAIC developed tools to convert
military thesaurus and acronym lists for easy integration into
DCARS, adding a higher-recall capability. Directorate
engineers extended DCARS technology to the World Wide Web as
WebDCARS. DCARS/WebDCARS provides full text document search and
retrieval along with natural-language and Boolean searches.
http://www.afrlhorizons.com/0001/t.html
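To make the "higher recall" point concrete, here is a generic sketch of thesaurus and acronym query expansion in Python. It is not DCARS's actual implementation; the `ACRONYMS` and `THESAURUS` entries are invented, and the output is just a Boolean query string that ORs each term with its variants and ANDs the groups together.

```python
# Hypothetical entries standing in for a military thesaurus and acronym list.
ACRONYMS = {"naic": ["national air intelligence center"],
            "sam": ["surface-to-air missile"]}
THESAURUS = {"aircraft": ["airplane", "plane", "jet"],
             "missile": ["rocket", "projectile"]}


def expand_term(term):
    """Return the term plus any acronym expansions and thesaurus synonyms."""
    t = term.lower()
    variants = [t] + ACRONYMS.get(t, []) + THESAURUS.get(t, [])
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def expand_query(query):
    """Turn 'a b' into a Boolean query: (a OR a1 ...) AND (b OR b1 ...)."""
    groups = []
    for term in query.split():
        # Quote multi-word variants so they are searched as phrases.
        alts = ['"%s"' % v if " " in v else v for v in expand_term(term)]
        groups.append("(" + " OR ".join(alts) + ")")
    return " AND ".join(groups)


print(expand_query("SAM aircraft"))
# (sam OR "surface-to-air missile") AND (aircraft OR airplane OR plane OR jet)
```

Expansion trades precision for recall: every added variant can pull in documents the user did not mean, which is why curated thesauri matter.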
http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=DCARS+concepts
http://www.textanalysis.info/
Concept Based Information Retrieval
http://www.google.com/search?hl=en&q=concept-based-information-retrieval&btnG=Google+Search
APELON ANNOUNCES NEXT-GENERATION HEALTHCARE CONTENT TAGGING TOOL
Software brings new power to content tagging and data mining
''Concept-based indexing translates phrases and words to their
underlying clinical concepts, for example, "heart attack",
"myocardial infarction" and "MI" would all index to the
same concept. Concept-based retrieval translates the user's
search words and phrases to the same set of concepts. The
result is consistent, comprehensive retrieval of the right
healthcare information.''
http://www.apelon.com/news/press_062502.htm
http://www.apelon.com/products/products_conceptbased.htm
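A minimal sketch of concept-based indexing and retrieval along the lines Apelon describes, reusing their "heart attack" / "myocardial infarction" / "MI" example. The lexicon, the concept IDs (`C-MI`, `C-HTN`) and the function names are hypothetical; a real system would use a clinical terminology and handle negation, which this does not. Documents and queries go through the same `concepts()` mapping, so a query using one synonym finds documents that use another.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: surface phrase -> concept ID (IDs are made up).
LEXICON = {
    "heart attack": "C-MI",
    "myocardial infarction": "C-MI",
    "mi": "C-MI",
    "high blood pressure": "C-HTN",
    "hypertension": "C-HTN",
}
# Longer phrases are tried first at each position in the text.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in sorted(LEXICON, key=len, reverse=True)) + r")\b")


def concepts(text):
    """Map every lexicon phrase found in the text to its concept ID."""
    return {LEXICON[m] for m in _PATTERN.findall(text.lower())}


def build_index(docs):
    """Inverted index: concept ID -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for c in concepts(text):
            index[c].add(doc_id)
    return index


def search(index, query):
    """Documents containing every concept in the query (Boolean AND)."""
    qc = concepts(query)
    if not qc:
        return set()
    return set.intersection(*(index.get(c, set()) for c in qc))


docs = {1: "Patient admitted after a heart attack.",
        2: "History of myocardial infarction and hypertension.",
        3: "Routine dental checkup."}
print(search(build_index(docs), "MI"))
# {1, 2}
```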
Enhancing Internet Search Engines to Achieve Concept-based Retrieval
Fenghua Lu, Thomas Johnsten, Vijay Raghavan and Dennis Traylor
http://www.osti.gov/inforum99/papers/csss.html
Theme Extraction - How it Works
Active Navigation theme extraction technology is at the core
of Portal Maximizer's power. Our theme extraction is based on
proven linguistic and statistical mining techniques to ensure
that themes truly reflect the meaning of content and guarantee
that context is maintained throughout the entire information
discovery process.
http://www.multicosm.com/Technology/Technology_Theming.htm
(concept based)
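Active Navigation doesn't say what its "statistical mining techniques" are, so as a stand-in here is plain TF-IDF term ranking in Python, one common statistical way to pull candidate themes out of a document collection. It is a swapped-in generic technique, not Portal Maximizer's method; the stopword list and `top_n` are arbitrary choices.

```python
import math
import re
from collections import Counter

# Arbitrary, tiny stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "with"}


def tokens(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def themes(docs, top_n=3):
    """Return the top_n highest TF-IDF terms for each document in the list."""
    tokenized = [tokens(d) for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    n = len(docs)
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency (normalised by length) times inverse document frequency.
        scores = {w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties: alphabetical
        result.append(ranked[:top_n])
    return result


docs = ["Topic maps describe topics and associations. Topic maps are standard.",
        "RSS feeds syndicate news. RSS is simple.",
        "Topic hierarchies in XFML."]
print(themes(docs))
```

A term that appears in every document gets an IDF of zero, so shared vocabulary drops out and what remains is what distinguishes each document.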
AFRL Enhances Precision Document Retrieval and Electronic Mail Surveillance