Tuesday, September 14, 2004

 

 

 

 

 

Sep 14

4 p.m. - 5:30 p.m.

Web Intelligence, World Knowledge and Fuzzy Logic

Location: 405 Soda Hall
Seminar Speaker Name: L. A. Zadeh
Seminar Speaker Affil.: BISC Program; EECS, UC Berkeley
Seminar Series: BISC seminar
For more information: http://www-bisc.cs.berkeley.edu

Details:
Existing search engines—with Google at the top—have many remarkable capabilities; but what is not among them is deduction capability—the capability to synthesize an answer to a query from bodies of information which reside in various parts of the knowledge base.
In recent years, impressive progress has been made in enhancing performance of search engines through the use of methods based on bivalent logic and bivalent-logic-based probability theory. But can such methods be used to add nontrivial deduction capability to search engines, that is, to upgrade search engines to question-answering systems? A view which is articulated in this note is that the answer is “No.” The problem is rooted in the nature of world knowledge, the kind of knowledge that humans acquire through experience and education.
It is widely recognized that world knowledge plays an essential role in assessment of relevance, summarization, search and deduction. But a basic issue which is not addressed is that much of world knowledge is perception-based, e.g., “it is hard to find parking in Paris,” “most professors are not rich,” and “it is unlikely to rain in midsummer in San Francisco.” The problem is that (a) perception-based information is intrinsically fuzzy; and (b) bivalent logic is intrinsically unsuited to deal with fuzziness and partial truth.
To come to grips with the fuzziness of world knowledge, new tools are needed. The principal new tool, which is briefly described in this note, is Precisiated Natural Language (PNL). PNL is based on fuzzy logic and has the capability to deal with partiality of certainty, partiality of possibility and partiality of truth. These are the capabilities that are needed to be able to draw on world knowledge for assessment of relevance, and for summarization, search and deduction.
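
As a rough illustration of what partial truth means for a proposition such as "most professors are not rich," the sketch below scores it with a fuzzy quantifier. It is not PNL itself. The membership functions `not_rich` and `most`, their breakpoints, and the sample incomes are all assumptions made up for this example.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership rising from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def not_rich(income):
    # Assumed breakpoints: fully "not rich" at or below 100k,
    # fully "rich" at or above 300k.
    return 1.0 - ramp(income, 100_000, 300_000)


def most(proportion):
    # Assumed breakpoints: "most" begins to hold above 50%
    # and holds fully at 90% or more.
    return ramp(proportion, 0.5, 0.9)


def truth_of_most_not_rich(incomes):
    # Relative count of a fuzzy set: the average membership in "not rich".
    proportion = sum(not_rich(x) for x in incomes) / len(incomes)
    return most(proportion)


incomes = [80_000, 95_000, 120_000, 150_000, 400_000]  # hypothetical data
degree = truth_of_most_not_rich(incomes)
```

With these made-up numbers the proposition holds to a degree of about 0.575, which is neither true nor false in the bivalent sense.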

Tuesday, September 21, 2004

 

 

 

 

 

Sep 21

4 p.m. - 5:30 p.m.

Text Summarization -- In Search of Effective Ideas and Techniques

Location: 405 Soda Hall
Seminar Speaker Name: Shuhua Liu
Seminar Speaker Affil.: Åbo Akademi University, Finland, and BISC Program; EECS, UC Berkeley
Seminar Series: BISC seminar
For more information: http://www-bisc.cs.berkeley.edu

Details:
The dominant approach to text summarization is selection-based, by
which the most content-bearing sentences or passages are identified
and selected to compose a summary. However, the results from such a
process often suffer from flaws such as incoherent content and poor
readability due to unclear relationships between the selected text
excerpts, dangling references, and so on. In our work we follow a
process model of text summarization based on topic analysis. By
directing the content selection process with the identified topic
structure of the text, the logical relation between the selected
sentences is expected to be captured and better presented, which will
also be a valuable resource for the understanding of text meaning. Key
issues and relevant techniques for the implementation of the model
will be discussed.

Tuesday, September 28, 2004

 

 

 

 

 

Sep 28

4 p.m. - 5:30 p.m.

Decision Tree using Evolutionary Techniques: GA-GP-Based Fuzzy Decision Tree Model

Location: 405 Soda Hall
Seminar Speaker Name: Souad Souafi Bensafi
Seminar Speaker Affil.: BISC Program, EECS Department, UC Berkeley
Seminar Series: BISC
For more information: http://www-bisc.cs.berkeley.edu

Details:
Our aim is to develop intelligent computing techniques that address the problem of multi-criteria decision making with subjective and imprecise data. This kind of problem requires the design of intelligent systems able to replace a human with expertise in a specific domain in a decision making process. The intelligent system should therefore take into account the subjective and imprecise character of the data on the one hand, and represent the preferences and knowledge of the user or expert on the other. For this purpose, we developed a generic multi-criteria model based on fuzzy logic concepts for decision support systems. Our goal is to build such a model by (1) fitting real-world data and (2) representing the preferences of domain-specific users or experts. Toward this end, we used evolutionary computation techniques. Initially, we worked on a first-order aggregation model and performed its learning using genetic algorithms; in this model the preferences are represented by a weighting vector associated with the variables involved in the aggregation process. This has been used in a specific application related to university admissions. Then, we developed a more advanced multi-aggregation model based on hierarchical decision trees, and for the learning process of this model we developed a technique inspired by genetic programming. In this model, tree nodes represent aggregators, terminals or leaves correspond to variables, and weight values are attached to the child branches of each aggregator. The aggregation result over all the variables is then obtained by recursively evaluating the root aggregator of the tree.
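
The recursive evaluation described in the last sentences of the abstract can be sketched as follows. This is only an illustration under stated assumptions: the class names, the two aggregators (`weighted_mean` and a weighted minimum), the admissions-style variable names, and all numbers are hypothetical, and the genetic learning of the tree shape and weights is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    variable: str  # name of an input criterion, scored in [0, 1]


@dataclass
class Aggregator:
    combine: Callable[[List[float], List[float]], float]  # (values, weights) -> value
    children: List[Tuple[float, "Node"]]  # (branch weight, child node)


Node = Union[Leaf, Aggregator]


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_min(values, weights):
    # One common form of weighted fuzzy "and" (weights in [0, 1]): a low
    # weight lifts the value toward 1, so that child constrains the result less.
    return min(max(v, 1.0 - w) for v, w in zip(values, weights))


def evaluate(node: Node, inputs: Dict[str, float]) -> float:
    """Evaluate the tree bottom-up, starting from the given node."""
    if isinstance(node, Leaf):
        return inputs[node.variable]
    weights = [w for w, _ in node.children]
    values = [evaluate(child, inputs) for _, child in node.children]
    return node.combine(values, weights)


# Hypothetical admissions tree: GPA "and" a weighted mean of essay and interview.
soft_skills = Aggregator(weighted_mean, [(2.0, Leaf("essay")), (1.0, Leaf("interview"))])
tree = Aggregator(weighted_min, [(1.0, Leaf("gpa")), (0.5, soft_skills)])
score = evaluate(tree, {"gpa": 0.7, "essay": 0.9, "interview": 0.6})
```

In a learned model, a genetic programming style search would vary both the tree structure and the branch weights. Here they are fixed by hand.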

Tuesday, October 5, 2004

 

 

 

 

 

Oct 5

4 p.m. - 5:30 p.m.

The Failure of Clustering in Search User Interfaces

Location: 405 Soda Hall
Seminar Speaker Name: Marti Hearst
Seminar Speaker Affil.: SIMS, UC Berkeley
Seminar Series: BISC
For more information: http://www-bisc.cs.berkeley.edu

Details:
Time and again, researchers (including the speaker, long ago) have proposed using
text or image clustering for search user interfaces, despite the fact that
usability studies consistently show negative results for general search (as
opposed to for analysis). In this talk I will summarize the evidence against
using clustering in general search user interfaces and describe why it fails.
To avoid this being an entirely negative talk, I will point the way towards what
appears to be the solution, which is the flexible and intuitive presentation of
categories in search user interfaces.


Speaker Bio:

Dr. Marti Hearst is an associate professor in SIMS, with an affiliate
appointment in the CS Division at UCB. Her primary research interests are user
interfaces and visualization for information retrieval, empirical computational
linguistics, and text data mining. She received BA, MS, and PhD degrees in
Computer Science from the University of California at Berkeley, and she was a
Member of the Research Staff at Xerox PARC from 1994 to 1997. Prof. Hearst is
on the editorial boards of ACM Transactions on Information Systems and ACM
Transactions on Computer-Human Interaction and was formerly on the boards of
Computational Linguistics and IEEE Intelligent Systems, and was the program
co-chair of HLT-NAACL '03 and SIGIR '99. She has received an NSF CAREER award,
an IBM Faculty Award, an Okawa Foundation Fellowship, and two student-initiated
Excellence in Teaching awards.

Tuesday, Oct 26, 2004

 

Oct 26

Tuesday, 1:30-2:30pm

Semantics – the implicit, the formal and the powerful; (with a case study in Glycomics)

Location: 606 Soda Hall
Seminar Speaker Name: Amit Sheth
Seminar Speaker Affil.: Large Scale Distributed Information Systems (LSDIS) lab, Univ. of Georgia
Seminar Series: BISC
For more information: http://www-bisc.cs.berkeley.edu

Details:

Semantics has been recognized as the key to the next generation of more powerful information systems for better search, integration, question/answering as well as analysis/discovery. Semantics has long been studied in many disciplines including linguistics, AI, IR, information and database systems, and soft computing, and a rich variety of approaches, techniques and tools have been developed. More recently, the Semantic Web community has made a concerted effort to use semantics by defining standards for the modeling of knowledge based on Description Logic (DL) languages such as OWL, and has focused on corresponding reasoning techniques that have a rather narrow set of applications. We view these recent approaches as addressing a subset of challenges, complemented by techniques that deal with a broad variety of information and knowledge, expressiveness, computational capabilities and computability. In this talk we attempt to organize this broad variety of options from a pedagogic perspective that characterizes semantic approaches as implicit (such as those based on statistical and machine learning), formal (such as those based on DL to FOL) and powerful (such as those based on soft computing).
 
 
To exemplify this perspective, we look at some examples from the domain of life sciences, which offers a rich set of challenges due to the complexity of biological systems. More specifically, we look at our current research in Glycomics, which involves the requirements for creation of taxonomies and ontologies with more expressive representations, semantic annotation of textual data in heterogeneous formats as well as machine generated scientific data, semantic search of scientific literature, semantic integration of heterogeneous textual and scientific (nontextual) data, wrapping of data analysis tools with semantically annotated Web Services, and development of semantic web processes leading to better and quicker interpretation/analytics and discovery. In particular, we will offer concrete examples demonstrating the need for more expressive representation that follows the ideas offered by Prof. Zadeh in “Toward a perception-based theory of probabilistic reasoning with imprecise probabilities”. Among the novel research outcomes are automatic taxonomy generation in Taxaminer, development of a comprehensive ontology for Glycomics called GlycO, early efforts in semantic annotation of machine generated scientific data, and preliminary ideas about fpOWL, an extension to the ontology language OWL that allows probabilistic and fuzzy reasoning. See project at the LSDIS lab for more information.
 
 
Acknowledgement: Will York, Christopher Thomas, and other members of Bioinformatics for Glycan Express project team; Cartic Ramakrishnan and members of Taxaminer team. 
 
 
 
About the speaker:  
 
Amit Sheth is an Educator, Researcher and Entrepreneur. He joined the University of Georgia and started the LSDIS lab in 1994. Earlier, he served in R&D groups at Bellcore (now Telcordia Technologies), Unisys, and Honeywell. In August 1999, Sheth founded Taalee, Inc., a VC funded enterprise software and internet infrastructure startup based on the technology developed at the LSDIS lab. He managed Taalee as its CEO until June 2001. Following Taalee's acquisition/merger, he serves as the CTO and co-founder of Semagix, Inc. (formerly Voquette, Inc.). His research has led to several commercial products and applications. He has published over 175 papers and articles (in the areas of semantic interoperability, federated databases, workflow management, Semantic Web), given over 130 invited talks and colloquia including 19 keynotes, (co)-organized/chaired twelve conferences/workshops, and served on over 90 program committees. He is a member of W3C Advisory Committee, SWSA, etc. http://lsdis.cs.uga.edu/~amit and http://www.semagix.com/company_team.html

 

Tuesday, Nov 9, 2004

 

Nov 9

4:00-5:30pm

Web Search as a Computational Challenge

Location: 306 Soda Hall
Seminar Speaker Name: Peter Norvig
Seminar Speaker Affil.: Google Inc
Seminar Series: BISC Seminar
For more information: http://www-bisc.cs.berkeley.edu

Details:
For users of the Internet, web search has emerged as the second most
popular application, after email. For computer scientists, web search
offers challenges in software infrastructure, distributed systems,
information retrieval, machine learning, and natural language
understanding. This talk will examine these challenges.

Short Bio:
===========
Peter Norvig is the Director of Search Quality at Google Inc. He is a Fellow and Councilor of the American Association for Artificial Intelligence and co-author of Artificial Intelligence: A Modern Approach, the leading textbook in the field.
Previously he was head of the Computational Sciences Division at NASA Ames Research Center, where he oversaw a staff of 200 scientists performing NASA's research and development in autonomy and robotics, automated software engineering and data analysis, neuro-engineering, collaborative systems research, and simulation-based decision-making. Before that he was Chief Scientist at Junglee, where he helped develop one of the first Internet comparison shopping services; Chief Designer at Harlequin Inc; and Senior Scientist at Sun Microsystems Laboratories.

Dr. Norvig received a B.S. in Applied Mathematics from Brown University and a Ph.D. in Computer Science from the University of California at Berkeley. He has been a Professor at the University of Southern California and a Research Faculty Member at Berkeley. He has over fifty publications in various areas of Computer Science, concentrating on Artificial Intelligence, Natural Language Processing and Software Engineering including the books Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX.

 

 

Wednesday, November 10, 2004

 

 Nov 10

 1:00-2:30pm

 Social network analysis of text

Location: 380 Soda Hall

Seminar Speaker Name: Dragomir R. Radev

Seminar Speaker Affil.: University of Michigan

Seminar Series: BISC seminar

For more information: http://www-bisc.cs.berkeley.edu

 

Details:

Textual data is everywhere, in email and scientific papers, in online newspapers and e-commerce sites. The Web contains more than 200 terabytes of text, not even counting the contents of dynamic textual databases. This enormous source of knowledge is seriously underexploited. Textual documents on the Web are very hard to model computationally: they are unstructured, time-dependent, collectively authored, and of uneven importance. Traditional grammar-based techniques don't scale up to address such problems. Novel representations and analytical tools are needed.

 

NewsInEssence (www.newsinessence.com) is a system that crawls the Web for news, automatically clusters the articles by topic, and produces user-defined extractive summaries of each cluster. A recent addition to the battery of summarization algorithms available to NewsInEssence is the Cosine Centrality method. In this talk I will describe how one can apply the theory of social networks and stochastic processes (in particular rank-based prestige and random walks on undirected graphs) to multi-document text summarization.
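
To make the idea of prestige from a random walk on a sentence similarity graph concrete, here is a minimal sketch. It is not the NewsInEssence implementation. The bag-of-words cosine, the edge threshold, the damping factor, and the sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centrality(sentences, threshold=0.1, damping=0.85, iterations=50):
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(vectors)
    # Undirected graph: link two sentences if their cosine exceeds the threshold.
    adj = [[1.0 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Probability mass arriving at i from neighbours that step uniformly.
            walk = sum(adj[j][i] * scores[j] / degree[j]
                       for j in range(n) if degree[j])
            new_scores.append((1.0 - damping) / n + damping * walk)
        scores = new_scores
    # Isolated sentences pass no mass on, so scores need not sum to 1 here.
    return scores


sentences = [  # hypothetical cluster of news sentences
    "storm hits coast",
    "storm hits coast towns and closes roads",
    "roads closed",
    "election results announced",
]
scores = centrality(sentences)
best = max(range(len(sentences)), key=lambda i: scores[i])
```

An extractive summary would then take the highest-scoring sentences. In this toy cluster the second sentence is linked to two others and ranks first, while the unrelated fourth sentence ranks last.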

 

(I will begin my talk with a short tutorial on the mathematics needed for the rest of the talk.)

If time permits, at the end of the talk, I will quickly describe two recent ongoing projects in my research group: one on machine learning for object classification using random walks on bipartite (feature-object) graphs and another on using phylogenetic techniques for fact tracking in evolving multi-document summarization.

 

--------------------------------------------

Short Bio:

==========

 

Dragomir R. Radev is Assistant Professor of Information, Electrical Engineering and Computer Science, and Linguistics at the University of Michigan, Ann Arbor. He leads the CLAIR (Computational Linguistics And Information Retrieval) group, which currently includes 12 undergraduate and graduate students. Dragomir holds a Ph.D. in Computer Science from Columbia University. Before joining Michigan, he was a Research Staff Member at IBM's TJ Watson Research Center in Hawthorne, NY. He is the author of more than 45 papers on information retrieval, text summarization, graph models of the Web, question answering, machine translation, text generation, and information extraction. Dr. Radev's current research on probabilistic and link-based methods for exploiting very large textual repositories, representing and acquiring knowledge of genome regulation, and semantic entity and relation extraction from Web-scale text document collections is supported by NSF and NIH. Dragomir serves on the HLT-NAACL advisory committee, was recently reelected as treasurer of NAACL, is a member of the editorial boards of JAIR and Information Retrieval, and is a four-time finalist at the ACM international programming finals (as contestant in 1993 and as coach in 1995-1997). Dragomir received a graduate teaching award at Columbia and, recently, the U. of Michigan award for Outstanding Research Mentorship (UROP).

 

 

Tuesday, Nov 30, 2004; 405 Soda Hall, 4:00-5:30pm

 

Tuesday, Nov 30

4:00-5:30pm

Title: Uncertainty in an unknown world

Prof. Stuart Russell

Computer Science Division, University of

 

Description: Recent advances in knowledge representation for probability models have allowed for uncertainty about the properties of objects and the relations that might hold among them. Such models, however, typically assume exact knowledge of which objects exist and of which object is which; that is, they assume *domain closure* and *unique names*. These assumptions greatly simplify the sample space for probability models, but are inappropriate for many real-world situations. This talk presents a formal language, BLOG, for defining probability models over worlds with unknown objects and in which several terms may refer to the same object. The language has a simple syntax based on first-order logic, combined with local probability functions for quantifying conditional dependencies. A key additional element is the *number* statement, which specifies a conditional distribution over the number of objects that satisfy a given property. Subject to certain acyclicity constraints, every BLOG model specifies a unique probability distribution over the full set of possible worlds for the first-order language. Furthermore, complete inference algorithms exist for a large fragment of the language. I will present several example models and discuss interesting issues arising from the treatment of evidence in such languages.
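
The role of a number statement can be sketched in ordinary Python, without using BLOG syntax. The urn scenario, the uniform prior on the number of balls, the two colors, and the use of rejection sampling are all assumptions made for this illustration. In particular, rejection sampling is a simple stand-in and not one of the inference algorithms the talk refers to.

```python
import random


def sample_world(rng, n_draws):
    # Analogue of a number statement: how many balls exist is itself uncertain.
    n_balls = rng.randint(1, 5)  # assumed uniform prior on 1..5
    colors = [rng.choice(["blue", "green"]) for _ in range(n_balls)]
    # Each draw refers to some ball. Two draws may refer to the same ball,
    # so distinct terms need not denote distinct objects.
    drawn = [rng.randrange(n_balls) for _ in range(n_draws)]
    observed = [colors[b] for b in drawn]
    return {"n_balls": n_balls, "drawn": drawn, "observed": observed}


def posterior_over_count(evidence, samples=20000, seed=0):
    # Rejection sampling: keep only worlds whose observations match the evidence.
    rng = random.Random(seed)
    counts = {}
    for _ in range(samples):
        world = sample_world(rng, len(evidence))
        if world["observed"] == evidence:
            counts[world["n_balls"]] = counts.get(world["n_balls"], 0) + 1
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


posterior = posterior_over_count(["blue", "green", "blue", "green"])
```

Seeing both colors rules out the one-ball world, so the estimated posterior places no mass on a count of 1. The evidence changes beliefs about which objects exist, not only about their properties.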

Tuesday, Nov 30, 2004; 405 Soda Hall, 4:00-5:30pm

 

Nov 30,  2004

4:00-5:30pm; 405 Soda Hall

Recent Research in Cross-language Document Search

 

Fredric Gey

University of California, Berkeley

 

Abstract

 

Cross-language document search research has been underway for more than 10 years now and while much progress has been made, certain research challenges remain. This talk will review recent research in cross-language information retrieval, including the 2004 evaluation workshops: NTCIR for Asian language retrieval in Japan (http://research.nii.ac.jp/ntcir-ws4/index.html) and CLEF for European language retrieval (http://clef.iei.pi.cnr.it:2002/), as well as the U.S. DARPA “Hindi Surprise Language Exercise” of 2003. Topics to be covered include:

 

      Language-specific processing (stemming, segmentation, stop-words)
      Word decompounding for German
      Translation disambiguation for bilingual dictionaries
      Parallel corpora induced lexicons
      Web corpora usage for out-of-vocabulary translation
      Special retrieval tasks (Patent Retrieval, Cross-language question answering)
      Geographic information retrieval
      Challenges of less-commonly taught languages
      The road ahead in cross-language information retrieval research

 

 

 

Presenter: Dr. Fredric Gey has been doing research in cross-language information retrieval since 1998. He and his associates have participated in every cross-language information retrieval evaluation in the United States, Japan and Europe. Currently he is working on retrieval (including geographic information retrieval) of Russian language corpora and other digital objects. Dr. Gey co-chaired the English-Arabic retrieval evaluation track at the TREC conferences in 2001 and 2002. He co-chaired a workshop on “Cross-language Information Retrieval Research: The Road Ahead” at the ACM SIGIR-2002 conference in Finland. He is co-author of the entry on “Multilingual Information Retrieval” in the Encyclopedia of Library and Information Science and co-editor of a forthcoming special issue on Cross-Language Information Retrieval of the Information Processing and Management Journal.


