|
Anthony Tomasic is the US
CEO and co-founder of E-XMLMedia,
an XML-based, enterprise-scale software vendor start-up located in Pittsburgh,
PA and Paris, France. E-XMLMedia was a winner in the French National
Competition for New Ventures in 1999. Before starting E-XMLMedia
(positions listed in reverse chronological order), Anthony was Chief
Technical Officer of Common Object, Inc., a next-generation banner
advertising application service provider. For two years, he was an
engineer for Digital
Integrity, Inc., a digital intellectual property software start-up. For four
years, he was an invited researcher at Dyade, a research and
development consortium established by the Institut National de Recherche en Informatique et en Automatique (INRIA) and
Groupe Bull. He was scientific director for the team of students
and engineers that built Disco, a distributed heterogeneous database system.
During its existence, a paper describing Disco was the most cited data
warehousing paper according to NEC CiteSeer Research Index. In 1994 he was a
post-doctoral researcher at the IBM Almaden Research Center. From 1991 to 1994 he
was research staff for the Database Group at Stanford University. At
Stanford he completed his Ph.D. work on the performance of distributed
information retrieval search engines. He received an MS in 1993 and a Ph.D.
in 1994 from the Department of Computer Science at Princeton University. From
1983 to 1989 he was a researcher at the European Computer-Industry Research Centre (ECRC).
He received a BS from the Department of Computer Science of Indiana University,
in 1983.
Articles
George Mihaila, Louiqa Raschid and Anthony Tomasic, "Locating and
Accessing Data Repositories with WebSemantics," Technical Note,
The VLDB Journal, 11(1), 2002.
Abstract
Many collections of scientific data in particular disciplines are
available today on the World Wide Web. Most of these data sources are
compliant with some standard for interoperable access. In addition,
sources may support a common semantics, i.e., a shared meaning for the
data types and their domains. However, sharing data among a global
community of users is still difficult for the following
reasons: (i) data providers need a mechanism for describing and
publishing available sources of data; (ii) data administrators need a
mechanism for discovering the location of published sources and
obtaining metadata from these sources; and (iii) users need a
mechanism for browsing and selecting sources. This paper describes a
system, WebSemantics, that accomplishes the above tasks. We describe
an architecture for the publication and discovery of scientific data
sources that is an extension of the World Wide Web architecture and
protocols. We support catalogs containing metadata about data sources
for some application domain. We define a language for discovering
sources and querying their metadata. We then describe the
WebSemantics prototype.
Hubert Naacke, Anthony Tomasic, and Patrick Valduriez, "Validating Mediator
Cost Models with Disco," in Networking and Information Systems
Journal, 2(5), 2000.
Abstract
Disco is a mediator system developed at INRIA for accessing
heterogeneous data sources over the Internet. In Disco, mediators
accept queries from users, process them with respect to wrappers, and
return answers. Wrappers provide access to underlying sources. To
efficiently process queries, the mediator performs cost-based query
optimization. In a heterogeneous distributed database, cost-estimate
based query optimization is difficult to achieve because the
underlying data sources do not export cost information. Disco's
approach relies on combining a generic cost model with specific cost
information exported by wrappers. In this paper, we propose a
validation of Disco's cost model based on experimentation with real
Web data sources. This validation shows the efficiency of our generic
cost model as well as the efficiency of more specialized cost functions.
Luis Gravano, Hector Garcia-Molina and Anthony Tomasic, "GlOSS: Text-Source
Discovery over the Internet," in ACM Transactions on Database
Systems, 24(2), 1999.
Abstract
The dramatic growth of the Internet has created a new problem for
users: the location of relevant sources of documents. This article
presents a framework for (and experimentally analyzes a solution to)
this problem, which we call the text-source discovery problem. Our
approach consists of two phases. First, each text source exports its
contents to a centralized service. Then, users present queries to the
service, which returns an ordered list of promising text sources. This
article describes GlOSS (Glossary of Servers Server), with two
versions: bGlOSS, which provides a Boolean query retrieval model, and
vGlOSS, which provides a vector-space retrieval model.
We also present hGlOSS, which provides a decentralized version of the
system. We extensively describe the methodology for measuring the
retrieval effectiveness of these systems and provide experimental
evidence, based on actual data, that all three systems are highly
effective at determining promising text sources for a given query.
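The two phases described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy documents, and the choice to estimate matches by assuming that query words occur independently within a source are all assumptions made for illustration.

```python
from math import prod


def export_summary(documents):
    """Phase 1: a source exports its size and per-word document frequencies."""
    freq = {}
    for doc in documents:
        for word in set(doc.lower().split()):
            freq[word] = freq.get(word, 0) + 1
    return len(documents), freq


def estimate_matches(summary, query_words):
    """Estimate how many documents contain every query word.

    Assumption for this sketch: words occur independently of each other.
    """
    size, freq = summary
    if size == 0:
        return 0.0
    return size * prod(freq.get(w, 0) / size for w in query_words)


def rank_sources(summaries, query_words):
    """Phase 2: order sources by estimated matches, dropping zero estimates."""
    scored = [(estimate_matches(s, query_words), name)
              for name, s in summaries.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


# Hypothetical sources with toy contents.
summaries = {
    "cs_library": export_summary([
        "distributed database systems",
        "database query optimization",
        "text retrieval",
    ]),
    "med_library": export_summary(["clinical database", "patient records"]),
}
ranking = rank_sources(summaries, ["database", "query"])  # ["cs_library"]
```

Only the summaries, not the documents, are held by the central service in this sketch, which is what keeps the service small relative to a full index.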
Anthony Tomasic, Louiqa Raschid and Patrick
Valduriez, "Scaling
Access to Heterogeneous Databases with DISCO," in IEEE
Transactions on Knowledge and Data Engineering, 10(5), 1998.
Abstract
Accessing many data sources aggravates problems for users of
heterogeneous distributed databases. Database administrators must
deal with fragile mediators, that is, mediators with schemas and
views that must be significantly changed to incorporate a new data
source. When implementing translators of queries from mediators to data
sources, database implementors must deal with data sources
that do not support all the functionality required by mediators.
Application programmers must deal with graceless failures for
unavailable data sources.
Queries simply return failure and no further information when
data sources are unavailable for query processing. The Distributed
Information Search COmponent (DISCO) addresses these problems. Data
modeling techniques manage the connections to data sources, and
sources can be added transparently to the users and applications. The
interface between mediators and data sources flexibly handles
different query languages and different data source functionality.
Query rewriting and optimization techniques rewrite queries so they
are efficiently evaluated by sources. Query processing and evaluation
semantics are developed to process queries over unavailable data
sources. In this article we describe (a) the distributed mediator
architecture of DISCO; (b) the data model and its modeling of data source
connections; (c) the interface to underlying data sources and the
query rewriting process; and (d) query processing semantics. We
describe several advantages of our system.
Anthony Tomasic, Luis Gravano, Calvin Lue, Peter Schwarz, and Laura
Haas, "Data Structures for
Efficient Broker Implementation," in
ACM Transactions on Information Systems, 15(3), July 1997.
Abstract
With the profusion of text databases on the Internet,
it is becoming increasingly hard to find the most useful databases for a
given query. To attack this problem, several existing and
proposed systems employ brokers to direct user queries, using a
local database of summary information about the available databases.
This summary information must effectively distinguish
relevant databases, and must be compact while allowing efficient access.
We offer evidence that one broker,
GlOSS, can be effective at locating databases
of interest even in a system of hundreds of databases, and
examine the performance of accessing the GlOSS summaries for two
promising storage methods: the grid file and partitioned hashing.
We show that both methods can be tuned to provide good performance for
a particular workload (within a broad range of workloads), and discuss
the tradeoffs between the two data structures.
As a side effect of our work, we show that grid
files are more broadly applicable than previously thought; in
particular, we show that by varying the policies used to construct the
grid file we can provide good performance for a wide range of
workloads even when storing highly skewed data.
Anthony Tomasic and Hector Garcia-Molina, "Performance Issues in
Distributed Shared-Nothing Information Retrieval Systems," in
Information Processing and Management, 32(6), pp. 647-665, 1996.
Abstract
Many information retrieval systems provide access to abstracts. For
example, Stanford University, through its FOLIO system, provides access
to the INSPEC database of abstracts of the literature on physics,
computer science, electrical engineering, etc. In this article this
database is studied by using a trace-driven simulation. We focus on a
physical index design which accommodates truncations, inverted index
caching, and database scaling in a distributed shared-nothing system.
All three issues are shown to have a strong effect on response time
and throughput. Database scaling is explored in two ways. One way
assumes an ``optimal'' configuration for a single host and then
linearly scales the database by duplicating the host architecture as
needed. The second way determines the optimal number of hosts given a
fixed database size.
Anthony Tomasic and Hector Garcia-Molina, "Query Processing and
Inverted Indices in Distributed Text Document Retrieval Systems,"
in The VLDB Journal, 2(3), 1993.
Abstract
The performance of distributed text document retrieval systems is
strongly influenced by the organization of the inverted index. This
paper compares the performance impact on query processing of various
physical organizations for inverted lists. We present a new
probabilistic model of the database and queries. Simulation
experiments determine those variables that most strongly influence
response time and throughput. This leads to a set of design trade-offs
over a wide range of hardware configurations and new parallel query
processing strategies.
Conference Papers
Georges Gardarin, Antoine Mensch and Anthony Tomasic, "An Introduction to
the e-XML Data Integration Suite," in Proceedings of the 8th
International Conference on Extending Database Technology (EDBT),
2002.
Abstract
This paper describes the e-XML component suite, a modular product
for integrating heterogeneous data sources under an XML schema and querying
in real-time the integrated information using XQuery, the emerging W3C
standard for XML query. We describe the two main components of the suite,
i.e., the repository for warehousing XML and the mediator for distributed query
processing. We also discuss some typical applications.
George Mihaila, Louiqa Raschid and Anthony Tomasic, "Equal Time for Data
on the Internet with WebSemantics," in Proceedings of the 6th
International Conference on Extending Database Technology (EDBT),
1998.
Abstract
Many collections of scientific data in particular disciplines are
available today around the world. Much of this data conforms to some
agreed upon standard for data exchange, i.e., a standard schema and
its semantics. However, sharing this data among a global community of
users is still difficult because of a lack of standards for the
following necessary functions: (i) data providers need a standard for
describing or publishing available sources of data; (ii) data
administrators need a standard for discovering the published data and
(iii) users need a standard for accessing this discovered data. This
paper describes a prototype implementation of a system, WebSemantics,
that accomplishes the above tasks. We describe an architecture and
protocols for publishing, discovering and accessing scientific
data. We define a language for discovering sources and querying the
data in these sources, and we provide a formal semantics for this
language.
Laurent Amsaleg, Michael Franklin, Anthony Tomasic and Tolga
Urhan, "Scrambling
Query Plans to Cope with Unexpected Delays," in Parallel and
Distributed Information Systems, 1996.
Abstract
Accessing numerous widely-distributed data sources poses significant
new challenges for query optimization and execution. Congestion or
failure in the network introduces highly-variable response times for
wide-area data access. This paper is an initial exploration of
solutions to this variability. We investigate a class of dynamic,
run-time query plan modification techniques that we call query
plan scrambling. We present an algorithm which modifies execution
plans on-the-fly in response to unexpected delays in data
access. The algorithm both reschedules operators and
introduces new operators into the plan. We present simulation
results
that show how our technique effectively hides delays in receiving
the initial requested tuples from remote data sources.
Anthony Tomasic, Louiqa Raschid and Patrick Valduriez, "Scaling
Heterogeneous Distributed Databases and the Design of DISCO," in
Proceedings of the 16th International Conference on Distributed Computing
Systems, Hong Kong, 1996.
Abstract
Access to large numbers of data sources introduces new problems for
users of heterogeneous distributed databases. End users and
application programmers must deal with unavailable data
sources. Database administrators must deal with incorporating new
sources into the model. Database implementors must deal with the
translation of queries between query languages and schemas. The
Distributed Information Search COmponent (Disco) addresses these
problems. Query processing semantics are developed to process queries
over data sources which do not return answers. Data modeling
techniques manage connections to data sources. The component interface
to data sources flexibly handles different query languages and
translates queries. This paper describes (a) the distributed mediator
architecture of Disco, (b) its query processing semantics, (c) the
data model and its modeling of data source connections, and (d) the
interface to underlying data sources.
Luis Gravano, Hector Garcia-Molina and Anthony Tomasic, "Precision
and Recall of GlOSS Estimators
for Database Discovery," in Proceedings of the Third
International Conference on Parallel and Distributed Information
Systems (PDIS), Austin, Texas, 1994.
Abstract
On-line information vendors offer access to multiple databases. In
addition, the advent of a variety of INTERNET tools has provided easy,
distributed access to many more databases. The result is thousands of
text databases from which a user may choose for a given information
need (a user query). This paper, an abridged version, presents a
framework for (and analyzes a solution to) this problem, which we call
the text-database discovery problem (see full version for a survey of
related work). Our solution to the text-database discovery problem is
to build a service that can suggest potentially good databases to
search. A user's query will go through two steps: first, the query is
presented to our server (dubbed GlOSS, for Glossary-Of-Servers Server)
to select a set of promising databases to search. During the second
step, the query is actually evaluated at the chosen databases. GlOSS
gives a hint of what databases might be useful for the user's query,
based on word-frequency information for each database. This
information indicates, for each database and each keyword in the
database vocabulary, how many documents at that database actually
contain the keyword, for each field designator (Sections 2 and 3). For
example, a Computer-Science library could report that ``Knuth''
(keyword) occurs as an author (field designator) in 180 documents, the
keyword ``computer,'' in the title of 25,548 documents, and so
on. This information is orders of magnitude smaller than a full index
since for each keyword field-designation pair we only need to keep its
frequency, not the identities of the documents that contain it. To
evaluate the set of databases that GlOSS returns for a given query,
Section 4 presents a framework based on the precision and recall
metrics of information-retrieval theory. In that theory, for a given
query q and a given set S of relevant documents for q, precision is
the fraction of documents in the answer to q that are in S, and recall
is the fraction of S in the answer to q. We borrow these notions to
define metrics for the text-database discovery problem: for a given
query q and a given set of ``relevant databases'' S, P is the fraction
of databases in the answer to q that are in S, and R is the fraction
of S in the answer to q. We further extend our framework by offering
different definitions for a ``relevant database'' (Section 4). We have
performed experiments using query traces from the FOLIO library
information-retrieval system at Stanford University, and involving six
databases available through FOLIO. As we will see, the results
obtained for different variants of GlOSS are very promising (Section
5). Even though GlOSS keeps a small amount of information about the
contents of the available databases, this information proved to be
sufficient to produce very useful hints on where to search.
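The metrics P and R defined in this abstract can be written down directly. The following Python sketch is illustrative only: the function name and the convention of returning 1.0 when a set is empty are assumptions, not taken from the paper.

```python
def precision_recall(answer, relevant):
    """Return (P, R) for a set of suggested databases.

    P is the fraction of databases in the answer that are relevant;
    R is the fraction of relevant databases that appear in the answer.
    Returning 1.0 for an empty answer or an empty relevant set is a
    convention chosen for this sketch.
    """
    answer, relevant = set(answer), set(relevant)
    hits = answer & relevant
    p = len(hits) / len(answer) if answer else 1.0
    r = len(hits) / len(relevant) if relevant else 1.0
    return p, r


# Hypothetical example: three suggested databases, four relevant ones.
p, r = precision_recall(["db1", "db2", "db3"], ["db2", "db3", "db4", "db5"])
```

In this example two of the three suggested databases are relevant, and two of the four relevant databases were suggested.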
Kurt Shoens, Anthony Tomasic and Hector Garcia-Molina, "Synthetic Workload
Performance Analysis of Incremental Updates," in Proceedings of
ACM Special Interest Group on Information Retrieval (SIGIR),
Dublin, Ireland, 1994.
Abstract
Declining disk and CPU costs have kindled a
renewed interest in efficient document indexing techniques.
In this paper, the problem of incremental updates of inverted
lists is addressed using a dual-structure index data structure
that dynamically separates long and short inverted lists and
optimizes the retrieval, update, and storage of each type of list.
The behavior of this index is studied with the use of a
synthetically-generated document collection and a simulation
model of the algorithm.
The index structure is shown to support rapid insertion of
documents and fast queries, and to scale well to large document
collections and many disks.
Anthony Tomasic, Hector Garcia-Molina and Kurt Shoens, "Incremental Updates
of Inverted Lists for Text Document Retrieval," in Proceedings
of ACM Special Interest Group on Management of Data (SIGMOD),
Minneapolis, MN, 1994.
Abstract
With the proliferation of the world's ``information highways'' a
renewed interest in efficient document indexing techniques has come
about. In this paper, the problem of incremental updates of inverted
lists is addressed using a new dual-structure index data structure.
The index dynamically separates long and short inverted lists and
optimizes the retrieval, update, and storage of each type of list. To
study the behavior of the index, a space of engineering trade-offs
which range from optimizing update time to optimizing query
performance is described. We quantitatively explore this space by
using actual data and hardware in combination with a simulation of an
information retrieval system. We then describe the best algorithm for
a variety of criteria.
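The idea of dynamically separating short and long inverted lists can be sketched in a few lines of Python. This is a simplified in-memory illustration, not the data structure from the paper: the class name, the length threshold, and the use of plain dictionaries in place of on-disk storage are all assumptions.

```python
class DualStructureIndex:
    """Toy inverted index that keeps short and long lists apart."""

    def __init__(self, threshold=4):
        self.threshold = threshold  # hypothetical cut-off between short and long
        self.short = {}             # word -> short inverted list
        self.long = {}              # word -> long inverted list

    def add_document(self, doc_id, text):
        """Incrementally add one document to the index."""
        for word in set(text.lower().split()):
            if word in self.long:
                self.long[word].append(doc_id)
                continue
            postings = self.short.setdefault(word, [])
            postings.append(doc_id)
            # A list that outgrows the threshold migrates to the long structure.
            if len(postings) > self.threshold:
                self.long[word] = self.short.pop(word)

    def lookup(self, word):
        """Return the inverted list for a word, wherever it is stored."""
        word = word.lower()
        return self.long.get(word) or self.short.get(word, [])
```

Keeping the two kinds of list apart lets each be tuned separately, for example cheap updates for the many short lists and sequential reads for the few long ones.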
Luis Gravano, Hector Garcia-Molina and Anthony Tomasic, "The Efficacy of
GlOSS for the Text Database Retrieval Problem," in
Proceedings of ACM Special Interest Group on Management of Data
(SIGMOD), Minneapolis, MN, 1994.
Abstract
The popularity of on-line document databases has led to a new problem:
finding which text databases (out of many candidate choices) are the
most relevant to a user. Identifying the relevant databases for a
given query is the text database discovery problem. The first part of
this paper presents a practical solution based on estimating the
result size of a query and a database. The method is termed
GlOSS--Glossary of Servers Server. The second part of this paper
evaluates the effectiveness of GlOSS based on a trace of real user
queries. In addition, we analyze the storage cost of our approach.
Anthony Tomasic and Hector Garcia-Molina, "Caching and Database
Scaling in Distributed Shared-Nothing Information Retrieval
Systems," in Proceedings of ACM Special Interest Group on
Management of Data (SIGMOD), Washington, D.C., 1993.
Abstract
A common class of existing information retrieval systems provides
access to abstracts. For example, Stanford University, through its
FOLIO system, provides access to the INSPEC database of abstracts of
the literature on physics, computer science, electrical engineering,
etc. In this paper this database is studied by using a trace-driven
simulation. We focus on physical index design, inverted index
caching, and database scaling in a distributed shared-nothing system.
All three issues are shown to have a strong effect on response time
and throughput. Database scaling is explored in two ways. One way
assumes an ``optimal'' configuration for a single host and then
linearly scales the database by duplicating the host architecture as
needed. The second way determines the optimal number of hosts given a
fixed database size.
Anthony Tomasic and Hector Garcia-Molina, "Performance
of Inverted Indices in Shared-Nothing Distributed Text Document
Information Retrieval Systems," in Proceedings of the Second
International Conference on Parallel and Distributed Information
Systems (PDIS), San Diego, 1993.
Abstract
The performance of distributed text document retrieval systems is
strongly influenced by the organization of the inverted index. This
paper compares the performance impact on query processing of various
physical organizations for inverted lists. We present a new
probabilistic model of the database and queries. Simulation
experiments determine which variables most strongly influence response
time and throughput. This leads to a set of design trade-offs over a
range of hardware configurations and new parallel query processing
strategies.
Anthony Tomasic, "View Update Translation via Deduction and
Annotation," in Proceedings of the Second International
Conference on Database Theory (ICDT), Bruges, Belgium, also as
Springer Verlag Lecture Notes in Computer Science 326, 1988.
Abstract
First steps are taken in examining the view update problem in
deductive databases. The class of recursive definite deductive
databases is examined. A view update is defined as a statement of
factual logical consequence of the deductive database. A translation
is a minimal update on the facts of a deductive database such that the
view update holds. The number of translations for a view update is
exponential in the size of the database. Algorithms for view updates
are presented and proven correct. They are based on
SLD-resolution and are independent of the computation rule.
Finally, as an example of a method for reducing the number of possible
translations of a view update, rule annotations are introduced. A
small number of unique annotations (proportional to the size of the
database) is shown to produce unique translations of view updates.
Workshop Papers
Philippe Bonnet and Anthony Tomasic, "Unavailable Data Sources
in Mediator Based Applications," in First International Workshop
on Practical Information Mediation and Brokering, and the Commerce of
Information on the Internet (I'MEDIAT'98), Tokyo, Japan, 1998.
Abstract
We discuss the problem of unavailable data sources in the context of
two mediator based applications. We discuss the limitations of
existing systems with respect to this problem and describe a novel
evaluation model that overcomes these shortcomings.
Introduction
Mediator systems are being deployed in various environments to provide
query access to heterogeneous data sources. When processing a query,
the mediator may have difficulty accessing a data source (due to
network or server problems). In such cases the mediator is faced with
the problem of unavailable data sources. In this paper, we discuss
the problem of unavailable data sources in mediator based
applications. We first introduce two applications that we are
currently developing. The first application concerns a hospital
information system; a mediator accesses data sources located in the
different services to provide doctors with information on
patients. The second application concerns the access to documentary
repositories within a network of public and private institutions; a
mediator accesses the data sources located in each institution to
answer queries asked through a World Wide Web application. We detail
the characteristics of these applications in Section 2. We show that
these applications are representative of large classes of
applications. We then discuss, in Section 3, the impact of
unavailable data sources on the design of both applications. We
illustrate the limitations of classical mediator systems. We give in
Section 4 an overview of a novel sequential model of interaction which
fits the needs of both applications and overcomes some of the above
mentioned shortcomings. We review related work in Section 5. We
conclude and give directions for future work in Section 6.
Helena Galhardas, Eric Simon and Anthony Tomasic, "A Framework for
Classifying Scientific Metadata," in Proceedings of the AAAI Workshop on Information
Integration, Fifteenth National Conference on Artificial Intelligence
(AAAI), Madison, Wisconsin, 1998.
Abstract
The scientific community, public organizations and administrations
have generated a large amount of data concerning the
environment. There is a need to allow sharing and exchange of this
type of information by various kinds of users including scientists,
decision-makers and public authorities. Metadata arises as the
solution to support these requirements. We present a formal framework
for classification of metadata that will give a uniform definition of
what metadata is, how it can be used and where it must be used. This
framework also provides a procedure for classifying elements of
existing metadata standards.
Philippe Bonnet and Anthony Tomasic, "Partial Answers for Unavailable Data Sources," in
Proceedings of the International Workshop on Flexible Query Answering
Systems (FQAS), Roskilde University, Denmark, 1998.
Abstract
Many heterogeneous database system products and prototypes exist
today; they will soon be deployed in a wide variety of
environments. Most existing systems suffer from an Achilles' heel:
they fail ungracefully in the presence of unavailable data sources. If
some data sources are unavailable when accessed, these systems either
silently ignore them or generate an error. This behavior is improper
in environments where there is a non-negligible probability that data
sources cannot be accessed (e.g., Internet). In case some data sources
cannot be accessed when processing a query, the complete answer to
this query cannot be computed; some work can however be done with the
data sources that are available. In this paper, we propose a novel
approach where, in the presence of unavailable data sources, the answer to
a query is a partial answer. A partial answer is a representation of
the work that has been done in case the complete answer to a query
cannot be computed, and of the work that remains to be done in order
to obtain this complete answer. The use of a partial answer is
twofold. First, it contains an incremental query that yields
the complete answer without redoing the work that has already been
done. Second, the application program can extract information from a
partial answer through the use of a secondary query, which we call a
parachute query. In this paper, we present a framework for partial
answers and we propose three algorithms for the evaluation of queries
in the presence of unavailable sources, the construction of incremental
queries and the evaluation of parachute queries.
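The notions of partial answer, incremental query and parachute query can be illustrated with a small Python sketch. It is deliberately much simpler than the framework in the paper: the query is assumed to be a filtered union over named sources, and every name below (SourceUnavailable, PartialAnswer, evaluate, parachute) is hypothetical.

```python
from dataclasses import dataclass, field


class SourceUnavailable(Exception):
    """Raised by a source that cannot be reached."""


@dataclass
class PartialAnswer:
    rows: list = field(default_factory=list)     # work already done
    pending: list = field(default_factory=list)  # sources still to be queried

    @property
    def complete(self):
        return not self.pending


def evaluate(sources, names, predicate, partial=None):
    """Evaluate a filtered union over the named sources.

    When a partial answer is passed in, only its pending sources are
    contacted (the incremental query), so finished work is not redone.
    """
    if partial is None:
        partial = PartialAnswer(pending=list(names))
    still_pending = []
    for name in partial.pending:
        try:
            fetched = sources[name]()
        except SourceUnavailable:
            still_pending.append(name)
            continue
        partial.rows.extend(row for row in fetched if predicate(row))
    partial.pending = still_pending
    return partial


def parachute(partial, predicate):
    """Secondary query answered only from the data already obtained."""
    return [row for row in partial.rows if predicate(row)]
```

A caller would invoke evaluate once, use parachute on the result while some sources are down, and later pass the same partial answer back to evaluate to finish the query.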
Philippe Bonnet and Anthony Tomasic, "Parachute Queries in the
Presence of Unavailable Data Sources," in Proceedings of Bases
de Données Avancées, BDA'98, Hammamet, Tunisia, October 1998.
Abstract
Mediator systems are used today in a wide variety of unreliable
environments. When processing a query, a mediator may try to access a
data source which is unavailable. In this situation, existing systems
either silently ignore unavailable data sources or generate an
error. This behavior is inefficient in environments with a
non-negligible probability that a data source is unavailable (e.g.,
the Internet). In the case that some data sources are unavailable, the
complete answer to a query cannot be obtained; however, useful work can
be done with the available data sources. In this paper, we describe a
novel approach to mediator query processing where, in the presence of
unavailable data sources, the answer to a query is computed
incrementally. It is possible to access data obtained at intermediate
steps of the computation. We define two new evaluation models and
analytically model, for each of them, the probability of
obtaining the answer to a query in the presence of unavailable data
sources. The analysis shows that complete answers are more likely in
our two evaluation models than in a classical system. We measure the
performance of our evaluation models via simulations and show that, in
the case that all data sources are available, the performance penalty
for our approach is negligible.
Anthony Tomasic, "Correct
View Update Translations via Containment," in Proceedings of
the Workshop on Deductive Database and Logic Programming, Second
International Conference on Logic Programming (ICLP), Santa
Margherita Ligure, Italy, also as Gesellschaft für Mathematik
und Datenverarbeitung, GMD-Studien Nr. 231, 1994.
Abstract
Given an intensional database (IDB) and an extensional database (EDB),
the view update problem translates updates on the IDB into updates on
the EDB. One approach to the view update problem uses a translation
language to specify the meaning of a view update. In this paper we
prove properties of a translation language. This approach to the view
update problem studies the expressive power of the translation
language and the computational cost of demonstrating properties of a
translation. We use an active rule based database language for
specifying translations of view updates. This paper uses the
containment of one datalog program (or conjunctive query) by another
to demonstrate that a translation is semantically correct. We show
that the complexity of correctness is lower for insertion than
deletion. Finally, we discuss extensions to the translation language.
Invited Publications & Other Work
Anthony Tomasic, "XML/DBC:
A Standard API for Access to XML Repositories and Mediators,"
Invited Panel at 2nd Workshop on Data Integration over the Web
(DIWEB'02), Toronto, Canada, 2002.
Peter Fankhauser, Georges Gardarin, Mauricio Lopez, Jose Munoz, and
Anthony Tomasic, "Experiences in
Federated Databases: From IRO-DB to MIRO-Web," in Industrial
Track, Proceedings of the Twenty Fourth International Conference on
Very Large Databases (VLDB), New York, NY, 1998.
Catherine Houstis, Christos Nikolaou, Manolis Marazakis, Nicholas
Patrikalakis, Jakka Sairamesh, and Anthony Tomasic, "THETIS: Design of a Data Management and
Data Visualization System for Coastal Zone Management of the
Mediterranean Sea," in D-Lib Magazine, November, 1997.
Anthony Tomasic, Remy Amouroux, Philippe Bonnet, Olga Kapitskaia,
Hubert Naacke and Louiqa Raschid, "The Distributed Information Search
Component (Disco) and the World-Wide Web," Prototype Demonstration
Description in Proceedings of the 1997 ACM SIGMOD International
Conference on Management of Data, 1997.
Anthony Tomasic and Eric Simon, "Improving Access to
Environmental Information using Context Information," in SIGMOD
Record, March, 1997. Also appears in: 1st ERCIM Environmental
Modelling Group Workshop on Air Pollution Modelling, April 7-8,
1997, GMD FIRST Berlin, Germany.
Laurent Amsaleg, Philippe Bonnet, Michael
Franklin, Anthony Tomasic and Tolga Urhan, "Dynamic Query Execution Strategies for
Coping with Delays in Wide-Area Remote Access," in IEEE Data
Engineering Bulletin, 1997.
Anthony Tomasic and Hector Garcia-Molina, "Issues in Parallel
Information Retrieval," in IEEE Data Engineering Bulletin, 1994.
Anthony Tomasic, Distributed
Queries and Incremental Updates in
Information Retrieval Systems, Ph.D. Thesis, Princeton University, 1994.
Technical Reports, Technical Notes
Hubert Naacke, Georges Gardarin and Anthony Tomasic, "Leveraging
Mediator Cost Models with Heterogeneous Data Sources," INRIA
Technical Report RR-3143, 1997.
Olga Kapitskaia, Anthony Tomasic and Patrick Valduriez, "Dealing
with Discrepancies in Wrapper Functionality,"
INRIA Technical Report RR-3138, 1997.
George Mihaila, Louiqa Raschid and Anthony Tomasic, "Equal Time
for Data on the Internet with WebSemantics,"
INRIA Technical Report RR-3136, 1997.
Philippe Bonnet and Anthony Tomasic, "Partial
Answers for Unavailable Data Sources," INRIA Technical Report RR-3127, 1997.
Anthony Tomasic, Louiqa Raschid and Patrick Valduriez, "Scaling
Heterogeneous Databases and the Design of DISCO," INRIA Technical Report RR-2704, 1995.
Anthony Tomasic, Luis Gravano, Calvin Lue, Peter Schwarz and Laura
Haas, "Improving
Broker Performance with Multidimensional Data Structures," IBM
Technical Report Number RJ 9999, 1996.
Anthony Tomasic, Hector Garcia-Molina and Kurt Shoens, "Incremental
Updates of Inverted Lists for Text Document Retrieval,"
Stanford University Department of Computer Science Technical Note
Number STAN-CS-TN-93-1, 1993.
Anthony Tomasic and Hector Garcia-Molina, "Caching
and Database Scaling in Distributed
Shared-Nothing Information Retrieval Systems,"
Stanford
University Department of Computer Science Technical
Report Number STAN-CS-92-1456, 1992.
Anthony Tomasic and Hector Garcia-Molina, "Performance
of Inverted Indices in Distributed Text Document Retrieval
Systems," Stanford University Department of Computer Science
Technical Report Number STAN-CS-92-1434, 1992.
|