Anthony Tomasic is the US CEO and co-founder of E-XMLMedia, an XML-based, enterprise-scale software vendor start-up located in Pittsburgh, PA and Paris, France. E-XMLMedia was a winner in the French National Competition for New Ventures in 1999. Before starting E-XMLMedia, in reverse chronology, Anthony was Chief Technical Officer of Common Object, Inc., a next-generation banner advertising application service provider. For two years, he was an engineer for Digital Integrity, Inc., a digital intellectual property software start-up. For four years, he was an invited researcher at Dyade, a research and development consortium established by the Institut National de Recherche en Informatique et en Automatique (INRIA) and Groupe Bull. He was scientific director for the team of students and engineers that built Disco, a distributed heterogeneous database system. During its existence, a paper describing Disco was the most cited data warehousing paper according to the NEC CiteSeer Research Index. In 1994 he was a post-doctoral researcher at the IBM Almaden Research Center. From 1991 to 1994 he was research staff for the Database Group at Stanford University, where he completed his Ph.D. work on the performance of distributed information retrieval search engines. He received an MS (1993) and a Ph.D. (1994) from the Department of Computer Science at Princeton University. From 1983 to 1989 he was a researcher at the European Computer-Industry Research Centre (ECRC). He received a BS from the Department of Computer Science of Indiana University in 1983.

Journal Articles

  • George Mihaila, Louiqa Raschid and Anthony Tomasic, "Locating and Accessing Data Repositories with WebSemantics," Technical Note, The VLDB Journal, 11(1), 2002.

    Abstract

    Many collections of scientific data in particular disciplines are available today on the World Wide Web. Most of these data sources are compliant with some standard for interoperable access. In addition, sources may support a common semantics, i.e., a shared meaning for the data types and their domains. However, sharing data among a global community of users is still difficult for the following reasons: (i) data providers need a mechanism for describing and publishing available sources of data; (ii) data administrators need a mechanism for discovering the location of published sources and obtaining metadata from these sources; and (iii) users need a mechanism for browsing and selecting sources. This paper describes a system, WebSemantics, that accomplishes the above tasks. We describe an architecture for the publication and discovery of scientific data sources that is an extension of the World Wide Web architecture and protocols. We support catalogs containing metadata about data sources for some application domain. We define a language for discovering sources and querying their metadata. We then describe the WebSemantics prototype.
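
    As a minimal illustration of the publish-and-discover flow described above, consider the following Python sketch. The catalog fields and the discovery predicate are invented for illustration; they are not the WebSemantics language itself.

        # Sketch of a WebSemantics-style catalog: providers publish source
        # descriptions, and users discover sources by querying the metadata.
        # All field names below are illustrative assumptions.

        catalog = [
            {"url": "http://example.org/airquality", "domain": "environment",
             "schema": "air-quality-v1", "region": "Europe"},
            {"url": "http://example.org/tides", "domain": "oceanography",
             "schema": "tide-gauge-v2", "region": "Mediterranean"},
        ]

        def discover(catalog, **constraints):
            """Return the published sources whose metadata match all constraints."""
            return [s for s in catalog
                    if all(s.get(k) == v for k, v in constraints.items())]

        for source in discover(catalog, domain="environment", region="Europe"):
            print(source["url"])  # the user would then browse and select these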

  • Hubert Naacke, Anthony Tomasic, and Patrick Valduriez, "Validating Mediator Cost Models with Disco," in Networking and Information Systems Journal, 2(5), 2000.

    Abstract

    Disco is a mediator system developed at INRIA for accessing heterogeneous data sources over the Internet. In Disco, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underlying sources. To efficiently process queries, the mediator performs cost-based query optimization. In a heterogeneous distributed database, cost-estimate-based query optimization is difficult to achieve because the underlying data sources do not export cost information. Disco's approach relies on combining a generic cost model with specific cost information exported by wrappers. In this paper, we propose a validation of Disco's cost model based on experimentation with real Web data sources. This validation shows the efficiency of our generic cost model as well as the efficiency of more specialized cost functions.
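
    The core idea can be sketched in a few lines of Python. The constants and cost formulas below are illustrative assumptions, not the published Disco cost model.

        # Combining a generic mediator cost model with wrapper-exported cost
        # information, in the spirit of Disco. Formulas are illustrative only.

        def generic_cost(cardinality):
            """Default estimate when a wrapper exports no cost information."""
            return 5.0 + 0.01 * cardinality  # assumed startup + per-tuple cost

        # Wrappers may export specialized cost functions for their sources.
        wrapper_costs = {
            "web_source_A": lambda card: 20.0 + 0.05 * card,  # slow HTTP source
        }

        def estimate_scan_cost(source, cardinality):
            """Prefer the wrapper-exported function; fall back to the generic model."""
            specific = wrapper_costs.get(source)
            return specific(cardinality) if specific else generic_cost(cardinality)

        print(estimate_scan_cost("web_source_A", 1000))  # exported function: 70.0
        print(estimate_scan_cost("local_db", 1000))      # generic fallback: 15.0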

  • Luis Gravano, Hector Garcia-Molina and Anthony Tomasic, "GlOSS: Text-Source Discovery over the Internet," in ACM Transactions on Database Systems, 24(2), 1999.

    Abstract

    The dramatic growth of the Internet has created a new problem for users: the location of relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Then, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS (Glossary of Servers Server), which comes in two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective at determining promising text sources for a given query.
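
    A toy illustration of text-source discovery in this spirit is sketched below in Python. The scoring is a deliberate simplification, not the bGlOSS or vGlOSS estimators from the article.

        # Each database exports only per-keyword document frequencies; the
        # discovery service ranks databases for a query from these summaries.

        summaries = {
            "cs_library":  {"database": 5000, "retrieval": 3200, "knuth": 180},
            "med_library": {"database": 40, "retrieval": 15},
        }

        def rank_sources(query_terms, summaries):
            """Order databases by how often the query terms appear in them."""
            def score(freqs):
                return sum(freqs.get(t, 0) for t in query_terms)
            return sorted(summaries, key=lambda db: score(summaries[db]),
                          reverse=True)

        print(rank_sources(["database", "retrieval"], summaries))
        # -> ['cs_library', 'med_library']: query cs_library first.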

  • Anthony Tomasic, Louiqa Raschid and Patrick Valduriez, "Scaling Access to Heterogeneous Databases with DISCO," in IEEE Transactions on Knowledge and Data Engineering, 10(5), 1998.

    Abstract

    Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementors must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures for unavailable data sources. Queries simply return failure and no further information when data sources are unavailable for query processing. The Distributed Information Search COmponent (DISCO) addresses these problems. Data modeling techniques manage the connections to data sources, and sources can be added transparently to the users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so they are efficiently evaluated by sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article we describe (a) the distributed mediator architecture of DISCO; (b) the data model and its modeling of data source connections; (c) the interface to underlying data sources and the query rewriting process; and (d) query processing semantics. We describe several advantages of our system.

  • Anthony Tomasic, Luis Gravano, Calvin Lue, Peter Schwarz, and Laura Haas, "Data Structures for Efficient Broker Implementation," in ACM Transactions on Information Systems, 15(3), July 1997.

    Abstract

    With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases, and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads), and discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; in particular, we show that by varying the policies used to construct the grid file we can provide good performance for a wide range of workloads even when storing highly skewed data.
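
    Partitioned hashing, one of the two storage methods examined, can be sketched as follows; the bit allocations are arbitrary assumptions.

        # The bucket number is formed by concatenating a few hash bits from
        # each key attribute, so a query that fixes only one attribute still
        # narrows the search to a small set of buckets.

        import zlib

        KEYWORD_BITS, DB_BITS = 4, 2  # 4 + 2 bits -> 64 buckets

        def bucket(keyword, database):
            """Compute a bucket index from bits of both attributes."""
            kw_part = zlib.crc32(keyword.encode()) % (1 << KEYWORD_BITS)
            db_part = zlib.crc32(database.encode()) % (1 << DB_BITS)
            return (kw_part << DB_BITS) | db_part

        # Fixing only the keyword leaves 2**DB_BITS candidate buckets to scan.
        print(bucket("retrieval", "cs_library"))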

  • Anthony Tomasic and Hector Garcia-Molina, "Performance Issues in Distributed Shared-Nothing Information Retrieval Systems" in Information Processing and Management, 32(6), pp. 647-665, 1996.

    Abstract

    Many information retrieval systems provide access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this article this database is studied by using a trace-driven simulation. We focus on a physical index design that accommodates truncations, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an "optimal" configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.

  • Anthony Tomasic and Hector Garcia-Molina, "Query Processing and Inverted Indices in Distributed Text Document Retrieval Systems," in The VLDB Journal, 2(3), 1993.

    Abstract

    The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine those variables that most strongly influence response time and throughput. This leads to a set of design trade-offs over a wide range of hardware configurations and new parallel query processing strategies.
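
    Two classic physical organizations from this design space can be sketched in Python: partition the index by document or by term. The partitioning rules below are simplified assumptions, not the paper's exact organizations.

        documents = {1: ["parallel", "query"], 2: ["query", "index"], 3: ["index"]}
        NODES = 2

        def by_document(docs):
            """Each node builds an inverted index over its own documents."""
            nodes = [dict() for _ in range(NODES)]
            for doc_id, terms in docs.items():
                node = nodes[doc_id % NODES]
                for t in terms:
                    node.setdefault(t, []).append(doc_id)
            return nodes

        def by_term(docs):
            """Each node stores the full inverted list for a subset of terms."""
            nodes = [dict() for _ in range(NODES)]
            for doc_id, terms in docs.items():
                for t in terms:
                    nodes[hash(t) % NODES].setdefault(t, []).append(doc_id)
            return nodes

        # A query on "query" touches every node under by_document but exactly
        # one node under by_term -- the kind of trade-off the simulations explore.
        print(by_document(documents))
        print(by_term(documents))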

Conference Papers

  • Georges Gardarin, Antoine Mensch and Anthony Tomasic, "An Introduction to the e-XML Data Integration Suite," in Proceedings of the 8th International Conference on Extending Database Technology (EDBT), 2002.

    Abstract

    This paper describes the e-XML component suite, a modular product for integrating heterogeneous data sources under an XML schema and querying the integrated information in real time using XQuery, the emerging W3C standard XML query language. We describe the two main components of the suite, i.e., the repository for warehousing XML and the mediator for distributed query processing. We also discuss some typical applications.
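
    A hypothetical client-side sketch of the two roles is shown below; the class and method names are invented for illustration and are not the e-XML component suite API.

        class Repository:
            """Warehouses XML documents (the first component)."""
            def __init__(self):
                self.docs = {}
            def store(self, name, xml):
                self.docs[name] = xml

        class Mediator:
            """Answers queries over several sources (the second component)."""
            def __init__(self, sources):
                self.sources = sources
            def query(self, predicate):
                # A real mediator would decompose an XQuery across the
                # sources; here we simply scan every registered source.
                return [d for s in self.sources
                        for d in s.docs.values() if predicate(d)]

        repo = Repository()
        repo.store("catalog", "<product><name>XMLizer</name></product>")
        print(Mediator([repo]).query(lambda doc: "XMLizer" in doc))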

  • George Mihaila, Louiqa Raschid and Anthony Tomasic, "Equal Time for Data on the Internet with WebSemantics," in Proceedings of the 6th International Conference on Extending Database Technology (EDBT), 1998.

    Abstract

    Many collections of scientific data in particular disciplines are available today around the world. Much of this data conforms to some agreed-upon standard for data exchange, i.e., a standard schema and its semantics. However, sharing this data among a global community of users is still difficult because of a lack of standards for the following necessary functions: (i) data providers need a standard for describing or publishing available sources of data; (ii) data administrators need a standard for discovering the published data; and (iii) users need a standard for accessing this discovered data. This paper describes a prototype implementation of a system, WebSemantics, that accomplishes the above tasks. We describe an architecture and protocols for the publication, discovery, and access to scientific data. We define a language for discovering sources and querying the data in these sources, and we provide a formal semantics for this language.

  • Laurent Amsaleg, Michael Franklin, Anthony Tomasic and Tolga Urhan, "Scrambling Query Plans to Cope with Unexpected Delays," in Parallel and Distributed Information Systems, 1996.

    Abstract

    Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failure in the network introduces highly variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query plan modification techniques that we call query plan scrambling. We present an algorithm which modifies execution plans on-the-fly in response to unexpected delays in data access. The algorithm both reschedules operators and introduces new operators into the plan. We present simulation results that show how our technique effectively hides delays in receiving the initial requested tuples from remote data sources.
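
    The rescheduling half of the idea can be sketched in Python (the real algorithm also introduces new operators into the plan and respects inter-operator dependencies); the plan and delay model below are illustrative assumptions.

        import collections

        def execute(operators, is_delayed):
            """Run operators in order, deferring a delayed one while other
            work remains rather than blocking on it."""
            queue = collections.deque(operators)
            finished, deferred = [], set()
            while queue:
                op = queue.popleft()
                if is_delayed(op) and op not in deferred and queue:
                    deferred.add(op)   # scramble: do other useful work first
                    queue.append(op)
                else:
                    finished.append(op)
            return finished

        plan = ["scan_A", "scan_B", "scan_C"]
        print(execute(plan, is_delayed=lambda op: op == "scan_A"))
        # -> ['scan_B', 'scan_C', 'scan_A']: the delayed scan no longer blocks.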

  • Anthony Tomasic, Louiqa Raschid and Patrick Valduriez, "Scaling Heterogeneous Distributed Databases and the Design of DISCO," in Proceedings of the 16th International Conference on Distributed Computing Systems, Hong Kong, 1996.

    Abstract

    Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating new sources into the model. Database implementors must deal with the translation of queries between query languages and schemas. The Distributed Information Search COmponent (Disco) addresses these problems. Query processing semantics are developed to process queries over data sources which do not return answers. Data modeling techniques manage connections to data sources. The component interface to data sources flexibly handles different query languages and translates queries. This paper describes (a) the distributed mediator architecture of Disco, (b) its query processing semantics, (c) the data model and its modeling of data source connections, and (d) the interface to underlying data sources.

  • Luis Gravano, Hector Garcia-Molina and Anthony Tomasic, "Precision and Recall of GlOSS Estimators for Database Discovery," in Proceedings of the Third International Conference on Parallel and Distributed Information Systems (PDIS), Austin, Texas, 1994.

    Abstract

    On-line information vendors offer access to multiple databases. In addition, the advent of a variety of Internet tools has provided easy, distributed access to many more databases. The result is thousands of text databases from which a user may choose for a given information need (a user query). This paper, an abridged version, presents a framework for (and analyzes a solution to) this problem, which we call the text-database discovery problem (see full version for a survey of related work). Our solution to the text-database discovery problem is to build a service that can suggest potentially good databases to search. A user's query will go through two steps: first, the query is presented to our server (dubbed GlOSS, for Glossary-Of-Servers Server) to select a set of promising databases to search. During the second step, the query is actually evaluated at the chosen databases. GlOSS gives a hint of what databases might be useful for the user's query, based on word-frequency information for each database. This information indicates, for each database and each keyword in the database vocabulary, how many documents at that database actually contain the keyword, for each field designator (Sections 2 and 3). For example, a Computer-Science library could report that "Knuth" (keyword) occurs as an author (field designator) in 180 documents, the keyword "computer," in the title of 25,548 documents, and so on. This information is orders of magnitude smaller than a full index since for each keyword field-designation pair we only need to keep its frequency, not the identities of the documents that contain it. To evaluate the set of databases that GlOSS returns for a given query, Section 4 presents a framework based on the precision and recall metrics of information-retrieval theory. In that theory, for a given query q and a given set S of relevant documents for q, precision is the fraction of documents in the answer to q that are in S, and recall is the fraction of S in the answer to q. We borrow these notions to define metrics for the text-database discovery problem: for a given query q and a given set of "relevant databases" S, P is the fraction of databases in the answer to q that are in S, and R is the fraction of S in the answer to q. We further extend our framework by offering different definitions for a "relevant database" (Section 4). We have performed experiments using query traces from the FOLIO library information-retrieval system at Stanford University, and involving six databases available through FOLIO. As we will see, the results obtained for different variants of GlOSS are very promising (Section 5). Even though GlOSS keeps a small amount of information about the contents of the available databases, this information proved to be sufficient to produce very useful hints on where to search.
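
    The database-discovery precision and recall metrics defined above translate directly into code:

        def precision_recall(answer, relevant):
            """P = fraction of the answer that is relevant; R = fraction of
            the relevant databases that appear in the answer."""
            answer, relevant = set(answer), set(relevant)
            p = len(answer & relevant) / len(answer) if answer else 1.0
            r = len(answer & relevant) / len(relevant) if relevant else 1.0
            return p, r

        # GlOSS suggests three databases; two of the three relevant ones appear.
        print(precision_recall({"db1", "db2", "db5"}, {"db1", "db2", "db3"}))
        # -> (0.666..., 0.666...)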

  • Kurt Shoens, Anthony Tomasic and Hector Garcia-Molina, "Synthetic Workload Performance Analysis of Incremental Updates," in Proceedings of ACM Special Interest Group on Information Retrieval (SIGIR), Dublin, Ireland, 1994.

    Abstract

    Declining disk and CPU costs have kindled a renewed interest in efficient document indexing techniques. In this paper, the problem of incremental updates of inverted lists is addressed using a dual-structure index that dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. The behavior of this index is studied with the use of a synthetically generated document collection and a simulation model of the algorithm. The index structure is shown to support rapid insertion of documents and fast queries, and to scale well to large document collections and many disks.
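
    A minimal sketch of the dual-structure idea, assuming a simple migration threshold (the containers and threshold are illustrative, not the paper's tuned data structures):

        LONG_LIST_THRESHOLD = 3  # assumed migration point

        short_lists = {}  # term -> small, update-friendly inverted list
        long_lists = {}   # term -> large, read-optimized inverted list

        def insert_posting(term, doc_id):
            if term in long_lists:
                long_lists[term].append(doc_id)
                return
            postings = short_lists.setdefault(term, [])
            postings.append(doc_id)
            if len(postings) > LONG_LIST_THRESHOLD:  # migrate grown lists
                long_lists[term] = short_lists.pop(term)

        def lookup(term):
            return long_lists.get(term) or short_lists.get(term, [])

        for doc_id in range(1, 6):
            insert_posting("database", doc_id)
        print(lookup("database"))  # -> [1, 2, 3, 4, 5], now in the long store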

  • Anthony Tomasic, Hector Garcia-Molina and Kurt Shoens, "Incremental Updates of Inverted Lists for Text Document Retrieval," in Proceedings of ACM Special Interest Group on Management of Data (SIGMOD), Minneapolis, MN, 1994.

    Abstract

    With the proliferation of the world's "information highways," a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index data structure. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs ranging from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.

  • Luis Gravano, Hector Garcia-Molina and Anthony Tomasic, "The Efficacy of GlOSS for the Text Database Retrieval Problem," in Proceedings of ACM Special Interest Group on Management of Data (SIGMOD), Minneapolis, MN, 1994.

    Abstract

    The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query at each database. The method is termed GlOSS--Glossary of Servers Server. The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.
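
    Result-size estimation in this spirit can be sketched with the usual keyword-independence assumption; this is a simplification of the estimators evaluated in the paper.

        def estimate_matches(query_terms, doc_freqs, num_docs):
            """E[matches] = |DB| * prod(df_t / |DB|), assuming keywords
            occur independently of one another."""
            estimate = num_docs
            for t in query_terms:
                estimate *= doc_freqs.get(t, 0) / num_docs
            return estimate

        # 10,000 documents: "query" occurs in 500 of them, "index" in 200.
        print(estimate_matches(["query", "index"],
                               {"query": 500, "index": 200}, 10000))
        # -> 10.0 expected matches; databases are ranked by this estimate.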

  • Anthony Tomasic and Hector Garcia-Molina, "Caching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems," in Proceedings of ACM Special Interest Group on Management of Data (SIGMOD), Washington, D.C., 1993.

    Abstract

    A common class of existing information retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an "optimal" configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.

  • Anthony Tomasic and Hector Garcia-Molina, "Performance of Inverted Indices in Shared-Nothing Distributed Text Document Information Retrieval Systems," in Proceedings of the Second International Conference on Parallel and Distributed Information Systems (PDIS), San Diego, 1993.

    Abstract

    The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a range of hardware configurations and new parallel query processing strategies.

  • Anthony Tomasic, "View Update Translation via Deduction and Annotation," in Proceedings of the Second International Conference on Database Theory (ICDT), Bruges, Belgium, also as Springer Verlag Lecture Notes in Computer Science 326, 1988.

    Abstract

    First steps are taken in examining the view update problem in deductive databases. The class of recursive definite deductive databases is examined. A view update is defined as a statement of factual logical consequence of the deductive database. A translation is a minimal update on the facts of a deductive database such that the view update holds. The number of translations for a view update is exponential in the size of the database. Algorithms for view updates are presented and proven correct. They are based on SLD-resolution and are independent of the computation rule. Finally, as an example of a method for reducing the number of possible translations of a view update, rule annotations are introduced. A small number of unique annotations (proportional to the size of the database) is shown to produce unique translations of view updates.

Workshop Papers

  • Philippe Bonnet and Anthony Tomasic, "Unavailable Data Sources in Mediator Based Applications," in First International Workshop on Practical Information Mediation and Brokering, and the Commerce of Information on the Internet (I'MEDIAT'98), Tokyo, Japan, 1998.

    Abstract

    We discuss the problem of unavailable data sources in the context of two mediator-based applications. We discuss the limitations of existing systems with respect to this problem and describe a novel evaluation model that overcomes these shortcomings.

    Introduction

    Mediator systems are being deployed in various environments to provide query access to heterogeneous data sources. When processing a query, the mediator may have difficulty accessing a data source (due to network or server problems). In such cases the mediator is faced with the problem of unavailable data sources. In this paper, we discuss the problem of unavailable data sources in mediator-based applications. We first introduce two applications that we are currently developing. The first application concerns a hospital information system; a mediator accesses data sources located in the different services to provide doctors with information on patients. The second application concerns access to documentary repositories within a network of public and private institutions; a mediator accesses the data sources located in each institution to answer queries asked through a World Wide Web application. We detail the characteristics of these applications in Section 2 and show that they are representative of large classes of applications. We then discuss, in Section 3, the impact of unavailable data sources on the design of both applications and illustrate the limitations of classical mediator systems. We give in Section 4 an overview of a novel sequential model of interaction which fits the needs of both applications and overcomes some of the above-mentioned shortcomings. We review related work in Section 5. We conclude and give directions for future work in Section 6.

  • Helena Galhardas, Eric Simon and Anthony Tomasic, "A Framework for Classifying Scientific Metadata," in Proceedings of the AAAI Workshop on Information Integration, Fifteenth National Conference on Artificial Intelligence (AAAI), Madison, Wisconsin, 1998.

    Abstract

    The scientific community, public organizations and administrations have generated a large amount of data concerning the environment. There is a need to allow sharing and exchange of this type of information by various kinds of users, including scientists, decision-makers and public authorities. Metadata arises as the solution to support these requirements. We present a formal framework for classification of metadata that will give a uniform definition of what metadata is, how it can be used and where it must be used. This framework also provides a procedure for classifying elements of existing metadata standards.

  • Philippe Bonnet and Anthony Tomasic, "Partial Answers for Unavailable Data Sources," in Proceedings of the International Workshop on Flexible Query Answering Systems (FQAS), Roskilde University, Denmark, 1998.

    Abstract

    Many heterogeneous database system products and prototypes exist today; they will soon be deployed in a wide variety of environments. Most existing systems suffer from an Achilles' heel: they fail ungracefully in the presence of unavailable data sources. If some data sources are unavailable when accessed, these systems either silently ignore them or generate an error. This behavior is improper in environments where there is a non-negligible probability that data sources cannot be accessed (e.g., the Internet). If some data sources cannot be accessed when processing a query, the complete answer to this query cannot be computed; some work can, however, be done with the data sources that are available. In this paper, we propose a novel approach where, in the presence of unavailable data sources, the answer to a query is a partial answer. A partial answer is a representation of the work that has been done in case the complete answer to a query cannot be computed, and of the work that remains to be done in order to obtain this complete answer. The use of a partial answer is twofold. First, it contains an incremental query that makes it possible to obtain the complete answer without redoing the work that has already been done. Second, the application program can extract information from a partial answer through the use of a secondary query, which we call a parachute query. In this paper, we present a framework for partial answers and propose three algorithms: for the evaluation of queries in the presence of unavailable sources, the construction of incremental queries, and the evaluation of parachute queries.
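
    A minimal sketch of the partial-answer idea, with an incremental query and a parachute query; all names and structures below are illustrative assumptions.

        class PartialAnswer:
            def __init__(self, available_data, incremental_query):
                self.available_data = available_data        # work already done
                self.incremental_query = incremental_query  # work remaining

        def evaluate(query, sources):
            data, missing = [], []
            for name, rows in sources.items():
                if rows is None:          # source unavailable right now
                    missing.append(name)
                else:
                    data.extend(rows)
            if missing:
                # Re-running only the incremental query later completes the
                # answer without redoing the work already done.
                return PartialAnswer(data, (query, missing))
            return data

        sources = {"lab": [("alice", "test1")], "radiology": None}
        answer = evaluate("patient_records", sources)
        # Parachute query: extract what the partial answer already provides.
        print([r for r in answer.available_data if r[0] == "alice"])
        print(answer.incremental_query)  # ('patient_records', ['radiology'])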

  • Philippe Bonnet and Anthony Tomasic, "Parachute Queries in the Presence of Unavailable Data Sources," in Proceedings of Bases de Données Avancées (BDA'98), Hammamet, Tunisia, October 1998.

    Abstract

    Mediator systems are used today in a wide variety of unreliable environments. When processing a query, a mediator may try to access a data source which is unavailable. In this situation, existing systems either silently ignore unavailable data sources or generate an error. This behavior is inefficient in environments with a non-negligible probability that a data source is unavailable (e.g., the Internet). In the case that some data sources are unavailable, the complete answer to a query cannot be obtained; however, useful work can be done with the available data sources. In this paper, we describe a novel approach to mediator query processing where, in the presence of unavailable data sources, the answer to a query is computed incrementally. It is possible to access data obtained at intermediate steps of the computation. We define two new evaluation models and, for each, analytically model the probability of obtaining the answer to a query in the presence of unavailable data sources. The analysis shows that complete answers are more likely in our two evaluation models than in a classical system. We measure the performance of our evaluation models via simulations and show that, in the case that all data sources are available, the performance penalty for our approach is negligible.

  • Anthony Tomasic, "Correct View Update Translations via Containment," in Proceedings of the Workshop on Deductive Database and Logic Programming, Second International Conference on Logic Programming (ICLP), Santa Margherita Ligure, Italy, also as Gesellschaft für Mathematik und Datenverarbeitung, GMD-Studien Nr. 231, 1994.

    Abstract

    Given an intensional database (IDB) and an extensional database (EDB), the view update problem translates updates on the IDB into updates on the EDB. One approach to the view update problem uses a translation language to specify the meaning of a view update. In this paper we prove properties of a translation language. This approach to the view update problem studies the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active-rule-based database language for specifying translations of view updates. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than for deletion. Finally, we discuss extensions to the translation language.
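
    The classical containment test for conjunctive queries, which the paper applies to view update translations, can be sketched directly: freeze one query's body into a canonical database and evaluate the other query over it. The query encoding below is invented for illustration.

        from itertools import product

        def contained_in(q1, q2):
            """Q1 is contained in Q2 iff evaluating Q2 over Q1's frozen body
            (its canonical database) re-derives Q1's frozen head."""
            head1, body1 = q1
            head2, body2 = q2
            canonical_db = {tuple(atom) for atom in body1}
            constants = {c for atom in body1 for c in atom[1:]}
            vars2 = sorted({v for atom in body2 for v in atom[1:]} | set(head2))
            for values in product(constants, repeat=len(vars2)):
                binding = dict(zip(vars2, values))
                body_holds = all(
                    (atom[0],) + tuple(binding[v] for v in atom[1:]) in canonical_db
                    for atom in body2)
                if body_holds and tuple(binding[v] for v in head2) == tuple(head1):
                    return True
            return False

        # Q1(x) :- edge(x, y), edge(y, z).    Q2(x) :- edge(x, y).
        q1 = (("x",), [("edge", "x", "y"), ("edge", "y", "z")])
        q2 = (("x",), [("edge", "x", "y")])
        print(contained_in(q1, q2))  # True: every answer to Q1 satisfies Q2
        print(contained_in(q2, q1))  # False: the converse fails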

Invited Publications & Other Work

  • Anthony Tomasic, "XML/DBC: A Standard API for Accesss to XML Repositories and Mediators," Invited Panel at 2nd Workshop on Data Integration over the Web (DIWEB'02), Toronto, Canada, 2002.

  • Peter Fankhauser, Georges Gardarin, Mauricio Lopez, Jose Munoz, and Anthony Tomasic, "Experiences in Federated Databases: From IRO-DB to MIRO-Web," in Industrial Track, Proceedings of the Twenty-Fourth International Conference on Very Large Databases (VLDB), New York, NY, 1998.

  • Catherine Houstis, Christos Nikolaou, Manolis Marazakis, Nicholas Patrikalakis, Jakka Sairamesh, and Anthony Tomasic, "THETIS: Design of a Data Management and Data Visualization System for Coastal Zone Management of the Mediterranean Sea," in D-Lib Magazine, November, 1997.

  • Anthony Tomasic, Remy Amouroux, Philippe Bonnet, Olga Kapitskaia, Hubert Naacke and Louiqa Raschid, "The Distributed Information Search Component (Disco) and the World-Wide Web," Prototype Demonstration Description in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 1997.

  • Anthony Tomasic and Eric Simon, "Improving Access to Environmental Information using Context Information," in SIGMOD Record, March, 1997. Also appears in: 1st ERCIM Environmental Modelling Group Workshop on Air Pollution Modelling, April 7-8, 1997, GMD FIRST Berlin, Germany.

  • Laurent Amsaleg, Philippe Bonnet, Michael Franklin, Anthony Tomasic and Tolga Urhan, "Dynamic Query Execution Strategies for Coping with Delays in Wide-Area Remote Access," in IEEE Data Engineering Bulletin, 1997.

  • Anthony Tomasic and Hector Garcia-Molina, "Issues in Parallel Information Retrieval," in IEEE Data Engineering Bulletin, 1994.

  • Anthony Tomasic, Distributed Queries and Incremental Updates in Information Retrieval Systems, Ph.D. Thesis, Princeton University, 1994.

Technical Reports and Technical Notes

  • Hubert Naacke, Georges Gardarin and Anthony Tomasic, "Leveraging Mediator Cost Models with Heterogeneous Data Sources," INRIA Technical Report RR-3143, 1997.

  • Olga Kapitskaia, Anthony Tomasic and Patrick Valduriez, "Dealing with Discrepancies in Wrapper Functionality," INRIA Technical Report RR-3138, 1997.

  • George Mihaila, Louiqa Raschid and Anthony Tomasic, "Equal Time for Data on the Internet with WebSemantics," INRIA Technical Report RR-3136, 1997.

  • Philippe Bonnet and Anthony Tomasic, "Partial Answers for Unavailable Data Sources," INRIA Technical Report RR-3127, 1997.

  • Anthony Tomasic, Louiqa Raschid and Patrick Valduriez, "Scaling Heterogeneous Databases and the Design of DISCO," INRIA Technical Report RR-2704, 1995.

  • Anthony Tomasic, Luis Gravano, Calvin Lue, Peter Schwarz and Laura Haas, "Improving Broker Performance with Multidimensional Data Structures," IBM Technical Report Number RJ 9999, 1996.

  • Anthony Tomasic, Hector Garcia-Molina and Kurt Shoens, "Incremental Updates of Inverted Lists for Text Document Retrieval," Stanford University Department of Computer Science Technical Note Number STAN-CS-TN-93-1, 1993.

  • Anthony Tomasic and Hector Garcia-Molina, "Caching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems," Stanford University Department of Computer Science Technical Report Number STAN-CS-92-1456, 1992.

  • Anthony Tomasic and Hector Garcia-Molina, "Performance of Inverted Indices in Distributed Text Document Retrieval Systems," Stanford University Department of Computer Science Technical Report Number STAN-CS-92-1434, 1992.
