Abstract
Hidden databases are widely prevalent on the web: websites expose them only through restrictive, form-like interfaces that let users run search queries against the underlying data. This thesis considers the problem of estimating the size of a hidden database through its web interface. Because of the interface's limitations, the number of returned tuples is usually restricted by a top-k constraint: when more than k tuples in the database match the specified condition, a ranking function selects only k of them to return to the user. The proposed system introduces novel techniques that use a small number of queries to produce unbiased size estimates with small variance. These techniques can also be used for approximate query processing over hidden databases.
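The abstract does not spell out the estimators. The Python below is a minimal sketch of one general idea behind unbiased size estimation through a top-k interface, a random drill-down. It is not presented as the thesis's own algorithm. Several pieces are hypothetical assumptions made for illustration: the simulated interface `TopKInterface`, the helper functions, the all-Boolean schema, and the "keep the first k matches" ranking.

Each walk fixes attributes one at a time to random values until a query no longer overflows. Those first non-overflowing queries partition the database, and a walk reaches one at depth d with probability 2^-d. Dividing its result count by that probability therefore gives an unbiased estimate, and averaging several walks reduces the variance. The sketch also assumes that no more than k tuples share identical values on every attribute.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, ...]  # one hidden tuple: m Boolean (0/1) attribute values


class TopKInterface:
    """Hypothetical simulator of a form-like search interface with a top-k limit."""

    def __init__(self, rows: List[Row], k: int) -> None:
        self._rows = rows  # hidden data: the estimator only sees it via search()
        self.k = k
        self.queries_issued = 0

    def search(self, conditions: Dict[int, int]) -> Tuple[List[Row], bool]:
        """Return at most k matching rows and whether the query overflowed."""
        self.queries_issued += 1
        matches = [r for r in self._rows
                   if all(r[a] == v for a, v in conditions.items())]
        # Stand-in for the site's ranking function: keep the first k matches.
        return matches[: self.k], len(matches) > self.k


def drill_down_estimate(db: TopKInterface, num_attrs: int, rng: random.Random) -> float:
    """One random drill-down; returns an unbiased estimate of the row count."""
    conditions: Dict[int, int] = {}
    prob = 1.0  # probability that a walk reaches the current query
    depth = 0
    while True:
        results, overflow = db.search(conditions)
        if not overflow:
            # First non-overflowing query on this path: its answer is complete,
            # and such queries partition the database, so |q| / p(q) is unbiased.
            return len(results) / prob
        if depth == num_attrs:
            raise ValueError("more than k rows share all attribute values")
        conditions[depth] = rng.randint(0, 1)  # fix the next attribute at random
        prob *= 0.5
        depth += 1


def estimate_size(db: TopKInterface, num_attrs: int, walks: int, seed: int = 0) -> float:
    """Average several independent drill-downs to reduce variance."""
    rng = random.Random(seed)
    return sum(drill_down_estimate(db, num_attrs, rng) for _ in range(walks)) / walks


if __name__ == "__main__":
    m, n, k = 10, 200, 10
    gen = random.Random(42)
    codes = gen.sample(range(2 ** m), n)  # distinct rows, so no leaf query overflows
    rows = [tuple((c >> i) & 1 for i in range(m)) for c in codes]
    db = TopKInterface(rows, k)
    est = estimate_size(db, m, walks=500)
    print(f"estimate={est:.1f} true={n} queries={db.queries_issued}")
```

The walk count trades query cost against variance. Each walk issues at most m + 1 queries, so the total cost is bounded by walks × (m + 1) regardless of the database size.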