Title: Foundations of Vector Retrieval Author: Sebastian Bruch Publisher: Springer Year: 2024 Pages: 196 Language: English Format: pdf (true) Size: 10.1 MB
This book presents the fundamentals of vector retrieval. To this end, it delves into important data structures and algorithms that have been successfully used to solve the vector retrieval problem efficiently and effectively.
This monograph is divided into four parts. The first part introduces the problem of vector retrieval and formalizes the concepts involved. The second part delves into retrieval algorithms that help solve the vector retrieval problem efficiently and effectively. It includes one chapter each on branch-and-bound algorithms, locality-sensitive hashing, graph algorithms, clustering, and sampling. Part three is devoted to vector compression and comprises chapters on quantization and sketching. Finally, the fourth part presents a review of background material in a series of appendices, summarizing relevant concepts from probability, concentration inequalities, and linear algebra.
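At its simplest, the vector retrieval problem the first part formalizes can be sketched as exact, brute-force nearest-neighbor search; the branch-and-bound, hashing, graph, clustering, and sampling algorithms of part two all exist to beat this baseline. A minimal Python illustration (my own sketch, not code from the book):

```python
import math

def nearest_neighbor(query, corpus):
    """Return the index of the corpus vector closest to the query in
    Euclidean distance. Exact brute-force retrieval: O(n * d) per query
    for n vectors in d dimensions."""
    return min(range(len(corpus)), key=lambda i: math.dist(query, corpus[i]))

corpus = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(nearest_neighbor((0.9, 1.2), corpus))  # → 1
```

The algorithms covered in the book trade exactness or preprocessing time for dramatically faster queries than this linear scan.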
We have witnessed a few years of remarkable developments in Artificial Intelligence (AI) driven by advanced machine learning algorithms, and in particular, Deep Learning. Gargantuan, complex neural networks that can learn through self-supervision, and quickly so with the aid of specialized hardware, have transformed the research landscape so dramatically that, seemingly overnight, many fields experienced not the usual incremental progress but a leap forward. Machine translation, natural language understanding, information retrieval, recommender systems, and computer vision are but a few examples of research areas that have had to grapple with the shock. Countless other disciplines beyond Computer Science, such as robotics, biology, and chemistry, have benefited from Deep Learning too.
These neural networks and their training algorithms may be complex, and the scope of their impact broad and wide, but nonetheless they are simply functions in a high-dimensional space. A trained neural network takes a vector as input, crunches and transforms it in various ways, and produces another vector, often in some other space. An image may thereby be turned into a vector, a song into a sequence of vectors, and a social network into a structured collection of vectors. It seems as though much of human knowledge, or at least what is expressed as text, audio, image, and video, has a vector representation in one form or another.
It should be noted that representing data as vectors is not unique to neural networks and Deep Learning. In fact, long before learnt vector representations of pieces of data (what are commonly known as "embeddings") came along, data was often encoded as hand-crafted feature vectors. Each feature quantified, as a continuous or discrete value, some facet of the data that was deemed relevant to a particular task (such as classification or regression). Vectors of that form, too, reflect our understanding of a real-world object or concept.
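As a toy illustration of such hand-crafted features (the specific features and similarity measure here are my own example, not the book's): a text might be represented by its word count and mean word length, and two texts compared by the cosine of the angle between their feature vectors:

```python
import math

def features(text):
    """Hand-crafted feature vector for a text:
    (number of words, mean word length)."""
    words = text.split()
    return (len(words), sum(map(len, words)) / len(words))

def cosine(u, v):
    """Cosine similarity between two vectors: dot product
    divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

a = features("vector retrieval is fun")
b = features("vector retrieval is very fun")
print(cosine(a, b))  # close to 1.0: the two texts look similar
```

Learnt embeddings replace such manually chosen features with coordinates produced by a trained model, but the downstream retrieval problem over the resulting vectors is the same.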
The book emphasizes the theoretical aspects of algorithms and presents related theorems and proofs. It is thus mainly written for researchers and graduate students in theoretical Computer Science and database and information systems who want to learn about the theoretical foundations of vector retrieval.