# Book Review of “Boosting : Foundations and Algorithms”

Following is my review of Boosting : Foundations and Algorithms (by Robert E. Schapire and Yoav Freund) to appear in the  SIGACT book review column soon.

—————————————————————————————————————-

Book : Boosting : Foundations and Algorithms (by Robert E. Schapire and Yoav Freund)
Reviewer : Shiva Kintali

Introduction

You have k friends, each one earning a small amount of money (say 100 dollars) every month by buying and selling stocks. One fine evening, at a dinner conversation, they told you their individual “strategies” (after all, they are your friends). Is it possible to “combine” these individual strategies and make million dollars in an year, assuming your initial capital is same as your average friend ?

You are managing a group of k “diverse” software engineers each one with only an “above-average” intelligence. Is it possible to build a world-class product using their skills ?

The above scenarios give rise to fundamental theoretical questions in machine learning and form the basis of Boosting. As you may know, the goal of machine learning is to build systems that can adapt to their environments and learn from their experience. In the last five decades, machine learning has impacted almost every aspect of our life, for example, computer vision, speech processing, web-search, information retrieval, biology and so on. In fact, it is very hard to name an area that cannot benefit from the theoretical and practical insights of machine learning.

The answer to the above mentioned questions is Boosting, an elegant method for driving down the error of the combined classifier by combining a number of weak classifiers. In the last two decades, several variants of Boosting are discovered. All these algorithms come with a set of theoretical guarantees and made a deep practical impact on the advances of machine learning, often providing new explanations for existing prediction algorithms.

Boosting : Foundations and Algorithms, written by the inventors of Boosting, deals with variants of AdaBoost, an adaptive boosting method. Here is a quick explanation of the basic version of AdaBoost.

AdaBoost makes iterative calls to the base learner. It maintains a distribution over training examples to choose the training sets provided to the base learner on each round. Each training example is assigned a weight, a measure of importance of correctly classifying an example on the current round. Initially, all weights are set equally. On each round, the weights of incorrectly classified examples are increased so that, “hard” examples get successively higher weight. This forces the base learner to focus its attention on the hard example and drive down the generalization errors.

AdaBoost is fast and easy to implement and the only parameter to tune is the number of rounds. The actual performance of boosting is dependent on the data.

Summary

Chapter 1 provides a quick introduction and overview of Boosting algorithms with practical examples. The rest of the book is divided into four major parts. Each part is divided into 3 to 4 chapters.

Part I studies the properties and effectiveness of AdaBoost and theoretical aspects of minimizing its training and generalization errors. It is proved that AdaBoost drives the training error down very fast (as a function of the error rates of the weak classifiers) and the generalization error arbitrarily close to zero. Basic theoretical bounds on the generalization error show that AdaBoost overfits, however empirical studies show that AdaBoost does not overfit. To explain this paradox, a margin-based analysis is presented to explain the absence of overfitting.
Part II explains several properties of AdaBoost using game-theoretic interpretations. It is shown that the principles of Boosting are very intimately related to the classic min-max theorem of von Neumann. A two-player (the boosting algorithm and the weak learning algorithm) game is considered and it is shown that AdaBoost is a special case of a more general algorithm for playing a repeated game. By reversing the roles of the players, a solution is obtained for the online prediction model thus establishing a connection between Boosting and online learning. Loss minimization is studied and AdaBoost is interpreted as an abstract geometric framework for optimizing a particular objective function. More interestingly, AdaBoost is viewed as a special case of more general methods for optimization of an objective function such as coordinate descent and functional gradient descent.

Part III explains several methods of extending AdaBoost to handle classifiers with more than two output classes. AdaBoost.M1, AdaBoost.MH and AdaBoost.MO are presented along with their theoretical analysis and practical applications. RankBoost, an extension of AdaBoost to study ranking problems is studied. Such an algorithm is very useful, for example, to rank webpages based on their relevance to a given query.

Part IV is dedicated to advanced theoretical topics. Under certain assumptions, it is proved that AdaBoost can handle noisy-data and converge to the best possible classifier. An optimal boost-by-majority algorithm is presented. This algorithm is then modified to be adaptive leading to an algorithm called BrownBoost.

Many examples are given throughout the book to illustrate the empirical performance of the algorithms presented. Every chapter ends with Summary and Bibliography mentioning the related publications. There are well-designed exercises at the end of every chapter. Appendix briefly outlines some required mathematical background.

Opinion

Boosting book is definitely a very good reference text for researchers in the area of machine learning. If you are new to machine learning, I encourage you to read an introductory machine learning book (for example, Machine Learning by Tom M. Mitchell) to better understand and appreciate the concepts. In terms of being used in a course, a graduate-level machine learning course can be designed from the topics covered in this book. The exercises in the book can be readily used for such a course.

Overall this book is a stimulating learning experience. It has provided me new perspectives on theory and practice of several variants of Boosting algorithms. Most of the algorithms in this book are new to me and I had no difficulties following the algorithms and the corresponding theorems. The exercises at the end of every chapter made these topics much more fun to learn.

The authors did a very good job compiling different variants of Boosting algorithms and achieved a nice balance between theoretical analysis and practical examples. I highly recommend this book for anyone interested in machine learning.

—————————————————————————————————————-

# Recreational Math Books – Part II

In my previous post (see here) I mentioned some interesting puzzle books. In today’s post I will mention different type of recreational math books i.e., biographical books. Here are my top three books in this category. They are must-read books for anybody even remotely interested in mathematics.

There are about three mathematics : (1)  Andrew Wiles, whose determination to solve Fermat’s Last Theorem inspires future generations and gives a strong message that patience and focus are two of the most important assets that every mathematician should posses. (2) Paul Erdos, whose love for mathematics is so deep and prolific and (3) Srinivasa Ramanujan, whose story is different from any other mathematician ever.

This is one of the first “recreational” books I read. It starts with the history of Fermat’s last theorem (FLT), discusses the life style of early mathematicians and moves on to talk about Andrew Wiles’s 8 year long journey proving FLT. Watch this BBC documentary for a quick overview of Andrew Wiles’s story.

Paul Erdos is one of the greatest and most prolific mathematicians ever. The title of my blog is inspired by one of his famous sayings “My Brain is Open”. I don’t want to reveal any details of this book. You will enjoy this book more if you read it without knowing anything about Paul Erdos. I should warn you that there are some really tempting open problems in this book. When I first read this book (during my PhD days) I spent almost one full semester reading papers related to Twin Prime Conjecture and other number-theoretic problems. I also wrote a paper titled “A generalization of Erdos’s proof of Bertrand-Chebyshev’s theorem”. Watch this documentary “N is a number” for a quick overview of Paul Erdos’s story.

This is a very dense book. I bought it five years back and only recently finished reading it. This books covers lots of “topics” : south indian life-style, Hardy’s life, Ramanujan’s proofs and his flawed proofs, his journey to work with Hardy, his health struggles etc. It is definitely worth-reading to know the details of Ramanujan’s passion for mathematics.

—————————————————————————————————————————————

# Recreational Math Books – Part I

Most of us encounter math puzzles during high-school. If you are really obsessed with puzzles, actively searching and solving them, you will very soon run out of puzzles !! One day you will simply realize that you are not encountering any new puzzles. No more new puzzles. Poof. They are all gone. You feel like screaming “Give me a new puzzle“. This happened to me around the end of my undergrad days. During this phase of searching for puzzles, I encountered Graceful Tree Conjecture and realized that there are lots of long-standing open “puzzles”. I don’t scream anymore. Well… sometimes I do scream when my proofs collapse. But that’s a different kind of screaming.

Sometimes, I do try to create new puzzles. Most of the puzzles I create are either very trivial to solve (or) very hard and related to long-standing conjectures. Often it takes lots of effort and ingenuity to create a puzzle with right level of difficulty.

In today’s post, I want to point you to some of the basic puzzle books that everybody should read. So, the next time you see a kid screaming “Give me a new puzzle“, simply point him/her to these books. Hopefully they will stop screaming for sometime. If they comeback to you soon, point them to Graceful Tree Conjecture  :)

I will mention more recreational math books in part 2 of this blog post.

# Open Problems from Lovasz and Plummer’s Matching Theory Book

I always have exactly one bed-time mathematical book to read (for an hour) before going to sleep. It helps me learn new concepts and hopefully stumble upon interesting open problems. Matching Theory by Laszlo Lovasz and Michael D. Plummer has been my bed-time book for the last six months. I bought this book 3 years back (during my PhD days) but never got a chance to read it. This book often disappears from Amazon’s stock. I guess they are printing it on-demand.

If you are interested in learning the algorithmic and combinatorial foundations of Matching Theory (with a historic perspective), then this book is a must read. Today’s post is about the open problems mentioned in Matching Theory book. If you know the status (or progress) of these problems, please leave a comment.

—————————————————————————-

1 . Consistent Labeling and Maximum Flow

Conjecture (Fulkerson) : Any consistent labelling procedure results in a maximum flow in polynomial number of steps.

—————————————————————————-

2. Toughness and Hamiltonicity

The toughness of a graph $G$, $t(G)$ is defined to be $+\infty$, if $G = K_n$ and to be $min(|S|/c(G-S))$, if $G \neq K_n$. Here $c(G-S)$ is the number of components of $G-S$.

Conjecture (Chvatal 1973) : There exists a positive real number $t_0$ such that for every graph $G$, $t(G) \geq t_0$ implies $G$ is Hamiltonian.

—————————————————————————-

3. Perfect Matchings and Bipartite Graphs

Theorem : Let $X$ be a set, $X_1, \dots, X_t \subseteq X$ and suppose that $|X_i| \leq r$ for $i = 1, \dots, t$. Let $G$ be a bipartite graph such that

a) $X \subseteq V(G)$,

b) $G - X_i$ has a perfect matching , and

c) if any edge of $G$ is deleted, property (b) fails to hold in the resulting graph.

Then, the number of vertices in $G$ with degree $\geq 3$ is at most $r^3 {t \choose 3}$.

Conjecture : The conclusion of the above theorem holds for non-bipartite graphs as well.

—————————————————————————-

4. Number of Perfect Matchings

Conjecture (Schrijver and W.G.Valiant 1980) : Let $\Phi(n,k)$ denote the minimum number of perfect matchings a k-regular bipartite graph on 2n points can have. Then, $\lim_{n \to \infty} (\Phi(n,k))^{\frac{1}{n}} = \frac{(k-1)^{k-1}}{k^{k-2}}$.

—————————————————————————-

5. Elementary Graphs

Conjecture : For $k \geq 3$ there exist constants $c_1(k) > 1$ and $c_2(k) > 0$ such that every k-regular elementary graph on 2n vertices, without forbidden edges , contains at least $c_2(k){\cdot}c_1(k)^n$ perfect matchings. Furthermore $c_1(k) \to \infty$ as $k \to \infty$.

—————————————————————————-

6. Number of colorations

Conjecture (Schrijver’83) : Let G be a k-regular bipartite graph on 2n vertices. Then the number of colorings of the edges of G with k given colors is at least $(\frac{(k!)^2}{k^k})^n$.

—————————————————————————-

Theorem : A graph is perfect if and only if it does not contain, as an induced subgraph, an odd hole or an odd antihole.

—————————————————————————-

# Book Review of “Elements of Automata Theory”

During summer 2010 I started reading a book titled Elements of Automata Theory by Jacques Sakarovitch. It took me one year to read the book and submit my review to Bill Gasarch during summer 2011. It will be appearing in SIGACT book review column soon. I am posting my review here for the benefit of everybody.

———————————————————————————————————————————

Book : Elements of Automata Theory by Jacques Sakarovitch
Reviewer : Shiva Kintali

Introduction

During my undergrad I often found myself captivated by the beauty and depth of automata theory. I wanted to read one book on automata theory and say that I “know” automata theory. Couple of years later I realized that it is silly to expect such a book. The depth and breadth of automata theory cannot be covered by a single book.

My PhD thesis is heavily inspired by automata theory. I had to read several (fifty year old) papers and books related to automata theory to understand several fundamental theorems. Unfortunately, the concepts I wanted to learn are scattered in multiple books and old research papers, most of which are hard to find. When I noticed that Prof. Bill Gasarch is looking for a review of Elements of Automata Theory, I was very excited and volunteered to review it, mainly because I wanted to increase my knowledge about automata theory. Given my background in parsing technologies and research interests in space-bounded computation I wanted to read this book carefully. This book is around 750 pages long and it took me around one year to (approximately) read it. It is very close to my expectations of the one book on automata theory.

First impressions : Most of the books on automata theory start with the properties of regular languages, finite automata, pushdown automata, context-free languages, pumping lemmas, Chomsky hierarchy, decidability and conclude with NP-completeness and the P vs NP problem. This book is about “elements” of automata theory. It focuses only on finite automata over different mathematical structures. It studies pushdown automata only in the context of rational subsets in the free group. Yes, there is 750 pages worth literature studying only finite automata.

This book is aimed at people enthusiastic to know the subject rigorously and not intended as a textbook for automata theory course. It can also be used by advanced researchers as a desk reference. There is no prerequisite to follow this book, except for a reasonable mathematical maturity. It can be used as a self-study text. This book is a direct translation of its french original.

Summary

This book is divided into five major chapters. The first three chapters deal with the notions of rationality and recognizability. A family of languages are rationally closed if it is closed under rational operations (union, product and star). A language is reconizable if there exists a finite automata that recognizes it. The fourth and fifth chapters discuss rationality in relations. Chapter 0 acts as an appendix of several definitions of structures such as relations, monoids, semirings, matrices, graphs and concepts such as decidability. Following is a short summary of the five major chapters. There are several deep theorems (for example, Higman’s Theorem) studied in this book. I cannot list all of them here. The chapter summaries in the book have more details.

Chapter 1 essentially deals with the basic definitions and theorems required for any study of automata theory. It starts with the definitions of states, transitions, (deterministic and nondeterministic) automaton, transpose, ambiguity and basic operations such as union, cartesian product, star, quotient of a language. Rational operators, rational languages and rational expressions are defined and the relation between rationality and recognizability is established leading to the proof of Kleene’s theorem. String matching (i.e., finding a word in a text) is studied in detail as an illustrative example. Several theorems related to star height of languages are proved. A fundamental theorem stating that `the language accepted by a two-way automaton is rational’ is proved. The distinction between Moore and Mealy machines is introduced.

Chapter 2 deals with automata over the elements of an arbitrary monoid and the distinction between rational set and recognizable set in this context. This leads to a better understanding of Kleene’s theorem. The notion of morphism of automata is introduced and several properties of morphisms and factorisations are presented. Conway’s construction of universally minimal automaton is explained and the importance of well quasi-orderings is explained in detail. Based on these established concepts, McNaughton’s theorem (which states that the star height of a rational group language is computable) is studied with a new perspective.

Chapter 3 formalizes the notion of “weighted” automata that count the number of computations that make an element be accepted by an automaton, thus generalizing the previously introduced concepts in a new dimension. Languages are generalized to formal series and actions are generalized to representations. The concepts and theorems in this chapter makes the reader appreciate the deep connections of automata theory with several branches of mathematics. I personally enjoyed reading this chapter more than any other chapter in this book.

Chapter 4 builds an understanding of the relations realized by different finite automata in the order they are presented in chapters 1, 2 and 3. The Evaluation Theorem and the Composition Theorem play a central role in understanding this study. The decidability of the equivalence of transducers (with and without weigths) is studied. This chapter concludes with the study of deterministic and synchronous relations.

Chapter 5 studies the functions realized by finite automata. Deciding functionality, sequential functions, uniformisation of rational relations by rational functions, semi-monomial matrix representation, translations of a function and uniformly bounded functions are studied.

There are exercises (with solutions) at the end of every section of every chapter. These exercises are very carefully designed and aid towards better understanding of the corresponding concepts. First time readers are highly encouraged to solve (or at least glance through) these exercises. Every section ends with Notes and References mentioning the corresponding references and a brief historical summary of the chapter.

Opinion

Overall I found the book very enlightening. It has provided me new perspectives on several theorems that I assumed I understood completely. Most of the concepts in this book are new to me and I had no problems following the concepts and the corresponding theorems. The related exercises made these topics even more fun to learn. It was a joy for me to read this book and I recommend this book for anyone who is interested in automata theory (or more generally complexity theory) and wants to know the fundamental theorems of theory of computing. If you are a complexity theorist, it is worthwhile to look back at the foundations of theory of computing to better appreciate its beauty and history. This book is definitely unique in its approach and the topics chosen. Most of the topics covered are either available in very old papers or not accesible at all. I applaud the author for compiling these topics into a wonderful free-flowing text.

This book is nicely balanced between discussions of concepts and formal proofs. The writing is clear and the topics are organized very well from the most specific to the most general, making it a free-flowing text. On the other hand, it is very dense and requires lots of motivation and patience to read and understand the theorems. The author chose a rigorous way of explaining rationality and recognizability. Sometimes you might end up spending couple of hours to read just two pages. Such is the depth of the topics covered. Beginners might find this book too much to handle. I encourage beginners to read this book after taking an introductory automata theory course. This is definitely a very good reference text for researchers in the field of automata theory.

In terms of being used in a course, I can say that a graduate level course can be designed from a carefully chosen subset of the topics covered in this book. The exercises in the book can be readily used for such a course.

This is an expensive book, which is understandable based on the author’s efforts to cover several fundamental topics (along with exercises) in such a depth. If you think it is expensive, I would definitely suggest that you get one for your library.

———————————————————————————————————————————