A Neural Implementation of NBSVM in Keras

NBSVM is an approach to text classification proposed by Wang and Manning¹ that takes a linear model such as an SVM (or logistic regression) and infuses it with Bayesian probabilities by replacing word-count features with Naive Bayes log-count ratios. Despite its simplicity, NBSVM models have been shown to be both fast and powerful across a wide range of text classification datasets. In this article, we walk through a simple neural implementation of NBSVM in Keras and evaluate it on the IMDb movie review dataset.

Let’s begin by importing some necessary modules.
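The article's exact import cell isn't reproduced here; as a minimal sketch, the examples that follow assume TensorFlow's bundled Keras, NumPy, and scikit-learn:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dot, Flatten, Activation
from tensorflow.keras.preprocessing.sequence import pad_sequences
```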

In a binarized document-term matrix, each document is represented as a long binary vector over the vocabulary, with most entries being zero. While our neural model could be implemented to accept rows from this matrix as input, we choose to represent each document as a sequence of word IDs with some fixed length, maxlen, for use with an embedding layer. An embedding layer in a neural network acts as a lookup mechanism that accepts a word ID as input and returns a vector (or scalar) representation of that word. These representations can either be learned or preset.
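As a small illustrative sketch (not from the original article), the lookup behavior and the ability to preset weights can be seen with a toy Embedding layer:

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Toy vocabulary of 5 word IDs, each mapped to a single scalar (output_dim=1).
emb = Embedding(input_dim=5, output_dim=1, trainable=False)
emb.build((None,))                                    # create the weight matrix
emb.set_weights([np.arange(5, dtype="float32").reshape(5, 1)])  # preset: ID i -> i

# Looking up a "document" of word IDs simply returns the stored values.
print(emb(np.array([[1, 3, 4]])).numpy().ravel())     # -> [1. 3. 4.]
```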

In our case, the embedding layer will return preset Naive Bayes log-count ratios for the words represented by the word IDs in a document. A model accepting documents represented as sequences of word IDs trains much faster than one accepting rows from a document-term matrix. While these two architectures technically have the same number of parameters, the lookup mechanism of an embedding layer reduces the number of features (i.e., words) and parameters under consideration at any given iteration. That is, documents represented as fixed-size sequences of word IDs are much more compact and efficient than the large, mostly-zero binary vectors from a document-term matrix with binarized counts.

Here, we convert the document-term matrix to a list of word ID sequences.
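A minimal sketch of this conversion, assuming x_train_dtm and x_test_dtm are sparse matrices produced by a binarized CountVectorizer (these names and the maxlen value are illustrative):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen = 2000  # illustrative fixed sequence length

def dtm_to_sequences(dtm, maxlen):
    """Turn each row of a binarized document-term matrix into a padded
    sequence of word IDs (the column indices of its nonzero entries)."""
    seqs = [row.indices + 1 for row in dtm]   # +1 reserves ID 0 for padding
    return pad_sequences(seqs, maxlen=maxlen)

x_train_seq = dtm_to_sequences(x_train_dtm, maxlen)
x_test_seq  = dtm_to_sequences(x_test_dtm, maxlen)
```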

The final data-preparation step involves computing the Naive Bayes log-count ratios, which is most easily done from the original document-term matrix. These ratios capture how much more likely a word is to appear in documents of one class (e.g., positive) than in documents of the other (e.g., negative).
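A hedged sketch of this computation (x_train_dtm and y_train are assumed names; the +1 offset matches the padding ID used above):

```python
import numpy as np

def log_count_ratios(dtm, y, alpha=1.0):
    """Naive Bayes log-count ratios from a binarized document-term matrix
    and 0/1 labels, with additive (Laplace) smoothing."""
    p = np.asarray(dtm[y == 1].sum(axis=0)).ravel() + alpha  # word counts in positive docs
    q = np.asarray(dtm[y == 0].sum(axis=0)).ravel() + alpha  # word counts in negative docs
    r = np.log((p / p.sum()) / (q / q.sum()))
    # Prepend a zero for the padding ID so ratios line up with the shifted word IDs.
    return np.concatenate([[0.0], r]).astype("float32")

nb_ratios = log_count_ratios(x_train_dtm, np.asarray(y_train))
```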

We are now ready to define our NBSVM model. Our model utilizes two embedding layers. The first, as mentioned above, stores the Naive Bayes log-count ratios. The second stores learned weights (or coefficients) for each feature (i.e., word) in this linear model. Our prediction, then, is simply the dot product of these two vectors.
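A minimal sketch of this architecture in Keras (num_words, maxlen, and nb_ratios are assumed to come from the preceding steps; the layer names and hyperparameters are illustrative):

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dot, Flatten, Activation

def build_nbsvm(num_words, maxlen, nb_ratios):
    inp = Input(shape=(maxlen,))

    # Embedding 1: frozen Naive Bayes log-count ratios, one scalar per word ID.
    nb_layer = Embedding(num_words, 1, trainable=False, name="nb_ratios")
    r = nb_layer(inp)
    nb_layer.set_weights([nb_ratios.reshape(-1, 1)])

    # Embedding 2: learned coefficient for each word in the linear model.
    beta = Embedding(num_words, 1, embeddings_initializer="glorot_normal")(inp)

    # Prediction: dot product of the per-word ratios and coefficients, then a sigmoid.
    score = Dot(axes=1)([r, beta])
    out = Activation("sigmoid")(Flatten()(score))
    return Model(inputs=inp, outputs=out)

model = build_nbsvm(num_words=len(nb_ratios), maxlen=maxlen, nb_ratios=nb_ratios)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```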

This simple model achieves 92.5% accuracy on the IMDb test set after only a few seconds of training on a Titan V GPU. In fact, the model trains within seconds even on a CPU. Interestingly, this accuracy is higher than the result reported in the original paper¹ (91.22% using bigram features).
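For completeness, a brief training-and-evaluation sketch (the epoch count and batch size here are illustrative, not the article's exact settings):

```python
# Train briefly and score the model on the IMDb test split.
model.fit(x_train_seq, y_train,
          validation_data=(x_test_seq, y_test),
          batch_size=32, epochs=3)
loss, acc = model.evaluate(x_test_seq, y_test, verbose=0)
print(f"test accuracy: {acc:.4f}")
```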

This article was inspired by a tweet² from Jeremy Howard in September 2017.

¹ Sida Wang and Christopher D. Manning: Baselines and Bigrams: Simple, Good Sentiment and Topic Classification; ACL 2012.

