This post aims to illustrate use of TensorFlow framework for implementing a simple Matrix Factorization (MF). MF is one of the widely used recommender systems that is especially exploited when we have access to tons of user explicit or implicit feedbacks. For details about matrix factorization and collaborative system refer to this paper.

For the purpose of this post we explore a simple movie recommendation by using the data from

\[\hat{r} = \vec{u}^t\vec{p}\] We define these two latent variables in TensorFlow as follows:

With regularization, the final cost function is as follows:

Our goal is to minimize the above cost function. One of the popular ways is by using stochastic gradient decent (SGD). SGD has a hyper parameter called

for i in xrange(500):

sess.run(training_step)

First lets look at some of the rating predictions from the training data. For example let's look at the first entry of the training set and compare the actual rating with rating from our MF prediction:

To calculate the accuracy we apply the MF model on the test set and calculate the average prediction error for all pairs of (user, item) in the test set.

For the purpose of this post we explore a simple movie recommendation by using the data from

*MovieLens*. The dataset is downloaded from MovieLens 100K Dataset. It contains of 100,000 ratings (1-5) from 943 users on 1682 movies. The*u.date*represents the rating in sparse format of (userid, movieid, rating, timestamp).### Load the data

import pandas as pd df = pd.read_csv('u.data', sep='\t', names=['user','item','rate', 'time']) msk = numpy.random.rand(len(df)) < 0.7 df_train = df[msk] user_indecies = [x-1 for x in df_train.user.values] item_indecies = [x-1 for x in df_train.item.values] rates = df_train.rate.valuesWe use 70% of the data for training. The rest will be used later for testing and accuracy calculation. The last two lines is for shifting the Ids to 0 (from 1).

### Variables

In matrix factorization user rating $r$ is formulated as an inner product of two latent vectors $\vec{u}$ and $\vec{p}$ which are two latent vectors in same space to represent the*user interest*and*movie feature*respectively.\[\hat{r} = \vec{u}^t\vec{p}\] We define these two latent variables in TensorFlow as follows:

**U**that shows latent representation of user interest and**P**that represents the latent values for items.feature_len = 10 U = tf.Variable(initial_value=tf.truncated_normal([943,feature_len]), name='users') P = tf.Variable(initial_value=tf.truncated_normal([feature_len,1682]), name='items')As mentioned before, all the ratings can be calculated as the product of these two variables

result = tf.matmul(U, P)

**result**is a 943x1682 matrix where its*(i,j)*entry shows the rating of user*i*for item*j*. We first make this matrix a flat vector and then we lookup (gather) the indices that we have the training data for.result_flatten = tf.reshape(result, [-1]) R = tf.gather(result_flatten, user_indecies * tf.shape(result)[1] + item_indecies, name='extracting_user_rate')

### Cost function

The goal of the matrix factorization is to predict rating of user*u*for item*p*. Let call that prediction $\hat{r}$. Then a naive cost function for a single prediction is the absolute difference between user actual rating and our predicted rating: \[\parallel r-\hat{r} \parallel_1 \] As a result, the recommender system cost function is sum of all predication errors for all users and all items: We can add this cost function to TensorFlow computation graph as follows:diff_op = tf.sub(R, rates, name='trainig_diff') diff_op_squared = tf.abs(diff_op, name="squared_difference") base_cost = tf.reduce_sum(diff_op_squared, name="sum_squared_error")

### Regularization

Similar to any machine learning modeling we need to add regularization to control model overfitting. The regularization here is based on the latent vectors for items and users: \[ \sum_{i,j\in \Delta} \lambda (\parallel u_i \parallel + \parallel p_j\parallel) \] where $\Delta$ is the set of all*(user, item)*ratings that is available in training set. In TF we have:lda = tf.constant(.001, name='lambda') norm_sums = tf.add(tf.reduce_sum(tf.abs(U, name='user_abs'), name='user_norm'), tf.reduce_sum(tf.abs(T, name='item_abs'), name='item_norm')) regularizer = tf.mul(norm_sums, lda, 'regularizer')

With regularization, the final cost function is as follows:

cost = tf.add(base_cost, regularizer)In summary, the above cost function is the representative of the following mathematical formulation: \[ \sum_{i,j\in \Delta} \parallel r_{i,j}-\hat{r_{i,j}} \parallel_1 + \lambda (||u_i|| + ||p_j||) \]

**Optimizer**

Our goal is to minimize the above cost function. One of the popular ways is by using stochastic gradient decent (SGD). SGD has a hyper parameter called *learning rate*or*step size*that determines the scale of walking in negative direction of gradient. We use an exponentially decaying learning rate:lr = tf.constant(.001, name='learning_rate') global_step = tf.Variable(0, trainable=False) learning_rate = tf.train.exponential_decay(lr, global_step, 10000, 0.96, staircase=True) optimizer = tf.train.GradientDescentOptimizer(learning_rate) training_step = optimizer.minimize(cost, global_step=global_step)

### Execution

Now its time to execute the above computation graph. First let's define a session that will run the graph:sess = tf.Session()Then we need to initialize all the variables.

init = tf.initialize_all_variables() sess.run(init)Finally, we will feed the data to the graph and run it in multiple epochs

for i in xrange(500):

sess.run(training_step)

### Testing

**Sanity Check**

First lets look at some of the rating predictions from the training data. For example let's look at the first entry of the training set and compare the actual rating with rating from our MF prediction:
u, p, r = df[['user', 'item', 'rate']].values[0] rhat = tf.gather(tf.gather(result, u-1), p-1) print "rating for user " + str(u) + " for item " + str(p) + " is " + str(r) + " and our prediction is: " + str(sess.run(rhat))

**Accuracy**

To calculate the accuracy we apply the MF model on the test set and calculate the average prediction error for all pairs of (user, item) in the test set.
df_test = df[~msk] user_indecies_test = [x-1 for x in df_test.user.values] item_indecies_test = [x-1 for x in df_test.item.values] rates_test = df_test.rate.values R_test = tf.gather(result_flatten, user_indecies_test * tf.shape(result)[1] + item_indecies_test, name='extracting_user_rate_test') diff_op_test = tf.sub(R_test, rates_test, name='test_diff') diff_op_squared_test = tf.abs(diff_op, name="squared_difference_test") cost_test = tf.div(tf.reduce_sum(tf.square(diff_op_squared_test, name="squared_difference_test"), name="sum_squared_error_test"), df_test.shape[0] * 2, name="average_error") print sess.run(cost_test)

**Note 1**: The full code can be found here: https://github.com/mamhamed/misc/tree/master/mf_with_tf**Note 2**: Please also refer to this ohAI post as some of the above code is inspired from there.
Beautiful! Thank you so much

ReplyDeleteur welcome, if you like it please also share to your network for others to benefit.

DeleteWhere does your input_tensor come from? Getting the error:

ReplyDeleteNameError: name 'T' is not defined

you can email me your code and I can take a look and debug the issue.

DeleteI think he mean P instead of T

DeleteIn the cost function why do you use the names you do?

ReplyDeletediff_op_squared = tf.abs(diff_op, name="squared_difference")

base_cost = tf.reduce_sum(diff_op_squared, name="sum_squared_error")

Neither your cost formula or these lines of code seem to imply squaring the differences