Simple Matrix Factorization with TensorFlow

Tuesday, December 20, 2016
This post aims to illustrate use of TensorFlow framework for implementing a simple Matrix Factorization (MF). MF is one of the widely used recommender systems that is especially exploited when we have access to tons of user explicit or implicit feedbacks. For details about matrix factorization and collaborative system refer to this paper.
For the purpose of this post we explore a simple movie recommendation by using the data from MovieLens. The dataset is downloaded from MovieLens 100K Dataset. It contains of 100,000 ratings (1-5) from 943 users on 1682 movies. The represents the rating in sparse format of (userid, movieid, rating, timestamp).

Load the data

import pandas as pd
df = pd.read_csv('', sep='\t', names=['user','item','rate', 'time'])
msk = numpy.random.rand(len(df)) < 0.7
df_train = df[msk]
user_indecies = [x-1 for x in df_train.user.values]
item_indecies = [x-1 for x in df_train.item.values]
rates = df_train.rate.values
We use 70% of the data for training. The rest will be used later for testing and accuracy calculation. The last two lines is for shifting the Ids to 0 (from 1).


In matrix factorization user rating $r$ is formulated as an inner product of two latent vectors $\vec{u}$ and $\vec{p}$ which are two latent vectors in same space to represent the user interest and movie feature respectively.
\[\hat{r} = \vec{u}^t\vec{p}\] We define these two latent variables in TensorFlow as follows: U that shows latent representation of user interest and P that represents the latent values for items.
feature_len = 10
U = tf.Variable(initial_value=tf.truncated_normal([943,feature_len]), name='users')
P = tf.Variable(initial_value=tf.truncated_normal([feature_len,1682]), name='items')
As mentioned before, all the ratings can be calculated as the product of these two variables
result = tf.matmul(U, P)
result is a 943x1682 matrix where its (i,j) entry shows the rating of user i for item j. We first make this matrix a flat vector and then we lookup (gather) the indices that we have the training data for.
result_flatten = tf.reshape(result, [-1])
R = tf.gather(result_flatten, user_indecies * tf.shape(result)[1] + 
              item_indecies, name='extracting_user_rate')

Cost function

The goal of the matrix factorization is to predict rating of user u for item p. Let call that prediction $\hat{r}$. Then a naive cost function for a single prediction is the absolute difference between user actual rating and our predicted rating: \[\parallel r-\hat{r} \parallel_1 \] As a result, the recommender system cost function is sum of all predication errors for all users and all items: We can add this cost function to TensorFlow computation graph as follows:
diff_op = tf.sub(R, rates, name='trainig_diff')
diff_op_squared = tf.abs(diff_op, name="squared_difference")
base_cost = tf.reduce_sum(diff_op_squared, name="sum_squared_error")


Similar to any machine learning modeling we need to add regularization to control model overfitting. The regularization here is based on the latent vectors for items and users: \[ \sum_{i,j\in \Delta} \lambda (\parallel u_i \parallel + \parallel p_j\parallel) \] where $\Delta$ is the set of all (user, item) ratings that is available in training set. In TF we have:
lda = tf.constant(.001, name='lambda')
norm_sums = tf.add(tf.reduce_sum(tf.abs(U, name='user_abs'), name='user_norm'), 
   tf.reduce_sum(tf.abs(T, name='item_abs'), name='item_norm'))
regularizer = tf.mul(norm_sums, lda, 'regularizer')

With regularization, the final cost function is as follows:
cost = tf.add(base_cost, regularizer)
In summary, the above cost function is the representative of the following mathematical formulation: \[ \sum_{i,j\in \Delta} \parallel r_{i,j}-\hat{r_{i,j}} \parallel_1 + \lambda (||u_i|| + ||p_j||) \]


Our goal is to minimize the above cost function. One of the popular ways is by using stochastic gradient decent (SGD). SGD has a hyper parameter called learning rate or step size that determines the scale of walking in negative direction of gradient. We use an exponentially decaying learning rate:
lr = tf.constant(.001, name='learning_rate')
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(lr, global_step, 10000, 0.96, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_step = optimizer.minimize(cost, global_step=global_step)


Now its time to execute the above computation graph. First let's define a session that will run the graph:
sess = tf.Session()
Then we need to initialize all the variables.
init = tf.initialize_all_variables()
Finally, we will feed the data to the graph and run it in multiple epochs
for i in xrange(500):


Sanity Check

First lets look at some of the rating predictions from the training data. For example let's look at the first entry of the training set and compare the actual rating with rating from our MF prediction:
u, p, r = df[['user', 'item', 'rate']].values[0]
rhat = tf.gather(tf.gather(result, u-1), p-1)
print "rating for user " + str(u) + " for item " + str(p) + 
      " is " + str(r) + " and our prediction is: " + str(


To calculate the accuracy we apply the MF model on the test set and calculate the average prediction error for all pairs of (user, item) in the test set.
df_test = df[~msk]
user_indecies_test = [x-1 for x in df_test.user.values]
item_indecies_test = [x-1 for x in df_test.item.values]
rates_test = df_test.rate.values
R_test = tf.gather(result_flatten, user_indecies_test * tf.shape(result)[1] + item_indecies_test, name='extracting_user_rate_test')
diff_op_test = tf.sub(R_test, rates_test, name='test_diff')
diff_op_squared_test = tf.abs(diff_op, name="squared_difference_test")

cost_test = tf.div(tf.reduce_sum(tf.square(diff_op_squared_test, name="squared_difference_test"), name="sum_squared_error_test"), df_test.shape[0] * 2, name="average_error")
Note 1: The full code can be found here:
Note 2: Please also refer to this ohAI post as some of the above code is inspired from there.


  1. Replies
    1. ur welcome, if you like it please also share to your network for others to benefit.

  2. Where does your input_tensor come from? Getting the error:
    NameError: name 'T' is not defined

    1. you can email me your code and I can take a look and debug the issue.

    2. I think he mean P instead of T

  3. In the cost function why do you use the names you do?

    diff_op_squared = tf.abs(diff_op, name="squared_difference")
    base_cost = tf.reduce_sum(diff_op_squared, name="sum_squared_error")

    Neither your cost formula or these lines of code seem to imply squaring the differences

  4. In the Testing section, under Accuracy, it should probably read: diff_op_squared_test = tf.abs(diff_op_test, name="squared_difference_test"). Right now you're referencing the training "diff_op" variable.

  5. This is Very very nice article. Everyone should read. Thanks for sharing. black dog names for male & badass dog names for german shepherds.

  6. This is very very nice article.

  7. I get this error in the execution part:

    ValueError: Tried to convert 'tensor' to a tensor and failed. Error: Dimension size, given by scalar input 2, must be non-negative but is -2007852676 for 'gradients/Reshape_grad/Reshape/UnsortedSegmentSum' (op: 'UnsortedSegmentSum') with input shapes: [1353674], [1353674], [] and with computed input tensors: input[2] = <-2007852676>.

  8. thank you for your valuable content....



Favorite Quotes

"I have never thought of writing for reputation and honor. What I have in my heart must out; that is the reason why I compose." --Beethoven

"All models are wrong, but some are useful." --George Box

Copyright © 2015 • Ensemble Blogging