This post illustrates how to use the TensorFlow framework to implement simple Matrix Factorization (MF). MF is one of the most widely used recommender-system techniques, and it is especially useful when we have access to large amounts of explicit or implicit user feedback. For details about matrix factorization and collaborative filtering, refer to this paper.
For the purpose of this post we build a simple movie recommender using data from MovieLens. The dataset is the MovieLens 100K Dataset. It contains 100,000 ratings (1-5) from 943 users on 1682 movies. The u.data file stores the ratings in a sparse format of (userid, movieid, rating, timestamp).
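To get a feel for the raw file, you can print its first few lines (a minimal sketch; it assumes u.data has already been downloaded into the working directory):

with open('u.data') as f:
    for _ in range(3):
        # each line is tab-separated: userid, movieid, rating, timestamp
        print f.readline().rstrip()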
Load the data
import pandas as pd
import numpy as np

df = pd.read_csv('u.data', sep='\t', names=['user', 'item', 'rate', 'time'])
msk = np.random.rand(len(df)) < 0.7
df_train = df[msk]
user_indices = [x - 1 for x in df_train.user.values]
item_indices = [x - 1 for x in df_train.item.values]
rates = df_train.rate.values

We use 70% of the data for training. The rest will be used later for testing and accuracy calculation. The two index lists shift the IDs to be 0-based (the raw IDs start at 1).
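Since msk is drawn at random, the split changes from run to run. If you want a reproducible 70/30 split, you can seed NumPy before computing msk (an optional tweak, not part of the original post):

np.random.seed(0)  # any fixed seed makes the msk split deterministic; place this before np.random.rand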
Variables
In matrix factorization a user rating $r$ is formulated as the inner product of two latent vectors $\vec{u}$ and $\vec{p}$, which live in the same space and represent the user's interests and the movie's features respectively. \[\hat{r} = \vec{u}^t\vec{p}\] We define these two latent variables in TensorFlow as follows: U holds the latent representation of user interests and P holds the latent values for items.
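For intuition, here is a tiny numeric example of that inner product, reusing the np import from the loading step (the vectors are made up for illustration):

u = np.array([0.5, -1.0, 2.0])  # hypothetical user vector
p = np.array([1.0, 0.5, 1.5])   # hypothetical item vector
print np.dot(u, p)              # 0.5*1.0 + (-1.0)*0.5 + 2.0*1.5 = 3.0, the predicted rating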
import tensorflow as tf

feature_len = 10
U = tf.Variable(initial_value=tf.truncated_normal([943, feature_len]), name='users')
P = tf.Variable(initial_value=tf.truncated_normal([feature_len, 1682]), name='items')

As mentioned before, all the ratings can be calculated as the product of these two variables:
result = tf.matmul(U, P)

result is a 943x1682 matrix whose (i, j) entry is the predicted rating of user i for item j. We first flatten this matrix into a vector and then look up (gather) the entries for which we have training data.
result_flatten = tf.reshape(result, [-1])
R = tf.gather(result_flatten, user_indices * tf.shape(result)[1] + item_indices, name='extracting_user_rate')
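This index arithmetic works because row i of the 943x1682 matrix starts at position i * 1682 in the flattened vector. A quick illustrative check (the indices are chosen arbitrarily):

i, j = 10, 5        # hypothetical 0-based user and item indices
print i * 1682 + j  # 16825: the position of rating (10, 5) in result_flatten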
Cost function
The goal of matrix factorization is to predict the rating of user u for item p. Let's call that prediction $\hat{r}$. A naive cost function for a single prediction is then the absolute difference between the user's actual rating and our predicted rating: \[\parallel r-\hat{r} \parallel_1 \] As a result, the recommender system cost function is the sum of the prediction errors over all users and items: \[ \sum_{i,j\in \Delta} \parallel r_{i,j}-\hat{r}_{i,j} \parallel_1 \] We can add this cost function to the TensorFlow computation graph as follows (despite the "squared" names, tf.abs yields the absolute difference, matching the L1 norm above):

diff_op = tf.sub(R, rates, name='training_diff')
diff_op_squared = tf.abs(diff_op, name="squared_difference")
base_cost = tf.reduce_sum(diff_op_squared, name="sum_squared_error")
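As a quick sanity check on what base_cost computes, here is the same L1 sum on made-up numbers:

# predictions [3.5, 2.0] vs. actual ratings [4, 1]: |3.5 - 4| + |2.0 - 1| = 1.5
print np.sum(np.abs(np.array([3.5, 2.0]) - np.array([4.0, 1.0])))  # 1.5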
Regularization
Similar to any machine learning model, we need to add regularization to control overfitting. The regularization here is based on the latent vectors for users and items: \[ \sum_{i,j\in \Delta} \lambda (\parallel u_i \parallel + \parallel p_j\parallel) \] where $\Delta$ is the set of all (user, item) ratings available in the training set. In TF we have:

lda = tf.constant(.001, name='lambda')
norm_sums = tf.add(tf.reduce_sum(tf.abs(U, name='user_abs'), name='user_norm'),
                   tf.reduce_sum(tf.abs(P, name='item_abs'), name='item_norm'))
regularizer = tf.mul(norm_sums, lda, 'regularizer')
With regularization, the final cost function is as follows:
cost = tf.add(base_cost, regularizer)

In summary, the above cost function represents the following mathematical formulation: \[ \sum_{i,j\in \Delta} \parallel r_{i,j}-\hat{r}_{i,j} \parallel_1 + \lambda (\parallel u_i \parallel + \parallel p_j \parallel) \]
Optimizer
Our goal is to minimize the above cost function. One of the popular ways to do so is stochastic gradient descent (SGD). SGD has a hyperparameter called the learning rate, or step size, that determines how far to move in the negative direction of the gradient. We use an exponentially decaying learning rate:

lr = tf.constant(.001, name='learning_rate')
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(lr, global_step, 10000, 0.96, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_step = optimizer.minimize(cost, global_step=global_step)
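With staircase=True, this schedule keeps the rate piecewise constant: \[ \mathrm{learning\_rate}(t) = 0.001 \times 0.96^{\lfloor t/10000 \rfloor} \] where $t$ is the global step. Since we only train for 500 steps below, the rate effectively stays at 0.001 throughout.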
Execution
Now it's time to execute the above computation graph. First let's define a session that will run the graph:

sess = tf.Session()

Then we need to initialize all the variables.
init = tf.initialize_all_variables()
sess.run(init)

Finally, we feed the data to the graph and run it for multiple epochs:
for i in xrange(500):
sess.run(training_step)
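To watch convergence, a variant of this loop could print the cost every so often (a sketch, not part of the original post):

for i in xrange(500):
    sess.run(training_step)
    if i % 100 == 0:
        # evaluate the full training cost to monitor progress
        print "step", i, "cost:", sess.run(cost)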
Testing
Sanity Check
First let's look at some of the rating predictions on the training data. For example, let's take the first entry of the ratings data and compare the actual rating with the rating from our MF prediction:

u, p, r = df[['user', 'item', 'rate']].values[0]
rhat = tf.gather(tf.gather(result, u - 1), p - 1)
print "rating for user " + str(u) + " for item " + str(p) + " is " + str(r) + " and our prediction is: " + str(sess.run(rhat))
Accuracy
To calculate the accuracy, we apply the MF model to the test set and calculate the average prediction error over all (user, item) pairs in the test set (the cost below is half the mean squared error, hence the factor of 2 in the denominator).

df_test = df[~msk]
user_indices_test = [x - 1 for x in df_test.user.values]
item_indices_test = [x - 1 for x in df_test.item.values]
rates_test = df_test.rate.values
R_test = tf.gather(result_flatten, user_indices_test * tf.shape(result)[1] + item_indices_test, name='extracting_user_rate_test')
diff_op_test = tf.sub(R_test, rates_test, name='test_diff')
diff_op_squared_test = tf.abs(diff_op_test, name="absolute_difference_test")
cost_test = tf.div(tf.reduce_sum(tf.square(diff_op_squared_test), name="sum_squared_error_test"), df_test.shape[0] * 2, name="average_error")
print sess.run(cost_test)
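If you prefer a more standard metric, RMSE can be computed from the same tensors (a small sketch; this metric is an addition, not part of the original post):

rmse_test = tf.sqrt(tf.reduce_mean(tf.square(diff_op_test)))
print sess.run(rmse_test)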
Note 1: The full code can be found here: https://github.com/mamhamed/misc/tree/master/mf_with_tf

Note 2: Please also refer to this ohAI post, as some of the above code was inspired by it.