
I've been working in data science for a long time, but very rarely have I been called upon to implement an ML algorithm; I've just run other people's libraries. I'm trying to pick up the skill. I'm particularly interested in learning to implement recommender systems.

This page (an NVIDIA article on recommender system architectures) presents some architecture concepts for recommender systems. To start off, I'm interested in implementing the basic embedding-layer model at the top of the page, with no other layers.

I'd like the user embeddings and item embeddings to be learned so that the same user keeps the same embedding until it is updated from new information about what items they do or don't click on, and similar for items.

I have a concept for a more complex design, but right now I don't even know how to implement these basic building blocks. I'd like a Keras implementation ideally, or failing that PyTorch, but I've only ever implemented NNs in Keras before. I'm also in a bit of a time crunch: I want to be able to fully explain my recommender system if I'm asked about it in a job interview a couple of weeks from now.

1 Answer

The model presented below follows a basic dot-product recommender architecture using learned embeddings, as described in the NVIDIA article linked in the OP. This approach learns dense vector representations (embeddings) for users and items, computing a preference score as the dot product: $$ s_{ui} = \mathbf{u}_u \cdot \mathbf{v}_i $$ where $\mathbf{u}_u$ and $\mathbf{v}_i$ are the user and item embeddings, respectively (Koren et al., 2009). The architecture deliberately avoids hidden layers or other complexity, making it an ideal starting point for collaborative filtering when explicit features are unavailable.

To implement this in Keras, the prerequisites are straightforward: integer-encoded user and item IDs, and either implicit or explicit feedback data (such as click/no-click or rating information). A sigmoid activation function maps the dot product score to a probability in $[0,1]$, making it particularly suitable for implicit feedback scenarios. This simple yet effective architecture forms the foundation for more complex recommendation systems while maintaining interpretability through its direct embedding-to-output pathway.
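
If the raw logs use arbitrary user and item identifiers, a small preprocessing step maps them to contiguous integers before they reach the embedding layers. Below is a minimal sketch of that step; the interactions DataFrame and its column names are made up for illustration.

import pandas as pd

# Hypothetical raw interaction log (IDs can be arbitrary strings or codes).
interactions = pd.DataFrame({
    "user_id": ["alice", "bob", "alice", "carol"],
    "item_id": ["sku42", "sku42", "sku17", "sku99"],
    "clicked": [1, 0, 1, 1],
})

# Map raw IDs to contiguous integers in [0, num_users) and [0, num_items),
# as expected by the Keras Embedding layers used later.
user_index = {u: idx for idx, u in enumerate(interactions["user_id"].unique())}
item_index = {it: idx for idx, it in enumerate(interactions["item_id"].unique())}

interactions["user_idx"] = interactions["user_id"].map(user_index)
interactions["item_idx"] = interactions["item_id"].map(item_index)

num_users = len(user_index)  # becomes input_dim of the user Embedding
num_items = len(item_index)  # becomes input_dim of the item Embedding
print(interactions)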

Model Overview

  • Inputs: two integer IDs — one for the user, one for the item.
  • Embeddings:
    • Each user ID is mapped to a vector (embedding) of size d.
    • Each item ID is similarly embedded.
  • Dot Product: Computes affinity scores $s_{ui} = \mathbf{u}_u \cdot \mathbf{v}_i$, where $\mathbf{u}_u$ and $\mathbf{v}_i$ are embeddings.
  • Sigmoid Activation: converts the raw dot product into a probability in the range $[0, 1]$, representing the likelihood of interaction (e.g., a click); a short numeric sketch of these two steps follows this list.
  • Loss Function: binary_crossentropy, which is suitable for implicit feedback (e.g., click/no-click).
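
As a quick numeric illustration of the dot-product and sigmoid steps (the 4-dimensional embeddings below are made up; in the real model they are learned weights):

import numpy as np

u = np.array([0.2, -0.1, 0.5, 0.3])    # example user embedding u_u
v = np.array([0.4,  0.1, 0.2, -0.2])   # example item embedding v_i

score = np.dot(u, v)                   # raw affinity s_ui = u_u · v_i
prob = 1.0 / (1.0 + np.exp(-score))    # sigmoid maps the score into (0, 1)

print(score)  # 0.08 - 0.01 + 0.10 - 0.06 = 0.11
print(prob)   # ≈ 0.527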

Training and Evaluation

The model is optimised using binary cross-entropy loss for implicit feedback (Hu et al., 2008). Negative sampling is recommended for scalability (Koren et al., 2009).
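
Negative sampling here just means pairing each observed click with a few randomly drawn items the user was not observed clicking, labelled 0, so the model sees both classes without scoring every unobserved pair. Below is a minimal sketch; the 1:2 positive-to-negative ratio and the uniform random choice of negatives are assumptions for illustration, not prescriptions from the cited papers.

import numpy as np

rng = np.random.default_rng(42)

def sample_negatives(pos_user_ids, pos_item_ids, num_items, ratio=2):
    """For each positive (user, item) pair, draw `ratio` items the user was
    not observed interacting with and treat them as label-0 examples."""
    observed = set(zip(pos_user_ids.tolist(), pos_item_ids.tolist()))
    neg_users, neg_items = [], []
    for u in pos_user_ids.tolist():
        drawn = 0
        while drawn < ratio:  # assumes each user has at least `ratio` unseen items
            i = int(rng.integers(num_items))
            if (u, i) not in observed:
                neg_users.append(u)
                neg_items.append(i)
                drawn += 1
    return np.array(neg_users), np.array(neg_items)

# Positives (clicks); negatives are sampled from unobserved pairs.
pos_users = np.array([0, 0, 1])
pos_items = np.array([0, 1, 2])
neg_users, neg_items = sample_negatives(pos_users, pos_items, num_items=4)

train_users = np.concatenate([pos_users, neg_users])
train_items = np.concatenate([pos_items, neg_items])
train_labels = np.concatenate([np.ones(len(pos_users)), np.zeros(len(neg_users))])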

After training, the model should be able to:

  • Predict how likely a user is to interact with a given item.
  • Rank items for a user to produce top-N recommendations.
  • Visualise user and item embeddings to interpret model behaviour.

Key Features

  • Static Embeddings: User/item embeddings remain fixed until updated via retraining (a warm-start retraining sketch follows this list).
  • Cold-Start Limitation: Requires retraining for new users/items (Schein et al., 2002).
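
On the static-embeddings point: one way to fold in new interactions for existing IDs without retraining from scratch is to warm-start from the saved model and call fit again on the new batch, so embeddings continue from their current values. This is only a sketch of that idea; it relies on the model saved by the appendix code below and cannot add rows for IDs beyond the original input_dim.

import numpy as np
from keras.models import load_model

# Reload the previously trained model (saved as in the appendix code).
model = load_model("simple_recommender_model.keras")

# Newly observed interactions for existing user/item IDs.
new_users  = np.array([1, 2])
new_items  = np.array([3, 0])
new_clicks = np.array([1, 1])

# Warm-start update: embedding rows for the users/items in this batch are
# nudged from their current values; rows not in the batch receive no gradient.
model.fit([new_users, new_items], new_clicks, epochs=10, batch_size=2, verbose=0)

updated_user_matrix = model.get_layer("user_embedding").get_weights()[0]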

References

Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. Proceedings of the 2008 IEEE International Conference on Data Mining, 263–272.

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.

Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 253–260.


Appendix: Full Keras Code (Embeddings, Training, Prediction, Visualisation)

import numpy as np
import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Embedding, Dot, Flatten, Activation
from keras.optimizers import Adam
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# ----- CONFIG -----
num_users = 3
num_items = 4
embedding_dim = 8
epochs = 50
top_n = 2

# ----- MODEL -----
user_input = Input(shape=(1,), name='user_id')
item_input = Input(shape=(1,), name='item_id')

user_embedding = Embedding(input_dim=num_users, output_dim=embedding_dim, name='user_embedding')(user_input)
item_embedding = Embedding(input_dim=num_items, output_dim=embedding_dim, name='item_embedding')(item_input)

dot_product = Dot(axes=-1)([user_embedding, item_embedding])
output = Activation('sigmoid')(Flatten()(dot_product))

model = Model(inputs=[user_input, item_input], outputs=output)
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# ----- TOY DATA -----
user_ids = np.array([0, 0, 1, 2, 2])
item_ids = np.array([0, 1, 2, 1, 3])
clicks   = np.array([1, 1, 1, 0, 0])

# ----- TRAIN -----
model.fit([user_ids, item_ids], clicks, epochs=epochs, batch_size=2, verbose=0)
print("\nModel trained.")

# ----- PREDICT -----
test_users = np.array([0, 1, 2])
test_items = np.array([2, 1, 0])
preds = model.predict([test_users, test_items])
for u, i, p in zip(test_users, test_items, preds):
    print(f"User {u} → Item {i}: predicted click probability = {p[0]:.4f}")

# ----- SAVE MODEL -----
model.save("simple_recommender_model.keras")
print("\nModel saved as 'simple_recommender_model.keras'")

# ----- EXTRACT EMBEDDINGS -----
user_matrix = model.get_layer('user_embedding').get_weights()[0]
item_matrix = model.get_layer('item_embedding').get_weights()[0]
print("\nUser embedding matrix:")
print(user_matrix)
print("\nItem embedding matrix:")
print(item_matrix)

# ----- VISUALISE EMBEDDINGS (PCA) -----
def plot_embeddings(matrix, title, labels):
    pca = PCA(n_components=2)
    reduced = pca.fit_transform(matrix)
    plt.figure(figsize=(6, 4))
    plt.scatter(reduced[:, 0], reduced[:, 1])
    for i, label in enumerate(labels):
        plt.text(reduced[i, 0], reduced[i, 1], str(label), fontsize=12)
    plt.title(title)
    plt.grid(True)
    plt.tight_layout()
    plt.show()

plot_embeddings(user_matrix, "User Embeddings (PCA)", labels=[f"U{u}" for u in range(num_users)])
plot_embeddings(item_matrix, "Item Embeddings (PCA)", labels=[f"I{i}" for i in range(num_items)])

# ----- TOP-N RECOMMENDATIONS -----
print("\nTop-N recommendations for each user:")
for user in range(num_users):
    items = np.arange(num_items)
    user_array = np.full_like(items, user)
    scores = model.predict([user_array, items], verbose=0).flatten()
    ranked_items = items[np.argsort(scores)[::-1]][:top_n]
    print(f"User {user}: Top {top_n} items → {ranked_items.tolist()}")

[Screenshots: model summary, prediction output, and PCA plots of the user and item embeddings]

The output shows the following:

Model Summary

  • 3 users × 8-dimensional embeddings = 24 parameters
  • 4 items × 8-dimensional embeddings = 32 parameters
  • Total trainable parameters = 56
    No hidden layers, just embedding matrices and a dot product + sigmoid.

Predictions

Predicted click probabilities for selected user–item pairs:

User 0 → Item 2: 0.5018
User 1 → Item 1: 0.5062
User 2 → Item 0: 0.4782

These are all near 0.5 because the model has only seen a handful of examples, so the embeddings are still close to their initial values. With more training data and epochs, the predictions will diverge more confidently.


Embedding Matrices

Your printed matrices (user and item embeddings) represent learned 8-dimensional vectors. Each row is a dense representation of a user or item’s “position” in latent space. These are optimised to reflect observed interactions.


PCA Visualisations

User Embeddings:

  • U0 and U1 are in the upper-right quadrant: they behave similarly.
  • U2 is in a different quadrant, reflecting dissimilar behaviour (disliked both items).

Item Embeddings:

  • I0 and I1 are clustered — both liked by U0.
  • I2 stands apart — only liked by U1.
  • I3 is slightly separate — disliked by U2.

PCA reduced 8D embeddings into 2D for visualisation, helping you spot clusters or anomalies.


Top-N Recommendations

User 0: Top 2 items → [0, 1]   (matches training)
User 1: Top 2 items → [2, 3]   (item 2 liked; 3 may be neutral)
User 2: Top 2 items → [2, 0]   (best among remaining options)

This ranking is derived by predicting click probabilities across all items and sorting them in descending order.


  • Thanks so much! I'll get around to accepting this answer sometime soon, when I've had a chance to try out the system. If you wanted to incorporate user and item features (for example, item class from a taxonomy, or user region), how would you do that? Commented Jul 18 at 19:02
  • You are very welcome. To incorporate features like item class or user region, you could add embedding layers for these categorical features, then concatenate them with the user/item ID embeddings to create enriched user/item vectors. You can still use a dot product if the dimensions match, or optionally apply a dense layer to project them. This lets you model similarities based on shared metadata (e.g., same region) and helps with cold-start generalisation. (cont.) Commented Jul 20 at 17:43
  • (cont.) In code, you could add additional Input() layers for the user/item features, embed them, then combine them with the ID embeddings. For instance, user_embedding + region_embedding → user vector; same for item class. If their dimensions differ, a Dense() layer can bring them to a common size. You could then use a dot product or a small MLP for prediction. Something like that, anyway! (A hedged sketch of this variant follows these comments.) Commented Jul 20 at 17:44
  • I tried out an enhanced version of your code (with the addition of categorical and numerical features) on real data and it worked great! Thank you so much! Commented Jul 23 at 21:10
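
For completeness, below is a hedged sketch of the feature-enriched variant described in the comments above. The feature names (user_region, item_class), their cardinalities, and the embedding sizes are made up for illustration, and the concatenate-then-Dense projection is only one of several ways to combine ID and feature embeddings.

from keras.models import Model
from keras.layers import Input, Embedding, Dense, Dot, Flatten, Activation, Concatenate

num_users, num_items = 3, 4
num_regions, num_classes = 2, 3         # hypothetical feature cardinalities
id_dim, feat_dim, joint_dim = 8, 4, 8   # illustrative embedding sizes

# ID inputs plus one categorical feature per side (features are hypothetical).
user_id_in     = Input(shape=(1,), name="user_id")
user_region_in = Input(shape=(1,), name="user_region")
item_id_in     = Input(shape=(1,), name="item_id")
item_class_in  = Input(shape=(1,), name="item_class")

user_vec = Concatenate()([
    Flatten()(Embedding(num_users, id_dim, name="user_embedding")(user_id_in)),
    Flatten()(Embedding(num_regions, feat_dim, name="region_embedding")(user_region_in)),
])
item_vec = Concatenate()([
    Flatten()(Embedding(num_items, id_dim, name="item_embedding")(item_id_in)),
    Flatten()(Embedding(num_classes, feat_dim, name="class_embedding")(item_class_in)),
])

# Project both sides to a common size so the dot product is well defined.
user_vec = Dense(joint_dim)(user_vec)
item_vec = Dense(joint_dim)(item_vec)

output = Activation("sigmoid")(Dot(axes=-1)([user_vec, item_vec]))

model = Model([user_id_in, user_region_in, item_id_in, item_class_in], output)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()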
