The model presented below follows a basic dot-product recommender architecture using learned embeddings, as described in the NVIDIA article linked in the OP. This approach learns dense vector representations (embeddings) for users and items, computing a preference score as the dot product: $$ s_{ui} = \mathbf{u}_u \cdot \mathbf{v}_i $$ where $\mathbf{u}_u$ and $\mathbf{v}_i$ are the user and item embeddings, respectively (Koren et al., 2009). The architecture deliberately avoids hidden layers or other complexity, making it an ideal starting point for collaborative filtering when explicit features are unavailable.
To implement this in Keras, the prerequisites are straightforward: integer-encoded user and item IDs, and either implicit or explicit feedback data (such as click/no-click or rating information). A sigmoid activation function maps the dot product score to a probability in $[0,1]$, making it particularly suitable for implicit feedback scenarios. This simple yet effective architecture forms the foundation for more complex recommendation systems while maintaining interpretability through its direct embedding-to-output pathway.
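As a concrete (and purely illustrative) example of the integer-encoding prerequisite, the snippet below builds contiguous user and item indices from a toy interaction log; the raw identifiers and variable names are assumptions made for this sketch, not part of the NVIDIA article or the appendix code.

import numpy as np

# Hypothetical raw interaction log: (user, item, clicked) -- illustrative data only
raw_interactions = [("alice", "sku_1", 1), ("alice", "sku_2", 1), ("bob", "sku_3", 1)]

# Map each distinct user/item to a contiguous integer index, as the Embedding layers expect
user_index = {u: idx for idx, u in enumerate(sorted({u for u, _, _ in raw_interactions}))}
item_index = {i: idx for idx, i in enumerate(sorted({i for _, i, _ in raw_interactions}))}

user_ids = np.array([user_index[u] for u, _, _ in raw_interactions])
item_ids = np.array([item_index[i] for _, i, _ in raw_interactions])
clicks = np.array([c for _, _, c in raw_interactions], dtype="float32")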
Model Overview
- Inputs: two integer IDs — one for the user, one for the item.
- Embeddings:
  - Each user ID is mapped to a vector (embedding) of size d.
  - Each item ID is similarly embedded.
- Dot Product: Computes affinity scores $s_{ui} = \mathbf{u}_u \cdot \mathbf{v}_i$, where $\mathbf{u}_u$ and $\mathbf{v}_i$ are embeddings.
- Sigmoid Activation: converts the raw dot product into a probability in the range $[0, 1]$, representing the likelihood of interaction (e.g. a click); a short numerical sketch of this scoring step follows the list.
- Loss Function: binary_crossentropy, which is suitable for implicit feedback (e.g. click/no-click).
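To make the scoring step concrete, here is a small NumPy sketch of the dot product followed by a sigmoid, using made-up 3-dimensional embeddings rather than the model's learned 8-dimensional ones:

import numpy as np

# Made-up embeddings for one user and one item (illustrative values only)
u = np.array([0.2, -0.5, 0.8])   # user embedding u_u
v = np.array([0.4,  0.1, 0.6])   # item embedding v_i

score = float(np.dot(u, v))            # raw affinity s_ui = u_u . v_i  -> 0.51
prob = 1.0 / (1.0 + np.exp(-score))    # sigmoid squashes the score into (0, 1) -> ~0.62
print(score, prob)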
Training and Evaluation
The model is optimised using binary cross-entropy loss for implicit feedback (Hu et al., 2008). Negative sampling is recommended for scalability (Koren et al., 2009).
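The appendix code trains on the observed pairs only; the sketch below shows one simple way negative sampling could be added. The helper, sampling ratio, and seed are assumptions for illustration, not part of the original code, and the resampling loop assumes no user has interacted with every item.

import numpy as np

def sample_negatives(user_ids, item_ids, num_items, ratio=1, seed=0):
    """For each observed (user, item) pair, draw `ratio` unobserved items as negatives."""
    rng = np.random.default_rng(seed)
    observed = set(zip(user_ids.tolist(), item_ids.tolist()))
    neg_users, neg_items = [], []
    for u in user_ids.tolist():
        for _ in range(ratio):
            j = int(rng.integers(num_items))
            while (u, j) in observed:        # resample if this pair was actually observed
                j = int(rng.integers(num_items))
            neg_users.append(u)
            neg_items.append(j)
    return np.array(neg_users), np.array(neg_items)

# Positives keep label 1, sampled negatives get label 0; concatenate both before calling model.fit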
After training, the model should be able to:
- Predict how likely a user is to interact with a given item.
- Rank items for a user to produce top-N recommendations (a vectorised ranking sketch follows this list).
- Visualise user and item embeddings to interpret model behaviour.
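Because the sigmoid is monotonic, the top-N ranking can also be computed directly from the extracted embedding matrices rather than via model.predict. The sketch below assumes user_matrix and item_matrix as extracted in the appendix code; the helper name is illustrative.

import numpy as np

def top_n_for_user(user_id, user_matrix, item_matrix, n=2):
    """Rank items for one user by the raw dot-product score (same order as the sigmoid output)."""
    scores = item_matrix @ user_matrix[user_id]   # one dot product per item
    return np.argsort(scores)[::-1][:n].tolist()

# e.g. top_n_for_user(0, user_matrix, item_matrix, n=2)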
Key Features
- Static Embeddings: User/item embeddings remain fixed until updated via retraining.
- Cold-Start Limitation: Requires retraining for new users/items (Schein et al., 2002).
References
Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. Proceedings of the 2008 IEEE International Conference on Data Mining, 263–272.
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.
Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 253–260.
Appendix: Full Keras Code (Embeddings, Training, Prediction, Visualisation)
import numpy as np
import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Embedding, Dot, Flatten, Activation
from keras.optimizers import Adam
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# ----- CONFIG -----
num_users = 3
num_items = 4
embedding_dim = 8
epochs = 50
top_n = 2

# ----- MODEL -----
user_input = Input(shape=(1,), name='user_id')
item_input = Input(shape=(1,), name='item_id')

user_embedding = Embedding(input_dim=num_users, output_dim=embedding_dim, name='user_embedding')(user_input)
item_embedding = Embedding(input_dim=num_items, output_dim=embedding_dim, name='item_embedding')(item_input)

dot_product = Dot(axes=-1)([user_embedding, item_embedding])
output = Activation('sigmoid')(Flatten()(dot_product))

model = Model(inputs=[user_input, item_input], outputs=output)
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

# ----- TOY DATA -----
user_ids = np.array([0, 0, 1, 2, 2])
item_ids = np.array([0, 1, 2, 1, 3])
clicks = np.array([1, 1, 1, 0, 0])

# ----- TRAIN -----
model.fit([user_ids, item_ids], clicks, epochs=epochs, batch_size=2, verbose=0)
print("\nModel trained.")

# ----- PREDICT -----
test_users = np.array([0, 1, 2])
test_items = np.array([2, 1, 0])
preds = model.predict([test_users, test_items])
for u, i, p in zip(test_users, test_items, preds):
    print(f"User {u} → Item {i}: predicted click probability = {p[0]:.4f}")

# ----- SAVE MODEL -----
model.save("simple_recommender_model.keras")
print("\nModel saved as 'simple_recommender_model.keras'")

# ----- EXTRACT EMBEDDINGS -----
user_matrix = model.get_layer('user_embedding').get_weights()[0]
item_matrix = model.get_layer('item_embedding').get_weights()[0]
print("\nUser embedding matrix:")
print(user_matrix)
print("\nItem embedding matrix:")
print(item_matrix)

# ----- VISUALISE EMBEDDINGS (PCA) -----
def plot_embeddings(matrix, title, labels):
    pca = PCA(n_components=2)
    reduced = pca.fit_transform(matrix)
    plt.figure(figsize=(6, 4))
    plt.scatter(reduced[:, 0], reduced[:, 1])
    for i, label in enumerate(labels):
        plt.text(reduced[i, 0], reduced[i, 1], str(label), fontsize=12)
    plt.title(title)
    plt.grid(True)
    plt.tight_layout()
    plt.show()

plot_embeddings(user_matrix, "User Embeddings (PCA)", labels=[f"U{u}" for u in range(num_users)])
plot_embeddings(item_matrix, "Item Embeddings (PCA)", labels=[f"I{i}" for i in range(num_items)])

# ----- TOP-N RECOMMENDATIONS -----
print("\nTop-N recommendations for each user:")
for user in range(num_users):
    items = np.arange(num_items)
    user_array = np.full_like(items, user)
    scores = model.predict([user_array, items], verbose=0).flatten()
    ranked_items = items[np.argsort(scores)[::-1]][:top_n]
    print(f"User {user}: Top {top_n} items → {ranked_items.tolist()}")

The output shows the following:
Model Summary
- 3 users × 8-dimensional embeddings = 24 parameters
- 4 items × 8-dimensional embeddings = 32 parameters
- Total trainable parameters = 56
No hidden layers, just embedding matrices and a dot product + sigmoid.
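If you want to double-check the count against the model built in the appendix, Keras can report it directly (this assumes the appendix variables are still in scope):

# 3 users × 8 dims + 4 items × 8 dims = 24 + 32 = 56 trainable parameters
assert num_users * embedding_dim + num_items * embedding_dim == 56
print(model.count_params())   # 56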
Predictions
Predicted click probabilities for selected user–item pairs:
User 0 → Item 2: 0.5018
User 1 → Item 1: 0.5062
User 2 → Item 0: 0.4782
These are all near 0.5 because the model has only seen a few examples and the embeddings are still largely unspecialised. With more training data and epochs, the predictions will spread further from 0.5 and become more confident.
Embedding Matrices
Your printed matrices (user and item embeddings) represent learned 8-dimensional vectors. Each row is a dense representation of a user or item’s “position” in latent space. These are optimised to reflect observed interactions.
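One simple way to interrogate these matrices (not part of the original script) is to compare rows with cosine similarity, which should echo the clusters visible in the PCA plots; the snippet assumes user_matrix as extracted in the appendix.

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Users who behave alike should have more similar embedding rows
print(cosine_similarity(user_matrix[0], user_matrix[1]))   # U0 vs U1
print(cosine_similarity(user_matrix[0], user_matrix[2]))   # U0 vs U2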
PCA Visualisations
User Embeddings:
- U0 and U1 are in the upper-right quadrant: they behave similarly.
- U2 is in a different quadrant, reflecting dissimilar behaviour (it disliked both of its items).
Item Embeddings:
- I0 and I1 are clustered: both were liked by U0.
- I2 stands apart: it was only liked by U1.
- I3 is slightly separate: it was disliked by U2.
PCA reduced 8D embeddings into 2D for visualisation, helping you spot clusters or anomalies.
Top-N Recommendations
- User 0: Top 2 items → [0, 1] (matches training)
- User 1: Top 2 items → [2, 3] (item 2 liked; item 3 may be neutral)
- User 2: Top 2 items → [2, 0] (best among the remaining options)
This ranking is derived by predicting click probabilities across all items and sorting them in descending order.