I believe the confusion arises from the interpretation of $\mathrm{net}_j$.
Part 1. To clarify, we first review a general result. Let $\mathsf{G} = (\mathsf{V}, \mathsf{E})$ be a finite directed acyclic graph (DAG). For each node $v \in \mathsf{V}$, we denote the set of parents (i.e., nodes with incoming edges to $v$) by $\mathsf{pa}(v)$ and the set of children (i.e., nodes with outgoing edges from $v$) by $\mathsf{ch}(v)$. Assume that each node $v$ represents a variable that is a differentiable function of its parents, i.e., $v = f_v(\mathsf{pa}(v))$ for some differentiable function $f_v : \mathbb{R}^{\mathsf{pa}(v)} \to \mathbb{R}$.
Let $x$ be a node in $\mathsf{G}$, and let $\mathsf{de}_0(x)$ denote the set of all descendants of $x$, including $x$ itself. If $v \in \mathsf{de}_0(x)$, we express $v$ by recursively applying the functional relation $w = f_w(\mathsf{pa}(w))$ until either $x \in \mathsf{pa}(w)$ or $\mathsf{pa}(w) \cap \mathsf{de}_0(x) = \varnothing$. This allows us to write $v$ as a function of $x$ and the "non-descendants" of $x$. We denote this functional relation as $F_{vx}$: $$ v = F_{vx}(x, \mathsf{V}\setminus\mathsf{de}_0(x)). $$
Example. In the OP's DAG, we have $$j = f_j(x, y) = f_j(x, f_y(x)),$$ and hence $F_{jx}(x) = f_j(x, f_y(x))$.
To avoid ambiguity, we denote the variable $v$ in this context by $[v]_x$, or simply $[v]$ if the variable $x$ is clear from the context. (Note that this is not standard notation.) Then, we have:
Theorem. Under the above setting,
$$ \frac{\partial [v]}{\partial x} = \sum_{y \in \mathsf{ch}(x)} \frac{\partial [v]}{\partial y}\frac{\partial y}{\partial x}. $$ More formally,
$$ \frac{\partial F_{vx}}{\partial x} = \sum_{y \in \mathsf{ch}(x)} \frac{\partial F_{vy}}{\partial y}\frac{\partial f_y}{\partial x}. $$
This result is not immediately obvious, so we provide a proof at the end for completeness.
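Before the proof, here is a quick symbolic sanity check of the theorem on the three-node DAG from the example above, written in Python with SymPy. The particular choices of $f_y$ and $f_j$ below are arbitrary and only for illustration; since $\mathsf{ch}(x) = \{y, j\}$, the right-hand side has one term per child.

```python
import sympy as sp

x = sp.Symbol('x')

# Arbitrary differentiable choices for the node functions (illustration only).
f_y = lambda t: sp.sin(t)                # y = f_y(x)
f_j = lambda s, t: s**2 * t + t**3       # j = f_j(x, y)

# Left-hand side: differentiate the composite F_{jx}(x) = f_j(x, f_y(x)) directly.
y_expr = f_y(x)
lhs = sp.diff(f_j(x, y_expr), x)

# Right-hand side: sum over ch(x) = {j, y}.
#   child j: (dF_{jj}/dj) * (df_j/dx) = 1 * df_j/dx
#   child y: (dF_{jy}/dy) * (df_y/dx) = (df_j/dy) * f_y'(x)
s, t = sp.symbols('s t')                 # dummy arguments of f_j
dfj_dx = sp.diff(f_j(s, t), s).subs({s: x, t: y_expr})
dfj_dy = sp.diff(f_j(s, t), t).subs({s: x, t: y_expr})
rhs = 1 * dfj_dx + dfj_dy * sp.diff(f_y(x), x)

print(sp.simplify(lhs - rhs))            # 0, as the theorem predicts
```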
Part 2. Now, we return to the original problem. The neural network is represented by a computational DAG in which each node is a non-learnable function (i.e., a function without learnable parameters) of its parents. Here, we assume $x$ is the input layer, so $o_x$ represents the input. This graph can be used to compute the derivative of $E$ with respect to each weight.
Using the notation from Part 1, we have: $$\begin{align*} \frac{\partial [\mathrm{net}_j]}{\partial o_x} &= \frac{\partial [\mathrm{net}_j]}{\partial \mathrm{net}_j}\frac{\partial \mathrm{net}_j}{\partial o_x} + \frac{\partial [\mathrm{net}_j]}{\partial \mathrm{net}_y}\frac{\partial \mathrm{net}_y}{\partial o_x} \\ &= w_{xj} + \frac{\partial [\mathrm{net}_j]}{\partial \mathrm{net}_y}w_{xy}. \end{align*}$$
This explains the discrepancy between the proof in the Wikipedia article and the OP's argument. In the article, the theorem in Part 1 is used to expand the partial derivatives of $E$ with respect to neuron outputs. As such, $\frac{\partial \mathrm{net}_j}{\partial o_x}$ refers to the partial derivative of $\mathrm{net}_j$ as a function of its direct parents, namely $o_y$, $w_{yj}$, $o_x$, and $w_{xj}$. In contrast, OP considers the derivative $\frac{\partial [\mathrm{net}_j]}{\partial o_x}$, where $\mathrm{net}_j$ is expanded as a composite function of $o_x$ (and other non-descendants). These two derivatives must be distinguished.
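The following short SymPy sketch makes the distinction concrete. It assumes, purely for illustration, a logistic activation for neuron $y$ and that $o_x$ is the only input feeding $\mathrm{net}_y$; the symbol names mirror the notation above and are otherwise arbitrary.

```python
import sympy as sp

o_x = sp.Symbol('o_x')
w_xj, w_yj, w_xy = sp.symbols('w_xj w_yj w_xy')
sigma = lambda z: 1 / (1 + sp.exp(-z))   # assumed activation of neuron y (illustration only)

# [net_j]: net_j fully expanded as a composite function of o_x.
net_y = w_xy * o_x                        # assuming o_x is the only input to neuron y
o_y = sigma(net_y)
net_j_composite = w_xj * o_x + w_yj * o_y

# Derivative of the composite function, d[net_j]/d o_x:
print(sp.diff(net_j_composite, o_x))
# -> w_xj + w_xy*w_yj*exp(-o_x*w_xy)/(1 + exp(-o_x*w_xy))**2,
#    i.e. w_xj + (d[net_j]/d net_y) * w_xy, as computed above.

# Partial derivative of net_j with respect to its direct parent o_x, holding o_y fixed:
o_y_free = sp.Symbol('o_y')
net_j_local = w_xj * o_x + w_yj * o_y_free
print(sp.diff(net_j_local, o_x))
# -> w_xj, which is the derivative used in the Wikipedia proof.
```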
Proof of Theorem. Let $v$ be a descendant of $x$.
If $\mathsf{pa}(v)$ contains a descendant of $x$ other than $x$ itself (it may or may not contain $x$), then $F_{vx}$ satisfies the relation:
$$ F_{vx} = f_v(F_{wx} : w \in \mathsf{pa}(v)), $$
where we use the conventions $F_{xx}(x) = x$ (so that $\frac{\partial F_{xx}}{\partial x} = 1$) and $F_{wx} = w$ for any non-descendant $w$ of $x$ (so that $\frac{\partial F_{wx}}{\partial x} = 0$).
So by the chain rule, we obtain
$$ \begin{align*} \frac{\partial F_{vx}}{\partial x} &= \sum_{v_1 \in \mathsf{pa}(v)} \frac{\partial f_v}{\partial v_1} \frac{\partial F_{v_1x}}{\partial x} \end{align*}$$
Otherwise, $\mathsf{pa}(v)$ is a subset of $\{x\} \cup (\mathsf{V}\setminus\mathsf{de}_0(x)) $ and $F_{vx} = f_v$. Hence,
$$ \begin{align*} \frac{\partial F_{vx}}{\partial x} &= \begin{cases} \frac{\partial f_v}{\partial x} & \text{if $x \in \mathsf{pa}(v)$}, \\ 0, & \text{if $x \notin \mathsf{pa}(v)$}. \end{cases} \end{align*}$$
Repeatedly applying these relations, we end up with
$$ \begin{align*} \frac{\partial F_{vx}}{\partial x} &= \sum_{x = v_n \to v_{n-1} \to \cdots \to v_1 \to v_0 = v} \frac{\partial v_0}{\partial v_1} \frac{\partial v_1}{\partial v_2} \cdots \frac{\partial v_{n-1}}{\partial v_n}, \end{align*}$$
where the final sum is over all directed paths from $x$ to $v$, and each factor $\frac{\partial v_i}{\partial v_{i+1}}$ stands for the partial derivative $\frac{\partial f_{v_i}}{\partial v_{i+1}}$ of the node function $f_{v_i}$ with respect to its parent $v_{i+1}$. The desired conclusion follows by grouping the sum by the penultimate node $v_{n-1}$ (i.e., the successor of $x$) in each path.
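The path-sum expression also gives a direct (if exponentially inefficient) recipe for computing $\frac{\partial F_{vx}}{\partial x}$ on any DAG: recursively group the paths by the first edge out of $x$, exactly as in the final step above. A minimal Python sketch, in which the graph encoding and the `local_grad` callback (returning $\partial f_{\text{child}}/\partial \text{parent}$) are illustrative assumptions:

```python
def path_sum_derivative(children, local_grad, x, v):
    """Sum, over all directed paths from x to v, of the product of the
    local derivatives d f_child / d parent along the path's edges."""
    if x == v:
        return 1.0  # the empty path contributes d[v]/dv = 1
    # Group the paths by the successor c of x, as in the last step of the proof:
    # dF_{vx}/dx = sum over c in ch(x) of (dF_{vc}/dc) * (df_c/dx).
    return sum(
        path_sum_derivative(children, local_grad, c, v) * local_grad(c, x)
        for c in children.get(x, [])
    )

# Example: the OP's DAG with edges x -> y, x -> j, y -> j and (made-up) local
# derivatives df_y/dx = 2, df_j/dx = 3, df_j/dy = 5; the two paths give 3 + 5*2 = 13.
grads = {("y", "x"): 2.0, ("j", "x"): 3.0, ("j", "y"): 5.0}
print(path_sum_derivative({"x": ["y", "j"], "y": ["j"]},
                          lambda c, p: grads[(c, p)], "x", "j"))  # 13.0
```

Backpropagation computes the same quantity far more efficiently by memoizing the intermediate derivatives instead of re-enumerating paths, sweeping the graph once in reverse topological order.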