I suppose the pithy answer to your question is that this is a consequence of change of variables in integral calculus: $$\int_{\color{gray}{a}}^{\color{gray}{b}} f(g(t))\,g^\prime(t)\,dt = \int_{\color{gray}{g(a)}}^{\color{gray}{g(b)}} f(u)\,du,$$ for a monotonically increasing, continuously differentiable function $g$ on $[a,b]$ and a continuous $f$ on $[g(a),g(b)]$.
If you would like an appreciation for the geometry of what this represents though...
In short, the factor you refer to is needed to preserve total probability (by rescaling the original density function).
[Using $g^\prime(g^{-1}(y))$ here is perhaps a typo...I think you're trying to express the derivative of the inverse transformation?]
The density function of $Y$ is better presented as $$f_Y(y)=f_X(g^{-1}(y)) \left\lvert\frac{d}{dy}g^{-1}(y)\right\rvert,$$ where $g$ is a monotonic function (and hence so is $g^{-1}$) over the support of $X$. Students love to ignore the $\frac{d}{dy}g^{-1}(y)$ factor, but it is where all the action happens for continuous $X$, as I hope to suggest to you!
(Side note: be careful with your notation. You have written $f_Y(\color{blue}{y})= f_X(\color{red}{x}) / g'(g^{-1}(\color{blue}{y}))$, which is a function of $y$ on the left-hand side but still has an $x$ on the right-hand side, which may contribute to confusion; the $x$ has been (and should be) replaced by $g^{-1}(y)$, a function of $y$.)
As Shubham Johri pointed out, when $X$ is discrete then (effectively) $\frac{d}{dy}g^{-1}(y)$ "falls away". However, "falling away" doesn't give you much of a feel for what is really going on (geometrically).
A good starting point is to accept that probability masses (associated with discrete variables) and densities (continuous variables) are "incompressible". When you perform a statistical transformation ($Y=g(X)$) you are shifting probability mass or density around, but preserving the total probability ($\sum_{\text{all } y} f_Y(y)=1$ or $\int_\mathbb{R} f_Y(y)\,dy=1$).
For a physical analogy for a continuous random variable, think of taking an inflated balloon: you can reshape the balloon, without popping it, in many different ways (by squashing it, for example), and the total volume in the balloon remains the same (yes, real gases do compress/expand but that is not the point here; fill the balloon with water in your mind if it helps you get past the limitations of my physical analogy).
Now, for a monotonic transformation of a continuous random variable, consider how the supports map (from $X$ to $Y$). To make it concrete, let's suppose that the support of $X$ is $(0,1)$ and we are going to transform this variable to $Y=e^X-1=g(X)$ (notice that this "$g$" is monotonic). The transformed support of $Y$ is then $(e^0-1,e^1-1)=(0,e-1)$ (we can do this because $e^x-1$ is monotonic in $x$; if this were not the case, we would need to be far more careful in determining the new support). The point is that geometrically the support (not the density directly!) has been "stretched" by this transformation, from a range of $1$ for $X$ to a range of $e-1$ for $Y$. The density simply follows this stretching.
In order to preserve the total probability (which is incompressible), the density of $Y$ will need to be rescaled. How much to rescale by? That depends on how much local "stretching" has occurred. (Think of the graph of the transformation, $y=e^x-1$, over the support of $X$, i.e. $(0,1)$.) The amount of local "stretching" that must be compensated for is given precisely by the factor $\left\lvert\frac{d}{dy}g^{-1}(y)\right\rvert$! In this example, $g^{-1}(y)=\ln (y+1)$, and $\frac{d}{dy}g^{-1}(y)=\frac{1}{y+1}$. You will notice that (for this transformation) for values of $y>0$, this factor is less than $1$, i.e. the straight substitution you enquired about, $f_X(g^{-1}(y))$, is decreased by this factor. And it needs to be decreased more the larger $y$ is. Why is that (geometrically)? The transformation we are using ($e^X-1$) stretches the $x$-axis more, the larger $x$ is. The height of the density above it therefore needs to be decreased, in order to preserve the probability in a local region when it goes through the transformation. In non-rigorous terms, the density of $X$ is "smeared" out by this particular transformation.
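If you would like to see the rescaling at work numerically, here is a minimal sketch. It assumes (my assumption, since the example above only fixes the support) that $X$ is uniform on $(0,1)$; the helper names `integrate`, `f_Y_naive`, and so on are just illustrative. It compares the total area under the "straight substitution" $f_X(g^{-1}(y))$ with the area under the rescaled density, and checks one probability against simulated draws of $Y=e^X-1$.

```python
import math
import random

def g_inv(y):
    # Inverse of the transformation y = e^x - 1
    return math.log(y + 1.0)

def dg_inv(y):
    # Derivative of the inverse, d/dy ln(y + 1)
    return 1.0 / (y + 1.0)

def f_X(x):
    # ASSUMPTION: X ~ Uniform(0, 1); the text only specifies the support
    return 1.0 if 0.0 < x < 1.0 else 0.0

def f_Y_naive(y):
    # Straight substitution, without the scaling factor
    return f_X(g_inv(y))

def f_Y(y):
    # Substitution rescaled by |d/dy g^{-1}(y)|
    return f_X(g_inv(y)) * abs(dg_inv(y))

def integrate(f, lo, hi, n=100_000):
    # Simple midpoint rule; adequate for these smooth integrands
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

upper = math.e - 1.0                      # support of Y is (0, e - 1)
total_naive = integrate(f_Y_naive, 0.0, upper)
total_scaled = integrate(f_Y, 0.0, upper)

# Compare P(Y <= 0.5) from the rescaled density with a simulation
rng = random.Random(0)
draws = [math.exp(rng.random()) - 1.0 for _ in range(200_000)]
p_simulated = sum(y <= 0.5 for y in draws) / len(draws)
p_density = integrate(f_Y, 0.0, 0.5)

print(f"area without factor: {total_naive:.4f}")
print(f"area with factor:    {total_scaled:.4f}")
print(f"P(Y <= 0.5): density {p_density:.4f}, simulated {p_simulated:.4f}")
```

Under the uniform assumption, the unscaled version integrates to the stretched length $e-1$ rather than $1$, while the rescaled version integrates to $\ln(e)=1$, which is exactly the "incompressibility" described above.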
Since point masses can't be "smeared" (each mass is simply relocated from $x$ to $g(x)$), there is no need for this scaling factor when transforming a probability mass function.
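For contrast, here is a small sketch of the discrete case, using the same transformation $g(x)=e^x-1$. The three-point distribution is a made-up example of mine, chosen only for illustration. Each mass is moved to its new location unchanged, and the total is still $1$ with no scaling factor in sight.

```python
import math
from fractions import Fraction

# HYPOTHETICAL pmf for a discrete X, chosen only for illustration
pmf_X = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def g(x):
    # Same monotonic transformation as in the continuous example
    return math.exp(x) - 1.0

# Each point mass is relocated from x to g(x); its size is untouched.
# (g is one-to-one, so no two masses land on the same y.)
pmf_Y = {g(x): p for x, p in pmf_X.items()}

total = sum(pmf_Y.values())
for y, p in sorted(pmf_Y.items()):
    print(f"P(Y = {y:.4f}) = {p}")
print("total probability:", total)
```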
[If the non-linear transformation above is confusing at first, try the same argument using a linear transformation, $Y=g(X)=a+b X$. In this case the scaling factor, $\lvert1/b\rvert$, is constant over the support. And, in the really trivial case where $b=1$, no scaling is required at all...because the transformation is only a shift, $Y=a+X$, of the distribution of $X$.]
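The linear case can be sketched the same way. Again I am assuming $X$ is uniform on $(0,1)$, and the values $a=2$, $b=-3$ are arbitrary picks of mine (a negative $b$ shows why the absolute value matters); `make_f_Y` is just an illustrative helper name.

```python
def f_X(x):
    # ASSUMPTION: X ~ Uniform(0, 1)
    return 1.0 if 0.0 < x < 1.0 else 0.0

def make_f_Y(a, b):
    # Density of Y = a + b*X: substitute x = (y - a)/b, rescale by |1/b|
    def f_Y(y):
        return f_X((y - a) / b) * abs(1.0 / b)
    return f_Y

def integrate(f, lo, hi, n=100_000):
    # Simple midpoint rule
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b = 2.0, -3.0                       # arbitrary illustrative values
f_Y = make_f_Y(a, b)
lo, hi = sorted((a, a + b))            # support of Y is (-1, 2)
total = integrate(f_Y, lo, hi)

shift_only = make_f_Y(a, 1.0)          # b = 1: a pure shift, factor is 1

print("constant height of f_Y:", f_Y(0.5))
print("total probability:", round(total, 6))
print("shift-only height:", shift_only(2.5))
```

The support is three times as long as that of $X$, so the height drops to one third everywhere, and the area stays at $1$.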