To understand why, first consider how to combine different tests.
Since the loss is just a (quadratic) sum of the feature/particle losses, this is what we need to model.

To model this, let's consider losses made from overlapping Gaussians.
Now add them together, but with a multiplicative constant #c# on one of them:
##Eq(d,d_1+c*d_2)##
Depending on #c#, the AUC of the sum changes.
There is an optimum value of #c#, and a value of #c# that is way too large can actually hurt the AUC.
So assume: #Eq(c,1)# (unweighted addition) is a #c# that is way too big for top tagging.

So let's calculate the perfect #c# for a given distribution.
AUC as a function of #c#:
%show animation here
##Eq(mu_1B,0),Eq(mu_2B,0),Eq(mu_1S,1),Eq(mu_2S,c*alpha)##
##Eq(sigma_iB,sigma_iS),Eq(sigma_1,s_1),Eq(sigma_2,alpha*c*s_2)##
##Eq(mu_B,0),Eq(mu_S,1+c*alpha),Eq(sigma,sqrt(sigma_1**2+sigma_2**2))##
Fix the scale by demanding #Eq(mu_S,1)#, i.e. divide #d# by #1+alpha*c#; then maximum AUC means minimum #sigma# (or #(sigma/s_1)**2#):
##Eq((sigma/s_1)**2,(1+(s_2/s_1)**2*alpha**2*c**2)/(1+alpha*c)**2)##
##Eq(d/dc * (sigma/s_1)**2,0)##
##Eq((1/(1+alpha*c)**3)*2*alpha*(c*alpha*(s_2/s_1)**2-1),0)##
##Eq(c,1/(alpha*(s_2/s_1)**2))##

Example:
##Eq(alpha,1.0),Eq(s_2,0.75),Eq(s_1,0.5)##
Compare to numerics:
##Eq(c,0.4444),Eq(c_n,0.4436),Eq(sigma_c_n,0.0024)##

##Eq(c,1/(alpha*(s_2/s_1)**2))##
But you can approximate
\begin{equation}
\alpha \propto \mathrm{loss}
\end{equation}
\begin{equation}
s \propto \mathrm{loss}
\end{equation}
so, inserting both into the formula for #c#,
\begin{equation}
c \propto \mathrm{loss}^{-3}
\end{equation}

%some tabular comparing the benefits/problems of this bodge
%atm some test que
Benefits    Problems

So maybe use weights in training to let the network focus more on the important things.

First goal: reach the same quality for a small network (8 nodes) in split and non-split training.
Here: 8 nodes, 4 of those weighted with a factor.
AUC as a function of this factor: apparently there is still something I don't understand.
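
The optimum #c# from the Gaussian derivation can be cross-checked with a few lines of Python. This is only a minimal sketch, assuming the toy model exactly as defined there (background means 0, signal means 1 and `alpha`, widths `s1` and `alpha*s2`, equal for signal and background). The function name `auc_weighted_sum` and the simple grid scan are illustrative choices, not the procedure behind the quoted numerical value.

```python
import math


def auc_weighted_sum(c, alpha=1.0, s1=0.5, s2=0.75):
    """AUC of d = d_1 + c*d_2 in the Gaussian toy model (an assumed setup)."""
    mu_s = 1.0 + c * alpha                  # signal mean; background mean is 0
    sigma = math.hypot(s1, alpha * c * s2)  # common width of signal and background
    # Two Gaussians of equal width sigma, separated by mu_s:
    # AUC = P(signal > background) = 0.5 * (1 + erf(mu_s / (2 * sigma)))
    return 0.5 * (1.0 + math.erf(mu_s / (2.0 * sigma)))


alpha, s1, s2 = 1.0, 0.5, 0.75

# Analytic optimum: c = 1 / (alpha * (s2/s1)**2)
c_analytic = 1.0 / (alpha * (s2 / s1) ** 2)

# Brute-force scan of c in (0, 2] with step 1e-4
grid = [i / 10000 for i in range(1, 20001)]
c_best = max(grid, key=lambda c: auc_weighted_sum(c, alpha, s1, s2))

print(f"analytic c = {c_analytic:.4f}, grid-scan c = {c_best:.4f}")
for c in (0.0, c_analytic, 1.0, 10.0):
    print(f"c = {c:7.4f}  AUC = {auc_weighted_sum(c, alpha, s1, s2):.4f}")
```

With these parameters the unweighted sum (`c = 1`) still beats using `d_1` alone but stays below the optimum, while a very large `c` falls below `d_1` alone, which is the "too large a `c` hurts the AUC" effect described in the text.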