<subsection Why does the Graph fall off?>
<frame>
<split>
<que>
<list>
<e>To understand why, first consider how different tests are combined</e>
<e>Since the loss is just a (quadratic) sum of the feature/particle losses, this is what we need</e>
<e>To model this, let's consider losses built from overlapping Gaussians</e>
</list>
</que>
<que>
<i f="dist1" f2="dist2"></i>
</que>
</split>
</frame>
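As a sketch of what overlapping Gaussians mean for the quality of a test: for two Gaussian score distributions the AUC has a closed form. The parameter values below are illustrative, not the ones behind the dist1/dist2 figures.

```python
import numpy as np
from scipy.stats import norm

def gaussian_auc(mu_B, sigma_B, mu_S, sigma_S):
    """AUC of a Gaussian discriminant: P(d_S > d_B) for
    background ~ N(mu_B, sigma_B) and signal ~ N(mu_S, sigma_S)."""
    return norm.cdf((mu_S - mu_B) / np.hypot(sigma_B, sigma_S))

# the more the two Gaussians overlap, the closer the AUC is to 0.5
print(gaussian_auc(0.0, 0.5, 1.0, 0.5))   # well separated
print(gaussian_auc(0.0, 1.5, 1.0, 1.5))   # strongly overlapping
```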
<frame>
<split>
<que>
<list>
<e>Now let's add them together</e>
<e>but also add a multiplicative constant #c# to one of them</e>
<e>##<h>Eq(d,d_1+c*d_2)##</e>
<e>depending on #c#, the AUC of the sum changes</e>
</list>
</que>
<que>
<i f="adda"></i>
</que>
</split>
</frame>
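A quick numerical sketch of this addition (the distribution parameters are made up for illustration; the AUC is estimated from ranks):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
# two toy discriminants: background centred at 0, signal shifted to 1;
# the second discriminant is noisier than the first
d1_B, d1_S = rng.normal(0.0, 0.5, n), rng.normal(1.0, 0.5, n)
d2_B, d2_S = rng.normal(0.0, 0.75, n), rng.normal(1.0, 0.75, n)

def auc(scores_B, scores_S):
    # rank-based AUC: probability that a signal score exceeds a background score
    order = np.argsort(np.r_[scores_B, scores_S])
    ranks = np.empty(2 * n)
    ranks[order] = np.arange(2 * n)
    return (ranks[n:].sum() - n * (n - 1) / 2) / n**2

# the AUC of the sum d_1 + c*d_2 depends on c
for c in (0.0, 0.4, 1.0, 5.0):
    print(c, auc(d1_B + c * d2_B, d1_S + c * d2_S))
```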
<frame>
<split>
<que>
<list>
<e>There is an optimal value of #c#</e>
<e>and a value of #c# that is way too large can actually hurt your AUC</e>
<e>so assume #Eq(c,1)# (unweighted addition) is a #c# that is way too big for top tagging</e>
<e>so let's calculate the optimal #c# for a given distribution</e>
</list>
</que>
<que>
<i f="abc" wmode=True>AUC as a function of #c#</i>
</que>
</split>
</frame>
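Under the Gaussian model of the following slides (using the α, s_1, s_2 values quoted later for the numeric check), the optimal #c# can be located by a simple scan of the closed-form AUC; this is a sketch, not the actual numerical fit used in the talk:

```python
import numpy as np
from scipy.stats import norm

alpha, s1, s2 = 1.0, 0.5, 0.75   # values from the talk's numeric check

def auc(c):
    # combined discriminant d = d_1 + c*d_2 under the Gaussian model:
    # mu_S = 1 + alpha*c, sigma**2 = s1**2 + (alpha*c*s2)**2 (same for S and B)
    mu = 1 + alpha * c
    sigma2 = s1**2 + (alpha * c * s2)**2
    return norm.cdf(mu / np.sqrt(2 * sigma2))

cs = np.linspace(0.01, 2.0, 2000)
c_best = cs[np.argmax(auc(cs))]
print(c_best)                        # scan result
print(1 / (alpha * (s2 / s1)**2))    # analytic optimum derived below
```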
%show animation here
<frame>
##Eq(mu_1B,0),Eq(mu_2B,0),Eq(mu_1S,1),Eq(mu_2S,c*alpha)##
##Eq(sigma_iB,sigma_iS),Eq(sigma_1,s_1),Eq(sigma_2,alpha*c*s_2)##
##Eq(mu_B,0),Eq(mu_S,1+c*alpha),Eq(sigma,sqrt(sigma_1**2+sigma_2**2))##
fix the scale by demanding #Eq(mu_S,1)#; then maximal AUC means minimal #sigma# (or, equivalently, minimal #(sigma/s_1)**2#)
##Eq((sigma/s_1)**2,(1+(s_2/s_1)**2*alpha**2*c**2)/(1+alpha*c)**2)##
</frame>
<frame>
##Eq(d/dc * (sigma/s_1)**2,0)##
##Eq((1/(1+alpha*c)**3)*2*alpha*(c*alpha*(s_2/s_1)**2-1),0)##
##Eq(c,1/(alpha*(s_2/s_1)**2))##
##Eq(alpha,1.0),Eq(s_2,0.75),Eq(s_1,0.5)##
compare to numerics:
##Eq(c,0.4444),Eq(c_n,0.4436),Eq(sigma_c_n,0.0024)##
</frame>
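The extremum condition above can be checked symbolically; a minimal sketch with sympy (the symbols mirror the slide's notation):

```python
import sympy as sp

c, alpha, s1, s2 = sp.symbols('c alpha s_1 s_2', positive=True)
# (sigma/s_1)**2 after fixing the scale mu_S = 1
expr = (1 + (s2 / s1)**2 * alpha**2 * c**2) / (1 + alpha * c)**2
c_opt = sp.solve(sp.Eq(sp.diff(expr, c), 0), c)
print(c_opt)
# plug in the slide's numbers: alpha = 1, s_1 = 1/2, s_2 = 3/4
print(c_opt[0].subs({alpha: 1, s1: sp.Rational(1, 2), s2: sp.Rational(3, 4)}))
```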
<frame title="Why is that useful?">
##Eq(c,1/(alpha*(s_2/s_1)**2))##
but you can approximate
\begin{equation} \alpha \propto loss \end{equation}
\begin{equation} s \propto loss \end{equation}
so
\begin{equation} c \propto loss^{-3} \end{equation}
</frame>
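As a sketch of how this scaling rule could be applied per feature (the loss numbers are made up for illustration; `weights` plays the role of the per-feature #c#):

```python
import numpy as np

# hypothetical per-feature losses (illustrative values only)
losses = np.array([0.02, 0.10, 0.35])

# the slide's heuristic: c proportional to loss**-3, normalised to sum to 1
weights = losses**-3.0
weights /= weights.sum()
print(weights)   # features with small loss get the largest weight
```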
<frame>
<i f="superscale"></i>
</frame>
<frame>
%some tabular comparing the benefits/problems of this bodge
%atm some test que
<split>
<que>
Benefits
<list>
<e>easy to use</e>
<e>fast to train</e>
<e>quite good results</e>
</list>
</que>
<que>
Problems
<list>
<e>Probably not the best possible compression/rejection, since there is no interaction between particles</e>
<e Does not use the Graph to its full potential>
</list>
</que>
</split>
So maybe use weights in training to let the network focus more on the important features
</frame>
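A minimal sketch of such a weighted training loss (pure numpy; the weight values are hypothetical and would come e.g. from the #c \propto loss^{-3}# rule):

```python
import numpy as np

def weighted_feature_loss(pred, target, w):
    """Quadratic loss with per-feature weights w: a sketch of letting
    the network focus more on the important features."""
    return np.mean(w * (pred - target)**2)

pred   = np.array([[0.9, 0.4], [0.2, 0.8]])
target = np.array([[1.0, 0.0], [0.0, 1.0]])
w      = np.array([2.0, 0.5])   # hypothetical per-feature weights
print(weighted_feature_loss(pred, target, w))
```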
<frame>
<split>
<que>
<list>
<e>First goal: reach the same quality for a small network (8 nodes) in split and non-split training</e>
<e>here 8 nodes, 4 of them weighted with a factor</e>
<e>AUC as a function of this factor</e>
<e>apparently there is still something I don't understand here</e>
</list>
</que>
<que>
<i f="auwei"></i>
</que>
</split>
</frame>
<frame>
<split>
<que>
<list>
<e>First goal: reach the same quality for a small network (8 nodes) in split and non-split training</e>
<e>here 8 nodes, 4 of them weighted with a factor</e>
<e>AUC as a function of this factor</e>
<e>apparently there is still something I don't understand here</e>
</list>
</que>
<que>
<i f="auwei2"></i>
</que>
</split>
</frame>
<ignore>
<split>
<que>
<list>
<e></e>
<e></e>
<e></e>
</list>
</que>
<que>
</que>
</split>
<frame>
<split>
<que>
<list>
<e></e>
<e></e>
<e></e>
</list>
</que>
<que>
<i f="none"></i>
</que>
</split>
</frame>
<frame>
<list>
<e></e>
<e></e>
<e></e>
</list>
</frame>
</ignore>