My suggestion: compare against the normal (default) parameters. This gives you two lists of AUC scores:

Your params: [0.80,0.75,0.73,....,0.95]

Pyod params: [0.82,0.71,0.48,....,0.95]

Then look at two values:

- $\sum_i (your_i - pyod_i)$: the total improvement.
  - If it is positive, your parameters help ;)
  - But it is hard to see whether the difference is significant.
- The fraction of cases where $your_i > pyod_i$.
  - This is quantised: it only counts wins, so it does not care how much further you improve a score that already wins.
  - But it is easy to see whether it is significant: 0.5 -> probably just random, 0.9 -> probably quite significant.
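A minimal sketch of both quantities in plain Python. The function names are my own, and the score lists contain only the values visible in the example (the elided ones are left out), so they are for illustration only. The one-sided sign test at the end is my addition, not part of the suggestion itself: it is one common way to put a number on "0.5 is random, 0.9 is significant", and it drops ties before testing.

```python
from math import comb


def _check(yours, pyod):
    # Both lists must hold one AUC score per run, in the same order.
    if len(yours) != len(pyod):
        raise ValueError("score lists must have the same length")
    if not yours:
        raise ValueError("score lists must not be empty")


def total_improvement(yours, pyod):
    """Sum of (your_i - pyod_i). Positive means your parameters help overall."""
    _check(yours, pyod)
    return sum(y - p for y, p in zip(yours, pyod))


def win_fraction(yours, pyod):
    """Fraction of runs where your_i > pyod_i (ties count as non-wins)."""
    _check(yours, pyod)
    wins = sum(y > p for y, p in zip(yours, pyod))
    return wins / len(yours)


def sign_test_p_value(yours, pyod):
    """One-sided sign test (assumption: ties are dropped).

    Returns P(X >= wins) for X ~ Binomial(n, 0.5), where n is the number
    of non-tied runs. Small values suggest the wins are not just chance.
    """
    _check(yours, pyod)
    wins = sum(y > p for y, p in zip(yours, pyod))
    losses = sum(y < p for y, p in zip(yours, pyod))
    n = wins + losses
    if n == 0:
        return 1.0  # every run tied, so there is no evidence either way
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Illustrative values only: the visible entries from the example above.
    yours = [0.80, 0.75, 0.73, 0.95]
    pyod = [0.82, 0.71, 0.48, 0.95]
    print("total improvement:", total_improvement(yours, pyod))
    print("win fraction:", win_fraction(yours, pyod))
    print("sign test p-value:", sign_test_p_value(yours, pyod))
```

With only a handful of runs the sign test cannot reach a small p-value, so this is mostly useful once you have scores for many datasets or seeds.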