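The methodology of this test has an obvious statistical flaw. 41 questions trying to measure 17 psychological dimensions means each latent variable gets about 2.4 indicators on average. The usual bar for acceptable internal consistency, commonly attributed to Nunnally (1978), is Cronbach's alpha > 0.7, and scales that clear it generally carry far more than two or three items per factor. A design this sparse almost inevitably produces high measurement error and poor discriminant validity. In other words, retake it two weeks later and you may well go from IMFW to BOSS, not because your personality changed, but because the measurement itself is unreliable.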
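To put a rough number on that, here is a minimal sketch using the standardized form of Cronbach's alpha (equivalent to the Spearman-Brown formula). The mean inter-item correlation of 0.3 is purely my assumption for illustration, since SBTI publishes no item statistics, and the function names are made up for this example.

```python
import math


def standardized_alpha(k: float, mean_r: float) -> float:
    """Standardized Cronbach's alpha for k items whose mean
    inter-item correlation is mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def items_needed(target_alpha: float, mean_r: float) -> int:
    """Smallest whole number of items that reaches target_alpha,
    obtained by solving the formula above for k."""
    k = target_alpha * (1 - mean_r) / (mean_r * (1 - target_alpha))
    return math.ceil(k)


ITEMS_PER_DIMENSION = 41 / 17   # from the test: 41 questions, 17 dimensions
ASSUMED_MEAN_R = 0.3            # hypothetical value, not measured on SBTI

alpha = standardized_alpha(ITEMS_PER_DIMENSION, ASSUMED_MEAN_R)
needed = items_needed(0.7, ASSUMED_MEAN_R)

print(f"items per dimension: {ITEMS_PER_DIMENSION:.2f}")
print(f"estimated alpha:     {alpha:.2f}")
print(f"items needed for alpha >= 0.7: {needed} per dimension")
```

Under that assumed correlation, alpha comes out at roughly 0.51, well short of 0.7, and you would need about six items per dimension (around a hundred questions in total) to get there. A different correlation shifts the numbers, but the shape of the problem stays the same.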
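What's more worth discussing is the Barnum effect behind these self-deprecating labels. Forer's (1949) pioneering experiment showed that when subjects receive vague, universally applicable personality descriptions, they tend to rate them as highly accurate and tailored specifically to them. Labels like SBTI's "IMFW (废物, good-for-nothing)" and "OJBK (无所谓人, whatever-person)" are really negative-but-relatable Barnum statements: vague enough to fit almost anyone's self-perception in some situation, while the self-mockery lowers the cognitive dissonance of accepting them.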
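When I did user behavior analysis at a FAANG company, I worked with personality segmentation tools a lot. What we found was that users are more likely to engage with assessments that confirm their existing self-narratives, even when the underlying algorithm has no predictive validity. It reminds me of the time my roommate scammed me out of money: people tend to believe narratives that resonate with their current emotional state instead of questioning the data quality.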
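From an engineering perspective, "a purely static page that collects no data" does respect user privacy, but that doesn't compensate for the lack of psychometric rigor. My advice is to treat SBTI as entertainment rather than a diagnostic tool. Measuring a psychological construct with 2.4 items is about as reliable as predicting quarterly revenue from a single data point. If you really want to understand yourself, go find a validated Big Five instrument such as the NEO PI-R. That thing has 240 items: painful but scientifically sound.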