FairVis Is Helping Data Scientists Discover Societal Biases in Their Machine Learning Models
Wednesday, November 6, 2019
Researchers at Georgia Tech, Carnegie Mellon University, and University of Washington have developed a data visualization system that can help data scientists discover bias in machine learning algorithms.
FairVis, presented at IEEE VIS 2019 in Vancouver, is the first system to integrate a novel technique that allows users to audit the fairness of machine learning models by identifying and comparing different populations in their data sets.
According to School of Computational Science and Engineering (CSE) Professor and co-investigator Polo Chau, no platform has accomplished this before, making it a major contribution of FairVis to the data science and machine learning communities.
“Computers are never going to be perfect. So, the question is how to help people prioritize where to look in their data, and then, in a scalable way, enable them to compare these areas to other similar or dissimilar groups in the data. By enabling comparison of groups in a data set, FairVis allows data to become very scannable,” he said.
To accomplish this, FairVis uses two novel techniques to find subgroups that are statistically similar.
The first technique groups similar items together in the training data set, calculates various performance metrics like accuracy, and then shows users which groups of people the algorithm may be biased against. The second technique uses statistical divergence to measure the distance between subgroups to allow users to compare similar groups and find larger patterns of bias.
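To make these two ideas concrete, here is a minimal sketch in Python using scikit-learn and SciPy. It is an illustration under assumed choices, not FairVis's actual implementation: the function names (audit_subgroups, subgroup_divergence) are hypothetical, k-means stands in for whatever grouping method the system uses, and Jensen-Shannon divergence stands in for its statistical divergence measure.

```python
# Sketch of the two techniques described above, assuming tabular NumPy data.
# KMeans and Jensen-Shannon divergence are illustrative stand-ins, not
# FairVis's actual implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from scipy.spatial.distance import jensenshannon


def audit_subgroups(X, y_true, y_pred, n_clusters=10, seed=0):
    """Technique 1 (sketch): group similar rows of the training data,
    then compute a performance metric per group to flag weak subgroups."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X)
    report = []
    for c in range(n_clusters):
        mask = labels == c
        report.append({
            "cluster": c,
            "size": int(mask.sum()),
            "accuracy": accuracy_score(y_true[mask], y_pred[mask]),
        })
    # Worst-performing groups first: candidates for closer inspection.
    return sorted(report, key=lambda r: r["accuracy"]), labels


def subgroup_divergence(X, labels, a, b, bins=10):
    """Technique 2 (sketch): average Jensen-Shannon divergence between two
    groups' per-feature histograms, as a distance between subgroups."""
    Xa, Xb = X[labels == a], X[labels == b]
    divs = []
    for j in range(X.shape[1]):
        lo = min(Xa[:, j].min(), Xb[:, j].min())
        hi = max(Xa[:, j].max(), Xb[:, j].max())
        edges = np.linspace(lo, hi if hi > lo else lo + 1e-9, bins + 1)
        pa, _ = np.histogram(Xa[:, j], bins=edges)
        pb, _ = np.histogram(Xb[:, j], bins=edges)
        # jensenshannon returns the JS *distance*; square it for divergence.
        divs.append(jensenshannon(pa / pa.sum(), pb / pb.sum()) ** 2)
    return float(np.mean(divs))
```

Sorting groups by accuracy surfaces the subgroups a model may be failing, and a low divergence score between two groups suggests they are similar enough to compare side by side for larger patterns of bias.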
These outputs are then viewed and analyzed through FairVis’ visual analytics system, which is designed specifically to discover and display intersectional bias.
