Main Highlights:
- Twitter’s conversation warning is the latest in a long-running push to educate users about the need for online civility.
- One technique for reducing bias in language algorithms is to train them using synthetic data in addition to data from the open internet.
- Synthetic data may be generated from relatively small real-world datasets.
- A borderless society may level the playing field for speakers of lesser-used languages and foster more cross-cultural understanding.
Twitter’s conversation warning is the latest in a long-running campaign to encourage people to be more respectful online. More concerning is that we train large-scale AI language models on data scraped from these often-toxic online exchanges. Unsurprisingly, that bias gets replicated in machine-generated language. What if, as we build the metaverse – the next iteration of the web – we used AI to filter harmful discourse out for good?
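As a thought experiment, such filtering could start with an off-the-shelf toxicity classifier. The sketch below is a minimal Python illustration; "unitary/toxic-bert" is a publicly available classifier on the Hugging Face Hub, but its label names and the threshold here should be treated as assumptions to verify, not a reference to any production moderation system.

```python
# Minimal sketch: screening messages with a toxicity classifier before they
# reach other users. Model choice, label name, and threshold are assumptions.
from transformers import pipeline

# Any comparable toxicity model could be substituted; label names depend on
# the chosen model's config.
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def allow_message(text: str, threshold: float = 0.8) -> bool:
    """Return False when the classifier flags the message as toxic."""
    result = toxicity(text)[0]  # e.g. {"label": "toxic", "score": 0.97}
    return not (result["label"] == "toxic" and result["score"] >= threshold)
```

Even this toy version makes the central tension visible: the threshold is a policy decision, not a technical one.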
AI as a Facetune for language?
At the moment, researchers are focused on improving the accuracy of AI language models. A human in the loop, for example, can make a significant difference in multilingual translation models: human editors can ensure that cultural subtleties are conveyed accurately and teach the model not to repeat its mistakes. Think of people as a diagnostic tool for our AI systems.
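Here is a minimal sketch of what that feedback loop might look like in Python, assuming a simple correction log; the record structure and function names are illustrative, not tied to any particular translation system.

```python
# Minimal sketch of a human-in-the-loop correction log for a translation
# model. All names are illustrative; no specific MT system is assumed.
from dataclasses import dataclass

@dataclass
class Correction:
    source: str          # original sentence
    machine_output: str  # what the model produced
    human_edit: str      # the editor's corrected translation

corrections: list[Correction] = []

def record_correction(source: str, machine_output: str, human_edit: str) -> None:
    """Store an editor's fix so it can later be reused as training signal."""
    if human_edit != machine_output:  # only keep cases the human changed
        corrections.append(Correction(source, machine_output, human_edit))

def to_finetuning_pairs() -> list[tuple[str, str]]:
    """Turn corrections into (source, target) pairs for supervised fine-tuning."""
    return [(c.source, c.human_edit) for c in corrections]
```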
If you think of the metaverse as a scaled-up version of SimCity, this form of AI translation could quickly make us all effectively multilingual whenever we communicate. A borderless society might level the playing field for people (and their avatars) who speak lesser-used languages and encourage more cross-cultural understanding. It could also open new avenues for international business.
There are significant ethical concerns with using AI as a Facetune for language. Yes, we can control a language’s style, flag instances where models fail to behave as intended, and even adjust literal meaning. But how far is too far? How do we preserve a diverse culture while restricting abusive or disrespectful speech and behavior?
A method for making language algorithms fairer
One strategy for making language algorithms less biased is to train them on synthetic data in addition to data from the open internet. Synthetic data can be generated from relatively small real-world datasets.
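One simple way to bootstrap synthetic text from a small seed dataset is templated substitution, sketched below; the templates, names, and topics are invented purely for illustration.

```python
# Minimal sketch: generating synthetic training sentences from a small seed
# dataset via templated substitution. Templates and slot values are made up.
import itertools

templates = [
    "{person} asked a question about {topic}.",
    "{person} shared an article on {topic}.",
]
people = ["Amara", "Chen", "Fatima", "Jonas"]   # drawn from a balanced list
topics = ["local news", "astronomy", "cooking"]

synthetic = [
    t.format(person=p, topic=s)
    for t, p, s in itertools.product(templates, people, topics)
]
print(len(synthetic), "synthetic sentences from", len(templates), "templates")
```

Real pipelines typically use generative models rather than fixed templates, but the principle is the same: a small, curated seed can be expanded into a larger, more balanced corpus.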
It is possible to construct synthetic datasets that reflect the real-world population, not just the people who speak loudest on the internet. It is also fairly straightforward to determine where a dataset’s statistical properties are skewed, and therefore where synthetic data would be most beneficial.
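A hedged sketch of that diagnostic step: compare the observed mix in a corpus against assumed reference proportions and flag underrepresented categories. All categories and numbers below are invented for illustration.

```python
# Minimal sketch: comparing a corpus's language mix against reference
# population proportions to see where synthetic data would help most.
from collections import Counter

corpus_labels = ["en"] * 900 + ["sw"] * 60 + ["cy"] * 40   # toy corpus
reference = {"en": 0.60, "sw": 0.25, "cy": 0.15}           # assumed targets

counts = Counter(corpus_labels)
total = sum(counts.values())

for lang, target in reference.items():
    observed = counts[lang] / total
    gap = target - observed
    if gap > 0:
        # Under-represented: this is where synthetic data is most useful.
        print(f"{lang}: observed {observed:.0%}, target {target:.0%}, "
              f"need ~{int(gap * total)} more examples")
```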
All of this raises the question: will synthetic data be a significant ingredient in building fair and equitable virtual worlds? Could our choices in the metaverse even affect how we think about and communicate with one another in the physical world? If the end result of these technological choices is a more civil global discourse, synthetic data may be worth its computational weight in gold.
However, as tempting as it is to believe we can click a button and reshape behavior in an entirely new virtual world, this is not a decision technologists will make alone. It is unclear whether businesses, governments, or individuals will dominate the rules governing fairness and behavioral standards in the metaverse. With so many competing interests at stake, it would be prudent to seek guidance from respected technology experts and consumer advocates on how to proceed.
Perhaps it is wishful thinking to assume a consortium could bring all of these conflicting interests together. Still, we need something like it if we are to talk about unbiased language AI now. Every year of delay means retrofitting dozens, if not hundreds, of metaverses to conform to whatever standards eventually emerge. These questions about what it means to build a fully accessible virtual ecosystem demand consideration before the metaverse achieves broad adoption.