Multilingualism on the Internet has long been more than just an enthusiast’s dream – it is a process that has been under way for more than a decade. It has borne fruit in the form of internationalised domain names: not only the Serbian .срб domain, but also .rs domain names that can be registered using the expanded range of Latin characters used in the languages of ethnic minorities living in Serbia. But all the world’s languages and their scripts also need to be usable in top-level domains (TLDs) in a standardised and consistent way. To that end, beginning in 2014, at the initiative of ICANN (the Internet Corporation for Assigned Names and Numbers), voluntary working groups (Generation Panels) were formed and tasked with drawing up the technical and lexical documents on which the rules for using a particular script in top-level domains would be based. Experts from the Serbian National Internet Domain Name Registry (RNIDS) took part in this extremely important work, and in the long-term, committed operation of two such panels: Mirjana Tasić, advisor to RNIDS, and Dušan Stojičević, currently a member of the RNIDS Board of Governors.
ICANN’s view has been that, to ensure the global accessibility of the Internet, it is not enough merely to meet the technical requirements for connection. The majority of the world’s population does not use the English alphabet (early standards were based solely on its 26 letters), and barely a third of the population uses Latin script. That is why the community with ICANN at its centre has for years been actively working on popularising internationalised domain names (IDNs), which allow domain names to be registered using characters from local scripts (Arabic, Chinese, Cyrillic and others). In March a public debate was held which resulted in the approval of the fifth version of the Root Zone Label Generation Rules (RZ-LGR-5), which integrates 26 scripts used by hundreds of languages all over the world.
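To make the idea of an IDN more concrete: the DNS itself still carries only ASCII, so a label in a local script is stored in an ASCII-compatible form produced by the Punycode algorithm and marked with the prefix "xn--". The following is a minimal sketch using Python's built-in "punycode" codec on the .срб label mentioned above. It illustrates the encoding step only and assumes nothing about how any particular registry processes names.

```python
# The Serbian Cyrillic TLD label mentioned in the article.
label = "срб"

# Punycode is the ASCII-compatible encoding used for IDN labels;
# the "xn--" prefix marks the result as an encoded label.
a_label = "xn--" + label.encode("punycode").decode("ascii")
print(a_label)  # xn--90a3ac

# Decoding reverses the process and recovers the original Unicode label.
u_label = a_label[len("xn--"):].encode("ascii").decode("punycode")
print(u_label == label)  # True
```

The encoded form is what travels through the DNS, while browsers and other software show the Unicode form to the user.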
Mirjana and Dušan talked to us about their experiences heading international working groups, the process itself, and the results achieved.
No instruction manual
“In order to arrive at ICANN-level rules for opening up TLDs (top-level domains) to scripts beyond the English (Latin) alphabet, a number of working groups were set up, each tasked with comprehensively studying a single script and producing rules for it. Later these rules were consolidated by the Integration Panel, which I also participated in, and drawn up for all world scripts so that opening up a TLD would not destabilise the existing DNS,” explains Dušan Stojičević, who chaired the Cyrillic panel. This panel tackled the Cyrillic scripts used by languages around the world, including the Serbian, Russian, Belarusian, Ukrainian, Mongolian, Bulgarian and Macedonian Cyrillic scripts, as well as the Cyrillic script used in countries such as Kazakhstan and Uzbekistan.
Stojičević goes on to explain that the rules were primarily aimed at anticipating the problem of phishing (for example, while “РС” and “PC” superficially look the same, the first is Cyrillic and the second is Latin – read as RS and PC respectively). However, this type of ambiguity was just one of many encountered by the experts in the working groups. Mirjana Tasić took over chairing the Latin Panel in 2016. The panel wrapped up its work this year, and Mirjana says that the working group spent almost two years addressing the problem of variant characters in the Latin script which are considered to represent the same sound (such as ß in German).
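The look-alike problem described here is easy to see at the level of code points. The sketch below uses Python's standard unicodedata module to show that the Cyrillic and Latin spellings are different strings even though they render almost identically. The helper names describe and scripts are hypothetical, and guessing the script from the first word of the Unicode character name is a rough heuristic chosen for this illustration, not the method used by the panels.

```python
import unicodedata

def describe(label):
    """Return (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]

def scripts(label):
    """Rough script guess: the first word of each Unicode character name.
    This is a heuristic for the sketch, not a real script property lookup."""
    return {unicodedata.name(ch).split()[0] for ch in label}

cyrillic_label = "\u0420\u0421"  # Cyrillic ER + ES, rendered like "PC"
latin_label = "PC"               # Latin P + C

# The two labels look alike but are not the same string.
print(cyrillic_label == latin_label)  # False
for label in (cyrillic_label, latin_label):
    print(describe(label), scripts(label))

# A label mixing both scripts is the classic spoofing case.
mixed = "\u0420C"
print(len(scripts(mixed)) > 1)  # True
```

A check of this kind only flags that two scripts are involved. Deciding which look-alike pairs must block one another is exactly the sort of judgement the panels had to encode in their rules.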
“Our task was to come up with a single document that would resolve the majority of the problems relating to the use of the Latin script on the Internet and to determine the acceptable range of Latin characters that could be used in creating top-level domains,” explains Mirjana Tasić. She says that the greatest challenge faced by the working groups was to address a complex problem with no prior example as to how this should be done.
Since 2014, this task has been tackled by 17 working groups covering 26 scripts and bringing together 270 volunteers such as Mirjana and Dušan, who put in more than ten thousand hours to produce the proposals for the standardisation of new top-level domains.
Challenge by challenge
Mirjana Tasić recounts how the Latin Panel established that more than 1,200 languages of the world used the Latin script, but only 210 of them were still living languages. Another issue was that numerous Latin characters were no longer in use, even though they are included in the Unicode standard, and so one of the tasks of the working group was to identify and eliminate characters no longer used in living languages. Starting from the 26 characters of the English alphabet, the working group recognised 210 “living” Latin characters that would be acceptable for use when creating TLDs.
If the Latin panel was sometimes overwhelmed with information, the Cyrillic panel occasionally had to deal with political debates. “No matter how expert the panel members, experts too sometimes stray into historical debate. It was my job to address any tendency towards nationalist zeal on the part of the panel members. I did so in the simplest possible way – the first draft document had no text, just tables and bare facts. We avoided any talk of the history of the Cyrillic script, or of the history of the territories in which it is used, and this got the panel back onto its primary task – to draft the rules,” Dušan Stojičević recalls.
He further explains that there were many differing opinions on the approach to be taken towards this unique set of problems: any prohibitions – for that is what the rules essentially were – could be either minimal or far-reaching. “We decided to go for a far-reaching set of rules, and to leave little room for interpretation after the fact. The obvious similarities between Cyrillic and Latin – which is still the primary script used for domains at the end of the day – were the main reason for taking a restrictive approach,” Stojičević says.
Thousands of hours and thousands of pages
“The final report of the Latin Panel, approved after public debate, was summarised in 87 pages. However, there were eight versions of the final text, and we had more than 1,000 pages of working documents, of which around 200 pages were included in the final version, in the appendices to the main report. The results of the work were summarised in an XML document with around 11,300 rows. This copious material was presented, among other reasons, in order to give an insight into just how much work had been done,” says Mirjana Tasić. As the chair of the group, Mirjana Tasić was tasked with preparing the agenda for the meetings, compiling minutes, summarising conclusions and organising the meetings themselves. More than 130 online meetings were held with the team working on the panel. “At one point ICANN allocated more people to help with the effort because the volume of work – which we were doing on a voluntary basis – exceeded our abilities,” says Mirjana.
The Cyrillic Panel produced a similar report – a technical and lexical document of some 100 pages, which was presented in multiple forms (including XML, with strict rules), with none of the political or historical discussion which had encumbered the early work of this panel.
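For readers curious what rules "in XML" can look like, here is a small sketch. The fragment is hypothetical and heavily simplified: it is loosely modelled on the Label Generation Ruleset XML format (RFC 7940), with the namespace and metadata omitted, and the single variant shown is invented for illustration rather than taken from the published Cyrillic rules. The function names load_repertoire and is_in_repertoire are likewise assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: not taken from the published rules.
SAMPLE = """
<lgr>
  <data>
    <char cp="0430">
      <var cp="0061" type="blocked"/>
    </char>
    <char cp="0431"/>
  </data>
</lgr>
"""

def load_repertoire(xml_text):
    """Map each permitted character to its list of (variant, type) pairs.
    Only single code points are handled in this sketch."""
    root = ET.fromstring(xml_text.strip())
    repertoire = {}
    for char in root.iter("char"):
        ch = chr(int(char.get("cp"), 16))
        repertoire[ch] = [
            (chr(int(var.get("cp"), 16)), var.get("type"))
            for var in char.findall("var")
        ]
    return repertoire

def is_in_repertoire(label, repertoire):
    """True if every character of the label is a permitted code point."""
    return all(ch in repertoire for ch in label)

repertoire = load_repertoire(SAMPLE)
print(repertoire)
print(is_in_repertoire("баба", repertoire))  # True: Cyrillic а and б only
print(is_in_repertoire("baba", repertoire))  # False: Latin letters
```

The real rulesets go well beyond a list of permitted characters, adding variant relationships across scripts and whole-label rules, which is part of why the files run to thousands of rows.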
“Behind this finished work is a major and committed effort, which finally ended in Istanbul, in the ICANN Hub, where most of the members of the Cyrillic panel gathered to perform the final polishing of the documents and rules. Over two days of intensive work, we went through all the working documents and all of the Cyrillic letters, comparing them with other scripts (Armenian, Latin, Greek etc.), and the final rules were published. Working on the Integration Panel was interesting too – this is where the entire ruleset for all the world’s languages was consolidated. There were no political stumbling-blocks in this group, just wide-ranging discussion, and many specific issues were raised regarding the work of the panels, since the first task was to verify the work of all the panels. Since we were from all over the world (and therefore in different time zones), virtual meetings were often held in the middle of the night, which demanded additional effort, since the topic was very complex,” Dušan Stojičević says.
A job that is never done
“The only conservative thing about the Internet is the respect for standards. Standards can be supplemented over time, and in line with new demands, but nothing is ever removed, only added,” says Mirjana Tasić. She adds that the Internet is a living thing, as are languages and alphabets.
Dušan Stojičević notes that he has not stepped down as Chair of the Cyrillic Panel because languages and scripts change all the time. “We have published the new rules, they have been implemented, and new TLDs are already required to adhere to them (for example a new round of distributions of gTLDs is expected soon), but these rules are changing, too. I will give you two examples – while we were working on the rules, Montenegro took the decision to introduce two new Cyrillic characters, but this decision has now been reversed. Another example is Kazakhstan, which has officially moved over to Latin script, but Russian Cyrillic is still in use there for their language (Kazakh) and they have a Cyrillic TLD. So the work goes on – I am still in touch with the ICANN coordinators, although we are just monitoring events around the world which would require a future rule-change, like the aforementioned.”
However, issuing new rules, constantly improving them and launching new gTLDs is not enough for the Internet to truly and fundamentally become multilingual. Stojičević notes that it took ten years after the first IDNs before a real balance in the rules was achieved amongst all the scripts at the level of top-level domains. Also, the IDN project can be considered a success in those environments where a local script is used in all areas of life, such as China, Russia, Japan and India. But from the point of view of average users, IDN is insufficiently accessible – keyboards, operating systems and content in these languages and scripts are the only cure; domains are just a screw connecting the Internet and multilingualism. “For IDN to be a complete success, users need to be provided with a simple and self-contained computing environment in their mother tongue and script,” concludes Dušan Stojičević.