Language names not in Unicode CLDR
See CLDR or http://cldr.unicode.org/index/process. We cannot help you more with CLDR at the moment, other than possibly creating an account for you to input data there (and I think that requires an existing locale).
It does require an existing locale. See details at CLDR#Creating a new locale.
Catalan already has a locale at CLDR.
I think that in order to add Extremaduran to the list of language names which can be localised at CLDR you need to file a ticket to make a change request at CLDR Change Requests. I don't think you need an account to do this. The process for requesting data to be added (to common data I think) is described at "CLDR process". You could try explaining why you want it added to the list of language names.
OK, thank you all for your interesting comments. In the end I will not ask Unicode for new locales (plural: Extremaduran was only one of several languages that have no translations). I had wrongly assumed that Unicode's CLDR supported translating the names of "all" the languages on Earth. But "Unicode CLDR 22.0 contains data for 215 languages..." (http://cldr.unicode.org/index/downloads/cldr-22).
Just out of curiosity, I wonder where the two words "Extremaduran" and "estremeñu" come from. As far as I can tell, the CLDR extension draws on Unicode's CLDR, but they are not there. Perhaps they come from Names.php. Anyway, thanks again.
Yes, from Names.php
I think that the languages supported here but not in the group of language names which can be localised at CLDR, the "Locale Display Names - Languages", are:
aeb, akz, aln, als, aro, arq, ary, arz, ase, avk, azb, bar, bbc, bcc, bcl, bew, bfq, bjn, bpy, bqi, brh, bxr, cbk-zam (not a 3 letter code), cps, diq, dtp, egl, eml, esu, ext, fit, frc, frp, gag, gan, gbz, glk, gom, guc, gur, hak, hif, hsn, izh, jam, jut, kgp, khw, kiu, koi, kri, krj, lez, lfn, lij, liv, lmo, ltg, lzh, lzz, mhr, mrj, mwv, mzn, nan, njo, nov, pcd, pdc, pdt, pfl, pms, pnb, pnt, prg, qug, rgn, rif, rmy, rtm, rue, rug, saz, sdc, sei, sgs, sli, sly, stq, szl, tcy, tkr, tly, tru, tsd, ttt, vec, vep, vls, vmf, vro, wuu, xmf, yrl, zea.
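For anyone wanting to reproduce a list like the one above, the comparison is essentially a set difference. The following is only a rough sketch: the two sets are tiny hypothetical samples, not the real contents of Names.php or of CLDR's "Locale Display Names - Languages", and the function names are made up for illustration.

```python
# Hypothetical sample data for illustration only; the real lists would come
# from Names.php and from CLDR's "Locale Display Names - Languages".
supported_codes = {"ca", "ext", "vec", "ja", "cbk-zam"}
cldr_language_names = {"ca", "ja", "fil"}


def missing_from_cldr(supported, cldr):
    """Return, sorted, the codes supported locally but absent from CLDR."""
    return sorted(set(supported) - set(cldr))


def non_standard(codes):
    """Return codes that are not plain 2- or 3-letter alphabetic codes."""
    return [c for c in codes if not (c.isalpha() and len(c) in (2, 3))]


missing = missing_from_cldr(supported_codes, cldr_language_names)
print(missing)                # codes to request at CLDR
print(non_standard(missing))  # codes that may need separate handling
```

The second helper singles out codes such as cbk-zam, which do not have the plain 2- or 3-letter shape and would probably need to be discussed separately in any request.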
I am willing to request the addition of these languages for localisation at CLDR. When submitting these to CLDR we could also:
- provide the English names used by MediaWiki for these languages where known
- ask the registered translators for their preferred English name, where missing
- check with registered translators whether the existing English name is still the preferred name.
If we think that any or all of the above are worth doing, then I am willing to do the work involved.
Lloffiwr, I think it's worth sending them this list. You can file a request in their trac and they'll tell you if that's enough (unlikely) or they want more (possible) or they're not interested.
I have received an e-mail from John Emmons of CLDR concerning ticket 6763 at CLDR, as follows:
"I am starting to prepare to do work on this ticket that you opened - requesting new language names be added to CLDR. This ticket was presented to the CLDR TC a few weeks back and the concept was generally approved by the committee, pending some confirmation that you or someone else at translatewiki will be able to provide us a reasonable amount of translated material for these new language names. As has already been pointed out by my colleagues, many of these will not fall into the "modern coverage" bucket that the "big players" such as Google and Apple will translate to. Without a plan to offer translated material ( either via bulk upload or via survey tool entry ), adding these additional languages would be a virtually pointless exercise on our part.
So, if you can offer a plan that will convince me that this is worth doing, I'm agreeable. But we need to act pretty quickly, as I would want to have this all in place to open CLDR 26 data entry on May 1."
He wrote again wanting a response by 18th April. Unfortunately, I have not had time to post this here until today. Are there any translators interested in putting language name translations onto CLDR? If so, please reply to this thread and mention the code of the language into which you normally translate.
If you would be willing to provide translations but not enter them on CLDR, please mention this and I will let CLDR know.
It looks as though these languages are going to be added, except for some codes which are not standard and the codes als, bcc, bcl, bxr, diq, mhr, pnb which are all macrolanguages. Am I right in thinking that adding these might cause a problem further down the line if the locales are migrated to actual language codes instead of macrolanguage codes?
When the new codes are live on CLDR I will put something on translatewiki.net news about this. Could we also provide some publicity on the central banner, to see if we can encourage translators to contribute to CLDR?
I'd say not to bother about those non-standard codes. Sure, we can use sitenotice once those language names are added to the English source for CLDR, but first we could send direct messages to Language support team members for languages with existing CLDR locales. In the meantime I'll email Amir, Santhosh and the CLDR survey tool admin to figure out account creation for our translators.
Language names were added to English source yesterday! http://unicode.org/cldr/trac/changeset/10166 In May we'll translate them. :)
I see some of the needed translations into Japanese (and a couple of other languages) were added at https://git.wikimedia.org/tree/mediawiki%2Fextensions%2Fcldr/HEAD/LocalNames . Can someone merge them?
whym, yes, you can. :) Send me an email and I'll add you to the CLDR survey tool as soon as possible. Let me know if you want to add translations in all those languages or only Japanese.
I have added a news item; the banner can wait till 8 May, if I remember.
The survey tool on CLDR will be open for contributions from 8 May to 19 June, for those keen to contribute as soon as possible. If you already have an account at CLDR you can log in here.
I have reviewed the list of aliases at CLDR. Apart from the macrolanguages als, bcc, bcl, bxr, diq, mhr, pnb and rmy, there are 3 codes on this list that are used at translatewiki.net:
- mo - Moldovan, deprecated in CLDR - CLDR uses ro_MD
- sh - Serbo-Croatian in translatewiki.net, Serbian (Latin) in CLDR. CLDR uses sr_Latn for Serbian (Latin)
- tl - Tagalog in twn, Filipino in CLDR. CLDR uses fil for Filipino.
These 3 codes are already in CLDR so I assume there must be a way of mapping the CLDR code to the twn code.
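To make the idea of such a mapping concrete, here is a minimal sketch covering only the three codes listed above. It is not how the MediaWiki cldr extension or translatewiki.net actually handles this; the table and function names are hypothetical.

```python
# Hypothetical lookup table built from the three codes discussed above:
# translatewiki.net (twn) code -> code under which CLDR keeps the data.
TWN_TO_CLDR = {
    "mo": "ro_MD",    # Moldovan, deprecated in CLDR
    "sh": "sr_Latn",  # Serbo-Croatian in twn, Serbian (Latin) in CLDR
    "tl": "fil",      # Tagalog in twn, Filipino in CLDR
}


def to_cldr_code(twn_code):
    """Return the CLDR code for a twn code; unmapped codes pass through."""
    return TWN_TO_CLDR.get(twn_code, twn_code)


def to_twn_code(cldr_code):
    """Return the twn code for a CLDR code; unmapped codes pass through."""
    reverse = {cldr: twn for twn, cldr in TWN_TO_CLDR.items()}
    return reverse.get(cldr_code, cldr_code)


print(to_cldr_code("tl"))      # fil
print(to_twn_code("sr_Latn"))  # sh
print(to_cldr_code("ca"))      # ca (no alias needed)
```

Codes without an entry are returned unchanged, so a language such as Catalan, which has the same code on both sides, needs no special handling.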
Thanks, that's useful. The other day I was stupidly wondering how CLDR could not have Tagalog as a locale... I'm not sure about aliasing, but surely one bug should be filed for each of those languages to be renamed to its proper language code. Can you do that? At least tl sounds uncontroversial.
I think that Siebrand is already aware of these, and will know better than I whether they should be changed.
What we commonly call "Tagalog" in Wikimedia is the "Filipino" (or Pilipino) language in standards. But the language code "tl" is ambiguous: it can be considered a macrolanguage encompassing traditional Tagalog and modern Filipino. Filipino has its CLDR data under its standard code as an individual language. Note that traditional Tagalog was not written in the Latin script, and was far less creolized: it lacked the many borrowings and the major phonological simplifications of the modern language. "tl" is not recommended but, as a macrolanguage, can be treated like "zh" for Chinese (even though "zh" usually just means modern Mandarin, mostly in the simplified version of the Han script). "tl-Tglg", by contrast, designates only the traditional language (modern Filipino is almost never written in the traditional script, which is probably why "tl" is not standardized as including Filipino). Wikimedia makes an exception to that view on its localized sites (but not in Wiktionary, which prefers more precise language codes).