Wiktionary:Beer parlour/2024/January

Proto-Berber

This topic actually includes two proposals. The first is to remove the hyphens from entries such as Reconstruction:Proto-Berber/am-an. After all, we don't add hyphens to Indo-European words like *wĺ̥kʷ-os. The second is to treat Numidian as a dialect of Proto-Berber, as it seems to have been. Therefore, we could treat the Numidian GLD as an attested form of Proto-Berber *aǵăllid. Ελίας (talk) 13:52, 1 January 2024 (UTC)[reply]

User:USERNAME for confirmed group

Tim Utikal — This unsigned comment was added by Tim Utikal (talk • contribs).

@Tim Utikal: Did you make a mistake here? Are you trying to be whitelisted for certain user rights? —Justin (koavf)❤T☮C☺M☯ 21:38, 1 January 2024 (UTC)[reply]

yes I'm sorry Tim Utikal (talk) 21:39, 1 January 2024 (UTC)[reply]

Per Wiktionary:Confirmed users, this can be granted in exceptional cases, but it will also just naturally occur to your account after a few days and several edits. Is there a particular need to have it quicker? —Justin (koavf)❤T☮C☺M☯ 22:02, 1 January 2024 (UTC)[reply]

@Tim Utikal You are already in "autoconfirmed users" so there should be no need for this. Benwing2 (talk) 23:17, 1 January 2024 (UTC)[reply]

Not really; I just find it strange that I didn't already get it, since I have met the requirements for quite a while now. I just realized, anyway; sorry for bothering. Tim Utikal (talk) 00:00, 2 January 2024 (UTC)[reply]

Deprecating `Latnx`

For background, Latnx is a script code used for the “extended” Latin script, meaning it covers the whole range of Latin-script characters, while Latn only covers the common ones. Originally, the two were given different CSS styles, because most unusual Latin characters (such as those in IPA) were only supported by specialist fonts, and we didn’t want to use those fonts for languages that didn’t need them. That hasn’t been an issue for a long time.

So far as I can tell, the Latnx script code does absolutely nothing which is distinct from the normal Latn code. There are currently no special styles assigned to it in MediaWiki:Gadget-LanguagesAndScripts.css, and I can’t find any other special uses that make it a necessity. It’s dead weight, and has been since at least some time before 2019: @Erutuon removed all special styles assigned to it in this diff (when it was handled by MediaWiki:Common.css), but according to their edit summary this was “because the rule before my recent edits had no effect”, implying that it had been defunct for some time before then.

A few other things to consider:

It adds clutter, which is annoying.
It duplicates stuff unnecessarily in Module:scripts/data.
Most languages which use characters only covered by Latnx are currently only set to use Latn, and fixing this would be a massive headache. This is normally okay, but becomes a problem with single-character entries for those characters, since the script module checks for Latn, finds no characters match, so assigns them the script code None, which causes browsers to not render them properly.
There is no performance advantage to this, so far as I can tell.

Given all of this, I think we should just get rid of it. Theknightwho (talk) 00:10, 2 January 2024 (UTC)[reply]
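The single-character problem described in the list above can be sketched in a few lines. This is a simplified model only, not the actual logic of the Lua script module: the character sets, the `SCRIPTS` table and the `find_best_script` helper are all hypothetical stand-ins for what lives in Module:scripts/data.

```python
# Hypothetical character sets; the real data lives in Module:scripts/data.
LATN_CHARS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
LATNX_EXTRA = set("ɐɑɒʘǀǁ")  # stand-ins for "unusual" Latin characters

SCRIPTS = {
    "Latn": LATN_CHARS,
    "Latnx": LATN_CHARS | LATNX_EXTRA,
}


def find_best_script(text, candidate_codes, scripts=SCRIPTS):
    """Return the candidate script that covers the most characters of `text`.

    Falls back to the code "None" when no candidate matches any character,
    which is the situation described for single-character entries.
    """
    best_code, best_count = "None", 0
    for code in candidate_codes:
        count = sum(1 for ch in text if ch in scripts[code])
        if count > best_count:
            best_code, best_count = code, count
    return best_code


# A language that only lists Latn gets "None" for an unusual character...
print(find_best_script("ʘ", ["Latn"]))
# ...whereas folding the Latnx characters into Latn avoids the fallback.
merged = {"Latn": SCRIPTS["Latn"] | SCRIPTS["Latnx"]}
print(find_best_script("ʘ", ["Latn"], merged))
```

Under these assumptions, merging the two character sets fixes the fallback without touching any language's script list, which is the effect the proposal is after.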

Given that it does nothing, support. CitationsFreak (talk) 00:15, 2 January 2024 (UTC)[reply]

@Theknightwho: I'd like to have User:Erutuon and User:This, that and the other weigh in to verify that this removal is OK; if so, I would support its removal. Benwing2 (talk) 02:29, 2 January 2024 (UTC)[reply]

@Benwing2:, shall I ping them for the other two script-deprecating votes? CitationsFreak (talk) 03:32, 2 January 2024 (UTC)[reply]

@CitationsFreak No need; by mentioning them I've already pinged them. Benwing2 (talk) 03:51, 2 January 2024 (UTC)[reply]

Ah. CitationsFreak (talk) 06:26, 2 January 2024 (UTC)[reply]

@Benwing2 @Theknightwho I can confirm that Latnx does nothing special styling-wise, but 35 Lua modules mention this script code, and it's possible that one of them is doing some special handling when this script code is found. I'll leave the Lua stuff in your hands. This, that and the other (talk) 05:48, 3 January 2024 (UTC)[reply]

I still need to do a thorough check, but I should note:

A significant number of these are users’ private modules, which are the owner’s responsibility to update to account for any changes. I think they’re mostly sandboxes, anyway.
Several more are language data modules, which we would expect: any that contain a language using Latnx will mention it.
Yet more only mention it alongside Latn to make sure both are treated in the same way.
There are no special modules associated with Latnx.

I haven’t seen anything (so far) which gives me cause for concern.

Theknightwho (talk) 17:35, 3 January 2024 (UTC)[reply]

Given that we now know deleting the code will have no practical effect, I'm going to go ahead and start actioning this. Theknightwho (talk) 16:56, 6 January 2024 (UTC)[reply]

Deprecating `xzh-Tibt`

The script code xzh-Tibt is only used for the Zhang-Zhung language. It does precisely one thing differently from Tibt, which is that it adds “BabelStone Tibetan sMar-chen” to its list of fonts, which displays Tibetan writing as though it were the Marchen script (which is one of its daughter scripts).

However, as explained on BabelStone’s website, this font was created in 2007 as a stopgap until the Marchen script had been encoded in Unicode, which it was back in 2016. The correct script code for this is Marc, and it’s already set up on our system. That leaves us with xzh-Tibt as a confusing duplicate, for the sake of an unnecessary font which almost no-one has anyway.

We only have 3 lemmas for Zhang-Zhung, ~~two of which need to be moved to the correct encoding, so this shouldn’t be too arduous~~ Looking at the source, they seem to have been recorded in the Tibetan script, so no moves are necessary. Theknightwho (talk) 01:10, 2 January 2024 (UTC)[reply]

Delete. Perhaps a pedantic point, but I note that the .xzh-Tibt font-family rule doesn't add that font to the list, it entirely substitutes the .Tibt, .xzh-Tibt font list with its own one-font list. This, that and the other (talk) 05:52, 3 January 2024 (UTC)[reply]

@This, that and the other Good point. It’s also problematic because Zhang-Zhung is primarily attested in the conventional Tibetan script in documents post-dating their fall to the Tibetan Empire, so we should simply change one of its scripts from this to Tibt and be done with it.

There are, in fact, other unencoded scripts that were also used for Zhang-Zhung (such as Marchung, among several others), but again, this code doesn’t facilitate those. Theknightwho (talk) 17:21, 3 January 2024 (UTC)[reply]

@This, that and the other - Given there have been no objections, I'm going to change Zhang-Zhung to the regular Tibt code (while keeping Marc). Would it please be possible for you to remove the relevant bits from the CSS? Theknightwho (talk) 01:11, 26 January 2024 (UTC)[reply]

@Theknightwho Done. This, that and the other (talk) 01:45, 26 January 2024 (UTC)[reply]

Deprecating `pjt-Latn`

This is only used by the Pitjantjatjara language of Australia. However, I see no reason why it has a separate script code: the only difference is that it has three fonts listed in MediaWiki:Gadget-LanguagesAndScripts.css, which are Microsoft Sans Serif, Tahoma, and Code2000, with a fallback of using a sans-serif font (which we use for everything anyway).

The justification given in the CSS file is Pitjantjatjara (ḻ ṉ ṟ ṯ and capitals), which I assume is because font support used to be poor for these; no doubt that’s why Code2000 is listed, which supports many unusual characters. This has long been unnecessary, and they’re even included in Latn, not just Latnx. It’s clearly just a holdover from the long-deleted Template:pjt-Latn, created before we even had modules. Theknightwho (talk) 01:33, 2 January 2024 (UTC)[reply]

The "Georgia" font used for page titles by the default (Vector) skin doesn't have these characters. The heading on ngiṉṯaka, displayed here without the pjt-Latn override font:

ngiṉṯaka

This looks pretty ugly to me; the retroflex consonants look markedly smaller than the surrounding letters. I'm inclined to hold onto the special rules for now, although we could definitely choose a nicer font. This, that and the other (talk) 05:58, 3 January 2024 (UTC) 22:07, 3 January 2024 (UTC)[reply]

@This, that and the other I’m happy to have special rules, but I don’t think it necessitates having a separate script code. We can just set the rules to apply to Latn when used with language pjt. This issue will come up with a lot of other languages, so the current solution is unwieldy and scales poorly. Theknightwho (talk) 06:02, 3 January 2024 (UTC)[reply]

@Theknightwho So long as this works, and gets applied to the page title specifically, I'll be satisfied. This, that and the other (talk) 06:07, 3 January 2024 (UTC)[reply]

@This, that and the other: The method we use now in Module:headword (display title with script class) won't fix the page title (header level 1) in ngiṉṯaka because the display title doesn't contain the language code. Always adding script class and language code to the display title with pjt wouldn't make sense because some Pitjantjatjara entries have other entries on the same page. Maybe it would be okay to include the language code whenever there are letters with line below like ṉ, though that requires some new field for unusual characters in Module:headword or language data, and might end up with display title conflicts (last one wins, but there might be an error message) if there are multiple languages' entries on a page with the same unusual characters. — Eru·tuon 20:59, 9 January 2024 (UTC)[reply]

Petition to upgrade Medieval Greek

from Sarri.greek. Notifying, especially for Ancient Greek @Mahagaja, Erutuon, JohnC5, @Atelaes, ObsequiousNewt; also @Benwing2, Chuck Entz. Happy 2024 to everyone. Could en.wikt, please reconsider for Medieval Greek[…]

1) using the linguistic term Medieval Greek instead of the historical term 'Byzantine' for its name (Module:languages). It is also visible at {{grc-IPA}}.
2) upgrading Medieval Greek from etymology language (currently under grc) to autonomous language section? This is needed to correct the title Ancient Greek over words of 6th century onwards at Cat:Medieval Greek (Category:Byzantine Greek). I know that this is a nuisance for modules, but please consider updating; it is an omission of too many centuries for Greek.

All reference sources are listed at previous petition 2023. The 2019 Cambridge Grammar of Medieval and Early Modern Greek DOI intro presents information in English. At el.wiktionary Cat:Med.Greek we use code gkm. Treated in polytonic script. No other templates needed. Period, from Justinian's Novellae (those written in Greek) to medieval texts extending to Late Medieval, equivalent to Early Modern. Thank you in advance for taking time to look into this. From el.wiktionary, ‑‑Sarri.greek ^♫ I 09:20, 2 January 2024 (UTC)[reply]

I have no objection, but WT:RFM is the usual venue for requesting splits (in this case, splitting gkm Medieval Greek out from grc Ancient Greek). —Mahāgaja · talk 09:31, 2 January 2024 (UTC)[reply]

Thank you very much @Mahagaja, I will make application there (my browser has problem there, too long page). ‑‑Sarri.greek ^♫ I 09:40, 2 January 2024 (UTC)[reply]

Discussion at Wiktionary:Language_treatment_requests#Medieval_Greek_from_Ancient_Greek ‑‑Sarri.greek ^♫ I 09:27, 26 September 2024 (UTC)[reply]

Removing Old Galician-Portuguese references/further readings in Galician entries

Pinging @Stríðsdrengur, @MedK1, @Sarilho1 and @Froaringus.

Recently I've been looking for Galician entries with quotations prior to 1500, thus considered Old Galician-Portuguese, and in the meantime also came across Galician entries (most of them) with OGP references/further readings, such as Corpus Xelmírez and Dicionario de Dicionarios do galego medieval. I think they should be removed, since they're used for OGP. Do you guys agree? Amanyn (talk) 17:03, 2 January 2024 (UTC)[reply]

Pinging @Froaringus because the last ping didn't work (I think). Amanyn (talk) 17:17, 2 January 2024 (UTC)[reply]

@Amanyn, I agree with this wholeheartedly, but Froaringus does not — he added the references there and wants them to stay where they are, so right now we're in a bit of an impasse (you can look at our convo here). I believe the current de facto treatment is to keep them in both pages. MedK1 (talk) 01:56, 3 January 2024 (UTC)[reply]

@MedK1: I may be 100% mistaken, but as I recall, pinging only works if you use it and sign your post, so going back to a previous post and adding in {{Ping|Foo}} doesn't work. (Again, I am an ignorant person, so I am wrong often.) —Justin (koavf)❤T☮C☺M☯ 03:52, 3 January 2024 (UTC)[reply]

Yeah, Chuck Entz even corrected me in Wiktionary:Information desk/2023/December for doing that. Amanyn (talk) 16:27, 3 January 2024 (UTC)[reply]

Oh, that's good to know! Thank you! MedK1 (talk) 01:15, 4 January 2024 (UTC)[reply]

@MedK1 Alright then. I think we should discuss that with Froaringus again later, but now something a bit off-topic (since I've already pinged some OGP contributors, they might also see this) — don't you think quotations should be added to OGP alternative forms/spellings? While they make sense (the ones I came across so far), I think it just makes sense to prove they existed with a quotation. For example, I believe that coelho was an OGP word, but shouldn't there be a quotation to prove it? And that applies sometimes to main forms too: the quotation used for cõelho, for example, has cõello instead, and the same for meninho, whose quotation has menỹo. Of course it makes sense that if cõello existed cõelho also did (and the same for meninho, menỹo being an abbreviation, and the "y" used instead because the author was Spanish(?)), but I think a quotation with the word written like it is in the title would make more sense and fit it better. Amanyn (talk) 16:23, 3 January 2024 (UTC)[reply]

@Amanyn I agree with that too! The "infrastructure" for OGP right now is super rudimentary; we only have 900 lemmas so far, too — something like what you're proposing would definitely be an improvement here, especially since we're treating each version as its own lemma (see WT:AROA-OPT). MedK1 (talk) 01:15, 4 January 2024 (UTC)[reply]

@MedK1 Alright then! I won't start with it right now since I don't have any experience with quotations, but I'll look into it later. Amanyn (talk) 20:01, 4 January 2024 (UTC)[reply]

Sorry to be late to the party... I had Covid recently and decided to take some time off. Yes, Galician or Portuguese quotations earlier than 1500 (give or take) should be moved to OGP. But the references and the etymological and historical info should not be removed from the Galician entries. I mean, I had cherry-picked and translated these quotations in the past (from Galician medieval sources, when we still didn't have a timeframe) and added the historical and etymological info and references to their respective entries. In my opinion it is the quotation which should now be moved to OGP, but no info should be removed from the etymological or reference sections in Galician. Froaringus (talk) 15:16, 15 January 2024 (UTC)[reply]

Alright, thank you for your opinion, said like that it makes sense and I understand your point. Hope you're better too. Amanyn (talk) 20:09, 18 January 2024 (UTC)[reply]

List of verbs by conversion of final voiceless /s/ into voiced /z/

An example is excuse with the homographs: verb /ɪkˈskjuːz/ vs noun /ɪkˈskjuːs/. Similarly, close (verb /kloʊz/ vs adj./adv. /kloʊs/), use (verb /juz/ vs noun /jus/), and advice vs advise.

I'd also create a list for nouns derived by a change of stress to initial position (e.g., record, present, protest, rebel, refuse, etc.)? JMGN (talk) 20:19, 2 January 2024 (UTC)[reply]
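As a rough illustration of how such a list could be checked mechanically, the sketch below tests whether two transcriptions differ only in the voicing of a final /s/. The transcriptions are the ones quoted above; the `PAIRS` table and the `is_final_voicing_pair` helper are hypothetical names introduced for the example, and the comparison assumes each IPA symbol is a single character.

```python
# Voiceless-to-voiced mapping for the final consonant. Only /s/ -> /z/ is
# covered here; other fricative pairs could be added to the same table.
VOICED = {"s": "z"}


def is_final_voicing_pair(voiceless_ipa, voiced_ipa):
    """True if the two transcriptions differ only in final-consonant voicing."""
    a = voiceless_ipa.strip("/")
    b = voiced_ipa.strip("/")
    if not a or len(a) != len(b):
        return False
    return a[:-1] == b[:-1] and VOICED.get(a[-1]) == b[-1]


# (noun or adjective pronunciation, verb pronunciation), as quoted above.
PAIRS = {
    "excuse": ("/ɪkˈskjuːs/", "/ɪkˈskjuːz/"),
    "close": ("/kloʊs/", "/kloʊz/"),
    "use": ("/jus/", "/juz/"),
}

for word, (voiceless, voiced) in PAIRS.items():
    print(word, is_final_voicing_pair(voiceless, voiced))
```

A check like this only confirms candidate pairs that someone has already collected; finding the candidates in the first place would still need pronunciation data for each entry.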

@JMGN are you asking for help to make such a list, or are you looking to make it yourself? These words should all be in Category:English heteronyms. If you are looking to contribute, this content may be in scope for our Appendix. You can do a search for the page title, including the Appendix: prefix, and click the red link to start contributing. This, that and the other (talk) 05:32, 3 January 2024 (UTC)[reply]

@JMGN — FWIW, there are several pairs of such words in English, where the voiceless final consonant is a noun or adjective, and the voiced final consonant is a verb. Most such pairs that I'm aware of involve spelling differences.

Consider:

loath (adjective), loathe (verb)
breath (noun), breathe (verb)
reef (noun, as of a sail), reeve (verb)
life (noun), live (verb)
strife (noun), strive (verb)

Interestingly, these all seem to have fricatives as the finals. I can't think of any such pairs that end in stops. ‑‑ Eiríkr Útlendi │^{Tala við mig} 19:32, 17 January 2024 (UTC)[reply]

Google Groups to stop archiving new Usenet posts

A banner has appeared on Google Groups:

Effective from 22 February 2024, Google Groups will no longer support new Usenet content. Posting and subscribing will be disallowed, and new content from Usenet peers will not appear. Viewing and searching of historical data will still be supported as it is done today.

You can read more at [2].

This will make Google Groups useless for citing new terms and hot words via Usenet as time moves on. Given that Usenet has not seen substantial message traffic for some years, the impact on Wiktionary will be minimal.

However, as a matter of practicality, unless anyone else is able to find another Usenet archive, I think it would be worth changing CFI so that Usenet messages are only considered durably archived up to 21 February 2024 (perhaps to be voted on after Google goes ahead with the change). This, that and the other (talk) 05:15, 3 January 2024 (UTC)[reply]

Just another "killedbygoogle". G has been crap for Usenet access for years anyway: they folded it into their inferior Google Groups offering, and then lost interest in Groups (which was even in the beginning an inferior clone of what Yahoo! had). Anyone else miss DejaNews? Equinox ◑ 05:20, 3 January 2024 (UTC)[reply]

You can still access new Usenet posts, just on a different client. CitationsFreak (talk) 09:08, 3 January 2024 (UTC)[reply]

Google reminds me of the strangler fig, a hemiepiphyte, which rapidly grows up trees in tropical rain forests (not having to waste energy on forming a strong trunk itself), often eventually killing the tree. Not all strangler figs kill their support/host. Are they doing to Wikimedia what they have done to Usenet? DCDuring (talk) 20:51, 3 January 2024 (UTC)[reply]

We're not owned by Google. I think we won't be a victim of Google. CitationsFreak (talk) 23:16, 3 January 2024 (UTC)[reply]

Strangler figs don't own the trees they rely on to reach the sun before those trees, weakened by the sap the figs have taken, die in their shade. (I think there's a rich metaphor here.) DCDuring (talk) 01:35, 4 January 2024 (UTC)[reply]

@CitationsFreak are you aware of any ongoing archiving effort for Usenet posts? This, that and the other (talk) 23:30, 3 January 2024 (UTC)[reply]

There's https://www.usenetarchives.com/ (no search function, however) and (for the 80s Usenet stuff) https://usenet.trashworldnews.com . CitationsFreak (talk) 23:34, 3 January 2024 (UTC)[reply]

No full-text search = practically useless for our purposes.

Incidentally, my curiosity was piqued by the part of Google's statement which says that Usenet is now mainly used for sharing binaries (files) rather than text-based emails. Seemingly the binaries being shared involve "Linux ISOs": piracy of software, movies, porn, etc. In this context, it seems that the Usenet community of today has an incentive not to keep comprehensive, searchable archives. This, that and the other (talk) 06:33, 4 January 2024 (UTC)[reply]

Honestly, while full-text search would be a great thing to have for our purposes (as well as in general), you can find some slang from certain Usenet subgroups (or whatever they're called) in the relevant subgroup [1]. This is what the OED did, as a matter of fact.

[1] As an example, you could expect to find more skateboarding slang in alt.rec.skateboards than in comp.os.linux.setup, although that doesn't mean you wouldn't find any there. CitationsFreak (talk) 07:06, 4 January 2024 (UTC)[reply]

@CitationsFreak Good point. However, I was poking around the website and I can't find a single message from 2023, or even the second half of 2022. Are you sure they are still archiving? This, that and the other (talk) 02:03, 7 January 2024 (UTC)[reply]

The Winter/Summer 2024 Competition is here!

After looking at the response I got from the original idea I wrote, I have decided to officially make it a contest that Wiktionarians can partake in. The goal is to, in the format of a play, define as many English words as possible before we reach "zzzs".

RULES

1. All definitions must be in alphabetical order, starting at "a" and ending at "zzzs". (That means that, for the "trumspringa" example above, you couldn't define "a" or "1984" as your next definition, since "a" comes before "trumspringa" and "1984" starts with a number.)

2. Each person can use three sentences for their definition, two for the use and definition of a word and one for some optional stage directions. You do not need to use two sentences to use and define a word, although it is encouraged. Sign your entries like this: ([Wiktionarian]) after what you wrote.

3. Each line of dialogue must be formatted like a play script would be. This is the speaker's name, in bold and all-caps, followed by a colon and then some dialogue. In addition, the word that is being defined must be in bold. Here is an example:

RANDOM CITIZEN #1: I have trumspringa. I have the desire to give up my job for farming. (RandomCitizen1)

Stage directions should be italicized, as so:

RANDOM CITIZEN #1 tosses his briefcase to the left-hand side of the stage.

4. The next word you define should come fairly close after the previous word defined (something like 600 lemmas after the word that's defined). Here is an example of something that would be valid:

RANDOM CITIZEN #1: I have trumspringa. I have the desire to give up my job for farming. (RandomCitizen1) Being the ts is a hard job. Being the person who something something is a hard job. (RandomCiti2)

And here is something that would not:

RANDOM CITIZEN #2: I have trumspringa. I have the desire to give up my job for farming. (RandomCitizen1) This is because you can't zyxt. This is because you can't see. (MidEngFan)

This is because "trumspringa" and "ts" are less than 600 entries away from each other (if they existed with the defined meaning on Wikt), and "zyxt" is more than that (if it was an English lemma).
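For anyone wanting to check rule 4 mechanically, here is a minimal sketch. It assumes you already have the list of English lemmas as plain strings sorted with Python's default string ordering, which will not match the category's real sort order exactly; the `is_valid_next_word` helper and `MAX_GAP` constant are hypothetical names, and the sample list is made-up placeholder data.

```python
from bisect import bisect_left

MAX_GAP = 600  # "something like 600 lemmas", per rule 4


def is_valid_next_word(sorted_lemmas, previous, candidate, max_gap=MAX_GAP):
    """Check that `candidate` sorts after `previous` and within `max_gap` entries.

    Words missing from the list are treated as if they were inserted at their
    sorted position, mirroring the "if they existed" wording of the rule.
    """
    if not previous < candidate:
        return False
    i = bisect_left(sorted_lemmas, previous)
    j = bisect_left(sorted_lemmas, candidate)
    return j - i <= max_gap


# Placeholder data standing in for the real lemma list.
lemmas = sorted(f"w{n:05d}" for n in range(2000))
print(is_valid_next_word(lemmas, "w00010", "w00500"))  # 490 entries apart
print(is_valid_next_word(lemmas, "w00010", "w01000"))  # 990 entries apart
```

Since the rule itself says "something like 600", the threshold is left as a parameter rather than treated as a hard limit.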

5. I'm allowed to make up new rules when I feel that they are needed.

So, that's it. Have any questions on them? CitationsFreak (talk) 06:35, 4 January 2024 (UTC)[reply]

Sounds fun! Can you link to it though (or create a page, if you haven't already)? How are we determining whether something is about 600 entries away? Is there a particular list of lemmas we could use as a gauge? Andrew Sheedy (talk) 07:06, 4 January 2024 (UTC)[reply]

Here's the list of every English lemma we have at Wiktionary. Page will be made shortly. CitationsFreak (talk) 07:08, 4 January 2024 (UTC)[reply]

@Andrew Sheedy Here's the link: Wiktionary Winter/Summer Competition 2023. CitationsFreak (talk) 07:17, 4 January 2024 (UTC)[reply]

Great, thanks! Could we maybe make it a rule that every sentence of stage directions also has to include a word within about 600 lemmas of the previous word (though without defining it)? Otherwise I think stage directions risk becoming either boring or stealing the show from the dialogue. So following your example, a stage direction that would work instead could be RANDOM CITIZEN #1 puts on a pair of swim trunks. Andrew Sheedy (talk) 07:19, 4 January 2024 (UTC)[reply]

I shoulda said it earlier but I think the original goal of the Wiktionary "competitions" was to add more content to the project. Making learning fun!! Anyway, we have moved beyond the time where "find a word beginning with such-and-such a letter pair" is a challenge involving research and creation, since Wikt is quite huge now; so maybe legitimate games are all that are left :) I hope Wonderfool won't ruin your play (or maybe he will make it great). Equinox ◑ 21:51, 6 January 2024 (UTC)[reply]

Bit concerned about User:Mynewfiles

Smells like our old buddy "Pass a Method". Look at the huge torrent of made-up USA terms tonight. None with citations, none with anything, just pure Polyfilla. Equinox ◑ 07:50, 7 January 2024 (UTC)[reply]

I can guarantee you 10,000% that I am not "Pass a method". Mynewfiles (talk) 07:54, 7 January 2024 (UTC)[reply]

This isn't PaM, just a3a0. — SURJECTION ^{/ T / C / L /} 09:28, 7 January 2024 (UTC)[reply]

Looks like Wonderfool to me. Denazz (talk) 13:44, 10 January 2024 (UTC)[reply]

Affix segmentation with hyphens in derived terms lists in proto-languages

For PIE entries we use hyphens to separate morphological segments in the Derived terms section. I then extended this formatting to Proto-Celtic, given that e.g.:

This increases reconstruction transparency and readability. This goes triple for Proto-Celtic since it had many stacked-prefix words which become hard to parse if run together.
Celtic allows particles and pronouns to intervene between the first and second morphemes of a verb, so making the insertion position clearer with prefix separation by hyphens would be a benefit.
Importing formatting qualities of our well-regarded PIE entries should lift Proto-Celtic entries to a similar quality level as them.

But Victar deleted a set of hyphens on one Celtic verb where I formatted like that, and told me to discuss the issue over here. So should PIE-style hyphenation of affixes be used in derived terms sections of other proto-languages? — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 18:13, 9 January 2024 (UTC)[reply]

There shouldn't be hyphens in Reconstruction entry names, but there's nothing wrong with displaying them with hyphens, e.g. {{l|cel-pro|*atinoweti|*ati-noweti}}. —Mahāgaja · talk 07:33, 10 January 2024 (UTC)[reply]
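The pattern in that example (unhyphenated link target, hyphenated display form) is easy to generate from the segmented form alone. The sketch below is illustrative only: `segmented_link` is a hypothetical helper, and the parameter order of {{l}} is taken from the example above rather than from the template's documentation.

```python
def segmented_link(lang_code, segmented_form):
    """Build an {{l}} call that links to the unhyphenated entry name
    while displaying the morpheme-segmented form.

    Assumption: every hyphen in `segmented_form` is a morpheme boundary,
    so this is not suitable for affix entries such as "*ambi-".
    """
    target = segmented_form.replace("-", "")
    if target == segmented_form:
        # Nothing to segment, so no separate display text is needed.
        return "{{l|%s|%s}}" % (lang_code, target)
    return "{{l|%s|%s|%s}}" % (lang_code, target, segmented_form)


print(segmented_link("cel-pro", "*ati-noweti"))
# prints {{l|cel-pro|*atinoweti|*ati-noweti}}
```

This keeps entry names hyphen-free while leaving the question of whether to display the hyphens as a purely presentational choice.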

@Mellohi! Why are you bypassing links to the Proto-Celtic entries? The reason we do this on PIE entries is because they are often too speculative to create actual entries for, but that is far less the case with Proto-Celtic.

I am also still not in support of hyphens in derived terms lists, which breaks with how we format most languages, in addition to being more difficult to read, in my opinion, but may I suggest an alternative? What I've done on some Proto-Iranian entries is add {{q|+ prefix}} to list entry redlinks, see RC:Proto-Iranian/hmáwčati. Is this close enough to what you're going for? -- Sokkjō 23:30, 10 January 2024 (UTC)[reply]

I think we should strive for consistency. We display PIE terms segmented by morphemes, because it is universally considered useful. I think that principle should be extended to other reconstructed languages. I think separating prefixes with hyphens is imperative at any rate, for the reasons listed above. (Victar's alternative suggestion looks messy and has no precedent.) If anything, the argument should be about whether we should display *uφor-φi-φoik-e, but I don't currently have a view on that.

(Aside: Mellohi!'s formatting {{l|cel-pro||*uɸor-ɸiɸoike}}, without redlink, is preferable, as it isn't clear whether the term existed in that form in Proto-Celtic or was derived somewhere along the way to Proto-Brythonic. That's the way it's done for PIE. In my view our current guidelines concerning this are lacking, but that's a discussion for another day.) —Caoimhin ceallach (talk) 12:57, 11 January 2024 (UTC)[reply]

If the goal is purely to "strive for consistency", then PIE entries should have their hyphens removed, as hyphen-less derived terms lists are the overwhelming standard. I also don't see many editors for Germanic, Latin or Greek supporting this format for those languages. The argument for cold consistency is flawed from the start because each language has its own linguistic needs and community preferences, both on en.Wikt and academically. -- Sokkjō 22:06, 11 January 2024 (UTC)[reply]

We're only talking about reconstructed languages. I said we should segment words of reconstructed languages with hyphens, to the extent that we consider it useful, as this is a means of clarifying word structure that we already use for PIE. This is not 'cold consistency'. —Caoimhin ceallach (talk) 22:15, 11 January 2024 (UTC)[reply]

We have both Latin and Greek reconstructions as well. Also, your comment on redlinks is not accurate as we can usually rather easily tell when a compound is formed in Celtic or Brythonic by various sound changes. But even so, we always try to reconstruct compounds at the lowest level unless it's obvious they were formed earlier. There is absolutely no reason to void out the link for a PC term in a derived terms list. -- Sokkjō 22:21, 11 January 2024 (UTC)[reply]

For attested languages I think we should stick to the existing orthographic conventions as much as possible. Fully reconstructed languages are different.

I'm not aware of a way to tell when the compounds in question were formed. Unless there is clear evidence that a term existed, I think there should be no redlink. —Caoimhin ceallach (talk) 22:45, 11 January 2024 (UTC)[reply]

So not on RC:Latin/hendo but yes on RC:Proto-Germanic/erþō. Consistent. Some examples are finding *amm- instead of *ėmm- in i-umlaut environments, or consonant clusters *mβ instead of *mm. Do you also believe that Proto-Germanic entries should void out links in their derived terms lists, for, you know, consistency? -- Sokkjō 21:18, 12 January 2024 (UTC)[reply]

@Sokkjo Then let's change PIE, if consistency is key. If not, why not? The logic of what's being proposed is pretty clear: when a language is not attested, reconstructions should use hyphens. The logic behind it seems very straightforward. Theknightwho (talk) 21:34, 12 January 2024 (UTC)[reply]

It is not. See my comments above. -- Sokkjō 21:44, 12 January 2024 (UTC)[reply]

@Sokkjo I've read them - what I was responding to was your comment that seemed to purposefully misunderstand @Caoimhin ceallach. Theknightwho (talk) 21:45, 12 January 2024 (UTC)[reply]

I don't understand what you're trying to say. —Caoimhin ceallach (talk) 22:10, 12 January 2024 (UTC)[reply]

Which part? I was explaining how you can tell if a compound is formed in PBry. or later over PC. -- Sokkjō 22:34, 12 January 2024 (UTC)[reply]

The example *amm- is unreliable since *ambi- often lost the *i to syncope before it could cause i-affection.^[1] Presence or absence of i-affection of a prefix in Brittonic is not diagnostic of when it was prefixed. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 01:26, 13 January 2024 (UTC)[reply]

1000%, the *i in *ambi- was syncopated before i-umlaut took place. I'm referring to i-umlaut from the root, not from the prefix itself. I assumed that was clear, but thanks for the painfully obvious reference. -- Sokkjō 09:21, 13 January 2024 (UTC)[reply]

@Sokkjo Tone down the snark. It's absolutely unnecessary. Vininn126 (talk) 10:34, 13 January 2024 (UTC)[reply]

Yes sir. 🫡 Glad to see the whole anti-Victar team here. User:Djkcel, wanna comment too? -- Sokkjō 10:36, 13 January 2024 (UTC)[reply]

That's not toned down. Are you even capable of cooling it and having a conversation without belittling someone else? Vininn126 (talk) 10:42, 13 January 2024 (UTC)[reply]

❤️ -- Sokkjō 10:47, 13 January 2024 (UTC)[reply]

Please try to be clear and specific. We're talking about the compounds listed in *ɸiɸoike. And please don't forget that the actual topic of this thread is hyphenation. —Caoimhin ceallach (talk) 11:18, 13 January 2024 (UTC)[reply]

Perhaps you lost track, but the issue is two pronged: 1. hyphenation and 2. the nulling out of links. Above was addressing the latter. -- Sokkjō 11:22, 13 January 2024 (UTC)[reply]

I've said it before, but maybe it'll help to say it again: I really do appreciate the work you put in here. I especially appreciate the way you organise PIE-, and other Proto-entries. I wish we could cooperate in a fruitful way. Instead we always get these long threads filled with fluff. I don't understand the point you tried to make above. I am asking for clarification and I'm asking you to give cogent arguments for your position (which you seem to hold ardently) on the topic of this thread: hyphenation in Proto-entries. —Caoimhin ceallach (talk) 13:59, 13 January 2024 (UTC)[reply]

@Mellohi!, Caoimhin ceallach, Djkcel, Vininn126, Victar Given that Victar/Sokkjo is the only person objecting, and is self-evidently unwilling or unable to explain himself in a way that any other user in this thread can make sense of, the consensus seems to be that Mellohi's and Caoimhin ceallach's use of hyphenation is acceptable in reconstructions. Theknightwho (talk) 05:20, 16 January 2024 (UTC)[reply]

None of my questions were answered or concerns addressed -- just a bunch of bad-faith canvassing from the Discord. -- Sokkjō 05:33, 16 January 2024 (UTC)[reply]

@Sokkjo You had lots of responses, most of which were trying to get you to clarify what you were trying to say. Theknightwho (talk) 05:48, 16 January 2024 (UTC)[reply]

References

^ Schrijver, Peter C. H. (1995) Studies in British Celtic historical phonology (Leiden studies in Indo-European; 5), Amsterdam, Atlanta: Rodopi, pages 268-276

Reusing references: Can we look over your shoulder?

Apologies for writing in English.

The Technical Wishes team at Wikimedia Deutschland is planning to make reusing references easier. For our research, we are looking for wiki contributors willing to show us how they are interacting with references.

The format will be a 1-hour video call, where you would share your screen. More information here.
Interviews can be conducted in English, German or Dutch.
Compensation is available.
Sessions will be held in January and February.
Sign up here if you are interested.
Please note that we probably won’t be able to have sessions with everyone who is interested. Our UX researcher will try to create a good balance of wiki contributors, e.g. in terms of wiki experience, tech experience, editing preferences, gender, disability and more. If you’re a fit, she will reach out to you to schedule an appointment.

We’re looking forward to seeing you, Thereza Mengs (WMDE)

I was a bit puzzled, but the German text^[3] makes clear that this is about how editors use the same reference more than once in an article. This does not appear relevant to Wiktionary entries, but doesn't everybody use <ref name="..."> for this purpose, if needed? --Lambiam 13:37, 10 January 2024 (UTC)[reply]

Yes, I use <ref name="..."> on Wiktionary entries. I learned this technique on English Wikipedia circa 2018-2019. This usually happens in my editing here on Wiktionary when there is an authoritative source relevant to both the etymology and pronunciation of a term. I used this technique with Amnok, Tuman, Tumen and Yalu for instance, and I know there are some other ones like these four. --Geographyinitiative (talk) 13:44, 10 January 2024 (UTC) (Modified)[reply]

I use <ref name="..."> too. It's the only method shown at Help:Footnotes#Multiple_citations_of_the_same_reference_or_footnote, and I don't recall seeing approaches other than this one. Voltaigne (talk) 16:06, 10 January 2024 (UTC)[reply]

While <ref name="..."> mostly works, one occasionally needs to make references to different pages of the same work, especially for grammatical details. We even have some templates that are set up for taking multiple pages, though that's rarely appropriate for inline references. One trick I've used in the other place is to use an inline notation for pages (basically, adding colon plus page number), using Wikipedia's {{rp}}, but that has the disadvantage of not being invented here. Another method is to have one table of page references and another of the referenced works themselves. --RichardW57m (talk) 17:15, 10 January 2024 (UTC)[reply]

@Lambiam, Geographyinitiative: I hit the matter of multiply referenced books this weekend, with the PTS dictionary references for olugga and quotations for එවං (evaṃ). The former is an example where <ref name="..."> doesn't merge the references, and the latter is an example where we don't use the reference mechanism at all. (I chose to use both quotations because they may exhibit different collocations - they are probably nowhere near independent in the sense of CFI.) While I don't think the duplications are too bad, it might be better not to have the repetition. We do have dictionary reference templates that link to multiple pages, such as {{R:pi:Childers}}, but in such cases that does not associate assertions with single pages well. --RichardW57m (talk) 10:50, 15 January 2024 (UTC)[reply]

There's a fairly intense use of {{rp}} at https://incubator.wikimedia.org/wiki/Wp/nod/ตั๋วเมือง (Thai script), and its less widely renderable master https://incubator.wikimedia.org/wiki/Wp/nod/ᨲ᩠ᩅᩫᨾᩮᩬᩥᨦ (Tai Tham script). --RichardW57m (talk) 13:09, 15 January 2024 (UTC)[reply]

Words formed by substitution: new template suggestion

In Blood and Crip slang it's common for words to be respelled to start with "B" and "C", respectively. Some examples are kick to bick, cool to bool, Compton to Bompton (currently redlinked, see [4]), as well as (in the other direction) bro to cro and brodie to crodie. The problem is how to explain this in an etymology. On crodie I originally used {{blend|en|Crip|brodie|t1=a member of the Crips gang}} which isn't correct, and doing {{blend|en|C|brodie}} doesn't feel right either since it doesn't make much sense to say a word is being blent with a single letter. I propose a template {{substitution}} or {{subs}} which would function like this:

{{subs|en|kick|B|alt1=(k)ick}}, producing "Orthographic substitution of B into (k)ick"

Other words in English formed similarly are medireview and cdesign proponentsist. I'm curious if there exist similar cases in other languages. Ioaxxere (talk) 03:30, 11 January 2024 (UTC)[reply]
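The rendering described in the proposal above can be sketched in Python. This is only an illustration of the suggested behaviour, not an existing Wiktionary module: the function names `parse_call` and `render_subs` are made up here, and the output wording is taken directly from the example in the proposal.

```python
from typing import Dict, List, Tuple


def parse_call(wikitext: str) -> Tuple[str, List[str], Dict[str, str]]:
    """Split a simple, non-nested template call into name, positional and named args."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    parts = text[2:-2].split("|")
    name, args = parts[0], parts[1:]
    positional: List[str] = []
    named: Dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:
            named[key] = value
        else:
            positional.append(arg)
    return name, positional, named


def render_subs(wikitext: str) -> str:
    """Render the etymology text for the proposed (hypothetical) {{subs}} template."""
    name, positional, named = parse_call(wikitext)
    if name not in ("subs", "substitution"):
        raise ValueError("unexpected template: " + name)
    _lang, base, substitute = positional  # language code, base word, substituted letter
    shown = named.get("alt1", base)       # alt1 overrides how the base word is displayed
    return f"Orthographic substitution of {substitute} into {shown}"


print(render_subs("{{subs|en|kick|B|alt1=(k)ick}}"))
```

Whether the displayed text should say "orthographic" at all depends on the question raised in the replies, namely whether the substitution is spoken as well as written.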

Is this only orthographic or is bick also pronounced with /b/?
Some more examples of either this or a related phenomenon are the (sometimes jocular, sometimes serious/academic) substitution of cis↔trans, e.g. Transformers→Cisformers and translate→cislate (where I said in an HTML comment just a few hours ago that "blend" didn't seem right); also atmosphere→atmosflat; which are also pronounced differently. I am ambivalent about whether this needs to be templatized. - -sche (discuss) 04:53, 11 January 2024 (UTC)[reply]

Right, we have to talk about it. I wondered why you haven’t just created an entry for “prefix” b-, maybe not enough examples. Where is the line though?

It’s not orthographic, this derives from speech. Those gangbangers by default aren’t even as literate as we imagine well-behaved citizens, so I could hear lots of audio examples in my playlists in the background that I have not gotten around to quoting. no bap is as often said as no kizzy, but nibba is hardly the like situation, though playing upon the older GM use. Either can be argued separately, whether we should have them at all—where we see the limits of attestation again, where quoting from texts misses out the legitimacy in so far it depends upon speech. But I think the case is lost since we include bowdlerisations and starred sh*t. Fay Freak (talk) 12:11, 11 January 2024 (UTC)[reply]

How is this different from pig Latin and double Dutch? Is there a limit on which words can be so modified? As a taboo avoidance, it also reminds me of minced oaths and things like you see in the etymology of bear, where another word is substituted. There's also a phenomenon I've heard of with religious Jews where they change sounds/letters in divine names because they're too sacred to say or write, i.e. "Elokim" instead of "Elohim". Chuck Entz (talk) 14:46, 11 January 2024 (UTC)[reply]

Just now I’ve had the sudden realization that this “substitution” is a simulfix. I added that two and a half years ago to the modules after overviewing the affix types. This part of speech is hitherto exclusively used in the entry 🅱️. Fay Freak (talk) 15:14, 11 January 2024 (UTC)[reply]

Moravian

(Notifying Solvyn, Atitarev, Benwing2, Hergilei, Zhnka, Jan.Kamenicek): and perhaps @Thadh, @Sławobóg, and @Mahagaja as people potentially able to comment: should we consider splitting Moravian as an L2? Have there been discussions about this in the past? Vininn126 (talk) 21:20, 11 January 2024 (UTC)[reply]

@Vininn126 Yes. Sławobóg (talk) 21:24, 11 January 2024 (UTC)[reply]

Shows how good my memory is! We only had one commenter on Moravian (and I don't even necessarily agree with the comment @Bezimenen), so I think we should give it a little more attention and get input from Czech editors. Vininn126 (talk) 21:28, 11 January 2024 (UTC)[reply]

Political argument is bad argument. Sławobóg (talk) 21:51, 11 January 2024 (UTC)[reply]

I'm not making any arguments - I disagree with the idea that a billion can pop up. I have no idea how similar the two lects are and I'm trying to suss that out. Vininn126 (talk) 21:56, 11 January 2024 (UTC)[reply]

I'm referring to his "+ I don't want to give food for thought to Z-Russians". Sławobóg (talk) 22:17, 11 January 2024 (UTC)[reply]

Ah, sorry! Yes, I feel that's completely irrelevant. Vininn126 (talk) 22:19, 11 January 2024 (UTC)[reply]

In fact there is nothing like a unified Moravian language or dialect, there are just many different dialects in Moravia. They can be grouped into five basic groups, which differ from each other very significantly, see File:Moravian_dialects.png. --Jan Kameníček (talk) 22:21, 11 January 2024 (UTC)[reply]

Are they related at least typologically? How different are they from each other? Vininn126 (talk) 22:23, 11 January 2024 (UTC)[reply]

From my lay point of view they differ from each other as much as they differ from standard Czech. Eastern Moravian dialects also have quite a lot in common with the Slovak language. Lachian dialects are a very variable group (with some dialects closer to Polish than others), which some linguists do not even consider to be a single group, but divide into several smaller groups. --Jan Kameníček (talk) 09:14, 14 January 2024 (UTC)[reply]

They still definitely have, and in the past had, Czech as Dachsprache, and it would never be wrong to add Moravian to Czech, like it rarely turns out wrong to add regionalisms of Arabic dialects under Arabic; here it is even more unintuitive to split and not necessary a priori. One consideration when separating the dialect was not to make up too much stuff that passer-by editors don’t expect: there are people who don’t read instruction manuals and get started with assembling their furniture or playing the board-game right away, though I am not one of them. Hence I at least don’t like a split. Fay Freak (talk) 12:24, 12 January 2024 (UTC)[reply]

It might still be worth it to at least give it an etym-only code. Vininn126 (talk) 12:59, 12 January 2024 (UTC)[reply]

@Vininn126 If as per User:Jan.Kamenicek there is no unified Moravian lect, then IMO it makes no sense to have a "Moravian" etym code, but it might make sense to have several etym codes, one per major dialect area. I don't know. (Although there may be no need for even this; for example, I am doing some work on Catalan now, and Catalan is essentially a pluricentric language with a standardized dialectal norm for Valencian plus various other dialects outside of the Central Catalan standard, e.g. Balearic, Algherese, Northern, Northwestern, but so far we've found no need for etym codes for these dialects; categorizing labels and accent qualifiers are enough.) But I agree with User:Fay Freak that an L2 split is unlikely to make sense given the situation. Benwing2 (talk) 17:19, 12 January 2024 (UTC)[reply]

Adding non-English names for languages to the language data

Before I say anything else, I want to be clear that I'm not suggesting we start adding tons of random translations of language names to our language data module. Instead, this came up because I was looking into Loloish languages; we have approximately 100 of them, and our coverage isn't great at the moment. With one exception, we've not subdivided them into subfamilies at all.

Researching Loloish languages is difficult for a few reasons:

There are a lot of them - we have close to a hundred, but the real number may be much higher, as many are only spoken in a few villages.
Most are poorly documented at best.
Though there's some literature in English, most of it isn't - particularly if you want anything from earlier than the last 20 years.
They all have very similar names (e.g. Aluo, Akha, Akeu, Axi, Azha, Azhe, Ayizi, Hani, Honi, Bisu, Biyo, Lalo, Lalu, Lolopo, Lahu, Lamu, Limi, Lisu, Lipo, Lope, Lopi, Mili, Micha, Moji, Muji, Nasu, Nisu, Nusu, Nuosu etc. etc. etc.)
They all go by at least 2 different names: an autonym, and a transliteration of the name used by the dominant language in the area; usually Chinese, but sometimes Thai, Lao or Vietnamese. The ISO have chosen between them pretty randomly, presumably based on whatever name was used on the application. Sometimes they can have as many as 3 or 4 names, if spread over national boundaries, and sometimes you can throw in a literal English calque as well (e.g. "White Yi" or "Eastern Samadu"). This also leads to a lot of confusion when trying to determine if two authors are referring to the same language, particularly when homonyms occur.

At the moment, we have a way to keep track of language aliases and varieties in the extra language data modules (e.g. Module:languages/data/2/extra), which are then displayed on the category page for the language. However, this is currently restricted to English. I thought it might be useful to allow select translations which you might plausibly come across when trying to research a particular language; a native name would obviously be useful, but in cases like this it would also be useful to keep track of the Chinese names (especially given that many of these would likely fail CFI in English). These could be displayed below the English name(s) (potentially in a dropdown box). Theknightwho (talk) 16:49, 13 January 2024 (UTC)[reply]
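As a rough illustration of the idea, the extra language data could carry a per-language table of names in other relevant languages next to the existing English aliases. The Python sketch below is only a model of the data shape, not the actual Lua module: the field name `other_names`, the placeholder code `xx-example`, and the placeholder strings are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical shape of one entry in an "extra" data table.
# "xx-example" is a placeholder code, and the names are placeholders, not real data.
EXTRA_DATA: Dict[str, dict] = {
    "xx-example": {
        "aliases": ["Example Name A", "Example Name B"],  # English names
        "other_names": {                                  # assumed new field
            "cmn": ["<Chinese name>"],
            "th": ["<Thai name>"],
        },
    },
}


def category_page_names(code: str, data: Dict[str, dict] = EXTRA_DATA) -> List[str]:
    """Return display lines: English aliases first, then names grouped by language code."""
    entry = data.get(code, {})
    lines: List[str] = []
    aliases = entry.get("aliases", [])
    if aliases:
        lines.append("Also known as: " + ", ".join(aliases))
    other = entry.get("other_names", {})
    for lang in sorted(other):
        lines.append(f"Names in {lang}: " + ", ".join(other[lang]))
    return lines


for line in category_page_names("xx-example"):
    print(line)
```

Keying the non-English names by language code keeps them separate from the English aliases, so a category page could show them in a collapsed box as suggested above.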

Sounds like a good idea but I wonder how well we can stop newbies from adding and/or requesting to be added some bs translations in obscure languages. For instance, if someone came up to me and said the Mandarin name for Moji is, say, 不知道, I, as a person who barely knows any Mandarin at all, would be tempted to believe them, and wouldn't really know how to check this. And then we have a faulty name sitting there for years on end. Thadh (talk) 17:31, 13 January 2024 (UTC)[reply]

@Thadh The extra language data modules are also template-editor-only, so I guess that should help to some extent. Theknightwho (talk) 18:46, 13 January 2024 (UTC)[reply]

Sounds cleaner, to have separate data fields for names in other relevant languages, if it serves disambiguation (example restriction criterion to exclude adding tons of random translations). I have template editor originally to add synonyms, as the described problem occurs in English variant spellings alone for the Horn of Africa plentily, and the database names are not the best, often back in the day specifically sporting an arbitrary rare spelling only to be ASCII-compliant, when a year ago or so we added our first L2 language name containing ü, so other lects should have Ethiopist spellings later.

Users can request all they want, it is apparently not avoidable to have only trusted users adding them and have other users make motions with usage examples—they can also create the language names in the main-space for the purpose and this approach is acceptable if one is specifically interested; adding them to the language data was just faster and necessary not to get drowned when I already had a buttload of tabs open while creating some Proto-Semitic with further comparisons, so Thadh has a pseudo-concern here, you believe yourself more than anyone. Fay Freak (talk) 18:24, 13 January 2024 (UTC)[reply]

As I understand it, data modules can be edited by many classes of user. We might want to be extra restrictive for this module. It does not seem likely that all attested language names should be includable in the module, so extra discretion is required, eg, for polysemic language names. DCDuring (talk) 18:46, 13 January 2024 (UTC)[reply]

Formatting of Hesychian glosses

I've recently been adding and editing Hesychian glosses (for Ancient Greek), and encountered multiple different formats for entries. It might be appropriate at this time to establish a standard formatting for Hesychian glosses. My preference goes out to the way σχερός (and διαιτός) is formatted (although the automatic 'request for translation' attached to the quotation from Hesychian does not seem necessary). @Chuck Entz, @Erutuon? AntiquatedMan (talk) 11:49, 14 January 2024 (UTC)[reply]

I think that's a good idea. Wouldn't it also be nice to have a Category:Ancient Greek(/Thracian/Lydian) terms recorded by Hesychius? —Caoimhin ceallach (talk) 21:58, 17 January 2024 (UTC)[reply]

Sounds like a good idea to me, though I unfortunately don't have experience with creating categories. Do you (know someone who has)? Maybe there is a way to add words with "RQ:grc:Hesychius" to such a category automatically (though the "grc" currently makes it so that they erroneously get added to the category "Greek terms with quotations")? Might have to find a different template. I'm open to suggestions. AntiquatedMan (talk) 10:51, 18 January 2024 (UTC)[reply]

Extending Cantonese Jyutping

There is a considerable number of Hong Kong Cantonese words that are borrowed from English, some of which contain sounds not found in Cantonese, e.g. /ɹ/. I suggest that we should extend the jyutping system to represent such phonemes, by adding the initials r /ɹ/, zh /tʃ/, ch /tʃʰ/, sh /ʃ/ and codas f /f/, s /s/, zh /tʃ/, ch /tʃʰ/, sh /ʃ/, q /ʔ/, as well as allowing rime combinations that are not permitted in jyutping, e.g. oem. The system has been made to be very intuitive to use; in fact it is just a codification of an ad hoc system commonly used in relevant discussion. Some examples can be found at User:Wpi/jyutping.

This system can also be used to transcribe other words that cannot be represented in jyutping, including certain particles and onomatopoeia, and also historical pronunciation of Cantonese before z/c/s and zh/ch/sh were merged.

However, it should be noted that I am not particularly certain on the treatment of z/c/s - {{zh-pron}} outputs the palato-alveolar series when the nucleus is a high back vowel and alveolar otherwise, but this is not a phonemic distinction as far as I am aware, when only the modern phonology of standard Cantonese is of concern. IMO jyutping z/c/s should always be alveolar.

@Justinrleung, RcAlex36, Mahogany115 – wpi (talk) 13:42, 14 January 2024 (UTC)[reply]
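The proposed additions can be written down as two lookup tables. The Python sketch below is only an illustration: the symbol-to-IPA pairs are copied from the proposal above, while the helper names and the sample syllables (`shaa1`, `baas1`, `baak1`) are made up for the example and are not attested words.

```python
from typing import Dict, Optional, Tuple

# Extended initials and codas as listed in the proposal (romanisation -> IPA).
EXTENDED_INITIALS: Dict[str, str] = {"r": "ɹ", "zh": "tʃ", "ch": "tʃʰ", "sh": "ʃ"}
EXTENDED_CODAS: Dict[str, str] = {
    "f": "f", "s": "s", "zh": "tʃ", "ch": "tʃʰ", "sh": "ʃ", "q": "ʔ",
}


def split_extended_initial(syllable: str) -> Tuple[str, str]:
    """Return (initial, rest) if the syllable starts with an extended initial, else ("", syllable)."""
    # Try longer symbols first so that "sh" is not cut short.
    for initial in sorted(EXTENDED_INITIALS, key=len, reverse=True):
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def extended_coda(syllable: str) -> Optional[str]:
    """Return the extended coda of a syllable (tone digit ignored), or None if there is none."""
    body = syllable.rstrip("0123456789")
    for coda in sorted(EXTENDED_CODAS, key=len, reverse=True):
        if body.endswith(coda):
            return coda
    return None


initial, rest = split_extended_initial("shaa1")   # made-up syllable
print(initial, EXTENDED_INITIALS[initial], rest)
print(extended_coda("baas1"), extended_coda("baak1"))
```

Matching the two-letter symbols before the one-letter ones keeps zh/ch/sh distinct from standard z/c/s, which is the distinction the proposal relies on.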

@Wpi: Thanks for starting the conversation on this. I am in general in favour of most of the extensions here. However, there are a few that I don't know if we should add, such as -q, which I can't think of any examples of. I also wonder if there are other attempts at extending Jyutping that we can follow. — justin(r)leung { (t...) | c=› } 04:52, 22 January 2024 (UTC)[reply]

@Justinrleung: Thanks. I agree that some of them, such as -q, shouldn't be added if there isn't any actual example.

I know that there are a number of ad hoc jyutping-like attempts: the earliest one that I am aware of is trading trei1 ding2 in 香港粵語與英語的語碼轉換 by 李楚成; another notable one would be drai1 (probably a typo of draai1) for dry in Bauer's ABC Cantonese Dictionary (I've only gone through A-F, so there might be more in the later sections). None of them are systematic (though the one used by words.hk is probably the most extensive one, it's not properly documented), but from my observation the proposed extended system should be fully compatible with all such ad hoc ones. – wpi (talk) 14:29, 22 January 2024 (UTC)[reply]

@Wpi: Actually for these two consonant clusters tr- and dr-, I wonder if we should treat them as chr- and zhr-. Words.hk has chr- for something like trainee. — justin(r)leung { (t...) | c=› } 15:36, 22 January 2024 (UTC)[reply]

I see. In that case I think a more systematic one-to-one mapping approach is better, at the expense of slightly obscuring the pronunciation and the relation to the English word. – wpi (talk) 11:21, 23 January 2024 (UTC)[reply]

Notability for conlangs

What is the threshold for notability of conlang words here? I see we have Na'vi and Toki Pona here. Is there a rule about what makes a conlang notable enough for inclusion? Does the work of fiction need to satisfy the English wikipedia GNG for example? Or does the language itself need extensive coverage? Or can just any conlang have an Appendix and entries here? Immanuelle (talk) 22:10, 14 January 2024 (UTC)[reply]

@Immanuelle No, Joe Schmo's made-up conlang can't be added even to the Appendix. I'm not sure of the formal process but there needs to be consensus concerning notability, and any proposals for conlangs going into the mainspace are subject to far more stringent rules. Benwing2 (talk) 22:55, 14 January 2024 (UTC)[reply]

@Benwing2 I guess we can look at Appendix:High Valyrian and see details of its creation. It seems it was discussed here Wiktionary:Grease pit/2023/July#High Valyrian before being added. I don't know what the Grease Pit is though. Immanuelle (talk) 23:01, 14 January 2024 (UTC)[reply]

@Immanuelle The Grease Pit is for technical discussions. That discussion should have been here in the Beer Parlour as it is a policy issue. Is there any specific conlang you're interested in having added? Note that CAT:Belter Creole language bypassed the normal discussion process and then was retroactively blessed, but I would strongly oppose anyone else doing something like that. Benwing2 (talk) 23:11, 14 January 2024 (UTC)[reply]

@Benwing2 okay that is good I am in the right place. I want to add Ithkuil. Immanuelle (talk) 23:20, 14 January 2024 (UTC)[reply]

@Immanuelle ~~While "Appendix" is just about synonymous with "Wild West",~~ [see below] I am not sure that Ithkuil is the best fit for adding to Wiktionary:

Unlike other "experimental" constructed languages like Toki Pona (211 entries - which may well become mainspace-worthy in a few years' time) or Lojban (3721 entries), it seems like Ithkuil has a sprawling lexicon that would amount to potentially tens of thousands of entries - and that's not counting compounds.
The language is evidently still under revision. According to Wikipedia, a new "release" with completely revised phonology was published only last year.

It might be better to wait until the language has bedded down a bit before adding it. ~~However, having said all that, I would not personally get in the way of any efforts to add it, so long as they are confined to the Appendix. Things don't get in anyone else's way there.~~ [see below] This, that and the other (talk) 23:42, 15 January 2024 (UTC)[reply]

@Immanuelle: Since no one’s yet posted a link to the official policy, see Wiktionary:Criteria_for_inclusion#Constructed_languages: “One use in a durably archived source is the minimum attestation required for an individual entry in an appendix-only constructed language.” As for whole languages, they “may have lexicons in the Appendix namespace at the community's discretion”. In short, you need (1) community approval for the language, and (2) the ability to point to an attestation in a published source (upon request) for each word. — Vorziblix (talk · contribs) 14:53, 16 January 2024 (UTC)[reply]

Heh, I didn't realise we had a policy on this. I take back my statement above in that case! This, that and the other (talk) 00:31, 19 January 2024 (UTC)[reply]

Theknightwho adding new language codes again

User:Theknightwho is once again adding new language codes without any discussion, despite being asked multiple times not to do so. @Benwing2, -sche, Chuck Entz, Mahagaja, Mnemosientje --{{victar|talk}} 23:11, 15 January 2024 (UTC)[reply]

@Victar What changed? I can't tell from that diff. @Theknightwho Can you please revert your additions of new language and family codes? Victar is right that you definitely should not be doing this without prior consensus. Benwing2 (talk) 23:18, 15 January 2024 (UTC)[reply]

@Benwing2 You should note that @Victar is (again) lying: it’s family codes, and they’re for the Loloish languages, which no one has touched in years. That’s why I felt safe doing it without discussion, because it’s not an area any editor has sustained interest in. I can raise a discussion about it, but it’s very obvious that Victar is doing this for personal reasons, which is why he lied about it being language codes. Given he has form on objecting to these kinds of proposals if I make them, I don’t want him engaging in any discussion that I initiate unless he goes out of his way to not behave in the way he always tends to do. Theknightwho (talk) 23:35, 15 January 2024 (UTC)[reply]

Whether it's language codes or language families and groups or macrolanguages or whatever, I think it's a distinction without a difference for all ISO 639 codes. —Justin (koavf)❤T☮C☺M☯ 06:21, 16 January 2024 (UTC)[reply]

Also note that without proto-languages (which I did not add), these codes have little practical effect on mainspace. Theknightwho (talk) 23:40, 15 January 2024 (UTC)[reply]

@Benwing2: Diff link fixed. --{{victar|talk}} 23:47, 15 January 2024 (UTC)[reply]

@Theknightwho: You've been asked not to create language codes -- be they language families, etymology codes, or proper languages -- without starting a discussion first. You still haven't reverted your changes to the Dravidian family tree. It doesn't matter if the language family isn't well covered; Arawak is very poorly covered, but it didn't stop User:Vorziblix (an admin for much longer than you) from starting a thread to petition some changes. You appear to think you're above protocol, above needing approval, and that anyone who calls you out is doing it for "personal reasons", and you break into ad hominem attacks. This behavior really needs to stop. --{{victar|talk}} 05:53, 16 January 2024 (UTC)[reply]

@Victar I don't think I'm above protocol, and I'm happy to have a discussion about it. However, family codes in and of themselves have very little impact on mainspace, unlike language codes, because I can't go and create 100 entries in a language that people didn't want to be added, for example.

I also don't think "anyone that calls [me] out is doing it for "personal reasons"", which is why I don't say that kind of thing about Benwing2, Chuck Entz or many other people who have called me out for doing things from time to time. I think you're doing it for personal reasons, because you have lied about this twice now, and it all started after I caught you out being dishonest about some academic you claimed to know personally. Just stop, please. Theknightwho (talk) 05:59, 16 January 2024 (UTC)[reply]

And yet here we are again.^[1][2] How am I lying, when I provided a diff link? And regarding Dr. Borjian, how absurd you would think I would fabricate a story about speaking with him. Email him yourself, if you like.

You've been an editor here for 2 years -- I've been working with language codes on the project for nearly 10 years. I understand the process, and if anyone was doing what you are doing, I would have thrown down a red flag. I'm not here on some personal vendetta, even though that might be your modus operandi. Receipts: Wiktionary:Beer_parlour/2015/January#Proto-Arawak, Wiktionary:Beer_parlour/2017/May#Language_codes_for_Bourbonnais_and_Poitevin, Wiktionary:Etymology_scriptorium/2018/January#PII_language_codes, etc.

--{{victar|talk}} 18:14, 16 January 2024 (UTC)[reply]

@Victar Because language codes are not family codes, and you are fully aware that adding family codes is less impactful so less likely to draw attention from others if you mentioned that in the title. And please - you accuse quite literally everyone who disagrees with you of some kind of personal bias (e.g. your bizarre conspiracy theories on the thread about Proto-Celtic hyphenation), so you can drop the weird projection.

It was extremely obvious you had fabricated whatever conversation you claimed to have with Borjian, because (a) you quoted an article in which Borjian only mentioned Gorgani in passing to make it seem like his opinion was that it was a dialect, (b) you didn't even acknowledge Borjian was the author of the main article I was basing the proposal on (in which he referred to it as a language numerous times), and (c) you gave the wrong date for the article you were citing, which conveniently post-dated it to the other (when it in fact pre-dated it by several years). So either you intentionally gave the wrong date to make it wrongly look like he'd changed his mind, or (more likely) you made a mistake while bullshitting about your credentials, and got caught off-guard.

Theknightwho (talk) 08:43, 17 January 2024 (UTC)[reply]

To quote Koavf above, "it's a distinction without a difference". You're making a big deal about nothing. The fact that you believe the above about Borjian to be true speaks more to you than me. --{{victar|talk}} 18:04, 17 January 2024 (UTC)[reply]

@Victar No, all it speaks to is that you prioritise condescension over honest discussion. The article (The Extinct Language of Gurgan, 2008) exists, whether you like it or not. Theknightwho (talk) 19:54, 17 January 2024 (UTC)[reply]

Aside from the ad hominems (which do distract from the point at hand), I have to agree with Victar & Benwing here. It annoys me too sometimes to have discussions that end up taking forever, such as this discussion which took over a year, but I do think that it's important to at least have the discussion first when it comes to language code changes. If no one responds, then fine, the change can be made (and noted in the discussion), but it should be had. AG202 (talk) 06:16, 16 January 2024 (UTC)[reply]

And I've been frustrated in the past by how slow the process can be, but like all requests, RFDs, RFMs, etc., having the rationale documented is important. --{{victar|talk}} 18:17, 16 January 2024 (UTC)[reply]

I will roll this back and open a thread about the Loloish languages. However, I'll do it later on today since it involves lots of data modules and I haven't got time right now. Theknightwho (talk) 08:43, 17 January 2024 (UTC)[reply]

It's been nearly a month and User:Theknightwho has not undone any of his additions nor started any discussion, as far as I can tell, on their inclusion. @Benwing2, -sche, Chuck Entz, Mahagaja, Mnemosientje, AG202 --{{victar|talk}} 21:01, 10 February 2024 (UTC)[reply]

@Victar I think he did start a discussion in WT:RFM although I am not completely sure since there are multiple discussions for adding new languages. User:Theknightwho can you undo your changes? Benwing2 (talk) 21:17, 10 February 2024 (UTC)[reply]

@Benwing2 Not on this, since it's complex and I've been putting things together. I'm unsure why @Victar is being so impatient - I wasn't aware that he had much interest in East Asian languages. Theknightwho (talk) 21:20, 10 February 2024 (UTC)[reply]

Now per User:Theknightwho's retaliatory nature, he reverted a slew of my edits in punishment for posting this. --{{victar|talk}} 21:28, 10 February 2024 (UTC)[reply]

@Victar At what point do you think playing the victim is going to start working for you? I haven't seen much success for you so far.

I made these reverts ([5], [6], [7], [8]) because they were typical examples of you pointlessly removing information added by others in order to claim ownership over entries; something you routinely do, which we've all seen before. You must think we were all born yesterday. Theknightwho (talk) 21:44, 10 February 2024 (UTC)[reply]

@Benwing2 or @-sche, would you mind reverting User:Theknightwho's additions for him? And Benwing, no, none of what he posted to WT:RFM are requests for these particular language families. --{{victar|talk}} 22:35, 10 February 2024 (UTC)[reply]

@Victar I don't need other people to do things for me, thanks. Theknightwho (talk) 22:40, 10 February 2024 (UTC)[reply]

Middle English and Scots to WT:RFVE and WT:RFDE

Would anyone object if we moved requests for verification in Middle English and Scots from RFVN to RFVE?

Rationale:

There is sufficient institutional familiarity with these languages at RFVE to enable the requests to be dealt with there. I'm not aware of any Middle English editors who don't also dabble in Modern English, even if only the Early Modern chronolect.
Resolving the English requests at WT:RFVE, one regularly finds oneself straying into Middle English- and Scots-specific resources such as MED and DSL. What's more, some resources (OED) cover all three languages.
WT:RFVN is terribly backlogged. This will help, even if only modestly.

Dealing with Old English RFVs requires its own specific skill set which is not shared by many (most?) RFVE editors, and it would be better not to move Old English RFVs for the time being.

If this is done, we would need to do the same for RFD. This is mostly academic, as RFDs in Middle English or Scots are rare. There are none open at present. This, that and the other (talk) 23:32, 15 January 2024 (UTC)[reply]

@This, that and the other No objections. I agree about keeping Old English out. Benwing2 (talk) 05:16, 16 January 2024 (UTC)[reply]

@This, that and the other We should add to that any language derived from Middle English. The same arguments apply, and it would reduce the workload more for RFVN. CitationsFreak (talk) 16:01, 16 January 2024 (UTC)[reply]

@This, that and the other: I would be inclined to restrict it to languages that can inherit from Middle English - I'm not sure that creoles and pidgins should be included. --RichardW57m (talk) 17:03, 16 January 2024 (UTC)[reply]

That seems reasonable (Middle English, Scots, plausibly Yola, but not pidgins). - -sche (discuss) 17:39, 16 January 2024 (UTC)[reply]

I agree - the creoles and pidgins require specialised resources and knowledge and are best left at RFVN, but it makes sense to also bring Yola and Fingallian across to RFVE. My understanding is that verifying a Yola or Fingallian word is a matter of simply searching through the extant texts in each language, which are few in number. This, that and the other (talk) 22:12, 16 January 2024 (UTC)[reply]

Yes, agreed: that would give us English, Middle English, Scots, Yola and Fingallian (i.e. Anglic minus Old English). Theknightwho (talk) 21:21, 17 January 2024 (UTC)[reply]

Oppose: it seems like your rationale is based off of Middle English being "basically" English, which seems reasonable if you only look at Late Middle English (c. 1300–1499) but obviously doesn't apply for the earlier text. No amount of "straying" will help you understand the Ormulum, for example, and I don't think it would be a good idea for RFVs to be resolved by people with very limited ability in a language. There's also the inconvenience of having to specify the language when making an RFVE as well as having to learn the differing CFIs (LDL vs WDL). Scots admittedly doesn't seem as problematic (depending on your take on the dialect/language debate), although many works in Middle Scots are extremely hard to understand as well and I would oppose it for the reasons stated above. Ioaxxere (talk) 03:56, 18 January 2024 (UTC)[reply]

My rationale is mainly to do with institutional familiarity - many of the people who would handle Middle English RFVs at RFVN also frequent RFVE. The overlap is not total, of course; editors who don't wish to participate in Middle English RFVs can ignore them, as they presumably do now. And ignorant contributions from uninformed editors can be disregarded, just as they would be today.

As for inconvenience, specifying the language and dealing with LDL vs WDL is already something that other RFV venues have to deal with, and do. These are pragmatic matters, easily handled. This, that and the other (talk) 05:33, 18 January 2024 (UTC)[reply]

It's a side point, but Middle Scots should definitely have its own L2, if I'm honest. Theknightwho (talk) 05:37, 18 January 2024 (UTC)[reply]

Support Andrew Sheedy (talk) 17:53, 20 January 2024 (UTC)[reply]

Notwithstanding the opposition from Ioaxxere, I see a general consensus for the change, so I'm going to go ahead and implement it. This, that and the other (talk) 00:38, 30 January 2024 (UTC)[reply]

Goral

Let me preface with the fact that I know I just split Masurian in an unconventional move, so I don't want to come across that I am pushing this hard, rather I'm trying to resolve a potentially problematic lect.

This is the last so-called "microlanguage" within Poland originating from Old Polish, therefore it's the last ethnolect that could cause any sort of trouble when adding content for it. In all my research, no other dialect stands out against the others as much as Goral, Kashubian (which isn't even from Old Polish), Masurian, and Silesian, and all sources I check say the same thing.^[1] Whenever I look at another dialect, i.e. Northern Borderlands or Kujawy, it's without a shadow of a doubt a dialect of Polish.

The term microlanguage is a slippery term in Slavic linguistics, but in Polish linguistics these four lects are always mentioned together as microlanguages.

Kashubian isn't even from Old Polish and essentially all linguists consider it a language, and some call it a "dialect" only in terms of prestige. Silesian is likely to get official minority language status from the Polish government this year and the only people to still call it a dialect usually belong to an older school. Masurian is highly divergent.

There is one more name floating around, which in Polish is called "Wicki", a literary form from the 80's and 90's, but it never took off, and the Northern Borderlands dialect far prevails.

Goral sits in a sort of in-between zone. Its mutual intelligibility with Polish is higher, closer to the intelligibility between Polish and Silesian, but with a more distinct phonology, as it's not as divergent, notably it has w:Masuration and initial stress, and historically retains reflexes of the pochylone vowels (<á é ó> being /ɒ ɨ̃~ɘ o/) up until the twentieth century, where /ɒ/ merged with /ɔ/ (bringing it closer to Polish). The phoneme /ɨ/ is realized as /i/ and is written <ý> after <rz s z c>, and <rz> is /r̝/ regionally, but made it a sibilant like Polish also regionally. Final -ch has changed to -k. Labialization of initial /ɔ/ is present, which is found in many dialects, but can also be found within the stem, something much rarer for dialects, and pre-iotation and initial aspiration are present in highly lexicalized words.

There is a sense of mixed Polish/Slovak identity, however the fervor in this is perhaps less, especially in comparison to Silesian. That being said, Goral identity is incredibly important to them and could be compared to Irish in Ireland.

Declension is also markedly different from standard Polish, and most editors have expressed that they would not want "dialectal" forms alongside standard forms, meaning that if someone wanted to document them they would have to make a separate table.

Poles report mixed things when talking about intelligibility - it highly depends on the speakers given, ranging from very intelligible to hardly.

omniglot has some good materials, and dialektologia has one of the best summaries.

Artur Czesak (2008) Sytuacja językowo-polityczna etnolektów górnośląskiego i podhalańskiego wśród (obok) słowiańskich mikrojęzyków literackich‎^[9] (in Polish) gives a very comprehensive summary of Silesian and Goral (and mentions the other two ethnolects in passing), explaining how they fit the definition of "microlanguage" perfectly well. Silesian, since the time of writing the article, has developed a much stronger literary standard.

One problem with the loss of pochylone á is that if Goral were to be set as a dialect, alt forms would be messy. We would have something like "obsolete Goral" followed by Goral. An example can be found in klag - it historically had pochylone á. In Middle Polish, we do not reflect this in the orthography, as that is not the standard when talking about Middle Polish. In alternative forms, however, we could give * {{alt|pl|klág||obsolete Goral}}* {{alt|pl|klog||Goral}}. I have not seen any other dialect of Polish do this.

It would be possible to lemmatize to a single writing system. There are many dialects of Goral, and people typically write how they want, but there are literary tendencies.

There is no shortage of resources, including every dictionary here (among others), plenty of literature reaching back to the 19th century, a modern article website, and even a corpus. Vininn126 (talk) 12:30, 15 January 2024 (UTC)[reply]

In short I'm unsure how to handle the situation and I would like input.

I see two main options

Give an L2. Upside - we are not "clogging up" Polish entries with too much dialectal information (Masurian has been an excellent example of how this is beneficial), and other dialects would be nearly as "rich". We would set it as a descendant of Old Polish, perhaps with a code zlw-gor. Downsides include - an overrepresentation of Lechitic languages/the fact Goral is more intelligible with Polish.
Label with a potential etymcode - I see this option as a compromise, therefore resolving the "overrepresentation of Lechitic languages", but perhaps being messy when it comes to labels. As for the etymcode, I'm not sure how useful it would be, but it's proven more useful for Middle Polish.

References

^ Maciej Mętrak, “Unrecognised languages of Poland?”, in ENGHUM summer school poster session‎^[1]

Vininn126 (talk) 16:05, 17 January 2024 (UTC)[reply]

A comment to myself - I'd compare the situation with these lects to the situation between English, Scots, and Yola, for comparison. Recognizable, but definitely on a different scale in terms of difference/identity, etc. Vininn126 (talk) 21:15, 17 January 2024 (UTC)[reply]

@Vininn126 If there is no literary standard, but simply a collection of dialects with some mutual intelligibility with Polish, I would be inclined not to split. Also the sense of separate ethnic identity should have no bearing on whether we do a linguistic split. Benwing2 (talk) 05:57, 18 January 2024 (UTC)[reply]

@Benwing2 Let me clarify on what I meant - Podhalian is the de-facto literary standard for the region. There's also a long literary tradition in the region. Vininn126 (talk) 07:35, 18 January 2024 (UTC)[reply]

@Vininn126 Still, my instinct is you should wait before pushing through another Polish split. Benwing2 (talk) 08:28, 18 January 2024 (UTC)[reply]

@Benwing2 Perhaps. I'm not actually pushing! I'm debating options. The previous statement was a clarification. The reason I'm doing it now is so that people don't start adding content and we change our minds later, having to clean everything up. Vininn126 (talk) 08:32, 18 January 2024 (UTC)[reply]

@Vininn126 Are people actually adding significant amounts of content? If not I think there's no urgency. (The reason I'm saying "pushing" is your presentation doesn't feel completely even-handed; rather you seem to have already decided you want to split, and your writeup reflects this.) In my experience, cleanup that requires merging is often harder than cleanup that requires splitting, and an ill-thought-out split can lead to huge messes (cf. modern Scots, as well as "Norwegian" vs. "Norwegian Bokmål" vs. "Norwegian Nynorsk"; I'm also coming to the tentative conclusion that Old Catalan should be an etym-only variety of Catalan similar to Old Italian and Italian, rather than its own L2). I also think you need to enumerate the downsides of creating a new L2, e.g. duplicated templates that tend to bit-rot as changes are made to the source; duplication of effort in entries that are the same in the two closely related L2's (modulo the dialectal spelling differences), which would go away by having them merged and using soft redirects (e.g. {{alt form of}}) from one to the other; etc. These downsides aren't always obvious at first, but become more apparent in time, so the experience so far with Masurian may not be representative of the longer-term considerations. Benwing2 (talk) 09:02, 18 January 2024 (UTC)[reply]

@Benwing2 These are all fair points. And I don't feel that my write-up reflects this - I tried to present arguments representing both sides. Perhaps you are right and we can start this topic at a later time. Vininn126 (talk) 09:05, 18 January 2024 (UTC)[reply]

Red and Black-Link Disverifications

At present, we don't seem to have a good process for collectively verifying, disverifying or deleting red-linked terms or their black variant in inflection tables. One method would be to create an entry and then subject it to {{rfv}} etc., but this method is frowned upon, if not formally prohibited. Would it be appropriate to allow the creation of entries with {{rfv}} where one editor reasonably believes a term exists and meets CFI but another reasonably believes it doesn't? Creating and deleting for failing RfV would leave a record that verification attempts had failed.

At the moment, there seem to be a number of valid reasons for leaving red-links in entries:

Time. There are more important things to do with the time available for Wiktionary, and Wiktionary may not have the highest claim on one's time.
Availability of evidence. It may not be feasible and affordable to track down evidence for the existence of a word.
Clutter. Adding every variant of every form of a Zulu verb is not helpful; we have a search utility that can find them in the inflection tables. They are also a problem if the generation is wrong - and I suspect several table-generators of over-generating.

I therefore reject the idea of having a time-limit on the existence of red links. --RichardW57m (talk) 13:08, 18 January 2024 (UTC)[reply]

@User:RichardW57 Where have time limits been proposed for red links? DCDuring (talk) 22:01, 18 January 2024 (UTC)[reply]

I think within the past year, in the Grease Pit or Beer Parlour. It wasn't being urged as a formal policy. A similar sentiment was expressed by @This, that and the other on 10 December in Inflections_with_a_red_link_for_singular, where an objection was raised to the notion of permanent red links, though no time limit was suggested. I do work with a related principle, that although indefinite red links can be OK, I don't maintain that orange links can be OK. --RichardW57 (talk) 22:57, 18 January 2024 (UTC)[reply]

My comment at GP was as follows:

I don't think we should have "forever redlinks". A redlink implies a missing entry. If the entry for a term will never be linked, I feel like it should be displayed using the alt parameter.

I stand by this comment, especially as far as it concerns lemmas. Put another way, a redlink is an invitation to create an entry. If we know a would-be lemma is not going to satisfy CFI, it should not be displayed as a redlink.

As for inflection tables, we don't insist that every non-lemma form of every word must be attested to CFI standards. This would be totally impractical and a poor use of our limited volunteer resources. However, what is necessary before creating non-lemma form entries is an understanding that the inflection table in the entry is correct (for example, the word is attributed to the correct noun class, and our template for that noun class is accurate). This is trivial to verify for languages like English, with few non-lemma forms, and Latin, with an extremely well-documented morphology, but much more challenging in other cases.

I suppose our aspirational "end goal" is to ultimately create non-lemma form entries for every form in every inflection table. This is largely the case already for languages like English and Latin. But it is unlikely to happen quickly, or perhaps ever, for other languages. As a compromise, we turn the redlinks in inflection tables black, so they don't act as such prominent "invitations". This, that and the other (talk) 23:59, 18 January 2024 (UTC)[reply]

Part of me wonders if those should be red (or even turn green with the accelerated nonlemma maker). In cases where a lemma fails RFV, maybe have another bot that automatically moves the nonlemma forms? CitationsFreak (talk) 02:27, 19 January 2024 (UTC)[reply]

I definitely have problems with redlinks that would not meet CFI. Many English ones should go under a Collocations header at lemma entries. There are many situations that lead to red or orange links. I don't think they can be lumped together very well. For example, in addition to inflected forms and SoP collocations, more-or-less technical vocabulary can appear in entries for other more-or-less technical entries, eg, names of diseases and anatomical features in taxonomic entries. It may be a long time before someone spends the time to write a definition for an entry like that, especially if the entry is not to be as vacuous as many of our new technical entries often are. The point of the organism name templates is to gather evidence from Wiktionary itself about the 'need' for a particular name. To the extent that the system works, it is because the system is relatively closed, taxonomic names are almost guaranteed to have citations, and the templates link to other projects (articles @ WP, taxonomy @ Species) that have some information. Maybe we could do something similar with chemical names. DCDuring (talk) 03:05, 19 January 2024 (UTC)[reply]

@CitationsFreak: I think that (re)moving them is a human job, not a bot job. The solution may be to correct rather than to simply delete terms that fail RfV. --RichardW57m (talk) 10:59, 19 January 2024 (UTC)[reply]

I meant removing nonlemmas whose lemma form fails RFV. CitationsFreak (talk) 16:22, 19 January 2024 (UTC)[reply]

@CitationsFreak: Suppose Polish lump (“good-for-nothing”, noun) failed RfV, but lump (“thrift shop”) survived. (Neither currently has quotations, but Polish is a 'well-documented language'.) Then courtesy of @Vininn126, we'd have to delete the accusative case from the definition line of lumpa, leaving just the genitive case. That's not easy for a bot. Or rather, we would if his merger of the forms of two different nouns to a single entry were correct. I think it's actually plain wrong, as he made that noun form the accusative of an inanimate masculine, contradicting the declension tables. --RichardW57m (talk) 17:29, 19 January 2024 (UTC)[reply]

I'm sure we could have a bot that knows from the declension table that "lumpa" is the accusative case of "lump" for one sense only. Might be a pain to code, but I could see it working. CitationsFreak (talk) 18:03, 19 January 2024 (UTC)[reply]

@CitationsFreak: Now imagine a bot trying to remove the inflections of Moksha вирь (viŕ)! For that matter, consider trying to remove the inflections of a Latin verb. A real nasty would be to remove the non-lemmas from Sanskrit देव (deva, “god”) - the sandhi form देवो (devo) isn't even listed at the entry! --RichardW57 (talk) 23:39, 19 January 2024 (UTC)[reply]

Sorry for the late reply, but a bot could go through the inflection tables, and see every last inflection of every Latin verb, and every declension of देव there, and see that a nonlemma is a certain case for a certain meaning of a word. I expect this bot to find these nonlemmas the same way they make nonlemma forms of your entry when you make a new one. CitationsFreak (talk) 06:30, 21 January 2024 (UTC)[reply]

@CitationsFreak: But one doesn't see the sandhi form Sanskrit देवो (devo) under देव (deva)! (In itself, that should be fixable by fetching the list from "What links here".) Remember that the critical problem is disentangling {{form of}} when the lemma corresponds to multiple entries, where a particular combination of tags applies to only some of the multiple lemmas covered by the invocation. We also have the cases where different senses have different inflections, and then it may not be reasonable to have a different form-of invocation for each sense.

As to your last sentence, I take it you don't mean 'unreliably'. Some form-inserting bots balk at modifying an existing language section, with some confusing results for Latin past participles. --RichardW57 (talk) 10:05, 21 January 2024 (UTC)[reply]

@This, that and the other: Even English and Latin have their problems with inflection. The analytic comparatives and superlatives of English present problems. We rashly assert that lumbar is not comparable, but at https://neupsykey.com/anatomy-of-the-spine-an-overview/ we find "Orientation of the facets changes to a more lumbar orientation in the T9 to T12 region." (T9 to T12 are the bottom four thoracic vertebrae of the standard human.) Even the conjugation of Latin amō has cautionary footnotes.

More worryingly, we have no mechanism for challenging inflection table modules and templates, and change is supposed to be by consensus, so we're stuck with Pali o-stem dative singulars in -atthaṃ because they're in the Thai wikibook on Pali (and also the Thai Wikipedia), but don't appear in English language grammars. There's also misgeneration of syncopated forms in i-stems - grammars cover these latter forms sketchily. I'd like to create entries for dodgy looking syncopated forms and have them fail RfV. I also have my doubts about masculine dative singulars in -āya, though I know buddhāya is attested - at least, in the phrase "namo buddhāya". --RichardW57m (talk) 13:09, 19 January 2024 (UTC)[reply]

Are the things mentioned common in Pali? If so, then the inflection tables are working. If not, make a Beer Parlour thread about it. CitationsFreak (talk) 18:09, 19 January 2024 (UTC)[reply]

@CitationsFreak: There are quite a few mostly little things wrong like that. The agreement between modern grammar books is poor, and I suspect some of the ancient grammars are simply wrong. An additional issue is that Pali was seriously described, when someone was asking about a spell-checker, as a language of exceptions. I therefore tweaked the declension system to easily accept additional forms, though there is no mechanism to, for example, exclude a specific ablative singular; one would have to override the set to exclude a single value. One problem is that it was set up to accept some probably irregular forms as regular, such as feminine -ī stem accusatives in -iyaṃ, and vocative plural in -ave for -u stems because of the frequent Magadhism bhikkhave. The issues I complained of are in the data tables, though fixing the dodgy syncopation requires changes to the logic gluing endings onto stems, and does need a lot of data to be collected. It may be that the assimilations resulting from syncopation may always need to be treated as exceptions.

Rejecting forms by RfV would give me cover for fixing some of the problems - but at present it seems that I mustn't create entries simply to get them rejected by RfV. Hence this Beer Parlour thread. --RichardW57 (talk) 22:12, 19 January 2024 (UTC)[reply]

You could just make a thread saying "I notice that we don't reflect how Pali speakers typically inflect words in these ways." I'm sure that it would go over much better, and save time as well, since you only need to change one thing. CitationsFreak (talk) 23:26, 19 January 2024 (UTC)[reply]

@RichardW57m your statement that "Rejecting forms by RfV would give me cover for fixing some of the problems" confuses me a little. Who or what are you seeking "cover" from? Can't you just go ahead and dig in yourself? Are there others who you anticipate objections from? If not, I seriously doubt anyone would challenge your Pali expertise. This, that and the other (talk) 23:57, 19 January 2024 (UTC)[reply]

@This, that and the other: From @Octahedron80. He seems to want to support what Geiger calls the 'artificial poetry'. Late Pali is also poorly supported - most of the easily accessible printed material is in 'canonical Pali', so for example I haven't been able to lay my hands on quotations for the late consonants 'ś' and 'ṣ' imported from Sanskrit. Unfortunately, we don't have much of an active community nowadays, so there's not much checking of my work. --RichardW57 (talk) 10:32, 21 January 2024 (UTC)[reply]

Informing you about the Mental Health Resource Center and inviting any comments you may have

Hello all! I work in the Community Resilience and Sustainability team of the Wikimedia Foundation. The Mental Health Resource Center is a group of pages on Meta-wiki aimed at supporting the mental wellbeing of users in our community.

The Mental Health Resource Center launched in August 2023. The goal is to review the comments and suggestions to improve the Mental Health Resource Center each quarter. As there have not been many comments yet, I’d like to invite you to provide comments and resource suggestions as you are able to do so on the Mental Health Resource Center talk page. The hope is this resource expands over time to cover more languages and cultures. Thank you! Best, JKoerner (WMF) (talk) 21:35, 18 January 2024 (UTC)[reply]

Vote on the Charter for the Universal Code of Conduct Coordinating Committee

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello all,

I am reaching out to you today to announce that the voting period for the Universal Code of Conduct Coordinating Committee (U4C) Charter is now open. Community members may cast their vote and provide comments about the charter via SecurePoll now through 2 February 2024. Those of you who voiced your opinions during the development of the UCoC Enforcement Guidelines will find this process familiar.

The current version of the U4C Charter is on Meta-wiki with translations available.

Read the charter, go vote and share this note with others in your community. I can confidently say the U4C Building Committee looks forward to your participation.

On behalf of the UCoC Project team,

RamzyM (WMF) 18:09, 19 January 2024 (UTC)[reply]

phonemic /ç/, /œ/ etc in English

A user added /ç/, /œ/~/ø/ and /y/ as phonemes to Appendix:English pronunciation and as pan-dialectal pronunciations to entries. How do we feel about this? I know speakers (not just of English!) who know the original-language pronunciation of a word may use that foreign-language pronunciation, and I'm sympathetic to this(!), e.g. I find it jarring when English speakers pronounce Königsberg /keɪn-/ and I would like to list a less-jarring pronunciation there. OTOH, it seems like we've tended to list (e.g.) German pronunciations in ==German== sections, not ==English==, similar to how we handle code-switching in writing (we don't have ἄρχων#English just because people who know it use it). And if we want Appendix:English pronunciation to cover foreign sounds, we need more: as much as people code-switch to use German vowels, people code-switch to use e.g. Arabic sounds like /ɣ/ (look at Youglish examples of ghusl).
I also question who really uses e.g. /ˈvyɹ.təmˌbɛɹk/: it's presented as the pan-dialectal pronunciation as if even non-rhotic dialects use it, but who decides to use non-English phonemes but not the native German phonemes, /ˈvʏʁtəmbɛʁk/ with /ʏ/? IMO we need to label such things if we keep them: presenting that or e.g. /œ-/ as the primary pan-dialectal phonemic pronunciation with no label is surely not right... (@Inqvisitor.) - -sche (discuss) 00:58, 20 January 2024 (UTC)[reply]

Any borrowing can potentially be pronounced in a variety of code-switching manners; the rabbit-hole here is as endless as it is pointless. Delete it all. Nicodene (talk) 01:12, 20 January 2024 (UTC)[reply]

Just for the record, I did not add any phonemes. That is total misrepresentation. The online table itself is the AHD Pronunciation Key; it was just missing a couple of sound/symbol correspondences from the AHD Pronunciation Key itself. Both the symbols and the pronunciations in question are simply representing what is published in the American Heritage Dictionary of the English Language, 5th Edition, which is accessible online, both the individual entries (with pronunciations) and the key (p. xxviii in the print version): https://www.ahdictionary.com/application/resources/misc/pronkey.pdf

These are not purported to be English phonemes but are in fact explicitly identified as foreign sounds occasionally found in English words loaned from foreign sources (most often French words, sometimes German). The absence of these foreign symbols was not even noticed until recently, since the number of relevant words that contain these sounds is minuscule (making this argument all the more ridiculous). OP is reading far too much into the fact that these symbols are included in the AHD Pronunciation Key of AHD symbols (with their corresponding IPA symbols) as explicitly foreign sounds (which are identified in the pronunciations provided by every individual word entry in AHD). They are not English phonemes; they are there for help pronouncing a literal handful of foreign words.

Inqvisitor (talk) 01:17, 20 January 2024 (UTC)[reply]

Ah! My apologies, I didn't realize you merely misunderstood what Appendix:English pronunciation was for and what /phonemic/ notation was; that's my bad—and it's not uncommon that people don't grok that, so maybe we should explain /phonemic/ (and [phonetic]) notation somewhere more prominent, and state the scope of the appendix more explicitly than just "for English dialects […] This vowel table below lists the standard phonemic vowel notations". If you agree these aren't English phonemes, we can just remove them from English pronunciation sections and from that appendix—since it is, again, for the notation we use for English phonemes in our own /phonemic, slash-enclosed/ IPA and our own en-PR, and is not intended to be a copy of any other list, neither a list of "every IPA symbol" nor of "every AHD symbol" (people can find those lists elsewhere if they want them). - -sche (discuss) 02:13, 20 January 2024 (UTC)[reply]

Yea, appears this is perhaps rooted in misunderstanding. AHD/enPR symbols constitute a simplified phonetic transcription system: easier to use for Anglophones who don't understand /IPA/ (enPR = English PRonunciation). Each English Wikt entry's Pronunciation section links to Appendix:English pronunciation as practical user-friendly "key" to deciphering AHD/IPA symbols with examples as a guide. AHD transcription is meant to help the average person look up how to pronounce English words as quickly and easily as possible; AHD is by no means intended to be any sort of scholarly inventory of English phonemes (probably better fit for Wikipedia articles on English phonology). Unlike IPA, all AHD symbols are intended for English usage, befitting The American Heritage Dictionary of the English Language. So, naturally, all AHD symbols fit in the Appendix which is itself an online reproduction of the AHD Pronunciation Key.

AHD/enPR is a sort of phonetic re-spelling, using symbols naturally intuitive to readers of English, e.g. AHD symbol /y/ as in English /yes/ = IPA /j/, whereas English /j/ is often IPA /d͡ʒ/–hence AHD /yĕs/ = IPA /jɛs/. Likewise, young English students are taught a 'long e' matches the sound of AHD /ē/ (as in Eng. be)–which is IPA /i/.

AHD symbols were not designed to represent English phonemes, nor are they fit for such a task. They were meant to provide fast easy phonetic transcription of English words. AHD symbols can be listed in a table alongside IPA symbols as realizations of phonemes, but they are not themselves phonemes. Whoever misinterpreted AHD symbols as representing English phonemes should re-read the AHD Pronunciation Key and Guide to the Dictionary. AHD = simple phonetic symbols, not unlike those employed by other English dictionaries that seek to make themselves more accessible to those who don't understand IPA.

And so, lastly, there exist a handful of symbols for words with foreign sounds. Due to history, English is especially littered with French words/sayings meant to be pronounced as close to the original French as possible, e.g. pronouncing the last syllable of French disant with silent /t/ and nasalization of preceding vowel as [zɑ̃] is not English (contrast final syllable in English dissent [sɛnt]). But English borrowed French adjective soi-disant, given in AHD as /swä′dē-zäɴ′/ — /ɴ/ listed in Foreign section of AHD Pronunciation Key indicating French-style nasalization of preceding vowel.

Theoretically, as with oeuvre, an Anglophone can try to force all native English phonemes onto soi-disant: pronounce disant like English dissent, render soi like English soy (AHD /soi/, IPA /sɔɪ/), however that will sound incorrect and uneducated at best to English-speakers. In such cases, imitating the unadapted source French phonetic realization is actually the correct English pronunciation, even if that means not sticking to native English phonemes. The foreign section of the AHD key serves that limited role, viz. transcribing those rare words that are correctly pronounced according to their foreign language of origin, rather than conforming to standard English phonemics. Inqvisitor (talk) 05:51, 20 January 2024 (UTC)[reply]

[Citation needed] for the normal pronunciation of any borrowing in English having phonemes alien to English. Nicodene (talk) 16:15, 20 January 2024 (UTC)[reply]

OK, I reduced /ç/ to being a footnote rather than a phoneme in the pronunciation appendix, since even Inqvisitor acknowledges it's not an English phoneme. I don't want to just completely remove things like /ˈkøːnɪçsˌbɛɹk/ if we can agree on some way of labelling them and some standard for which ones to include... we do already have labels such as "emulating French" and "like Spanish", would it work to do something like this for oeuvre, this for ghusl, etc, or do we feel (as Nicodene puts it) the rabbit-hole of code-switching pronunciations is too endless and French (Arabic, etc) pronunciations using non-English sounds are better left to ==French== (==Arabic==, etc) sections? - -sche (discuss) 03:05, 22 January 2024 (UTC)[reply]

Declension tables for different etymologies

Consider the case of three homophonous homographic nouns, each with its own etymology section. If the first and third are declined differently, but the second is declined as one of the others but no explicit indication of the declension is given, should the user be expected to assume that it is declined as one of the other two? If so, should he work out which one it is declined like by comparing their attributes, or should he use some rule such as ascend or descend, and if so, which? I'm asking here because @Vininn126 and I don't agree over what to do for Polish lump. --RichardW57 (talk) 13:34, 20 January 2024 (UTC)[reply]

You should add inflection tables to each etymology separately. Doesn't matter if they are the same or different. Thadh (talk) 15:31, 20 January 2024 (UTC)[reply]

I personally have no dog in this race, I merely was maintaining something that was in place when I started editing. If it's normal to include a second, repeated table, I have no problem doing so. Vininn126 (talk) 15:42, 20 January 2024 (UTC)[reply]

Paleo-Balkan language family code

I'd like to propose ine-blk as a new language family code encompassing the "Albanian language family" sqj, Dacian xdc, Illyrian xil, Messapic cms, Thracian txh, and lastly Paeonian ine-pae purely for geographical reasons. These are all languages we know little of and hence their classification is uncertain; however, the term "Paleo-Balkan" is used in modern Anglophone literature to refer to them as a group, and it would certainly be helpful to have this for etymologies and categorisation. Mysian yms and Phrygian xpg are usually linked to the group but are not in the Balkans, so I'd leave them out for now. I am excluding Liburnian xli as it is likely closer to Venetic. Ancient Greek is without a doubt excluded, despite occasionally falling under this category in some sources. Another option could have been excluding Paeonian and calling the group Thraco-Illyrian; however, that is not my proposal. Catonif (talk) 10:19, 21 January 2024 (UTC)[reply]

Forgot to mention this would lead to the deletion of the code qsb-bal with its unnecessarily long "a pre-Roman substrate of the Balkans" name. Catonif (talk) 10:43, 21 January 2024 (UTC)[reply]

Oppose Far too controversial of a grouping. Why not just create an etymology-only code, or rename qsb-bal? --{{victar|talk}} 20:42, 21 January 2024 (UTC)[reply]

By etymology-only code do you mean without any children? I'd like to make it so that terms derived from e.g. Illyrian are in a subcategory of a greater category with derivations from any unspecified Paleo-Balkan languages, which at the moment would be qsb-bal. With a mere renaming this subcategorisation thing wouldn't occur. Most importantly, qsb-bal is currently just a "substrate", and it isn't specified anywhere that it refers to IE languages. Catonif (talk) 20:55, 21 January 2024 (UTC)[reply]

Yes, that's what I mean. I think it should only be used in etymologies to refer to a sprachbund, not a family grouping. --{{victar|talk}} 21:32, 21 January 2024 (UTC)[reply]

I can see your reasoning and agree we should not try to endorse relatedness where we can't prove it. The reason why I proposed a Paleo-Balkan group rather than a Thraco-Illyrian one is that a Thraco-Illyrian grouping actually assumes these languages are closely related, while Paleo-Balkan by definition refers to languages of unclear classification, making it essentially a geographical grouping, which is very useful in categorisation. I don't like, from a technical point of view, treating Paleo-Balkan as just a "substrate" however, as the term refers specifically to IE languages. Catonif (talk) 22:00, 21 January 2024 (UTC)[reply]

(edit conflict) Technically speaking, an etymology-only code functions as an alias of a part of something so it can be referred to separately in etymologies, even though the term links go to the section for the thing it's an alias of. For instance, we would have an etymology-only code for Early Modern English, which would allow etymological categories for terms derived from Early Modern English, even though the terms themselves would be under English.

As to the issue at hand: my impression is that we're looking at a series of poorly documented languages that are probably all Indo-European, but whose relationship to any other language (including to each other or to any modern language) is impossible to determine with any certainty. Yes, they're geographically located where we might expect an ancestor to Albanian to be, and they were in the same region with each other, so one could guess that they're related to each other and to Albanian, but that would always be a guess.

On top of this, Albania is still recovering in some ways from a totalitarian government that cut it off from the outside world and manipulated things to the point that no one there knew what to believe. The result is people making up all kinds of stuff about Albanian's glorious Illyrian past and Albanians accepting it because they don't have anything better to compare it with. You have no idea how many problem editors we've had coming up with all kinds of imaginative nonsense about Albanian, Illyrian, etc. Off the top of my head, Nemzag and Balltari are two of the worst, with Torvalu4 taking honorable mention as a non-Albanian who played games with Albanian etymologies.

That's just to explain why we have to be very careful with Albanian etymologies, not to say that everything Albanians come up with is wrong. Albanian is unfamiliar to most Indo-Europeanists and quite divergent in many ways, so there's not as much depth to the literature compared to that of the other branches. That just makes it worse.

Perhaps we could come up with an etymology-only code split off from Indo-European, though I'm not sure how useful it would be. I don't have any expertise in the area, so I'll leave that to others. Chuck Entz (talk) 22:37, 21 January 2024 (UTC)[reply]

No. Dacian/Moesian and less so Thracian are also closer to Balto-Slavic than Albanian, at least I observed it from some plant-names, e.g. *divizna. Geographical reasons are poor and here likewise relate to Balto-Slavic. A term referring to them does not mean it is a family, there are also terms for Sprachbünde. Of course Albanians have been really interested to max out the discovered similarities. Wikipedia List of Dacian plant names hides the facts. If we were to edit them they would soon descend like flies upon us to leave no semblance of Albanian not having great ancestors; tedious enough to ward insanity off Wiktionary. I have observed this dictionary enough to know exactly what happens if we have a family for Albanian, some will feel very motivated to make more spurious comparisons, with material hard to control, and add badly translated terms they have never found in use instead of filling the plausible gaps in Albanian coverage. Fay Freak (talk) 22:02, 21 January 2024 (UTC)[reply]

These are fair points, I understand. I'd like to shorten qsb-bal's name then, possibly to "Paleo-Balkan" if that's not an issue. Alternatively to "a Paleo-Balkan language", though having "language" in the language name is a bit weird. Catonif (talk) 16:08, 22 January 2024 (UTC)[reply]

Found this related discussion: Wiktionary:Beer_parlour/2022/March#Categories_for_sprachbünde. @Fytcha, Thadh --{{victar|talk}} 09:09, 23 January 2024 (UTC)[reply]

Went ahead and renamed "a pre-Roman substrate of the Balkans" to "Paleo-Balkan"; that's what d:Q1815070, which it was linking to, is about anyway. Let me know if it's an issue and it can be reverted. (Again, to clarify, the term "Paleo-Balkan" implies no linguistic relationship, but is rather a term to refer generically to pre-Roman IE languages spoken in the Balkans.) Catonif (talk) 21:32, 25 January 2024 (UTC)[reply]

"Paleo-Balkan" refers to this: Paleo-Balkan languages, the theoretical family we're avoiding. The substrate code qsb-bal should be renamed to Balkan sprachbund. --{{victar|talk}} 22:00, 25 January 2024 (UTC)[reply]

"Balkan sprachbund" refers to far more than the pre-Roman Balkan substrate(s), so I'd oppose that. Theknightwho (talk) 22:08, 25 January 2024 (UTC)[reply]

Then have two substrate codes if you need them. The point is "Paleo-Balkan" refers to something else specifically. --{{victar|talk}} 22:59, 25 January 2024 (UTC)[reply]

I'm trying to understand what's being sought here. When you say "I'd like to make it so that terms derived from e.g. Illyrian are in a subcategory of a greater category with derivations from any unspecified Paleo-Balkan languages", I take it you want terms from Illyrian, Dacian, Thracian, etc (and terms from qsb-bal?) to be grouped in a super-category like the former Category:Terms derived from Caucasian languages? That would entail a family code. Iff people don't consider these languages to form a genetic or a geographic family, then IMO backdooring the grouping by making an etymology-only language code for the family seems ill-advised, both on a "is it appropriate to do" level and on a "will it work" level: how would an etymology-only code help with grouping derived terms? Was the idea that we would, as an ongoing maintenance task from here on out, add the "Balkan Family" language as another ancestor to every existing and newly-added etymology section that mentions Dacian, Illyrian, etc so that instead of saying "from Dacian" etc they say "from Dacian, from the Balkan Geographic Family" (whatever name we use for it)? That's a maintenance nightmare. Or was the idea to set a language called "Balkan Family Language" (or Paleo-Balkan or whatever) as the ancestor of Illyrian, Dacian, etc? That wouldn't help with "mak[ing] it so that terms derived from e.g. Illyrian are in a subcategory of a greater category".
If the need is only to have a code to use when it's unclear which Balkan language a term is from, then the "a pre-Roman substrate of the Balkans" code seems fine for that, unless the issue is that "pre-Roman" is too limiting (e.g. if some words are thought to have been derived from a Balkan language in the post-Roman era). (Anyway, I would undo the rename of qsb-bal to "Paleo-Balkan" because Paleo-Balkan is something different from what qsb-bal is; if someone added an incorrect Wikidata link, let's remove that.) - -sche (discuss) 02:27, 26 January 2024 (UTC)[reply]

Yes, the need for a code is exactly to have a code to use when it's unclear which Balkan language a term is from. For example, many Hesychian glosses which may be explained as borrowings from an IE language with *bʰ *dʰ *gʰ → *b *d *g, or occasionally satem outcomes, are often explained as "Illyrian" derivations; however, there is little reason to assume the donor language was indeed Illyrian out of the many ancient Balkan languages which share the shift. For this reason I'd rather say the term is borrowed, more vaguely, from a Paleo-Balkan language. You're right in saying that qsb-bal does the job well for this, as I have been convinced not to have a family code, essentially giving up on the categorisation.

It looks like I somehow did not get the message across, so I will try to clarify: "Paleo-Balkan" refers to ancient IE pre-Roman languages of the Balkans no matter their internal relation; the term hence does not refer to a language family. By looking at the Wikipedia page at a first glance one might get the idea it is a language family, but please do read closer: they are said to be a grouping, not a family, then it is stated that the internal relationships are still debated, and finally it is called Paleo-Balkan linguistic area in the ==Classification== tree. "A pre-Roman substrate language of the Balkans" is Paleo-Balkan by definition: that is exactly what the term is for. The fact that some scholars endorse a close relationship between the Paleo-Balkan languages is another issue: me potentially proposing a relationship between the Caucasian languages doesn't make the term "Caucasian languages" a language family. A "substrate code" handles this kind of situation decently, as was also pointed out.

As a third and final point, why is the Balkan Sprachbund coming up in this? That is an entirely different thing. Catonif (talk) 22:32, 26 January 2024 (UTC)[reply]

Should derived terms include derivable terms?

For instance the English replication, its history aside, is synchronically derivable from replicate + -tion. Should it be indicated as a derived term under replicate?

The question applies to any language with an earlier stage, really. Cf. the Spanish caminar, which is derivable from camino + -ar but corresponds to a 'Vulgar Latin' *camminare, the various descendants of which are also synchronically derivable like the Spanish one. Nicodene (talk) 20:30, 21 January 2024 (UTC)[reply]

@Nicodene In my opinion, it should only be for derived terms, which means we would need to sift through actual etymologies to put the word in either "derived terms" or "related terms". Terms that are not actually derived but "derivable" could be put under "related terms" — my current practice.

This helps readers identify what terms are formed from which, and which ones originated in parallel to the given word. It means we have to put in the work when editing, but that's better than making readers do that work every time they want to know the relationship, in my personal opinion. Kiril kovachev (talk・contribs) 23:30, 21 January 2024 (UTC)[reply]

I agree with Kiril kovachev's reply. Admittedly I have not always tried to implement that approach myself. Wiktionary is not yet close to following that approach consistently, which is why I haven't tried to implement it consistently myself; doing so would require changing many existing entries, in a minor way. Admittedly we do not know, without historical/diachronic research, which words are derived versus related, diachronically speaking. Nonetheless, it is both factually correct and operationally easy to treat them all as related unless specifically known to be derived. If we all were going to be serious about sticking to that approach consistently, then every Etymology section should explicitly mark any surface analysis as being such (for example, with template:surf), and it should also give the diachronic history if known — that is, both synchronic and diachronic should be given, when the latter is known at all, and the former should be explicitly marked as being what it is. It would be great if a consensus could be reached that established this as the standard practice. I'm not enough of a metapedian to try to organize a vote or whatever about it. I don't much understand metapedian rules about votes that aren't actually votes versus votes that are actually votes, votes that are actually votes but don't count, recommendations that aren't policies, policies that aren't mandatory, and so on. Quercus solaris (talk) 00:39, 22 January 2024 (UTC)[reply]

I think in many cases, it isn't actually accurate to state that the term is derived solely from the composed ancestral form. Mentioning both etymologies seems best in cases like "replication" where the synchronic analysis is fully transparent and involves still-productive derivational processes (as in the case for the correspondence in English between verbs in -ate and nouns in -ation). Given that derivation from 'replicate' is one possible etymology of 'replication' (even if not the only one) I do think this particular instance should be included under derived terms. In contrast, it seems reasonable to me to not include "decision" as a term derived from "decide", since the differences in form cannot be easily explained using any highly productive process of modern English. Then again, it seems like overkill to list a word both in "derived terms" and in "related terms", and it's clear that these are related terms, so I guess the current state of replicate#English seems OK.--Urszag (talk) 01:09, 22 January 2024 (UTC)[reply]

Completely agree with the above. Derivable terms are also derived terms in the speakers' minds. Thadh (talk) 19:13, 1 February 2024 (UTC)[reply]

replication is constantly reinvented; it is both derived in English and inherited from Middle English while being derived in another language. Derivable terms quite usually are derived in the strict sense if you just don’t construct the origins of a word as points in time, though points in time are of course information readers ask for, too. It’s living people employing them according to the corpora and grammars available to them, and assuming such a basis the barrier to invent a word, if a speaker intends to use it, may actually be lower than for him to assure himself of its possible attestation chain.

A Turkish abstraction on -luk is formed before checking it in the modern literature or even that of the Ottomans, who had a literacy rate of like 10 %, and now for their literature even lower in spite of higher literacy of the people in the identical language. Words can be borrowed from the same language and the ancestor, inherited by words remembered to have been spoken to the speaker or writer or encountered in media and derived simultaneously, and we squash all into one entry, quite intentional on Atatürk’s part who surely didn’t have the opinion that we should feel the need to investigate the Ottoman situation before explaining Turkish, that was supposed to stand by itself in the Latin alphabet. The truth of the “derived term” section does not depend on the circumstance of us distinguishing language headers, like between Turkish and Ottoman Turkish and not Bulgarian and pre-1945 Bulgarian, it is your construction with premises you barely remember the origin of. Fay Freak (talk) 01:21, 22 January 2024 (UTC)[reply]

Oof, good question. Realistically, I don't know our entries will ever have any consistent approach; few people research whether (e.g.) an English -ify or -ish word is from 1475 or 1515 before adding it, and to some it's intuitive to consider terms Derived unless they're shown to have been precomposed in an earlier stage (or still consider them Derived even if they're from an earlier stage) whereas to others it's intuitive to consider them Related unless they can be shown to not be precomposed. Sometimes it's unclear; the MED has a 1400s cite of coldisch but seems to consider it a separate word from coldish that's "first" attested in 1589. And consider 1, 2, 3 about whether e.g. Marxism counts as derived from Marx + -ism or borrowed precomposed from French, or whether "antisemitism" had "anti-" added in English or German; some would say "Marxism" and "antisemitism" were calqued and use modern English -ism and anti- rather than French -isme and German anti-, but then doesn't e.g. versify use modern English -ify rather than Middle English -ifien?
I tend to agree with Urszag that users of still-productive processes are probably not considering themselves to have formed the verbs of e.g. "Lin-Manuel Miranda versified and songified Hamilton's life into verses and songs for a modern public" by two different processes. But I would not want to list lots of terms as both "derived" and "related". Frankly, enough casual users and readers misunderstand "Related terms" as meaning semantically-related that perhaps we should take this opportunity to think about overhauling our setup more dramatically. One drastic idea would be to merge "Derived terms" and "Related terms" into "Etymologically related terms", but I doubt there's support for that. - -sche (discuss) 19:31, 22 January 2024 (UTC)[reply]

I wouldn't be inclined to that approach, if only because the combined section becomes a dumping grounds with synchronically-linked forms equally intermixed with all sorts of distant relations.

A tidy model would be having derived terms as a mirror image of 'surface analysis'. Then its scope becomes perfectly clear. Nicodene (talk) 03:46, 23 January 2024 (UTC)[reply]

@Nicodene: That's a nice idea, but it doesn't account for the definitive example of pendant and pennant! --RichardW57m (talk) 17:58, 25 January 2024 (UTC)[reply]

@RichardW57m What is it about those words? I don't quite understand. Nicodene (talk) 18:09, 25 January 2024 (UTC)[reply]

@Nicodene: Sorry, my mind jumped to "related terms", whereas you were talking about "derived terms". What you're saying is that "derived terms" should include both historical and synchronous derivation. --RichardW57m (talk) 10:43, 26 January 2024 (UTC)[reply]

I see. Yes, that is what I mean. Nicodene (talk) 18:14, 26 January 2024 (UTC)[reply]

I would be inclined to say yes, for the reasons Vahag laid out. I do think the way Derived/Related terms are handled needs to be made more rigorous, for what it's worth. Vininn126 (talk) 19:42, 22 January 2024 (UTC)[reply]

Where did Vahag give these reasons? RichardW57m (talk) 18:00, 25 January 2024 (UTC)[reply]

Brainfart; Urszag not Vahag (both ending in -ag). Vininn126 (talk) 18:01, 25 January 2024 (UTC)[reply]

I will comment that when I'm editing, I usually do think about this literally--- even if it were "derivable", I actually do consider if a "word is from 1475 or 1515". But it's a cool question no doubt, I think it could go either way. --Geographyinitiative (talk) 19:01, 26 January 2024 (UTC)[reply]

Hey, I just realized "derivable" terms could mean ANYTHING. This discussion is really about whether "Derived terms" means "Component terms". Because look at this: Taipa#Derived_terms. You might imagine (this is likely incorrect) that the two island names Taipa Grande and Taipa Pequena were not derived from Taipa (as an island), but that Taipa (as a name for the combined island) was derived from those two terms. (That's probably not how it happened, but imagine it could happen somewhere.) If that were true, why not put "Taipa" on both the "Taipa Grande" and "Taipa Pequena" pages, and then take "Taipa Grande" and "Taipa Pequena" off the "Taipa" page? If the standard is mere "derivability", then you might not even remove the two islands from the Taipa page while simultaneously keeping Taipa on both of those pages! And that could technically be true under a pure "Derived terms" scheme, if you were looking at how different senses were derived. Anyway. --Geographyinitiative (talk) 20:00, 27 January 2024 (UTC) (Modified)[reply]

If anyone can help add Derived terms, we have a nice manageable list at Wiktionary:Todo/compounds not linked to from components. P. Sovjunk (talk) 20:41, 27 January 2024 (UTC)[reply]

My answer to this question is yes, terms derivable by surface analysis should be permissible under ====Derived terms====, in instances where the surface etymology is based on productive morphology (per Urszag), thereby serving as a mirror-image of {{surf}} in etymology sections (per Nicodene). That has largely been my practice to date. In instances where a word is also provably derivable by inheritance or borrowing from another language, maybe some kind of tooltip or visually unobtrusive qualifier could optionally be placed next to it in the ====Derived terms==== section to indicate that its inclusion under that heading is on surface-analysis grounds only. Voltaigne (talk) 16:05, 1 February 2024 (UTC)[reply]

Inclusion of reduplicants?

It seems we have neither policy nor infrastructure to include reduplicants. Do we just not include them?

What I mean is stuff like the “jim” in slim jim: not a word by itself, but just a modified copy of the word “slim”. This is very common in Vietnamese, and given the difficulty with word-parsing in Vietnamese, people (not only noobs, happens to me as well) may not realise these reduplicants are part of a larger whole and look them up separately. The entry ngợm seems to have been originally created by someone who got this wrong; it is not a word, but a reduplicant (under -ơm reduplication) of người (in người ngợm) or of nghịch (in nghịch ngợm).

Would it make sense to define such reduplicants as {{form of|vi|reduplicant|người}}, {{only used in|vi|người ngợm}} or some such? MuDavid 栘𩿠 (talk) 03:06, 23 January 2024 (UTC)[reply]

We do seem to exclude English reduplications, e.g. we deleted shmork (and work schmerk), but shm- reduplication is considered "obvious"; if it's hard for people to tell that something is a reduplication in Vietnamese, there's more of a case for having a reduplication-of template to point people to the relevant word, like we already have {{only used in}}. (Frankly, I sometimes wonder why we exclude some of the things we exclude, like shmork; sure, the meaning may be "obvious" — "pork" — but English third-person singulars are obvious too, and we include those...) - -sche (discuss) 04:37, 23 January 2024 (UTC)[reply]

BTW, jim seems like a good candidate for an {{only used in}} (or at least ===See also===) link to slim jim. It doesn't seem to be reduplication, either: Etymonline says Slim Jim originally meant a thin person and only came to denote a snack a century later, so I'd guess Jim is just the name, albeit chosen for rhyme, like Debbie Downer. - -sche (discuss) 04:37, 23 January 2024 (UTC)[reply]

Vietnamese has an analogue of schm- reduplication (-iếc reduplication), and we also don’t include those. There are many other reduplication patterns that are far from “obvious” and are generally considered to yield fully different lemmata, though. We’re still working on the details. Our entry on reduplication lists “slim jim” as an example of rhymed reduplication. Is that not correct? MuDavid 栘𩿠 (talk) 02:25, 24 January 2024 (UTC)[reply]

IMO, yes, and I've asked for more input about that. - -sche (discuss) 20:24, 24 January 2024 (UTC)[reply]

(necroposting) What PoS is a reduplicant? MuDavid 栘𩿠 (talk) 07:29, 5 March 2024 (UTC)[reply]

New gadget: Catch My Attention

This simple gadget, Catch My Attention, lets you view {{attention}} templates in read mode as well as edit mode.

For example, {{attention|en|reason goes here}} is normally invisible, but it would appear as ⚠️ {{attention|en}} reason goes here after you turn the gadget on.

If this is something you would find useful, it's available in the "Appearance" section of your gadget preferences.

This, that and the other (talk) 11:53, 24 January 2024 (UTC)[reply]

May lead to more use of {{attention}}, both inserting and tending to them. It may make the topic parameter worth restoring and using. DCDuring (talk) 13:14, 24 January 2024 (UTC)[reply]

Nice addition to the gadgets, thanks! JeffDoozan (talk) 15:00, 24 January 2024 (UTC)[reply]

@This, that and the other: For some reason, this is showing up in CAT:E. Chuck Entz (talk) 16:03, 24 January 2024 (UTC)[reply]

@Chuck Entz fixed, sorry. This, that and the other (talk) 21:53, 24 January 2024 (UTC)[reply]

Neat. I've turned it on. Thanks for creating it. - -sche (discuss) 02:30, 26 January 2024 (UTC)[reply]

Certain characters, such as quotes, do not show up correctly (the HTML entities get shown instead). — SURJECTION / T / C / L / 07:36, 1 February 2024 (UTC)[reply]

@Surjection: That was because we were escaping the text before putting it into the mw.html functions, which do their own escaping. This edit appears to fix it. — Eru·tuon 01:35, 2 February 2024 (UTC)[reply]
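For anyone curious about the failure mode described above, here is a minimal sketch of double escaping. It is written in Python with the standard library's `html.escape` as a stand-in; it is not the gadget's code, and the function names `render_once` and `render_twice` and the sample reason text are made up for illustration.

```python
from html import escape


def render_once(text: str) -> str:
    # Escape exactly once, at the point where the text enters the HTML.
    return f"<span>{escape(text)}</span>"


def render_twice(text: str) -> str:
    # Bug pattern: the caller escapes the text, then the HTML builder
    # escapes it again, so "&quot;" turns into "&amp;quot;".
    already_escaped = escape(text)
    return f"<span>{escape(already_escaped)}</span>"


# Hypothetical attention reason containing quote characters.
reason = 'check the "quoted" sense'
single = render_once(reason)
double = render_twice(reason)

print(single)
print(double)
```

With a single pass the browser decodes `&quot;` back to a quote mark; with two passes it decodes `&amp;quot;` to the literal text `&quot;`, which matches the symptom reported. The usual remedy is to let only one layer do the escaping.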

user Rua is using ə which actually is not a letter of the Slovene alphabet

Hello,

I am requesting help.

This user, who does not have much knowledge of the Slovene language, has reverted my edit twice: https://en.wiktionary.org/w/index.php?title=pehati&diff=77731738&oldid=77730508

Yours sincerely, Wisdood (talk) 12:30, 25 January 2024 (UTC)[reply]

@Wisdood: See the paragraph of WT:ASL starting "In the headword line". @Rua is defending the Wiktionary standard. --RichardW57m (talk) 16:04, 25 January 2024 (UTC)[reply]

User:Rua is defending his own standard, which he created himself in his edit of WT:ASL. For an explanation, see Wiktionary talk:About Slovene. Wisdood (talk) 09:03, 26 January 2024 (UTC)[reply]

@Wisdood: I also found this entry: https://www.termania.net/slovarji/slovensko-nemski-slovar/2494056/phati?query=trpka&sl=61&tl=61 in a dictionary. Anatoli T. (обсудить/вклад) 07:54, 9 February 2024 (UTC)[reply]

Separate question, about WT:ASL: diff by Garygo golob changed it from

"Slovene is a South Slavic language spoken in Slovenia, and also in some parts of Austria, Hungary and Italy."

to

"Slovene is the northernmost South Slavic language spoken in Slovenia, and also in some parts of Austria, Hungary, Italy, and Croatia, however the latter are unfamiliar with Slovene Standard and rather speak Standard Croatian."

but if we treat Croatian as a separate language then there's no reason to have added "people who speak Standard Croatian" to the list of "people who speak Slovene", is there? Or if they do not, in fact, speak "Standard Croatian" but instead Croatian-dialectal Slovene, then change it to say that... - -sche (discuss) 16:28, 25 January 2024 (UTC)[reply]

@-sche: I agree the wording is bad, but I think he means that while their basilect is Slovenian, their acrolect is Croatian. --RichardW57m (talk) 17:49, 25 January 2024 (UTC)[reply]

@Wisdood: You had little exposure to the linguistic literature. The character is used in the literature about the language for tonemic spellings. Additionally in view of your global contributions your Slovene knowledge is also not demonstrated but you might have the peculiar one of an Italian national. Fay Freak (talk) 17:56, 25 January 2024 (UTC)[reply]

Please provide scientific references from the linguistic literature for that. Wisdood (talk) 09:25, 26 January 2024 (UTC)[reply]

Who is an Italian national here ? The discussion is about Slovene. Slovene is little spoken nowadays in Italy, as stated by Wiktionary:About Slovene... Wisdood (talk) 09:28, 26 January 2024 (UTC)[reply]

Many a dictionary the Slovene nation has published on Fran. I even find it for Čakavian in my first paper I just searched for you about Slovene tonemes, so it is a Slavist standard. Your whole argument about “letters of the Slovene alphabet” is garbage from the beginning anyway. Like diacritics, a letter can have variants employed in a linguistic work, like Wiktionary, for disambiguation. You can’t just come here and, introducing inconsistency, do it differently than previously maintained for the collaborative work. Fay Freak (talk) 10:14, 26 January 2024 (UTC)[reply]

@Fay Freak, Wisdood: To be fair, most of our deviations from orthography don't change the base characters. At one level, our Slovenian entries are inconsistent with Wiktionary as a whole. If you're sufficiently word-blind, or an Austronesianist, then schwa does look like an 'e', but to others of us it's quite a different letter. There probably isn't a better alternative to using schwa in our Slovenian entries. --RichardW57m (talk) 11:02, 26 January 2024 (UTC)[reply]

@RichardW57m: This reminds me of the discussion that was had about Somali a few months back at Wiktionary:Beer parlour/2023/November § Somali Orthography (CC:@Thadh). My stance is the same here. @Rua: Is there significant precedent for using a schwa in dictionaries on the headword line? I see Fay Freak's link above, but it doesn't put the schwa in the actual lemma/headword (which it does with the acute accent), but instead as a pronunciation addition, which is a very important difference. If that's the premier monolingual dictionary, then that in itself is pretty evident. If there's evidence against that, I'd recommend adding sources to WT:ASL.

If not, then I'd personally be against adding characters that aren't used. Some orthographies simply are just defective, and we shouldn't change them to match how we want them to look. Such ambiguities should be shown in the pronunciation section if they're not reflected in the orthography. Ex: We don't add accent marks to the headword line for English, change French to match pronunciation spellings, or lemmatize Chinese at pinyin, so we shouldn't do the same for other languages. The headword line should be reserved for the orthography that people actually know about (unless there is no common or standard orthography). AG202 (talk) 14:54, 26 January 2024 (UTC)[reply]

@AG202: The page title is already reserved for the orthography - though the rendering can be badly defective, as for Sinhala. For Latin, we actually see macrons in the headline, and that is usual, at least nowadays. When I was growing up, the headlines in the Concise Oxford English Dictionary were generally marked with a vaguely generic indication of the pronunciation, most typically splitting vowels into long, short and schwa. None of these marks are part of English orthography, so they were easy to ignore if one was just interested in the spelling. Thus I disagree with your blanket objection. --RichardW57m (talk) 15:33, 26 January 2024 (UTC)[reply]

The page title is for the common/everyday way of spelling. The headword lemma line is for any additional disambiguations permitted by the standard orthography and/or used in dictionaries in their lemmas. Ex: Tone marks in Yoruba or additional accent marks in Italian. If the schwa is not a part of the standard orthography used by Slovene, in dictionaries especially, then it should not be in the headword line, similar to how we don't put accent marks or other vowel marks for English. AG202 (talk) 15:41, 26 January 2024 (UTC)[reply]

@AG202: I agree with not adding characters that aren't used, but I'm not familiar with Slovene and whether or not the schwa is used in e.g. children's books and dictionaries. Thadh (talk) 15:42, 26 January 2024 (UTC)[reply]

That standard is not forged by Rua, it is used by Fran, our main source for Slovene: pəháti. Sławobóg (talk) 16:00, 26 January 2024 (UTC)[reply]

It doesn't look to be universally used by Fran's dictionaries as seen with Fay Freak's link above, and perusing through some of the other dictionaries listed there; how common is it actually? AG202 (talk) 16:10, 26 January 2024 (UTC)[reply]

Just not all dictionaries note tones, I guess. It is commonly used by Slavists, e.g. Vasmer (1987), Snoj (2020). I don't really know if there's a newer standard (Snoj's book is pretty new), but the argument "letter is not used" is just wrong. It is also in Slovene_phonology#Tone. Sławobóg (talk) 16:31, 26 January 2024 (UTC)[reply]

Note: It's specifically the schwa that is being discussed, not the tone markings. Also, I'm asking specifically about commonness, not about whether it's used in absolute. As for the ones you've linked, Snoj (2020) doesn't use the schwa in the headword line, but with the pronunciation, which, as I stated, is a different thing (we're not also thinking of putting [u̯] or /ʋ/ in the headword line too are we? in the example of pekel). Vasmer (1987) does use it, but it's also not a dictionary of Slovene itself. AG202 (talk) 16:47, 26 January 2024 (UTC)[reply]

I'm guessing tones are used in children's books and primers, no? If so, is the schwa used in those contexts, too? Because if not, I'm not sure how we can justify using it in headwords, as it is obviously a pronunciation feature, not an orthographical one. Thadh (talk) 17:38, 26 January 2024 (UTC)[reply]

As an example of a primer written in English, Slovene: A Comprehensive Grammar (2016) uses tone-marks but not the schwa. AG202 (talk) 17:58, 26 January 2024 (UTC)[reply]

@Rua: Once again pinging you to give clarification and proper resources on this issue. Otherwise, I'd be inclined to agree with @Wisdood's original comments. AG202 (talk) 18:56, 2 February 2024 (UTC)[reply]

I’m guessing that tones aren’t used in children’s books since they have become extinct in major regiolects, and now it is “debated whether formal Slovene is a tonal language or not”, → Slovene national phonetic transcription: the article makes the interesting claim that “mid central vowel can also be written with ə and when l is pronounced as [u̯], it can be represented with ł, however such representation is mostly reserved for dictionaries and study books meant for non-native speakers.” Foreigners may need to be told what natives aren’t, and we are a bilingual dictionary. Fay Freak (talk) 20:11, 2 February 2024 (UTC)[reply]

@AG202 I disagree with you here and I think you're pushing your opinion to the point where it becomes counterproductive. This is not a case like English or Portuguese where the lemma spelling matches the page title and there is a lot of unexpressed phonological information. Here, the lemma contains quite a bit of phonological info; basically it appears to be a complete representation of the phonology. The pronunciation of l as /ʋ/ *IS* indicated by writing ł. It seems counterproductive to insist that we remove one piece of this while still keeping all the tone, stress and length marks just because "some dictionaries don't use it". How does this help our readers? Benwing2 (talk) 20:58, 2 February 2024 (UTC)[reply]

I am far from the only one with this opinion, and I don't appreciate that characterization at all. I have consistently given space to people to give proper sources for their claims, but in both cases (Somali and here), they haven't given them yet. And even in the case of Somali, the original poster agreed with me in the end. I have a right to respond and discuss like anyone else, and if commenting & asking for sources in a public forum is "pushing my opinion," then everyone else is doing that too. That characterization frankly feels very targeted, combined with your other characterizations of me, just as in Wiktionary:Beer parlour/2023/September § Splitting Quechua, and that discussion and lack of reply/apology to this day have unfortunately left a very sour taste in my mouth. You read way too much into what I say and come at it from an antagonistic perspective, and I genuinely don't know where it comes from as I've tried to be understanding when it comes to working with you.

To respond to what's relevant to this discussion, if we're helping the users, we should be following what the language actually uses and what is a part of the standard orthography, rather than creating our own or using a linguistic one. I have only been asking for sources and commonness on the usage of the schwa, not for any other characters. Though, the other ones such as tone, do look to be used in dictionaries which have been cited. The schwa itself does not look to be common whatsoever, nor is it a part of the standard orthography. I have given time and space for Rua to give sources to show that it is in proper usage. AG202 (talk) 21:44, 2 February 2024 (UTC)[reply]

I'm sorry I haven't been very active in this discussion, as in most discussions on-wiki, as I have been focusing on editing the mainspace. I do agree that we should aim to reduce the number of inconsistencies between the headword and the pagename, same for reducing the number of phonetic elements in transliterations - we have a pronunciation header specifically for the function of adding info on pronunciation, and at some point having a header contain ə and ł will confuse readers into thinking that these are part of the language's orthography, which they aren't.

Many other dictionaries often make choices I wouldn't support, and I'm sure there are plenty of English dictionaries using various diacritics and/or respellings in the headword as well, but just like we wouldn't add those to English, I don't think we should for other languages either.

Accent, length and tone very often do appear as parts of the (optional) orthography: If I were to write a Russian letter consistently adding stress marks to every word, people would look at me funny, but they would accept it as Russian. I'm not so sure about a Slovenian text including schwas. If the same is also true for the accent marks, then I guess we should get rid of those, too. Thadh (talk) 22:48, 2 February 2024 (UTC)[reply]

@AG202 I am sorry if I come across as antagonistic; that is not my intent at all. What I do see, however, is that you have come into an area you don't know much about and are trying to change a long-established status quo using strongly worded opinions that, frankly, do seem a bit pushy to me. IMO if you express a strong opinion, you should not be surprised to get some pushback. As for Quechua, I know there were some strongly worded opinions but I'm not sure why you feel a sour taste in your mouth. Can you clarify what you feel I should have replied to and apologized about? If I recall, I originally proposed a minimalist split (ala Occitan or potentially Chinese) while you proposed a maximalist 44-way split; I backed off that and proposed splits based on mutual intelligibility (and User:-sche also made a similar and more refined proposal), but you kept insisting on the maximalist way. The discussion eventually died out, as often happens. Benwing2 (talk) 23:02, 2 February 2024 (UTC)[reply]

@Benwing2: Thanks for clarifying. In this case, it doesn't really seem like there was any consensus, rather Rua making a standard and enforcing it without discussion, as seen by the comments here.

In the Quechua discussion, I never proposed a 44-way split. From my first comment, I said that we should focus on mutual intelligibility and/or use the split at Wikipedia as a compromise, but I was opposed to doing it like Chinese. What I felt you should've apologized for were the statements: "Yes I figured you would make this argument, but I am somewhat offended you are implying I don't care about minority languages." & "I am trying to think creatively about how to deal with what is a very real issue and your response is to cast aspersions. Yes, you feel strongly about this but please keep the emotions out of the discussion.", when I had given a solution in my initial comment and never accused any one person of not caring about minority languages. It felt like a complete 180 from what I actually said, and I was completely taken aback. AG202 (talk) 23:09, 2 February 2024 (UTC)[reply]

@AG202 Thanks for letting me know the comments you were concerned about. I'm too tired right now to respond in detail but I will post a response tomorrow after getting some sleep. Benwing2 (talk) 08:47, 3 February 2024 (UTC)[reply]

@AG202 Apologies for the delay in responding. I went back and took a look at the exchange in question, and what triggered me was this statement you made in response to my initial suggestion of using an approach somewhat like Chinese: "I know it's far from the intent, but it makes it feel like we care about underrepresented languages less." I took that as an accusation that my suggestion of using a Chinese-like approach meant I didn't care about minority languages, with some hedging that read to me at the time like you were casting an aspersion while trying to make it seem like you weren't. However I seem to have misinterpreted your intent, and for that I do apologize. I think the reason I push back sometimes against you is that your style of writing is strongly worded (or at least it comes across to me that way), which to me sometimes comes across as pushy. I imagine (in fact I'm 99% sure) this is due to the nature of the written medium and would not happen if we were speaking face to face. Benwing2 (talk) 07:06, 9 February 2024 (UTC)[reply]

Also, be aware that Rua is somewhat inactive these days and tends to shy away from controversy, so it's not surprising she hasn't responded here. Benwing2 (talk) 07:09, 9 February 2024 (UTC)[reply]

@Benwing2

I remember a long time ago, @Rua asked publicly about implementing the tonal orthography for Slovene. I took part in the discussion and said yes to the proposal. I assumed it would be similar to our Serbo-Croatian tonal orthography but it went further. I was particularly surprised about the "ə" character being used but I haven't complained about it, since I don't know if there is any standard about this spelling. I would be surprised if there is, although there is some similarity with

Russian use of Cyrillic ё vs е. Use of accents. самолёт vs самолет, не́бо vs небо
Hindi nuqta'ed letters vs nuqtaless letters - e.g. ग़रीब vs गरीब
Arabic hamzated alifs vs hamza-less alif, dotted final yaa' vs undotted final yaa'. Use of diacritics. E.g. إِيطَالِيَا vs إيطاليا vs ايطاليا, عَرَبِيّ vs عربي vs عربى, etc.

However, dictionaries tend to use more precise spellings than general fluent writing. Some go further than others. For Arabic, we provide both the vocalised spellings and romanised transliterations. For Japanese, we give both kana spellings and rōmaji, in some cases both katakana and hiragana, if both are applicable.

I don't have a link to the poll but I'll say Rua did the right thing by asking but we should study the proposal more thoroughly.

As @Sławobóg pointed out - there are dictionaries that use that notation. Anatoli T. (обсудить/вклад) 07:49, 9 February 2024 (UTC)[reply]

And rightly so here. AG202 is visibly, if only by intuitive habit to favour such structures, trying to make a grassroots democracy idpol kind of thing from our portrayal of Slovene. As if we would gatekeep the latter by forcewise using the letter in question, which in view of known use and intelligibility, to what ever extent requiring special familiarization, is no hindrance. It is a hindrance if we are expected to do a market study on community appeal before deciding for an orthography from our outcountry armchair that nobody paid us to make dictionaries in. One makes a standard because people dread ambiguity more than Columbus tapping an egg. Fay Freak (talk) 07:53, 9 February 2024 (UTC)[reply]

I think that all main spellings should be based on how the speakers spell it in writing or IPA, if unwritten. It should honestly already be a part of our policy. CitationsFreak (talk) 23:02, 2 February 2024 (UTC)[reply]

I agree with the first statement but strongly disagree with the second. I think it should be based on how speakers would spell it when a language is "unwritten" (which it almost never is, fully). Thadh (talk) 12:28, 9 February 2024 (UTC)[reply]

Hi, a native speaker here. Sorry I haven't taken part in the discussion before. This notation also surprised me as I was not familiar with it. The main source of grammar information on Wikipedia used to be from this where some words are written like š[ə̀]l (page 65), from where the simplification could have occurred. The same notation (just with old tone diacritics) occurs in Pleteršnik's dictionary, e.g. ábəł but is otherwise absent from other dictionaries, in which the pronunciation of l as [ł] is denoted only separately in the pronunciation section. [ə] appears separately from the word, too if the stress tone system is used. However, if tone diacritics (as in Wiktionary) are used, [ə] is usually written like that, e.g. here, here (page 40: dobər, pəs; bottom of page 47: -ək, -ən). The new Pravopis (in making) uses tonal diacritics only in brackets (see here under tonemsko naglaševanje) while the current pravopis shows təma written as tȅma / tèma (§625, table) with tonal diacritics.

The lemmas in SSKJ (the biggest dictionary) and Pravopis (the standard) use stress accent with no distinction, followed by pronunciation and then the tone in brackets where /ə/ is distinguished, e.g. temà² -è in tèma -e [təma] ž, rod. mn. tèm (ȁ ȅ; ə̀); here I marked the relevant letters in bold.

Personally, I don't mind this system as it is deeply engrained in both Wiktionary and Wikipedia and is backed up by at least one major Slovene dictionary (Pleteršnik dictionary, probably the last major dictionary that used tone diacritics in the headword). The only major problem I see with that, which I have also encountered, is the fact that it is very hard to find the tone for words that are not in SSKJ or Pravopis, like many proper nouns. The inclusion of tone cuts the number of speakers able to determine that in half (not counting L2 speakers, who probably don't learn about it at all), but even that is a stretch as the tone heavily varies from place to place and only "educated tonal speakers from Ljubljana" speak with the right tones as that was used for the basis by Rigler, who introduced the current tonal system. Apart from that, inflection tables in most lemmas (more or less those that were not edited by me and do not use the template {{sl-decl-noun-table3}}) use stress diacritics and some are the same as the tonal diacritics, making the whole system unnecessarily confusing and giving the impression that the tone does not change during the inflection (which it does). Most Slovenes are not familiar with the tonal diacritics, let alone people trying to learn Slovene, and the number of tonal speakers is also declining.

Perhaps the best way would be to limit tonality to the pronunciation section and clarify that it is tonal using the qualifier. There is also a template {{SNPT}} I made for the Slovene transcription and the headword could be mimicked from SSKJ. If anyone is willing to change all this that is… There are also many other problems regarding the headword template, such as not supporting the genitive and plural forms for nouns which are listed in the documentation and the present and past forms for verbs, also listed and would be really beneficial. The pronunciation is severely outdated and imprecise, too. However, that is shared with Wikipedia as well as I couldn't find enough people to give their thoughts so Help:IPA/Slovene on Wikipedia would be revised to match the Slovene phonology.

Hope that helps. Garygo golob (talk) 14:46, 11 February 2024 (UTC)[reply]

By the way, "northernmost South Slavic language" - do we consider Porabian language as a dialect of Slovene or not? Tollef Salemann (talk) 17:06, 9 February 2024 (UTC)[reply]

Talk page - New section / Internal error ?

I get this message when I try to post on the page User talk:Rua:

> Internal error > [421a41d1-baed-475d-bc37-3a4f5db6dbfc] 2024-01-25 09:14:02: Fatal exception of type "TypeError" Wisdood (talk) 12:35, 25 January 2024 (UTC)[reply]

This type of error can't be fixed by Wiktionary admins, and if it's still showing up for you, it can be posted as a bug report at Phabricator. — Eru·tuon 18:31, 25 January 2024 (UTC)[reply]

Alternative forms

Hi, everybody
I was thinking: Classical Latin vōcem was borrowed in Italian as voce; there is also an obsolete term boce (with strengthening of the labiovelar glide), of the same origin.
My question is: considering the sound change in boce, which qualifies it as an inherited term—as opposed to the borrowed one—should it be listed as an alternative form of voce (using the {{altform}} template), or rather—despite sharing the same meaning—as its own separate entry?
Thanks in advance for your time. —— GianWiki (talk) 15:53, 25 January 2024 (UTC)[reply]

@GianWiki I would say, if the meaning is exactly the same, the obscure form should be listed as just an alternative form. People will also know that this form is definitely less typical and be able to refer to the common synonym. Kiril kovachev (talk・contribs) 00:09, 26 January 2024 (UTC)[reply]

As for the issue generally, yes, I believe two alternative forms are to be considered as such based on their usage, they need not have the same exact etymology: if a word is used as a variant of another, then it is an alternative form. (As for the example you provided though, note voce is not a borrowing, but rather an inherited form of the term that just happens to be unaltered.) Catonif (talk) 17:43, 26 January 2024 (UTC)[reply]

It is a mistake to assume that if a Romance word occurs in more than one form then the one(s) more similar to Latin must, due to that fact alone, be borrowed. The outcome /ˈvotʃe/ is regular for Latin vōcem, and a borrowing of it would rather have ended up with the vowel /ˈɔ/.

Here it is instead the form boce that calls for a special explanation. One possibility that comes to mind is that it has to do with syntactic doubling, which in Central Italy at least widely results in /v/ > /bb/. Fittingly enough a bboce occurs in one of the earliest inscriptions in Italian, from Rome. From that it is possible to backform a non-doubled boce. Nicodene (talk) 17:55, 26 January 2024 (UTC)[reply]

Your question is an interesting one, though - there are of course examples of inherited versus borrowed doublets. The general approach that I've used is to consider whether such pairs have any significant semantic difference and whether they are different enough in form to no longer be easily recognised as 'the same word' (e.g. alma :: anima). If the answer to both questions is 'no', then to me it is a question of altforms and not doublets. Nicodene (talk) 18:07, 26 January 2024 (UTC)[reply]

Bulk Deletion of Hieroglyphs

A number of hieroglyphs, such as Egyptian 𓎽 were deleted by @Vorziblix yesterday with log entries starting like this

"(Deleted per RFD, RFDO; do not re-enter: content was: "{{character info|gardiner=W12}} ==Egyptian== ===Glyph origin=== A jar"

What does this mean? Does it mean we're prohibited from creating that entry with fresh content, or just from uncritically restoring the old content? What's it got to do with RFDO? Shouldn't it be RFDN? I did some repair work on the entry in the past week or so, but I can find no evidence of a deletion request in the page itself. --RichardW57m (talk) 17:45, 25 January 2024 (UTC)[reply]

No, it means one is prohibited from making bad entries, including restoring the old content, and cluelessly and hence unreliably making content even if it may or may not be bad, including restoring the old content. Fay Freak (talk) 17:58, 25 January 2024 (UTC)[reply]

I see you subsequently figured this out, but for anyone else reading, this was per Wiktionary:Requests for deletion/Non-English#Every_Egyptian_hieroglyph_entry_by_User:Loukus999.
The "Deleted per" notice is picked from a dropdown list of deletion reasons (MediaWiki:Deletereason-dropdown). If we think it's necessary to expand the "RFV" and "RFD" lines to explicitly name each RFV and RFD subpage (WT:RFVCJK, etc), I or another interface admin can edit it to do so. - -sche (discuss) 20:06, 25 January 2024 (UTC)[reply]

@RichardW57m: Yep, User:-sche has it right. Maybe picking the dropdown rather than writing a custom message wasn’t the ideal choice on my part; it’s just the default message for deletions of any kind resulting from RFDs. I didn’t mean to imply that such entries should never be re-created; my apologies for implying otherwise. Unfortunately tagging literally hundreds of pages properly, while definitely the preferable option, would also be incredibly time-consuming (frankly it would take about as much time as the user who created these pages apparently spent on them). If you want any of these entries re-created, I can restore the lost content; but as noted in the RFD, it really would take as much work to fix these entries as to make new ones from nothing. — Vorziblix (talk · contribs) 20:21, 25 January 2024 (UTC)[reply]

@-sche: Obscure and confusing messages will either confuse or simply get ignored. As a generic message, it would be better without the ', RFDO'. On the other hand, I would suggest expanding 'do not re-enter' to 'do not re-enter as was', though possibly that requires too good a grasp of English by those encountering it. --RichardW57m (talk) 10:38, 26 January 2024 (UTC)[reply]

'do not re-enter as was' would make sense if people generally knew what was entered, which is unprovided. Fay Freak (talk) 10:42, 26 January 2024 (UTC)[reply]

The need for a Notes section in entries

The ongoing Notes section vote is headed for failure, or at best, no consensus. I am partly responsible for that, so I feel a moral obligation to try to find a way forward, particularly with @Vininn126, who was the only one to make arguments in favour of the proposal.

At the vote, Vininn wrote:

I have mentioned, there are some older texts that are not entirely legible and could be read (literally) a few different ways, and adding a note about the form would be useful. It doesn't belong in Etymology, and it doesn't belong in Usage notes, as it's not about usage, and it doesn't belong in Trivia.

It sounds like we need a place to put non-usage-related notes regarding the reading or orthography of the headword itself, particularly in historical LDLs.

We already have a place for all other kinds of notes. For example, notes regarding pronunciation can just be placed as free text under the Pronunciation header, notes regarding inflections go under the relevant inflection header, and so forth. This makes me think we can do better than just adding a generic "Notes" header to EL.

Are there entries that include this type of note already? Where has it been included? If we add another header to EL, can we make it more specific than "Notes" (perhaps restricting it to LDLs)? Are there other types of notes that need to be included in an entry for which our current entry structure offers no good home? This, that and the other (talk) 09:10, 27 January 2024 (UTC)[reply]

For me this is something I deal with in Old Polish. A word being illegible but probable deserves a note, and IMO none of the current sections fit. Vininn126 (talk) 10:16, 27 January 2024 (UTC)[reply]

@Vininn126 could you give some examples of relevant entries? What would you think of a header such as "Reading notes"? This, that and the other (talk) 11:37, 27 January 2024 (UTC)[reply]

It's a fine workaround I suppose; I still think there's a lot of types of information that might be excluded that falls outside of "Trivia", which I think we should make defunct. Currently I think I'm the only one using it simply because "Statistics" is not officially supported by ELE. Vininn126 (talk) 11:14, 29 January 2024 (UTC)[reply]

@This, that and the other: You could write that a term is "only attested in manuscript X and partially illegible; some have suggested that the word should be read as A, B, or C." This would go in the etymology section. Ioaxxere (talk) 16:47, 28 January 2024 (UTC)[reply]

I tend to agree, but Vininn was of the view this wasn't appropriate. This, that and the other (talk) 01:38, 29 January 2024 (UTC)[reply]

@Ioaxxere How is that etymology? It has to do with reading, not etymology. I can't remember what Old Polish term has this at the moment, but it crops up rather frequently. Vininn126 (talk) 11:09, 29 January 2024 (UTC)[reply]

@Vininn126, This, that and the other: In Ancient Egyptian entries I’ve previously put such information under the ‘Alternative forms’ header, listing alternative suggested readings as a kind of alt-form entry. I don’t think this way of doing things is ideal either, however. — Vorziblix (talk · contribs) 19:54, 31 January 2024 (UTC)[reply]

@Vininn126 @Ioaxxere @Vorziblix I would encourage any of you if you would like to draft a vote, but I won't be doing so myself. Over to you. This, that and the other (talk) 04:41, 5 February 2024 (UTC)[reply]

over with in the etymology of over and done with

How can we show that over with is a discontinuous part of the etymology of over and done with? Ideally, when you hover over the words over or with, a hyperlink would lead to the entry of over with. JMGN (talk) 16:42, 28 January 2024 (UTC)[reply]

What I think you are imagining can't be done. I would just write a simple etymology like "augmentation of over with". The etymologists on here might know a better term than "augmentation".

You don't need to mention done in the etymology; it's already linked from the headword line if anyone is dying to view that entry. The key etymological info here is over with. This, that and the other (talk) 09:31, 29 January 2024 (UTC)[reply]

I meant in the headword, in this case

Adjective

over and done with (not comparable) JMGN (talk) 16:04, 29 January 2024 (UTC)[reply]

Unfortunately {{en-adj}}, as complicated as it is, is apparently not flexible enough to do what you want. However, good ol' {{head}} can:

{{head|en|head=over and done with}}

BTW, several dictionaries, including MWOnline, Collins, and two idioms dictionaries, have entries for done with. DCDuring (talk) 18:13, 29 January 2024 (UTC)[reply]

You should be able to use |head= with {{en-adj}} the same way as with {{head}}. Benwing2 (talk) 00:50, 30 January 2024 (UTC)[reply]

Indeed, as the documentation indicates. DCDuring (talk) 15:58, 30 January 2024 (UTC)[reply]

Appendix:Largest cities in China

Hey, I was alerted by @SpAway to the existence of Appendix:Largest cities in China, created in 2015 by @Atitarev. I have never seen anything like this - should this be expanded or removed or left as is or what? I plan to do nothing else with this page unless there is some consensus to expand it. --Geographyinitiative (talk) 00:43, 30 January 2024 (UTC)[reply]

@Geographyinitiative: The page needs a cleanup, e.g. use traditional script and pinyin. I made it in 2015. Not sure if it is of any value. Anatoli T. (обсудить/вклад) 01:55, 30 January 2024 (UTC)[reply]

@Geographyinitiative I see this page as a todo list for creating missing entries. The page has no value for this project beyond that, as Wikipedia will always do a better job with this type of content (w:List of largest cities in China). When all the red links are blue the page can be deleted. This, that and the other (talk) 09:32, 30 January 2024 (UTC)[reply]

Last days to vote on the Charter for the Universal Code of Conduct Coordinating Committee

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language.

Hello all,

I am reaching out to you today to remind you that the voting period for the Universal Code of Conduct Coordinating Committee (U4C) charter will close on 2 February 2024. Community members may cast their vote and provide comments about the charter via SecurePoll. Those of you who voiced your opinions during the development of the UCoC Enforcement Guidelines will find this process familiar.

The current version of the U4C charter is on Meta-wiki with translations available.

Read the charter, go vote and share this note with others in your community. I can confidently say the U4C Building Committee looks forward to your participation.

On behalf of the UCoC Project team,

RamzyM (WMF) 17:01, 31 January 2024 (UTC)[reply]

Hyphen for closing RFD and RFV

Perhaps a smaller issue, but currently in RFD fora we have the text "Adding a comment to the discussion here with either RFD-deleted or RFD-kept, indicating what action was taken." but in RFV fora we have "Adding a comment to the discussion here with either RFV failed or RFV passed (emboldened), indicating what action was taken." Now I'm sure editors do as they please, but does it bother anyone else that one has a hyphen and one does not? I propose we add a hyphen after RFV. Vininn126 (talk) 18:02, 31 January 2024 (UTC)[reply]

From searching the Talk: namespace, the hyphen is used 30% more often than the space. Calling it a "smaller issue" is an understatement... This, that and the other (talk) 22:02, 31 January 2024 (UTC)[reply]

By smaller I mean "not having an ultimately huge impact". Unless it's causing people to overlook potentially closed threads... Vininn126 (talk) 22:03, 31 January 2024 (UTC)[reply]

What difference does the hyphen make, anyway? Isn't it still the same message...? (If we're changing one, I personally think not having a hyphen makes more sense, but that aside) Kiril kovachev (talk・contribs) 23:12, 31 January 2024 (UTC)[reply]

That's why I said it's a small issue, but consistency is nice. Vininn126 (talk) 23:19, 31 January 2024 (UTC)[reply]

Sure, I guess I support, then! Kiril kovachev (talk・contribs) 00:43, 1 February 2024 (UTC)[reply]

We should start a long vote to choose between the hyphen and space. I'm-a-staunch-anti-spaceist-&-so-are-most-of-my-alternate-accounts. Demonicallt (talk) 00:51, 1 February 2024 (UTC)[reply]

I'm vehemently opposed - the difference helps distinguish the two procedures. P U C – 01:10, 1 February 2024 (UTC)[reply]

You're joking, right? (Has anyone ever actually consistently closed RFVs with a space but RFDs with a hyphen? AFAICT people use hyphens for both, or use spaces for both, or vary for both, according to their own habits, so the only thing the inconsistency in the two headers accomplishes is making it look very slightly unprofessional, like each one was copyedited by a different person—which, admittedly, they probably were.) I was going to just copyedit them both into consistency before I saw your comment. - -sche (discuss) 09:02, 1 February 2024 (UTC)[reply]

Yes, I was joking 🙃 P U C – 21:02, 2 February 2024 (UTC)[reply]

While consistency is a poor argument, there is another reason for adding a hyphen. One way of viewing an RfV is as a claim that the term does not meet the CFI. If it cannot be shown that the term meets CFI, then the RfV has been upheld, and we currently record the verdict as 'RFV failed'. What? Now, 'RFV-failed' seems more likely to be read as the alleged term failing against the RfV.

Apparently, if one believes that a term meets CFI, but that the evidence provided is woefully inadequate, we should be using {{rfquote}}, and not {{rfv}}. --RichardW57m (talk) 12:17, 1 February 2024 (UTC)[reply]

Ah yes, consistency is not something we should strive for... Vininn126 (talk) 12:25, 1 February 2024 (UTC)[reply]

If we want consistency, why would both RFV and RFD conclude with the action taken: -kept and -deleted (and -resolved)?

FWIW, I always liked RfV and RfD. DCDuring (talk) 13:20, 1 February 2024 (UTC)[reply]

