11
$\begingroup$

I've noticed that a specific user has engaged in a campaign to add the [machine-learning] tag to a number of posts that do not mention machine learning, particularly posts about [neural-networks].

I don't think these edits are useful. I also think several overlapping problems are at work.

  1. The tag description for [machine-learning] is

    Methods and principles of building 'computer systems that try to automatically improve with experience.'

    which is not obviously distinct from a neural network. A novice in the field could reasonably think of the two terms as interchangeable, and I think that is exactly what has caused this user to start adding the [machine-learning] tag to [neural-networks] questions.

    In my view, "machine learning" is the popular name for what Hastie et al call "statistical learning" in their landmark book Elements of Statistical Learning. Statistical learning, and its synonym "machine learning," both include neural networks as a special case.

  2. I think the description sounds a lot more like , which strives to create self-teaching algorithms to learn from experience, than it describes statistical learning.

  3. I don't think it makes sense to apply a more general tag when a more specific tag is sufficient. Elements of Statistical Learning discusses cubic splines, and you can use splines in a machine learning project, but I wouldn't add the [machine-learning] tag to a question solely about [splines].

    On the other hand, the [machine-learning] tag is useful if you're asking a soft question, or a question about the general field. For example, the question "Why is cross-validation important in machine learning?" does not need to be concerned with random forests, SVMs, or neural networks in particular. Likewise, "What is overfitting and how do you measure it?" is a question about statistical learning in general and not about any specific method.

  4. The edit queue doesn't seem to be working. In my view, these edits add irrelevant tags, which is a reason to reject them; however, a number of these edits were approved anyway.

  5. The overall usage of the tag needs some attention. The usage varies from to to general statistics questions. I don't know what the best way is to make this tag useful; a large part of the problem is machine learning has become a marketing buzzword that means, essentially, "One of our employees did statistics on a computer." I think it would be technically correct but quixotic to force the usage of "statistical learning" in the place of "machine learning." Moreover, such a move might give people the impression that "machine learning" is not on-topic here, which is something we already have enough trouble with.

My questions for the community are

A. What, if anything, needs to be done to improve the description of the [machine-learning] tag?

B. What should be done to improve the usage of the [machine-learning] tag?

C. How can we best educate reviewers to thoughtfully consider edits which propose adding irrelevant tags?

$\endgroup$
16
  • 5
    $\begingroup$ (A) and (C) are good questions that are worth discussing. But I suspect that the answer to (B) is that nothing can be done: the tag is too big and too generic/vacuous. $\endgroup$
    – amoeba
    Commented Mar 30, 2019 at 22:11
  • 2
    $\begingroup$ @amoeba I fear that you might be right about (B). We don't have a statistics tag because all on-topic questions are, in some way, about statistics (broadly construed). Statistical learning is also on topic here, and machine learning is the popular name and marketing buzzword for statistical learning. $\endgroup$
    – Sycorax Mod
    Commented Mar 30, 2019 at 22:20
  • 2
    $\begingroup$ Here is an earlier related question. Note that statistical-learning is a synonym for machine-learning. To be honest, I am so unhappy with the sheer vacuity of "machine learning" that I am close to suggesting we burninate it. $\endgroup$
    Commented Mar 31, 2019 at 13:49
  • 6
    $\begingroup$ Some users will add “mathematical statistics” to their questions just because they don’t know what it means but it does have the word “statistics” in it. A similar thing happens to a portion of “machine learning” questions. $\endgroup$
    – Sycorax Mod
    Commented Mar 31, 2019 at 14:22
  • $\begingroup$ What's the problem with a question on neural networks also receiving the tag "machine learning"? As you said yourself, neural networks are contained within the larger field of machine learning / statistical learning. If somebody wants to follow the ML tag, they should then expect to see questions on neural networks. $\endgroup$
    – zxmkn
    Commented Apr 3, 2019 at 10:33
  • $\begingroup$ @zxmkn That's addressed in (3): adding the machine learning tag to questions not about machine learning as a field makes the tag pointlessly broad. Nearly all questions would also be eligible for the tag. $\endgroup$
    – Sycorax Mod
    Commented Apr 3, 2019 at 14:29
  • 1
    $\begingroup$ @Sycorax You assume that people following the ML tag are following it in order to see meta questions about the field. Maybe, however, they are following it, since they'd generally like to have posts about any ML topic highlighted for them. They then don't need to follow every ML-related tag individually to get all the ML posts highlighted. Statistical learning is a large field, so that would be a lot of tag collecting. Maybe the solution should be two separate tags: "machine learning field" (for posts of the nature you're referring to) and "machine learning" (as a catch-all). $\endgroup$
    – zxmkn
    Commented Apr 3, 2019 at 14:40
  • 4
    $\begingroup$ @zxmkn The same logic would justify creating a "statistics" tag. The community already made the decision that the "statistics" tag was over-broad and burninated that tag. I think that "machine learning" is in the same boat. The small benefit of making this niche usage more convenient needs to be balanced against the minimal organizational utility of a tag so broad that nearly all the site can reasonably gain the tag. $\endgroup$
    – Sycorax Mod
    Commented Apr 3, 2019 at 14:42
  • 2
    $\begingroup$ @Sycorax If that's your standpoint, then renaming the tag "machine-learning-meta" (or similar) would likely help. In addition, the tag description should clearly identify the tag as marker of posts about the field in a broader sense. $\endgroup$
    – zxmkn
    Commented Apr 3, 2019 at 14:49
  • 3
    $\begingroup$ @zxmkn That sounds like a reasonable Answer to me. $\endgroup$
    – Sycorax Mod
    Commented Apr 3, 2019 at 15:12
  • $\begingroup$ Just noticed, there is a "meta-regression" tag, so a "meta-machine-learning" tag wouldn't be without precedent. $\endgroup$
    – zxmkn
    Commented Apr 3, 2019 at 15:21
  • 3
    $\begingroup$ @zxmkn That's not a good analogy. The tag meta-regression is referring to a specific kind of regression modeling strategy per the tag description $\endgroup$
    – Sycorax Mod
    Commented Apr 3, 2019 at 15:23
  • 1
    $\begingroup$ AFAIK the [statistics] tag was blacklisted from the outset by SE admins because that's the topic of this forum. We, as a community, have burninated and/or blacklisted exactly zero tags since then. I don't see this happening with [machine-learning], so I'd say that, practically speaking, it is not an option. I think we'll just have to live with it, for better or for worse. $\endgroup$
    – amoeba
    Commented Apr 3, 2019 at 15:50
  • $\begingroup$ Hey, so what do you think about the excerpt I suggested? $\endgroup$
    – amoeba
    Commented Apr 8, 2019 at 8:53
  • 1
    $\begingroup$ @amoeba I think the new tag description is a marked improvement over the old one. I was fretting about how to positively describe ML, but I suppose it's not entirely necessary if we're urging users to add a more specific topic to the question. $\endgroup$
    – Sycorax Mod
    Commented Apr 8, 2019 at 14:33

1 Answer

9
$\begingroup$

My opinion is that the [machine-learning] tag is borderline useless because it is too unspecific and too vacuous. It is not the only tag like that. For example, we have ×600 and ×900, which are also vacuous and could well be burninated and blacklisted as far as I am concerned. But with over 12k threads, [machine-learning] is the most prominent example of this.

That said, based on my experience with trying to improve our tag system, there is nothing we can do about it: our community has never asked SE admins to burninate a tag, and manual deletion is not an option. So [machine-learning] is here to stay. Also, given its popularity, I don't think we can really influence how people use it.

A. What, if anything, needs to be done to improve the description of the machine-learning tag?

You are right, the current excerpt does not make sense and we should change it. My suggestion:

Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.

Update: I went ahead and made the edit to the excerpt. What do you think?

B. What should be done to improve the usage of the machine-learning tag?

As I argued above, nothing can be done.

C. How can we best educate reviewers to thoughtfully consider edits which propose adding irrelevant tags?

I've noticed that a specific user has engaged in a campaign to add the tag [machine-learning] to a number of posts that do not mention machine learning, particularly posts about neural-networks. I don't think these edits are useful.

Is this the only edit that they are making? I think that if an edit makes several other improvements and adds [machine-learning] alongside them, then it is fine. If the sole purpose of an edit is to add the [machine-learning] tag to a question on neural networks, then I agree this is not useful and is actually harmful, because it clogs the queue and the front page.

As a first attempt to settle this, could you get in touch with this user and kindly point them to this discussion?

$\endgroup$
0
