13
$\begingroup$

We have no fewer than 13 tags containing "learning". Some of these are certainly fine, e.g. , , , , , , or . My issues are with the following ones:

  • looks completely vacuous to me. Sometimes another of the *learning tags would be far better, and sometimes it seems to be used as a synonym for . We currently have 63 questions tagged . I'm afraid that if we simply remove the tag, it will pop up again. Should it be burninated?

  • : the wiki excerpt says that "SL refers to the statistical perspective on machine learning." To be honest, I am less than convinced. Shouldn't all questions on CrossValidated that carry the tag have at least some "statistical perspective"? Then again, the posters of 133 questions seem to disagree with me. Thoughts, anyone?

  • (25 questions), (3 questions), (3 questions): all these have no tag wiki. Can someone create one?

$\endgroup$
24
  • 8
    $\begingroup$ See meta.stats.stackexchange.com/questions/4485 about [learning]. It's in the process of being removed. $\endgroup$
    – amoeba
    Commented Jan 4, 2017 at 17:12
  • 6
    $\begingroup$ I agree that [statistical-learning] is a bad tag. Sometimes people use it when mentioning the Hastie et al. book, which has "Statistical learning" in its title. The book is very popular, and this partially explains the tag's popularity. I would either eliminate the tag entirely or make it a synonym of [machine-learning]. $\endgroup$
    – amoeba
    Commented Jan 4, 2017 at 17:15
  • 4
    $\begingroup$ Making [statistical-learning] a synonym of [machine-learning] seems like a good idea. I think in principle SL could be a valid tag, but it's very unlikely to work out--it'll just create more fragmentation instead. $\endgroup$ Commented Jan 5, 2017 at 13:22
  • 1
    $\begingroup$ [q-learning] is a particular type of [reinforcement-learning]. It should probably be made a synonym. $\endgroup$ Commented Jan 5, 2017 at 13:23
  • 1
    $\begingroup$ I wrote an excerpt for [q-learning]. I don't think it should be made a synonym of [reinforcement-learning] (@gung), it is specific enough & important enough to stay separate. But I am not an expert. $\endgroup$
    – amoeba
    Commented Jan 10, 2017 at 9:46
  • 1
    $\begingroup$ I am not sure that we need transfer-learning and representation-learning tags. I might look into that. $\endgroup$
    – amoeba
    Commented Jan 10, 2017 at 9:48
  • $\begingroup$ @amoeba, I notice that there are 174 [RL], & only 24 [QL], of which 22 threads have both. The other 2 could probably be differently tagged. Strictly QL is not a synonym of RL--it is more of a part-whole relation--but I suspect we could do fine w/o it. $\endgroup$ Commented Jan 10, 2017 at 13:22
  • $\begingroup$ @gung Yes, it's a proper subset. We probably could do without, I agree, it's just a question of how few questions per tag we are still tolerating. To me 25 threads sounds fine. It's a shame that we have so few RL questions anyway; these days it's a super hot topic, with DeepMind winning in Go last year etc. (they used Q-learning, by the way). So I would leave this tag if only to advertise that RL is actually on-topic here. $\endgroup$
    – amoeba
    Commented Jan 10, 2017 at 13:29
  • $\begingroup$ @gung I've just added RL tag to the 2 QL questions without it :) $\endgroup$
    – amoeba
    Commented Jan 10, 2017 at 13:31
  • $\begingroup$ @amoeba, per SE policy, 1 is too few. AFA I'm concerned, 2 is OK if there's an excerpt, it's w/i our purview, & they are really appropriate. I've never come up w/ a good way of thinking about how to deal w/ subset relations b/t tags, but if it seems like people always feel the need to add the superset as well, I wonder if the subset tag is superfluous. I'm not 100% here, but it seems to me that we could do w/o it. $\endgroup$ Commented Jan 10, 2017 at 14:02
  • 2
    $\begingroup$ @gung The issue of subset tags is tricky, I agree. Our general approach seems to be that when a smaller tag is popular enough, it is fine. For example, [bonferroni] (150) is a proper subset of [multiple-comparisons] (900), but with 150 threads it seems to me definitely useful. QL and RL are in the same kind of relationship, but with 25/175 threads. The ratios are similar, btw. I guess it is worth discussing what can be our general guideline about it. $\endgroup$
    – amoeba
    Commented Jan 10, 2017 at 14:54
  • 1
    $\begingroup$ I removed [representation-learning] as unclear. I think it's basically supervised-learning/deep-learning/feature-construction/etc. $\endgroup$
    – amoeba
    Commented Jan 10, 2017 at 18:57
  • 2
    $\begingroup$ Hmm, I've been reading my sources and I can't find the distinction between [statistical-learning] and [machine-learning] anymore... if there was one to begin with. Perhaps I'm misremembering and there's no distinction whatsoever. $\endgroup$
    – Firebug
    Commented Jan 12, 2017 at 0:41
  • 2
    $\begingroup$ Eg, if someone wanted to ask about the standard error of an estimated splitting point in a CART model, that is a statistical way of thinking about a ML algorithm. It isn't clear to me that a typical ML researcher would much care about that. (NB, I'm not saying either field is better or worse.) So such a question could be tagged SL. But I doubt the tag would ever be really used that way; I think it is inevitable that it would end up creating more noise than clarity. $\endgroup$ Commented Jan 13, 2017 at 16:27
  • 1
    $\begingroup$ @Firebug, I'm not sure I would say it's undesirable. My perspective is largely pragmatic: I think tags serve to organize the information on the site more effectively. If the tag stands for something that people recognize & it is used to enhance the organization of the site (or it can be made so), then I would say it's good & we should keep it. I think it won't be well used (perhaps the idea is too nuanced, or my understanding of the term is off somehow), & so will harm the site's organization on balance. $\endgroup$ Commented Jan 13, 2017 at 16:39

2 Answers

11
$\begingroup$
  1. The tag is in the process of being removed, see the answer in Understanding the use of the [education] tag.

  2. I wrote wiki excerpts to and . The former is IMHO a valid & useful tag, about the latter I am not sure as it only has 5 threads, but the concept seems clear enough so I'd say let it be.

  3. I removed [representation-learning] from all threads as too vague and unspecific.

  4. should become a synonym of , see also the discussion in the comments above where everybody agrees with that.

    Update: The synonym has been created, thanks to @Scortchi. Case closed.

$\endgroup$
10
  • $\begingroup$ About 4, could we start with a synonym? $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 12:49
  • 1
    $\begingroup$ @Firebug, sure. Merging just means making a synonym and transferring all the existing threads from child tag to the parent tag. One can make a synonym without merging or a synonym with merging. Usually the idea is that all synonyms should be merged, at least once they are "well established" and it's clear that nobody would want to roll it back. Because it's hard to roll back the merging. $\endgroup$
    – amoeba
    Commented Jan 13, 2017 at 12:53
  • $\begingroup$ Alright, I'm not totally clear on the merging, I'm writing an answer about the (possible) distinction. Perhaps a question in the main site is warranted, but I'm under the impression it has been answered already. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 12:55
  • 1
    $\begingroup$ If there is a case against merging, then we should not make a synonym either! Making a synonym without merging does not make sense at all (only as a temporary transition phase). So yes, do bring it up if you have concerns. $\endgroup$
    – amoeba
    Commented Jan 13, 2017 at 12:56
  • $\begingroup$ Oh I'm not sure a merging shouldn't be performed either! That's why I proposed you act on the synonym first, because I know that can be undone. But I'm collecting some definitions, let's see. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 12:59
  • $\begingroup$ I'd say even [machine-learning] itself is a very vague tag and a case could even be made that we don't need it. Is regression machine learning? Is logistic regression machine learning? Is any other binary decoding machine learning? Etc. Borders are not clear. Sometimes people put this tag, sometimes not, it's very inconsistent and the scope is unclear. I guess we can't really remove it, it would be too drastic, but splitting hairs even further between ML and SL just seems too much, even if some authors make some distinction between these two terms. $\endgroup$
    – amoeba
    Commented Jan 13, 2017 at 13:03
  • 1
    $\begingroup$ Alright, added an answer, please give it a read. Basically, I found no distinction yet. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 13:23
  • $\begingroup$ Off-topic ping: amoeba, now when I'm a mod, feel free to ping me on tag issues that need action, I'll be happy to help. $\endgroup$
    – Tim Mod
    Commented Aug 30, 2017 at 8:37
  • $\begingroup$ @Tim Remembering your last comment here: what about synonym merging? Can you help with that? See stats.meta.stackexchange.com/questions/2790. I think gung started doing that after he became a mod but did not finish; I asked him once why he did not continue but he did not respond, so I assume he was busy with other stuff. I don't know what the interface looks like, but I am guessing that each synonym can be merged with a couple of clicks, so it's a job for 10 minutes if at all... $\endgroup$
    – amoeba
    Commented Oct 20, 2017 at 8:40
  • $\begingroup$ @Tim So what do you think? $\endgroup$
    – amoeba
    Commented Oct 25, 2017 at 12:26
2
$\begingroup$

EDIT BELOW (August 25th, 2017)

I concur with what @amoeba proposed, except point 4.

  4. should be merged into , see also the discussion in the comments above where everybody agrees with that.

@amoeba changed point 4, so now we are in agreement

  4. should become a synonym of , see also the discussion in the comments above where everybody agrees with that.

@gung had already said in the comments

Making [statistical-learning] a synonym of [machine-learning] seems like a good idea. I think in principle SL could be a valid tag, but it's very unlikely to work out--it'll just create more fragmentation instead. – gung Jan 5 at 13:22

For now I'm against the merging and in favor of proposing a synonym. But I still can't pinpoint whether SL and ML warrant two tags. As @gung commented on this answer, perhaps this warrants a separate question as well.

Below, I collected some evidence SL might not be simply ML.


Alright, found a somewhat (blurry) contrast between and . While I was searching for it in Elements, it was actually in An Introduction.

Right at Chapter 1 - Introduction (emphasis mine):

Statistical learning refers to a set of tools for modeling and understanding complex datasets. It is a recently developed area in statistics and blends with parallel developments in computer science and, in particular, machine learning. The field encompasses many methods such as the lasso and sparse regression, classification and regression trees, and boosting and support vector machines.

In "A Brief History of Statistical Learning" (sorry, long quote)

By the end of the 1970s, many more techniques for learning from data were available. However, they were almost exclusively linear methods, because fitting non-linear relationships was computationally infeasible at the time. By the 1980s, computing technology had finally improved sufficiently that non-linear methods were no longer computationally prohibitive. In mid 1980s Breiman, Friedman, Olshen and Stone introduced classification and regression trees, and were among the first to demonstrate the power of a detailed practical implementation of a method, including cross-validation for model selection. Hastie and Tibshirani coined the term generalized additive models in 1986 for a class of non-linear extensions to generalized linear models, and also provided a practical software implementation.

Since that time, inspired by the advent of machine learning and other disciplines, statistical learning has emerged as a new subfield in statistics, focused on supervised and unsupervised modeling and prediction. In recent years, progress in statistical learning has been marked by the increasing availability of powerful and relatively user-friendly software, such as the popular and freely available R system. This has the potential to continue the transformation of the field from a set of techniques used and developed by statisticians and computer scientists to an essential toolkit for a much broader community.

In Chapter 2 - Statistical Learning there's also some definition of the term.

Following the next chapters, you also have staples in Machine Learning: Linear Regression, Classification (LR, LDA, QDA, KNN), Resampling, Linear model selection and Regularization (subset selection, shrinkage, PCR, PLS), Non-linear regression (regression splines, GAM), Trees (CART, Random Forest), Support Vector Machines (SVC, SVR), Unsupervised Learning.

Sadly, nowhere are SL and ML directly compared against each other.

I'd like to foment some discussion of the term, because it's not clear how it deviates from machine learning, if at all.

*Now I'm under the impression it's a synonym (i.e. the ML framework under the statistics culture and jargon), but why not use the more vendible term then? Though in the scientific literature SL is a really popular term.

Edit:

Perhaps the difference is simply cultural, as many discussions on the main site have pointed out. Consider Stanford, where two courses are taught: Stats 315a/315b - Statistical Learning and CS 229 - Machine Learning. Apart from being named differently and belonging to different concentration areas, they also attract different students.

Tibshirani even shares his views on his page, comparing both courses and then both terms:

Machine learning research focusses more on low noise situations, eg engineering applications like robotics and physical sciences

Statistical learning focusses more on high noise, observational data like medicine and genomics, and problems where interpretation of the fitted model is important

But more and more overlap in application areas!


EDIT:

I've come to the conclusion that Statistical Learning is the application of Learning algorithms to classical statistical problems (I think the small phrase in the Machine Learning article on Wiki and the ISLR book description corroborate this notion). The distinction from Machine Learning is better shown with examples:

  • Machine Learning is concerned with optimizing generalized predictive power. So the focus is mostly on loss functions. E.g.: studies trying to predict whether a person has Alzheimer's from neuroimaging, thus producing biomarkers of Alzheimer's, but focused not on the biological meaning of the features, just on performance.

  • Statistical Learning, on the other hand, wants to make inferences about this scenario. E.g.: "How do learning algorithms trained to predict biological aging from neuroimages of healthy people perform in the presence of Alzheimer's? Why?"

Another possible scenario for Statistical Learning is predicting states, such as task paradigms, from neuroimaging using linear models with shrinkage, such as SVMs, producing interpretable weight maps. Yet another scenario is in the introduction of a new imaging technique, where Statistical Learning can help the scientific community to uncover if said technique improves the diagnosis of a disorder.

*I'm mostly talking about neuroimaging because that's my area of expertise.

That said, I'm of the opinion the tag would be mostly useless here on CV, and wouldn't be used for its true meaning.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 6). New York: Springer.

$\endgroup$
17
  • 1
    $\begingroup$ This doesn't seem to me to really be an answer to this question. It seems more like this should be a new question. I am not sure if I think this would be more appropriate as a meta.CV Q or a main site Q, but this does seem more like the setup for a different question than an answer to this one. $\endgroup$ Commented Jan 13, 2017 at 16:20
  • $\begingroup$ @gung kinda, but I guess I'm solely disputing here the merging of SL and ML (will edit to keep on-topic). That's why I wanted your input, as you edited the SL tag. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 16:22
  • $\begingroup$ FTR, I don't think this contribution is bad. I just think it is more appropriate elsewhere. $\endgroup$ Commented Jan 13, 2017 at 16:30
  • $\begingroup$ @gung oh I understood. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 16:36
  • 2
    $\begingroup$ To be honest, I don't think these distinctions are actually made in practice. For instance, I have never heard of the distinction Tibshirani makes between SL and ML. I'd assume rather that 98% of the people get "imprinted" on using either "SL" or "ML" and then use that term exclusively, regardless of where their specific circumstances fall in Tibshirani's classification. Reminds me of the discussion about how "forecasting" and "predictions" differ. $\endgroup$ Commented Jan 13, 2017 at 20:00
  • $\begingroup$ @StephanKolassa Agreed, perhaps the difference is simply cultural then. That's why I proposed the synonym instead of the merge. I'm elaborating a question to the main site though. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 20:13
  • $\begingroup$ And we have this question already stats.stackexchange.com/questions/179021/… . Perhaps no question is needed in the main site then, but this one could still branch out into meta. $\endgroup$
    – Firebug
    Commented Jan 13, 2017 at 20:17
  • $\begingroup$ I am confused by your suggestion of making a synonym but not merging. As I said earlier, I think this only makes sense as a temporary measure, in case we decide to roll the synonym back. Otherwise this arrangement does not make sense. So do I understand correctly that you suggest to make a synonym, wait for some time (how long?), and then if nobody complains to merge? Or do you have something else in mind? $\endgroup$
    – amoeba
    Commented Jan 14, 2017 at 0:02
  • $\begingroup$ @amoeba basically that. gung already clarified what I asked earlier. But I find the idea of a tag for the statistical details of machine learning algorithms great (I've been using it these days), as opposed to procedures covered by ML. Without a tag these questions are a bit harder to find (they are already hard to find because people don't use the tag). $\endgroup$
    – Firebug
    Commented Jan 14, 2017 at 2:00
  • $\begingroup$ I am sorry, @Firebug, I am still not really getting your suggestion here. If you "find the idea of a tag to the statistical details of machine learning algorithms great", then why do you advocate creating a synonym? Are you aware how tag synonyms work? After a synonym is created, nobody will be able to post a question tagged with statistical-learning anymore. $\endgroup$
    – amoeba
    Commented Jan 16, 2017 at 18:55
  • $\begingroup$ @amoeba I do understand that. The key point is that I think SL means that. I'm mostly worried we might not be able to undo the change in a merge. That's why I suggest a synonym, so perhaps someone (me?) might collect enough evidence that SL is not simply ML and both tags can coexist, undoing the synonym. As you can see from my answer, I'm nowhere near proving SL conveys what I think it conveys. $\endgroup$
    – Firebug
    Commented Jan 16, 2017 at 19:06
  • $\begingroup$ @amoeba Basically, while I like the utility of a tag like that, I'm not sure SL is that at all. $\endgroup$
    – Firebug
    Commented Jan 16, 2017 at 19:06
  • $\begingroup$ I find it strange to advocate a synonym while hoping that it will eventually be undone, but okay. I edited my answer to focus bullet #4 on creating the synonym (and not on merging). $\endgroup$
    – amoeba
    Commented Jan 16, 2017 at 19:16
  • $\begingroup$ @amoeba it isn't like I hope it'll be undone, just that I'm in doubt it is the same thing. Under that consideration, a synonym seems like the most cautious approach. $\endgroup$
    – Firebug
    Commented Jan 16, 2017 at 19:33
  • $\begingroup$ Update: Just noticed that the synonym has been created. $\endgroup$
    – amoeba
    Commented Jan 18, 2017 at 10:11
