1 Introduction

Learning content accomplishes many tasks in modern practice: from simple presentation of material to guided practice with feedback, from formative assessment to high-stakes credentialing, and from ongoing learning for personal interest to ongoing training on the job. Learning content may also be presented in different modes, ranging from text on a page and video to interactive simulations and performance observations situated in a real or realistic context. Interactive learning content such as training simulations can also change internally to give learners different levels of support or to create variable conditions that make performance easier or harder.

Intuitively, each type of learning content is likely to be well suited for different learners at different stages of learning. As a result, expert teachers and trainers can identify what specific learning content will ensure learners meet a required level of readiness, or what content will best help an individual to progress. Offering different learning content on an individual basis makes sense for reasons including personalization, progression, and variety.

First, personalized content can vary the support and challenge available to individual learners so that they get the help they need to succeed on the target content (Wray et al. 2009; Wray and Woods 2013) or confront specific misconceptions (Folsom-Kovarik et al. 2018). Learning content can be presented with accommodations that cater to learning differences. The same content can also be couched in personalized terminology or contexts to align with an individual learner’s past experience. Personalization also motivates comparison across personalized learning content, to help ensure that all learners receive commensurate training or at least some required minimum of training.

Second, variable progression has been demonstrated in settings such as mastery learning (Bloom 1984), where learners master a topic before moving on, even if additional learning content is needed to get there. Controlling how learners progress also reduces the time and resources that would be wasted on presenting content for which the learner is not yet prepared (Dunne et al. 2014). Personalization and variable progression introduce the need for recommendation, matching each learner with the optimal learning content to suit their needs.

Third, variety of presentation can keep learners interested or motivated. During assessment, varying the learning content can help test learners’ ability to generalize and assess near or far transfer. Variety is also crucial for retest validity when assessments will be presented to the same learners repeatedly over time, or in settings where learners may be expected to share information that would bias how others interact with the content. Increasing variety without burdening authors is the goal of content generation, which can automate some authoring tasks and produce many more variants of learning content.

Teachers, trainers, and learners all value having a range of content for reasons of personalization, progression, and variety. However, challenges exist in understanding the specific ways in which various content is similar or different. Measures are needed that look past surface differences, which may not change learning, and instead express how the content can be expected to affect learners and learning. Such measures could help to answer important questions. When two learners received assessments under different conditions, were their outcomes comparable? Which types and amounts of support or challenge does a particular learner need at this time? Given a history of content that a learner has experienced, what is the best next step to progress?

To answer these questions, various measures of complexity have been explored. Describing learning content with various dimensions of complexity suggests a quantifiable approach to expressing what it is that expert teachers and trainers think about whenever they choose to vary learning content. If successful, a measure of complexity would enable computational support for learning such as recommender systems that suggest what content a learner needs, metadata standards that enable reusing and exchanging learning content, and generative programs that create new learning content to fit a given need.

2 Reductionist Definitions of Complexity Offer Quantifiable Descriptions of Learning Content

As a foundation for these new capabilities, a clear definition of complexity is needed, including an explanation of the senses in which complexity can be objective rather than subjective. Several measures from past work offer expressions of complexity. Some examples are considered here with the goal of showing how practitioners such as teachers and trainers could use such a measure to understand and control complexity in learning content without needing theoretical or technical knowledge.

Complexity is a characteristic that can describe any learning content and has been studied in many forms in the past. For example, Piaget (e.g., 1936, 1972) theorized about how children develop from predominantly interacting with just the physical world to being able to understand relatively complex abstract concepts. Bloom et al. (1956) formally ranked learning activities in a taxonomy that could be related to complexity (e.g., inferring that knowledge or recall acts at a lower level of cognitive complexity than application, which is a lower level of complexity than evaluation).

A reductionist perspective on task complexity involves breaking a task down into its component processes and rating the complexity of each process. One first step to creating a definition for complexity is to list the possible dimensions of complexity in a given component process. Wulfeck et al. (2004) built a list of these dimensions based on previous work by Feltovich and colleagues (e.g., Feltovich et al. 2012), and that list is summarized here in Table 1.

Table 1. Dimensions of task complexity (Wulfeck et al. 2004)

Other possible dimensions of task complexity include (Dunn 2014): number of required acts, number of information cues, number of sub-tasks, number of inter-dependent sub-tasks, number of possible task paths, number of criteria to satisfy a task, number of task paths that conflict or are unknown, and level of distraction. These dimensions, along with those outlined in Table 1, can form the basis for evaluating, in a reductionist manner, whether a task is complex. The dimensions can be related additively or through a more sophisticated calculus accounting for mediators and moderators. Reductionist perspectives tend to favor labeling and counting dimensions of complexity. Nuance can be obtained by gauging the degree to which a task fulfills a dimension and by adjusting the importance assigned to a given dimension. An example of a reductionist perspective on measuring training is the U.S. Army’s “Leader’s Guide to Objective Assessment of Training Proficiency”, or “Objective T” (Department of the Army 2017), in which training is not considered complex unless all four operational variables (terrain, time, military threat, and social population) are considered dynamic.
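
As a minimal sketch of how such an additive accounting might be computed, the following Python fragment rates the degree to which a task exhibits each dimension and sums weighted ratings; the dimension names, weights, and ratings are hypothetical placeholders rather than values taken from Wulfeck et al. (2004) or Dunn (2014).

```python
# Illustrative, additive reductionist complexity score. Dimension names,
# weights, and ratings are hypothetical placeholders.

DIMENSION_WEIGHTS = {
    "required_acts": 1.0,
    "information_cues": 1.5,
    "interdependent_subtasks": 2.0,
    "conflicting_task_paths": 2.0,
    "distraction_level": 1.0,
}

def reductionist_complexity(ratings):
    """Sum weighted ratings (each in [0, 1]) over the known dimensions."""
    return sum(DIMENSION_WEIGHTS[dim] * value
               for dim, value in ratings.items()
               if dim in DIMENSION_WEIGHTS)

# Example: a task rated high on information cues, moderate on distraction.
task_ratings = {
    "required_acts": 0.4,
    "information_cues": 0.8,
    "interdependent_subtasks": 0.2,
    "distraction_level": 0.5,
}
print(reductionist_complexity(task_ratings))  # 2.5
```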

As an alternative to task complexity, another accounting measures mental processes that must be recruited to accomplish learning. Mental processes include perception, attention, memory, and executive function. Together these mental processes drive underlying cognitive skills and are evidenced in behaviors. The complexity of content can be inferred based on the behaviors required of operators while engaging with the content. Table 2 gives a sampling of some operator behaviors that are often associated with complexity in the content, environment, or task. The more of these behaviors an operator is required to perform during learning, the more complex the learning content likely is.

Table 2. Common operator behaviors in complex environments

The aforementioned elements of complexity (task complexity, required behaviors of operators) can be used, in a reductionist manner, to determine whether a given piece of content is complex. Many measures of complexity use reductionist perspectives because they appear more objective and are more easily quantifiable. However, reductionist perspectives require a theoretical understanding of cognitive processes, which may or may not be sensitive to differences between competing theories or to the application of theory in a practical setting. The many dimensions and their definitions can feel inaccessible to teachers, trainers, and instructional designers who need to assess and control learning content in order to achieve a target outcome. In addition, the measures might not capture the totality of factors that create complexity in a task or environment.

3 A Combined Measure of Complexity Reflects Expert Practitioner Understanding

When discussing how to evaluate the complexity of learning content, two views of complexity include a reductionist accounting based in mental processes (discussed previously) and a more holistic or macrocognitive accounting of complexity as a feature of sensemaking situated in context and culture. Taken together, these views, although in tension with each other, define a combined measure of complexity that aligns with practitioner understanding of learning content and thus is usable by teachers and trainers.

In contrast to reductionist perspectives, holistic perspectives on complexity emphasize less strictly defined factors that can contribute to the complexity of a task. In this accounting, factors can operate individually but complexity increases when factors collectively produce “emergence” in the task or environment. Emergence describes cumulative effects that vary widely based on small initial differences and are therefore difficult to predict just from knowing about each component process individually (Paries 2006). The relationships between the factors can give rise to unpredictable operator behaviors, which further increase the complexity of a task. The number and structure of relationships can be counted to help measure complexity.

Holistic complexity can also include contexts external to learning content such as students’ learning careers, current expertise, or learning objectives (LOs) because that information will affect how a task should be presented and how content should be described. Any cognitive model related to situational factors (“situated cognition”; Brown et al. 1989) is relevant to a holistic perspective on complexity because it considers knowledge imparted by training as inextricable from the contexts in which it is presented. Holistic complexity also accounts for ways in which learners understand the gist and deeper structural meaning of information as they progress in their mastery of a task (sensemaking; Boulton 2016).

A combined approach to measuring complexity could be informed by these existing holistic measures by considering those parts of external context which instructors already use to describe learning content, such as learning objectives. For example, a single scenario in a virtual environment might be differentially complex based on the learning objective of the user. If the scenario contains numerous, fast-changing air combat events, it might be complex for the purpose of teaching air support but not at all complex for instructing infantry maneuver under contested airspace.

Complete formal approaches to holistic perspectives are rare, but accounting for contextual factors and technical interactions between elements is often at the core of any holistic perspective. In the Army’s “Objective T” guide to training assessment (Department of the Army 2017), one way in which contextual factors are accounted for is in conditions for performance. Nighttime training is rated as more complex than daytime training due to lower visibility.

When some or all component processes of a task interact in a way that is above and beyond what could be predicted from knowing the component processes alone, the added complexity is a part of holistic complexity that might not be accounted for when using a reductionist approach. An example of interacting processes from the sports realm is dribbling up the court and shooting a basketball: the component processes of dribbling and shooting are each complex at a certain level, but the entire task is additionally complex because of the footwork required to transition between dribbling and shooting, beyond the sum of the complexities of dribbling and shooting considered separately.

Although the individual complexities of component processes may be easier to quantify than holistic complexities, both types of complexity are instrumental in determining the true complexity level of learning content. The reductionist-perspective dimensions and behaviors (Sect. 2) provide some relatively objective and concrete criteria for complexity. Holistic complexity characteristics might be difficult for practitioners to precisely measure, but a combined measure of complexity does not necessarily require precise measurement to show improvement over reductionist or holistic perspectives alone.

From the perspective of reproducing and reinforcing expert teaching and training practices, a computer-accessible measure of complexity should strive to account for both types of complexity. Expert teachers and trainers incorporate both perspectives when evaluating learning content. Instructors may present component processes separately at first so that emergent complexities can be taught separately as well, or they may start with an overview and explore separate factors in detail later. Both approaches are chosen with intention and contribute to managing complexity in the training.

One manner of combining reductionist and holistic complexity measures recognizes the key role played by domain knowledge in determining context. The example presented below captures, in the form of learning objectives, domain knowledge about what is important to learning and what is only a surface feature. Some variations in training scenarios could change complexity, but other variations might not produce relevant changes in complexity. The key insight is that the resulting complexity might be influenced by countable factors such as the number of cues or distractors, but only a domain expert can determine what the cues or distractors are. The factors an expert identifies are likely to vary by instructional domain, population, and expected level. Therefore, a combined measure is needed which is able to quantify those complexity factors that a practitioner highlights, differentiate which learning objectives they impact, and express broad strokes or fine-grained detail in a manner that is robust to human imprecision.
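
One way to make this insight concrete, sketched below with invented scenario features and learning objectives, is to let a domain expert declare which observable features count as cues or distractors for each learning objective; countable factors are then tallied per objective rather than globally, so the same scenario can score as complex for one objective and simple for another.

```python
# Hypothetical sketch: an expert maps observable scenario features to the
# learning objectives (LOs) for which they act as cues or distractors.
# Complexity factors are then counted per LO, so one scenario can be
# complex for one LO and simple for another.

EXPERT_CUE_MAP = {
    "fast_air_activity": {
        "cue_for": ["employ_air_support"],
        "distractor_for": ["infantry_maneuver"],
    },
    "tree_cover_over_target": {
        "cue_for": [],
        "distractor_for": ["locate_hostile_force"],
    },
}

def per_lo_factor_counts(scenario_features):
    """Count expert-identified cues and distractors separately for each LO."""
    counts = {}
    for feature in scenario_features:
        mapping = EXPERT_CUE_MAP.get(feature, {})
        for lo in mapping.get("cue_for", []) + mapping.get("distractor_for", []):
            counts[lo] = counts.get(lo, 0) + 1
    return counts

print(per_lo_factor_counts(["fast_air_activity", "tree_cover_over_target"]))
# {'employ_air_support': 1, 'infantry_maneuver': 1, 'locate_hostile_force': 1}
```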

Measuring complexity in a combined fashion ensures that the measure is useful and understandable to practitioners. Furthermore, given that human tutors are considered by many to be a “gold standard” of instructional effectiveness (e.g., Graesser et al. 2001), strategies used by human tutors are worthy of emulation in intelligent training systems. The possibilities for comparing, recommending, and generating content would expand significantly with the ability to perform automated assessment of task complexity in a way that combines reductionist and holistic perspectives. An example is the scenario generation algorithm described in the following section.

4 Measuring Complexity Improves Comparison and Recommendation of Learning Content

Computer support for humans in teaching, training, and designing instruction is common today. Three areas in which a complexity measure can contribute to learning impact include comparison across learning opportunities, recommending how to learn most effectively or efficiently, and generating new variations on learning content.

Modern teaching and training encompass much more than formal or classroom learning. Learners may seek out a how-to video online on a whim of personal interest, or may encounter a new technology to learn on the job even after years of building expertise. Comparing learning across all these formal, nonformal, and informal opportunities is driving a recent renaissance of interest in standards that describe learning.

Representative examples of existing standards that describe learning include the IEEE standard for Learning Object Metadata (LOM; IEEE 2018) and the Shareable Content Object Reference Model (SCORM; ADL Net 2018a), among others. The goal of these standards, at a high level, is to let more than one computer system reason about a unit of learning content. Authors of these and other standards defined fields to describe learning content both qualitatively and quantitatively. For example, existing standards can express language, grade level, media type, and expected duration. These fields help describe what learning content will be like before a learner attempts it, and communicate what the learner did after the content is completed. As a result, it becomes possible to search for relevant content and to piece content together into a sequence or program of instruction.

It should be noted that in the field of learning, an inherently human enterprise, these standards necessarily simplify the full understanding of an expert instructor into categorical descriptions that combine similarities and blur subtle details. Their goal is not to express every possible idea, but to express enough information for shared understanding. In the same way, some values defined in existing standards require subjective determination or allow for disparate definitions. For example, the grade level assigned to content might differ between countries or, in the U.S., between states. Categories that can be described in words but not in math, such as the interactive multimedia instruction (IMI) level, also have fuzzy boundaries but are widely used and useful. All these examples suggest that a definition of complexity can be useful in describing learning content without necessarily being mathematically precise or agreed by all parties, as long as the definition provides enough agreement to add detail to the existing descriptions.

The need for comparing learning content has recently increased beyond what could have been envisioned in the early days of standards development. Future learning ecosystems (Raybourn et al. 2017) are being created that work to unify the learning experience across an entire career or lifetime of learning, deployed across the boundaries of separate computer systems from many different vendors. New standards such as the Experience API (xAPI; ADL Net 2018b) are emerging to share much more detailed information about how people learn as it occurs, from second to second. The Experience API is part of a movement to acknowledge and act on the subtle differences in how individuals interact with learning content. Clearly, content that has the same grade level and IMI level can still vary widely in the experience it presents and the impact it can have on learning. To prepare for this future requirement, most existing standards provide methods for extension that would be compatible with one or more complexity measures. Along with other fine-grained metadata, a complexity measure is likely to express needed information about learning content that will support finding, using, and interpreting the learner experience in a future learning ecosystem.
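
As a hedged illustration of such an extension, the fragment below assembles an xAPI-style statement whose context carries per-objective complexity values under a hypothetical extension IRI; the IRI and the payload structure are assumptions for illustration, not part of any published profile.

```python
import json

# Hypothetical xAPI-style statement whose context carries complexity metadata
# under an extension IRI. The IRI and payload shape are illustrative only and
# do not come from any published xAPI profile.
statement = {
    "actor": {"mbox": "mailto:learner@example.com", "name": "Example Learner"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/attempted",
        "display": {"en-US": "attempted"},
    },
    "object": {
        "id": "https://example.com/scenarios/uas-recon-variant-042",
        "objectType": "Activity",
    },
    "context": {
        "extensions": {
            # Hypothetical extension IRI for a combined complexity measure.
            "https://example.com/xapi/extensions/complexity": {
                "employ_air_support": "high",
                "infantry_maneuver": "low",
            }
        }
    },
}

print(json.dumps(statement, indent=2))
```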

If the capability to objectively measure complexity is developed, it will improve the methods computers have available to recommend, or to support experts in recommending, content based on learner needs. When learning content covers different topics or different levels, progression can already be handled through well-known methods. When the available learning content has substantially similar topic and level, recommender systems can use complexity as an additional characteristic to differentiate the options.

With the addition of information about individual learners, recommenders will be able to use complexity to predict the subjective and apparent difficulty of a piece of content, estimating how likely a learner is to succeed or how much effort will be required. These predictions will be useful in recommending content that is located within a learner’s “zone of tolerable problematicity,” the range of task difficulty that a learner is willing to engage with because the task is neither too complex nor too simple (Elshout 1985; Snow 1989).
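
A minimal sketch of that filtering step, assuming a scalar predicted difficulty per content item and a per-learner tolerable range (both hypothetical), might look like the following.

```python
# Illustrative filter: keep only content whose predicted difficulty for this
# learner falls inside an assumed "zone of tolerable problematicity".

def within_zone(predicted_difficulty, lower, upper):
    return lower <= predicted_difficulty <= upper

# Hypothetical predicted difficulty of each candidate for one learner.
candidates = {"scenario_a": 0.2, "scenario_b": 0.55, "scenario_c": 0.9}

learner_zone = (0.4, 0.7)  # neither too simple nor too complex for this learner
recommended = [name for name, d in candidates.items()
               if within_zone(d, *learner_zone)]
print(recommended)  # ['scenario_b']
```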

Instructional order of content, enabling progression, is one area that would improve substantially. For example, a common instructional strategy is scaffolding and fading, which refer to the gradual withdrawal of learner support over time so that a learner who initially needs the support to perform the task eventually learns how to perform the task without support (Vygotsky 1978). Scaffolded tasks possess support structures that are associated with decreased complexity; a scaffolded task might ask the learner to consider the effects of just one variable (as opposed to multiple), or require the learner to only consider his or her actions (as opposed to coordinating with other actors). Measuring these types of complexity therefore would facilitate the ordering of tasks such that relatively simple scaffolded tasks can gradually give way to relatively complex unscaffolded tasks.
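
As an illustrative sketch with made-up task records, such ordering could simply sort variants so that scaffolded, lower-complexity variants are presented before unscaffolded, higher-complexity ones.

```python
# Hypothetical task variants tagged with a measured complexity and a flag
# indicating whether scaffolding (support) is present.
variants = [
    {"name": "full_mission_no_support", "complexity": 0.9, "scaffolded": False},
    {"name": "single_variable_drill", "complexity": 0.3, "scaffolded": True},
    {"name": "coordinate_with_one_actor", "complexity": 0.6, "scaffolded": True},
]

# Present scaffolded, simpler variants first; fade toward unscaffolded, complex ones.
fading_order = sorted(variants, key=lambda v: (not v["scaffolded"], v["complexity"]))
print([v["name"] for v in fading_order])
# ['single_variable_drill', 'coordinate_with_one_actor', 'full_mission_no_support']
```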

For fine-grained recommendations, tasks could be categorized not just as complex in a general sense, but as complex in particular competencies (and not others). For example, some concepts in physics are complex phenomena to understand, but the underlying mathematics are quite simple (e.g., they do not require more than basic algebra). Such categorization would enable recommending different versions of content that contain support only for the learning objectives that are not familiar to an individual learner.

The inherent dependencies and interactive nature of complex content also inform recommendations about content presentation strategies, such as presenting new information in varied and full contexts (Feltovich et al. 1988). Learners must be provided with many different contexts for information so that they do not overgeneralize from incidental features or happenstance. Relatedly, content should not be oversimplified: not only is overgeneralization a risk with oversimplified content, but learners can also develop a false sense of security about how well they grasp information (Hoffman et al. 2013), leading to overconfidence that can hinder learning in the future. When new information is presented, it will often affect entire mental models that learners have constructed, due to the dependencies and interactive nature of complex content (Klein and Baxter 2006). Therefore, for complex learning content, it is even more important that all foundational information be presented early on (even at the risk of overwhelming the learner initially), and that new information not disrupt that foundation. Measuring the complexity of learning content enables informed recommendations regarding both what learning content to present and how to present it.

5 Operationalizing an Example Complexity Measure to Enable Learning Content Generation

With complexity as a measured characteristic of learning content, automation for generating content will become more viable. It will be possible, for example, to generate several versions of content covering the same concepts with varying levels of complexity.

An example of a combined complexity measure was demonstrated in a system for generating variants on a training scenario. A sophisticated training scenario offers several examples of the combined complexity measure as applied to different aspects of the scenario learning content. As a result of operationalizing the combined complexity measure in this training scenario, much more content can be generated and labeled in order to improve automated recommendation and support instructors who use the training.

For the purposes of this example, training was selected that targets U.S. Army small-unit employment of unmanned air systems (UASs). At the Army Squad or Platoon level, infantry units operate hand-launched UASs with a wingspan of approximately 1–2 m and a flight time of 1–2 h. A small UAS is useful for reconnaissance and surveillance tasks within the immediate area of operations. However, a key need is for small unit leaders to understand proper utilization of their UAS assets. These learners need training to plan, prepare, and execute UAS missions employing proper tactics, following required procedures, and coordinating with other units.

Training was created for small unit leaders consisting of initial and final assessments, introductory and remedial text documents, and two adaptive training scenarios. Out of this content, the scenarios are the focus of the research and development.

The UAS training scenarios use instructional principles that are relevant to many typical automated and instructor-led training settings. Learners’ decisions can trigger immediate feedback, change scenario events, and possibly end a scenario prematurely (followed by remediation and a later attempt). The training can be delivered through the Army’s Generalized Intelligent Framework for Tutoring (GIFT), a computer system that helps adapt training in a way that can be reused in different instructional domains and is not specific to any one training system (Sottilare et al. 2012; Sottilare et al. 2017).

Some barriers to adaptive training exist in the operational setting. First, learners can have a wide range of learning needs at the start of training. Some may be more expert while others are novices. Furthermore, the training contains 48 learning objectives as defined by subject matter experts (SMEs). Based on their past experience, learners sometimes need support only in some learning objectives while having previously mastered other learning objectives. Finally, learners need to progress through training at different rates rather than wasting time on content that is too basic or moving into advanced content when unprepared. To help address these barriers, complexity measures can help describe content in a way that enables recommendation algorithms to predict how scenarios will combine support or challenge for different learning objectives and pick the best match to learner needs.
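
A simplified sketch of that matching step appears below; the scenario descriptions, learning objective names, and scoring rule are all illustrative assumptions rather than the algorithm actually deployed in GIFT.

```python
# Illustrative matching of scenarios to learner needs. Each scenario is
# described per learning objective as "support", "neutral", or "challenge";
# the learner's needs use the same vocabulary. All values are hypothetical.

scenarios = {
    "variant_01": {"plan_route": "support", "coordinate_units": "challenge"},
    "variant_02": {"plan_route": "challenge", "coordinate_units": "challenge"},
    "variant_03": {"plan_route": "support", "coordinate_units": "support"},
}

learner_needs = {"plan_route": "support", "coordinate_units": "challenge"}

def match_score(scenario, needs):
    """Count learning objectives where the scenario delivers what is needed."""
    return sum(1 for lo, need in needs.items() if scenario.get(lo) == need)

best = max(scenarios, key=lambda name: match_score(scenarios[name], learner_needs))
print(best)  # variant_01
```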

Whether delivered through GIFT or under instructor control, adaptive training can only recommend the content that most nearly fits learner needs within the range of learning content that exists. When choices are too few, adaptation may not find a perfect fit or might settle for a suboptimal choice. Furthermore, learners can memorize scenarios that are delivered more than once, or can share information to increase performance while actually avoiding a deep understanding of the target material. To increase the range of scenarios available to choose from, content generation tasks should be automated. The complexity measures enable content generation algorithms that produce varied scenarios because they help the algorithm determine how each variation fits into the library of all existing content, and whether it is similar to other content or offers a new combination of support and challenge.

Recommendation in GIFT can occur between scenarios or, for quick response to learner needs, during a scenario in response to learner performance. Detailed descriptions of the technical basis that lets GIFT deliver adaptive training in any instructional domain are available in Sottilare et al. (2013). A key concept is that GIFT encodes instructional strategies which inform the selection of instructional tactics. Instructional strategies are general and work across instructional domains, such as choosing to support or challenge a particular learning objective. Instructional tactics are specific to an instructional domain, such as presenting a unit in a different location for the UAS training domain.
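
To make the distinction concrete, the following sketch maps a domain-general strategy onto a domain-specific tactic for the UAS domain; the names and the lookup-table structure are hypothetical and do not reflect GIFT's actual interfaces.

```python
# Illustrative separation of a domain-general strategy from domain-specific
# tactics. Names are hypothetical and do not reflect GIFT's interfaces.

# Domain-general strategy: what instructional move to make, stated abstractly.
strategy = {"action": "challenge", "learning_objective": "locate_hostile_force"}

# Domain-specific tactic table for the UAS training domain.
UAS_TACTICS = {
    ("challenge", "locate_hostile_force"): "relocate hostile unit under tree cover",
    ("support", "locate_hostile_force"): "place hostile unit in open terrain",
    ("challenge", "employ_air_support"): "add fast-changing air activity",
}

tactic = UAS_TACTICS[(strategy["action"], strategy["learning_objective"])]
print(tactic)  # relocate hostile unit under tree cover
```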

In parallel, the content generation example aims to give GIFT scenarios that offer every combination of support or challenge across learning objectives. When more combinations are available, GIFT is better able to select and execute its instructional strategy. The manner in which each generated scenario delivers support or challenge is defined in domain-specific rules. These rules capture expert knowledge about what makes content more or less complex. They are structured in the same manner as the domain conditions which help GIFT assess learners and choose between instructional tactics.

Multiple past approaches to authoring content variation exist. Some examples are content templates, learner cognitive models or simulated students, and simple enumeration or random changes of content.

Content templates are useful in, for example, the Cognitive Tutor Authoring Tools (CTAT; Aleven et al. 2006). Templates make it possible for a practitioner to create content with variables whose ranges of possible values are chosen to provide similar complexity. For example, a math tutor can generate an unlimited number of addition problems. If the number of digits being added changes the complexity of the problems, or if the presence of specific addends that require carrying between columns is important, then the templates must expressly contain those limitations. The reasoning behind the range limits is also implicit, not available for computers to reason about when comparing or progressing between content generated from different templates.
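
The sketch below is loosely inspired by the math-tutor example rather than taken from CTAT itself; it generates addition problems from a template while expressly constraining whether carrying is required, since that constraint changes complexity.

```python
import random

def generate_addition_problem(digits, require_carry):
    """Generate an addition problem whose carry constraint is encoded
    explicitly, because carrying changes the problem's complexity."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    while True:
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        # A carry occurs if and only if some column's digit sum is 10 or more.
        has_carry = any((a // 10 ** i % 10) + (b // 10 ** i % 10) >= 10
                        for i in range(digits))
        if has_carry == require_carry:
            return a, b

random.seed(0)
print(generate_addition_problem(digits=2, require_carry=True))
print(generate_addition_problem(digits=2, require_carry=False))
```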

Learner cognitive models use a cognitive theory basis to predict how learners will interact with content. One widely used example is GOMS (John and Kieras 1996); a more recent example is SimStudent (MacLellan et al. 2014), which focuses on modeling learning and novice errors. SimStudent is a principled extension of CTAT and can work with GIFT. An important consideration that applies to the cognitive model approach in general is that models with few factors are limited in predictive power, while sophisticated models can present usability and acceptance challenges to non-technical practitioners. It would seem that a model is needed that captures only the factors practitioners want to reason about, and at a functional level rather than a cognitive-processes level.

Finally, random or enumerative changes do not provide a way to capture what variation is important to teaching or training. It does not help to produce all possible locations of a hostile unit in a training simulation when most locations happen to be the same in terms of challenge to the learner. Instead, a method is needed to generate the unit in those few locations where complexity is affected. For example, if the unit is located in one of a few locations with tree cover, that might increase the scenario’s challenge for one learning objective, locating and surveilling a hostile force.

The example scenario generation method uses domain rules to express as many or few factors as practitioners care to use in describing the complexity of scenarios. Each rule is considered separately and no calculus for adding or multiplying separate factors is needed, although future work may explore such functions. Instead, the combined complexity measure captures emergent behavior by directly describing its observable effects within the scenario. The example of the tree cover illustrates how a small change in location, just out from under a tree, can greatly impact complexity. This method illustrates an example in which experts are empowered to capture exactly those factors or emergent behaviors which they consider important to teaching and training.
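
A hedged sketch of such domain-specific rules appears below; the feature names, scaling factors, and learning objectives are invented for illustration. Each rule inspects one observable scenario property and emits a dimensionless challenge value for a single learning objective, so rules remain independent and no combining calculus is required.

```python
# Illustrative domain-specific rules with invented feature names and scaling.
# Each rule inspects the scenario and outputs a challenge value in [0, 1] for
# exactly one learning objective; rules are evaluated independently.

def rule_tree_cover(scenario):
    """More tree cover over the hostile unit means more challenge for the
    'locate_hostile_force' learning objective."""
    cover = scenario.get("tree_cover_fraction_over_hostiles", 0.0)
    return ("locate_hostile_force", min(1.0, cover * 1.25))

def rule_air_activity(scenario):
    """Faster-changing air activity means more challenge for 'employ_air_support'."""
    events_per_minute = scenario.get("air_events_per_minute", 0.0)
    return ("employ_air_support", min(1.0, events_per_minute / 10.0))

scenario = {"tree_cover_fraction_over_hostiles": 0.6, "air_events_per_minute": 2.0}
measures = dict(rule(scenario) for rule in (rule_tree_cover, rule_air_activity))
print(measures)  # {'locate_hostile_force': 0.75, 'employ_air_support': 0.2}
```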

Once domain-specific rules are in place, the resulting complexity measures give a straightforward and domain-general way to describe and compare different scenarios. The outputs of the domain-specific rules are domain-general values which the GIFT architecture can reason about because they only express support or challenge values as dimensionless quantities. As a result, GIFT is able in real time to select a particular scenario which might support several learning objectives and challenge several others.

The example complexity measures are useful in two contexts. First, they provide a high-dimensional search space in which to apply a novelty search algorithm. Novelty search is used to quickly generate new content based on the criterion of maximizing novelty, or difference from examples that have been seen before (Lehman and Stanley 2011). Crucially, novelty is not defined at any surface level which could be easily enumerated, such as the locations of units. Instead, novelty is defined as providing different combinations of high, medium, and low complexity in each of the many instructor-defined dimensions (Folsom-Kovarik and Brawner 2018). This ensures that the generated scenarios give GIFT a wide selection of choices that are different in instructionally relevant ways.
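
A minimal sketch of novelty scoring over these complexity vectors follows; the nearest-neighbor distance and archive are standard novelty-search ingredients (Lehman and Stanley 2011), while the specific per-objective encoding is an assumption for illustration.

```python
import math

# Novelty of a candidate scenario is the mean distance to its k nearest
# neighbors among previously generated scenarios, where each scenario is
# encoded as a vector of per-objective complexity values (encoding assumed).

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def novelty(candidate, archive, k=2):
    if not archive:
        return float("inf")
    nearest = sorted(distance(candidate, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)

archive = [[0.2, 0.8, 0.1], [0.3, 0.7, 0.2], [0.9, 0.1, 0.5]]
candidate = [0.8, 0.2, 0.9]  # offers a new mix of support and challenge
print(round(novelty(candidate, archive), 3))  # approximately 0.71
```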

A second context in which the example complexity measures play a role is in describing learning content to teachers, trainers, and instructional designers. User interfaces are being designed which summarize the complexity measures and allow drill-down, content previews, and paradata or usage pattern collection. These features are hypothesized to enable practitioners to search and select from thousands of training scenarios based on criteria they find important for the current learning need. Another key feature of practitioner interaction is requesting new content when the available scenarios do not meet the need. The complexity measures give an easy way to specify what gap the new variants should fill.

A final consideration for future work is the most useful and usable manner in which practitioners can change or control the complexity scores assigned to content. It seems likely that letting instructors assign complexity scores directly will introduce challenges with generalizing from individual inputs, and with justifying changed values to other instructors in a shared system. Therefore, a proposed direction for future work is to allow non-technical teachers, trainers, and instructional designers to edit and select the rules that assign complexity scores. Currently, GIFT offers a capability to edit domain-specific rules for learner assessment. One important difference is that assessments have binary outcomes (pass or fail), while the complexity rules are currently continuous-valued functions, which are likely to be less user-friendly. One approach to address this concern would be to present rules in the authoring interface via low, medium, and high thresholds, which might be easier to define correctly.
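
One hedged sketch of that authoring simplification: the continuous output of a complexity rule is bucketed through two author-set cut points, so instructors edit thresholds rather than functions; the threshold values below are placeholders.

```python
# Illustrative bucketing of a continuous complexity value into low/medium/high
# bands using author-defined thresholds (the cut points are placeholders).

def to_band(value, low_cut=0.33, high_cut=0.66):
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

for value in (0.1, 0.5, 0.9):
    print(value, to_band(value))
# 0.1 low / 0.5 medium / 0.9 high
```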

In future work, an upcoming data collection opportunity will help determine which presentations of complexity align best with expert instructors’ needs. The study will explore to what extent, and under what conditions, instructors accept complexity as a tool to help differentiate and search among many training scenarios.

6 Conclusions

Adding complexity as a measure to describe learning content will enable more detailed comparison, recommendation, and generation by making computers better able to communicate and reason about an important kind of difference and differentiator in learning content. Expert teachers, trainers, and instructional designers may be able to use a complexity measure to both express their understanding of learning content in more depth and also remove unimportant surface details from consideration. The complexity measure is being explored in a real-world training domain and has enabled computer generation of many training scenarios for fine-grained recommendation to meet learners’ individual needs.

Immediate next steps will include validating usefulness and usability of the complexity measure in the context of a scenario generation example as presented to expert instructors in the given domain. This study will help to identify effective ways to present instructors with learning content options and to compare the options using their complexity. The research question in focus for this study is to identify the manner and conditions under which teachers and trainers accept and prefer to use complexity in describing or choosing learning content.

Another key research question which must be answered in relation to the combined complexity measure is the degree of agreement between teachers and trainers about complexity. Agreement is likely to be improved by using domain-specific rules such as those described here. One way to help answer the question of practitioner agreement will be to capture statistics such as inter-rater reliability.

In addition, the ability to measure complexity provides opportunities for future work in improving training methods.

Are there differential effects of various complexity dimensions on training effectiveness? For example, are numbers of cues and distractors linearly additive or do they themselves interact in more interesting ways? Are there multiple interaction functions, which might change how measures interact depending on expertise or other context? Measuring combined complexity is an important first step, but knowing each complexity dimension’s effects on training would enable learning content descriptions that are more effective without requiring additional effort from instructors.

When faced with a given contributing factor or dimension of complexity, do learners tend to use certain heuristics or biases, or are there certain types of mistakes that are common? If so, then measuring complexity and particular dimensions of complexity enables recommendations of learning content that can combat the mistakes commonly associated with high measures in a certain dimension of complexity.

Initial learning content generally needs to be presented for novices in a manner that is not overwhelming. However, oversimplification in training can create misconceptions and hinder learning transfer in real-world environments. Can a task be simplified sufficiently for novices while maintaining its essential complexities? A hypothesis from Feltovich et al. (1988) is that novices who are exposed to full complexity at the start might not achieve as much right away and might be less satisfied early on in the learning process, but might also possess greater “horizons of understanding.” This hypothesis would be consistent with the “desirable difficulty” hypothesis, which states that there exists an optimal amount of initial learning difficulty that produces long-term gains (Bjork 2013).

In conclusion, a measure of complexity that reflects how practitioners think about learning content offers a valuable perspective and an opportunity to improve how computers can support teachers, trainers, and instructional designers to maximize learning.