Can Multimedia Tools Promote Big Data Learning and Knowledge in a Diverse Undergraduate Student Population?

Background and Purpose: Multimedia tools are an integral part of teaching and learning in today’s technology-driven world. The present study explored the role of a newly developed video introducing the emerging field of big data to a diverse undergraduate student population. In particular, we investigated whether the introduction of a multimedia tool would influence students’ self-perceived knowledge of various big data concepts and their future interest in pursuing the field, and what factors influence these outcomes. Methods: After viewing the video, students (n = 331) completed an online survey consisting of Likert-type and quantitative questions about their learning experience, future interest in big data, and background. The dataset was analyzed via ANOVA and multiple linear regression methods. Results: Gender, major, and intended degree were significantly associated with students’ learning experience and future interest in big data. Moreover, students who had no prior exposure to big data reported a better learning experience, although they also reported being less likely to pursue it in the future. Conclusion: Multimedia tools may serve as an effective means of introducing introductory big data science concepts to a diverse group of students and of creating interest in them. Both similarities and differences in these outcomes were observed among different student sub-groups. © 2018 Californian Journal of Health Promotion. All rights reserved.


Introduction
Big data refers to data sets that are so large and complex that they cannot be processed and analyzed by traditional applications and software (Holmes, 2017). As technology continues to evolve at a rapid pace, driven mainly by the development of smart devices, sensors, and cloud computing, more and more data collected from various sources in industries including healthcare, education, and finance are becoming accessible to researchers in an unprecedented way. The huge social and academic impact of such developments caused a worldwide buzz about "big data" as new technologies able to store, process, and analyze such data slowly started to emerge. The concept of big data, commonly characterized by volume (amount of data), variety (the diverse and complex nature of the variables included in the data, such as text, images, and videos), velocity (the speed or rate at which the data become available), and veracity (how much noise and uncertainty there is in the data), goes far beyond traditional data types and statistical analyses using common descriptive and inferential methods. According to an article in Forbes magazine (Marr, 2015), by the end of 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. This is equivalent to about 11-12 standard digital photo files, which shows the enormity of the data explosion happening in today's society.
Driven by the need to improve healthcare delivery and cost in efforts to improve health, these massive amounts of data hold the promise of supporting a wide range of medical and healthcare needs related to big data science and/or big data biomedicine, such as clinical decision support, disease surveillance, neuroimaging/brain health, patient data in electronic health records (EHRs), insurance claims data, and population health management (Dembosky, 2012; Huang et al., 2015). Given this widespread use and need, it is of utmost importance to train today's students in the fast-evolving area of big data applications in biomedicine. Although statistical and analytical methods are included in the curriculum of most college degree programs in the sciences today, few students have a good understanding of what big data or data science is, how such data differ from the traditional data typically used in their courses and projects, or what the underlying issues and challenges are in handling and analyzing them (Frost & Sullivan, 2012).
Multimedia is defined as content that uses a combination of audio and video representations such as text, audio, images, animations, video, and interactive media (Liaskos & Diomidus, 2002). Multimedia tools play a big role in today's higher education as technology is increasingly integrated into college courses, primarily in the form of videos and interactive activities like games or animations, in order to foster more active learning. According to a report published in Ed Tech (Ed Tech, 2016; Ilan & Oruc, 2016), some benefits of multimedia learning include deeper understanding, improved problem-solving skills, increased positive emotions, and access to a vast variety of information. Multimedia provides rich opportunities to augment current educational techniques and creates content that is attractive and engaging for students. It is used on a daily basis by the current generation of learners, such as millennials, which makes it indispensable and in keeping with the progression of the world today (Tang, 2014). Moreover, psychologists have shown that information obtained through visualization is retained 83% of the time, compared to only 11% retained through hearing and 5% through other senses (Yang, 2010). Similar experiments have also established that people can remember 50% of content presented via audio and images combined (as in a video), whereas only 10% is remembered via reading and 20-30% via only audio or video. Chen and Xie (2011) state that the use of videos and interactive media allows students to apply the knowledge learned and stimulates creativity, enthusiasm, and involvement.
Therefore, we explored the role of a multimedia tool, namely a video that incorporates audio, images, and text, in educating undergraduate students on introductory big data science concepts, including defining big data, identifying big data, challenges and applications of big data, and potential opportunities in big data/data science. Given that big data is a newly emerging field, and given its significance in today's world, where data are generated in massive amounts, there is a need to educate students about the topic and its relevance and applications in different fields. We therefore sought to test whether showing a short video during class increases perceived data science knowledge among our diverse undergraduate students and among student sub-groups. Further sub-aims included investigating whether familiarity with big data prior to watching the video had any effect on students' learning experience with the video, as well as on their subsequent interest in exploring this field in the future, whether just to learn more using additional resources or to pursue career options.

Setting
A video was developed and shown in introductory statistics, mathematics, and epidemiology courses. After watching the video, students in those classes filled out an online survey questionnaire about their perceived learning/educational experience related to the video. The questionnaire consisted of both Likert-type and quantitative questions regarding the students' learning experience and future interest in pursuing the topic, as well as questions about their academic and demographic background and prior exposure to the topic. The study protocol was reviewed and approved by the Institutional Review Board (IRB) of California State University, Fullerton (HSR# 16-0221).

Participants and Inclusion/Exclusion Criteria
Participants in the study included undergraduate students enrolled in introductory statistics, mathematics, and epidemiology courses from across different departments such as Mathematics, Biology, Geology, Communications, Business, Health Science, and Computer Science. Although video viewing in these courses may have constituted a convenience sample (not probability-based), it served the purpose of our study, as our aim was to educate students in these disciplines about big data because they are, and will most likely continue to be, exposed to such data in their respective fields. Students were from diverse demographic backgrounds and were at least 18 years old. No willing student/participant was excluded from viewing the video and providing responses.
Approximately 800 students were enrolled in the classes that were invited to participate in the study. A total of 331 responses were received, of which 21 were excluded because the participants did not answer at least half of the questions on the survey. This produced an analytic sample size of n = 310 for our study. The participation rate was not calculated because the researchers did not know how many instructors actually showed the video in their classes.
A majority were female students (63.6%). With regard to ethnicity, Hispanics (32.6%) and Asians (30.9%) made up the majority of the survey respondents, while 22.4% were Caucasians. The mean age of the students who participated in the survey was 22.13 years (with a standard deviation of 3.6 years). Thus, this sample was a good representation of the overall undergraduate student population in the university, which is dominated by underrepresented minority groups (Office of Institutional Research and Analytical Studies, CSUF).
As for academic background, nearly 54% were Juniors, 25% Seniors, and 20% Sophomores. Moreover, 36% of student participants wanted to pursue a Master's degree, and 27% aspired to earn a doctoral or other advanced professional degree. As to the students' intended or declared majors, the largest group was from Health Science (48%), followed by Biological Science (12.3%) and Mathematics (6%). There were also students from other disciplines such as Computer Science, Business, Communications, and Geology.

Multimedia Tool Development and Content
The multimedia production predominantly focused on developing the video: Big Data Science: An Introductory Tutorial. The making of the video involved a comprehensive process, beginning with collaborative discussions between CSUF faculty, students, and staff, and an external video production team. Through an iterative process, the production team and the program director outlined the final video structure, highlighted important themes, articulated specific educational goals, crafted an interview questionnaire for the faculty/experts who participated in the video, began to create a production schedule, and completed interviews with faculty/experts in their respective big data science fields about their own experiences and the specific issues and challenges of working with such data in their fields. A video treatment that took into consideration the feedback of all stakeholders was created, then reviewed and revised for any final changes. The final educational product was thus the culmination of a creative and analytical process designed to produce a film/video that would educate and engage students in introductory big data concepts. The video is posted on YouTube and can be accessed via the following link: https://www.youtube.com/watch?v=25z-iALT_KM&t=161s.
The content included the following: Part 1: What is Big Data, including discussion of the advent of Big Data and defining Big Data. Part 2: Big Data Sources: General Sources - Open Sources, and a specific focus on genomics and epidemiologic data types.

Data Collection: Survey Instrument
A survey instrument was developed by the research team in consultation with an external evaluation firm, primarily to assess perceived student learning related to introductory big data science concepts after viewing the video. Specifically, the survey asked about students' perceived understanding/learning of big data, its underlying issues and challenges, and strategies to handle big data, and also about their interest in pursuing the field further in the future as a result of watching the video. These form the two main constructs of our study.
The construct for measuring students' perceived learning experience via the video consisted of 10 Likert-scale type questions (6-point scale: 6: Strongly agree, 5: Agree, 4: Slightly agree, 3: Slightly disagree, 2: Disagree, 1: Strongly disagree) to assess students' self-perception about the level of learning regarding various aspects of big data, such as:
i. understanding the importance of big data in health science today;
ii. ability to define big data;
iii. understanding what the different elements of big data are (the 4 V's);
iv. understanding the challenges posed by big data environments, and some potential ways to address them;
v. understanding the various sources of big data in different fields, such as health, biomedicine, mathematics, and business;
vi. understanding the applications of big data in health and biomedical sciences;
vii. understanding research on big data in health and biomedical sciences;
viii. understanding how to manage big data in the context of health and biomedical sciences;
ix. understanding some statistical tools necessary to visualize, summarize, and analyze big data in the context of health and biomedical sciences; and
x. overall learning experience.
The survey also queried students about their level of familiarity with big data before watching the video using a 4-point Likert scale (1: Very familiar, 2: Somewhat familiar, 3: Not very familiar, 4: Not at all familiar). Moreover, it sought information about the students' demographic and academic background, such as age, gender, ethnicity/race, major, and class status.
Finally, the survey concluded by asking students about their interest in exploring additional materials (e.g., courses and tutorials) and career pathways in the field of big data applications in health and biomedical sciences. This construct was measured with a 6-point Likert scale (6: Strongly agree, 5: Agree, 4: Slightly agree, 3: Slightly disagree, 2: Disagree, 1: Strongly disagree) similar to the one used for the learning experience construct.

Data Analyses
Prior to our data analyses, we computed Cronbach's alpha (Cronbach, 1952) to test the reliability of the scales used to measure students' learning experience and their interest in pursuing the field of big data further in the future, both of which consisted of multiple item sets. The values obtained were 0.936 and 0.871, respectively, demonstrating a high level of internal consistency and reliability.
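For readers unfamiliar with the statistic, Cronbach's alpha for a scale of k items is k/(k - 1) multiplied by (1 - sum of the item variances / variance of the respondents' total scores). The following is a minimal pure-Python sketch of that formula only; the analyses reported here were run in SPSS, and the function name `cronbach_alpha` and the toy responses are hypothetical illustrations, not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total (summed) score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up 6-point Likert responses (5 respondents x 3 items), for illustration only.
toy_responses = [
    [6, 5, 6],
    [5, 5, 4],
    [3, 2, 3],
    [4, 4, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Values close to 1 indicate that the items move together across respondents, which is the sense in which the scales above are described as internally consistent.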
Apart from initial descriptive statistics to summarize the demographic and academic composition of our sample, we used two statistical inference methods, analysis of variance (ANOVA) and multiple linear regression, to test the hypotheses. To study students' learning experience about big data after watching the video, we created two outcome variables: (1) overall learning experience rating, and (2) average rating for the first nine items on the Likert scale. The rationale for using the combined measure instead of nine separate outcome variables was that our preliminary analyses revealed little difference in the ratings for the nine individual questions, along with a significant amount of inter-item correlation.
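As an illustration of the second outcome variable, the sketch below averages a respondent's ratings on the first nine items. The helper name `composite_score`, the use of `None` for a skipped item, and the choice to average over answered items only are assumptions made for this example; the text does not state how missing item responses were handled.

```python
def composite_score(item_ratings):
    """Mean of the answered items; None marks a skipped item (assumed convention)."""
    answered = [r for r in item_ratings if r is not None]
    if not answered:
        return None  # no usable ratings for this respondent
    return sum(answered) / len(answered)


# Hypothetical ratings on items i-ix for one respondent (item ix skipped).
ratings = [6, 5, 5, 6, 4, 5, 6, 5, None]
score = composite_score(ratings)
print(score)
```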
ANOVA was performed to investigate whether differences existed in the outcome variables representing students' learning experience by background academic and demographic variables, namely gender, ethnicity, major, class level, and intended degree (all categorical variables). ANOVA was also performed to examine the differences in prior familiarity with big data and interest in pursuing the big data field (both for exploring additional resources and career pathways), also by academic and demographic variables.
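A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic and its degrees of freedom in plain Python for made-up ratings; it is an illustration of the technique rather than the SPSS procedure used in the study, and the function name and group data are hypothetical. Obtaining a p-value additionally requires the F distribution, which is available, for example, through `scipy.stats.f_oneway`.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared distance to the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared distance of each score to its group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up overall ratings for three hypothetical sub-groups (not study data).
toy_groups = [[6, 5, 6, 5], [4, 5, 4, 3], [5, 6, 5, 4]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
print(round(f_stat, 2), df_b, df_w)
```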
The overall learning experience construct was chosen as the dependent or outcome variable for the multiple regression model to study the effect of prior familiarity with the topic, while adjusting for the other independent variables: age (continuous variable), gender, ethnicity (categories used: Hispanics, Whites, Asians, and "other" as the reference category), major (categories used: Mathematics, Health science, Biology, Chemistry/Biochemistry, and "other" as the reference category), class level (categories used: sophomore, junior, senior, and freshman as the reference category), and intended degree (categories used: Master's, PhD and other professional, and Bachelor's as the reference category).
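Categorical predictors with a reference category enter a regression model as indicator (dummy) variables: one 0/1 column per non-reference category, so that each coefficient is interpreted relative to the reference group. The sketch below shows this coding for intended degree with Bachelor's as the reference. The function name `dummy_code` and the sample values are hypothetical, and the code illustrates only the coding scheme, not the SPSS model itself.

```python
def dummy_code(values, levels, reference):
    """Return (column_names, rows) of 0/1 indicators for every non-reference level."""
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    columns = [level for level in levels if level != reference]
    rows = [[1 if value == column else 0 for column in columns] for value in values]
    return columns, rows


degree_levels = ["Bachelor's", "Master's", "PhD/professional"]
# Intended degree for three hypothetical students.
students = ["Master's", "Bachelor's", "PhD/professional"]
columns, rows = dummy_code(students, degree_levels, reference="Bachelor's")
print(columns)  # ["Master's", 'PhD/professional']
print(rows)     # [[1, 0], [0, 0], [0, 1]]
```

A student in the reference category is represented by zeros in every indicator column, which is why no separate column is needed for it.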
To study the relationship between prior big data familiarity and interest in pursuing the big data field in the future, the dependent or outcome variable was formed by the mean of the responses to two survey questions: (i) level of interest in exploring additional materials and resources and (ii) level of interest in exploring career pathways in the area. Prior familiarity was included as the independent variable of interest in the model, while adjusting for the same covariates, with the same categories (including reference categories), as those used when studying students' perceived learning experience.
Learning experience from the video can also be a driver of students' interest in future pursuance of the field; however, the variables denoting learning experience and prior familiarity were not included together in a regression model because of the significant association between the two, as evident from the correlation analysis (r = 0.577, p < 0.05), which would have introduced multicollinearity into the model.
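The association referred to here is a Pearson correlation coefficient between two predictors; a value far from zero signals that the two carry overlapping information, which is the multicollinearity concern. A minimal pure-Python sketch of the coefficient follows, using made-up numbers rather than study data; the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up values for two predictors measured on four respondents.
r = pearson_r([1, 2, 3, 4], [2, 1, 4, 3])
print(round(r, 3))
```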
All the analyses were performed using the statistical software package IBM-SPSS Version 24 on a Macintosh computer (IBM SPSS website). For all the analyses, 0.05 was chosen as the criterion for statistical significance based on p-values, and findings with p-values between 0.051 and 0.10 were discussed as demonstrating borderline significant patterns as well.

Learning Experience
The mean overall rating for students' perceived learning experience was 5.74 (on a 6-point Likert scale) with a standard deviation of 1.19 (n = 290), showing that students generally agreed that watching the video was a valuable experience in relation to big data science. The average score for the nine individual items included in the survey instrument was 5.50 (also on a 6-point Likert scale) with a standard deviation of 1.005, also showing a positive learning experience of students about all the aspects of big data covered in the video. Further, the low standard deviation indicated consistency of the students' ratings regarding understanding of different aspects of big data and its applications.
Table 1 shows the primary descriptive statistics (frequencies, mean ± standard deviation) of the two outcome variables representing students' perceived learning experience (overall rating and average of the ratings for the 9 individual items) by gender, ethnicity, major, and class level. The main findings for the different student sub-groups can be summarized as follows. Females reported a greater overall perceived learning experience than males, a difference of borderline significance (p = 0.052). Students intending to pursue a PhD or other higher professional degree reported borderline significantly higher mean ratings for both outcome variables mentioned above (p values of 0.083 and 0.077, respectively), with the lowest mean ratings among those pursuing a Bachelor's degree alone. No significant differences were observed with respect to ethnicity, class level, or major. This suggested that the effectiveness of the video was uniform across student sub-groups defined by these background factors. Male students were significantly (p = 0.047) more familiar with big data prior to watching the big data video. Similarly, familiarity with big data varied significantly across students' major fields of study (p = 0.041). Psychology students were the least familiar with this emerging field, while chemistry and biology students were the most familiar. A negative correlation with age (continuous variable) indicated that younger students were less familiar with the field prior to watching the video in their classes.
Table 3 shows that prior familiarity had a statistically significant relationship (p = 0.003) with the overall perceived learning experience of students watching the video, while controlling for the background demographic and academic factors. The positive coefficient showed that students who were less familiar with big data before viewing the video (recall that higher values of this variable indicate less familiarity) reported having a better learning experience than those who were more familiar. In addition, intended degree and gender had a significant effect on the overall perceived learning experience of students, which corroborates our earlier ANOVA results included in Table 1.

Interest in Pursuing the Field of Big Data Science in the Future
Data on factors influencing students' interest in pursuing the field of big data, particularly whether prior familiarity with the topic and a positive learning experience via the video contributed to pursuing big data, are shown in Table 4. Thirty-five percent of the student participants reported that their interest in the field of big data increased after watching the video. The overall mean rating for their interest in exploring additional resources and materials on the topic was 4.42 (standard deviation = 1.645), and the mean rating for their interest in exploring career pathways in the field was 4.25 (standard deviation = 1.729). Both these variables were measured using a 6-point Likert scale as described earlier, and hence these mean values show overall positive responses from students. Table 4 shows the descriptive statistics for these two constructs by the different background demographic variables.
Results indicated that statistically significant differences existed among students with respect to pursuing the field of big data in the future by class level (p = 0.001) and ethnicity (p = 0.004), with a borderline significant difference by major (p = 0.08). Asians, Hispanics, and multiethnic groups had the highest interest in pursuing both additional materials and career pathways in this newly emerging area, while African Americans, American Indians/Alaskan Natives, and other ethnic groups expressed the least interest. Students with Chemistry, Biochemistry, and Mathematics majors showed overall high interest in exploring this field further upon watching the video, whereas students with Psychology and Health Science majors showed relatively lower interest. Freshmen and Seniors showed overall higher interest in pursuing the field of big data than Juniors and Seniors. Men were found to have a slightly higher level of interest in pursuing the topic of big data in the future than women, although this difference was not statistically significant (p = 0.87).
Table 5 shows the multiple regression model used to study the effect of prior familiarity with big data on pursuing the field further by exploring additional courses or career pathways. The results indicated that prior familiarity had a statistically significant effect on students' interest in pursuing the field (p = 0.001), while adjusting for other factors. The negative coefficient showed that students who were more familiar with big data before seeing the video reported higher interest in exploring the field of big data in the future than those who were less familiar (recall that higher values of the familiarity variable indicate less familiarity). This also establishes the utility of this tutorial video as a motivating factor for students to consider a new emerging field for future studies or careers. In addition, ethnicity, class level, major, and intended degree had a significant effect on students' overall interest in exploring the big data field further.

Discussion
Our findings revealed that students perceived the video as an overall very good experience and viewed it as a valuable learning tool about big data, as is evident from the mean rating of 5.74 (on a 6-point Likert scale) from over 300 students. Female students reported a significantly better learning experience, and so did students who intended to pursue higher graduate degrees in the future. Moreover, students who had little or no familiarity with big data prior to watching the video (for instance, those who did not even know the word, or had only heard the term without knowing what it meant and entailed) reported a better learning experience than students who had some familiarity with the topic. This strongly established the utility of this introductory video tutorial as a learning tool for first-time exposure to this important topic in today's world for a diverse group of undergraduate students.
Because females are less likely to enter STEM fields, our findings suggest that introducing data science programs to female students in upper division courses, and to those interested in higher education, may provide additional educational benefits for these students. Further, the video is beneficial overall to students with little familiarity, and therefore introducing the video in lower division, introductory Statistics, Computer Science, and Health Science courses may promote learning for this group. Effect sizes computed using eta squared values (Richardson, 2011) indicated moderate to large effects (values > 0.45).
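Eta squared expresses an effect size as the proportion of the total variation in an outcome that is attributable to group membership, that is, the between-group sum of squares divided by the total sum of squares. The sketch below illustrates this definition in plain Python with made-up ratings; the function name `eta_squared` and the data are hypothetical and are not taken from the study.

```python
def eta_squared(groups):
    """Eta squared = SS_between / SS_total for a list of groups of numeric scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of every individual score around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total


# Made-up ratings for three hypothetical sub-groups (not study data).
toy_groups = [[6, 5, 6, 5], [4, 5, 4, 3], [5, 6, 5, 4]]
effect = eta_squared(toy_groups)
print(round(effect, 3))
```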
Results also showed that very few students (fewer than thirty percent) were familiar with the concept of big data science before watching the video beyond just knowing the term. Male students and students with Biology and Chemistry majors were more familiar with the concept of big data, in the sense of having some idea of what it meant, than others who had either not heard the term at all or had heard the term without any additional knowledge about the topic. The latter seems reasonable because those fields often give rise to large and complex datasets that students may have been exposed to in some of their previous courses. Another positive outcome of our study was that a large proportion of students who watched the video (around 65%) expressed a high level of interest in either exploring additional courses and tutorials in the area of big data or even exploring career pathways in that field, particularly those belonging to underrepresented student groups, such as females and ethnic minority groups like Hispanics and Asians. Thus, introducing the video in introductory science courses may help motivate females and underrepresented minority ethnic groups to pursue a career in big data, a concept that is unfamiliar to most prior to the video, thereby narrowing the gap in the STEM workforce. Just as with the learning experience, effect sizes also indicated moderate to large effects for these results (values > 0.35).
Big data is a fast-growing field and has widespread applications today in business, healthcare, education, and security, among other fields (Mayer-Schonberger & Cukier, 2014).
Although the importance of big data is already understood and its concepts are gradually being integrated into college curricula, a potentially quick and easy way to introduce a topic to a large group of college students is to use multimedia tools that make use of video, audio, animation, etc. in order to create attractive and engaging content. Multimedia has been shown to be a very effective tool for today's learners in different fields like education and science, as demonstrated by recent research studies (Aloraini, 2012; Miller et al., 2011). Babiker (2015) stated that educators must create their own multimedia applications to be effective in higher education. Our current study leveraged the potential benefits of multimedia learning in the form of a video created by faculty to introduce a diverse group of students in an undergraduate institution to the different aspects of big data science, including the issues and challenges that make big data different from the traditional datasets that people are typically familiar with. The research goal was to understand how effective and potentially beneficial the tool was for self-perceived understanding of big data concepts and their application in health and biomedical sciences, a topic that has not been studied much in the current literature; the study thus helps fill this gap.
A potential future direction of study includes looking at causal connections between student learning and multimedia based on social, affective, and cognitive factors, which have not been researched much yet. Opportunities to develop other similar multimedia tools for big data education will also be explored.

Limitations
The study was limited to students enrolled in specific courses in Mathematics, Biology, and Health Science. So, although we had an adequate sample size and a reasonable representation of the student population on our campus, showing the video in additional courses from various disciplines on campus and obtaining feedback from them might give us more accurate insights into the role of multimedia as a learning tool for a diverse undergraduate student population at our university. Further, we did not know how many students actually viewed the video; we only had data on those who participated in the survey. Thus complete participation rates are unavailable, and although this can potentially affect results, we do not anticipate any significant bias because the behaviors and backgrounds of the non-participants are unknown. Although one reason for non-participation may be disinterest in the video and the topic of data science, there could be several other reasons driving a student's decision not to complete the survey, such as busy schedules, technical issues (for example, no Internet connection), or simply forgetting. Thus future studies should consider non-participant characteristics. Another limitation of the study is that, as a first step, it assessed only students' self-perceived knowledge to understand the impact of the multimedia tool on learning about big data concepts; in the future, we plan to investigate actual knowledge gains based on a pre-post type study. Finally, although many students expressed interest in pursuing the field of big data in the future, it is not known exactly how they would do so or whether they would do so at all. It might therefore be interesting to assess the actual future actions of such students via follow-up surveys, using the names and email addresses they shared on the survey for future contact; nonetheless, this is beyond the scope of the present study.

Conclusion
We showed that a multimedia educational tool on the newly emerging topic of big data can be a successful tool for disseminating knowledge about big data science and its application in biomedicine among undergraduate students and for creating interest in additional exploration, and that these experiences and interests varied by demographic and other factors. This can thus aid in creating a pipeline for more underrepresented students to choose careers in the field of big data science, hence narrowing the gap in the STEM workforce. We also observed that both prior familiarity with big data and overall learning experience from the video had significant effects on students' future interest in the field. However, students with less familiarity with big data prior to watching the video had a better overall learning experience, whereas those with a greater level of prior familiarity with big data were more inclined to pursue the area further in the future.
Part 3: Introduction to Data Synthesis, including data management, mining and visualization, and summarization. Part 4: Solutions using Big Data. Part 5: Big Data Career Pathways. The entire video was 34 minutes and 53 seconds long.

Table 2
Descriptive Statistics for Familiarity with Big Data Prior to Watching the Video by Students' Demographic and

Table 3
Multiple Regression Output to Study the Effect of Prior

Table 5
Multiple Regression of Students' Interest in Pursuing the Field of Big Data in the Future