Research Themes

Data Science Literacy

Data Science Literacy is one of the seed initiatives for the Data Science Institute.  To support innovation at the University of Arizona, the Data Science Institute works to provide expertise and training in Data Science, and to develop avenues for researchers to best leverage advanced cyberinfrastructure resources at local and national levels. The Data Science Institute actively coordinates and communicates training activities in cooperation with the Data Science Resources & Training group.


Applied Focus Areas

The Data Science Institute fills the gap between research software and domain science by working with research teams at the cutting edge of data-driven discovery. The Data Science Institute targets emerging computational technologies to support a wide array of scientific disciplines. Our team of data scientists has expertise in these tools and techniques, ensuring that the best people are available to work on cutting-edge research problems. Currently, we are focused on the following areas:

Natural Language Processing – (NLP) 
Many research areas involve large quantities of text, such as transcribed speeches, corpora of works, and internet-scale text (e.g., Tweets). Natural Language Processing enables us to apply computational linguistics techniques such as sentiment analysis, topic modeling, and word embeddings to extract and summarize properties of the text that are otherwise difficult to capture, such as concepts, meaning, and intent.

Machine Learning – (ML)
Machine Learning allows researchers to classify data, extract patterns and features, draw inferences, and predict future outcomes. For example, machine learning can be used to identify features in images, develop predictive models of health outcomes, and infer changes in local climate.

Large-Scale Data Visualization
Complex, multidimensional data often cannot be visualized in a bar chart. Researchers often require advanced, high-performance, interactive data visualizations to make sense of these datasets. Large-Scale Visualization (Viz) allows researchers to visualize and interact with multidimensional data without compromising the underlying dimensionality of the dataset. Importantly, it also enables researchers to manipulate the data through zoom-and-filter applications, revealing complex relationships across data types and helping them derive meaning about the underlying processes.

Image Informatics
Imaging technologies are allowing researchers to generate high-resolution 2D/3D/4D/5D image stacks. These datasets quickly grow to sizes that are impossible to curate manually. Image informatics provides a suite of advanced algorithms to extract features, make measurements, and classify patterns among tens of thousands of images.



We solicit one-page white papers describing projects that fit the mission and goals of the Data Science Institute and fall into our focus areas:

  • Machine Learning
  • Natural Language Processing
  • Image Informatics
  • Large-Scale Data Visualization

Projects should bring together medium- to large-scale multidisciplinary research teams that intend to pursue external funding within six to 12 months. The purpose should be to produce a proof of concept and results that provide a competitive advantage when pursuing funding opportunities.