Title: Human-Centric Knowledge Discovery and Decision Optimization
Humans are both producers and consumers of Big Data. Modeling the role of humans in the process of mining Big Data is thus critical for unleashing the vast potential of mined knowledge in various important domains such as health, education, security, and scientific discovery.
The objective of this research is to build a human-centric learning framework that harnesses the power of human-generated Big Data through computational models. The project also incorporates its research activities into teaching materials for student training and education in information retrieval and data mining. In addition, specialized training in data analytics methods and applications is provided to strengthen education in Big Data techniques for non-STEM and community college students.
This project comprises four synergistic research thrusts. First, joint text and behavior analysis. To exploit as many types of human-generated data as possible and capture the dependencies among them, this project develops a set of novel probabilistic generative models that perform integrative analysis of text and behavior data. Second, task-based online decision optimization. Traditional static, ad hoc, and passive machine-human interactions are inadequate for optimizing humans’ dynamic decision-making processes. To address this limitation, users’ longitudinal information-seeking activities are organized into tasks, to which new online learning algorithms are applied to proactively infer users’ intents and adapt the systems for long-term utility optimization.
Third, explainable personalization. Existing personalized systems are black boxes to their users, who typically have little control over how their information is used to personalize those systems. To make ordinary users aware of how a system’s behavior is customized and to increase their trust in such systems, statistical learning algorithms are built to generate both system-oriented and user-oriented explanations. Fourth, system implementation and prototyping. User studies are conducted on a prototype system that integrates all the algorithms developed in this project, in order to evaluate them in deployment. Evaluations and feedback from real users are fed back to refine the assumptions and design of the developed algorithms.
Expected results of the project include: 1) open source tools and web services that provide joint analysis of human-generated text data and behavior data in various applications, such as search logs, forum discussions, and opinionated reviews; and 2) annotated corpora and new evaluation metrics that enable researchers to conduct follow-up research in related domains.