Ph.D. Candidate
Office ISI 1038
Information Sciences Institute (ISI), University of Southern California
4676 Admiralty Way, Marina del Rey, CA, 90292
Email: minhpham@usc.edu or minhpham@isi.edu

Biography

I am a Ph.D. Candidate in the Computer Science Department at the University of Southern California (USC) in Los Angeles. I work as a research assistant at the Center on Knowledge Graphs, advised by Prof. Craig Knoblock. Before joining USC in 2015, I earned my B.E. in Computer Science from HCMC University of Technology (honors program) in 2014.

My research focuses on machine learning and AI techniques that reduce human effort in data integration. In particular, I have been working on unsupervised and semi-supervised approaches to error detection and fact verification for tabular data. More information is available on my resume.

Projects

Robust and Proactive Error Detection in Tables

  • Errors in Web tables are difficult to detect and correct because they are heterogeneous, rare, and often of previously unknown types.
  • In this research, we proactively leverage human feedback using a neural-symbolic framework to improve error detection performance.
  • The framework uses Probabilistic Soft Logic (PSL) to infer the active learning examples and Deep Neural Networks to classify errors based on labeled examples.

Paper GitHub

Learning Data Transformation with Minimal User Effort

  • Learning transformations to convert data between different formats is challenging since it normally requires extensive user effort to find and label examples in each format.
  • In this research, we propose an unsupervised approach to data transformation. It uses hierarchical clustering to group string values into format-like clusters, then infers the transformations between clusters using semantic labeling techniques, without any user labeling (ISWC 2016 paper).
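
As a rough illustration of what a "format-like cluster" means, the sketch below groups string values by an abstracted character-class pattern. This is a simplified stand-in, not the hierarchical clustering or the code from the paper: the symbols `D`, `A`, and `_` and the function names `char_class`, `format_pattern`, and `cluster_by_format` are hypothetical choices made only for this example.

```python
from collections import defaultdict
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to a coarse class symbol (hypothetical scheme)."""
    if ch.isdigit():
        return "D"
    if ch.isalpha():
        return "A"
    if ch.isspace():
        return "_"
    return ch  # keep punctuation, since it often defines the format


def format_pattern(value: str) -> str:
    """Collapse runs of the same class, e.g. '2016-10-17' -> 'D-D-D'."""
    classes = [char_class(ch) for ch in value]
    return "".join(symbol for symbol, _ in groupby(classes))


def cluster_by_format(values):
    """Group values that share the same abstract format pattern."""
    clusters = defaultdict(list)
    for value in values:
        clusters[format_pattern(value)].append(value)
    return dict(clusters)


values = ["2016-10-17", "10/17/2016", "2015-01-02", "Oct 17, 2016"]
clusters = cluster_by_format(values)
for pattern, members in clusters.items():
    print(pattern, members)
```

Grouping by exact pattern is much cruder than a real clustering step, which would also merge near-identical patterns, but it shows why values in the same format end up together before any transformation is inferred.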

Paper GitHub

Domain-independent Semantic Labeling

  • The problem of source modeling is to automatically build semantic descriptions of data for publishing structured sources to knowledge graphs.
  • Semantic labeling is an important step in source modeling, in which attributes in data sources are mapped to the corresponding properties in ontologies.
  • We use generalized pairwise similarity measures to generate features and binary classification to predict the properties of each attribute.
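
The idea of turning pairwise similarities into features for a binary match/no-match classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the published system: the two features (token Jaccard similarity and agreement in the fraction of numeric values), the tiny logistic regression trainer, and all names and sample data are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def numeric_fraction(values):
    """Fraction of values that look like plain decimal numbers."""
    return sum(v.replace(".", "", 1).isdigit() for v in values) / len(values)


def pair_features(col_a, col_b):
    """Similarity features for a pair of attribute value lists."""
    tokens_a = [t.lower() for v in col_a for t in v.split()]
    tokens_b = [t.lower() for v in col_b for t in v.split()]
    value_sim = jaccard(tokens_a, tokens_b)
    numeric_sim = 1.0 - abs(numeric_fraction(col_a) - numeric_fraction(col_b))
    return [value_sim, numeric_sim]


def predict(w, b, x):
    """Probability that the pair shares the same semantic type."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent for logistic regression."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            grad = predict(w, b, x) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Hypothetical labeled attributes: two "city" columns and two "price" columns.
city_a = ["Los Angeles", "San Diego", "Boston"]
city_b = ["Boston", "Los Angeles", "Seattle"]
price_a = ["10.5", "20", "7.25"]
price_b = ["20", "3.5", "10.5"]

X = [pair_features(city_a, city_b), pair_features(price_a, price_b),
     pair_features(city_a, price_a), pair_features(city_b, price_b)]
y = [1, 1, 0, 0]  # 1 = same semantic type, 0 = different
w, b = train_logistic(X, y)

# Score an unseen attribute against a labeled one.
new_column = ["Chicago", "Boston"]
print(predict(w, b, pair_features(new_column, city_a)))
```

Because the classifier only sees similarity scores rather than raw values, the same trained model can in principle be reused on attributes from a different domain, which is the sense of "domain-independent" here.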

Paper GitHub

Selected Publications

Awards