Research Overview

My research interests are to enable "Next-Generation Search" and include:

>> Why join a research project in the MiRA group? (for CSIE undergraduates) Some quick demos:


Current Projects

Current Research Focus

  • Multimedia Analysis and Detection
    • audio, video, text, etc.
    • boosting intelligent multimedia applications
  • Searching One Billion Images/Videos/Music
    • enabling next-generation information framework

VOLEX (Visual Object-Level Example) Search Framework

  • Goal - effective and efficient content-based search for specific objects in large-scale images and videos
  • Opportunities
    • Emerging technical thrusts in related fields
    • Strong demands for effective visual matching
  • Potential applications
    • Camera phone as an input device for Q&A, for example, looking up plant information in a park or product prices and details while shopping
    • Trademark or landmark matching
    • A core component for analyzing surveillance videos and medical images
  • Investigating effective hash-based (e.g., LSH) or inverted-file methods for large-scale (millions or billions) image/video retrieval
  • newsCurrent results -- We had preliminarily built an engine which can do image object search over millions of images and requires less than a second.

  UbiQuery - Camera Phone as an Input Device for Q&A

  • UbiQuery - camera phone as an input device for Q&A
  • Motivations
    • Proliferation of camera phones
    • Availability of high-speed data transmission over mobile devices
  • Snapshot and query by image examples
  • Effective devices for product query, landmark search, or unknown object inquiry
  • Media coverage of our mobile image query project on Sept. 1, 2009. Some excerpts (in Mandarin) include TV news and newspapers such as Liberty Times (PDF), UDN News (PDF), etc.

Leveraging Cloud Computing for Large-Scale Semantic Image Retrieval

  • Goal - organizing image search results in semantic clusters at query time (online, real-time)
  • Intuition - offline graph-based grouping
    • Effective multimodal graph fusion (multiple visual and expanded tag features)
    • Efficient clustering (e.g., 42 min for half a million photos) and canonical image selection on image graphs
  • Proposal
    • Leveraging (17-node) Hadoop (MapReduce) and (multiple) sparse features for graph construction
    • Clustering and canonical image selection by the proposed Hadoop-based Affinity Propagation
  • Impact: first-ever query-time search result clustering (demo video)
 
 

Attribute-Based People/Car Search in Consumer Photos and Surveillance Videos

  • Investigating new perspectives for organizing people and cars, major objects of interest, in photos and videos.
  • Beyond low-level representations, discovering more semantic descriptions for the media
  • Devising effective (learning, clustering, etc.) algorithms for large-scale datasets
  • Leveraging user-contributed data for collecting supervised training data
  • Defining new applications for attribute-based search

Harnessing Social (User-contributed) Media for Annotation, Visualization, Learning, and Monetization

  • Growing practice of online media (video/photo) sharing (e.g., Flickr, YouTube)
  • Billion-scale volume, bringing profound impact on new applications and user scenarios
  • Current technologies do not keep pace with this growth, leaving room for emerging applications such as search, mining, and visualization
  • Great challenges ahead for efficiency, effectiveness, and scalability.

Keyword-based Visual Search over Reranking Frameworks

  • Improving text-based image/video search with multimodal similarities between documents across sources and domains
  • Formulating the solution as a random-walk framework
  • Requiring no query expansion, search examples, or pre-trained models
  • First work to consider recurrent patterns - improving text-based search by up to 40%
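
As a rough illustration of the random-walk idea above, the following sketch reranks text-search results by power iteration over a document similarity graph. It is a generic personalized-PageRank-style formulation chosen for this example, not necessarily the exact model used in the project: the function name random_walk_rerank, the alpha weight, and the fixed iteration count are assumptions.

```python
def random_walk_rerank(text_scores, similarity, alpha=0.8, iterations=50):
    """Rerank documents by a random walk over a similarity graph.

    text_scores: initial text-search relevance per document (non-negative,
                 at least one value must be positive).
    similarity:  square matrix of non-negative pairwise (e.g., visual) similarities.
    alpha:       weight on graph transitions versus the text-score prior.
    Returns (scores, ranking), where ranking lists document indices, best first.
    """
    n = len(text_scores)
    total = sum(text_scores)
    prior = [s / total for s in text_scores]

    # Row-normalize similarities into transition probabilities.
    transition = []
    for row in similarity:
        row_sum = sum(row)
        if row_sum > 0:
            transition.append([w / row_sum for w in row])
        else:
            transition.append(prior[:])  # isolated document: jump by the prior

    # Power iteration: mix graph propagation with the text-score prior.
    scores = prior[:]
    for _ in range(iterations):
        scores = [
            alpha * sum(scores[i] * transition[i][j] for i in range(n))
            + (1.0 - alpha) * prior[j]
            for j in range(n)
        ]

    ranking = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return scores, ranking


if __name__ == "__main__":
    # Document 0 is similar to both others, so it gains score from each of them.
    sim = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    scores, ranking = random_walk_rerank([1.0, 1.0, 1.0], sim)
    print(ranking, scores)
```

Documents that are similar to many other relevant documents accumulate score, which is how recurrent content can be promoted without query expansion, example images, or pre-trained models.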

Keyword-based Visual Search over Large-Scale Concept Ontology

  • Motivations
    • Strong demand for semantic indexing and search over large-scale consumer photos, which generally lack reliable user-provided annotations
    • Investigating the feasibility and challenges of a new paradigm, concept search - retrieving visual objects via large-scale automatic concept detectors
  • Focus
    • Investigating effective concept mapping and selection methodologies over large-scale concept ontology;
    • Evaluating the quality and feasibility of pre-trained concept detectors (e.g., LSCOM) when applied to cross-domain consumer data (i.e., Flickr photos)
    • Investigating fusion strategies between automatic concepts and low-quality user-annotated data (tags)
  • In preliminary experiments, we compared various concept search techniques and obtained promising results for searching consumer photos via automatic concept detectors.

 

  Internet Video Advertisement

  • Online image/video advertising, one of the problems for Internet Monetization - converting internet assets to cash or money
  • Associating relevant ads in the (shared) videos and photos not restricted to text modality only
  • Considering user context and profiles
  • Optimizing system revenues and contextual relevance
  • Example system - MiRA AdVis (patent pending)

  TREC Video Retrieval Evaluation (TRECVID)


  Advanced Surveillance Platform

  • Motivations
    • Strong security/monitoring demands for national, community, and residential safety, or even elderly-care monitoring
    • Proliferation of various sensors (e.g., video/PTZ camera, laser scanner, microphone array, etc.)
    • Advances in research on semantic analysis and large-scale video retrieval
    • Frequent technical inquiries from industry partners
    • Team members with extensive expertise
  • Focus and Contributions
    • Exploiting multiple sensors
    • Semantic analysis
    • Informative Visualization
    • Effective Retrieval
  • Joint projects with Prof. Bin-Yu Chen, Prof. Yung-Yu Chuang, Prof. Yi-Ping Hung, and Prof. Chieh-Chih Wang.

  Large-Scale Cross-Domain Image/Video Near-Duplicate Detection

  • The volume of images and videos available online is growing exponentially
  • Challenging problems due to various distortions caused by image editing, encoding, and occlusion, and to the large number of digital media sources
  • Essential tools for topic tracking, visual search, content-based retrieval, and copyright infringement detection
  • Requiring novel, efficient, and effective methods

 

Research Sponsors for MiRA

 

   NSC, CyberLink, NTU, III, CHTTL, Microsoft, sneergy, MStar

 

Past Projects

 

Ph.D. Thesis, "An Information-theoretic Framework towards Large-scale Video Structuring, Threading, and Retrieval," November 2006.

During my Ph.D. study at Columbia University, I tested my proposed hypotheses through cross-site and cross-disciplinary projects affiliated with researchers in IBM T. J. Watson Research Center led by John R. Smith. I was deeply involved in two research projects. The first is TRECVID, which aims to promote progress in content-based video retrieval via open, metric-based evaluation. The other is "Reconstructing and Mining of Semantic Threads across Multiple Video Broadcast News Sources Using Multi-Level Concept Modeling," funded by the Advanced Research and Development Activity (ARDA), which encourages technology thrusts and sponsors high-risk, high-payoff research in information exploitation.

Since summer 2003, I have been devoted to the TRECVID video indexing and retrieval benchmarks in affiliation with Columbia University and IBM T. J. Watson; our system ranked among the top systems at the time. I will continue research based on this benchmark.

  Reconstruction and Mining of Semantic Threads across Multiple Video Broadcast News Sources

  • Goal - acquiring open-source intelligence from the media
  • Effective information exploitation (e.g., video story segmentation, video retrieval, threading, automatic annotation, etc.) from hundreds of international broadcast news video channels
  • Funded by US security departments - encouraging technology thrusts
  • Extensive experiments through cross-site and cross-disciplinary projects affiliated with IBM T. J. Watson Research Center and Columbia University

  LSCOM - Large-Scale Concept Ontology for Multimedia

  • Goal - bridging the semantic gap
    • To support searching, filtering, mining, content-routing, personalization, and summarization
    • On the scale of thousands of concepts (449 annotated)
    • Annotated by 30+ Columbia & CMU students
    • Columbia's 374 detectors are publicly available online
  • Defined by
    • Intelligence community users
    • Ontology specialists
    • Multimedia analytics researchers

  Topic Tracking for Cross-Domain News Videos with Visual Duplicates and Semantic Concepts

  • Augmenting topic tracking with visual duplicates and semantic concepts, automatically detected from videos of distributed sources
  • Presenting an information-theoretic analysis to assess the complexity of each semantic topic and determine the best subset of concepts for tracking it
  • Improving text-based tracking approaches by up to 25%; visual duplicates even outperform text-only approaches in certain topics

  Statistical Framework for Fusing Mid-Level Features for International Broadcast News Video Segmentation

  • Investigating statistical approaches to induce and fuse diverse features from multiple levels and modalities, including visual, audio, and text, in international broadcast news videos
  • Extending the Maximum Entropy model and inventing a novel feature wrapper
  • Proposing novel features such as Mandarin syllable cue terms and significant pauses (pitch-related)
  • One of the best systems in TRECVID 2003

 

Honors and Awards