Research Overview
My research aims to enable "Next-Generation Search"; my interests include:
- Large-scale image/video retrieval -- searching one billion photos and videos
- Semantic event and concept detection in photos and videos
- Mobile and cloud-based image/video applications
- Exploiting social media for knowledge acquisition
- Internet monetization (e.g., advertising)
- Multimedia testbed and evaluation (e.g., LSCOM, TRECVID)
- Investigating theoretical approaches in machine learning and information retrieval
- Turning advanced research into business deliverables
Why join research projects in the MiRA group? (For CSIE undergraduates.) Some quick demos:
- Effective and efficient product query by mobile phone

- Interactive video (question answering)

- Realtime image retrieval over million-scale image collections

- Million-scale image graph construction and clustering by cloud computation (MapReduce over 18 Hadoop servers).

- Flora Project: Flower Retrieval and Social Network

- Multimodal Fusion for Mobile Annotation – GPS, Compass, or Camera

- Indexing million-scale faces in sub-second response

- Real-time video retrieval and mobile question answering

Current Projects
Current Research Focus
- Multimedia Analysis and Detection
- audio, video, text, etc.
- boosting intelligent multimedia applications
- Searching One Billion Images/Videos/Music
- enabling next-generation information framework
VOLEX (Visual Object-Level Example) Search Framework
- Goal – effective and efficient content-based search for specific objects in large-scale images and videos
- Opportunities
- Emerging technical thrusts in related fields
- Strong demands for effective visual matching
- Potential applications
- Camera phone as an input device for Q&A, e.g., looking up plant information in a park, or product prices and information while shopping
- Trademark or landmark matching
- A core component for analyzing surveillance videos and medical images
- Investigating effective hash-based (e.g., LSH) or inverted-file methods for large-scale (millions or billions) image/video retrieval
Current results -- We have built a preliminary engine that performs object-level image search over millions of images in less than a second.
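The hash-based retrieval idea mentioned above (e.g., LSH) can be illustrated with a minimal sketch. It uses random-hyperplane hashing, one common LSH family for cosine similarity; this choice is an assumption made for illustration and is not necessarily the scheme used in the VOLEX engine. The function names and the toy feature vectors are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(vectors, planes):
    """Group item ids into buckets keyed by their bit signature."""
    index = defaultdict(list)
    for item_id, vec in vectors.items():
        index[signature(vec, planes)].append(item_id)
    return index


def query(vec, planes, index):
    """Return candidate ids that share the query's bucket."""
    return index.get(signature(vec, planes), [])


# Toy usage with hypothetical 3-dimensional image features.
planes = make_hyperplanes(num_bits=8, dim=3, seed=1)
features = {"img_a": [0.2, -0.5, 0.9], "img_b": [-0.7, 0.1, 0.3]}
index = build_index(features, planes)
candidates = query([0.4, -1.0, 1.8], planes, index)  # a scaled copy of img_a
```

Only the candidates in the matching bucket need an exact comparison, which is what makes this family of methods attractive at million or billion scale; practical systems usually use several hash tables to raise recall.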
UbiQuery – Camera Phone as an Input Device for Q&A
- Motivations
- Proliferation of camera phones
- Availability of high-speed data transmission over mobile devices
- Snapshot and query by image examples
- Effective devices for product query, landmark search, or unknown object inquiry
Media coverage for our mobile image query project on Sept 1, 2009. Some excerpts (in Mandarin) include TV news and newspapers such as Liberty Times (PDF), UDN News (PDF), etc.
Leveraging Cloud Computing for Large-Scale Semantic Image Retrieval
- Goal – organizing image search results in semantic clusters at query time (online, real-time)
- Intuition – offline graph-based grouping
- Effective multimodal graph fusion (multiple visual and expanded tag features)
- Efficient clustering (e.g., 42 min for half a million photos) and canonical image selection on image graphs
- Proposal
- Leveraging (17-node) Hadoop (MapReduce) and (multiple) sparse features for graph construction
- Clustering and canonical image selection by the proposed Hadoop-based Affinity Propagation
Impacts: first-ever query-time search result clustering (demo video)
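The graph-construction step above (MapReduce over sparse features) can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not the project's Hadoop implementation: it uses the well-known inverted-index formulation of pairwise similarity, where the map step emits one posting per (feature, image) and the reduce step accumulates partial dot products for images sharing a feature. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(images):
    """Emit (feature_id, (image_id, weight)) for every non-zero feature."""
    for image_id, features in images.items():
        for feature_id, weight in features.items():
            yield feature_id, (image_id, weight)


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum partial dot products over all features shared by an image pair."""
    edges = defaultdict(float)
    for postings in groups.values():
        for (a, wa), (b, wb) in combinations(sorted(postings), 2):
            edges[(a, b)] += wa * wb
    return dict(edges)


def build_image_graph(images):
    """Return {(image_a, image_b): similarity} for pairs sharing a feature."""
    return reduce_phase(shuffle(map_phase(images)))


# Toy usage: sparse feature vectors keyed by hypothetical feature ids.
images = {
    "a": {1: 1.0, 2: 2.0},
    "b": {2: 3.0, 3: 1.0},
    "c": {3: 2.0},
}
graph = build_image_graph(images)
```

Because only images that share at least one sparse feature ever meet in a reducer, the work scales with the posting-list lengths rather than with all image pairs. The resulting edge weights could then feed a clustering step such as Affinity Propagation.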
Attribute-Based People/Car Search in Consumer Photos and Surveillance Videos
- Investigating new perspectives for organizing people and cars, major objects of interest, in photos and videos.
- Beyond low-level representations, discovering more semantic descriptions for the media
- Devising effective (learning, clustering, etc.) algorithms for large-scale datasets
- Leveraging user-contributed data for collecting supervised training data
- Defining new applications for attribute-based search
Harnessing Social (User-contributed) Media for Annotation, Visualization, Learning, and Monetization
- Growing practice of online media (video/photo) sharing (e.g., Flickr, YouTube)
- Billion-scale volume, bringing profound impact to new applications and user scenarios
- Technologies have not kept pace with this growth, leaving open opportunities in search, mining, visualization, and other promising applications
- Great challenges ahead for efficiency, effectiveness, and scalability
Keyword-based Visual Search over Reranking Frameworks
- Improving text-based image/video search with multimodal similarities between documents across sources and domains
- Formulating the solution as a random-walk framework
- Requiring no query expansion, search examples, or pre-trained models
- First work to consider recurrent patterns – improving text-based search by up to 40%
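The random-walk reranking idea above can be sketched as follows. This is a generic PageRank-style formulation written for illustration, assuming a pairwise multimodal similarity matrix and initial text-search scores; it is not necessarily the exact formulation used in the project. The function name, the `alpha` damping parameter, and the toy values are hypothetical.

```python
def random_walk_rerank(similarity, text_scores, alpha=0.8, iterations=100):
    """Rerank documents by a random walk biased toward the text scores.

    similarity  -- n x n non-negative pairwise similarities
    text_scores -- n positive initial scores from text search
    alpha       -- weight of the walk versus the text prior (assumed value)
    """
    n = len(text_scores)
    total = sum(text_scores)
    prior = [s / total for s in text_scores]

    # Row-normalize similarities into transition probabilities; a document
    # with no neighbours jumps according to the text prior.
    transition = []
    for row in similarity:
        row_sum = sum(row)
        transition.append(
            [w / row_sum for w in row] if row_sum > 0 else prior[:]
        )

    scores = prior[:]
    for _ in range(iterations):
        scores = [
            alpha * sum(scores[i] * transition[i][j] for i in range(n))
            + (1 - alpha) * prior[j]
            for j in range(n)
        ]
    return scores


# Toy usage: documents 0-2 are visually similar (a recurrent pattern),
# document 3 is only weakly related; all start with equal text scores.
similarity = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
scores = random_walk_rerank(similarity, [1.0, 1.0, 1.0, 1.0])
```

In this toy setup the three mutually similar documents reinforce each other and end up ranked above the weakly connected one, without any query expansion, search examples, or pre-trained models.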
Keyword-based Visual Search over Large-Scale Concept Ontology
- Motivations
- Strong demand for semantic indexing and search over large-scale consumer photos, which generally lack reliable user-provided annotations
- Investigating the feasibility and challenges entailed by the new paradigm, concept search – retrieving visual objects by large-scale automatic concept detectors
- Focus
- Investigating effective concept mapping and selection methodologies over large-scale concept ontology
- Evaluating the quality and feasibility of the pre-trained concept detectors (e.g., LSCOM) when applied to cross-domain consumer data (i.e., Flickr photos)
- Investigating fusion strategies between automatic concepts and low-quality user-annotated data (tags)
- In preliminary experiments, we compared several concept search techniques and obtained promising results for searching consumer photos via automatic concept detectors.
Internet Video Advertisement
- Online image/video advertising, one of the problems in Internet monetization – converting Internet assets into money
- Associating relevant ads with (shared) videos and photos, not restricted to the text modality
- Considering user context and profiles
- Optimizing system revenues and contextual relevance
- Example system - MiRA AdVis (patent pending)
TREC Video Retrieval Evaluation (TRECVID)
Advanced Surveillance Platform
- Motivations
- Strong security/monitoring demands for national, community, and residential safety, or even elderly-care monitoring
- Proliferation of various sensors (e.g., video/PTZ camera, laser scanner, microphone array, etc.)
- Advances in research on semantic analysis and large-scale video retrieval
- Frequent technical inquiries from industry partners
- Team members with rich expertise
- Focus and Contributions
- Exploiting multiple sensors
- Semantic analysis
- Informative Visualization
- Effective Retrieval
- Joint projects with Prof. Bin-Yu Chen, Prof. Yung-Yu Chuang, Prof. Yi-Ping Hung, and Prof. Chieh-Chih Wang.
Large-Scale Cross-Domain Image/Video Near-Duplicate Detection
- The volume of images and videos available online grows exponentially
- Challenging problems due to various distortions caused by image editing, encoding, and occlusion, and to the large number of digital media sources
- Essential tools for topic tracking, visual search, content-based retrieval, and copyright infringement detection
- Requiring novel, efficient, and effective methods!!
Research Sponsors for MiRA
Past Projects
Ph.D. Thesis, "An Information-theoretic Framework towards Large-scale Video Structuring, Threading, and Retrieval," November 2006.
During my PhD study at Columbia University, I tested my proposed hypotheses through cross-site and cross-disciplinary projects with researchers at the IBM T. J. Watson Research Center led by John R. Smith. I was deeply involved in two research projects. The first is TRECVID, which has the goal of promoting progress in content-based video retrieval via open metric-based evaluation. The other is "Reconstructing and Mining of Semantic Threads across Multiple Video Broadcast News Sources Using Multi-Level Concept Modeling," funded by Advanced Research and Development Activity (ARDA), which encourages technology thrusts and sponsors high-risk, high-payoff research in information exploitation.
Since summer 2003, I have worked on the TRECVID video indexing and retrieval benchmarks with Columbia University and IBM T. J. Watson; our system ranked among the top systems at the time. I will continue research based on the benchmark.
Reconstruction and Mining of Semantic Threads across Multiple Video Broadcast News Sources
- Goal – acquiring open-source intelligence from the media
- Effective information exploitation (e.g., video story segmentation, video retrieval, threading, automatic annotation, etc.) from hundreds of international broadcast news video channels
- Funded by US security departments – encouraging technology thrusts
- Extensive experiments through cross-site and cross-disciplinary projects with IBM T. J. Watson Research Center and Columbia University
LSCOM - Large-Scale Concept Ontology for Multimedia
- Goal – bridging the semantic gap
- To support searching, filtering, mining, content-routing, personalization, and summarization
- On the scale of thousands of concepts (449 annotated)
- Annotated by 30+ Columbia & CMU students
- 374 Columbia detectors are publicly available online
- Defined by
- Intelligence community users
- Ontology specialists
- Multimedia analytics researchers
Topic Tracking for Cross-Domain News Videos with Visual Duplicates and Semantic Concepts
- Augmenting topic tracking with visual duplicates and semantic concepts, automatically detected from videos of distributed sources
- Presenting an information-theoretic analysis to assess the complexity of semantic topics and determine the best subset of concepts for tracking each topic
- Improving text-based tracking approaches by up to 25%; visual duplicates even outperform text-only approaches in certain topics
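One way to picture information-theoretic concept selection is to rank concepts by their mutual information with the topic label. The sketch below assumes binary concept detections and binary topic labels per story; mutual information is chosen here as an illustrative criterion, and the text above does not state which measure the analysis actually uses. Function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(concept_flags, topic_flags):
    """Mutual information (in bits) between two equal-length label lists."""
    n = len(concept_flags)
    joint = Counter(zip(concept_flags, topic_flags))
    concept_counts = Counter(concept_flags)
    topic_counts = Counter(topic_flags)
    mi = 0.0
    for (c, t), count in joint.items():
        p_joint = count / n
        p_indep = (concept_counts[c] / n) * (topic_counts[t] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def rank_concepts(detections, topic_flags, top_k):
    """Return the top_k concept names most informative about the topic."""
    ranked = sorted(
        detections,
        key=lambda name: mutual_information(detections[name], topic_flags),
        reverse=True,
    )
    return ranked[:top_k]


# Toy usage: four stories, two of which belong to the tracked topic.
topic = [1, 1, 0, 0]
detections = {
    "anchor": [1, 1, 0, 0],  # perfectly aligned with the topic
    "sky": [1, 0, 1, 0],     # statistically independent of the topic
}
best = rank_concepts(detections, topic, top_k=1)
```

A concept that fires independently of the topic carries zero bits and would be dropped, while a concept aligned with the topic is kept for tracking.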
Statistical Framework for Fusing Mid-Level Features for International Broadcast News Video Segmentation
- Investigating statistical approaches to induce and fuse diverse features from multiple levels and modalities, including visual, audio, and text, in international broadcast news videos
- Extending the Maximum Entropy model and inventing a novel feature wrapper
- Proposing novel features such as Mandarin syllable cue terms and significant pauses (pitch-related)
- One of the best systems in TRECVID 2003
Honors and Awards
- Microsoft Research Award 2009 in Multimedia Search
- Top-5 (adImage) in Microsoft Taiwan Imagine Cup 2008 with Kuan-Ting Chen and Wei-Shing Liao
- Nominated for Best Paper Award in ACM Multimedia 2006, the most prestigious multimedia conference
- Awarded ACM Multimedia 2006 Travel Grant.
- Named in the Watson Emerging Leaders in Multimedia Research Workshop 2006, organized by IBM Research to recognize (8) top senior PhD students in multimedia research. I delivered research presentations and had interesting discussions with IBM researchers on Oct. 16-17, 2006.
- Awarded the "Taiwan Elite Award," also called the "Taiwan Merit Scholarships (TMS) Program," issued by the Taiwan government in 12 selected research areas vital to future development, 2005. 68 students/post-docs were selected from 2100 applicants after screening and oral presentations (3.2% acceptance rate).