Rohan Wadhawan

Life is like a puzzle in progress. In the end, we all wish that the pieces come together into a masterpiece...

I am a Master's student in the Department of Computer Science at UCLA, advised by Prof. Violet Peng. I am currently a graduate student researcher and member of the PLUS Lab, which is part of the UCLA NLP group.

I specialize in developing next-generation chatbots, with a particular focus on multimodal chatbots. My research interests encompass Multimodal Learning, Generative AI, Large Language Models, Vision-Language Models, Alignment with Human Feedback, and Instruction Tuning.

Outside of UCLA, I actively collaborate with Prof. Tapan Gandhi, Professor in the Department of Electrical Engineering at the Indian Institute of Technology Delhi and Head of the Neurocomputing Lab. Before my graduate studies, I gained valuable experience as a Senior Software Engineer in Samsung R&D's Visual Intelligence Team.

I'm enthusiastic about discussing the transformative impact of generative AI on our world today and brainstorming innovative startup concepts that harness this knowledge. Let's connect and dive into these exciting discussions!

Get my Resume

Get in Touch


Upcoming Travel


News

  • Joining Abridge AI as an ML Science Intern for Summer 2024!
  • ConTextual is heading to ICML 2024 🎉
  • ConTextual is accepted for an oral presentation at the 3rd Vision Datasets Understanding (VDU) workshop at CVPR 2024!
  • ConTextual is accepted at the 1st Evaluation of Generative Foundation Models (EVGENFM) workshop at CVPR 2024!
  • Blog detailing the ConTextual dataset has been published in collaboration with HuggingFace. Thank you Clementine Fourrier for the support!
  • ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models preprint is available on arXiv. Thanks @AK for sharing the work!
  • Graduate Student Researcher, PLUS Lab, UCLA, advised by Prof. Violet Peng: June 2023 - Present.
  • Teaching Associate for Natural Language Processing - CS263 (Graduate) and CS162 (Undergraduate) courses, offered by the Dept. of Computer Science, UCLA: Spring 2024, Winter 2024.
  • Teaching Assistant for Data Structures and Object Orientation courses offered by the Dept. of Computer Science and Dept. of Mathematics, UCLA: Fall 2022, Winter 2023, Spring 2023.
  • AI Studio Tutor for the Break Through Tech AI program, offered by the Dept. of Computer Science at UCLA: Fall 2022.
  • Started my Graduate Studies in the Computer Science Department at University of California, Los Angeles (UCLA), September 2022.

Education

University of California, Los Angeles (UCLA)

Master of Science in Computer Science
Los Angeles, CA
Expected December 2024

GPA: 3.96/4.0

Coursework: Natural Language Generation, Natural Language Processing, Big Data Analytics, Large Scale Machine Learning, Reinforcement Learning, Human-AI Interaction, Neural Networks & Deep Learning

Graduate Student Researcher and PLUS Lab (UCLA NLP Group) member: Summer 2023 - Present

Teaching Positions:

  • Teaching Associate - Natural Language Processing: Spring 2024, Winter 2024.
  • Teaching Assistant - Data Structures and Object Orientation: Fall 2022, Winter 2023, Spring 2023.
  • AI Tutor - Break Through Tech AI: Fall 2022.

Master's Thesis title: Advancing Vision-Language Models: Benchmark Dataset, Instruction Tuning and Aligning with Human & AI Feedback - Work in Progress!



    Netaji Subhas Institute of Technology, University of Delhi

    Bachelor of Engineering in Computer Engineering
    New Delhi, India
    July 2020

    Admission: Secured rank 2520 in the all-India Joint Entrance Examination (JEE) Main, 2016
    Placed in the top 0.2% out of 1.2M candidates

    CGPA: 8.94/10.00 (1st division with Distinction, 89.4%)

    Relevant Coursework: Mathematics (Linear Algebra, Multivariate Calculus), Discrete Structures (Logic, Counting Principles, Probability, Graph Theory), Algorithms, Artificial Intelligence, Neural Networks, Big Data and Analytics

    Bachelor's Thesis title: Face Synthesis using Descriptions Extracted from Unstructured Text


    Research Experience

    University of California Los Angeles

    Graduate Student Researcher, Advised by Prof. Violet Peng (PLUS Lab)
    • ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
      • Established a novel benchmark comprising instructions designed explicitly to evaluate the context-sensitive reasoning of LMMs on the text and visual elements in text-rich images.
      • Performed exhaustive quantitative (human and automatic evaluation, such as GPT-4 eval) and qualitative analysis of 14 foundation models: closed-source LMMs (GPT-4V(ision), Gemini-Pro-Vision), open-source LMMs (LLaVA-Next, Instruct-Blip-2, etc.), and augmented LLMs (GPT-4 + layout OCR + image caption).
      • GPT-4V(ision), the best-performing LMM, trails human performance by 30.8%. Go to Publication
    • Reducing Hallucinations in Vision Language models via Preference Optimization (Work in Progress):
      • Analyzed the impact of preference optimization techniques (PPO, DPO, KTO), data and model scale, and hyperparameters on reducing hallucinations while maintaining informativeness.
      • Addressed the limitations of current preference data by synthetically creating high-quality negative samples.
      • Fine-tuned LLaVA using PEFT (LoRA) and evaluated performance on 10 informativeness and hallucination benchmarks.
    • Skills used: PyTorch, HuggingFace, Lightning AI, WandB, OpenAI, Amazon MTurk, Prompt Engineering
    Los Angeles, CA, USA
    June 2023 - Present

    Indian Institute of Technology, Delhi (IIT D)

    Research Assistant, Advised by Prof. Tapan K. Gandhi (Neurocomputing Lab)
    • Architected a human-inspired, landmark-aware ensemble Facial Expression Recognition Network that improved the current benchmark on the CK+ and JAFFE datasets by 0.51% and 5.34%, respectively, and required only 3.28 MFLOPs for inference. Go to Publication
    • Invented a spatio-temporal deep learning pipeline for water stress phenotyping of chickpea plants that achieved ceiling-level classification performance of 98.52% on the JG-62 and 97.78% on the Pusa-372 chickpea plant shoot image datasets. It outperformed the best reported time-invariant technique by at least 14% and was robust to noisy input, with a dip of less than 2.5% in average model accuracy and a small standard deviation. Go to Publication
    • Designed and carried out neural network simulations for multiple lab projects on physiological signal processing.
    • Skills used: Keras, TensorFlow, PyTorch, MATLAB, OpenCV, Google Colab, Overleaf
    New Delhi, India
    July 2020 - January 2021

    Netaji Subhas Institute of Technology, University of Delhi

    Undergraduate Researcher, Department of Computer Science & Engineering
    New Delhi, India
    August 2018 - July 2020

    Work Experience

    Samsung R&D Institute

    Senior Software Engineer, Visual Intelligence Team
    • Enhanced the tone and saturation of images rendered by the Expert Raw Application by integrating a new deep learning module into its pipeline on flagship models (S22).
    • Commercialized 5 camera solutions, such as VideoStabilization, Hyperlapse, and Single Take, on more than 10 smartphones and tablets, including the S22, A22e, M53, and Tab S8.
    • Developed a toolkit for fine-grained latency analysis of Samsung's proprietary interface layer in the camera software stack.
    • Skills used: C++, Python, Computer Vision, Deep Learning, Android, Version Control, Object Orientation
    Bengaluru, India
    January 2021 - June 2022
    Software Engineer Intern, Visual Intelligence Team
    • Programmed an Android application that automated testing of the camera module and reduced its testing time by 90%.
    • Skills used: Android, Java
    Bengaluru, India
    May 2019 - July 2019

    Nable IT Consultancy Services, a computer vision startup

    Software Engineer - Artificial Intelligence Intern
    • Improved the edge-based facial recognition system's accuracy and efficiency and made it agnostic to facial sizes.
    • Modularized the facial recognition pipeline and developed various standalone facial recognition applications on top of it.
    • Skills used: Python, Computer Vision, Deep Learning, Databases, Object Orientation, Project Management
    New Delhi, India
    January 2020 - March 2020

    Teaching Experience

    University of California, Los Angeles (UCLA)

    Teaching Associate - CS263: Natural Language Processing (Graduate)
    • Facilitating regular hands-on sessions and contributing to the development of assignments and examinations for a cohort of 150 graduate students.
    Los Angeles, CA
    April 2024 - June 2024
    Teaching Associate - CS162: Natural Language Processing
    • Spearheaded the creation and ongoing management of the Fact or Flawed? course project, aimed at evaluating the factuality and fairness of Large Language Models such as LLaMA-2, Phi-2, and Gemma, for a cohort of 130 undergraduate students.
    • Facilitating regular hands-on sessions and contributing to the development of assignments and examinations.
    January 2024 - March 2024
    Teaching Assistant - Data Structures and Object Orientation - CS 32, PIC 10A
    • Conducted weekly hands-on sessions on Data Structures and Object Orientation in C++ for 150 undergraduate students.
    • Created an automation script that significantly reduced grading errors and the workload of teaching assistants.
    September 2022 - June 2023
    AI Studio Tutor - Break Through Tech AI, Computer Science Department
    • Mentored 26 students belonging to underrepresented groups in tech.
    • Delivered workshops on data analysis, modeling & evaluation, monitored team progress, and helped teams complete their respective ML challenges.
    September 2022 - December 2022

    Projects

    Empirical analysis of pruning strategies in Federated Learning

    Large Scale Machine Learning Course Project
    • Investigated the effect of pruning on model generalization and on the accuracy vs. efficiency tradeoff in a federated setup.
    • Proposed a protocol that, when applied to the Iterative Magnitude Pruning method, achieved an improvement of 6% on FMNIST and 8% on MNIST for a 99% prune ratio compared with one iteration of standard federated training.
    • Skills used: Python, PyTorch, Google Colab, LaTeX, Overleaf
    Los Angeles, CA, United States
    October 2022 - December 2022

    Spotify music recommendation using Reinforcement Learning

    Reinforcement Learning Course Project
    • Simulated Spotify music recommendations by framing skip prediction on Spotify streaming data as a reinforcement learning problem.
    • Employed the Deep Deterministic Policy Gradient framework and offline reinforcement learning techniques to recommend a diverse list of songs that reduced the average skip rate by 12% from the baseline. Go to Project Page
    • Skills used: Python, TensorFlow, Google Colab, Git
    Los Angeles, CA, United States
    October 2022 - December 2022

    Face Synthesis using Descriptions Extracted from Unstructured Text

    Undergraduate Head Researcher - Bachelor’s Thesis
    • Invented a novel pipeline to generate faces from their corresponding textual description. The motivation was to augment the reading experience for young children, especially those with reading difficulty, by animating characters through facial cues. Go to Publication
    • Developed a crowdsourcing platform and consolidated our Multi-Attributed and Structured Text-to-face (MAST) dataset consisting of structured textual descriptions for face images.
    • Performed text classification to extract descriptive sentences from a consolidated corpus of the Gutenberg and Face2Text datasets using a Bi-LSTM with an attention mechanism, achieving 98.5% accuracy and a 0.97 F1 score on the test set.
    • Devised an algorithm for fast transformation of an unstructured facial description to a structured one; it has linear complexity with respect to the number of words in the sentence.
    • Trained an Attentional Generative Adversarial Network to synthesize faces from structured descriptions and reported benchmark scores of 54.09 Fréchet Inception Distance, 1.080 Facial Semantic Distance, and 60.42% Facial Semantic Similarity on our MAST dataset.
    • Skills used: Keras, TensorFlow, PyTorch, OpenCV, Google Colab, Microsoft Cognitive Service, Angular Framework, MongoDB, NodeJS, Heroku, Overleaf
    August 2019 - July 2020

    Project ViSTARa - India Winner UN Reboot the Earth Hackathon

    United Nations Technology Innovation Labs, India
    • Designed a web-based Learning Management platform to educate rural women through India's Self-Help Group network and empower them to lead green climate initiatives. Coverage
    • Devised a recommender system to suggest crops that require minimum supplementary irrigation based on rainfall patterns and employed predictive analytics to detect deforestation at the district level in India.
    August 2019

    GRiD Flipkart Machine learning challenge for Large Scale Object Localization

    Flipkart, Bangalore
    • Architected a ResNet-34 inspired model to perform object localization on Flipkart’s large and diverse items dataset.
    • Trained the model on images of size 128x96, downscaled from originals ranging from VGA to 1 MP camera resolution.
    • Achieved an IoU score of 90.05% on the private test set.
    • Skills used: Keras, TensorFlow, OpenCV, Google Colab.
    January 2019 - March 2019

    Skillset recommendation system to aid engineering aspirants in securing an Internship

    Undergraduate researcher - Soft Computing Semester Project
    • Proposed a skill set recommender system to aid engineering aspirants in securing an internship. Go to Project Page
    • Consolidated a small dataset of various skills an aspiring intern may have and categorized them as generic, company-specific, and domain-specific.
    • Modeled the skill selection problem as a combinatorial optimization problem with multiple objectives and employed a genetic algorithm (GA) to solve it.
    • Established a Fitness Function to evaluate the fitness of each individual in the chromosome population.
    • Formulated an Objective Function to combine opposing goals of finding the best skillset while minimizing the time to achieve it.
    • Implemented a modular GA pipeline in C++ to evaluate and select the optimum set from the possible combinations of GA operations: population initialization, parent selection, crossover, mutation, survivor selection, and termination.
    • Skills used: C++.
    August 2018 - December 2018

    Game Playing Agents - Tic Tac Toe AI

    Undergraduate researcher - Course Project
    • Simulated adversarial games between AI agents on a 3x3 and a 4x4 Tic Tac Toe board. Go to Project Page
    • Observed that a 1-move lookahead provided the best tradeoff between win-draw-loss ratio and time to decide the optimum move, irrespective of board size.
    • Skills used: Python.
    October 2018 - November 2018

    Book My Flight - Database Management Semester Project

    Department of Computer Engineering, Netaji Subhas Institute of Technology, an affiliate of Delhi University
    • Implemented a Flight Booking Management system for domestic flights in India.
    • Modeled a MySQL database system with a complex database trigger and recovery mechanism.
    • Designed a Java-based user interface. Go to Project Page
    • Skills used: Java, MySQL, ERDPlus.
    August 2017 - December 2017

    Publications

    * Co-First Authors


    Awards & Honors


    Blogs

    HuggingFace 🤗

    • Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

      Go to HuggingFace Article

    Byte-size Information to Chew on

    Paper Synopsis Blog Series

    • Synopsis: Multi-Attributed and Structured Text-to-Face Synthesis

      Go to Medium Article

    • Synopsis: Intelligent Monitoring of Stress Induced by Water Deficiency in Plants Using Deep Learning

      Go to Medium Article

    • VQ-GAN & Transformer — Taming Transformers for High-Resolution Image Synthesis: Synopsis

      Go to Medium Article

    Project Blogs


    Skills

    Programming Languages
    Research Tools and Frameworks
    Development Tools and Frameworks
    Cloud Platforms

    Hobbies


    Contact Me

    Email:
    rohanwadhawan7[AT]gmail[DOT]com

    Academic Email:
    rwadhawan7[AT]g[DOT]ucla[DOT]edu

    LinkedIn:

    GitHub: