Ratul Ghosh

Grad Student in CS | Researcher | Former Applied Scientist at Amazon & Société Générale

University of California, Irvine

About Me

I am currently a graduate student in Computer Science at the Donald Bren School of Information and Computer Sciences, University of California, Irvine, advised by Prof. Sameer Singh.

Previously, I worked on the Innovation team at Société Générale, where my work involved developing business-transforming product suites focused on automatic KYC remediation, context-based information extraction from scanned documents, knowledge graphs, trade finance automation, semantic search, and multilingual chatbots using AI.

Before that, I received my Bachelor's degree in Electronics and Communication from the Indian Institute of Information Technology, Allahabad.

During my bachelor's studies, I was fortunate to work with some amazing mentors in different research labs. I worked with Prof. Kuntal Ghosh and Prof. Sanjit Maitra at the Center for Soft Computing Research of the Indian Statistical Institute. I did my bachelor's thesis at TCS Innovation Labs, working with Dr. Brojeshwar Bhowmick.

Apart from my academic endeavours, I like playing video games, sketching, cooking, and reading about economics and history.


  • Deep Learning
  • Natural Language Processing
  • Computer Vision
  • Software Development
  • Cooking
  • Photography
  • Sketching


  • Master's degree in Computer Science, 2022

    University of California, Irvine

  • BTech in Electronics and Communication, 2018

    Indian Institute of Information Technology, Allahabad



Applied Scientist Intern


Jun 2021 – Aug 2021 Seattle, Washington

  • Semi-Supervised learning for Risky Vendor Identification.

Graduate Student Researcher

Donald Bren School of Information and Computer Sciences

Nov 2020 – Present Irvine, California

  • Advisor: Prof. Sameer Singh
  • Working on multimodal attribute extraction.
  • Scientific claims and facts verification.
  • UCI Machine Learning Repository.

Data Scientist

Société Générale

Aug 2018 – Sep 2020 Bangalore & Paris

  • Worked on the creation of a chatbot (MAX) that gives customers a more natural way of interacting with information systems. The product won the NASSCOM Top 50 AI Game Changers award.
  • Developed and deployed the Insta KYC solution in France, Romania, and Serbia. Automating the KYC remediation process with machine learning saved 100 FTE and reduced the remediation time from 20 minutes per client (manual) to 40 seconds (automated).
  • Developed a language-agnostic search tool for the extraction of business knowledge trapped inside thousands of documents.
  • Worked on the automation of loan document verification for sub-Saharan Africa. Automating the entire process with machine learning reduced the verification time from 2 weeks per client (manual) to around 30 minutes.
  • Worked on automation of trade finance verification.

Research Intern

TCS Research & Innovation

Jan 2018 – Jul 2018 Bangalore & Kolkata
Worked on Content-Based Image Retrieval and Clustering for Collaborative SLAM at the Machine Vision division of TCS Innovation Lab. The research was published at the IEEE Winter Conference on Applications of Computer Vision (WACV) and led to a patent.

Software Developer (Machine Learning) Intern

ThirdEye Data

May 2017 – Aug 2017 Kolkata, India

  • Worked on a cloud-based ETL and Data Analytics system.
  • Deployed image classification and image retrieval models using TensorFlow, along with data transformations from sources such as S3 and Redshift, using AWS.

Project Reviewer and Mentor


Mar 2017 – Present Remote

  • I review Machine Learning, Deep Learning, Artificial Intelligence, AI for Trading, Data Engineering, and Data Analyst projects for Udacity Nanodegree programs, giving students specific and actionable feedback.
  • Technologies & Algorithms: PyTorch, Keras, TensorFlow, Search, Planning, Constraint Satisfaction Problems, Advanced Game Playing, Python, scikit-learn, NumPy, AWS, Neural Networks, RNN, CNN, GAN, Kafka, Docker, Cassandra, etc.
  • Review and test content for new Nanodegree programs.

Data Science Intern

Busigence Technologies

Dec 2016 – Feb 2017 Gurgaon, India

  • Designed and implemented applications to support distributed processing and transformation of raw data using Apache Spark. Performance improved 10x over the previous solution.
  • Used Spark MLlib and TensorFlow for performing machine learning and associated tasks on massive datasets.

Recent Posts

Demystifying Confidence Interval and Margin of Error

What does saying "I'm 95% confident" really mean statistically? I tried to find out more about the Confidence Interval and Margin of Error, but they seem to be quite hard to understand for someone without in-depth knowledge of statistics. The challenge is most of the...

What distinguishes a neural network that generalizes well from those that don't?

The results look quite interesting as the model can perfectly fit the noisy Gaussian samples. It also perfectly fits the training data with completely random labels although it takes a bit more time. This shows that a deep neural network with enough parameters could completely memorize some random inputs.

Using Deep Learning to Analyse Movie Posters for Gender Bias

Co-authors: Aiyaz Miran, Mohammad Shahebaz. It has been historically documented how pop culture and societal norms play off each other, often mirroring the behavioral trends in the population at large.

Hypothesis Testing along with Type I & Type II Errors explained simply

How do you select the right test for an experiment and make a decision based on statistical evidence?


I am a project mentor and reviewer for the following Nanodegrees at Udacity:

Served as the coordinator of my institute's technical society. In this role, I conducted special programming tutorial classes and workshops for junior batches.

Was part of the IEEE IIITA Student Chapter, where my responsibility was to coordinate all technical publications and event reports.


Young Data Scientist Challenge

3rd rank in the Young Data Scientist Challenge organized by ZS Associates.


1st global rank in the Machine Learning hackathon organized by Société Générale.

Indeed Machine Learning CodeSprint

10th rank in the international-level machine learning contest organised by Indeed.

WalmartLabs CodeSprint

21st rank in the international-level machine learning contest organised by WalmartLabs.


Won second prize in the hackathon organized by the Indian Institute of Science and Strand Life Sciences in March 2016.



Gender Bias in Posters

Using Deep Learning to Analyse Movie Posters for Gender Bias.


InstaKYC is an award-winning, deep-learning-based platform designed to transform and automate KYC processes.

Loan Document Verification

Using Machine Learning to automate loan document verification.


A conversational UX that gives customers a more natural way of interacting with information systems.

Movie Stream

Python script for streaming movies directly from a torrent to your VLC media player.


An AutoML platform for conducting AI/ML-related experiments.

Security & Surveillance

IoT-based general-purpose security and surveillance system for homes, offices, traffic, etc.


Language-agnostic search tool for private documents.


A novel approach to conserving electricity using a smart grid.


Using Machine Learning to automate trade finance verification. The solution involves extracting critical information from documents such as bills of lading and letters of credit, and making sure those documents comply with ICC UCP and ISBP rules.


Android app for the annual cultural and technical festival of IIIT Allahabad.

Image Segmentation using CNN

Segmentation of White Blood Cells and Brachial Plexus from Ultrasound Images using a CNN-based U-Net architecture.

Mutation Level Prediction

Predicting the level of a mutation using machine learning, indicating whether it is Benign, Likely Pathogenic, Pathogenic, or something else. The dataset was provided by Strand Life Sciences.


(2019). Applications of Deep Learning in Medical Imaging. In Handbook of Deep Learning Applications by Springer.

(2019). Deep representation learning characterized by inter-class separation for image clustering. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

(2017). Automatic detection and classification of diabetic retinopathy stages using CNN. In 2017 4th International Conference on Signal Processing and Integrated Networks (SPIN).


Please feel free to reach out to me with any queries.