Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

Precomputed Embeddings vs. Real-Time Retrieval (RAG)

2 minute read

Published: March 01, 2025

Large Language Models (LLMs) rely on efficient retrieval strategies to generate accurate, context-aware responses. The two primary approaches are precomputed embeddings and real-time retrieval (RAG).

Fine-Tune GenAI Models

7 minute read

Published: February 27, 2025

Fine-tuning Generative AI (GenAI) models allows us to adapt pre-trained models for specific tasks, styles, or datasets while maintaining efficiency. Instead of training large models from scratch, fine-tuning enables customization with lower computational costs and faster adaptation to new domains.

GenAI Models Quality Evaluations: Text and Image

4 minute read

Published: February 23, 2025

Evaluating Generative AI (GenAI) models is challenging due to their complex and diverse outputs across different modalities (text, image, and multimodal generation). Unlike traditional supervised learning models, where direct comparison with ground truth labels is feasible, GenAI models often require implicit evaluation techniques to assess quality, coherence, and usability.

From Text Transformer to Vision Transformer Model

4 minute read

Published: February 21, 2025

Transformer models have revolutionized large language models (LLMs) and are now widely used across multimodal AI applications, including text generation, conversational AI, and vision-based models. These models have set a new standard for natural language understanding, reasoning, and content generation by leveraging attention mechanisms to capture long-range dependencies and contextual relationships.

Delve into the Attention Mechanism

6 minute read

Published: February 15, 2025

The Attention Mechanism is the core idea behind modern large language models (LLMs). It allows models to focus on important words in a sentence while ignoring irrelevant details.

Understanding K-Means Clustering: A Step-by-Step Guide

2 minute read

Published: February 01, 2025

K-Means is an unsupervised learning algorithm used for clustering data points into groups based on similarity. It is widely used in data segmentation, customer profiling, and image compression.

Simple Decision Trees and Their Implementation

4 minute read

Published: January 30, 2025

A decision tree is a machine learning model that makes decisions by recursively splitting data based on the best feature, forming a tree-like structure.

Reinforcement Learning with the Snake Game

4 minute read

Published: January 24, 2025

Reinforcement Learning (RL) is like training a pet: the agent (learner) explores its environment, takes actions, and gets rewards or penalties. Over time, it learns which actions lead to better rewards and avoids actions that result in penalties.

Classification Evaluation: ROC and AUC Calculation

3 minute read

Published: January 23, 2025

ROC AUC is a key evaluation metric for binary classification models. It measures how well a model distinguishes between positive and negative classes.

Naive Bayes Theory and Example on Spam Email Detection

6 minute read

Published: January 20, 2025

In this blog post, we’ll explore Naive Bayes, a simple yet powerful algorithm used for classification tasks like spam detection. We’ll break down the theory, provide intuitive examples, and show you how to implement it from scratch in Python. Whether you’re new to machine learning or preparing for an interview, this guide will help you understand Naive Bayes in a simple and concise way.

Understand the Poisson Distribution

3 minute read

Published: January 13, 2025

The Poisson probability distribution models the number of times an event happens in a fixed interval of time or space. It is useful when events occur independently and at a constant average rate. This distribution is widely applied in areas like call centers, traffic flow, biology, and machine learning.

Understanding Stochastic Gradient Descent (SGD)

2 minute read

Published: January 11, 2025

Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning and deep learning to minimize the loss function. Unlike standard Gradient Descent, which computes the gradient using the entire dataset, SGD updates the model one sample at a time, making it more efficient for large datasets.

Selection of the Loss Functions for Logistic Regression

5 minute read

Published: January 10, 2025

When training a machine learning model, choosing the right loss function is critical to ensuring effective learning. In linear regression, Mean Squared Error (MSE) is commonly used, but for logistic regression, it becomes problematic.

Interviewing and Networking Tips for MLE New Grads

6 minute read

Published: October 23, 2020

Even as a Ph.D. candidate with three internships and closely related research experience, I found that job hunting took longer than I expected, especially during the pandemic.

portfolio

A cross-library image augmentation module for Deep Learning Training

Published: April 26, 2025

A versatile image augmentation framework incorporating 300+ operations from 8 popular libraries

Feature Reduction to Classifiers

Published: April 01, 2016

Performance study of PCA, LDA, and their kernel versions as feature reduction for SVM, ML, KNN, and GMM classifiers

Compressive Image Recovery

Published: December 14, 2019

Low-cost, high-efficiency seismic image recovery and optimal sampling recommendation

Deep Eraser

Published: January 20, 2021

An object-oriented “eraser” for images and videos

Pick-up Drop-off Design

Published: September 14, 2021

Use reinforcement learning to design a route for a delivery person

Zero-human-effort Segmentation

Published: October 01, 2022

A fully automatic, iterative deep learning framework for cell segmentation with noisy labels

Pixel Translator

Published: April 26, 2025

Convert grayscale border/vein images to RGB leaf images using a cGAN

Hierarchical Spatial Pattern Analysis on Neuronal Neighborhood

Published: April 26, 2025

A robust method to detect & profile injury-caused alterations to brain tissue at the multi-cellular scale

Large Scale Image Registration

Published: April 26, 2025

Accelerated large-scale image alignment by 10× with uniform keypoint control and multiprocessing

Multiplex Channels Denoising and Deblurring by Wavelet Transform

Published: April 26, 2025

Wavelet analysis for recovering useful information from damage such as noise and blur

publications

A Simplified Normalization Operation for Perfect Reconstruction from a Modified STFT

Published in IEEE International Conference on Signal Processing (ICSP), 2014

An improved version of the short-time Fourier transform (STFT).

Phasetime: Deep Learning Approach to Detect Nuclei in Time Lapse Phase Images

Published in Journal of Clinical Medicine, 2019

A Mask R-CNN approach to nuclei segmentation in time-lapse phase images in nanowells.

Attenuating Random Noise in Seismic Data by a Deep Learning Approach

Published in arXiv preprint, 2019

Attenuating Gaussian noise with residual neural networks.

Swell-noise attenuation: A deep learning approach.

Published in The Leading Edge, 2019

The full manuscript on swell-noise attenuation with residual neural networks.

Seismic Compressive Sensing by Generative Inpainting Network: Toward An Optimized Acquisition Survey

Published in The Leading Edge, 2019

The full manuscript on compressive image recovery and non-uniform sampling recommendation from my summer internship project at Anadarko.

Generative Inpainting Network Applications on Seismic Image Compression and Non-Uniform Sampling

Published in Workshop on Neural Information Processing Systems (NIPS): Solving Inverse Problems with Deep Networks, 2019

Preliminary results on compressive image recovery and non-uniform sampling recommendation from my summer internship project at Anadarko.

Few Is Enough: Task-Augmented Active Meta-Learning for Brain Cell Classification

Published in Medical Image Computing and Computer Assisted Intervention (MICCAI), 2020

An active meta-learning approach to cell classification using very little training data.

Comprehensive Cell Phenotyping Method for Whole-Brain Tissue Mapping Using Highly Multiplexed Immunofluorescence Imaging

Published in Nature Communications, 2021

Our lab’s complete pipeline for whole-brain analysis, including my main thesis topic of cell nuclei segmentation.

ARIA: Adversarially Robust Image Attribution for Content Provenance

Published in CVPR, 2022

Internship mentoring project at Adobe related to the Coalition for Content Provenance and Authenticity (C2PA) initiative, led by Adobe, which is establishing an industry standard to address misleading information.

Rebecca Li
