Mastering Backtracking for LeetCode Success
1192 words ~6 mins

#Computer Science #Algorithms #Leetcode
Goal of Today’s Post When you’re done reading this blog post, my goal is for you to: leave with a generic scaffold for setting up Backtracking problems, have a framework to reason about most Backtracking problems, and work through 5 Backtracking examples (Just give me the examples!). Introduction At the time of this writing, I find myself at 450+ LeetCode problems solved. I tricked myself into liking the grind! It’s bad.
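A minimal sketch of the kind of generic backtracking scaffold the post builds toward, shown here on a simple subset-enumeration example (the function and example are illustrative, not the post's own code):

```python
from typing import List


def subsets(nums: List[int]) -> List[List[int]]:
    """Illustrative backtracking scaffold: enumerate every subset of nums."""
    results: List[List[int]] = []
    path: List[int] = []

    def backtrack(start: int) -> None:
        # Record the current partial solution.
        results.append(path.copy())
        # Try each remaining choice, recurse, then undo it (backtrack).
        for i in range(start, len(nums)):
            path.append(nums[i])   # choose
            backtrack(i + 1)       # explore
            path.pop()             # un-choose

    backtrack(0)
    return results


print(subsets([1, 2, 3]))  # all 8 subsets of [1, 2, 3]
```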

What are Transformers - Understanding the Architecture End-to-End
5828 words ~28 mins

#Machine Learning
Alejandro Armas and Sachin Loechler have been hard at work on a project that involves developing streaming workloads. The project’s goal is to process real-time data and support real-time traffic prediction! In order to make sense of the enormous quantity of unstructured video data, we employed foundational models that perform video tracking, bounding-box detection, depth estimation, and segmentation to extract information from the video. Many of these foundational models rely on an artificial neural network architecture called the Transformer.

Authenticating Data for Experimentation Environment
953 words ~5 mins

#Programming #Data Engineering
Introduction In this post, I will explain how I streamlined team decision-making by building a cloud experimentation solution, secured it with AWS IAM roles and policies, and then optimized dataset network transmissions by 35x. We will explore how to leverage Terraform, a popular Infrastructure as Code (IaC) tool, to automate the setup and management of AWS IAM, walking through creating IAM roles, policies, and users, and demonstrating how to attach policies to these entities.
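The post itself automates this with Terraform; as a rough sketch of the same IAM operations in Python with boto3 (all role names, policy contents, and ARNs below are hypothetical):

```python
import json

import boto3  # assumes AWS credentials are already configured

iam = boto3.client("iam")

# Trust policy letting EC2 assume the role (illustrative only).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Read-only access to a hypothetical experimentation bucket.
s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-experimentation-bucket",
            "arn:aws:s3:::example-experimentation-bucket/*",
        ],
    }],
}

# Create the role, create the policy, then attach one to the other.
iam.create_role(
    RoleName="experimentation-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
policy = iam.create_policy(
    PolicyName="experimentation-s3-read",
    PolicyDocument=json.dumps(s3_read_policy),
)
iam.attach_role_policy(
    RoleName="experimentation-role",
    PolicyArn=policy["Policy"]["Arn"],
)
```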

Driving a Data Product - Uncovering Insights and Laying out Assumptions with Exploratory Data Analysis
2142 words ~11 mins

#Programming #Data Engineering
Alejandro Armas and Sachin Loechler have been hard at work on a project that involves developing streaming workloads. The project’s goal is to process real-time data and support real-time traffic prediction! However, before I was able to begin with that, I had to demonstrate the viability of this initiative. It was critical to communicate and achieve consensus on my understanding of the data with the team. In addition to learning about the data, I tested the hypotheses I had and laid out my assumptions.

Enabling a Reproducible Data Experimentation Environment
1987 words ~10 mins

#Programming #Data Engineering
Introduction In this post, I will explain how I streamlined team decision-making by building a cloud experimentation solution, secured by AWS IAM roles and policies, and then optimized dataset network transmissions by 35x. I prioritized building this tool because a streamlined process was critical for Sachin and me to achieve consensus on datasets and features for an ML model. Exploratory Data Analysis is a process very different from software engineering: it involves lots of trial and error, so reproducibility was top of mind.

Getting Started with PyFlink: My Local Development Experience
1823 words ~9 mins

#Programming #Data Engineering
Background A hobby project I am working on involves developing streaming workloads. We want to process real-time data and support traffic prediction! As often happens at the start of tool adoption, especially when working in a multi-tool ecosystem, I found myself at a familiar roadblock: as the engineer responsible for creating the streaming workloads, I was having a hard time weighing the tradeoffs of which language to use for our data pipeline’s tooling.

Winning 3rd place at MLOPS LLM Hackathon: Question & Answer for MLOps System
773 words ~4 mins

#Programming
This post describes the experience of team RedisCovering LLMs as we developed a Question & Answer system specialized in MLOps community Slack discussions, armed with GPT-3.5 for precise answers and verifiable references to Slack threads, guarding against misinformation. 1. Introduction Last weekend, I had the opportunity to participate in a 12-hour hackathon organized by the San Francisco Bay Area MLOps Community. It was my third hackathon experience, and the first one I attended through the MLOps Community.

Unveiling Dimensionality Reduction - A Beginner's Guide to Principal Component Analysis
2139 words ~11 mins

#Probability #Mathematics
Introduction Imagine for a second you were transplanted onto Olvera Street in LA. It’s a Tuesday, but today is a little different. There’s a spark in the air. You’re not quite sure what to make of it, but you know that today, something great is going to happen. You walk around aimlessly for a while, until your mind begins to get distracted by a huge sense of hunger. “Dang – if only I could have some tacos”, you think to yourself.

What is the Difference Between Covariance and Correlation?
738 words ~4 mins

#Probability #Mathematics
Working with data will almost always begin with a data exploration phase. We listen to its heartbeat and ask lots of questions. As we begin this phase, one might ask themselves ‘what are the tools we can leverage?’. How do we define a linear measure of the relationship between two random variables? In other words, how do we measure the amount of ‘increasing X increases Y’-ness, or ‘increasing X decreases Y’-ness, in a joint probability distribution?
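As a quick preview of the answer the post develops (standard definitions, not quoted from it): covariance captures linear co-movement, and correlation is its scale-free normalization,

$$\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big], \qquad \rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1].$$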

Unlocking the Power of Joint Distributions - How to Analyze Multiple Random Variables
1124 words ~6 mins

#Probability #Mathematics
The concept of a joint distribution is useful when studying the outcomes and effects of multiple random variables in statistics. Joint distributions allow us to generalize probability theory to the multivariate case. Let me paint a story for you. Joint Distributions Today, the weather is nice. It’s a fresh summer morning. You’re out at a restaurant having breakfast with your in-laws and you want to impress. You’re such a nice person, you think to yourself.
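For reference, the standard discrete-case definitions (my notation, not necessarily the post’s): the joint PMF of two random variables, and the marginal recovered from it by summing out the other variable, are

$$p_{X,Y}(x, y) = P(X = x,\, Y = y), \qquad p_X(x) = \sum_{y} p_{X,Y}(x, y).$$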