I'm David Teather 👋

Engineer @ CrowdStrike

Ever evolving as a software engineer

Popular Repos (GitHub stars)

YouTube Tutorials (views)

LinkedIn Courses (learners)

Recent Blog Posts

Adding an Astro Search Bar 7/26/2024

I recently reworked my portfolio website to focus more heavily on blogs, which included adding a search bar that searches across my site. In this blog post I'll walk you through how you can add your own! If you're just interested in the code, here it is as a template website.

Adding Search

For search, I wanted a locally powered search engine so I don't have to pay for or rely on third-party APIs; my site only has a handful of pages and is small enough to handle in-browser search quickly. To make our lives easier we'll be using astro-pagefind, which uses pagefind as its search engine.

Installing

First, install the package into your Astro project. After that's installed, we have to add the integration to our Astro config. Next, let's make a basic search page; that's all you need for the most basic search functionality. Finally, we need to generate the search index that pagefind will use to efficiently query your site. You can do this manually, but I modified my package.json to do it automatically for me.

:::note
You might get some TypeScript warnings, since pagefind might have some unused variables.
:::

Now if you go to your search page at http://localhost:4321/search you should be able to use the search bar.

[Image: Basic Search]

I wanted to change the component's functionality a little bit, and by following this blog I added the search query as a URL parameter and made the input focus when the page loads. Let's make a new Astro component, slightly modified from the one we're already using, that implements these features. The changed/new code is highlighted.

```astro title="components/AstroSearch.astro" {48-53, 57, 63-70}
---
import "@pagefind/default-ui/css/ui.css";

export interface Props {
  readonly id?: string;
  readonly className?: string;
  readonly query?: string;
  readonly uiOptions?: Record<string, any>;
}

const { id, className, query, uiOptions = {} } = Astro.props;
const bundlePath = `${import.meta.env.BASE_URL}pagefind/`;
---

<div
  id={id}
  class:list={[className, "pagefind-init"]}
  data-pagefind-ui
  data-bundle-path={bundlePath}
  data-query={query}
  data-ui-options={JSON.stringify(uiOptions)}
>
</div>

<script>
  // @ts-ignore
  import { PagefindUI } from "@pagefind/default-ui";

  function initPageFind() {
    const allSelector = "[data-pagefind-ui]";
    for (const el of document.querySelectorAll(
      `${allSelector}.pagefind-init`
    )) {
      const elSelector = [
        ...(el.id ? [`#${el.id}`] : []),
        ...[...el.classList.values()].map((c) => `.${c}`),
        allSelector,
      ].join("");
      const bundlePath = el.getAttribute("data-bundle-path");
      const opts = JSON.parse(el.getAttribute("data-ui-options") ?? "{}");
      new PagefindUI({
        ...opts,
        element: elSelector,
        bundlePath,
      });
      el.classList.remove("pagefind-init");

      let query = el.getAttribute("data-query");

      // Check if the current URL has any query params
      const url = new URL(window.location.href);
      const params = new URLSearchParams(url.search);
      if (params.has("q")) {
        query = params.get("q");
      }

      const input = el.querySelector<HTMLInputElement>(`input[type="text"]`);
      input?.focus();

      if (input) {
        input.value = query ?? "";
        input.dispatchEvent(new Event("input", { bubbles: true }));

        // Add a listener to update the URL when the input changes
        input.addEventListener("input", (e) => {
          const input = e.target as HTMLInputElement;
          const url = new URL(window.location.href);
          const params = new URLSearchParams(url.search);
          params.set("q", input.value);
          window.history.replaceState({}, "", `${url.pathname}?${params}`);
        });
      }
    }
  }

  document.addEventListener("astro:page-load", initPageFind);

  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", initPageFind);
  } else {
    initPageFind();
  }
</script>
```

Then, to use our new custom search component in pages/search.astro, we just import our custom component instead of the pre-defined one.
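To give a concrete picture, the search page ends up being a thin wrapper around that component. This is a minimal sketch: the Layout component and its path are assumptions rather than files from the template, and uiOptions is simply forwarded to PagefindUI.

```astro
---
// pages/search.astro (minimal sketch; the Layout import is an assumed placeholder)
import Layout from "../layouts/Layout.astro";
import AstroSearch from "../components/AstroSearch.astro";
---

<Layout title="Search">
  <!-- uiOptions are forwarded to PagefindUI; showImages is covered later in the post. -->
  <AstroSearch id="search" className="pagefind-ui" uiOptions={{ showImages: true }} />
</Layout>
```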
"{}"); new PagefindUI({ ...opts, element: elSelector, bundlePath, }); el.classList.remove("pagefind-init"); var query = el.getAttribute("data-query"); // Check if the current URL has any query params const url = new URL(window.location.href); const params = new URLSearchParams(url.search); if (params.has("q")) { query = params.get("q"); } const input = el.querySelector<HTMLInputElement>(input[type="text"]); input?.focus(); if (input) { input.value = query; input.dispatchEvent(new Event("input", { bubbles: true })); // Add Listener to update the URL when the input changes input.addEventListener("input", (e) => { const input = e.target as HTMLInputElement; const url = new URL(window.location.href); const params = new URLSearchParams(url.search); params.set("q", input.value); window.history.replaceState({}, "", ${url.pathname}?${params}); }); } } } document.addEventListener("astro:page-load", initPageFind); if (document.readyState === "loading") { document.addEventListener("DOMContentLoaded", initPageFind); } else { initPageFind(); } </script> ` Then to use our new custom search component in pages/search.astro, we just import our custom component instead of the pre-defined one. Configuring Search Information All of the search information is controlled by pagefind, the documentation on configuring the search information is good, but I'll briefly cover what I found useful. :::note After making changes to how pagefind finds data about your site, you'll need to rebuild the index either manually with the pagefind --site dist/ command, or re-run the modified npm run dev from earlier. Ignoring Pages By default, it will search all pages. For me, this led to duplicate information matches as it would not only find the data for each post. It would also capture the same information on the page listing all of my blogs. To get around this on my page listing the blogs I added data-pagefind-ignore as an html attribute. Like this This prevents pagefind from indexing data within that tag. Showing Images I also wanted images to show up for each post in the search bar, as each of my articles have a thumbnail attached to them. First, set showImages: true By default, pagefind will try to find images on the page, but I found this pretty unreliable. Instead you can manually specify where the images are with an attribute on an img tag. This also does work on astro's <Image> component. :::note If you're using optimized images, you might notice that the images on the search results break after optimizing them. Don't worry! They'll work once deployed, and if you want to verify locally that it still works you can do so with astro preview, however they break while using astro dev` due to the search index being built on the optimized image paths which change. With these changes we get a search with images !Search With Images Styling Pagefind has a section about customizing the styles of the search bar. However, again I'll cover what I did here. By default it adds these css variables. I'm using DaisyUI and I wanted my colors to automatically switch when my theme does, so I ended up settling on the following styling. Then just import your styles into your search component, making sure it's after the default css import. After these changes (and some restyling of the default astro blog template), we end up with !Search with DaisyUI light And it automatically also works on other DaisyUI themes, dark shown. !Search with DaisyUI dark And that's all you have to do to get search working on your site! 
Styling

Pagefind has a section about customizing the styles of the search bar, but again I'll cover what I did. By default its stylesheet defines a set of CSS variables you can override. I'm using DaisyUI and I wanted my colors to switch automatically when my theme does, so I ended up settling on styling that points pagefind's variables at my DaisyUI theme colors. Then just import your styles into your search component, making sure the import comes after the default CSS import.

After these changes (and some restyling of the default Astro blog template), we end up with:

[Image: Search with DaisyUI light]

And it automatically works on other DaisyUI themes as well (dark shown):

[Image: Search with DaisyUI dark]

And that's all you have to do to get search working on your site! If you're looking to customize the search bar further, definitely take a look at the pagefind docs yourself, as it's fairly configurable. Here's a template website with optimized images and a search bar.

Astro Optimized Images With Markdown 7/26/2024

I recently upgraded my site to a newer version of Astro that supports image optimization, and I wanted to use that feature. I'll walk you through it; it's a fairly quick change that improves load times and the sizes of the images you're serving to end users. If you're just interested in the code, here it is as a template website.

First, we need to move all of the images we're serving into src instead of the public folder. The public folder's contents are copied directly into the build output without any pre-processing. Anywhere in your code that you have statically defined links to images, after moving the image from public/assets into src/assets you can replace the plain <img> tag with Astro's <Image> component. By using <Image>, Astro will automatically optimize your image for the web at build time, and that's all you have to do for statically linked images.

With Dynamic Images From Blogs

This is where it gets a little trickier, as we can't dynamically import images from blogs; however, Astro has a feature to make this possible. In your src/content/config.ts you might have something like a plain string path for each post's thumbnail. For us to be able to use the image in an <Image> component, we have to update the schema to use the image type. Next, for each of your blogs, update the metadata to point to the image now in the src folder, then update any references to images in your markdown itself. After that, as we did before, update any references that were using <img> to use <Image>, and Astro will automatically optimize all of your images (a rough sketch of the schema change and the <Image> usage follows below). Here's a template website with optimized images and a search bar.
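Here's a minimal sketch of what the two changes can look like. The collection name (blog), the frontmatter field (cover), and the file paths are assumptions for illustration; defineCollection, the image() schema helper, and astro:assets are standard Astro APIs.

```ts
// src/content/config.ts
import { defineCollection, z } from "astro:content";

const blog = defineCollection({
  type: "content",
  schema: ({ image }) =>
    z.object({
      title: z.string(),
      // Previously this was likely a plain string path into public/; image() lets
      // Astro resolve the file from src/ and optimize it at build time.
      cover: image(),
    }),
});

export const collections = { blog };
```

The resolved image can then be rendered through <Image> wherever the post's thumbnail is displayed:

```astro
---
// Inside a blog layout (a sketch; how the props are plumbed depends on your setup).
import { Image } from "astro:assets";
const { cover, title } = Astro.props;
---

<Image src={cover} alt={title} />
```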

IMC Prosperity 2 5/3/2024

I recently competed in the IMC Prosperity 2 competition and wanted to write a blog post about my experience as a first-time competitor in a trading competition, in addition to covering some of the strategies I used for each round and how it turned out overall.

What is IMC Prosperity 2?

IMC Prosperity 2 is a competition where you trade virtual assets in a simulated market. There are multiple rounds, and each round has a different set of rules and assets to trade. The goal is to make as much money as possible by the end of the competition. There are two main types of trading: algorithmic and manual. In algorithmic trading, you write a program that automatically trades assets based on certain rules. In manual trading, you make trades yourself based on the information you have.

Round 1

Algorithmic Trading

We had two good products. One was amethysts, which were considered relatively stable at 10K, so we just used a strategy of buying below that price and selling at or above it. We also had starfruit, which were more unstable and varied often in price, and we traded those based on observed price changes relative to amethysts.

Manual Trading

The first round of manual trading was pretty simple. We had to find two values between 900 and 1000 that would maximize our profits. An unknown number of fish would sell their gear to us, with a willingness to sell that rose linearly from 0% at 900 to 100% at 1000, and we were guaranteed to be able to resell the gear at 1000 at the end of the round. The first value you picked was your first bid, the price you'd pay for any gear offered by fish with that willingness to sell or less, and the second value was a higher price to capture more profit. Our team calculated the optimal values to be 952 and 978, but after some simulations we submitted 950 and 980 as our final answers, which got us tied for fourth with roughly 20 other teams; the optimal answer was indeed 952 and 978 (sketched just after this round's results).

At the end of this round we were:
- Overall: ~700
- Tied for 4th in manual trading
- Somewhere in the 700s in algorithmic trading
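To make the round 1 manual trade concrete, here's a small sketch that grid-searches the two bids. It assumes the probability density of a fish's willingness to sell rises linearly from 900 to 1000 and that everything bought is resold at 1000; under that assumption the search lands on 952 and 978, matching the optimum above.

```typescript
// Round 1 manual trade: pick two bids in [900, 1000] to maximize expected profit per fish.
// Assumption: the density of a fish's reserve price rises linearly from 900 to 1000,
// so P(reserve <= x) = ((x - 900) / 100) ** 2, and all gear resells at 1000.
function expectedProfit(bid1: number, bid2: number): number {
  const cdf = (x: number) => ((x - 900) / 100) ** 2;
  // Fish with reserve <= bid1 sell to us at bid1; fish between bid1 and bid2 sell at bid2.
  return (1000 - bid1) * cdf(bid1) + (1000 - bid2) * (cdf(bid2) - cdf(bid1));
}

let best = { bid1: 900, bid2: 900, profit: -Infinity };
for (let bid1 = 900; bid1 <= 1000; bid1++) {
  for (let bid2 = bid1; bid2 <= 1000; bid2++) {
    const profit = expectedProfit(bid1, bid2);
    if (profit > best.profit) best = { bid1, bid2, profit };
  }
}

console.log(best); // { bid1: 952, bid2: 978, ... } under these assumptions
```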
Round 2

Algorithmic Trading

This one was annoying: we had to observe some conditions affecting an Orchid and predict its price from them. Honestly, I did not have much time to work on this section, so I just implemented a simple strategy and did not bother figuring out how to properly trade Orchids. To maintain a good score I just traded amethysts and starfruit again.

Manual Trading

This one was about finding the right conversion process to get the optimal outcome, like converting through a few items to end up with the most seashells, so we just wrote a program to iterate through all the possible combinations and find the best one.

At the end of this round we were:
- Overall: #674, #200 in the US
- Tied for 4th in manual trading with 10 teams
- Still ~700 overall in algorithmic trading

Round 3

Algorithmic Trading

This one was about trading bundles. We had a gift basket made up of:
1. Four CHOCOLATE bars
2. Six STRAWBERRIES
3. A single ROSES

We could not bundle these items together ourselves or break them apart, so we had to trade the items directly. We ended up using a strategy on each of the items separately, trading them based on price jumps within some window. For the baskets, we estimated the basket's price as a weighted sum of the individual items and bought or sold based on that. I mentioned the profit we got from just this round in the Discord and started getting DMs from people in the top 250 wanting to compare strategies, so I think we did pretty well in this round.

Manual Trading

This section was actually really interesting. We were given a map of tiles, shown below.

[Image: Treasure Map]

The base reward on each tile was 7.5K, each tile had a different multiplier, and the goal was to find the optimal reward when the reward is split among the hunters already on that tile. For example, the 100x tile's reward to you would be 7.5K * 100 / (1 + 8), or about 83.3K, since it's split between you and the eight hunters. The reason this is complicated is that for each 1% of the players who pick a given tile, the number of hunters on that tile increases by 1. So you have to predict where the other players will go and try to pick a tile with the fewest players on it. Some players made spreadsheets to collect responses from other players to predict where they would go, but we just wrote a program to simulate the game and pick the best tile based on that.

Finally, the last interesting part is that each search costs money. You can do up to three searches, with these costs:
1. Free
2. 25K
3. 75K

There's definitely no right strategy here, but here's what we did. We were hoping not everyone would act rationally and that we could hedge our bets a little. We assumed everyone would play optimally, calculated the tiles with the highest expected value under that assumption, then randomized our picks a bit. We ended up picking I26 and H29, which were around the 1st and 6th best choices if everyone else played optimally. We figured people who assumed everyone else would play optimally might psych themselves out and pick different tiles, so we hedged by taking the 1st optimal tile and also a lower-ranked one in case people were avoiding the obvious picks. There's no way to know if this was the right strategy, since it relied on other people's decisions.

At the end of this round we were:
- Overall: #374, #109 in the US
- Unfortunately we dropped to #390 in manual trading
- However, we jumped significantly in algorithmic trading to #492

Round 4

Algorithmic Trading

This one was about trading coconut coupons, which were analogous to options. We had to predict the price of the coconut at the end of the round and trade based on that. We built a Black-Scholes model to predict the price of the coconut coupons and compared it to the actual coupon prices to decide whether to buy or sell. We did make a mistake in our model that actually ended up making a good amount of money, but really only when the coupon's price was going down; a bug-free version we implemented earned less money on the submission as a result. Acknowledging the mistake, we essentially decided to gamble on this round: to have a shot at a prize we needed to catch up to the top teams by a few hundred thousand, and we were not going to do that by playing it safe. The algorithm wasn't losing too much money when the price went up, because it was being subsidized by our very good result and algorithm from round 3, so we just decided to go for it.
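For context, this is roughly the kind of pricing comparison involved: a minimal Black-Scholes call pricer whose output you compare against the market price of the coupon. The parameter values, the 0% rate, and the buy/sell threshold here are illustrative assumptions, not the competition's exact setup or our actual signals.

```typescript
// Abramowitz-Stegun approximation of erf (max error ~1.5e-7).
function erf(x: number): number {
  const sign = Math.sign(x) || 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) *
    t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

const normCdf = (x: number) => 0.5 * (1 + erf(x / Math.SQRT2));

function blackScholesCall(
  spot: number, // current price of the underlying (COCONUT)
  strike: number, // strike of the coupon
  timeToExpiry: number, // in years
  vol: number, // annualized volatility
  rate = 0 // risk-free rate; treated as ~0 here
): number {
  const d1 =
    (Math.log(spot / strike) + (rate + (vol * vol) / 2) * timeToExpiry) /
    (vol * Math.sqrt(timeToExpiry));
  const d2 = d1 - vol * Math.sqrt(timeToExpiry);
  return spot * normCdf(d1) - strike * Math.exp(-rate * timeToExpiry) * normCdf(d2);
}

// Example comparison with made-up inputs: if the coupon trades well below the model
// price, the model says buy; well above, it says sell.
const modelPrice = blackScholesCall(10_000, 10_000, 250 / 365, 0.2);
console.log(modelPrice);
```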
Manual Trading

This was the same as the earlier manual trade where we had to optimize the prices we'd pay fish for their gear. In this one, though, the first bid worked the same as before, but for the second bid the fish knew the average of everyone's second bids: they would sell if your bid was higher than the average, and if your bid was below it they would sell only probabilistically, based on the difference between your bid and the average. We picked 960 and 980. We wanted to get more profit from the first guess because we expected to make less from the second due to the averaging.

Oh yeah, during this round I also got recognized in the Discord chat from my GitHub profile picture 😭

[Image: Getting Recognized]

But anyways, at the end of this round we were:
- Overall: #364, #102 in the US
- We rose to #303 in manual trading
- We rose to #401 in algorithmic trading

Round 5

I barely participated in this one. It gave us access to who had made each past trade so you could infer trading-side connections and the like, but we didn't end up doing that.

Overall Results

- #381 overall
- #103 in the US

Conclusion

This was my first time competing in and learning about algorithmic trading, and I had a lot of fun. I did use a lot of knowledge from the previous year's code and writeups to help me out, but I'm most proud that the round where we did the best and seemed competitive with the top teams was one of the rounds where I wrote the code from scratch. I think I learned a lot about how to approach these problems and will likely compete in similar competitions in the future. Hopefully people found this blog post interesting, and if you have any questions about my strategy for a future competition, feel free to reach out to me on LinkedIn. The code is also a mess, but if you want to see it you can check it out here.

Recent Projects

TikTokAPI Python Package 5/26/2023

With over a million downloads, use by 250+ companies, and citations in research papers from Yale, Northwestern, the United Nations (UNESCO), and more, this project has been my most successful so far. It holds great significance to me: I learned a huge amount from creating it, mostly about web scraping, and its popularity has led to tons of great opportunities.

Highlights

Widespread adoption:
- 250+ companies are using TikTokAPI or a derivative of it
- 1M+ downloads
- 3K+ GitHub stars
- Cited use by 10+ university-level research papers from Yale, Northwestern, the United Nations (UNESCO), and more!

Learned the following:
- Web scraping
- Forging API requests
- Avoiding headless browser detection
- Maintaining an open-source community
- Creating test cases
- CI & CD
- And so much more

Why This Was Created

On the surface this Python package may seem fairly trivial to recreate, as it just extracts information from the TikTok web app. Web scraping is very common and usually quite easy for most websites. TikTok, on the other hand, implements a ton of features to try to stop people like me from scraping their website or extracting data from it. This package aims to abstract those intricate details away from end users. A lot of the difficulty in creating and maintaining the package revolves around the security parameters that TikTok adds to its API endpoints to ensure they're not being tampered with. Over time, TikTok has patched loopholes that allowed easier access to these endpoints and made the whole thing a lot more challenging.

Widespread Community Adoption

The support I've received from the community is actually huge. It still blows my mind that people are using my code in academic papers from prestigious universities like Yale and Northwestern, and that organizations like the United Nations are using it for good. Other companies are also doing socially beneficial research with the library, like tracking.exposed. It's absurd that not only are these established academics using my code, they're actually citing it. Sometimes it can be difficult to see the value you're providing when working on open-source software beyond high-level abstract numbers like downloads; seeing these papers use my work to help make a difference is an amazing feeling.

Avoiding Headless Browser Detection

The main thing used to generate these security parameters was a headless Playwright instance running in the background, but Playwright is fairly easy to detect using JavaScript. As time passed, it became a cat-and-mouse game between me and the engineers at TikTok (who watch the repository closely). With each update to TikTok's code I had to make my package harder to detect. There weren't many recorded tutorials or guides on how to approach problems like this, so I had to teach myself how to solve the problems I kept running into. Over time, I learned more and more about the subject, and it's what has allowed me to reverse engineer other websites with relative ease to programmatically access their data.

This project will always be an important part of my career; it's what allowed me to get to where I am today. For that, I'm very grateful for the community that developed around the project and supports me ❤️

The Response Times 5/25/2023

My journey into security analysis started with YikYak, a social media app, exposing post GPS locations. I was initially only looking at YikYak to create a Python package for interacting with their API, but their API was exposing the location of each post, allowing a potential bad actor to track users' movements based on their posting activity. As disappointed as I am in the lack of security around protecting user data, I'm glad I discovered it, because it made me start my security blog, The Response Times, and help protect user data from malicious actors.

Highlights

YikYak
- Featured in Vice & The Verge
- YikYak was exposing precise GPS coordinates, accurate to within 10-15 ft, to everyone
- Created a cool little anonymized heatmap
- YikYak implemented some changes that somewhat improved privacy, then they were bought out

LINK.social
- Had a hugely insecure login flow that allowed anyone to log in as any other account
- An API route took a user ID and returned that user's authorization token with no other checks
- This allowed me to access every user's precise GPS location, phone number, birthday, and whether they had verified their identity with an ID photo
- I was given $500 as a bug bounty, which was nice :)
- Luckily the app was only in beta and had only a few hundred users at the time

I really enjoy doing this kind of work. Although it can be frustrating at times, it's interesting to me and combines a lot of my interests. I'd like to spend more time on this kind of analysis, but finding targets to analyze is difficult.

Portfolio Website 5/24/2023

Well, you've already seen it, but I'll go into more detail about the tech stack, my thought process, and more.

The Concept

I had a portfolio website before this one, but the home page was a little text-heavy and quite cluttered. I wanted something that felt cleaner, and I had the following in mind:
- Live statistics, like the number of GitHub stars, so I wouldn't have to update them manually
- A clean and text-light hero section
- Interactive elements from my old site, like hovering over a post for it to grow in size
- A "chat" popup as a call to action
- Responsive design

Tech Stack
- React.js
- Astro
- TypeScript
- TailwindCSS
- DaisyUI
- Cloudflare Workers
- AWS S3

Live Statistics

This was the main feature I wanted on this site. Whenever I've had to update my website or my resume, it's always annoying to see how out of date the numbers are, so I decided to make them live. It uses Cloudflare Workers to scrape these numbers in real time, cached for about an hour (sketched at the end of this post). I spent a lot of time laying out the hero section and deciding how to display and animate the statistic cards as they appear on screen, but I think they turned out pretty well, although there's definitely room for improvement.

Overall Experience

Historically, I haven't liked doing web dev that much because I've always found it a little too tedious and more of a means to an end than a passion. This time around, though, I really enjoyed it, and it was so satisfying building custom animations and adding other cool functionality to the site. After this refreshing web-dev experience I'm definitely looking forward to doing more of it in the future, even if I still prefer backend development. If you have any feedback about the site I'd love to hear it! Feel free to reach out to me on LinkedIn; this site is still an ongoing project.
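Here's a minimal sketch of the kind of Worker behind the live stat cards. The real one scrapes several sources; this version only sums GitHub stars via the public API, and the route, username handling, and caching details are simplified assumptions.

```typescript
// Sketch of a Cloudflare Worker (module syntax, @cloudflare/workers-types assumed)
// that serves a cached "GitHub stars" count for the hero section's stat cards.
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const cache = caches.default;
    const cached = await cache.match(request);
    if (cached) return cached; // serve the roughly hourly cached copy when available

    // Sum stars across public repos (first 100 repos only, for brevity).
    const res = await fetch("https://api.github.com/users/davidteather/repos?per_page=100", {
      headers: { "User-Agent": "portfolio-stats-worker" },
    });
    const repos = (await res.json()) as { stargazers_count: number }[];
    const stars = repos.reduce((sum, repo) => sum + repo.stargazers_count, 0);

    const response = new Response(JSON.stringify({ stars }), {
      headers: {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=3600", // cache for about an hour
      },
    });
    ctx.waitUntil(cache.put(request, response.clone()));
    return response;
  },
};
```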

My Recent Career

CrowdStrike 6/1/2023

This summer at CrowdStrike, I got my hands dirty with some serious coding and problem-solving. As a Software Engineering Intern, I dove into Go, AWS, CI/CD practices, microservices, and Docker. Here's what I did:

Highlights

- LQL Parser, Lexer, and VM: Tackled the challenging task of developing a parser and lexer for the LogScale Query Language (LQL). My goal was to translate LQL statements into a custom, efficient bytecode, improving the speed and efficiency of event manipulation. I also built a Go-based virtual machine (VM) to execute the bytecode, ensuring that LQL queries ran smoothly and reliably in our cloud microservices. (A toy sketch of the general parse-to-bytecode-to-VM idea follows below.)
- Linter Tool for Code Integrity: Designed a linter tool from scratch. The aim was to catch bugs early and ensure the integrity of our codebase before anything went live to clients. This tool became a crucial part of our development cycle, saving countless hours of debugging and fixing post-deployment issues.
- Custom Testing Framework for LQL: Created a testing framework specifically for LQL and integrated it seamlessly into our CI/CD pipeline. This wasn't just about testing; it was about ensuring reliability and consistency across all code changes, making our deployment process smoother and more robust.

Personal Reflections

Jumping into the deep end at CrowdStrike was exhilarating. I learned that software engineering is as much about solving problems as it is about writing code. Developing the parser, lexer, and VM for LQL was a brain-teaser that combined theory with practical application. This internship wasn't about doing minor tasks; it was about making a tangible impact on the projects I was assigned to. The satisfaction of seeing my work contribute directly to the product was unmatched, even if I spent about two or three weeks living inside the VS Code debugger to figure out edge cases in how LQL works, some of which team members who had written hundreds of hours of LQL didn't know about.

To anyone stepping into a new role or field, my advice is simple: dive in, embrace the challenges, and don't be afraid to get your hands dirty. The skills and insights you'll gain from real-world experience are invaluable. I'm grateful for the opportunity to work alongside some of the brightest minds in cybersecurity at CrowdStrike, and I'm excited to apply what I've learned in future projects and roles.
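To illustrate only the general shape of that first bullet (and nothing about the actual LQL implementation, which is in Go and far more involved), here's a toy sketch of compiling a made-up two-operation language to bytecode and executing it on a stack-based VM:

```typescript
// Toy illustration only: a made-up mini-language compiled to a tiny bytecode and run
// on a stack-based VM. Unrelated to CrowdStrike's actual LQL code.
type Instruction =
  | { op: "PUSH"; value: number }
  | { op: "ADD" }
  | { op: "MUL" };

// "Compiler": turn a statement like "3 4 add 2 mul" into bytecode.
function compile(source: string): Instruction[] {
  return source
    .trim()
    .split(/\s+/)
    .map((token): Instruction => {
      if (token === "add") return { op: "ADD" };
      if (token === "mul") return { op: "MUL" };
      const value = Number(token);
      if (Number.isNaN(value)) throw new Error(`unknown token: ${token}`);
      return { op: "PUSH", value };
    });
}

// VM: execute the bytecode against a simple operand stack.
function run(program: Instruction[]): number {
  const stack: number[] = [];
  for (const instr of program) {
    if (instr.op === "PUSH") stack.push(instr.value);
    else {
      const b = stack.pop()!;
      const a = stack.pop()!;
      stack.push(instr.op === "ADD" ? a + b : a * b);
    }
  }
  return stack.pop()!;
}

console.log(run(compile("3 4 add 2 mul"))); // 14
```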

LinkedIn Learning 5/23/2023

In spring 2023, I had the opportunity to work with LinkedIn Learning to create a course on GitHub Codespaces for students! Despite designing the course alongside school and work, I thoroughly enjoyed it, and I solidified a lot of my content-design skills along the way. Get 24 hours of free access to the short course via my LinkedIn post.

Highlights

- Upskilled 1.6K+ learners interested in GitHub Codespaces
- Designed engaging and practical real-world projects and examples
- Created a universally applicable curriculum for all students

Conclusion

This experience only further solidified how much I love creating content and teaching others, and I'm excited to keep doing both in the future!

Collab 6/1/2022

I joined Collab and began working on adding a second data source to the recently acquired TrendPop platform. As somewhat of a content creator myself, it's been really exciting to work so close to the creator economy.

Highlights

- Analyzed the most efficient way to extract data from YouTube
- Used Go to write an efficient abstraction for forging requests to YouTube's unofficial API
- Designed and implemented scalable jobs with Apache Spark to track and discover new entities
- Extracted and stored over 1 billion total records, including videos, channels, playlists, and comments
- Automatic reporting of metrics from the code to Grafana

Internship Overview

My internship consisted of three main parts: creating an engineering proposal plan, implementation, and finally a presentation open to the entire company, which included some execs.

Engineering Proposal Plan (~4 weeks)

The goal of the proposal was to align the entire team on what I would be doing over the summer and the tools and methodologies I would use to accomplish it. This part of my internship included:
- Researching and prototyping ways to extract data from YouTube
- Researching the best technologies for the job
- Analyzing all the tradeoffs and risks of my approach

The most important thing to research was how best to extract data from YouTube. I settled on a technique I call forging API requests, in which you make requests that look identical to the ones a legitimate client would make to the backend server (a sketch of the idea follows below). Since most websites use an AJAX approach, this is effective on most of them. It has significant tradeoffs compared to traditional HTML-based web scraping; if you want to learn more, check out lesson 1 in my everything-web-scraping series. The largest tradeoff is the lack of control over changes the third party makes to their API. I used the commit history of youtube-dl to see how frequently YouTube's API changed, and changes seemed rare enough that this method was acceptable.

Implementation & Productionizing (~6 weeks)

- Introduced Apache Spark as a new technology within the platform
- Intelligent metric reporting for debugging jobs
- Visualized metrics and created alerts in Grafana
- Worked through dozens of edge cases and improved the data parsers

While investigating the best tools for the job, it was decided that I should look into Apache Spark, and it fit our use case perfectly. It lets us easily scale our jobs across multiple threads and, if things ever become extremely computationally expensive, across multiple machines/executors in the future. One important thing was to ensure good visibility into what these jobs were doing, as it's always a challenge to maintain and debug programs that depend so heavily on third-party API responses with countless edge cases. To make debugging easier across all the jobs, I reported metrics around failing API requests, parsing failures, and any Postgres errors to Telegraf, which feeds an InfluxDB instance that Grafana pulls data from. Here's a screenshot of one of the Grafana dashboards I created.

[Image: Grafana Dashboard]

The last thing I'll mention for this section is the pain of slowly working through dozens of edge cases, since the "hidden" YouTube API is not officially documented and returns a lot of different structures, like compactVideoRenderer vs videoRenderer, which is awfully annoying to deal with.
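As a rough illustration of what "forging" a request means, here's a sketch in TypeScript for brevity (the actual implementation was in Go). The endpoint, headers, and payload are placeholders of the kind you'd capture from your browser's DevTools network tab, not the real values used at Collab.

```typescript
// Sketch of the forged-request idea: replay the same call the web client makes,
// with the same headers and JSON body, instead of parsing rendered HTML.
interface ForgedRequest {
  url: string;
  headers: Record<string, string>;
  body: unknown;
}

// Everything here comes from copying a real request the site made in your browser.
const capturedFromDevTools: ForgedRequest = {
  url: "https://www.example.com/api/v1/search", // placeholder endpoint
  headers: {
    "Content-Type": "application/json",
    "User-Agent": "Mozilla/5.0 (copied from the browser)",
    // plus whatever cookies/tokens the real client sent
  },
  body: { query: "some channel name", context: { client: "WEB" } }, // placeholder shape
};

async function forge<T>(req: ForgedRequest): Promise<T> {
  const res = await fetch(req.url, {
    method: "POST",
    headers: req.headers,
    body: JSON.stringify(req.body),
  });
  if (!res.ok) throw new Error(`request failed: ${res.status}`); // report to metrics in practice
  return (await res.json()) as T;
}

// Parsers should never assume the response shape; handle variants defensively
// (e.g. the compactVideoRenderer vs videoRenderer case mentioned above).
forge<Record<string, unknown>>(capturedFromDevTools).then((data) => console.log(data));
```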
Presentation (~1 week)

Unfortunately, I can't share the exact slides here. The presentation dove into case studies on how this new YouTube data could help a potential new customer, and then into how the data could help Collab creator Zhong better understand their audience and further optimize their content strategy. To answer these questions, I spent the entire week leaning heavily on my data science skills, writing complex SQL queries and running more data-intensive Python code, making heavy use of pandas, to deliver some interesting insights into YouTube.

What I Learned

- How Apache Spark works and why it's great for computationally expensive jobs
- Scala as a programming language
- When parsing third-party API responses, never assume anything about the structure of the response; however hard you try not to assume anything, you'll still run into weird edge cases at scale
- Postgres isn't ideal for the amount of data we needed to store; engineering time was designated in a later quarter to switch to a distributed database like Cassandra
- Grafana is really useful for analyzing jobs and seeing exactly what they're doing at scale, especially when exceptions are handled and retry logic is built into the jobs so they keep running no matter what; I used Grafana in the summer of 2021 with Warner Music Group, but I wasn't usually the one using the metrics to debug the jobs

Overall, I had a great time and enjoyed working so closely with the creator economy.