
TikTokAPI Python Package

May 26, 2023 David Teather


With over a million downloads, use by more than 250 companies, and citations in research papers from Yale, Northwestern, the United Nations (UNESCO), and others, this project has been my most successful so far. It holds great significance to me because I learned so much from creating it, mostly about web scraping, and its popularity has brought me many great opportunities.

Highlights

Why This Was Created

On the surface, this Python package may seem fairly trivial to recreate, since it just extracts information from the TikTok web app. Web scraping is very common and (usually) quite easy for most websites. TikTok, on the other hand, implements a ton of features to stop people like me from scraping its website or extracting data from it.

This package aims to abstract these intricate details away from its end users. Much of the difficulty in creating and maintaining it revolves around the security parameters that TikTok adds to its API endpoints to ensure requests haven't been tampered with. Over time, TikTok has patched the loopholes that allowed easier access to those endpoints, making access a lot more challenging.
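To make the idea of a tamper-evident request parameter more concrete, here is a minimal sketch in Python. It is not how TikTok computes its parameters; the HMAC-SHA256 signer, the example.com URL, the demo secret, the parameter names, and the build_signed_url/verify helpers are all hypothetical stand-ins. It only shows the general shape of the problem: the server can reject any request whose parameters changed after signing, and a wrapper library can hide the signing step so callers pass only the parameters they care about.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical endpoint and secret: placeholders for illustration only.
BASE_URL = "https://example.com/api/item_list/"
DEMO_SECRET = b"not-a-real-key"


def sign_query(query: str, secret: bytes = DEMO_SECRET) -> str:
    """Stand-in signer: HMAC-SHA256 over the encoded query string.

    This is NOT TikTok's scheme; it only models "a value derived from
    the request that the server can recompute and check".
    """
    return hmac.new(secret, query.encode("utf-8"), hashlib.sha256).hexdigest()


def build_signed_url(params: dict) -> str:
    """Client side: encode params (sorted for a stable order), then append
    the signature so the end user never has to deal with it."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}?{query}&signature={sign_query(query)}"


def verify(url: str, secret: bytes = DEMO_SECRET) -> bool:
    """Server side: split off the signature, recompute it, and compare."""
    query, _, signature = url.split("?", 1)[1].rpartition("&signature=")
    return hmac.compare_digest(sign_query(query, secret), signature)
```

For example, build_signed_url({"count": 30, "user_id": "abc"}) produces a URL that verify accepts, while the same URL with count=30 edited to count=99 is rejected because the signature no longer matches.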

Widespread Community Adoption

The support I’ve received from the community has been huge. It still blows my mind that researchers at prestigious universities like Yale and Northwestern use my code in academic papers, and that organizations like the United Nations use it for good. Other groups, like tracking.exposed, are also doing socially beneficial research with the library. It’s amazing that these established academics are not only using my code but actually citing it. When working on open-source software, it can be difficult to see the value you’re providing beyond high-level, abstract numbers such as downloads. Seeing these papers use my work to help make a difference is an incredible feeling.

Avoiding Headless Browser Detection

The main tool used to generate these security parameters was a headless Playwright instance running in the background, but Playwright is fairly easy to detect using JavaScript. As time passed, it became a cat-and-mouse game between me and the engineers at TikTok (who watch the repository closely): with each update to TikTok’s code, I had to update my package to make it harder to detect.
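As a rough illustration of why a headless browser is easy to spot, the sketch below models a few publicly known client-side heuristics in Python. Browsers under automation typically expose navigator.webdriver as true, and headless Chrome has historically advertised "HeadlessChrome" in its user agent; empty plugin and language lists are other classic tells. The BrowserFingerprint class and headless_signals function are hypothetical, and this is not TikTok’s actual detection logic, which would check far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserFingerprint:
    """A simplified model of a few properties page JavaScript can read."""
    user_agent: str                                      # navigator.userAgent
    webdriver: bool                                      # navigator.webdriver
    plugins: list[str] = field(default_factory=list)     # navigator.plugins
    languages: list[str] = field(default_factory=list)   # navigator.languages


def headless_signals(fp: BrowserFingerprint) -> list[str]:
    """Return the (hypothetical) heuristic checks this fingerprint trips."""
    signals = []
    if fp.webdriver:
        signals.append("navigator.webdriver is true")
    if "HeadlessChrome" in fp.user_agent:
        signals.append("user agent advertises HeadlessChrome")
    if not fp.plugins:
        signals.append("no plugins reported")
    if not fp.languages:
        signals.append("no languages reported")
    return signals
```

An unmodified headless fingerprint trips every check, while one with each property patched trips none. The cat-and-mouse dynamic follows from this: every newly added check is another property that has to be made to look like a normal browser.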

There weren’t many tutorials or guides on how to approach problems like this, so I had to teach myself how to solve the problems I kept running into. Over time, I learned more and more about the subject, and that knowledge is what now lets me reverse engineer other websites with relative ease to access their data programmatically.

This project will always be an important part of my career; it’s what got me to where I am today. For that, I’m very grateful to the community that has developed around the project and supports me ❤️
