With over a million downloads, use by 250+ companies, and citations in research papers from Yale, Northwestern, the United Nations (UNESCO), and more, this project has been my most successful so far. It holds great significance to me: I learned so much from creating it, mostly about web scraping, and its popularity has led to tons of great opportunities.
Highlights
- Widespread Adoption
- 250+ companies are using TikTokAPI or a derivative of it
- 1M+ Downloads
- 3K+ GitHub stars
- Cited in 10+ university-level research papers from Yale, Northwestern, the United Nations (UNESCO), and more!
- Learned the following
- Web Scraping
- Forging API requests
- Avoiding Headless Browser Detection
- Maintaining An Open-Source Community
- Creating Test Cases
- CI & CD
- And So Much More
Why This Was Created
On the surface, this Python package may seem fairly trivial to recreate, as it just extracts information from the TikTok web app. Web scraping is very common and (usually) quite easy to do for most websites. TikTok, on the other hand, implements a ton of features to try to stop people like me from scraping their website or extracting data from it.
This package aims to abstract these intricate details away from its end users. Many of the difficulties in creating and maintaining it revolve around the security parameters that TikTok adds to its API endpoints to ensure requests aren’t being tampered with. Over time, TikTok has patched the loopholes that allowed easier access to these endpoints, making that access a lot more challenging.
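To give a rough idea of what a tamper-checking security parameter looks like, here is a minimal, generic sketch of request signing with an HMAC. This is not TikTok’s actual scheme, which is not described here; the `SECRET_KEY` value, the `signature` parameter name, and both helper functions are hypothetical and exist only for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical secret used only for this illustration.
SECRET_KEY = b"example-secret"


def sign_params(params: dict[str, str], key: bytes = SECRET_KEY) -> dict[str, str]:
    """Return a copy of params with a hypothetical 'signature' parameter added."""
    # Sort the parameters so the same inputs always produce the same string.
    canonical = urlencode(sorted(params.items()))
    digest = hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
    return {**params, "signature": digest}


def verify_params(signed: dict[str, str], key: bytes = SECRET_KEY) -> bool:
    """Recompute the signature over the other parameters and compare."""
    received = signed.get("signature", "")
    unsigned = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_params(unsigned, key)["signature"]
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(received, expected)
```

If any parameter changes after signing, the recomputed signature no longer matches and the request can be rejected. That is why a client that cannot reproduce the signature cannot simply edit a captured request and replay it.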
Widespread Community Adoption
The support I’ve received from the community has been huge. It still blows my mind that people are using my code in academic papers from prestigious universities like Yale and Northwestern, and that organizations like the United Nations are using it for good. Other organizations, like tracking.exposed, are also doing socially beneficial research with the library. It’s absurd that these established academics are not only using my code but actually citing it. When working on open-source software, it can be difficult to see the value you’re providing beyond high-level, abstract numbers such as downloads. Seeing these papers use my work to help make a difference is an amazing feeling.
In Academic Papers
As I mentioned above, it’s been pretty cool to see it cited and used in academic papers. I’ve decided to compile a list of the ones that I could find on Google Scholar. If you know of any additional ones let me know!
- The TikTok Self: Music, Signaling, and Identity on Social Media (Yale)
- #TulsaFlop: A Case Study of Algorithmically-Influenced Collective Action on TikTok (Northwestern)
- History under attack: Holocaust denial and distortion on social media (Book, United Nations UNESCO)
- The recontextualisation of Multicultural London English: Stylising the ‘roadman’ (Book, University of Edinburgh)
- From #Dr00gtiktok to #harmreduction: Exploring Substance Use Hashtags on TikTok (Drexel University)
- #Pragmatic or #Clinical: Analyzing TikTok Mental Health Videos (University of Minnesota)
- The Platformization of TikTok: Examining TikTok’s Boundary Resources (The University of Toronto)
- When Kids Mode Isn’t For Kids: Investigating TikTok’s “Under 13 Experience” (University of California, Irvine)
- User Experiences with Abortion Misinformation on TikTok: Encounters, Assessment Strategies, and Response (DePaul University)
- Counting How the Seconds Count: Understanding Algorithm-User Interplay in TikTok via ML-driven Analysis of Video Content (University of Illinois Urbana-Champaign)
Undergraduate Papers
- Undergrad Paper Using Social Media to Predict which TV Shows will be Popular (Worcester Polytechnic Institute)
- Does Tiktok show viewers the content relevant to them? (University of California, Berkeley)
Avoiding Headless Browser Detection
The main tool used to generate these security parameters was a headless Playwright instance running in the background, but Playwright is fairly easy to detect using JavaScript. As time passed, it became a cat-and-mouse game between me and the engineers at TikTok (who watch the repository closely). With each update of TikTok’s code, I had to update my package to make it harder to detect.
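To illustrate the kind of signals a site can look for, here is a small sketch written in Python that models checks a page’s JavaScript might run against browser properties. It is based on commonly cited heuristics (the `navigator.webdriver` flag and the `HeadlessChrome` user-agent token, among others), not on TikTok’s actual checks, which are not described here. The `headless_signals` function and the shape of the `fingerprint` dictionary are hypothetical.

```python
def headless_signals(fingerprint: dict) -> list[str]:
    """Return the reasons a browser fingerprint looks automated.

    `fingerprint` is a hypothetical dictionary of values a page could
    read from the browser with JavaScript.
    """
    signals = []
    # Browsers under automation control report navigator.webdriver as true.
    if fingerprint.get("webdriver") is True:
        signals.append("navigator.webdriver is true")
    # Headless Chrome has historically used this token in its default user agent.
    if "HeadlessChrome" in fingerprint.get("user_agent", ""):
        signals.append("user agent contains HeadlessChrome")
    # Heuristic: an empty plugin list is unusual for a regular desktop browser.
    if fingerprint.get("plugin_count", 0) == 0:
        signals.append("no browser plugins reported")
    # Heuristic: regular browsers normally report at least one language.
    if not fingerprint.get("languages"):
        signals.append("navigator.languages is empty")
    return signals
```

Staying undetected means making every one of these values look like a regular browser’s, and each new check a site adds is another value to account for.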
There weren’t many tutorials or guides on how to approach problems like this, so I had to teach myself how to solve the problems I kept running into. Over time, I learned more and more about the subject, and that knowledge is what now lets me reverse engineer other websites with relative ease to access their data programmatically.
This project will always be an important part of my career; it’s what’s allowed me to get to where I am today. I’m very grateful for the community that developed around the project and supports me ❤️