I've been doing some web scraping. I wanted to run it in parallel, so I needed proxies to avoid hitting the same site with too many requests from a single IP. It turned out I needed not only the scraping logic itself but also a managed proxy pool (since proxies can go bad) and a managed cache. I decided to implement all of this as a gRPC service hosted on my local network.
I added monitoring and logging to help keep the service healthy, since I schedule jobs to run every day.
Because there were a lot of design decisions along the way, I kept a devlog, which is reproduced here. The code remains private for security reasons.
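Since the real code is private, here is only a minimal sketch of what a managed proxy pool of the kind described above might look like. Everything in it is an assumption for illustration: the `ProxyPool` class, its method names, the failure threshold, and the cooldown period are hypothetical and not taken from the actual service.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProxyPool:
    """Hypothetical in-memory proxy pool (illustrative, not the real service)."""

    proxies: List[str]
    max_failures: int = 3            # assumed: consecutive failures before benching
    cooldown_seconds: float = 300.0  # assumed: how long a bad proxy sits out
    _failures: Dict[str, int] = field(default_factory=dict)
    _benched_until: Dict[str, float] = field(default_factory=dict)

    def healthy(self, now: Optional[float] = None) -> List[str]:
        """Return the proxies that are not currently benched."""
        now = time.monotonic() if now is None else now
        return [p for p in self.proxies if self._benched_until.get(p, 0.0) <= now]

    def pick(self, now: Optional[float] = None) -> str:
        """Pick a random healthy proxy so load is spread across IPs."""
        candidates = self.healthy(now)
        if not candidates:
            raise RuntimeError("no healthy proxies available")
        return random.choice(candidates)

    def report_success(self, proxy: str) -> None:
        """Reset the consecutive-failure count after a good request."""
        self._failures[proxy] = 0

    def report_failure(self, proxy: str, now: Optional[float] = None) -> None:
        """Count a failure and bench the proxy once it hits the threshold."""
        now = time.monotonic() if now is None else now
        count = self._failures.get(proxy, 0) + 1
        self._failures[proxy] = count
        if count >= self.max_failures:
            self._benched_until[proxy] = now + self.cooldown_seconds
            self._failures[proxy] = 0
```

A scraper worker would call `pick()` before each request and then `report_success()` or `report_failure()` afterwards, so proxies that go bad drop out of rotation for a while instead of being retried immediately.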
Changed the database to write-then-read with the cache before returning a response. A separate worker will take this over later. (2024)
After a break, added some extra documentation and tweaked some existing commands. (2024)
Created a backend worker. Experimented more with logging, trying Fluent Bit. Switched from LevelDB to SQLite to fix parallel access. (2024)
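To make the cache entries above more concrete, here is a rough sketch of one possible reading of "write-then-read" on top of SQLite. It assumes the response is written to the cache first and then read back before being returned, and it assumes WAL mode is used to ease access from several processes. The `pages` table, the `open_cache` and `get_page` functions, and the one-day `max_age` are all hypothetical and not taken from the private code.

```python
import sqlite3
import time
from typing import Callable


def open_cache(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical page cache backed by SQLite."""
    # timeout makes a writer wait for a lock instead of failing immediately
    conn = sqlite3.connect(path, timeout=30)
    # WAL lets readers keep working while another connection writes
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " fetched_at REAL NOT NULL)"
    )
    conn.commit()
    return conn


def get_page(
    conn: sqlite3.Connection,
    url: str,
    fetch: Callable[[str], str],
    max_age: float = 86400.0,  # assumed freshness window: one day
) -> str:
    """Return a cached page, fetching and storing it first if needed."""
    row = conn.execute(
        "SELECT body, fetched_at FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and time.time() - row[1] <= max_age:
        return row[0]

    body = fetch(url)
    # Write first: the "with" block commits the transaction on success
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body, fetched_at) VALUES (?, ?, ?)",
            (url, body, time.time()),
        )
    # Then read back, so the response always reflects what is stored
    row = conn.execute("SELECT body FROM pages WHERE url = ?", (url,)).fetchone()
    return row[0]
```

Reading back after the write costs one extra query, but it means callers only ever see what the cache actually holds, which should make it easier to move the fetch-and-write step into a separate worker later.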