
AI local model - DMT specialist local RAG agent

Caminante

Good day, Mr. Administrator! I'd like to ask you the following question: lately I've become very enthusiastic about using local AI models, RAG, and so on. My idea is to use a local AI model to build an agent specializing in DMT chemistry. To build the RAG document stack, I thought I could download the literature shared on the forum (and/or the old forum), and also "scan" the entire forum to scrape the text, then analyze it with the local agent and ask it to summarize, make observations, etc. The idea is to separate the valuable knowledge from the "noise", organize it, summarize it, and then share it with the entire forum. Does this sound good to you? Is it possible? I don't want to do anything without your authorization, as is appropriate. And if others are enthusiastic about the idea, even better! Thank you!
 
AFAIK you can scrape the forum without issues; just make sure your scraper is not too aggressive. There are quite a few bots and scrapers connected at any given moment.

Scraping the old forum shouldn't be necessary, as all its threads were migrated to the new forum.

I don't recommend using any kind of LLM for scraping; it's just not the right tool for the job. It will easily lose track, re-download pages it has already saved, and miss many that it never downloaded. There is already plenty of scraping software, from GUI clients to libraries you can use in scripts. I can help you find one that suits your needs if you don't know where to begin. It will be much faster than an LLM, less resource intensive, and far more reliable.
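
To make that concrete, here is a minimal sketch of a polite scripted scraper in Python (assuming the requests and beautifulsoup4 packages are installed; the thread URL, CSS selector, and output file name are placeholders, not the forum's real markup):

import time
import requests
from bs4 import BeautifulSoup

# Placeholder thread URLs -- in practice you would collect these from the
# forum's index pages first.
THREAD_URLS = [
    "https://forum.example.org/threads/some-thread.123/",
]

# Identify yourself so the admins know who is crawling.
HEADERS = {"User-Agent": "personal-archive-bot (contact: you@example.org)"}

for url in THREAD_URLS:
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Extract the post bodies; adjust the selector to the actual forum HTML.
    posts = [p.get_text(" ", strip=True) for p in soup.select("div.post")]
    with open("scraped_posts.txt", "a", encoding="utf-8") as f:
        for post in posts:
            f.write(post + "\n\n")
    time.sleep(5)  # pause between requests so the server isn't hammered

A dedicated mirroring tool will handle link discovery, retries, and resuming for you, which is why I'd still point a non-programmer at one of those first.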
 
Thanks! I didn't want to start without announcing it on the forum. I actually started all this just a few days ago; I'm a complete beginner, not a programmer, but very enthusiastic! My idea was to do the web scraping with a tool I haven't picked yet (any help is welcome here, and in general), download everything to my PC, and then run a local AI model to analyze, organize, de-noise, and summarize all the accumulated knowledge, so that it ends up organized in a way that's useful and convenient, if that's technically possible. Any help and suggestions are welcome!
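
For the local-analysis step, this is roughly the kind of thing I have in mind, adapted from tutorials I've been reading (just a sketch; it assumes the sentence-transformers Python package and a plain-text dump of the scraped posts, and the file name and query are placeholders):

import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed input: one big plain-text dump of the scraped posts.
with open("scraped_posts.txt", encoding="utf-8") as f:
    text = f.read()

# Naive chunking: split on blank lines and keep reasonably sized pieces.
chunks = [c.strip() for c in text.split("\n\n") if len(c.strip()) > 200]

# Small local embedding model that runs fine on CPU.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks, normalize_embeddings=True)

def search(query, top_k=5):
    """Return the chunks most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), chunks[i]) for i in best]

for score, chunk in search("example question about the literature"):
    print(round(score, 3), chunk[:120])

The retrieved chunks would then be passed to the local model as context for summarizing. Whether that's the best setup, I honestly don't know yet.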
 
Thanks! I didn't want to start without announcing it on the forum
Yes, that's always a good idea. I'm not an admin, so if I'm wrong about scraping being fine, please @dreamer042 or @The Traveler say so. But I don't think I am, given that there are active and known scrapers that aren't being blocked.

As for a tool, for a non-programmer this is probably a good choice: HTTrack Website Copier - Free Software Offline Browser (GNU GPL). It's relatively old software, but scraping is a solved problem anyway, at least for websites that don't try to prevent you from scraping them. It runs on both Linux and Windows.
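
If you go with HTTrack, the command-line version (installed alongside the GUI on Linux) can be told to go gently; something like this, though I'm writing the flags from memory, so check them against httrack --help before running (URL and output directory are placeholders):

httrack "https://forum.example.org/" -O ./forum-mirror -c2 -%c1 -A25000 -s2

As I recall, -c2 limits simultaneous connections, -%c1 caps connections per second, -A25000 caps the transfer rate in bytes per second, and -s2 makes it respect robots.txt.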
 
Yes! I was planning to wait for their feedback. Thanks for the tool suggestion! I've been using Linux for many years, so I'll download the Linux version! Thanks for your help!
 
Scraping the public DMT-Nexus pages is never an issue as long as the scraping is not too intense.

By that I mean that not too many scrapers should run simultaneously, and that you set your own scraper to crawl our pages slowly rather than as fast as possible.

All this is to keep a sane respect for our resources.


Kind regards,

The Traveler
 
Thanks, Traveler! Before I do anything, I'll check with you first, so as not to overload the server. I'm inexperienced, so I'll do my research before starting. Thanks!
 