
ForkLog

2026-05-22 09:17:58

# Vibe-Coded from the Depths

A guide to launching open AI models from the depths of GitHub

In AI development, a trend has emerged in which decentralization and open source make it possible to move beyond popular commercial solutions. Local LLMs enable private data processing, flexible customization for specific tasks, and independent control over the environment. At the same time, deploying such models requires an understanding of the basic tools, from repositories and model weights to cloud environments and technical specifications.

In this article, ForkLog explains how to start exploring autonomous AI models at no cost, which resources suit beginners, and what developers of open-source solutions have to offer.

## First Introduction

Developers of open AI models rely on two main platforms: GitHub and Hugging Face. The first is traditionally used for publishing source code, documentation, and setup scripts; the second has become a global hub for model weights, datasets, and ready-made ML solutions. Hundreds of thousands of trained neural networks are published on Hugging Face, from tiny language models for smartphones and alternative generators of media content to specialized algorithms for scientists and enthusiasts.

Community activity metrics help in choosing a model. On GitHub, these are the number of stars, the update frequency (commits), and the speed at which issues are resolved.
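As a rough illustration, some of these signals can be read from the JSON that GitHub's REST API returns for a repository (`GET /repos/{owner}/{repo}`). The sketch below assumes the `stargazers_count`, `pushed_at`, and `open_issues_count` fields of that response; the function name and the sample values are hypothetical, and issue resolution speed is not covered here because it would need a separate query.

```python
from datetime import datetime, timezone


def summarize_repo_activity(repo: dict, now: datetime) -> dict:
    """Extract basic activity signals from a GitHub repository JSON object."""
    # GitHub reports timestamps in ISO 8601 with a trailing "Z" (UTC).
    pushed_at = datetime.strptime(repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
    pushed_at = pushed_at.replace(tzinfo=timezone.utc)
    return {
        "stars": repo["stargazers_count"],
        # Note: this counter includes open pull requests as well as issues.
        "open_issues": repo["open_issues_count"],
        "days_since_last_push": (now - pushed_at).days,
    }


# Hypothetical sample values, shaped like the API response.
sample = {
    "stargazers_count": 12500,
    "open_issues_count": 340,
    "pushed_at": "2026-05-01T12:00:00Z",
}
report = summarize_repo_activity(
    sample, now=datetime(2026, 5, 22, tzinfo=timezone.utc)
)
print(report)
```

A long gap since the last push combined with a large backlog of open issues is a hint, not proof, that a repository is abandoned.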

It is also important to verify the origin of the product and the authenticity of the repository. Popular open-source builds often become bait that cybercriminals use to distribute malicious code disguised as well-known AI tools.
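One common way to check authenticity, when a project publishes checksums for its release files, is to compare the SHA-256 hash of the downloaded file with the published value. The sketch below is a minimal example using only the standard library; the function names and the demo file are hypothetical, and a matching checksum only proves the file was not altered after the checksum was published.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: Path, expected_hex: str) -> bool:
    """Compare the local hash with the checksum published by the project."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file standing in for a downloaded artifact.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "weights.bin"
    demo_file.write_bytes(b"hello")
    published = hashlib.sha256(b"hello").hexdigest()
    ok = matches_published_checksum(demo_file, published)
    tampered = matches_published_checksum(demo_file, "0" * 64)

print(ok, tampered)
```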

The next step in exploring local AI models is to test them in practice. For users without powerful hardware, there are free and partly free cloud platforms.

The most popular solution is Google Colab, a cloud environment that provides access to graphics processors (GPUs) directly from the browser. The free tier allows working on a machine with an Nvidia Tesla T4 accelerator for two to four hours on average, depending on load. Alternatives include Kaggle Notebooks and Hugging Face Spaces. The latter allows interaction with models through ready-made web interfaces built with Gradio or Streamlit.
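Before starting heavy work in a cloud notebook, it helps to confirm which accelerator the session actually received. The sketch below assumes the `nvidia-smi` utility is present on the machine and queries it for the GPU name and total memory; the helper names are hypothetical, and the function simply returns an empty list when no Nvidia GPU is visible.

```python
import shutil
import subprocess


def parse_gpu_csv(output: str) -> list[dict]:
    """Parse 'name, memory' CSV lines as printed by nvidia-smi without a header."""
    gpus = []
    for line in output.strip().splitlines():
        name, _, memory = line.rpartition(",")
        # Skip lines that do not carry a numeric memory value.
        if name and memory.strip().isdigit():
            gpus.append({"name": name.strip(), "memory_mib": int(memory.strip())})
    return gpus


def list_gpus() -> list[dict]:
    """Return the GPUs visible to this session, or an empty list if none."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


print(list_gpus() or "No Nvidia GPU visible in this session")
```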

When working with open solutions, legal aspects should also be considered. Many popular projects are available under classic licenses such as MIT or Apache 2.0, which permit their use, including commercial use, with minimal restrictions.

However, there are also more specific approaches. Meta distributes its flagship models under the Llama 3.1 Community License, which requires a separate license from Meta if the service’s monthly audience exceeds 700 million users.

Strict copyleft licenses such as the GNU General Public License are also encountered; they require derivative products to be released as open source under the same terms.

## My Personal ChatGPT Analog

Among the vast number of general-purpose autonomous LLMs (analogues of ChatGPT or Gemini), independent rankings based on blind testing and performance metrics, such as Open LLM Leaderboard and Chatbot Arena, help in selecting a model.

Open LLM dashboard. Source: llm-stats.

The gold standard in the segment is the Llama family from Meta and the Qwen family from Alibaba. These models handle long contexts well, cope with multi-step queries, and are suitable for tasks such as vibe coding and programming. Thanks to the open framework Ollama, installing them comes down to a single command.
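For Ollama, the single command is typically `ollama run` followed by a model tag. Once a model is running, it can also be queried from code through Ollama's local HTTP API. The sketch below assumes a default installation listening on `localhost:11434` and uses the `/api/generate` endpoint with streaming disabled; the helper names and the generous timeout are assumptions chosen for slow hardware.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (an assumption about the setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming generation request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

With the server running and the model already pulled, a call such as `ask("qwen3.5:2b", "hello!")` would return the model's answer as a string.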

During testing for this article, the qwen3.5:2b model was successfully launched on a laptop without a discrete GPU, equipped with a Core i7, 8 GB of RAM, and an SSD, after heavy applications such as messengers and browsers were closed.

Source: Ollama.

“2b” means 2 billion parameters. The higher the value, the more complex the relationships the neural network can capture. For example, a 2b model learns basic grammar and simple commands, while a 122b model memorizes facts from quantum physics and the nuances of legal documents, and can plan tasks ten steps ahead.

Each parameter takes up physical space on disk and, importantly, in RAM. The 2b model used about 4-5 GB of RAM and was the largest that could run on such a machine. Even so, it took almost three minutes to respond to a simple greeting like “hello!”.
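The link between parameter count and memory is simple arithmetic: the number of parameters multiplied by the bytes used to store each one. The sketch below is a back-of-the-envelope estimate for the weights alone; the precision of the tested model is not stated in the article, so the 16-bit, 8-bit, and 4-bit figures are assumptions, and real usage is higher because the runtime and the context also need memory.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed storage precisions; the article does not say which one was used.
for bits in (16, 8, 4):
    size = estimate_weight_memory_gb(2, bits)
    print(f"2b model at {bits}-bit weights: ~{size:.1f} GB")
```

At 16 bits per weight a 2b model needs roughly 4 GB for weights alone, which is in line with the 4-5 GB observed on the test laptop.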

Screenshot: ForkLog.

Approximate model size gradation:

0.5b-2b. Fast, can run on old laptops and smartphones. Ideal for simple tasks (command routing, basic summarization, auto-completion of short code lines). Prone to hallucinations on complex queries;
3b-4b. A balance of speed and quality. Good for mobile devices, smart homes, and automation tasks. For example, a chatbot can be asked to turn off the lights, turn on the air conditioner, or raise a barrier gate;
7b-9b. Require about 6–8 GB of free RAM. Powerful models with context understanding and deep logic, suitable for programming and working with large texts.

In a recent study of vibe coding for Web3, Vladimir Sliper found that models such as qwen2.5-coder:7b, qwen3:8b, llama3.2:3b, and deepseek-r1:8b are suitable for a MacBook Air with 16 GB of RAM. More powerful models require investment in a high-end PC with top-tier GPUs or deployment on rented servers.

## Private Data Processing, 3D Printing, and User Protection

The ways of interacting with open AI models depend on the user’s skill level and hardware. Some projects are packaged into convenient installers (.EXE files) or mobile apps that work “out of the box.” Others are abandoned GitHub repositories where installation turns into a multi-hour struggle with conflicts between outdated libraries.

Today, AI models are used far beyond text generation. Even a superficial analysis of the ecosystem reveals dozens of specialized tools for specific tasks.

Video and 3D work:

CogVideoX. Open model from Zhipu AI for video generation from text descriptions. Creates realistic short clips, has open weights, and can be deployed in environments like Jupyter or Colab if sufficient video memory is available;
DepthCrafter. Tool for extracting depth information from videos. Useful for VFX specialists and 3D modeling. Creates high-precision depth maps for each frame of a dynamic scene;
TRELLIS (Morfx 3D). Advanced system for generating 3D assets. Allows creating high-quality 3D models from images or text prompts, optimized for use in game engines.

Turning a train photo into an object for processing and 3D printing using the Morfx 3D web version. Screenshot: ForkLog.

Sound and Recognition:

CosyVoice. Multilingual speech synthesis model supporting voice cloning. Generates realistic audio with intonations and emotional coloring of the original speaker;
Whisper-WebGPU. An implementation of OpenAI’s speech recognition model rewritten to run directly in the browser using the WebGPU API. Audio transcription therefore happens locally, ensuring full privacy without sending audio files to third-party servers;
BirdNET-Analyzer. Neural network from Cornell University for bird species identification by their song. Unlike the popular Merlin Bird ID app, which relies heavily on cloud processing for some functions, BirdNET-Analyzer provides full control over the analysis process locally and can be used for processing gigabytes of field recordings.

Source: BirdNET.

Programming and User Protection:

Screenshot-to-Code. Utility for converting a screenshot of a webpage or mobile app into clean HTML, Tailwind, or React code. Although it usually works with paid APIs (Claude, GPT-4), its architecture allows open multimodal models to be connected;
MinerU/Magic-PDF. Project for precise extraction of structured data from PDF documents. Recognizes text, mathematical formulas, and tables, converting complex layouts into Markdown format;
Fawkes. Makes changes to images that are invisible to the eye but prevent facial recognition systems from identifying the person. Runs locally on a PC via a .EXE file and can be used for social media avatars;
Nightshade. “Poisons” image pixels to confuse AI models that are trained on the images without permission. For example, a “dog” query to such a model might produce an image of a cat.

Source: US Library of Congress, portrait of Donald Trump before using Fawkes.
After processing by Fawkes algorithms. Screenshot: ForkLog.

## Fighting Libraries and First Success

After installing AI models with a clear UI/UX, the next step was to see how easily a heavy repository could be deployed in the cloud for free.

FLUX.1 from the startup Black Forest Labs is one of the leading image generation models, competing with corporate products such as Midjourney and Nano Banana. With the right hardware, it can run autonomously offline and bypass censorship.

The test used the lightest free version, FLUX.1 Schnell. To simplify interaction with open solutions, developers create dedicated frameworks such as Ollama. Popular interfaces for image generation include ComfyUI and Forge.

An entire Google Colab GPU session was spent on attempts to install a Forge implementation, cagliostro-forge-colab. The problem was a classic beginner mistake: a version mismatch between Python, the cloud environment, and the model itself. After four hours of vibe coding with Gemini 3 Flash, the installation still did not succeed.
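Version mismatches of this kind are easier to diagnose when the environment is recorded before installation begins. The sketch below prints the Python version and the installed versions of a few packages using only the standard library; the package names are assumptions about what an image generation stack might need, not the actual requirements of cagliostro-forge-colab.

```python
import sys
from importlib import metadata


def environment_report(packages: list[str]) -> dict[str, str]:
    """Collect the interpreter version and installed versions of the given packages."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Hypothetical package list; replace it with the project's own requirements.
print(environment_report(["torch", "diffusers", "transformers"]))
```

Comparing this report with the versions pinned in a repository's requirements file shows at a glance where the environment and the project disagree.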

Eventually, the framework installation was abandoned, and deployment of FLUX.1 was attempted in the next free session on another day.

In practice, free Google Colab is more convenient on weekends: during this time, the platform often provides longer access.

The model occupied about 34 GB of cloud SSD disk space. However, all processes related to installation ultimately used about 86 GB.
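With numbers like these, it is worth checking free disk space before starting a download. The sketch below uses the standard library for that; the 86 GB threshold comes from the figure above, while the function name and the checked path are hypothetical.

```python
import shutil


def has_room_for(required_gb: float, path: str = ".") -> bool:
    """Check whether the disk holding `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


REQUIRED_GB = 86  # total footprint observed during the installation described above
if has_room_for(REQUIRED_GB):
    print("Enough free space for the model and its installation files")
else:
    print(f"Less than {REQUIRED_GB} GB free: clean up or pick a smaller model")
```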

Resources used by the Google Colab cloud machine. Screenshot: ForkLog.

Initially, FLUX.1 Schnell ran out of video memory on the Nvidia Tesla T4. The unadapted configuration kept hitting GPU limits until a series of simple code experiments with Gemini 3 Flash produced adjustments based on staged loading and clearing of memory. As a result, about 3 GB of the available 16 GB of VRAM was used during generation.
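The article does not show the code that was used, so the following is only a generic sketch of the idea of staged loading and clearing: each component is loaded, used, and released before the next one is loaded, so that only one of them occupies memory at a time. The helper names are hypothetical; the PyTorch cache is cleared only if PyTorch happens to be installed.

```python
import gc
from typing import Any, Callable


def free_accelerator_cache() -> None:
    """Run garbage collection and, if PyTorch is available, empty its CUDA cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_stage(load: Callable[[], Any], run: Callable[[Any], Any]) -> Any:
    """Load one component, use it, then release it before the next stage starts."""
    component = load()
    try:
        return run(component)
    finally:
        # Drop the only reference so the memory can actually be reclaimed.
        del component
        free_accelerator_cache()


# Stand-in stages: a real pipeline would load a text encoder, then the image model.
embedding = run_stage(lambda: {"name": "text-encoder"}, lambda m: f"encoded by {m['name']}")
image = run_stage(lambda: {"name": "image-model"}, lambda m: f"{embedding}, drawn by {m['name']}")
print(image)
```

The trade-off is speed: reloading components for every stage costs time, which is consistent with generation taking minutes rather than seconds on a free GPU.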

Screenshot: ForkLog.

Creating one image took about seven minutes. Considering it’s a free version of an open model, the result was pleasantly surprising.

Generated image using FLUX.1 Schnell. Source: ForkLog.

In several attempts to generate an image of Marilyn Manson in Victorian style, FLUX.1 Schnell apparently did not recognize the reference to a specific person and produced only a generalized visual template.

Generated image of a performer based on the prompt “draw Marilyn Manson in Victorian style” with FLUX.1 Schnell. Source: ForkLog.

## Complex and Incredible

Open neural networks have long been used not only for text and image generation but also for more niche and unusual tasks. A vivid example of unconventional AI application is the GameNGen model, capable of recreating the gameplay of the classic shooter DOOM in real time.

Source: GameNGen/GitHub.

GameNGen doesn’t simulate the game in the usual sense but generates video sequentially: the model predicts what the next frame should look like after the user’s actions (e.g., movement or shooting). Because of this, enemies, objects, and scene changes are not “calculated” by an engine but visually reproduced as the most probable outcome.
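The sequential generation described above can be pictured as a simple loop in which each new frame is predicted from a window of recent frames plus the player's action. The sketch below is a toy illustration of that loop only, not GameNGen's code: the predictor is a hypothetical stand-in that works on lists of numbers instead of images.

```python
from typing import Callable

Frame = list[int]  # stand-in for an image; a real frame would be a pixel array


def rollout(
    predict_next: Callable[[list[Frame], int], Frame],
    first_frame: Frame,
    actions: list[int],
    context_length: int = 4,
) -> list[Frame]:
    """Generate frames one by one, feeding each prediction back as context."""
    frames = [first_frame]
    for action in actions:
        context = frames[-context_length:]  # only the most recent frames are seen
        frames.append(predict_next(context, action))
    return frames


def toy_predictor(context: list[Frame], action: int) -> Frame:
    """Hypothetical placeholder: shift the last frame by the action value."""
    return [pixel + action for pixel in context[-1]]


frames = rollout(toy_predictor, first_frame=[0, 0], actions=[1, 1, -1])
print(frames)
```

Because every frame depends on previously generated frames rather than on a stored game state, small prediction errors can accumulate over a long session.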

Among autonomous systems, the Voyager project stands out: a neural agent for Minecraft. It explores the game world, gathers resources, and continuously learns on its own.

The scientific community also actively adapts open AI to its needs, for example by using algorithms to decode history. Researchers from Tel Aviv and Munich universities trained the Akkademia model to translate ancient Akkadian cuneiform directly into English. It makes it possible to process thousands of damaged clay tablets, speeding up archaeologists’ work dozens of times over.

No less interesting is the MinD-Vis project. This system analyzes functional MRI data and attempts to reconstruct images the subject observes during scanning. It generates an interpretation of what the person sees based on brain activity patterns.

Such initiatives show that artificial intelligence has become a universal tool for understanding and modeling reality. The shift from closed corporate APIs to open source is shaping a completely new paradigm of technological development. Today, any researcher, developer, or enthusiast can deploy infrastructure that a few years ago would have required multi-million-dollar investments in server farms.

The ecosystem’s development is inevitably accompanied by improved user experience: complex scripts are replaced by intuitive interfaces and automated deployment environments. Tools like Ollama and Forge demonstrate that privacy, censorship resistance, and high performance can coexist harmoniously in one software solution. The future of the AI industry today largely depends on how strong, scalable, and independent the open ecosystem remains.
