In recent days, MiniMax's abab-video-1 model has been drawing attention both in China and abroad: Chinese netizens have been having fun with it, and international users have given it high praise as well.

Overall, the experience with the abab-video-1 model is highly user-friendly. Simply input a sentence and a smooth video is generated. The video's motion is stable, and the characters' movements appear very natural.

A blogger named "Ryan Morrison" on X stated that the abab-video-1 model produces the most natural hand movements he has seen.


Another blogger used the abab-video-1 model to create a "Star Wars" video. The color and aesthetics of the footage already have a "Hollywood movie" feel. The camera movements are smooth, the action is dynamic, and the video does not fall apart.


Some bloggers have even run comparisons, stating that the abab-video-1 model surpasses Kelin in aesthetics and video quality. Is this really the case? AIbase ran several complex prompts through both models to compare them:

Prompt 1: Show a modern city gradually returning to its past state. Skyscrapers slowly disappear, replaced by ancient buildings; cars turn into carriages, and people on the streets revert to past attire. The entire city seems to travel through a time machine, allowing viewers to experience the features of different eras.

It can be seen that the abab-video-1 model understands the prompt quite well; the transition from modern to ancient buildings is relatively natural, and the overall performance is quite impressive.

Kelin's result for the same prompt is as follows:

Kelin seems to struggle with complex, lengthy prompts. Judging from the video, it only understood the first half and did not generate the transformation from modern to ancient buildings. With image-to-video, Kelin would likely perform better.

Prompt 2: Five people sitting at a bar, showing their emotional fluctuations through color changes. The scene focuses on portraits, with the background color changing according to their expressions. Bright and warm during joy, deep and cold during sadness, and vivid and intense during anger. The flow of colors shows subtle emotional changes, allowing the audience to directly feel the characters' inner world.

AIbase packed several complex details into Prompt 2, such as "emotional changes" and "background color following the emotional changes," and also increased the number of people in the scene, making the challenge harder.

So far, the abab-video-1 model has not been stumped. The characters' emotional transitions are natural, their faces remain intact, and the background color shifts with their emotions as prompted, showing strong expressiveness.

For the same prompt, Kelin generated one fewer character, the faces are distorted, and the background color did not change as the prompt required.

Perhaps the prompt was too long and complex, so let's try a simpler one.

Prompt 3: A couple holding hands, walking under a starry sky, with the Milky Way slowly moving in the background.

The video generated by the abab-video-1 model is flawless in terms of motion, composition, and aesthetics. The couple's walking movements are also very natural.

Kelin's starry sky is fine. The only minor flaws are that the couple at the bottom of the frame is too small and easy to overlook, and their appearance is slightly distorted, though still acceptable.

From the above tests, the abab-video-1 model does surpass Kelin, at least in text-to-video. Kelin's current advantage lies in its richer feature set, including image-to-video and keyframes, both of which perform far better than its text-to-video mode. Since the abab-video-1 model only supports text-to-video, image-to-video could not be compared.

In summary, the main advantages of the abab-video-1 model are as follows:

Aesthetic Level: The videos generated by the abab-video-1 model have significantly improved in visual beauty, with more harmonious color schemes and more exquisite compositions.

Camera Movement: Compared to Kelin, the abab-video-1 model excels in camera techniques, presenting smoother and more natural scene transitions and camera movements.

Expression Rendering: The abab-video-1 model creates more detailed and rich character expressions, better conveying emotions and stories.

Text Presentation: In scenes requiring text display, the abab-video-1 model performs better, with more aesthetically pleasing text layout and design that is easier to read.

Video Coherence: The videos generated by the abab-video-1 model are more coherent in terms of plot development and scene transitions, with stronger storytelling capabilities.

Creative Space: The abab-video-1 model provides creators with a broader space for creative expression, better realizing unique visual effects and narrative methods.

Interested individuals can try it themselves: https://top.aibase.com/tool/hailuowenwen
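For readers who would rather script such prompt tests than use the web interface, here is a minimal Python sketch of submitting a text-to-video prompt programmatically. Note that the endpoint URL, model identifier, request fields, and response shape below are assumptions for illustration only; the article covers only the web tool, so consult MiniMax's official documentation for any real API details.

```python
# Minimal sketch: send one of the test prompts to a text-to-video API.
# NOTE: the endpoint URL, auth scheme, and JSON fields are hypothetical
# placeholders, not MiniMax's documented API.
import os
import requests

API_URL = "https://api.minimax.example/v1/video_generation"  # hypothetical endpoint
API_KEY = os.environ["MINIMAX_API_KEY"]  # assumed bearer-token auth

prompt = (
    "A couple holding hands, walking under a starry sky, "
    "with the Milky Way slowly moving in the background"
)

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "abab-video-1", "prompt": prompt},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # typically a task ID to poll until the video is ready
```

Video generation is usually asynchronous, so in practice the returned task ID would be polled until the finished clip is available for download.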