Anyone who has ever played FIFA video game can easily point one of its most hated features, the commentary. Indeed, its commentary is repetitive, mundane, and easily predictable. Unlike commentary in a real broadcasted soccer game, FIFA video game's commentary is seemingly hardcoded and sounds utterly unnatural. In lieu of the commentary, people often choose to have loud music or mute the game whatsoever.
However, I strongly believe this problem can be solved through the extensive use of Deep Learning techniques.
Step 1: Dataset
Using style transfer, we can simulate real broadcasted soccer game into a FIFA game. For video, the simulated FIFA game can be recorded. For sound, the original commentary can be used. The video and sound can be mashed together in a video that will serve as a training example in our dataset. The dataset will contain a collection of such videos.
Step 2: Training
In games like soccer, the relative position of the ball and players is essential for the commentary. In other words, the context of the commentary depends on what is going on in the field.
We can use ConvNet to find patterns in the field, like the relative position of the ball, the ball holder, and other players, and register what words or sentences were generated from the audio as a target vector. Recurrent Neural Network can be stacked on top of the ConvNet to generate words in sequence for the commentary.
During the training, the words can be generated at random. However, after certain epochs, the aim is that the model will learn to generate words/sentences that match the context of the game.
- How do we keep track of the player who has the ball or players playing in the game? During training, the ball holder will be called by many names. For example, in time 20:00 of first training example, Ronaldo has the ball. In other instances, at the same time, it can be Messi. All in all, at different times, different players have the ball. And, commentators are often found commenting using players name for a specific purpose. Thus, during training, it is essential that the model does not learn a specific name of the player. However, during the FIFA game, it is very essential that the commentary include the player's name.
- Use of style transfer to simulate real broadcasted soccer game into a FIFA game is a hard problem in and of itself.
- In a real game, the commentator sometimes narrates a story that is completely out of context. How do we make model ignore such astray? Or, the model learns itself to ignore such commentaries?
- In a real game, players fight. Spectators run on the field. The camera zooms spectators for a significant time. Players can have emotions. As of now, FIFA does not support such features. How do we make our model ignore such details?
Why solve this problem?
The aim of this project will be to understand the context of the FIFA game, and use natural language to explain what is happening in the field. However, looking at the bigger picture, this project aims to understand the context of the source, and use natural language to explain its content. The first obvious example is an extension of such feature to other video games. Similarly, extension of this project can help create natural language dialogue for cartoons. The animator can animate a video. And, the model can create natural language dialogue for the video. In a similar fashion, it can be used to introduce real-time natural commentary in real games, not just soccer. One might create a silent movie someday, and AI can help create dialogues based on the context of the video. How awesome would that be?
NOTE: This is an independent research. Any pointers, guides, and suggestions would be greatly valued. Additionally, I am actively looking for a mentor to help complete this project. Please forward the words who you think can be of help.