Real-Time Strategy Documentation

This document describes most of what you need to know for the RTS problem. If there are any questions regarding installation or use of this code, please use the discussion boards on the web site. 

Build Instructions

The first thing you will have to do is build RL-Glue and the competition software. This is done by entering the main directory (rl-competition/) and typing: 

      make all     

What follows is specific to the RTS problem domain. Detailed installation instructions are found in the INSTALL file in domains/realTimeStrategy.  


Testing Your Agent

The following instructions assume that you have read the INSTALL file in domains/realTimeStrategy, have installed any necessary prerequisites, and have properly initialized the RTS problem as described in the INSTALL file. These instructions describe how to build the environment and agents. Most of the commands below assume that the current directory is domains/realTimeStrategy.

  • To make the RL-Glue environment:
        make rlgenv
  • To make the RL-Glue C++ agent:
        make rlgagent 

These will build the binaries in the default mode (MODE=dbg) which compiles with debug symbol information and no optimization. You may prefer to compile with MODE=opt to produce faster, optimized code without debug symbols. Please see the Makefile for more information on how to do this.

To make the Java agent, please see rl-competition/agents/realTimeStrategyAgentJava/README

The make commands above place the executables in bin/. These executables are two components of the RL-Glue framework.

To run an experiment, you must do two things:

  • Start a trainer (either one also launches the RTS environment). Either start the RLViz app, in trainers/guiTrainerRealTimeStrategy:
        bash run.bash

    or start the console trainer, in trainers/consoleTrainerRealTimeStrategyJava:
        bash run.bash

  • Connect the agent, either (Java) in agents/realTimeStrategyAgentJava:
        bash run.bash

    or (C++) in domains/realTimeStrategy:
        ./bin/rlagent

Implementation Details

The most relevant RL-Glue C/C++ agent file is:

  apps/rlgagent/src/rlglue_agent.C

The relevant RL-Glue Java agent files are in:

  agents/realTimeStrategyAgentJava

In the GUI visualization for the RL Competition:

  • Mineral patches are yellow
  • Terrain is black
  • Visible regions are grey
  • Blue units belong to the agent
  • Red units belong to the opponent
  • Filled circles are marine units and the base (the base is much larger and stationary)
  • Empty circles are workers

To change parameter settings, modify the MiniGameParameters() constructor in the domains/realTimeStrategy/libs/minigame/src/MiniGameState.H file, and remake the environment.

For more information on the structure, layout, or behavior of the implementation, please refer to the documentation in the domains/realTimeStrategy/doc/ directory.

Problem Description and Game Mechanics 

The game description relies on a parameter set that includes the width and height of the game field, maximum hit points of units, speeds of units, costs of units, radii of units, sight and attack ranges, etc. These parameters will always be fixed from episode to episode in a given phase of the competition, but they may change from training to testing. The logic for the opponent AI (the "bot") may change as well.

Your task is to build a learning algorithm that learns how to play effectively and generally using Reinforcement Learning techniques. This is a single-agent problem: the environment includes a "bot opponent" which employs a fixed policy.

The RTS problem is built on the RL-Glue framework. The environment and communication are already implemented; you have to build a learning agent. There are default Java and C++ agents included in this distribution, but you are free to implement an agent in any language supported by RL-Glue: Java, C, C++, or Python. The communication protocol is described in doc/rlg_protocol.txt.

Game dynamics take place simultaneously in simulated (discrete) lock-step motion. At each time step, the bot and learning agent each submit an action. The action is a composition of all individual unit actions (orders). All actions are executed and resolved simultaneously, and then the loop starts over. There is no collision detection, so units are free to roam everywhere inside the boundaries. Each unit has a circular geometry; its size is described by its radius attribute.
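The simultaneous resolution described above can be illustrated with a minimal Python sketch. It is not the environment's actual code: the `Unit` class, the `step` function, and the order format (attacker id mapped to target id) are hypothetical, and attack range and armor are ignored here to keep the focus on lock-step timing.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    uid: int
    owner: str          # "agent" or "bot" (hypothetical labels)
    hit_points: int
    attack_value: int


def step(units, agent_orders, bot_orders):
    """Advance one lock-step tick.

    `units` maps uid -> Unit. Each orders dict maps an attacker uid to a
    target uid. Units are modified in place; survivors are returned.
    """
    # Both players' orders are merged, so neither side acts first.
    orders = {**agent_orders, **bot_orders}

    # Phase 1: compute all damage from the state at the start of the tick.
    damage = {}
    for attacker_id, target_id in orders.items():
        if attacker_id in units and target_id in units:
            damage[target_id] = damage.get(target_id, 0) + units[attacker_id].attack_value

    # Phase 2: apply all damage at once, then drop destroyed units.
    for uid, amount in damage.items():
        units[uid].hit_points -= amount
    return {uid: u for uid, u in units.items() if u.hit_points > 0}
```

Because damage is computed before any of it is applied, two units that can destroy each other in the same time step are both removed, which is how simultaneous outcomes can arise.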

Observations describe the world as the agent sees it. An observation consists of some global information (the amount of minerals) and the collection of units that are visible from at least one of the agent's units. Note, in particular, that the agent is generally not given the full view of the world state unless its units' sight ranges cover every spot on the field.
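As a rough illustration of partial observability, the sketch below filters units by sight range. It assumes that visibility is measured as the Euclidean distance between unit centres and that a unit is observed if any one of the agent's units can see it; the `Unit` class and `visible_units` function are hypothetical names, not part of the distribution.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    x: float
    y: float
    sight_range: float = 0.0


def visible_units(own_units, other_units):
    """Return the units in `other_units` seen by at least one own unit."""
    return [
        other
        for other in other_units
        if any(
            math.hypot(other.x - own.x, other.y - own.y) <= own.sight_range
            for own in own_units
        )
    ]
```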

Each episode starts with a fixed number of mineral patches randomly scattered on the field, and two workers (one belonging to each player) randomly placed in the field. The first thing these workers must do is build the base, but they can wander around the world before doing so (at great risk!). When building a base, the worker is temporarily unavailable.

There are, in fact, four types of units: workers, marines, bases, and mineral patches. Workers can mine minerals, marines are used for greater attack strength, bases are used to train more units, and mineral patches are mined by workers to build funds to train more units. The profiles of each unit (for the training period) are:

 

                        Workers   Marines   Bases
  Radius                      4         4      16
  Sight Range                64        64      96
  Attack Range               16        48       0
  Attack Value                1         3       0
  Maximum Hit Points         30        50     100
  Armor                       0         0       0
  Cost                       30        50       -
  Training Time               3         5       -
  Mineral Capacity           10         0       -
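For agent code, the training-period profiles above could be kept in a small lookup table. The following Python sketch is one possible encoding; the `Profile` class, the `PROFILES` dictionary, and the `affordable_units` helper are hypothetical names, and entries that the table leaves empty for bases are stored as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    radius: int
    sight_range: int
    attack_range: int
    attack_value: int
    max_hit_points: int
    armor: int
    cost: Optional[int]              # None: no value listed in the table
    training_time: Optional[int]
    mineral_capacity: Optional[int]


PROFILES = {
    "worker": Profile(4, 64, 16, 1, 30, 0, 30, 3, 10),
    "marine": Profile(4, 64, 48, 3, 50, 0, 50, 5, 0),
    "base": Profile(16, 96, 0, 0, 100, 0, None, None, None),
}


def affordable_units(minerals):
    """List unit types with a listed cost that `minerals` can cover."""
    return [
        name
        for name, profile in PROFILES.items()
        if profile.cost is not None and minerals >= profile.cost
    ]
```

These values hold only for the training period; since parameters may change for testing, an agent should not hard-code decisions around them.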

 

Units automatically attack enemies unless they are busy (for example, workers that are mining, or units already attacking someone else). The target unit in the auto-attack is the one with the lowest hit points in the vicinity. This auto-attack can be overridden by issuing a specific attack action for the unit.
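The auto-attack target choice can be sketched as follows. This is only an illustration: it assumes that "in the vicinity" means within the attacker's attack range, measured between unit centres, and the `Unit` class and `auto_target` function are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    owner: str
    x: float
    y: float
    hit_points: int
    attack_range: float = 0.0


def auto_target(attacker, units):
    """Pick the enemy with the lowest hit points within attack range.

    Returns None when no enemy is in range.
    """
    candidates = [
        u
        for u in units
        if u.owner != attacker.owner
        and math.hypot(u.x - attacker.x, u.y - attacker.y) <= attacker.attack_range
    ]
    return min(candidates, key=lambda u: u.hit_points, default=None)
```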

Attacks work like so: if the target unit is within the attack range of an attacking unit, and the attacking unit's attack value is greater than or equal to the target unit's armor, then the hit points of the target unit are decreased by the attack value.
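The attack rule above translates directly into a small function. The sketch below is written in Python for illustration; the name `resolve_attack` and its parameters are hypothetical, and how the distance to the target is measured is left to the caller.

```python
def resolve_attack(distance, attack_range, attack_value, target_armor, target_hit_points):
    """Return the target's hit points after one attack.

    The attack lands only if the target is in range and the attack value
    is at least the target's armor; otherwise hit points are unchanged.
    """
    if distance <= attack_range and attack_value >= target_armor:
        return target_hit_points - attack_value
    return target_hit_points
```

With the training-period profiles, armor is 0 for every unit, so in practice only the range check decides whether an attack lands.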

The goal is to take out the opponent's base or to take out all of the opponent's units. The reward for winning is 100 - 15*ts/maxt, where ts is the current time step and maxt is the maximum number of time steps per episode. A tie happens if the episode reaches the maximum number of time steps (10 000), if both bases are taken out at the same time, or if neither side has any units left (simultaneously). In the case of a tie, the agent with the higher score gets 55 and the agent with the lower score gets 45, unless the scores are equal, in which case both get 50. The score is the sum of:

  • half of number of minerals left
  • total cost of units remaining
  • total cost of units destroyed during game

The winner will be the agent with the highest cumulative return.
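The reward and tie-break rules above can be written out as a short Python sketch. The function names are hypothetical, and the default of 10 000 for maxt is the maximum number of time steps quoted above, which may differ in other phases of the competition.

```python
def win_reward(ts, maxt=10_000):
    """Reward for winning at time step `ts`: 100 - 15*ts/maxt."""
    return 100 - 15 * ts / maxt


def score(minerals_left, cost_units_remaining, cost_units_destroyed):
    """Tie-break score: half the minerals left plus the two unit-cost totals."""
    return minerals_left / 2 + cost_units_remaining + cost_units_destroyed


def tie_rewards(agent_score, bot_score):
    """Return (agent_reward, bot_reward) for a tied episode."""
    if agent_score > bot_score:
        return (55, 45)
    if agent_score < bot_score:
        return (45, 55)
    return (50, 50)
```

The winning reward therefore ranges from 100 for an immediate win down to 85 for a win at the final time step, so faster wins earn slightly more.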

Good Luck!


 
