Job 60560

Job ID	`60560`
submission	`13284`
user	András Kalapos 🇭🇺
user label	real-v1.0-3092-363
challenge	`aido5-LF-real-validation`
step	eval2
status	aborted
up to date	yes
evaluator	`33`
date started	2020-12-11 22:26:21+00:00
date completed	2020-12-11 22:33:25+00:00
duration	0:07:04
message	Operator message: '' [...] Operator message: '' Logs: DEBUG:commons:version: 6.1.7 * INFO:typing:version: 6.1.8 DEBUG:aido_schemas:aido-protocols version 6.0.33 path /usr/local/lib/python3.8/dist-packages INFO:nodes:version 6.1.1 path /usr/local/lib/python3.8/dist-packages pyparsing 2.4.6 2020-12-11 22:29:03.403854: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 2020-12-11 22:29:06.557963: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64: 2020-12-11 22:29:06.557996: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303) 2020-12-11 22:29:06.558026: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist DEBUG:ipce:version 6.0.36 path /usr/local/lib/python3.8/dist-packages INFO:nodes_wrapper:checking implementation INFO:nodes_wrapper:checking implementation OK DEBUG:nodes_wrapper:run_loop fin: /fifos/ego0-in fout: fifo:/fifos/ego0-out INFO:nodes_wrapper:Fifo /fifos/ego0-out created. I will block until a reader appears. INFO:nodes_wrapper:Fifo reader appeared for /fifos/ego0-out. 
INFO:nodes_wrapper:Node RLlibAgent starting reading fi_desc: /fifos/ego0-in fo_desc: fifo:/fifos/ego0-out INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: init() WARNING:config.config:Found paths with seed 3092: WARNING:config.config:0: ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47/config_dump_3092.yml WARNING:config.config:Found checkpoints in ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47: WARNING:config.config:0: ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47/PPO_0_2020-12-10_00-31-48u8cipgyq/checkpoint_363/checkpoint-363 WARNING:config.config:Config loaded from ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47/config_dump_3092.yml WARNING:config.config:Model checkpoint loaded from ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47/PPO_0_2020-12-10_00-31-48u8cipgyq/checkpoint_363/checkpoint-363 WARNING:config.config:Updating default config values by: env_config: mode: inference WARNING:config.config:Env_config.mode is 'inference', some hyperparameters will be overwritten by: rllib_config: num_workers: 0 num_gpus: 0 callbacks: {} ray_init_config: num_cpus: 1 memory: 2097152000 object_store_memory: 209715200 redis_max_memory: 209715200 local_mode: true INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: === Wrappers =================================== INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: Observation wrappers <ClipImageWrapper<DummyDuckietownGymLikeEnv instance>> <ResizeWrapper<ClipImageWrapper<DummyDuckietownGymLikeEnv instance>>> <ObservationBufferWrapper<ResizeWrapper<ClipImageWrapper<DummyDuckietownGymLikeEnv instance>>>> <NormalizeWrapper<ObservationBufferWrapper<ResizeWrapper<ClipImageWrapper<DummyDuckietownGymLikeEnv instance>>>>> INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: Action wrappers 
<Heading2WheelVelsWrapper<NormalizeWrapper<ObservationBufferWrapper<ResizeWrapper<ClipImageWrapper<DummyDuckietownGymLikeEnv instance>>>>>> INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: Reward wrappers INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: === Config =================================== INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: seed: 3092 experiment_name: PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand algo: PPO algo_config_files: PPO: config/algo/ppo.yml general: config/algo/general.yml env_config: mode: inference episode_max_steps: 500 resized_input_shape: (84, 84) crop_image_top: true top_crop_divider: 3 grayscale_image: false frame_stacking: true frame_stacking_depth: 3 motion_blur: false action_type: heading reward_function: posangle distortion: true accepted_start_angle_deg: 30 simulation_framerate: 30 frame_skip: 3 action_delay_ratio: 0.0 training_map: multimap_aido5 domain_rand: true dynamics_rand: true camera_rand: true frame_repeating: 0.0 spawn_obstacles: false obstacles: duckie: density: 0.5 static: true duckiebot: density: 0 static: false spawn_forward_obstacle: false aido_wrapper: true wandb: project: duckietown-rllib experiment_name: PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand seed: 3092 ray_init_config: num_cpus: 1 webui_host: 127.0.0.1 memory: 2097152000 object_store_memory: 209715200 redis_max_memory: 209715200 local_mode: true restore_seed: 3091 restore_experiment_idx: 0 restore_checkpoint_idx: 0 debug_hparams: rllib_config: num_workers: 1 num_gpus: 0 ray_init_config: num_cpus: 1 memory: 2097152000 object_store_memory: 209715200 redis_max_memory: 209715200 local_mode: true inference_hparams: rllib_config: num_workers: 0 num_gpus: 0 callbacks: {} ray_init_config: num_cpus: 1 memory: 2097152000 object_store_memory: 209715200 redis_max_memory: 209715200 local_mode: true timesteps_total: 4000000.0 rllib_config: num_workers: 0 sample_batch_size: 265 num_gpus: 0 train_batch_size: 4096 gamma: 0.99 
lr: 5.0e-05 monitor: false evaluation_interval: 25 evaluation_num_episodes: 2 evaluation_config: monitor: false explore: false seed: 1234 lambda: 0.95 sgd_minibatch_size: 128 vf_loss_coeff: 0.5 entropy_coeff: 0.0 clip_param: 0.2 vf_clip_param: 0.2 grad_clip: 0.5 env: Duckietown callbacks: {} env_config: mode: inference episode_max_steps: 500 resized_input_shape: (84, 84) crop_image_top: true top_crop_divider: 3 grayscale_image: false frame_stacking: true frame_stacking_depth: 3 motion_blur: false action_type: heading reward_function: posangle distortion: true accepted_start_angle_deg: 30 simulation_framerate: 30 frame_skip: 3 action_delay_ratio: 0.0 training_map: multimap_aido5 domain_rand: true dynamics_rand: true camera_rand: true frame_repeating: 0.0 spawn_obstacles: false obstacles: duckie: density: 0.5 static: true duckiebot: density: 0 static: false spawn_forward_obstacle: false aido_wrapper: true wandb: project: duckietown-rllib experiment_name: PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand seed: 3092 2020-12-11 22:29:07,019 INFO trainer.py:428 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution 2020-12-11 22:29:07,036 ERROR syncer.py:39 -- Log sync requires rsync to be installed. 2020-12-11 22:29:07,037 WARNING deprecation.py:29 -- DeprecationWarning: `sample_batch_size` has been deprecated. Use `rollout_fragment_length` instead. This will raise an error in the future! 2020-12-11 22:29:07,037 INFO trainer.py:583 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags. 2020-12-11 22:29:07.050348: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2020-12-11 22:29:07.061066: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2599935000 Hz 2020-12-11 22:29:07.061705: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7972150 initialized for platform Host (this does not guarantee that XLA will be used). Devices: 2020-12-11 22:29:07.061754: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version 2020-12-11 22:29:12,040 INFO trainable.py:217 -- Getting current IP. 2020-12-11 22:29:12,041 WARNING util.py:37 -- Install gputil for GPU system monitoring. INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: Restoring checkpoint from: ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47/PPO_0_2020-12-10_00-31-48u8cipgyq/checkpoint_363/checkpoint-363 2020-12-11 22:29:12,106 INFO trainable.py:217 -- Getting current IP. 2020-12-11 22:29:12,106 INFO trainable.py:422 -- Restored on 172.17.0.2 from checkpoint: ./models/PPO-RLlib-AIDO5_FrameSkip3_NewMaps_StartAngle30_AIDOWrapper_DomainRand_3092/Dec10_00-31-47/PPO_0_2020-12-10_00-31-48u8cipgyq/checkpoint_363/checkpoint-363 2020-12-11 22:29:12,106 INFO trainable.py:430 -- Current state after restoring: {'_iteration': 363, '_timesteps_total': 1539120, '_time_total': 110224.9016327858, '_episodes_total': 8614} INFO:nodes_wrapper:4dd9d5a4eb8d:RLlibAgent: Starting episode "episode". 
ERROR:nodes_wrapper:Error in node RLlibAgent: Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 355, in loop handle_message_node(parsed, receiver0, context0) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 531, in handle_message_node call_if_fun_exists(agent, expect_fn, data=ob, context=context, timing=timing) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/utils.py", line 21, in call_if_fun_exists f(*kwargs) File "solution.py", line 80, in on_received_get_commands pwm_left, pwm_right = self.compute_action(self.current_image) File "solution.py", line 73, in compute_action action = self.model.predict(observation) File "/submission/model.py", line 63, in predict action = self.model.compute_action(observation, explore=False) File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer.py", line 781, in compute_action result = self.get_policy(policy_id).compute_single_action( File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/policy.py", line 150, in compute_single_action [action], state_out, info = self.compute_actions( File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/tf_policy.py", line 268, in compute_actions return builder.get(fetches) File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 42, in get self._executed = run_timeline( File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 89, in run_timeline fetches = sess.run(ops, feed_dict=feed_dict) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 957, in run result = self._run(None, fetches, feed_dict, options_ptr, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1180, in _run results = self._do_run(handle, final_targets, final_fetches, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 
1358, in _do_run return self._do_call(_run_fn, feeds, fetches, targets, options, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1365, in _do_call return fn(args) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1349, in _run_fn return self._call_tf_sessionrun(options, feed_dict, fetch_list, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1441, in _call_tf_sessionrun return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, File "/usr/local/lib/python3.8/dist-packages/ray/worker.py", line 881, in sigterm_handler sys.exit(signal.SIGTERM) SystemExit: Signals.SIGTERM The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 243, in run_loop loop(node_name, fi, fo, node, protocol, tin, tout, config=config, fi_desc=fin, fo_desc=fout) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 378, in loop raise InternalProblem(msg) from e # XXX zuper_nodes.structures.InternalProblem: Exception while handling a message on topic "get_commands". 
\| Traceback (most recent call last): \| File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 355, in loop \| handle_message_node(parsed, receiver0, context0) \| File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 531, in handle_message_node \| call_if_fun_exists(agent, expect_fn, data=ob, context=context, timing=timing) \| File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/utils.py", line 21, in call_if_fun_exists \| f(*kwargs) \| File "solution.py", line 80, in on_received_get_commands \| pwm_left, pwm_right = self.compute_action(self.current_image) \| File "solution.py", line 73, in compute_action \| action = self.model.predict(observation) \| File "/submission/model.py", line 63, in predict \| action = self.model.compute_action(observation, explore=False) \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer.py", line 781, in compute_action \| result = self.get_policy(policy_id).compute_single_action( \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/policy.py", line 150, in compute_single_action \| [action], state_out, info = self.compute_actions( \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/tf_policy.py", line 268, in compute_actions \| return builder.get(fetches) \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 42, in get \| self._executed = run_timeline( \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 89, in run_timeline \| fetches = sess.run(ops, feed_dict=feed_dict) \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 957, in run \| result = self._run(None, fetches, feed_dict, options_ptr, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1180, in _run \| results = self._do_run(handle, final_targets, final_fetches, \| File 
"/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1358, in _do_run \| return self._do_call(_run_fn, feeds, fetches, targets, options, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1365, in _do_call \| return fn(args) \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1349, in _run_fn \| return self._call_tf_sessionrun(options, feed_dict, fetch_list, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1441, in _call_tf_sessionrun \| return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, \| File "/usr/local/lib/python3.8/dist-packages/ray/worker.py", line 881, in sigterm_handler \| sys.exit(signal.SIGTERM) \| SystemExit: Signals.SIGTERM \| Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 355, in loop handle_message_node(parsed, receiver0, context0) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 531, in handle_message_node call_if_fun_exists(agent, expect_fn, data=ob, context=context, timing=timing) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/utils.py", line 21, in call_if_fun_exists f(*kwargs) File "solution.py", line 80, in on_received_get_commands pwm_left, pwm_right = self.compute_action(self.current_image) File "solution.py", line 73, in compute_action action = self.model.predict(observation) File "/submission/model.py", line 63, in predict action = self.model.compute_action(observation, explore=False) File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer.py", line 781, in compute_action result = self.get_policy(policy_id).compute_single_action( File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/policy.py", line 150, in compute_single_action [action], state_out, info = self.compute_actions( File 
"/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/tf_policy.py", line 268, in compute_actions return builder.get(fetches) File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 42, in get self._executed = run_timeline( File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 89, in run_timeline fetches = sess.run(ops, feed_dict=feed_dict) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 957, in run result = self._run(None, fetches, feed_dict, options_ptr, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1180, in _run results = self._do_run(handle, final_targets, final_fetches, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1358, in _do_run return self._do_call(_run_fn, feeds, fetches, targets, options, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1365, in _do_call return fn(args) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1349, in _run_fn return self._call_tf_sessionrun(options, feed_dict, fetch_list, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1441, in _call_tf_sessionrun return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, File "/usr/local/lib/python3.8/dist-packages/ray/worker.py", line 881, in sigterm_handler sys.exit(signal.SIGTERM) SystemExit: Signals.SIGTERM The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 243, in run_loop loop(node_name, fi, fo, node, protocol, tin, tout, config=config, fi_desc=fin, fo_desc=fout) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 378, in loop raise InternalProblem(msg) from e # XXX zuper_nodes.structures.InternalProblem: 
Exception while handling a message on topic "get_commands". \| Traceback (most recent call last): \| File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 355, in loop \| handle_message_node(parsed, receiver0, context0) \| File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 531, in handle_message_node \| call_if_fun_exists(agent, expect_fn, data=ob, context=context, timing=timing) \| File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/utils.py", line 21, in call_if_fun_exists \| f(*kwargs) \| File "solution.py", line 80, in on_received_get_commands \| pwm_left, pwm_right = self.compute_action(self.current_image) \| File "solution.py", line 73, in compute_action \| action = self.model.predict(observation) \| File "/submission/model.py", line 63, in predict \| action = self.model.compute_action(observation, explore=False) \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/agents/trainer.py", line 781, in compute_action \| result = self.get_policy(policy_id).compute_single_action( \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/policy.py", line 150, in compute_single_action \| [action], state_out, info = self.compute_actions( \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/tf_policy.py", line 268, in compute_actions \| return builder.get(fetches) \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 42, in get \| self._executed = run_timeline( \| File "/usr/local/lib/python3.8/dist-packages/ray/rllib/utils/tf_run_builder.py", line 89, in run_timeline \| fetches = sess.run(ops, feed_dict=feed_dict) \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 957, in run \| result = self._run(None, fetches, feed_dict, options_ptr, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1180, in _run \| results = self._do_run(handle, final_targets, 
final_fetches, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1358, in _do_run \| return self._do_call(_run_fn, feeds, fetches, targets, options, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1365, in _do_call \| return fn(args) \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1349, in _run_fn \| return self._call_tf_sessionrun(options, feed_dict, fetch_list, \| File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1441, in _call_tf_sessionrun \| return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, \| File "/usr/local/lib/python3.8/dist-packages/ray/worker.py", line 881, in sigterm_handler \| sys.exit(signal.SIGTERM) \| SystemExit: Signals.SIGTERM \| The above exception was the direct cause of the following exception: Traceback (most recent call last): File "solution.py", line 127, in <module> main() File "solution.py", line 123, in main wrap_direct(node=node, protocol=protocol) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/interface.py", line 24, in wrap_direct run_loop(node, protocol, args) File "/usr/local/lib/python3.8/dist-packages/zuper_nodes_wrapper/wrapper.py", line 251, in run_loop raise Exception(msg) from e Exception: Error in node RLlibAgent
