
438 Commits

Author SHA1 Message Date
Alexandre Beslic e1213384bc Merge pull request #1578 from aluzzardi/rescheduling
[experimental] Simple container rescheduling on node failure
2016-01-12 15:00:27 -08:00
Victor Vieux 14bf4e08b3 add -experimental to enable rescheduling
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-12 01:35:39 -08:00
Victor Vieux 31ad0e047f update godeps
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-12 00:38:09 -08:00
Victor Vieux fc1e7bbca2 use docker/docker/pkg/discovery
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-12 00:38:06 -08:00
Victor Vieux a2018c177c improve eventHandlers locking
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-11 17:23:48 -08:00
Dong Chen 8f384b1d40 Address review comments.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-01-11 16:08:51 -08:00
Victor Vieux 78008f4d4a add doc
fix tests and keep swarm id
remove duplicate on node reconnect
explicit failure

Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-11 15:59:44 -08:00
Andrea Luzzardi 13f60212f5 Add support for container rescheduling on node failure.
Add rescheduling integration tests.

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2016-01-11 15:59:44 -08:00
Andrea Luzzardi 56941d02a8 cluster: Support multiple event handlers.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2016-01-11 15:59:44 -08:00
Dong Chen cf664141b6 Scheduler prefers nodes without connection failures.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-01-11 11:42:58 -08:00
Xian Chaobo 1fef59f738 refresh image when receiving a commit event
Signed-off-by: Xian Chaobo <xianchaobo@huawei.com>
2016-01-08 17:25:30 +08:00
Alexandre Beslic 8b173fd382 Merge pull request #1569 from dongluochen/nodeManagement
Improve node management.
2016-01-07 16:14:36 -08:00
Dong Chen 7e266f18ed Name constants.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-01-07 15:55:12 -08:00
Xian Chaobo 3aa302d706 Merge pull request #1587 from vieux/do_not_save_image_aff
do not save image affinity on reschedule
2016-01-07 09:42:16 +08:00
Dongluo Chen b4a6ad2e56 Merge pull request #1585 from jimenez/klaus-jimenez-offer-refuse
Klaus jimenez offer refuse
2016-01-06 13:20:02 -08:00
Isabel Jimenez 5a529d4c4a Adding help for new flag offer_refuse_seconds and renaming
Signed-off-by: Isabel Jimenez <contact@isabeljimenez.com>
2016-01-06 15:50:30 -05:00
Dong Chen 58a0e1719d Update failureCount scenario and test cases.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-01-06 10:33:51 -08:00
Dong Chen 9a1584d508 Update integration test. Reduce the pending node validation sleep interval. Each pending node has its own validation interval based on its failure count, so reducing the sleep interval does not increase the validation frequency for unreachable nodes.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-01-05 15:56:55 -08:00
Dong Chen 52a7616d99 Add integration test for state machine.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2016-01-05 14:59:30 -08:00
Victor Vieux 2449a352ef add unit test
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-05 10:31:47 -08:00
Victor Vieux 5daaecdaa1 do not save image affinity on reschedule
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-05 10:29:45 -08:00
Klaus Ma cf78e799fd address review comments
Signed-off-by: Klaus Ma <klaus.ma@outlook.com>
2016-01-05 13:18:51 -05:00
Klaus Ma b68537cc20 correct code style & build error
Signed-off-by: Klaus Ma <klaus.ma@outlook.com>
2016-01-05 12:47:42 -05:00
Klaus Ma a23ce43337 Add MESOS_OFFER_REFUSE_SECONDS environment configuration
Signed-off-by: Klaus Ma <klaus.ma@outlook.com>
2016-01-05 12:47:42 -05:00
Victor Vieux 97f3767618 fix soft affinity reschedule
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-01-05 04:58:36 -08:00
Dong Chen 995866d76c Improve node management.
1. Introduce a pending state. Pending nodes need validation before moving to the healthy state. This resolves the duplicate ID and dead node drop issues.
2. Expose error and last update time in docker info.
3. Use connect success/failure to drive state transition between healthy and unhealthy.

Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-12-30 13:25:43 -08:00
Victor Vieux a2380a6c71 update godeps
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-12-22 00:20:04 -08:00
Victor Vieux be0fce961f update code
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-12-22 00:20:04 -08:00
Isabel Jimenez de0e67f571 Merge pull request #1554 from ezrasilvera/mesosFixLock
Change the scheduler lock in Mesos cluster
2015-12-18 10:20:29 -08:00
Ezra Silvera 219f7192d6 Change the scheduler lock in Mesos cluster
Signed-off-by: Ezra Silvera <ezra@il.ibm.com>
2015-12-17 18:20:57 +02:00
Dong Chen 02553d0727 Cover connection failure errors reported by dockerclient and in proxy cases.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-12-15 19:20:29 -08:00
Dong Chen 9bc6c35321 Use engine connection error to fail engine fast.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-12-15 19:13:03 -08:00
Dong Chen ec3b00c484 Reorganize engine failure detection procedure. Change engine option 'RefreshRetry' to 'FailureRetry'.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-12-15 19:13:03 -08:00
Dong Chen 4d24256c19 Use failureCount as a secondary health indicator.
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-12-15 19:13:03 -08:00
Victor Vieux cdd42a5c6b display all the containers that are part of a global network on inspect
update godeps

Signed-off-by: Victor Vieux <victorvieux@gmail.com>
2015-12-15 17:48:35 -08:00
Victor Vieux ed987b8d85 Merge pull request #1542 from jimenez/slave_to_agent
Rename slave to agent
2015-12-14 13:57:31 -08:00
Isabel Jimenez 18cccc521c renaming files + change on tests
Signed-off-by: Isabel Jimenez <contact@isabeljimenez.com>
2015-12-14 16:20:38 -05:00
Victor Vieux 81bf5bc067 Merge pull request #1538 from vitan/patch-1
Typo
2015-12-14 13:18:26 -08:00
Isabel Jimenez 60c15834da changing slave to agent
Signed-off-by: Isabel Jimenez <contact@isabeljimenez.com>
2015-12-14 14:46:56 -05:00
Zhou Weitao 72db0bbc04 Typo
Signed-off-by: Weitao Zhou <wtzhou@dataman-inc.com>
2015-12-14 09:55:10 +08:00
Daniel Hiltgen dde577d154 Add token pass-thru for Authconfig
This augments the CreateContainer call to detect the AuthConfig header
and use any supplied auth for pull operations. This allows pulling
a protected image onto a specific node during the create operation.

CLI usage example using username/password:

    # Calculate the header
    REPO_USER=yourusername
    read -s PASSWORD
    HEADER=$(echo "{\"username\":\"${REPO_USER}\",\"password\":\"${PASSWORD}\"}"|base64 -w 0 )
    unset PASSWORD
    echo HEADER=$HEADER

    # Then add the following to your ~/.docker/config.json
    "HttpHeaders": {
        "X-Registry-Auth": "<HEADER string from above>"
    }

    # Now run a private image against swarm:
    docker run --rm -it yourprivateimage:latest

CLI usage example using registry tokens (requires Engine 1.10 with the new auth token support):

    REPO=yourrepo/yourimage
    REPO_USER=yourusername
    read -s PASSWORD
    AUTH_URL=https://auth.docker.io/token
    TOKEN=$(curl -s -u "${REPO_USER}:${PASSWORD}" "${AUTH_URL}?scope=repository:${REPO}:pull&service=registry.docker.io" |
        jq -r ".token")
    HEADER=$(echo "{\"registrytoken\":\"${TOKEN}\"}"|base64 -w 0 )
    echo HEADER=$HEADER

    # Update the docker config as above, but the token will expire quickly...

Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
2015-12-11 18:36:55 -08:00
Victor Vieux 67a4d559db Merge pull request #1449 from jimenez/mesos_user_abnormal_error
Improving error output for bad swarm mesos user
2015-12-07 13:43:34 -08:00
Alexandre Beslic f21efa4337 Increase default TTL and heartbeat value
Increases the default TTL and heartbeat values for discovery.
Because a node will now remain listed in `docker info` for a
longer period, there is now a Status field showing whether a node
is in the healthy or unhealthy state.

Signed-off-by: Alexandre Beslic <abronan@docker.com>
2015-12-04 17:11:33 -08:00
Victor Vieux de6383c4dd Merge pull request #1448 from jimenez/timeout_default
Changing offers timeout default to prevent other frameworks starvation
2015-11-30 14:35:09 -08:00
Victor Vieux b7ca0e7844 Merge pull request #1450 from jimenez/glog_enable
Enabling glog for mesos
2015-11-30 13:40:06 -08:00
Victor Vieux 24fc1b6909 Merge pull request #1451 from aluzzardi/parallel-affinity-fix
Set labels for pending containers
2015-11-25 15:19:58 -08:00
Alexandre Beslic e82752cace Merge pull request #1363 from dongluochen/refreshConfiguration
add engine options for refresh interval
2015-11-25 14:30:16 -08:00
Andrea Luzzardi 9310a385af Set labels for pending containers.
Fixes docker/compose#2447

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-11-25 06:56:57 -08:00
Isabel Jimenez 185a46481a Enabling glog for mesos
Signed-off-by: Isabel Jimenez <contact@isabeljimenez.com>
2015-11-25 04:40:03 -05:00
Isabel Jimenez e71bda76f8 Improving error output for bad swarm mesos user
Signed-off-by: Isabel Jimenez <contact@isabeljimenez.com>
2015-11-25 04:24:08 -05:00
Isabel Jimenez 484edd33cd Changing offers timeout default to prevent other frameworks starvation
Signed-off-by: Isabel Jimenez <contact@isabeljimenez.com>
2015-11-25 04:01:30 -05:00
Dong Chen a150a0d521 Add cli test for engine refresh options
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-11-18 13:45:39 -08:00
Xian Chaobo bea12ab8ab add support for image digests
Signed-off-by: Xian Chaobo <xianchaobo@huawei.com>
2015-11-11 12:11:08 +08:00
Victor Vieux 3b6d9b6820 monitor events just after the info
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-11-02 17:04:01 -08:00
Victor Vieux 3f29299afd refresh volumes after creating a container
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-11-02 16:31:57 -08:00
Victor Vieux 0fa9b97f4e refresh images after a rmi
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-11-02 16:20:24 -08:00
Dong Chen 51d92d4b69 fix time duration in EngineOpts
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-11-02 16:13:50 -08:00
Victor Vieux d2c5446ea0 Merge pull request #1340 from jimmyxian/volume-driver
Move VolumeDriver to HostConfig
2015-10-28 15:28:53 -07:00
Dong Chen c9f3471dba add engine options for refresh interval
Signed-off-by: Dong Chen <dongluo.chen@docker.com>
2015-10-28 12:56:48 -07:00
Xian Chaobo 588c29c3cc move VolumeDriver to HostConfig
Signed-off-by: Xian Chaobo <xianchaobo@huawei.com>
2015-10-28 10:58:24 +08:00
Daniel Nephin e001980b5c Add filter by image name support to /images/json
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2015-10-27 11:00:37 -04:00
Alexandre Beslic da1f854462 Merge pull request #1310 from pwnall/image_affinity
Swarm filters support in image building
2015-10-23 14:54:33 -07:00
Alexandre Beslic a7a82bd1ae Merge pull request #1330 from vieux/fix_mesos_timeout_issue
fix issue with timeouts in mesos
2015-10-23 14:13:14 -07:00
Victor Costan e32b3211ae Swarm filters support in image building.
When building an image (POST /build), swarm will extract filters from
buildargs. This is similar to how container creation (POST
/containers/create) extracts filters from environment variables.

Signed-off-by: Victor Costan <costan@gmail.com>
2015-10-23 14:24:42 -04:00
Victor Vieux 21d6fc5378 fix panic when createContainer returns nil,nil
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-18 21:22:59 -07:00
Roman Iuvshin 40a22e5a13 Fix log info message
Signed-off-by: Roman Iuvshin <riuvshin@codenvy.com>
2015-10-23 17:22:53 +03:00
Victor Vieux 10d232fe66 fix issue with timeouts in mesos
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-18 14:39:16 -07:00
Victor Vieux a2a8596238 improve error message in mesos
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-18 10:41:02 -07:00
Alexandre Beslic 975eaa9e73 Merge pull request #1320 from dnephin/support_filter_networks
Support filtering networks by id or name
2015-10-21 17:45:56 -07:00
Daniel Nephin a7550e9e70 Support filtering networks by id or name.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2015-10-21 14:23:26 -04:00
Alexandre Beslic 93e78ce641 Merge pull request #1314 from aluzzardi/fix-nullptr-pendingcontainer
Fix nullptr panic in pending containers.
2015-10-20 12:14:06 -07:00
Andrea Luzzardi 0399a3c60b Fix nullptr panic in pending containers.
Fixes #1289

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-19 13:38:06 -07:00
Victor Vieux f9807f561c fix golint
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-17 01:32:44 -07:00
Victor Vieux 4e1ae773e2 improve docker network ls and rm
Signed-off-by: Victor Vieux <victorvieux@gmail.com>
2015-10-19 15:42:56 -07:00
Victor Vieux e9c486b046 refresh networks on whole cluster after create and rm
Signed-off-by: Victor Vieux <victorvieux@gmail.com>
2015-10-19 15:42:56 -07:00
Victor Vieux 384c29163a Merge pull request #1299 from dnephin/use_parse_repo_tags
Use ParseRepositoryTag() from engine
2015-10-15 16:43:11 -07:00
Victor Vieux bef2892cee Merge pull request #1271 from jimmyxian/fix-reschedule-with-soft-affinity
Do not retry with soft image affinity when there is a node constraint
2015-10-15 13:52:29 -07:00
Daniel Nephin 910fec887d Use ParseRepositoryTags from engine.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2015-10-15 12:30:07 -04:00
Victor Vieux 6d6927d1de fix merge issue
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-13 01:24:10 -07:00
Alexandre Beslic 1e30ce215f Merge pull request #1262 from vieux/libnetwork
add 'docker network' support
2015-10-13 11:09:27 -07:00
Xian Chaobo 3fc52aa81d Merge pull request #1276 from aluzzardi/strategy-rankandsort
scheduler: Return a list of candidates rather than a single node.
2015-10-13 09:31:16 +08:00
Jia Mi 660299f749 Engine should refresh the container on container rename event
Signed-off-by: Jia Mi <winters.mi@gmail.com>
2015-10-10 15:17:48 +08:00
Andrea Luzzardi b2b32d979d scheduler now returns the list of ranked nodes rather than the top node.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-09 15:32:37 -07:00
Victor Vieux 267d7e6701 Merge pull request #1261 from aluzzardi/parallel-scheduling
Parallel scheduling
2015-10-09 12:57:42 -07:00
Andrea Luzzardi 7c0539c650 cluster: Fix name setting of pending containers.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-09 12:54:57 -07:00
Andrea Luzzardi 24394612f5 cluster: Don't lock the scheduler when removing a container.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-09 12:54:56 -07:00
Andrea Luzzardi 91279c8256 cluster: Check name uniqueness among pending containers.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-09 12:54:56 -07:00
Andrea Luzzardi c64ae5168a Parallel scheduling support for Swarm driver.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-09 12:54:53 -07:00
Victor Vieux 3a79038d48 Merge pull request #1268 from aluzzardi/refresh-loop-cleanup
engine cleanup: Don't mess with the global random.
2015-10-09 12:52:39 -07:00
Andrea Luzzardi cb2ceea702 engine: Added a concurrent safe refresh delayer.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-09 11:32:03 -07:00
Victor Vieux 78ecf8497c Add network rm
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-08 22:36:13 -07:00
Victor Vieux b007cae8b2 Add docker network create
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-08 22:35:07 -07:00
Victor Vieux 8559fb0fc6 remove cluster.Network(IDOrName)
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-08 22:35:07 -07:00
Victor Vieux 12c2d46dd5 prepend engine name on network name
Signed-off-by: Victor Vieux <victorvieux@gmail.com>
2015-10-08 22:35:07 -07:00
Victor Vieux e634df03a7 add 'docker network ls' support
add 'docker network inspect' support

Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-08 22:35:07 -07:00
Xian Chaobo 315ddfeb4d do not retry with soft image affinity when there is a node constraint
Signed-off-by: Xian Chaobo <xianchaobo@huawei.com>
2015-10-08 05:06:39 -04:00
Andrea Luzzardi f1782fed90 engine cleanup: Don't mess with the global random.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-07 17:27:41 -07:00
Victor Vieux f5925f5a1c Fix container matching
Signed-off-by: Victor Vieux <vieux@docker.com>
2015-10-07 14:18:35 -07:00
Andrea Luzzardi 13483451da engine: More robust refresh loop.
- Random heartbeat (between 30 and 60 seconds).
- Requires 3 failures before marking a node as dead.

Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
2015-10-06 19:39:39 -07:00
Daniel Nephin 8abf7d32e9 Support filtering images by labels
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2015-10-02 15:45:52 -04:00