
Contents of /nl.nikhef.pdp.dynsched-pbs-plugin/trunk/RELEASE


Revision 2012
Fri Oct 8 13:11:04 2010 UTC (11 years, 11 months ago) by templon
File size: 8195 byte(s)
first checkin, copied from old tree

This file contains release notes and a change history for the
lcg-info-dynamic-scheduler information provider.
It also contains release notes and a change history for the
PBS/Torque/Maui backend commands. The notes are
most recent first.

Release 2.2.1

The dynamic scheduler was changed to stop printing the GlueCEAccessControlBaseRule (ACBR).
2.2.0 did not work because GIP considers all changes to multivalued attributes (like
ACBRs) to be significant.

Release 2.2.0

The dynamic scheduler was changed to deal with the DENY
tags used in the short-term solution (June 2007) for job priorities.
The dynamic scheduler does the following with ACBRs placed on VOViews:

- it discards any ACBR that does not begin with either "VO:" or "VOMS:"
- if more than one ACBR is left in the list, it uses only the last one
  in the list, and prints a warning message to standard error and to syslog
- it allows multiple DENY tags
- it does not check the consistency between the ACBR and DENY tags in a view.

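The rules above can be sketched roughly as follows. This is an
illustration only, not the code shipped in the package: the function
name select_acbr, the warn callback, and the assumption that DENY tags
are values beginning with "DENY:" are all hypothetical. It is written
for Python 3.

```python
import sys


def select_acbr(rules, warn=lambda msg: print(msg, file=sys.stderr)):
    """Pick the ACBR to use for one VOView and collect its DENY tags.

    Hypothetical helper: 'rules' is the list of access-control values
    found in the view. DENY tags are assumed to start with "DENY:".
    """
    # Any number of DENY tags is allowed; they are kept as-is.
    deny = [r for r in rules if r.startswith("DENY:")]
    # Everything that is not a "VO:" or "VOMS:" rule is discarded.
    kept = [r for r in rules if r.startswith(("VO:", "VOMS:"))]
    if len(kept) > 1:
        # The real program also logs to syslog; here we only call warn().
        warn("more than one ACBR in VOView, using the last one: %s" % kept[-1])
    acbr = kept[-1] if kept else None
    # No consistency check between acbr and deny is attempted.
    return acbr, deny
```
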
Release 2.1.0

lrms.py was changed to support caching of search results.
Most of the time spent in lcg-info-dynamic-scheduler was due to
queries like "find all jobs from group 'lhcb', in state 'waiting',
for queue 'qlong'". Queries like this are now cached for future use,
and results can also be supplied *before* use, as they now are for
lcg-info-dynamic-scheduler. That program now generates slices
of the job list for the various queue/group/state combinations
that will be needed while running the program.

There were previously two different 'return a list of matching jobs'
functions, with different interfaces. These now have a unified
interface so that result caching can be supported. This does break
backwards compatibility for lrms.py.

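As a rough sketch of the idea (not the actual lrms.py interface), a
single query function keyed on the filter can both cache its results
and accept slices computed in advance. The class name JobCache, its
methods, and the representation of jobs as dicts are assumptions made
for this example. It is written for Python 3.

```python
class JobCache:
    """Hypothetical cache of job-list slices keyed by the query filter."""

    def __init__(self, jobs):
        self._jobs = list(jobs)   # each job is a dict of attributes
        self._slices = {}         # filter key -> list of matching jobs

    @staticmethod
    def _key(filt):
        # Sort the items so that keyword order does not matter.
        return tuple(sorted(filt.items()))

    def matching_jobs(self, **filt):
        """Return jobs whose attributes equal every value in the filter."""
        key = self._key(filt)
        if key not in self._slices:
            self._slices[key] = [
                job for job in self._jobs
                if all(job.get(name) == value for name, value in filt.items())
            ]
        return self._slices[key]

    def preload(self, filt, jobs):
        """Supply a slice before it is first requested."""
        self._slices[self._key(filt)] = list(jobs)
```

A caller would ask for cache.matching_jobs(group="lhcb", state="waiting",
queue="qlong"); a repeated query with the same filter returns the stored
list instead of scanning the job list again.
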
Release 2.0.0

Rather massive changes in parsing logic, to be able to handle VOViews
with VOMS FQANs.

VOMS FQANs are handled both by the input routines, which know what
to do with them when reading the static LDIF file, and by
the group mapping logic, which knows how to associate FQANs
with unix groups. To this end, the vomap construct in the
lcg-info-dynamic-scheduler config file now supports lines like

    lhcbsgm:/VO=lhcb/GROUP=/lhcb/ROLE=lcgadmin

in addition to the original lines like

    atlgrid : atlas

which would map group 'atlgrid' to "VO : atlas".

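A minimal sketch of how the two kinds of vomap line could be told
apart is shown below. The function name parse_vomap_line and the rule
"a target starting with '/' is an FQAN" are assumptions for
illustration; the real parser may work differently. It is written for
Python 3.

```python
def parse_vomap_line(line):
    """Split one vomap line into (unix_group, kind, target).

    Hypothetical helper: 'kind' is "VOMS" when the target looks like an
    FQAN (starts with "/"), otherwise "VO".
    """
    group, sep, target = line.partition(":")
    if not sep:
        raise ValueError("not a vomap line: %r" % line)
    group, target = group.strip(), target.strip()
    kind = "VOMS" if target.startswith("/") else "VO"
    return group, kind, target
```
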
The parsing of the GlueCEUniqueID and GlueVOView blocks has
also changed rather drastically, so that the previous problems with
numbers, dashes, etc. in queue names and hostnames no longer
occur. Instead of parsing the GlueCEUniqueID field to get the
queue name, the program now reads GlueCEName and uses that for
the queue name.

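To illustrate why reading GlueCEName is more robust, the sketch below
collects the attributes of one LDIF block into a dict and takes the
queue name from GlueCEName directly. The helper name ldif_attrs is
hypothetical, and the sketch ignores LDIF line continuations and
base64 ("::") values, which a real reader would have to handle. It is
written for Python 3.

```python
def ldif_attrs(block):
    """Map attribute name -> list of values for one LDIF block.

    Simplified: no line continuation and no base64 decoding.
    """
    attrs = {}
    for line in block.splitlines():
        # Split on the first colon only; values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            attrs.setdefault(name.strip(), []).append(value.strip())
    return attrs


def queue_name(block):
    """Return the queue name as given by GlueCEName."""
    return ldif_attrs(block)["GlueCEName"][0]
```
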
Also, the file vomaxjobs-generic (documentation) was added,
and the rest of the documentation and example files were
substantially updated for the new release.

Otherwise no changes since 1.6.3.

For people using the test suite: the versions of the test output
included in 2.0.0 will cause tests of older versions to fail. This
is unavoidable since the old parsing logic was based on the order
in which blocks appeared in the ldif file, while the new version
uses python 'dicts' which have an unpredictable order when
iterated. To make the order predictable (for purposes of the test
harness), the keys are sorted before the program starts to print.
The older versions do not sort the output before printing, hence
tests of the old versions with the new files will fail.

Release 1.6.3

Fix for GGUS bug 10155 -- had to do with YAIM adding unnecessary lines like

    alice:alice

to the [vomap] stanza. The program did not expect to get these lines so it
of course did something rather silly with them, resulting in the behavior
reported in the GGUS bug.

Release 1.6.1

Bug fix for lcg-info-dynamic-scheduler: fix the regexp matching
GlueCEUniqueID. The regexp in 1.6.0 missed

- CEs with a "-" character in the hostname
- queue names with underscores, uppercase letters, and numbers

There are examples of each of these classes on the production system,
so this upgrade is critical.

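For illustration, a pattern that accepts both classes of name might
look like the sketch below. It is not the regexp used in the program.
It assumes GlueCEUniqueID values of the form
host:port/jobmanager-lrms-queue (or "blah" in place of "jobmanager"),
and the name CE_ID_RE is made up for this example. It is written for
Python 3.

```python
import re

# Assumed layout: <host>:<port>/<jobmanager|blah>-<lrms>-<queue>
CE_ID_RE = re.compile(
    r"^(?P<host>[^:]+)"            # hostname; dashes are allowed
    r":(?P<port>\d+)"
    r"/(?:jobmanager|blah)"
    r"-(?P<lrms>[^-]+)"
    r"-(?P<queue>.+)$"             # queue: underscores, uppercase, digits
)


def split_ce_id(ce_id):
    """Return (host, queue) for a GlueCEUniqueID, or None if no match."""
    m = CE_ID_RE.match(ce_id)
    return (m.group("host"), m.group("queue")) if m else None
```
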
Release 1.6.0

- changes to parsing of static LDIF file to pick up gLite CEs with "blah"
  instead of "jobmanager". Note this is largely untested!!
- added test suite to prevent bug regression
- some changes to build system (three targets increases aggravation)
- some changes to pbsServer classes to assist in debugging.
- some changes to vomaxjobs-maui to assist in debugging/testing;
  also fixed various unreported bugs discovered during testing.
- Change mapping of pbs/torque job states in pbs classes; until now a job
  was either queued (Q) or running (any other state). Now we have:

  From the qstat (torque 2.0.0p4) man page:

  C - Job is completed after having run (mapped to 'done')
  E - Job is exiting after having run. (mapped to 'running')
  H - Job is held. (mapped to 'pending')
  Q - job is queued, eligible to run or routed. (mapped to 'queued')
  R - job is running. (mapped to 'running')
  T - job is being moved to new location. (mapped to 'pending')
  W - job is waiting for its execution time (mapped to 'queued')

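The state mapping listed above can be written as a simple lookup
table. The names PBS_STATE_MAP and map_state are invented for this
sketch, and raising ValueError for an unknown code is an assumption;
the notes do not say what the program does in that case. It is written
for Python 3.

```python
# qstat state letter -> internal job state, as listed in the notes.
PBS_STATE_MAP = {
    "C": "done",
    "E": "running",
    "H": "pending",
    "Q": "queued",
    "R": "running",
    "T": "pending",
    "W": "queued",
}


def map_state(code):
    """Translate a qstat state letter; unknown letters are an error here."""
    try:
        return PBS_STATE_MAP[code]
    except KeyError:
        raise ValueError("unknown PBS job state: %r" % code)
```
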
Release 1.5.2:

pbs package: Fix to vomaxjobs-maui to deal with cases where there is
extra 'warning' output near the top of the command output from diagnose -g.

generic package: fix bug with logging; undefined variable caused fatal program
exit while trying to print warning message.

Release 1.5.1:

fix dependency problems with RPMs.

Release 1.5.0

* add RELEASE (this file) to docs dir in generic package RPM

* Minor change to build system to make tag events in ChangeLogs
  easier to read.

* lcg-info-dynamic-scheduler:

  - It is possible (e.g. by dramatically reducing MAXPROC config in Maui) for
    a VO to have more running jobs in the LRMS than allowed by MAXPROC.
    In this case a negative value was reported for FreeSlots. Fixed.

  - implemented logging to syslog

* vomaxjobs-maui:

  - adapt to handle MAXPROC specifications like MAXPROC=soft,hard
    The code reports the 'hard' limit, since this is relevant when the
    system is not full, and this is when it's needed. Maui uses the
    soft limit on a full system, but in this case the info provider will
    drop FreeSlots to zero as soon as jobs remain in the queued state
    instead of executing immediately.

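The two fixes in this release can be sketched as below: take the hard
limit from a MAXPROC=soft,hard token, and never report fewer than zero
free slots. The function names hard_maxproc and free_slots are
hypothetical, and the sketch parses only a bare MAXPROC token, not the
full output of diagnose -g. It is written for Python 3.

```python
def hard_maxproc(spec):
    """Return the hard limit from 'MAXPROC=hard' or 'MAXPROC=soft,hard'."""
    name, sep, value = spec.partition("=")
    if name.strip() != "MAXPROC" or not sep:
        raise ValueError("not a MAXPROC specification: %r" % spec)
    # With two values the order is soft,hard; with one it is the only limit.
    return int(value.split(",")[-1])


def free_slots(maxproc, running):
    """Free slots for a VO, clamped so the result is never negative."""
    return max(0, maxproc - running)
```

For example, a VO with MAXPROC=50,100 and 120 running jobs would be
reported with zero free slots rather than -20.
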
Release 1.4.3

* lcg-info-dynamic-scheduler:

  - fix for Savannah bug 14946: overflow of conversion of response time
    values from float (internal) to int (output representation). Now prints the
    magic value of 2146060842 as an upper limit.

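A minimal sketch of such a clamp is shown below. Only the value
2146060842 comes from the notes above; the names MAX_RESPONSE_TIME and
response_time_to_int are made up for this example, and the real fix
may be implemented differently. It is written for Python 3.

```python
# Upper limit printed by the info provider (value taken from the notes).
MAX_RESPONSE_TIME = 2146060842


def response_time_to_int(seconds):
    """Convert an internal float response time to the int that is printed.

    Values at or above the limit (including infinity) are reported as
    the limit itself, so the conversion cannot overflow.
    """
    return int(min(seconds, MAX_RESPONSE_TIME))
```
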
Release 1.4.2

* pbsServer.py:

  - included Steve Traylen's patch to deal with jobs for which the
    uid/gid printed by 'qstat' is not listed in the running machine's
    pw DB. This can happen when the CE is not the same physical
    machine as the actual LRMS server.

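The sketch below shows one way to tolerate an id that is missing from
the local password database. It is not the patch itself: the function
name user_name, the injectable lookup argument, and the choice to fall
back to the id as a string are all assumptions for illustration. It
uses the standard pwd module (Unix only) and is written for Python 3.

```python
import pwd


def user_name(uid, lookup=pwd.getpwuid):
    """Return the account name for a numeric uid.

    If the uid is unknown on this machine (for example because the CE
    is not the LRMS server), fall back to the uid as a string instead
    of failing.
    """
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no entry exists.
        return str(uid)
```
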
Estimated Response Time Info Providers (v 1.4.1)
------------------------------------------------

This information provider is new in LCG 2.7.0 and is
contained in two RPMs, lcg-info-dynamic-scheduler-generic
and lcg-info-dynamic-scheduler-pbs. Sites using torque/pbs
as an LRMS and Maui as a scheduler are fully supported by
this configuration; those using other schedulers and/or
LRMS systems will need to provide the appropriate back-end
plugins.

For sites meeting the following criteria, the system should
work out of the box with no modifications whatsoever:

    LRMS == torque
    scheduler == maui
    vo names == unix group names of that vo's pool accounts

Documentation on what to do if this is not the case can be
found in the file

    lcg-info-dynamic-scheduler.txt

in the doc directory

    /opt/lcg/share/doc/lcg-info-dynamic-scheduler

There is also documentation in this directory indicating
the requirements on the backend commands you will need to
provide in the case that you are using a different
scheduler or LRMS. Tim Bell at CERN can help people
using LSF.

