
Contents of /nl.nikhef.pdp.dynsched/trunk/RELEASE



Revision 2300 - (show annotations) (download)
Mon May 23 14:55:38 2011 UTC (10 years, 7 months ago) by templon
File size: 8756 byte(s)
update for release 2.4.0

This file contains release notes and a change history for the
lcg-info-dynamic-scheduler information provider.

Release 2.4.0

* change name of RPM to dynsched-generic
* fix for <http://savannah.cern.ch/bugs/?79362>
  - change install path for python modules to the system default
  - remove path append of nonstandard python lib dir
* generic lrms python lib moved to its own package
Release 2.3.4
Make it possible to build the RPM via Makefile (ETICS changed).

Releases 2.3.2 and 2.3.3
ETICS compatibility.

Release 2.3.1
Fixes for Savannah bugs 25031, 25867, 27171, 27172, 38195.

Release 2.3.0
Change the dynamic scheduler so that it prints JobSlots and FreeCPU info in CE views.

Release 2.2.2
lcg-info-dynamic-scheduler no longer prints ACBRs (possibly fixing a bug introduced in 2.2.1; unconfirmed).
Release 2.2.1
The dynamic scheduler was changed to stop printing the GlueCEAccessControlBaseRule.
Release 2.2.0 did not work, since the GIP considers all changes to multivalued
attributes (like ACBRs) to be significant.

Release 2.2.0
The dynamic scheduler was changed to deal with the DENY
tags used in the short-term solution (June 2007) for job priorities.
The dynamic scheduler does the following with ACBRs placed on VOViews:
- it discards any ACBR that does not begin with either "VO:" or "VOMS:"
- if more than one ACBR is left in the list, it uses only the last one,
  and prints a warning message to standard error and to syslog
- it allows multiple DENY tags
- it does not check the consistency between the ACBR and DENY tags in a view.

Release 2.1.0

lrms.py was changed to support caching of search results.
Most of the time spent in lcg-info-dynamic-scheduler was due to
queries like "find all jobs from group 'lhcb', in state 'waiting',
for queue 'qlong'". Queries like this are now cached for future use,
and can also be supplied *before* use, as they now are by
lcg-info-dynamic-scheduler. That program now generates slices
of the job list for the various queue/group/state combinations
that will be needed while the program runs.

There were previously two different "return a list of matching jobs"
functions, with different interfaces. These now have a unified
interface so that result caching can be supported. This does break
backwards compatibility for lrms.py.

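The caching scheme described above can be sketched roughly as follows. This is an illustration only, not the real lrms.py interface: the class and method names, and the assumption that jobs carry 'queue', 'group', and 'state' fields, are all hypothetical.

```python
class JobTable:
    """Illustrative cache of job-list queries keyed by (queue, group, state)."""

    def __init__(self, jobs):
        # jobs: list of dicts with 'queue', 'group', 'state' keys (assumed shape)
        self._jobs = jobs
        self._cache = {}

    def matching_jobs(self, queue=None, group=None, state=None):
        # a None criterion matches any value; results are cached for reuse
        key = (queue, group, state)
        if key not in self._cache:
            self._cache[key] = [
                j for j in self._jobs
                if (queue is None or j['queue'] == queue)
                and (group is None or j['group'] == group)
                and (state is None or j['state'] == state)
            ]
        return self._cache[key]

    def prime(self, combos):
        # pre-populate the cache for combinations known to be needed later,
        # as lcg-info-dynamic-scheduler now does before its main loop
        for queue, group, state in combos:
            self.matching_jobs(queue, group, state)
```

Unifying the query functions behind a single signature is what makes a single cache keyed on the query parameters possible, at the cost of the interface break noted above.
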
Release 2.0.0

Rather massive changes in parsing logic, to be able to handle VOViews
with VOMS FQANs.

VOMS FQANs are handled both by the input routines, which know what
to do with them when reading the static LDIF file, and by
the group-mapping logic, which knows how to associate FQANs
with unix groups. To this end, the vomap construct in the
lcg-info-dynamic-scheduler config file now supports lines like

    lhcbsgm:/VO=lhcb/GROUP=/lhcb/ROLE=lcgadmin

in addition to the original lines like

    atlgrid : atlas

which would map group 'atlgrid' to "VO : atlas".

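A minimal sketch of how the two vomap line styles above could be distinguished; the function name and the returned tuple shape are assumptions for illustration, not the actual parser in lcg-info-dynamic-scheduler:

```python
def parse_vomap_line(line):
    """Split a vomap line into (unix_group, (kind, target)).

    kind is 'VOMS' when the right-hand side is an FQAN (starts with '/'),
    and 'VO' for a plain VO name.
    """
    group, _, target = line.partition(':')
    group = group.strip()
    target = target.strip()
    if target.startswith('/'):
        return group, ('VOMS', target)  # FQAN form, e.g. /VO=lhcb/GROUP=...
    return group, ('VO', target)        # original plain VO-name form
```

For example, `parse_vomap_line('atlgrid : atlas')` yields the VO form, while the lhcbsgm line yields the VOMS form.
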
The parsing of the GlueCEUniqueID and GlueVOView blocks has
also changed rather drastically, so that previous problems with
numbers, dashes, etc. in queue names and hostnames no longer
occur. Instead of parsing the GlueCEUniqueID field to get the
queue name, the program now reads GlueCEName and uses that as
the queue name.

Also, documentation for vomaxjobs-generic was added,
and the rest of the documentation and example files was
substantially updated for the new release.

Otherwise no changes since 1.6.3.

For people using the test suite: the versions of the test output
included in 2.0.0 will cause tests of older versions to fail. This
is unavoidable, since the old parsing logic was based on the order
in which blocks appeared in the LDIF file, while the new version
uses python dicts, which have an unpredictable iteration order.
To make the order predictable (for the purposes of the test
harness), the keys are sorted before the program starts to print.
The older versions do not sort the output before printing, so
tests of the old versions with the new files will fail.

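The determinism fix amounts to iterating over sorted keys before printing; a tiny illustration with hypothetical view data (the key format and counts are made up):

```python
def ordered_lines(views):
    """Render a dict of per-view values in a fixed, reproducible order.

    Iterating the dict directly would give an arbitrary order; sorting
    the keys first makes the output stable for a test harness to diff.
    """
    return ['%s: %d' % (key, views[key]) for key in sorted(views)]

# hypothetical VOView counts keyed by queue/group
views = {'qshort/lhcb': 5, 'qlong/atlas': 3, 'qlong/lhcb': 1}
```
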
Release 1.6.3

Fix for GGUS bug 10155, which had to do with YAIM adding unnecessary lines like

    alice:alice

to the [vomap] stanza. The program did not expect these lines, so it
of course did something rather silly with them, resulting in the behavior
reported in the GGUS bug.

Release 1.6.1

Bug fix for lcg-info-dynamic-scheduler: fix the regexp matching
GlueCEUniqueID. The regexp in 1.6.0 missed
- CEs with a "-" character in the hostname
- queue names with underscores, uppercase letters, and numbers

There are examples of each of these classes on the production system,
so this upgrade is critical.

Release 1.6.0

- changes to parsing of the static LDIF file to pick up gLite CEs with "blah"
  instead of "jobmanager". Note this is largely untested!
- added a test suite to prevent bug regressions
- some changes to the build system (three targets increase aggravation)
- some changes to the pbsServer classes to assist in debugging
- some changes to vomaxjobs-maui to assist in debugging/testing;
  also fixed various unreported bugs discovered during testing
- changed the mapping of pbs/torque job states in the pbs classes; until now
  a job was either queued (Q) or running (any other state). Now we have:

From the qstat (torque 2.0.0p4) man page:

    C - Job is completed after having run (mapped to 'done')
    E - Job is exiting after having run (mapped to 'running')
    H - Job is held (mapped to 'pending')
    Q - Job is queued, eligible to run or be routed (mapped to 'queued')
    R - Job is running (mapped to 'running')
    T - Job is being moved to a new location (mapped to 'pending')
    W - Job is waiting for its execution time (mapped to 'queued')
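
The state mapping above, written out as a lookup table. This is a sketch of the mapping as documented here, not the actual code in the pbs classes; the fallback for unknown states is an assumption modelled on the pre-1.6.0 behaviour described above.

```python
# torque qstat state letter -> provider job state (from the table above)
PBS_STATE_MAP = {
    'C': 'done',     # completed after having run
    'E': 'running',  # exiting after having run
    'H': 'pending',  # held
    'Q': 'queued',   # queued, eligible to run or be routed
    'R': 'running',  # running
    'T': 'pending',  # being moved to a new location
    'W': 'queued',   # waiting for its execution time
}

def map_pbs_state(qstat_state):
    # assumed fallback: treat unknown states as 'running', echoing the old
    # "queued (Q) or running (anything else)" behaviour
    return PBS_STATE_MAP.get(qstat_state, 'running')
```
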
139
Release 1.5.2

pbs package: fix vomaxjobs-maui to deal with cases where there is
extra 'warning' output near the top of the output from 'diagnose -g'.

generic package: fix a bug in logging; an undefined variable caused a fatal
exit while trying to print a warning message.

Release 1.5.1

Fix dependency problems with the RPMs.

Release 1.5.0

* add RELEASE (this file) to the docs dir in the generic package RPM

* minor change to the build system to make tag events in ChangeLogs
  easier to read

* lcg-info-dynamic-scheduler:

  - It is possible (e.g. by dramatically reducing the MAXPROC config in Maui)
    for a VO to have more running jobs in the LRMS than MAXPROC allows.
    In this case a negative value was reported for FreeSlots. Fixed.

  - implemented logging to syslog

* vomaxjobs-maui:

  - adapt to handle MAXPROC specifications like MAXPROC=soft,hard.
    The code reports the 'hard' limit, since that is the relevant one when
    the system is not full, which is when the information is needed. Maui
    uses the soft limit on a full system, but in that case the info provider
    will drop FreeSlots to zero anyway, as soon as jobs remain in the queued
    state instead of executing immediately.

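A hypothetical illustration of picking the hard limit out of a MAXPROC specification; the function name and the exact string handling are assumptions, since these notes do not show vomaxjobs-maui's actual parsing:

```python
def maxproc_limit(spec):
    """Extract the hard limit from a Maui MAXPROC setting.

    'MAXPROC=4,8' -> 8  (soft,hard form: take the hard limit)
    'MAXPROC=8'   -> 8  (single-value form)
    """
    value = spec.split('=', 1)[1]
    parts = value.split(',')
    return int(parts[-1])  # the last (or only) number is the hard limit
```
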
Release 1.4.3

* lcg-info-dynamic-scheduler:

  - fix for Savannah bug 14946: overflow in the conversion of response-time
    values from float (internal) to int (output representation). The provider
    now prints the magic value 2146060842 as an upper limit.

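The overflow fix amounts to clamping the estimate before integer conversion; a minimal sketch (the function name is hypothetical, the ceiling value is the one quoted above):

```python
MAX_RESPONSE_TIME = 2146060842  # upper limit printed by the provider

def response_time_to_int(estimate):
    """Convert a float response-time estimate to the integer output
    representation, clamped so large values cannot overflow."""
    return min(int(estimate), MAX_RESPONSE_TIME)
```
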
Release 1.4.2

* pbsServer.py:

  - included Steve Traylen's patch to deal with jobs for which the
    uid/gid printed by 'qstat' is not listed in the running machine's
    password DB. This can happen when the CE is not the same physical
    machine as the actual LRMS server.

Estimated Response Time Info Providers (v 1.4.1)
------------------------------------------------

This information provider is new in LCG 2.7.0 and is
contained in two RPMs, lcg-info-dynamic-scheduler-generic
and lcg-info-dynamic-scheduler-pbs. Sites using torque/pbs
as an LRMS and Maui as a scheduler are fully supported by
this configuration; those using other schedulers and/or
LRMS systems will need to provide the appropriate back-end
plugins.

For sites meeting the following criteria, the system should
work out of the box with no modifications whatsoever:

    LRMS == torque
    scheduler == maui
    vo names == unix group names of that vo's pool accounts

Documentation on what to do if this is not the case can be
found in the file

    lcg-info-dynamic-scheduler.txt

in the doc directory

    /opt/lcg/share/doc/lcg-info-dynamic-scheduler

There is also documentation in this directory describing
the requirements on the backend commands you will need to
provide if you are using a different scheduler or LRMS.
Tim Bell at CERN can help people using LSF.
