
Contents of /nl.nikhef.pdp.dynsched/trunk/RELEASE


Revision 2151
Mon Jan 17 14:16:11 2011 UTC by templon
File size: 8475 byte(s)
slight correction to text.

This file contains release notes and a change history for the
lcg-info-dynamic-scheduler information provider.

Release 2.3.4

Make it possible to build the RPM via the Makefile (ETICS changed).

Release 2.3.2 and 2.3.3

ETICS compatibility.

Release 2.3.1

Fixes for Savannah bugs 25031, 25867, 27171, 27172, and 38195.

Release 2.3.0

Change the dynamic scheduler so that it prints JobSlots and FreeCPU
information in CE views.

Release 2.2.2

lcg-info-dynamic-scheduler no longer prints ACBRs (possibly fixing a bug
introduced in 2.2.1; unsure).

Release 2.2.1

The dynamic scheduler was changed to cease printing the
GlueCEAccessControlBaseRule. 2.2.0 did not work, since the GIP considers
all changes to multivalued attributes (like ACBRs) to be significant.

Release 2.2.0

The dynamic scheduler was changed to deal with the DENY tags used in the
short-term solution (June 2007) for job priorities. The dynamic scheduler
does the following with ACBRs placed on VOViews:
- it discards any ACBR that does not begin with either "VO:" or "VOMS:"
- if more than one ACBR is left in the list, it uses only the last one,
  and prints a warning message to standard error and to syslog
- it allows multiple DENY tags
- there is no consistency checking between the ACBR and DENY tags in a view.
Release 2.1.0

lrms.py was changed to support caching of query results. Most of the time
spent in lcg-info-dynamic-scheduler was due to queries like "find all jobs
from group 'lhcb', in state 'waiting', for queue 'qlong'". Queries like
this are now cached for future use, and the cache can also be populated
*before* use, as lcg-info-dynamic-scheduler now does: the program generates
slices of the job list for the various queue/group/state combinations that
will be needed while it runs.

There were previously two different 'return a list of matching jobs'
functions, with different interfaces. These now share a unified interface
so that result caching can be supported. This does break backwards
compatibility for lrms.py.
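The caching idea can be illustrated with a minimal sketch. The class and
attribute names here are invented for illustration and do not match the
real lrms.py interface.

```python
# Minimal sketch of result caching for job-list queries, in the spirit
# of the lrms.py change described above (names are illustrative).
class JobList:
    def __init__(self, jobs):
        self.jobs = jobs    # list of dicts with keys: group, state, queue
        self._cache = {}    # (group, state, queue) -> matching job list

    def matching_jobs(self, group=None, state=None, queue=None):
        key = (group, state, queue)
        if key not in self._cache:
            # Compute the slice once; later identical queries hit the cache
            self._cache[key] = [
                j for j in self.jobs
                if (group is None or j["group"] == group)
                and (state is None or j["state"] == state)
                and (queue is None or j["queue"] == queue)
            ]
        return self._cache[key]

jobs = JobList([
    {"group": "lhcb", "state": "waiting", "queue": "qlong"},
    {"group": "atlas", "state": "running", "queue": "qshort"},
])
print(len(jobs.matching_jobs(group="lhcb", state="waiting", queue="qlong")))
```

Pre-populating the cache for all needed queue/group/state combinations
before output generation amounts to calling matching_jobs once per
combination up front.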
Release 2.0.0

Rather massive changes in the parsing logic, to be able to handle VOViews
with VOMS FQANs.

VOMS FQANs are handled both by the input routines, which know what to do
with them when reading the static LDIF file, and by the group-mapping
logic, which knows how to associate FQANs with unix groups. To this end,
the vomap construct in the lcg-info-dynamic-scheduler config file now
supports lines like

lhcbsgm:/VO=lhcb/GROUP=/lhcb/ROLE=lcgadmin

in addition to the original lines like

atlgrid : atlas

which would map group 'atlgrid' to "VO : atlas".
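A rough sketch of how such vomap lines might be parsed is shown below. The
exact config-file grammar and the internal representation of the mapping
are assumptions, not the actual parser.

```python
# Hypothetical parser for vomap lines like those above; the "VO:"/"VOMS:"
# output tags and the comment/blank-line handling are assumptions.
def parse_vomap(lines):
    vomap = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        group, _, target = line.partition(":")
        group, target = group.strip(), target.strip()
        if target.startswith("/"):
            vomap[group] = "VOMS:" + target   # FQAN-style entry
        else:
            vomap[group] = "VO:" + target     # plain VO-name entry
    return vomap

m = parse_vomap(["atlgrid : atlas",
                 "lhcbsgm:/VO=lhcb/GROUP=/lhcb/ROLE=lcgadmin"])
print(m["atlgrid"])
```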
The parsing of the GlueCEUniqueID and GlueVOView blocks has also changed
rather drastically, so that previous problems with numbers, dashes, etc.
in queue names and hostnames are no longer an issue. Instead of parsing
the GlueCEUniqueID field to get the queue name, the program now reads
GlueCEName and uses that for the queue name.

Also, the file vomaxjobs-generic (documentation) was added, and the rest
of the documentation and example files were substantially updated for the
new release.

Otherwise no changes since 1.6.3.

For people using the test suite: the versions of the test output included
in 2.0.0 will cause tests of older versions to fail. This is unavoidable,
since the old parsing logic was based on the order in which blocks
appeared in the LDIF file, while the new version uses python 'dicts',
which have an unpredictable order when iterated. To make the order
predictable (for the purposes of the test harness), the keys are sorted
before the program starts to print. The older versions do not sort the
output before printing, hence tests of the old versions with the new
files will fail.
Release 1.6.3

Fix for GGUS bug 10155, which had to do with YAIM adding unnecessary
lines like

alice:alice

to the [vomap] stanza. The program did not expect to get these lines, so
of course it did something rather silly with them, resulting in the
behavior reported in the GGUS bug.
Release 1.6.1

Bug fix for lcg-info-dynamic-scheduler: fix the regexp matching
GlueCEUniqueID. The regexp in 1.6.0 missed
- CEs with a "-" character in the hostname
- queue names with underscores, uppercase letters, and numbers

There are examples of each of these classes on the production system, so
this upgrade is critical.
Release 1.6.0

- changes to the parsing of the static LDIF file to pick up gLite CEs
  with "blah" instead of "jobmanager". Note this is largely untested!!
- added a test suite to prevent bug regression
- some changes to the build system (three targets increases aggravation)
- some changes to the pbsServer classes to assist in debugging
- some changes to vomaxjobs-maui to assist in debugging/testing; also
  fixed various unreported bugs discovered during testing
- changed the mapping of pbs/torque job states in the pbs classes; until
  now a job was either queued (Q) or running (any other state). Now we
  have, from the qstat (torque 2.0.0p4) man page:

  C - job is completed after having run (mapped to 'done')
  E - job is exiting after having run (mapped to 'running')
  H - job is held (mapped to 'pending')
  Q - job is queued, eligible to run or routed (mapped to 'queued')
  R - job is running (mapped to 'running')
  T - job is being moved to a new location (mapped to 'pending')
  W - job is waiting for its execution time (mapped to 'queued')
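The table above is just a lookup from qstat state letters to the
provider's internal states, e.g.:

```python
# The pbs/torque state mapping above as a simple lookup table (a sketch;
# the actual pbs classes may implement it differently).
PBS_STATE_MAP = {
    "C": "done",     # completed after having run
    "E": "running",  # exiting after having run
    "H": "pending",  # held
    "Q": "queued",   # queued, eligible to run or routed
    "R": "running",  # running
    "T": "pending",  # being moved to a new location
    "W": "queued",   # waiting for its execution time
}

def map_pbs_state(code):
    # Falling back to 'queued' for unknown codes is an assumption here,
    # not something specified in the release notes.
    return PBS_STATE_MAP.get(code, "queued")

print(map_pbs_state("E"))
```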
Release 1.5.2

pbs package: fix to vomaxjobs-maui to deal with cases where there is
extra 'warning' output near the top of the command output from
diagnose -g.

generic package: fix a bug with logging; an undefined variable caused a
fatal program exit while trying to print a warning message.
Release 1.5.1

Fix dependency problems with the RPMs.
Release 1.5.0

* add RELEASE (this file) to the docs dir in the generic package RPM

* minor change to the build system to make tag events in the ChangeLogs
  easier to read

* lcg-info-dynamic-scheduler:

  - It is possible (e.g. by dramatically reducing the MAXPROC config in
    Maui) for a VO to have more running jobs in the LRMS than allowed by
    MAXPROC. In this case a negative value was reported for FreeSlots.
    Fixed.

  - implemented logging to syslog
* vomaxjobs-maui:

  - adapt to handle MAXPROC specifications like MAXPROC=soft,hard.
    The code reports the 'hard' limit, since this is relevant when the
    system is not full, and that is when it's needed. Maui uses the soft
    limit on a full system, but in that case the info provider will drop
    FreeSlots to zero as soon as jobs remain in the queued state instead
    of executing immediately.
Release 1.4.3

* lcg-info-dynamic-scheduler:

  - fix for Savannah bug 14946: overflow in the conversion of
    response-time values from float (internal) to int (output
    representation). The program now prints the magic value 2146060842
    as an upper limit.
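The fix amounts to clamping the estimate before the int conversion,
roughly as below; the helper name is hypothetical.

```python
# Sketch of the overflow fix described above: clamp the float
# response-time estimate to the magic upper limit before converting to
# the integer output representation.
MAX_RESPONSE_TIME = 2146060842

def response_time_as_int(seconds):
    return min(int(seconds), MAX_RESPONSE_TIME)

print(response_time_as_int(1e12))   # clamped to 2146060842
print(response_time_as_int(3600.7)) # 3600
```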
Release 1.4.2

* pbsServer.py:

  - included Steve Traylen's patch to deal with jobs for which the
    uid/gid printed by 'qstat' is not listed in the running machine's
    pw DB. This can happen when the CE is not the same physical machine
    as the actual LRMS server.
Estimated Response Time Info Providers (v 1.4.1)
------------------------------------------------

This information provider is new in LCG 2.7.0 and is contained in two
RPMs, lcg-info-dynamic-scheduler-generic and
lcg-info-dynamic-scheduler-pbs. Sites using torque/pbs as an LRMS and
Maui as a scheduler are fully supported by this configuration; those
using other schedulers and/or LRMS systems will need to provide the
appropriate back-end plugins.

For sites meeting the following criteria, the system should work out of
the box with no modifications whatsoever:

LRMS == torque
scheduler == maui
vo names == unix group names of that vo's pool accounts

Documentation on what to do if this is not the case can be found in the
file

lcg-info-dynamic-scheduler.txt

in the doc directory

/opt/lcg/share/doc/lcg-info-dynamic-scheduler

There is also documentation in this directory indicating the requirements
on the backend commands you will need to provide if you are using a
different scheduler or LRMS. Tim Bell at CERN can help people using LSF.
