SRE-FAST-tracking

This page is for keeping track of users suspected of ignoring the rules for Fast machine usage.

Rules

We want to track users who frequently
  1. Fail to log out after an intensive job has finished
    • Allow for reasonable delays, e.g. outside office hours, or a process that still has a file open even though it finished long ago (?)
  2. Use FAST machines for writing code or documents

We want to avoid

    • Falsely accusing people because of unavoidable delays in reconnecting to the computer (dinner, travel, sleep, ...)
    • Being too strict about code and report writing (< 1 hour OK?)
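The criteria above can be combined into a single check, as in this minimal Python sketch. Only two numbers come from this page (CPU below 2% is idle; open files older than 12 hours are probably not being written). The grace periods and the Monday-to-Friday 09:00-17:00 office hours are hypothetical placeholders, not agreed SRE policy, and the check does not cover rule 2 (writing code or documents).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds for illustration only; the grace periods are hypothetical, not SRE policy.
CPU_IDLE_PCT = 2.0                      # Syspulse: below 2% counts as idle
STALE_FILE_AGE = timedelta(hours=12)    # open files older than this are probably not being written
OFFICE_GRACE = timedelta(hours=4)       # hypothetical grace if idling began during office hours
OFF_HOURS_GRACE = timedelta(hours=12)   # hypothetical allowance for dinner, travel, sleep ...


@dataclass
class Session:
    idle_since: datetime                  # last keyboard/mouse input (e.g. derived from quser)
    cpu_pct: float                        # whole-machine CPU % as read from Syspulse
    newest_file_age: Optional[timedelta]  # age of the most recently changed open file; None if none open


def in_office_hours(t: datetime) -> bool:
    # Assumed office hours: Monday to Friday, 09:00-17:00.
    return t.weekday() < 5 and 9 <= t.hour < 17


def is_suspect(s: Session, now: datetime) -> bool:
    """Return True only if the session meets every idle criterion."""
    grace = OFFICE_GRACE if in_office_hours(s.idle_since) else OFF_HOURS_GRACE
    if now - s.idle_since < grace:
        return False  # allow for reasonable reconnection delays
    if s.cpu_pct >= CPU_IDLE_PCT:
        return False  # an intensive job still appears to be running
    if s.newest_file_age is not None and s.newest_file_age < STALE_FILE_AGE:
        return False  # a file is probably still being written
    return True
```

A session idle since 16:00 on a weekday, with CPU under 2% and no open files, is flagged the next morning; one that went idle on a Friday evening is given the longer off-hours allowance before it is flagged.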


Tools

  1. smb-users.pl -a shows all open files on the SRE server.
    • Directories have a trailing "/" and can be ignored.
    • Files that are over 12 hours old are probably not being written.
  2. sre-recent-local MACHINE USER
    reports recently changed files on C: and running processes.
  3. sreinfo
    • "CPU Load" 0 generally means inactive; see also the Syspulse graph "CPU".
    • quser /server:fast12, run from another Windows machine in the Active Directory domain, shows login time and "idle" time (no mouse or keyboard input).
  4. The Syspulse "graph" is worthwhile for
    • CPU (100% of one core reads as 25%; below 2% is idle) and
    • Net I/O, which usually indicates file activity on R:
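When reading quser and Syspulse numbers by hand it is easy to misjudge units. This Python sketch converts a quser IDLE TIME field into minutes and a whole-machine Syspulse CPU percentage into busy cores. It assumes quser's usual idle-time notation ("." or "none", plain minutes, "hours:minutes", "days+hours:minutes") and a 4-core machine, which is what "100% of one core reads as 25%" implies; the function names are hypothetical and not part of any SRE tool.

```python
import re


def parse_quser_idle(field: str) -> int:
    """Convert a quser IDLE TIME field to minutes (assumed formats listed above)."""
    field = field.strip().lower()
    if field in ("", ".", "none"):
        return 0  # no idle time recorded
    m = re.fullmatch(r"(?:(\d+)\+)?(?:(\d+):)?(\d+)", field)
    if m is None:
        raise ValueError(f"unrecognised idle time: {field!r}")
    days, hours, minutes = (int(g) if g else 0 for g in m.groups())
    return (days * 24 + hours) * 60 + minutes


def busy_cores(cpu_pct: float, n_cores: int = 4) -> float:
    # Syspulse shows whole-machine CPU; on an (assumed) 4-core machine
    # one saturated core appears as 25%.
    return cpu_pct * n_cores / 100


def cpu_is_idle(cpu_pct: float) -> bool:
    # Per the note above: below 2% counts as idle.
    return cpu_pct < 2.0
```

For example, `parse_quser_idle("2+03:45")` gives 3105 minutes (just over 51 hours), and a Syspulse reading of 25% corresponds to one saturated core.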

fast1 fast2 fast3 fast4 fast5 fast6 fast7 fast8 fast9 fast10 fast11 fast12

Tracking

For each user, record each suspected instance of misuse. Bracket with <pre> ... </pre>.

agist-11-c03 [H] MDI

  • 2019-03-26 idle 18 hours. No files, no process. Terminated; Sent email to Alex Gist <gist.alex@gmail.com>

arichardson-18-g04 [H] IDD

  • 2019-03-20 idle since 2019-03-19 21:00 (15 hours, to noon). Login 2019-02-12
  • Last file changed 2019-03-19 16:46 (16k R:\working\tmp\.data_dictionary_pharmanet-january-1-1996-onwards.xlsx_hlth_rpt.A.csv2.swp)
  • 2019-03-20 13:45 no recent files
  • CPU <1% since short spike 40% Mar 19 18:45 and 09:15
  • Net I/O low since Mar 19 16:15
  • Net I/O idle Feb 13-21, 25, Mar 2, 5, 9, 13-15, 17-20
  • CPU very low (<1%) Feb 17-18, Mar 13, 16-17

asafari-17-118 [H]

  • 2019-10-02 logged in Sep 23, idle 2h, no open files modified for 5 days.
    fast16. asafari-17-118 60874 19-09-23,09:18 19-09-27,10:03 5d5h48m /sredata/sre/17-118/asafari-17-118/GW/Girls12_2008/FW_Project.egp
  • sre-recent-local indicates some activity in the past hour; could be nothing.
    2019-10-02,15:44 86747 asafari-17-118/AppData/Roaming/SAS/EnterpriseGuide/7.1/ProjectRecovery/FW_Project.egp (1)/ project.rcv

bwilmer-18-g04

  • 2019-04-25 fast13 login May 6, 2019, 12:55 p.m., disconnected May 6, 2019, 2:47 p.m.
  • Running nothing. Terminated.

cbasham-14-105

  • 2019-10-22 fast11 login 2019-10-21 14:26 idle 21:09 CPU-load=13 no files open
    terminated
  • 2019-10-22 fast10 login 2019-10-21 20:03 idle 25:19 CPU-load=13 only RLIBS files open

dwarburton-18-g03

  • 2019-04-25 fast6 disconnected April 23, 2019, 2 p.m.; no CPU; no Net I/O; no R: files open.
  • Earlier ignored an email from Ryoko (had both fast6 and fast7 running).
  • ssh to fast6 fails (authentication with ssh key succeeded, then instantly disconnected). RDC fails ("too many users").
  • Rebooted.
  • After reboot, ssh hangs at "connecting to fast6"; nc -zv fast6 22 also hangs, because the "cygwin ssh service" needs a manual start.
  • 2019-06-08,13:30 fast9 login May 6, 2019, 2:47 p.m. disc May 6, 2019, 5:02 p.m.
  • Running python; only file open .libs/PYLIBS/3.5/psycopg2/...

dwarburton-19-g01

  • 2019-05-16&17 simultaneous login to 5 Fast machines: Fast4(16:39) Fast8(18:05) Fast11(19:42) Fast14(19:49) -- 4 kicked off May 17 18:17
    Logged in to fast12 May 17 12:13 to 12:13. Ignored email from David.

ekarim-08-003

  • 2019-01-22 Hello Ehsan Karim,
    according to our logs, ekarim-08-003 logged in to Fast12 Sunday January 20 at 19:53, and the machine has been disconnected and idle for 5 hours.
    Since you had no files open, and Fast machines are a scarce resource that should be left available for other researchers, we have terminated your Remote Desktop session.
    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.
    If you ever need help to do a proper logout from Windows 10 don't hesitate to ask us.

hshulha-14-105

  • Fast6 login hshulha-14-105 May 5, 2019, 11:56 p.m. ; Disconnected May 7, 2019, 1:06 a.m.
    Running nothing. Terminated
  • fast4, 7, 19 + sre43 (mostly malfunction?). User urged to notify us of difficulty logging out.
    Machine  Login                       Disconnected
    sre43    Sept. 9, 2019, 9:41 a.m.    -                            0   0
    fast4    Sept. 8, 2019, 5:18 p.m.    Sept. 8, 2019, 5:22 p.m.     0
    fast7    Sept. 7, 2019, 4:02 p.m.    Sept. 8, 2019, 5:18 p.m.     13
    fast19   Sept. 8, 2019, 5:22 p.m.    Sept. 9, 2019, 12:17 a.m.    13
    • OK Fast4 rebooted 2019-09-10,11:30
    • OK Fast19 terminated 11:52 (no research processes despite high load)
    • fast7: user logged out after notice. ssh fails; RDC fails ("too many connections", although nobody is connected). Rebooted.
  • Leaving only sre43.

htavakoli-

  • Notes 2019-01-16 ~15:30: obviously active now.
  • Net I/O busy since 9am; idle Jan 14 9pm to Jan 15 8pm.
  • CPU usage busy on and off since 9am.

[sreinfo]  fast1 	Unavailable 	htavakoli-13-035 	Jan. 12, 2019, 6:42 p.m. 	-  	0
[sre-recent-local fast1]
 2019-01-16,15:36  84591	./htavakoli-13-035/AppData/Local/Temp/SEG8080/fcaa42ae44b74051985684f899bb3c41/  result.srx
 2019-01-16,15:36  83997	./htavakoli-13-035/AppData/Local/Temp/SEG8080/SAS Temporary Files/_TD10008_FAST1_/  #LN00337
        0   10008       0 ?          Jan 13 C:\PROGRA~1\SASHome\SASFOU~1\9.4\sas.exe
        0    7416       0 ?        16:45:59 C:\Program Files\RStudio\bin\rstudio.exe

jhicks-19-g01

  • 2020-02-23 10am Fast2 + Fast18
  • Both disconnected, both under high load. There are 3 idle Bruce machines reserved for project 19-g01.
    your account jhicks-19-g01 is currently using two Fast machines (2 and 18).
    Fast machines are a scarce resource and none are available currently for other researchers.
    Only one SRE machine (including Fast) can be used at one time per account.
    For 19-g01 project, you should be using one of the Bruce SRE machines, unless their capacity is too small for your computation job.
    Please arrange to release at least one Fast machine as soon as practical.
  • Jeffrey quickly quit Fast2.
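The one-machine-per-account rule quoted in the message above lends itself to a mechanical check. A minimal Python sketch, assuming the current sessions have already been collected as (machine, account) pairs, for example transcribed from sreinfo; no particular sreinfo output format is assumed, and the function name is hypothetical.

```python
import re
from collections import defaultdict


def _machine_key(name: str):
    # Natural sort: put "Fast2" before "Fast18" by splitting prefix and number.
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    return (m.group(1).lower(), int(m.group(2))) if m else (name.lower(), -1)


def accounts_on_multiple_machines(sessions):
    """Map each account that has sessions on more than one SRE machine to its machines."""
    machines_by_account = defaultdict(set)
    for machine, account in sessions:
        machines_by_account[account].add(machine)
    return {
        account: sorted(machines, key=_machine_key)
        for account, machines in machines_by_account.items()
        if len(machines) > 1
    }
```

Given sessions for jhicks-19-g01 on Fast2 and Fast18 plus a single session elsewhere, only jhicks-19-g01 is reported, with its machines listed as Fast2 then Fast18.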

jyu2-18-c01

[Ticket#2018121210000028] One user multiple fast machines? [terminate fast14]
Created: 2018-12-12 09:42 by Denis Laplante
2019-03-04 10:45 Login Fast3 2019-02-27 idle 2:46
2019-03-03,20:50  65	jyu2-18-c01/AppData/Local/RStudio-Desktop/pcs/  workbench-pane.pper
No programs running

Hubbard: no SMB files open.
 Jessica Yu <jessica_yu@alumni.ubc.ca>        
  Hello Jessica according to our logs, jyu2-18-c01 logged in to Fast3 February 27, and has done no intensive computing since Sunday 9pm.
       We have terminated your session, as you had no programs running or files open.
       Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.
  • 2019-10-02 Fast17 idle 2 days, no files open

lingyi-16-106 [V]

  • 2019-03-12 09:40 idle since 2019-03-11,20:43 review at 10am
  • last C: SAS Temp
    2019-03-11,19:35 97792 lingyi-16-106/AppData/Local/Temp/SAS Temporary Files/_TD7152_FAST14_/ sastmp-000000788.sas7bitm
  • last R: file Mar 11 18:01 [date --date='now - 14hour -10min' -> Mar 11 19:44]
    Lock=19-03-11,19:35 Age=14h10m /sredata/sre/16-106/lingyi-16-106/IBDTNF/~$M71.2_Q1.3_indiv_ATNFfailure_Event5_AllCDUCIBDU_11Mar182.xlsx
  • Response at 11:55 "Hi SRE, I start to use it now. Thanks, Lingyi"
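The age arithmetic in the entry above (recovering when a lock was taken from an Age field, as the `date --date='now - 14hour -10min'` command does) and the Tools rules for smb-users.pl output can be sketched in Python. This assumes the Age field always uses the d/h/m notation seen on this page ("14h10m", "5d5h48m") and that open files have already been collected as (path, age) pairs; parse_age, lock_time and stale_files are hypothetical helpers, not part of smb-users.pl.

```python
import re
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=12)  # Tools note: older open files are probably not being written


def parse_age(age: str) -> timedelta:
    """Parse an age such as "14h10m" or "5d5h48m" into a timedelta."""
    m = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?", age.strip())
    if m is None or not any(m.groups()):
        raise ValueError(f"unrecognised age: {age!r}")
    days, hours, minutes = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def lock_time(age: str, now: datetime) -> datetime:
    # Same calculation as: date --date='now - 14hour -10min'
    return now - parse_age(age)


def stale_files(entries):
    """Yield paths from (path, age) pairs that are files older than STALE_AFTER."""
    for path, age in entries:
        if path.endswith("/"):
            continue  # directories have a trailing "/" and can be ignored
        if parse_age(age) > STALE_AFTER:
            yield path
```

With now = Mar 12 09:54, `lock_time("14h10m", now)` returns Mar 11 19:44, matching the hand calculation in the entry above.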

lronald-14-105 [B]

  • 2019-04-02 idle 16 hours. No disk activity for days. CPU < 1% since Apr 1 17:00. Temporary files saved on C: at 17:09, nothing on R:

lcheng-15-111

  • 2019-01-29 Fast3 active; login Jan 24. Low CPU and I/O except for brief spikes Jan 28 10am to 17:00
    lcheng-15-111 [Bondar] Lucy (Yan) Cheng <lucy.cheng@ubc.ca>
    Hello Lucy Cheng,
    according to our logs, lcheng-15-111 logged in to Fast4 Thursday January 24, and has had almost no intensive computing since then, except for a brief period Jan 28 10am to 17:00
    We note you currently have no files open on the R: drive,
    Please log out.
    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.

nfox-14-058

  • 2019-10-29 Fast3 login since Oct 23. Load=1. Syspulse shows a few spikes of activity: nothing Fri, Sat, Sun; a bit Monday; none since 4pm.
    "smb-users.pl -a" on bondar shows no files open. "quser /server:fast1" shows no human input in 13 hours.

nlu-16-106

  • 2019-01-22 Hello Na (Leo) Lu,
    according to our logs, nlu-16-106 logged in to Fast6 Monday at 22:53, and the machine has been disconnected and idle for 12 hours.
    Since you had no files open, and no other SRE machines are available for other researchers, we have terminated your Remote Desktop session to Fast6.
    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.
  • 2019-03-27 Hello Na (Leo) Lu,
    your account nlu-16-106 has been idle and disconnected on machine Fast5 for 19 hours, and has no files open.
    We have terminated your session to leave the machine available for other researchers.
    This is not the first time we've had to do this. Please review logout instructions at https://my.popdata.bc.ca/html/SRE/windows/connecting.html#LoggingOut

nnabavi-18-g01

  • 2019-01-22 Hello Noushin Nabavi,
    according to our logs, nnabavi-18-g01 logged in to Fast14 Monday January 21 at 12:53, and the machine has been disconnected and idle for 21 hours.
    Since you had no files open, and Fast machines are a scarce resource that should be left available for other researchers, we have terminated your Remote Desktop session.
    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.
    If you ever need help to do a proper logout from Windows 10 don't hesitate to ask us.

rji-11-c03

  • 2019-09-17/DL Fast9 disconnected Sep 15, no files open. Fast4 disc since today 13:54 (1 hour ago), no files open.
    Hello Xuejun (Ryan) Ji, This is our 5th complaint this year about your misuse of SRE Fast machines
  • 2019-01-22 Hello Xuejun (Ryan) Ji,
    according to our logs, rji-11-c03 logged in to Fast8 Saturday January 19 at 13:08, and the machine has been disconnected and idle for 41 hours.
    Since you had no files open, and no other SRE machines are available for other researchers, we have terminated your Remote Desktop session.
    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.
    If you ever need help to do a proper logout from Windows 10 don't hesitate to ask us.
  • Login Jan 24, idle Jan. 27 6:59 p.m. No CPU, no Net I/O since Jan 24. Nothing running. Idle 1d19h Xuejun (Ryan) Ji <ryanji329@gmail.com>
    according to our logs, rji-11-c03 logged in to Fast11 Thursday January 24, and has done no intensive computing since.
    We have terminated your session.
    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.

rji-11-s01

  • Login March 28, 2019, 2:14 p.m. disconnected April 1, 2019, 7:34 p.m.
    Terminated session.
  • Login May 6, 4 p.m. Idle May 6, 2019, 4 p.m.
    Terminated session.

semerson-13-037

  • 2019-06-12 11:00 Hello Scott Emerson,
    Your login to Fast14 has been idle for 1 day 17 hours, and the only open files (A_hosp_ALL.sav , GHN_hosp_ALL.sav) were last updated 48 hours ago.
    Also since then you have logged in and out of Fast2.
    We will terminate that Remote Desktop session to Fast14 at noon unless we hear back from you that you still need it.
    Thanks for your consideration

szheng-15-070

  • 2019-04-25 14:00 fast7 idle 3:56, no R: files, no C: files. Terminated.

yzheng-16-106

  • 2019-01-22 Hello Yufei Zheng,
    according to our logs, yzheng-16-106 logged in to Fast6 Friday January 18 at 22:53, and the machine has been disconnected and idle for 18 hours.
    Since you had no files open, we have terminated your Remote Desktop session to Fast6.

    Please note that we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs. I see that you also have a Remote Desktop session on Fast2, which is actively writing to file "incidence_2015.sas7bdat", so we will not terminate that session.

  • 2019-06-12 Hello Yufei Zheng,
    according to our logs, yzheng-16-106 logged in to Fast7 June 1, and the machine has been idle for 16 hours.
  • It looks like the machine was working only Jun 4 9am-5pm; Jun 6 9am-5pm; Jun 7 9am-7pm; Jun 10 9am-5pm; Jun 11 9am-6pm. In those 10 days the CPU usage never went above 5% except briefly yesterday, and the network I/O was extremely low, which suggests you could be using one of the ordinary SRE machines instead.
    The youngest file you have open is "as_nsaid_msm_qtr_weight.xlsx", unchanged since you stopped using Fast7 around 6pm last night.
  • As we told you on Jan 22, we are considering removing access to Fast machines from researchers who do not log out when they no longer need a fast machine for resource intensive jobs.
  • Please explain why your project requires you to use Fast machines.
  • Also please logout at 6pm every evening when you are not using the machine.
  • 2019-06-17 asked to free the Fast machine; logged out of sre2 instead.
  • 2019-11-20 13:22 Fast19 terminated after idle 20 hours, no open files being updated, no CPU, no network I/O