ACTIVE
-----------------------------------------------------------------------

When fixed, move to RESOLVED section; the format is:
* quick-summary (priority)
  details

* need short guide on how to implement function X with grpc (medium)
  
* queue name across users (medium)
  User A is currently unable to create a queue X if user B already has
  queue X.  This is confirmed for queues of the same type; it is not known
  whether it extends to queues of different types.  Maybe we should make the
  triple user/type/queue_name unique and pass it around instead of just
  queue_name?
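
  A minimal sketch of the user/type/queue_name triple suggested above.
  QueueKey and QueueRegistry are hypothetical names used only for
  illustration; the in-memory dict stands in for whatever table HLE
  actually uses to record queues.

```python
from typing import Dict, NamedTuple


class QueueKey(NamedTuple):
    """Hypothetical composite identifier: (user, queue type, queue name)."""
    user: str
    qtype: str
    name: str


class QueueRegistry:
    """In-memory stand-in for the real queue table (illustration only)."""

    def __init__(self) -> None:
        self._queues: Dict[QueueKey, dict] = {}

    def create(self, user: str, qtype: str, name: str) -> QueueKey:
        key = QueueKey(user, qtype, name)
        # Uniqueness is enforced on the whole triple, not on the name alone.
        if key in self._queues:
            raise ValueError("queue already exists: %s/%s/%s" % key)
        self._queues[key] = {}
        return key


registry = QueueRegistry()
key_a = registry.create("userA", "raw", "X")
key_b = registry.create("userB", "raw", "X")  # same name, different user: allowed
```

  With this keying, only a repeated create for the same user, type and name
  is rejected.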

* implement lander_queue_is_active_hle (medium)
  Need to be able to tell from shell if the queue is active or not

* improve error handling of nearly all shell functions (medium)
  Almost all shell functions are ignoring or passing incorrect exit status.

* need a function lander_queue_disconnect_hle (medium)
  to disconnect two previously connected/derived queues

* lander_queue_list is broken (medium)
  lander_queue_list currently doesn't seem to work; needs to list queues
  of current user by type; optional 'user' argument can list queues of that
  particular user.

* For queue quota:
  Maybe a high level function
    lander_queue_user_withinquota(uid) -> true/false

  It is still open because we still need to decide how nfs and hdfs
  will read configuration information and in what format it will be.  --aqadeer 2017-07-23




* Potential MariaDB incompatibility between version 5.5 and 5.6
  (minor, unconfirmed).
  In wip/sup/db.py, the line:
    "RowEntryEpoch TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP," (v5.6)
  was replaced with:
    "RowEntryEpoch TIMESTAMP NOT NULL DEFAULT 0," (v5.5)
  but this was not fully tested.
  Expected results: need to find a solution that works in both v5.5
  and v5.6.



* Row-by-row garbage collection (low, scalability)
  cleanUP_cronjob.py performs GC by trying to delete files row by row.  This
  may be an O(N*log(N)) operation where N is the total number of files
  in ALL queues of the same type.  This *may* take a long time.
  Testing shows:
             50,000 files  takes 42s
          1,000,000 files  takes 210m23s
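
  The two measurements above can be used to estimate how the runtime
  actually scales.  The sketch below assumes a power law t = c * N**k and
  solves for k; with only two data points this is a rough estimate, not
  a proper benchmark.

```python
import math

# Measurements copied from the entry above: (number of files, seconds).
small_n, small_t = 50_000, 42.0
large_n, large_t = 1_000_000, 210 * 60 + 23.0

# If runtime ~ c * N**k, then k = log(t2 / t1) / log(n2 / n1).
exponent = math.log(large_t / small_t) / math.log(large_n / small_n)

# Runtime the large case would take if scaling were exactly N*log(N),
# extrapolated from the small measurement.
nlogn_prediction = small_t * (large_n * math.log(large_n)) / (
    small_n * math.log(small_n))

print("empirical exponent k = %.2f" % exponent)
print("N*log(N) prediction = %.0f s, measured = %.0f s"
      % (nlogn_prediction, large_t))
```

  The exponent comes out at about 1.9, and the N*log(N) extrapolation is
  roughly 18 minutes against the measured 210 minutes, so the observed
  behavior looks closer to quadratic than to N*log(N).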




* An automated set of regression test cases is needed (medium)
  One change somewhere can cause unintended regressions elsewhere
  in the code.  We need to be able to test for all such cases
  easily.


* Release locks while doing HDFS ops (medium)
  Need to be able to run parallel enqueues/dequeues (parallel hdfs -put/get).
  This currently does not seem to be possible, but that is unconfirmed.

* Queue name length limitation (low)
  It appears that HLE internally creates users <login>_<queuename>.
  On CentOS 7, usernames must be <= 16 chars.  Thus, with a login such as
  plander, queue names must be no more than 7 characters.  It would be good
  to relax this limitation.



* File reservation order (medium)
  API functions like reserveTop currently order files by the modification
  time taken from HDFS.  The Lander system needs sorting based on the time
  embedded in the file names.
  One possible solution is to put a regex for file names in some configuration
  file; when the system records the time, if a file name cannot be parsed with
  the given regex, it falls back to the HDFS-based time.
  This ordering is important for some apps that wait when they see a gap in
  the incoming files, hoping the gap will be filled soon.
  The fix must not be Lander-specific, because other systems (e.g. PLUMB)
  use this HLE as a back end.
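
  A minimal sketch of the regex-with-fallback idea described above.  The
  configuration names (FILENAME_TIME_REGEX, FILENAME_TIME_FORMAT), the
  function file_sort_time, the example file names and the assumption that
  embedded times are UTC are all hypothetical; none of them exist in HLE.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical configuration values: a regex with one named group "ts" and
# the strptime format used to decode it.
FILENAME_TIME_REGEX = r"(?P<ts>\d{8}-\d{6})"
FILENAME_TIME_FORMAT = "%Y%m%d-%H%M%S"


def file_sort_time(filename: str, hdfs_mtime: float,
                   regex: Optional[str] = FILENAME_TIME_REGEX,
                   time_format: str = FILENAME_TIME_FORMAT) -> float:
    """Return the epoch time to sort by: name-embedded time, else HDFS mtime."""
    if regex:
        match = re.search(regex, filename)
        if match:
            try:
                parsed = datetime.strptime(match.group("ts"), time_format)
                # Assumption: times embedded in file names are UTC.
                return parsed.replace(tzinfo=timezone.utc).timestamp()
            except ValueError:
                pass  # matched the pattern but is not a valid date
    # No regex configured, no match, or invalid date: use the HDFS time.
    return hdfs_mtime


# Each entry is (file name, HDFS modification time as epoch seconds).
files = [("b-20180315-120500.dat", 100.0), ("a-20180315-120000.dat", 200.0)]
ordered = sorted(files, key=lambda f: file_sort_time(f[0], f[1]))
```

  Because the regex and format come from configuration, a system whose
  file names carry no timestamp can leave the regex unset and keep the
  HDFS-based ordering.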



* Garbage collection (high)
  The system is not cleaning up HDFS files.  Released files need to be cleaned
  up, or storage will fill up over time.
  Possible solutions were discussed with Yuri over Jabber.  See the file
  discussions/20180315 for that discussion.



-----------------------------------------------------------------------



RESOLVED
-----------------------------------------------------------------------

* Add API for "faulty" files (medium)
  (new feature)
  Could be either a flag in some table, or a separate queue for files
  for which processing has failed.  We'll need to be able to list/fetch
  those files, etc.  Probably a separate queue would be easier, but
  may need to discuss.


  Fixed.  Please see API functions: markAsFaulty(), markAsHealthy(), listFilesInStateX() --aqadeer 2017-07-23



* Allow queues to have multiple parents (medium)
  (new feature, convenience, unnecessary limitation?)
  Not sure why we need this limitation.  Proposed functions:
    lander_list_parent_queues() - probably good to have for convenience,
      even if it's listing just one parent, so users can keep track.



* Add function for listing queues (medium)
  (new feature, convenience)
    lander_queue_list(username, type)
  if type is empty, list all known queues for the user

  Fixed.  Please see: lander_queue_list() --aqadeer 2017-07-23



* Need common api for quotas (medium)
  I think this means functions like:
    lander_queue_user_du(uid) -> return bytes
    lander_queue_du(qid) -> return bytes
  also need to be able to map users to queues and back
    lander_queue_user2queues(uid) -> list all user qids
    lander_queue_queue2user(qid) -> uid
  Maybe a high level function 
    lander_queue_user_withinquota(uid) -> true/false
  The objectives are:
    1. When distributing data make sure the target queue's user is underquota.
  If this is cheap to do per file, good, but if not
    2. Calculate/update quotas every hour or so and cache the result.
  Thoughts?


  Fixed.  Please see: lander_queue_du(), lander_queue_user_du(), lander_queue_user2queues()
                      lander_queue_queue2user()    --aqadeer 2017-07-23




* lander_list_derived_queues should list ALL derived queues (medium)
  Currently it lists ONLY active queues.  This needs to be fixed,
  because otherwise there is no way to find out which queues are connected
  to a source queue.

    Fixed.   --aqadeer 2017-07-23

  Alternatively, we can drop "active/non-active" and just go by connected/
  disconnected state.  This may be sufficient but requires some thought.
  In this case, we should add a disconnect function:
    lander_disconnect_queue(qid)

    Keeping active/non-active for the time being.  --aqadeer 2017-07-23


* Lander admins permissions to queue_pull API (medium)
  Lander admins cannot list files in a queue, because they don't have
  credentials.  This is inconvenient for the admin and needs to be fixed.

* Repeated calls to createQueue with the same name succeed (medium)
  For example:
      createQueue q1 raw
      createQueue q1 full
  Expected results: second call should fail with a meaningful
  message.


* Per-queue credential files (medium)
  Per-user, per-queue credentials are currently saved in separate files
  $CWD/${USER}_$qname.  The contents of these files need to be copied
  to conf/usrConInfo.py.  This behavior is both undocumented and
  cumbersome.  A proposed fix is to store this in a database and
  create a single set of credentials for a particular user.  Do we even
  need per-queue users?

* Duplicate entries for HDFS top dir (medium)
  HDFS_URI is saved both in sysConfInfo.py and in the database KeyVal
  table (as HDFS_Base_Dir), which can lead to inconsistencies when the URI
  changes.

* hdfs filename uniqueness (high, unconfirmed but likely)
  HDFS files for a particular type of queue are stored in a global <type>
  directory (not per user).  This means that if user1 has a
  file f1 of type X sitting somewhere in his queue, user2 cannot
  enqueue a possibly different file with the same name "f1" into his
  queue.
  
* Document security model (medium)

  Done on hlm wiki page: https://wiki.isi.edu/div7/index.php/User:Aqadeer/HardLinkEmulationOnHDFS#Experimental_Functions_of_the_API --aqadeer 2017-07-23


* Global credentials are used instead of user (medium, unconfirmed)
  Seems suspect that enqueue uses sysConfInfo and not usrConInfo.  Is
  this a bug?


* All system users should be able to call queue push functions (high)
  Ordinary users might like to cascade many of their queues and hence
  need to enqueue and distributeLocal.  Currently these functionalities
  are only available to the admin user.


* Privacy concerns due to ordinary users being able to see KeyVal and Queue (low)
  To fix a bug in release queue, select grants were given on the KeyVal and
  Queue tables to ordinary users.  Now they can see the content of these tables,
  although they can't modify it.

* dependency on numpy ripped out

* Only one queue can be accessed at any given moment given set of
  current credentials (high)

* release queue was failing due to the ordinary db user not having select
  grants on KeyVal and Queue.  Fixed by giving those grants.  But now
  these tables are visible to ordinary users, which might be a privacy
  concern that needs to be taken care of.

