Fixing Random, part 40 of 40 (ericlippert.com)
28 points by azhenley 3 months ago | 3 comments



Thanks for this article!

> the connection between await as an operator and sample as an operator is deeper than you might think.

Can anyone explain this to me?

Await plays the part of a specialized, Haskell-like `do`, with tasks being a continuation monad (and comonad). `sample` plays the same `do`-like role for distributions, so where is the "deeper connection"?


> Can anyone explain this to me?

Awaits / yields produce state machines. See [0] for an example of how that works behind the scenes (or just decompile your stuff with ILSpy/dotPeek), but basically the code between yields / awaits is split into separate blocks in a new object, and all variables used across multiple blocks are stored in that object alongside some state indicating where to continue on the next call.

Let's pretend we only use immutable data/vars for now. We could just reset the state machine's position to some earlier state and re-run it, only with our await-like sample operator returning a different value this time. Add another operator and this would make it very easy to implement backtracking while re-using a lot of existing behind-the-scenes infrastructure. But personally I'm not sure the result is that much more readable than a few nested for loops. It might be, for some complicated code spread across a larger code base. But adding a new language operator is quite a high price to pay...

[0] https://mycodingplace.wordpress.com/2017/07/19/csharp-behind...
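
To make the "reset and re-run" idea concrete, here is a minimal sketch. It is hand-written Rust, not what the C# compiler generates, and Step, resume and solve are names made up for this illustration. Each suspension point of an imaginary method with two sample expressions is one enum variant, and variables that live across suspension points are stored in the variant. Because the state is immutable, a saved state can be resumed several times with a different sampled value each time, which is backtracking.

```rust
// Hypothetical stand-in for a compiler-generated state machine.
#[derive(Clone, Debug)]
enum Step {
    NeedX,                   // suspended at the first `sample`
    NeedY { x: u32 },        // suspended at the second `sample`; `x` lives in the state
    Done { x: u32, y: u32 }, // ran to completion
}

// Resume a suspended machine with the value the `sample` expression yields.
// The old state is left untouched, so it can be resumed again later.
fn resume(step: &Step, value: u32) -> Step {
    match step {
        Step::NeedX => Step::NeedY { x: value },
        Step::NeedY { x } => Step::Done { x: *x, y: value },
        Step::Done { .. } => step.clone(),
    }
}

// Backtracking driver: re-run from each saved state once per candidate value
// and keep the completed runs where x + y == 5.
fn solve(step: &Step, choices: &[u32], out: &mut Vec<(u32, u32)>) {
    if let Step::Done { x, y } = step {
        if x + y == 5 {
            out.push((*x, *y));
        }
        return;
    }
    for &v in choices {
        solve(&resume(step, v), choices, out);
    }
}
```

Calling solve with Step::NeedX and the choices 1 to 4 does the same work as two nested for loops over x and y, which is roughly the readability trade-off mentioned above.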


> The abstraction that we’re missing is to make “value drawn from a random distribution” a part of the type system, the same way that “function that returns a value”, or “sequence of values”, or “value that might be null” is part of the type system.

In a similar vein to this, I've been wishing that I could express the difference between "value readable by client" and "value internal to server" in the type system.

Currently I manage this aspect manually in my code by having structs for the internal representation of data, intermediate structs for data returned by the methods of the impls of the internal structs, and structs declared in the module for the JSON API endpoint.

Here are some snippets of code in Rust to illustrate what I mean:

Internal to the server I have a representation of user accounts like this:

  pub struct Accounts
  {
    user_ids:                       Vec<UserId>,
    bcrypt_password_hashes:         Vec<BcryptPasswordHash>,
    ts_accounts_created:            Vec<TsAccountCreated>,
    ts_account_data_last_modified:  Vec<TsAccountDataLastModified>,
    // [...]
  }

And then I have this intermediate internal struct:

  pub struct ClientReadablePrivateAccountData
  {
    // user id is excluded
    // password hash is excluded
    pub ts_account_created: TsAccountCreated,
    pub ts_account_data_last_modified: TsAccountDataLastModified,
    // [...]
  }

A user-agent acting on behalf of an authenticated user can retrieve the ClientReadablePrivateAccountData representation of their account through our JSON API, provided that they submit a bearer token with "read_private_account_data" scope along with the request. (The value of the bearer token itself, by the way, is a 144-bit random value encoded with Base64. The scope and user id mapping of bearer token values is kept server-side at all times.)

The ClientReadablePrivateAccountData representation includes all the data for the account in question except the internal UserId and BcryptPasswordHash data.

When a client requests this data from the /account/me.json path, the request is handled in three steps (in terms of what my code does; of course there are many more steps if you look at the entirety of what happens from when the TCP/IP packets reach our server until the framework that I am using hands data to my code). Common to all requests are the first two steps, which are two pieces of middleware that I have written.

The first piece of middleware makes it possible to put the server in read-only mode, in which new requests are allowed through only if they use the GET or HEAD method. For any other request method (POST, PUT, PATCH, etc.), the client will get a 503 Service Unavailable response with a JSON object telling it the reason for this ("Service is undergoing maintenance. Please try again in a few minutes."). This allows us to complete handling of the requests that have side effects (POST, PUT, PATCH, etc.) before restarting the server while minimizing the impact of the service interruption.
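
A rough sketch of that read-only gate, written as a plain function rather than against any particular web framework. The global flag, the Rejection type, set_read_only, read_only_gate and the "error" key in the JSON body are all assumptions made for illustration; only the method check, the 503 status and the message text come from the description above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global switch; the real server may keep this state elsewhere.
static READ_ONLY: AtomicBool = AtomicBool::new(false);

// Hypothetical rejection type: an HTTP status code plus a JSON body.
#[derive(Debug, PartialEq)]
pub struct Rejection {
    pub status: u16,
    pub body: String,
}

/// Turn read-only mode on or off, e.g. shortly before a restart.
pub fn set_read_only(on: bool) {
    READ_ONLY.store(on, Ordering::SeqCst);
}

/// Returns Ok(()) if the request may continue to the next step in the chain.
pub fn read_only_gate(method: &str) -> Result<(), Rejection> {
    // Only GET and HEAD are let through while in read-only mode.
    let allowed_in_read_only = matches!(method, "GET" | "HEAD");
    if READ_ONLY.load(Ordering::SeqCst) && !allowed_in_read_only {
        return Err(Rejection {
            status: 503, // Service Unavailable
            // The "error" key is an assumed shape for the JSON object.
            body: r#"{"error":"Service is undergoing maintenance. Please try again in a few minutes."}"#
                .to_string(),
        });
    }
    Ok(())
}
```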

The second piece of middleware looks for an Authorization header. If the Authorization header is not provided, the identity of the user is Identity::Anonymous. If the Authorization header is provided, it is decoded, resulting in either an error (malformed data, or an invalid/expired bearer token) or Identity::Authenticated(authenticated_uid).

Identity enum:

  pub enum Identity
  {
    Anonymous,
    Authenticated
    {
      authenticated_uid: UserId,
    }
  }

The authenticated_uid UserId is retrieved by the middleware from a hash map that maps each of the non-expired bearer tokens to the corresponding internal UserId.

The middleware attaches a Result<Identity, MiddlewareForwardedError> to the request and then calls the next step in the request handling chain.
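
Roughly, the decoding step could look like the sketch below. The UserId stand-in, the two error variants, the identify function and the exact "Bearer " prefix handling are assumptions for illustration (the real definitions are not shown here), and expired tokens are assumed to have already been removed from the map.

```rust
use std::collections::HashMap;

// Stand-in for the internal UserId type, whose definition is not shown.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct UserId(pub u64);

#[derive(Debug, PartialEq)]
pub enum Identity {
    Anonymous,
    Authenticated { authenticated_uid: UserId },
}

// Hypothetical variants; the original only names the error type.
#[derive(Debug, PartialEq)]
pub enum MiddlewareForwardedError {
    Malformed,        // later answered with 400 Bad Request
    InvalidOrExpired, // later answered with 401 Unauthorized
}

/// Turn an optional Authorization header value into the result that gets
/// attached to the request.
pub fn identify(
    authorization: Option<&str>,
    live_tokens: &HashMap<String, UserId>,
) -> Result<Identity, MiddlewareForwardedError> {
    // No header at all: the caller is anonymous.
    let header = match authorization {
        None => return Ok(Identity::Anonymous),
        Some(h) => h,
    };
    // Anything that is not "Bearer <non-empty token>" counts as malformed.
    let token = header
        .strip_prefix("Bearer ")
        .filter(|t| !t.is_empty())
        .ok_or(MiddlewareForwardedError::Malformed)?;
    // Well-formed but unknown tokens are invalid or expired.
    live_tokens
        .get(token)
        .map(|uid| Identity::Authenticated { authenticated_uid: *uid })
        .ok_or(MiddlewareForwardedError::InvalidOrExpired)
}
```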

The third step is the request handler for the path and method. In this case, the GET request handler for /account/me.json. Each request handler has one of three possible procedural macros relating to authorization attached to it, so that at compile-time the function bodies of all request handlers are wrapped with code that extracts the identity result set by the middleware and then matches on it to either:

1. Immediately reject the request if the result was an error, returning the appropriate HTTP status code (400 Bad Request for malformed data, 401 Unauthorized for a well-formed but invalid or expired token) and a JSON-encoded message indicating the reason for the error. In the case of 401, a WWW-Authenticate header with value "Bearer" followed by the required scope is included in the response.

2. If the result was an identity, what happens next depends on which of the three procedural macros is being used (in other words, the three of them insert different code corresponding to how the identity is to be used):

2a. Allow Anonymous access only. Identity::Anonymous matches to execute the wrapped function body. Identity::Authenticated(_) matches to return 400 Bad Request with a JSON-encoded message indicating the reason for the error.

2b. Allow Authenticated access only. Identity::Anonymous matches to return 400 Bad Request with a JSON-encoded message indicating the reason for the error. Identity::Authenticated(authenticated_uid) matches to execute the wrapped function body.

2c. Allow access both to Anonymous and to Authenticated. The wrapped function body is executed.

Function bodies wrapped by 2b make use of the authenticated_uid value when performing their work. In the case of the request handler I am talking about here, it passes the authenticated_uid as an argument to a method of the impl of the internal struct Accounts, fn get_private_account_data(&self, uid: UserId) -> ClientReadablePrivateAccountData. The data returned from that method is copied over into another struct of the same name, which is declared separately in the module of the JSON API endpoint.

JSON API endpoint declaration and impl of ClientReadablePrivateAccountData:

  #[derive(Serialize)]
  pub struct ClientReadablePrivateAccountData
  {
    pub ts_account_created: TsAccountCreated,
    pub ts_account_data_last_modified: TsAccountDataLastModified,
    // [...]
  }

  impl ClientReadablePrivateAccountData
  {
    pub fn from_internal_representation (
      private_account_data_internal: internal_representation::accounts::ClientReadablePrivateAccountData) -> Self
    {
      Self
      {
        ts_account_created:             private_account_data_internal.ts_account_created,
        ts_account_data_last_modified:  private_account_data_internal.ts_account_data_last_modified,
        // [...]
      }
    }
  }

In this manner, data returned by the JSON API endpoint is by convention declared in the JSON API endpoint module, and the impl of the struct transforms data from the internal intermediate struct of the same name and with the same fields. The internal intermediate struct is in turn returned from a public method of the Accounts struct, which holds the internal representation of all data relating to accounts.

The fields of the internal struct are private. Still though, some internal data, such as the internal UserId, is accessed by the middleware and the request handlers, so a bit of attention is needed at all times to ensure that we keep the internal data internal to the server and never end up sending it to the client.

In conclusion, keeping internal data internal is currently a matter of manual bookkeeping and code review. Suppose the property "value internal to server" could be encoded in the type system, and the type system could somehow require that any data built from "value internal to server" data is itself "value internal to server". If we could then also express that "value internal to server" data cannot be returned from any function within a given module, mistakenly including such data in a response would become a compile-time error, decreasing the risk that this could happen. This would be highly desirable.
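
For what it's worth, part of this can be approximated in today's Rust with a wrapper type and a marker trait. The sketch below is only an approximation under stated assumptions: Internal, ClientReadable, Response, respond and declassify are made-up names, TsAccountCreated is a stand-in for the real type, and nothing here stops a closure from copying the wrapped value somewhere else, so it narrows the places that need review rather than giving a full guarantee.

```rust
/// Marker trait: only types that implement it may be put in a response.
pub trait ClientReadable {}

/// Wrapper for "value internal to server". It deliberately does NOT
/// implement ClientReadable, and its field is private to this module.
pub struct Internal<T>(T);

impl<T> Internal<T> {
    pub fn new(value: T) -> Self {
        Internal(value)
    }

    /// Anything built from an internal value is itself internal.
    pub fn map<U>(self, f: impl FnOnce(T) -> U) -> Internal<U> {
        Internal(f(self.0))
    }

    /// The one auditable escape hatch: derive client-readable data from an
    /// internal value, e.g. look up account data by an internal user id.
    pub fn declassify<U: ClientReadable>(self, f: impl FnOnce(T) -> U) -> U {
        f(self.0)
    }
}

// Stand-in for the real timestamp type, marked as safe to send.
pub struct TsAccountCreated(pub u64);
impl ClientReadable for TsAccountCreated {}

/// Hypothetical response type that can only hold client-readable data.
pub struct Response<T: ClientReadable>(pub T);

/// Hypothetical single entry point through which handlers build responses.
pub fn respond<T: ClientReadable>(value: T) -> Response<T> {
    Response(value)
}
```

With this in place, respond(Internal::new(1u64)) is rejected by the compiler because Internal<u64> does not implement ClientReadable, while respond(TsAccountCreated(1)) is accepted. Code review can then focus on the declassify calls and the ClientReadable impls.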



